Data Visualisation: An Intro to Seaborn

Matplotlib is an incredibly useful and popular visualization tool, but even long-time users often feel frustrated with its shortcomings, among them its default parameters and its apparent disharmony with DataFrames (which comes as no surprise, seeing as it predates Pandas by several years).

Seaborn provides an API on top of Matplotlib that offers sensible choices for plot style and color defaults, defines simple high-level functions for common statistical plot types, and integrates with the functionality provided by Pandas DataFrames, allowing users to simply pass DataFrame column labels to its plotting functions.

In this Jupyter Notebook we'll explore two different datasets, Iris and Pokemon. One reason Seaborn works so well with DataFrames is that column labels are automatically propagated to the plots, for example as axis labels and legend entries. Let's get on with it!

Exploratory Data Analysis

Pairplots

Pairplots are one of the best ways to visualise the multidimensional relationships within a dataset, and drawing one is as easy as calling sns.pairplot. This creates a matrix of axes and shows the relationship for each pair of columns in a DataFrame. By default, it also draws the univariate distribution of each variable on the diagonal axes.

For the Iris data we specify the different species of flower, the size of the graph (now done using the "height" parameter, which replaced "size") and the color palette. We also set the title.

Kernel Density Estimations

The pairplot() function is built on top of a PairGrid object, which can be used directly for more flexibility.

KDE plots are a useful tool for plotting the shape of a distribution. Like a histogram, a KDE plot encodes the density of observations on one axis with height along the other axis, but it gives a smooth estimate of the distribution, which Seaborn computes with sns.kdeplot.

Histograms and KDEs can be combined using distplot() (deprecated since Seaborn 0.11 in favour of histplot() and displot()).

Passing two variables to kdeplot (as x and y in current Seaborn versions; older versions also accepted a two-column dataset directly), we get a two-dimensional visualization of the data.

Joint Distributions

Probably the simplest and most familiar way to visualize a bivariate distribution is a scatterplot, which is also the default plot of the jointplot() function.

In Seaborn you can also pass plot styles as parameters to other plot functions. For example, the kernel density estimation procedure described above can be used as a style in jointplot() to visualize a bivariate distribution.

Fitting Parametric Distributions

Here we are using the distplot() function to plot the distribution of petal lengths (in cm) for the virginica flower only.

You can also use distplot() to fit a parametric distribution to a dataset (through its fit parameter) and visually evaluate how closely it corresponds to the observed data.

Time to explore the Pokemon dataset (much more interesting)!

Categorical analysis

Here we're just plotting the number of Pokemon in each "Type" category using catplot(). Note that catplot() is the updated name for this function (formerly factorplot()).
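
A count plot of this kind might look like the sketch below. The tiny hand-made DataFrame and the column name "Type 1" are assumptions standing in for the real Pokemon data.

```python
import pandas as pd
import seaborn as sns

# Hypothetical miniature of the Pokemon table (assumption: column name "Type 1").
pokemon = pd.DataFrame({
    "Type 1": ["Water", "Fire", "Grass", "Water", "Flying", "Fire", "Water"],
})

# The same tally that the plot shows, as plain numbers.
counts = pokemon["Type 1"].value_counts()

# kind="count" draws one bar per category, with bar height equal to the row count.
grid = sns.catplot(data=pokemon, x="Type 1", kind="count")
```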

Another Look at Joint Distributions

Let's look at the joint distribution of Attack and Defense capabilities for all Pokemon in the dataset, which highlights some pretty strong outliers in these two strength categories!

The joint plot can even do some automatic kernel density estimation and regression, this time on the special abilities.

Violin Plots

A nice way to compare distributions between different categories is to use a violin plot (I've dropped a few Pokemon types for the purpose of visualisation).

Naturally, Flying Pokemon have a heavier distribution at the higher end of the speed spectrum.
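
A violin plot comparing Speed across types could be sketched as follows. The data here are synthetic and the means are picked arbitrarily, so the shapes only illustrate the plot type. The column names "Type 1" and "Speed" are assumptions about the Pokemon table.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the Pokemon table (assumption: arbitrary means and spread).
rng = np.random.default_rng(7)
pokemon = pd.DataFrame({
    "Type 1": np.repeat(["Water", "Fire", "Flying"], 40),
    "Speed": np.concatenate([
        rng.normal(65, 20, 40),
        rng.normal(75, 20, 40),
        rng.normal(100, 20, 40),
    ]),
})

# One violin per type: the width shows a KDE of Speed within that group.
ax = sns.violinplot(data=pokemon, x="Type 1", y="Speed")
```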

Subplotting (and Boxplots)

As mentioned already, Seaborn is built on top of Matplotlib, so its axes-level functions such as boxplot() can draw directly into Matplotlib subplots.
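
One way to do this, sketched under the assumption of a Pokemon-like table with "Type 1", "Attack" and "Defense" columns (synthetic here), is to create the axes with plt.subplots and hand each one to a Seaborn function through its ax parameter.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the Pokemon table (assumption: random values).
rng = np.random.default_rng(8)
pokemon = pd.DataFrame({
    "Type 1": np.repeat(["Water", "Fire", "Grass"], 40),
    "Attack": rng.normal(75, 25, 120),
    "Defense": rng.normal(70, 25, 120),
})

# Create the Matplotlib layout first, then draw each boxplot into its own axes.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
sns.boxplot(data=pokemon, x="Type 1", y="Attack", ax=axes[0])
sns.boxplot(data=pokemon, x="Type 1", y="Defense", ax=axes[1])
axes[0].set_title("Attack by type")
axes[1].set_title("Defense by type")
fig.tight_layout()
```

Figure-level functions such as catplot() and pairplot() manage their own figure, so this pattern applies to the axes-level functions only.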

I hope you liked this short intro to Seaborn. This piece is meant for those who have not used Seaborn in the past and may not know how useful it can be, especially when working with DataFrames. There are many other cool plot functions in Seaborn, such as clustermaps, heatmaps, linear regressions and much, much more! It is an awesome high-level interface for drawing attractive and informative statistical graphics.

For more info, have a look at the official documentation.