Matplotlib is a library for making 2D plots of arrays in Python. It is a multi-platform data visualization library built on NumPy arrays and designed to work with the broader SciPy stack. It follows the philosophy that you should be able to create simple plots with just a few commands, or even just one! If you want to see a histogram of your data, you shouldn't need to instantiate objects, call methods, set properties and so on; it should just work.
This guide is a short introduction; later posts will cover more complex applications. Matplotlib is one of the most widely used data visualization libraries in Python. It lets us create figures and plots, and makes it easy to produce static raster or vector image files without needing a GUI.
This tutorial is intended to help you get up and running with Matplotlib quickly. We'll go over how to create the most commonly used plots and discuss when to use each one.
The plt interface (the conventional alias for matplotlib.pyplot) is what we will use most often, as we shall see throughout this guide.
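As a minimal sketch, these are the usual imports behind the examples in this guide. The aliases plt and np are community conventions rather than requirements, and printing the version is just a convenience for comparing behaviour across releases.

```python
import matplotlib
import matplotlib.pyplot as plt   # the "plt interface" used throughout this guide
import numpy as np                # NumPy arrays are what we will be plotting

# Handy when comparing behaviour across Matplotlib releases
print(matplotlib.__version__)
```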
Plotting interactively within an IPython notebook is done with the %matplotlib magic command and works in a similar way to the IPython shell. In the notebook you can also embed graphics directly, with two options:
- %matplotlib notebook embeds interactive plots within the notebook.
- %matplotlib inline embeds static images of your plots in the notebook.
After running this command (it needs to be done only once per kernel/session), any cell within the notebook that creates a plot will embed a PNG image of the resulting graphic.
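There are two separate approaches for creating plots with Matplotlib: the functional approach, which calls plt functions directly, and the object-oriented approach, which calls methods on Figure and Axes objects.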
Using the basic plt.plot() function, we can easily create a plot. Let's plot an example using two NumPy arrays, x and y. We can also name the axes and add a title.
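A minimal sketch of the functional approach. The original arrays aren't shown, so x = np.linspace(0, 5, 11) and y = x ** 2 are illustrative choices. In a notebook with %matplotlib inline the figure renders automatically; in a script, plt.show() displays it.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data: 11 evenly spaced x values and their squares
x = np.linspace(0, 5, 11)
y = x ** 2

# Functional approach: plt functions act on the "current" figure and axes
plt.plot(x, y, 'r')            # 'r' draws a red line
plt.xlabel('X Axis Title')
plt.ylabel('Y Axis Title')
plt.title('String Title')
plt.show()
```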
Now imagine we need more than one plot to visualise our data. Matplotlib lets us easily create multiple plots on the same figure using the plt.subplot() function, which takes three arguments:
- nrows: the number of rows the Figure should have.
- ncols: the number of columns the Figure should have.
- plot_number: which plot in that grid to draw on, counting from 1 at the upper left and increasing to the right along each row.
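A minimal sketch of plt.subplot() with a 1 x 2 grid, again using illustrative arrays. The second panel uses the format string 'gx', which draws green x markers with no connecting line.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 11)
y = x ** 2

# plt.subplot(nrows, ncols, plot_number): 1 row, 2 columns, select plot 1 (left)
plt.subplot(1, 2, 1)
plt.plot(x, y, 'r--')          # red dashed line
plt.title('y against x')

# Same grid, select plot 2 (right)
plt.subplot(1, 2, 2)
plt.plot(y, x, 'gx')           # 'gx' = green x markers only
plt.title('x against y')

plt.tight_layout()             # keep titles and tick labels from overlapping
plt.show()
```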
Note that we've passed a third argument to plt.plot(), a format string that sets the color and marker style of our data points ('gx' = green x markers).
The object-oriented approach is generally the more flexible way to create plots. The idea here is to create Figure objects and call methods on them. We create a figure and add a set of axes to it using the .add_axes() method, which takes a list of four values, [left, bottom, width, height], giving the position and size of the axes as fractions of the figure's width and height, each ranging from 0 to 1.
We can add x and y labels and a title to our plot just as we did in the functional approach, with one difference: we call the Axes methods .set_xlabel(), .set_ylabel() and .set_title() instead of plt.xlabel(), plt.ylabel() and plt.title().
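A minimal sketch of the object-oriented approach with fig.add_axes(), assuming the same illustrative x and y arrays as before. The four numbers place the axes 10% in from the left and bottom edges and make them 80% of the figure's width and height.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 11)
y = x ** 2

# Create an empty figure (a blank canvas)
fig = plt.figure()

# add_axes([left, bottom, width, height]), each a fraction (0 to 1) of the figure
axes = fig.add_axes([0.1, 0.1, 0.8, 0.8])

# Plot and label through methods on the Axes object
axes.plot(x, y, 'b')
axes.set_xlabel('Set X Label')
axes.set_ylabel('Set Y Label')
axes.set_title('Set Title')
plt.show()
```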
It takes more code, but we have full control over where the plot axes are placed, and we can easily add additional axes to the figure.
Let's plot the same figure as above, but this time with twin axes.
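A minimal sketch, assuming "twin axes" here means a second, smaller set of axes placed inside the first (an inset) with another add_axes() call, rather than Axes.twinx(), which shares the x-axis between two y-axes. Because each axes gets its own [left, bottom, width, height], they can be positioned independently.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 11)
y = x ** 2

fig = plt.figure()

# Two independently positioned sets of axes on one figure
axes1 = fig.add_axes([0.1, 0.1, 0.8, 0.8])    # main axes
axes2 = fig.add_axes([0.2, 0.5, 0.4, 0.3])    # smaller inset, drawn on top

# Main plot
axes1.plot(x, y, 'b')
axes1.set_xlabel('X label (main)')
axes1.set_ylabel('Y label (main)')
axes1.set_title('Main Axes')

# Inset plot with the roles of x and y swapped
axes2.plot(y, x, 'r')
axes2.set_title('Inset')
plt.show()
```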
As in the functional approach, we can also create multiple plots in the object-oriented approach, this time with plt.subplots() (note the s; this is NOT plt.subplot()). It takes nrows, the number of rows the Figure should have, and ncols, the number of columns, and returns the Figure together with an array of Axes objects that we can index or loop over.
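A minimal sketch of plt.subplots() with one row and two columns, using the same illustrative arrays. The returned axes is a NumPy array of Axes, so each element supports the object-oriented methods shown above.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 11)
y = x ** 2

# Returns the Figure and an array of Axes (shape (2,) for a 1 x 2 grid)
fig, axes = plt.subplots(nrows=1, ncols=2)

axes[0].plot(x, y, 'b')
axes[0].set_title('First Plot')

axes[1].plot(y, x, 'r')
axes[1].set_title('Second Plot')

# Looping works too, since axes is an ordinary array
for ax in axes:
    ax.set_xlabel('x')

fig.tight_layout()   # avoid overlapping labels between the two panels
plt.show()
```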
### Other Customisations
Matplotlib also lets us customise plots further, for example by specifying the figure size with figsize and the resolution with dpi (dots per inch, i.e. pixels per inch). We can also add a grid to the axes, pass custom keyword arguments to the plot() method, add explanatory text and LaTeX-style math strings, and so on.
If you don't specify these parameters, Matplotlib uses their default values.
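A minimal sketch combining several of these options on one illustrative plot. The strings between $...$ are rendered by Matplotlib's built-in mathtext engine, which understands a subset of LaTeX without needing a TeX installation; the text position (1, 60) is in data coordinates and was chosen arbitrarily.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 11)

# figsize is (width, height) in inches; dpi is dots (pixels) per inch
fig, ax = plt.subplots(figsize=(8, 4), dpi=100)

# label= gives each line a legend entry, written as a mathtext string
ax.plot(x, x ** 2, label=r'$y = x^2$')
ax.plot(x, x ** 3, label=r'$y = x^3$')

ax.grid(True)                         # draw a grid behind the data
ax.legend(loc='upper left')
ax.text(1, 60, 'The cubic grows faster', fontsize=10)   # explanatory text
plt.show()
```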
We can also customise our line and data point styles. Let's do this for the above plot.
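A minimal sketch of the most common line and marker keyword arguments, drawn on illustrative straight lines so each style is easy to tell apart. lw and ls are Matplotlib's short aliases for linewidth and linestyle.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 11)

fig, ax = plt.subplots()

# Dashed purple line, 2 points wide
ax.plot(x, x + 1, color='purple', linewidth=2, linestyle='--')
# Hex colour, thin dotted line using the lw/ls aliases
ax.plot(x, x + 2, color='#1155dd', lw=1, ls=':')
# Circle markers with a custom fill and edge colour
ax.plot(x, x + 3, color='green', marker='o', markersize=8,
        markerfacecolor='yellow', markeredgecolor='black')
# Semi-transparent dash-dot line with square markers
ax.plot(x, x + 4, color='red', alpha=0.5, marker='s', linestyle='-.')
plt.show()
```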
OK, that's enough line graphs! Now that you have the basics of creating and customising plots, let's look at a few other types of figures. Which plot to use depends on the purpose of the visualization.
#### Histograms
Histograms help us understand the distribution of a numeric variable in a way that summary statistics such as the mean or median alone cannot.
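A minimal sketch of a histogram, assuming illustrative data: 1,000 seeded draws from a standard normal distribution. bins sets how many equal-width intervals the value range is split into, and hist() returns the counts, the bin edges and the drawn bars.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data, seeded so the plot is reproducible
rng = np.random.default_rng(42)
data = rng.normal(loc=0, scale=1, size=1000)

fig, ax = plt.subplots()

# counts per bin, the 31 bin edges, and the 30 bar patches
counts, edges, patches = ax.hist(data, bins=30, edgecolor='black')
ax.set_xlabel('Value')
ax.set_ylabel('Frequency')
ax.set_title('Histogram of normally distributed data')
plt.show()
```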
#### Time Series
A time series plot shows how a variable changes over a period of time. It makes trends easy to spot and lets you check hypotheses about how the data behaves under different conditions.
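A minimal sketch of a time series plot, assuming an illustrative daily series: a seeded random walk over the 100 days from 2020-01-01 to 2020-04-09. Matplotlib converts NumPy datetime64 values to date ticks on its own, and fig.autofmt_xdate() rotates the date labels so they don't collide.

```python
import numpy as np
import matplotlib.pyplot as plt

# 100 consecutive days (the end date is exclusive)
dates = np.arange('2020-01-01', '2020-04-10', dtype='datetime64[D]')

# Illustrative values: a seeded random walk
rng = np.random.default_rng(0)
values = rng.normal(size=dates.size).cumsum()

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(dates, values)
ax.set_xlabel('Date')
ax.set_ylabel('Value')
ax.set_title('Daily random walk')
fig.autofmt_xdate()    # rotate and right-align the date tick labels
plt.show()
```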
#### Scatter Plots
Scatter plots offer a convenient way to visualise how two numeric variables in your data are related: each point is one observation, so patterns such as correlation, clusters or outliers become visible. They help in understanding relationships between variables.
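A minimal sketch of a scatter plot on illustrative data where y depends linearly on x plus noise. Passing c= maps a value to colour (here y itself, purely for demonstration), which is one way to show a third variable, and alpha helps when points overlap.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data: a noisy linear relationship
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
y = 2 * x + rng.normal(scale=3, size=200)

fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=y, cmap='viridis', alpha=0.7)
fig.colorbar(sc, ax=ax, label='y')   # legend for the colour mapping
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()
```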
#### Contour Figures
These are used to display three-dimensional data in two dimensions using contours or color-coded regions. This is an example from the Python Data Science Handbook.
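A minimal sketch in the spirit of that example, not a copy of it: the function f(x, y) below is an arbitrary choice for illustration. np.meshgrid builds 2D coordinate grids, f is evaluated on them to get the third dimension, and contourf()/contour() draw filled regions and outline lines.

```python
import numpy as np
import matplotlib.pyplot as plt

# An arbitrary function of two variables, chosen only for illustration
def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)   # coordinate grids, each of shape (40, 50)
Z = f(X, Y)                # the third dimension, evaluated on the grid

fig, ax = plt.subplots()
filled = ax.contourf(X, Y, Z, levels=20, cmap='RdGy')              # colour-coded regions
ax.contour(X, Y, Z, levels=20, colors='black', linewidths=0.5)     # contour lines on top
fig.colorbar(filled, ax=ax)
plt.show()
```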
Another way to visualise 3D data is Matplotlib's three-dimensional plotting, such as surface plots. Here is a three-dimensional contour diagram of a sinusoidal function of two variables, another example from the Python Data Science Handbook.
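Once mplot3d is imported, three-dimensional axes can be created by passing the keyword projection='3d' to any of the normal axes creation routines, such as fig.add_subplot() or plt.axes(). (Since Matplotlib 3.2, the explicit import is no longer required.)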
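A minimal sketch of a 3D surface, assuming z = sin(sqrt(x^2 + y^2)) as the illustrative sinusoidal function. The mplot3d import is kept for older Matplotlib versions; ax.contour3D() can be swapped in for plot_surface() to draw 3D contour lines instead.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d  # noqa: F401  (registers '3d'; only needed before Matplotlib 3.2)

# Illustrative sinusoidal function of two variables
def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))

x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')   # a three-dimensional Axes
ax.plot_surface(X, Y, Z, cmap='viridis', edgecolor='none')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.show()
```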
### Matplotlib and Pandas
OK, now that you have the basics of Matplotlib down, let's take a quick look at plotting data from a Pandas DataFrame.
The dataset below is a short list of the publicly held corporations which did over $30 billion in revenues in 2015, or the top 93 companies of the annual Fortune 500, most recently published in June of 2016.
While we often look at profits as a percent of sales (Return on Sales or ROS), this is not a fair way to compare companies across industries. Some industries require massive amounts of capital and earn more profits relative to sales in order to achieve competitive returns on the amount invested. Other industries turn over their inventories fast, making a smaller percent on sales (revenues) but an equal return on the capital required and invested.
Profit as a percent of equity, or Return on Equity (ROE), is important to shareholders, but management can raise or lower debt levels ("leverage") and thereby change ROE regardless of how fundamentally profitable the business is. That underlying profitability is better captured by Return on (total) Assets, or ROA.
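Written out with the standard accounting definitions (taking "profit" to mean net income, an assumption about how the dataset defines it), two short identities make the reasoning above concrete: ROA is profit margin times asset turnover, and ROE is ROA scaled by leverage.

$$
\mathrm{ROS} = \frac{\text{net income}}{\text{revenue}}, \qquad
\mathrm{ROA} = \frac{\text{net income}}{\text{total assets}}, \qquad
\mathrm{ROE} = \frac{\text{net income}}{\text{shareholders' equity}}
$$

$$
\mathrm{ROA} = \mathrm{ROS} \times \frac{\text{revenue}}{\text{total assets}}, \qquad
\mathrm{ROE} = \mathrm{ROA} \times \frac{\text{total assets}}{\text{shareholders' equity}}
$$

A fast-turnover business can pair a small ROS with a large revenue-to-assets ratio and still reach a competitive ROA, while financing more of the assets with debt raises the assets-to-equity multiplier and so lifts ROE for a given ROA.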
OK, let's generate a simple bar chart of the top 10 companies by revenue. Working with Pandas and Matplotlib is generally straightforward, thanks to the DataFrame plot() method, which draws with Matplotlib under the hood.
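A minimal sketch of the pattern, using a hypothetical stand-in DataFrame because the original dataset isn't reproduced here: the column names Company and Revenues, the placeholder company names and the revenue figures are all made up. nlargest() keeps the top 10 rows, and DataFrame.plot(kind='bar') returns the Matplotlib Axes it drew on, so the usual Axes methods still apply.

```python
import pandas as pd
import matplotlib.pyplot as plt

# HYPOTHETICAL data: placeholder names and made-up revenues in USD billions
df = pd.DataFrame({
    'Company': [f'Company {c}' for c in 'ABCDEFGHIJKL'],
    'Revenues': [480, 250, 210, 195, 180, 170, 160, 150, 140, 130, 120, 110],
})

# Keep the 10 largest by revenue, then let pandas draw the bar chart
top10 = df.nlargest(10, 'Revenues')
ax = top10.plot(kind='bar', x='Company', y='Revenues', legend=False, figsize=(10, 5))
ax.set_ylabel('Revenues (USD billions)')
ax.set_title('Top 10 companies by revenue')
plt.tight_layout()
plt.show()
```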
Wow, Walmart is ahead by a long shot.
OK, but now I'd like to compare Return on Assets, Sales and Equity for these same companies to get better insight into their financial performance.
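A minimal sketch of a grouped bar chart, again on hypothetical data: the ROA, ROS and ROE column names and every percentage below are made up for illustration. With several numeric columns, DataFrame.plot.bar() draws one group of bars per index label and one coloured bar per column, with a legend added automatically.

```python
import pandas as pd
import matplotlib.pyplot as plt

# HYPOTHETICAL return ratios in percent for placeholder companies
returns = pd.DataFrame({
    'ROA': [3.0, 2.5, 4.1, 6.3],
    'ROS': [5.1, 5.2, 8.0, 9.4],
    'ROE': [14.2, 10.7, 12.5, 21.0],
}, index=['Company A', 'Company B', 'Company C', 'Company D'])

# One group per company, one bar per ratio; rot=0 keeps labels horizontal
ax = returns.plot.bar(figsize=(10, 5), rot=0)
ax.set_ylabel('Return (%)')
ax.set_title('Return on Assets, Sales and Equity')
plt.tight_layout()
plt.show()
```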
As we expected, Return on Assets is the lowest of the three measures of the companies' financial success.
Finally, let's look at the distribution of Return on Equity across all the companies in the data set.
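A minimal sketch of a histogram from a DataFrame column, with a hypothetical ROE column of 93 seeded random values standing in for the real data. Series.plot.hist() wraps Matplotlib's hist(), so bins and styling keywords such as edgecolor pass straight through.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# HYPOTHETICAL ROE values (percent) for 93 placeholder companies
rng = np.random.default_rng(7)
df = pd.DataFrame({'ROE': rng.normal(loc=15, scale=8, size=93)})

ax = df['ROE'].plot.hist(bins=15, edgecolor='black')
ax.set_xlabel('Return on Equity (%)')
ax.set_title('Distribution of ROE (hypothetical data)')
plt.show()
```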