We have already looked at libraries designed to build "static" visualizations, like Matplotlib and Seaborn. Plotly's graphing library, on the other hand, produces interactive, publication-quality graphs, with built-in support for animation. This guide will quickly run through the basics and give a few examples, while a future post will expand on some of Plotly's more complex capabilities.
Plotly at its core is a data visualization toolbox. Under every plotly graph is a JSON object, which is a dictionary-like data structure. Moreover, Plotly plots are interactive, meaning you can manually explore the data by panning, selecting and zooming on the graphing surface (among other possible actions). Whether you see Plotly graphs in a browser or in a Jupyter notebook, all of the visualizations and interactivity are made possible by plotly.js.
You can work with plotly online or offline (see the docs for more info). On Kyso, we will use the following commands to initiate offline plotting so that all of the generated plots are viewable on our frontend.
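As a minimal sketch of the offline setup, assuming the plotly package is installed and the code runs inside a Jupyter notebook: init_notebook_mode comes from plotly.offline, while the wrapper function name is ours and purely illustrative.

```python
def enable_offline_plotting():
    """Load plotly.js into the notebook so figures render without an account."""
    # Imported lazily so this file can be loaded even where plotly is absent.
    from plotly.offline import init_notebook_mode

    # connected=True loads plotly.js from a CDN instead of embedding it
    # in the notebook, which keeps the saved notebook file smaller.
    init_notebook_mode(connected=True)


# In a notebook cell, call: enable_offline_plotting()
```

Once this has run, figures can be rendered inline with plotly.offline.iplot.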
Note, if you have a plotly account, you can still save your plots and data to your cloud account. To initialise plotly for online plotting, we would run the following command, substituting in our credentials.
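A hedged sketch of the credentials step might look like the following. It assumes plotly 3.x, where set_credentials_file lives in plotly.tools, or the separate chart_studio package that took over the online features from plotly 4. The function name and the placeholder arguments are ours; substitute your own username and API key.

```python
def configure_online_plotting(username, api_key):
    """Store account credentials locally so online plotting can authenticate."""
    try:
        # plotly 4 and later: online features live in the chart_studio package.
        from chart_studio import tools
    except ImportError:
        # Fallback for plotly 3.x, where the same helper lives in plotly.tools.
        from plotly import tools

    tools.set_credentials_file(username=username, api_key=api_key)


# Example call with placeholder (not real) credentials:
# configure_online_plotting("your-username", "your-api-key")
```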
py.plot() returns the unique URL of the plot.
py.iplot() displays the plot inline in the Jupyter notebook.
Now, to actually begin plotting our data. There are two main modules that we will need in order to generate Plotly graphs:
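plotly.plotly contains the functions that will help us communicate with the Plotly servers (from Plotly 4 onwards, these functions live in the separate chart_studio package, as chart_studio.plotly).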
plotly.graph_objs contains the functions that will generate graph objects for us.
There are 3 objects that define a Plotly plot:
Data - a list object in Python. Data contains all the traces that you wish to plot. A trace is just the name we give to a collection of data together with the specifications of how we want that data plotted. These traces will be named according to how you want the data displayed on the plotting surface.
Layout - defines the look of the plot and the plot features which are unrelated to the data. This refers to elements like the title, axis titles, spacing and font, and we can even draw shapes on top of plots!
Figure - the final object to be plotted: a dictionary-like object that contains both the data object and the layout object.
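To make the three objects concrete, here is a small sketch that builds them as plain Python dictionaries, which plotly accepts in place of graph objects. The values and titles are arbitrary examples of ours.

```python
import json

# Data: a list of traces. Each trace pairs some values with how to draw them.
trace = {
    "type": "scatter",
    "x": [1, 2, 3],
    "y": [4, 1, 2],
    "mode": "markers",
    "name": "example trace",
}
data = [trace]

# Layout: everything about the plot that is unrelated to the data itself.
layout = {
    "title": "Data, Layout and Figure",
    "xaxis": {"title": "x"},
    "yaxis": {"title": "y"},
}

# Figure: the dictionary-like object that combines both.
fig = {"data": data, "layout": layout}

# Because the figure is plain JSON underneath, it serializes directly.
fig_json = json.dumps(fig, indent=2)
print(fig_json)
```

In a notebook, passing fig to plotly.offline.iplot should render it. The classes in plotly.graph_objs, such as go.Scatter and go.Layout, build the same structure with validation on top.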
Ok, so now that we know how to set up our plotting environment in the notebook, and we understand the fundamental concepts of the underlying objects, let's generate some random data and create some cool plots!
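A minimal sketch of such a scatter plot, using the standard library to generate the random data and a plain dictionary for the figure. The point count, marker settings and titles are arbitrary choices of ours.

```python
import random

random.seed(42)  # make the random data reproducible

n_points = 200
x_values = [random.gauss(0, 1) for _ in range(n_points)]
y_values = [random.gauss(0, 1) for _ in range(n_points)]

# The trace: a scatter plot drawn with markers only (no connecting lines).
trace = {
    "type": "scatter",
    "x": x_values,
    "y": y_values,
    "mode": "markers",
    "marker": {"size": 8, "opacity": 0.7},
}

# The trace goes into the data list; the layout holds the titles.
data = [trace]
layout = {
    "title": "Random scatter",
    "xaxis": {"title": "x"},
    "yaxis": {"title": "y"},
}
fig = {"data": data, "layout": layout}

# In a notebook with plotly installed: plotly.offline.iplot(fig)
```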
Above we created our scatter plot, set our mode to markers, assigned our trace to our data object, and then configured the plot layout.
Let's take a look at some more examples of scatter plots, using both the markers and lines modes.
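The sketch below puts three traces on one figure, one per mode: markers, lines, and the combined lines+markers. The data is synthetic and the vertical offsets are only there to keep the traces apart.

```python
import random

random.seed(0)  # reproducible noise

x_values = list(range(50))


def noisy_series(offset):
    """Return one y-value per x-value: a constant offset plus Gaussian noise."""
    return [offset + random.gauss(0, 1) for _ in x_values]


modes = ["markers", "lines", "lines+markers"]

# One trace per mode, each shifted upwards by 5 units from the previous one.
data = [
    {
        "type": "scatter",
        "x": x_values,
        "y": noisy_series(5 * index),
        "mode": mode,
        "name": mode,
    }
    for index, mode in enumerate(modes)
]

fig = {"data": data, "layout": {"title": "Scatter modes"}}

# In a notebook with plotly installed: plotly.offline.iplot(fig)
```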
How about a bar chart?
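A bar chart follows the same pattern, with a bar trace instead of a scatter trace. In this sketch the categories, counts and color are arbitrary values of ours.

```python
categories = ["A", "B", "C", "D"]
counts = [20, 14, 23, 9]  # arbitrary example values

# The bar trace, with the bar color set manually through the marker.
trace = {
    "type": "bar",
    "x": categories,
    "y": counts,
    "marker": {"color": "rgb(26, 118, 255)"},
}

# Axis titles belong to the layout, not to the trace.
layout = {
    "title": "A simple bar chart",
    "xaxis": {"title": "Category"},
    "yaxis": {"title": "Count"},
}

fig = {"data": [trace], "layout": layout}

# In a notebook with plotly installed: plotly.offline.iplot(fig)
```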
Above, we manually set the color attribute of the marker in our Data object, and we also set our axis titles in our Layout object.
We can set colors for each data trace we create. But what about the color of the plot itself? Below, paper_bgcolor sets the color of the area surrounding the plot (the "paper"), while plot_bgcolor sets the color of the plotting area between the axes.
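As a small sketch, both background colors are plain layout attributes. The trace and the two grey tones here are arbitrary examples of ours.

```python
trace = {
    "type": "bar",
    "x": ["A", "B", "C"],
    "y": [3, 7, 5],  # arbitrary example values
    "marker": {"color": "rgb(255, 127, 14)"},  # color of this trace only
}

layout = {
    "title": "Custom background colors",
    "paper_bgcolor": "rgb(243, 243, 243)",  # area around the plot
    "plot_bgcolor": "rgb(229, 229, 229)",   # area between the axes
}

fig = {"data": [trace], "layout": layout}

# In a notebook with plotly installed: plotly.offline.iplot(fig)
```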
What about visualising some descriptive statistics of some sample data? Box plots are perfect for just this!
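Here is a sketch with two box traces built from synthetic samples. The group names and distribution parameters are made up for illustration.

```python
import random

random.seed(1)  # reproducible samples

# Two synthetic samples with different centres and spreads.
samples = {
    "Group A": [random.gauss(50, 10) for _ in range(100)],
    "Group B": [random.gauss(60, 5) for _ in range(100)],
}

# One box trace per sample; plotly computes the quartiles from the raw values.
data = [
    {
        "type": "box",
        "y": values,
        "name": name,
        "boxpoints": "outliers",  # draw only the outlying points individually
    }
    for name, values in samples.items()
]

fig = {"data": data, "layout": {"title": "Box plots of two samples"}}

# In a notebook with plotly installed: plotly.offline.iplot(fig)
```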
Ok, now that we have the basics of generating different types of plotly plots, let's apply this to a real dataset. For this guide we'll carry out simple EDA on a FIFA 18 dataset.
What's the distribution of the players' ages in the game?
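A distribution like this is naturally drawn as a histogram. The sketch below uses synthetic ages as a stand-in, since the FIFA 18 data itself is not reproduced here; with the real dataset, the age column would take the place of the ages list.

```python
import random

random.seed(18)  # reproducible stand-in data

# Synthetic ages, NOT the real FIFA 18 values.
ages = [random.randint(16, 40) for _ in range(1000)]

# A histogram trace only needs the raw values; plotly does the binning.
trace = {"type": "histogram", "x": ages}

layout = {
    "title": "Distribution of player ages (synthetic data)",
    "xaxis": {"title": "Age"},
    "yaxis": {"title": "Number of players"},
}

fig = {"data": [trace], "layout": layout}

# In a notebook with plotly installed: plotly.offline.iplot(fig)
```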
And by their overall rating:
This dataset is huge, with about 18 thousand players listed. Let's take a random sample of, say, 500 players.
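As a rough sketch of the sampling step using only the standard library: the records below are hypothetical placeholders standing in for the real player rows.

```python
import random

random.seed(500)  # reproducible sample

# Hypothetical placeholder records, one per player (not the real data).
players = [{"name": f"Player {i}", "overall": 50 + i % 45} for i in range(18000)]

# Draw 500 distinct players uniformly at random, without replacement.
sample = random.sample(players, 500)

print(len(sample))
```

With the data in a pandas DataFrame, df.sample(n=500) would be the analogous call.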
How about the geographical distribution? With plotly we can plot a really cool choropleth map:
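A sketch of a choropleth built from per-country counts. The nationality list is a tiny made-up stand-in for the real column, so the resulting counts are illustrative only.

```python
from collections import Counter

# Made-up stand-in for the nationality column (not the real data).
nationalities = [
    "England", "Germany", "Spain", "England",
    "Brazil", "Argentina", "Spain", "England",
]

# Count players per country, then split into parallel lists for the trace.
counts = Counter(nationalities)
countries = sorted(counts)
player_counts = [counts[country] for country in countries]

trace = {
    "type": "choropleth",
    "locations": countries,
    "locationmode": "country names",  # match locations by country name
    "z": player_counts,               # the value that drives the color
    "colorscale": "Viridis",
    "colorbar": {"title": "Players"},
}

layout = {
    "title": "Players per country (illustrative counts)",
    "geo": {"showframe": False, "showcoastlines": True},
}

fig = {"data": [trace], "layout": layout}

# In a notebook with plotly installed: plotly.offline.iplot(fig)
```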
The distribution of nationalities is clearly concentrated in Europe, with Brazil and Argentina also highly represented.
Do a player's market value and wage justify his ranking in FIFA 18's ratings? Let's find out!
Ok, now let's get a distribution of the players' actual and potential overall rating as a function of their market value.
As one would expect, overall rating has a positive relationship with a player's value. Also, as market value increases, the spread between actual and potential ratings decreases on average.
How are different clubs represented in this dataset? Which club has the highest number of players in the top 100 players in terms of overall rating?
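One way to sketch this is a donut chart, that is, a pie trace with a hole. The club names below are hypothetical placeholders for the club column of the top 100 players.

```python
from collections import Counter

# Hypothetical placeholder for the clubs of the top-rated players.
top_player_clubs = ["Club A", "Club B", "Club A", "Club C", "Club A", "Club B"]

# Count players per club, most represented club first.
club_counts = Counter(top_player_clubs).most_common()
labels = [club for club, _ in club_counts]
values = [count for _, count in club_counts]

trace = {
    "type": "pie",
    "labels": labels,
    "values": values,
    "hole": 0.4,                             # fraction of the radius cut out
    "domain": {"x": [0, 1], "y": [0, 1]},    # region of the figure to occupy
    "hoverinfo": "label+value+percent",      # what the hover box shows
    "textinfo": "percent",
}

fig = {"data": [trace], "layout": {"title": "Clubs of the top players"}}

# In a notebook with plotly installed: plotly.offline.iplot(fig)
```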
Here, we have added additional parameters such as the domain and the hole diameter, and manually set the hover information.
OK, how about visualising the distribution of players' abilities between different countries?
Spanish and Brazilian players clearly dominate the upper quartile, but to be fair, FIFA 18 includes more lower league players in England, in comparison to other countries. Disclaimer: I happen to be Irish & so took particular delight in this graph, while also acknowledging Ireland's dismal performance at international level, if and when we even qualify!😀
Let's step it up a notch & segment the dataset by top finishers, so that we can look at the attributes of the game's forwards and strikers. First, let's generate a box plot for descriptive stats on the game's finishers by country.
Argentina just about steals the show when it comes to prowess in front of goal.
Ok, one final plot, this time of the top 200 finishers in the game. We will create a bubble chart, where the y-values represent the players' Finishing score, the x-values their Composure, and the marker size the players' wages.
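A bubble chart is a scatter trace whose marker size varies per point. The sketch below uses three invented players with invented scores and wages; the sizeref formula is the usual scaling for area-based sizing, with the maximum bubble diameter of 40 pixels being our own choice.

```python
# Invented example rows (not real players or real figures).
finishers = [
    {"name": "Player A", "finishing": 90, "composure": 88, "wage": 500},
    {"name": "Player B", "finishing": 85, "composure": 80, "wage": 200},
    {"name": "Player C", "finishing": 78, "composure": 75, "wage": 60},
]

wages = [player["wage"] for player in finishers]

# Scale the bubbles so the largest wage maps to a marker of about 40 px.
max_marker_px = 40
sizeref = 2.0 * max(wages) / (max_marker_px ** 2)

trace = {
    "type": "scatter",
    "mode": "markers",
    "x": [player["composure"] for player in finishers],
    "y": [player["finishing"] for player in finishers],
    "text": [player["name"] for player in finishers],  # shown on hover
    "marker": {
        "size": wages,
        "sizemode": "area",   # bubble area, not diameter, tracks the wage
        "sizeref": sizeref,
        "sizemin": 4,         # keep the smallest bubbles visible
        "color": wages,       # higher wages get a deeper red
        "colorscale": "Reds",
        "showscale": True,
    },
}

layout = {
    "title": "Finishing vs Composure (bubble size = wage)",
    "xaxis": {"title": "Composure"},
    "yaxis": {"title": "Finishing"},
}

fig = {"data": [trace], "layout": layout}

# In a notebook with plotly installed: plotly.offline.iplot(fig)
```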
Unsurprisingly, those two big red bubbles in the upper right-hand corner represent Ronaldo and Messi.
That's it for this guide, guys. Hope you enjoyed it! Take a look at plotly's documentation for more info on the types of plots we generated today and the various customisations you can apply. You can also check out another post of mine on plotly's cufflinks, an awesome tool that simplifies data visualisation with pandas dataframes.