Data Visualisation: An Intro to Bokeh

Bokeh differs from the visualisation libraries we've looked at in other guides, namely Matplotlib and Seaborn, in that it is ideal for easily creating interactive plots, dashboards and data applications in a web browser. Its main advantages over these other tools are its various output options, the ability to embed visualisations in applications, and its wide variety of customisation options. The primary building blocks of Bokeh are "glyphs", the layers upon which a graph is built. Glyphs are added individually to a figure and can take on many different shapes and sizes, as we'll see below. So let's get started! Don't forget to show the code in this notebook to follow along!

To implement and use Bokeh, let's first import some of the basics that we need from the bokeh.plotting module.

First, we create a plot using the figure function and sequentially add our glyphs to the plot by calling the appropriate glyph method and passing in data. Finally, we must display our plot. Since we are working within the JupyterLab environment, we can show the plots inline by calling output_notebook(). figure() creates the core object used to build plots, which handles all styling attributes, including labels, axes, grids, etc. show() tells Bokeh that all of the data has been added to the plot and it is time to render it.
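As a rough sketch of this workflow, assuming a recent Bokeh release and using made-up data values, the pieces fit together something like this:

```python
from bokeh.io import output_notebook
from bokeh.plotting import figure, show

# figure() creates the plot object that glyphs are added to.
p = figure(title="Workflow sketch")

# Each glyph method appends one renderer to the plot.
p.line([1, 2, 3], [4, 6, 5])

# In a notebook, call output_notebook() once and then show(p) to render inline:
# output_notebook()
# show(p)
```

The last two calls are left commented out so that the snippet can also be run outside a notebook without opening a browser window.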

Data in Bokeh can take on different forms, but at its simplest, data is just a list of values. Let's start with a simple example.

Some Basic Examples

We've created three lists, x, y and z, and after instantiating the figure, we call the circle, line, and square methods and add styling attributes. Calling show and passing the instantiated figure will output the results. Now let's run this code!
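The code for this example might look roughly like the sketch below. The list values, colors and sizes are my own illustrative choices, and scatter with a marker argument is used as a stand-in for the circle and square methods, which newer Bokeh releases deprecate:

```python
from bokeh.plotting import figure

# Illustrative data (assumed values, chosen only for this sketch).
x = [1, 2, 3, 4, 5]
y = [4, 6, 2, 4, 3]
z = [3, 5, 4, 6, 5]

p = figure()

# Circles and a line drawn from x and y, squares drawn from x and z.
p.scatter(x, y, marker="circle", size=10, color="red", legend_label="circle")
p.line(x, y, color="blue", legend_label="line")
p.scatter(x, z, marker="square", size=10, color="green", legend_label="square")
```

Passing p to show would then render the three glyphs on one set of axes.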

Along the right-hand side, the default toolbar is displayed. The tools include pan (drag), box zoom, wheel zoom, save, reset, and help. Using these tools, a user can pan along the plot or zoom in on interesting portions of the data. The beauty of plotting with Bokeh on Kyso is that the rendered Jupyter notebook is posted directly to the web, where references to BokehJS work smoothly, meaning the notebook can be shared with anyone for data exploration and analysis.

OK, let's have a look at a few other examples!

Here, we've set new parameters, namely the width and height of the plot, and added a circle renderer, setting the size of the circles as well as some other styling attributes.
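A minimal sketch of such a plot, assuming Bokeh 3.x (where the size arguments are called width and height; older releases used plot_width and plot_height) and with data and colors picked purely for illustration:

```python
from bokeh.plotting import figure

# Plot dimensions in pixels.
p = figure(width=500, height=400)

# Circle markers with an explicit size plus a few styling attributes.
p.scatter(
    [1, 2, 3, 4, 5], [4, 6, 2, 4, 3],
    marker="circle", size=15,
    line_color="navy", fill_color="orange", fill_alpha=0.5,
)
```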

This time we've added a square renderer, applied different sizes to our data points, and used alpha to set the color transparency.
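Something along these lines would produce that effect. The values are assumptions for illustration; the point is that size can be a list with one entry per data point, while alpha is a single value between 0 and 1:

```python
from bokeh.plotting import figure

p = figure(width=500, height=400)

# One size per point; alpha=0.6 makes every marker 40% transparent.
p.scatter(
    [1, 2, 3, 4, 5], [4, 6, 2, 4, 3],
    marker="square",
    size=[10, 15, 20, 25, 30],
    color="firebrick", alpha=0.6,
)
```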

Now we've plotted both a line and circles from the same data lists, and added a plot title.
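As a sketch (again with assumed data and an assumed title), layering two glyphs over the same lists looks like this:

```python
from bokeh.plotting import figure

x = [1, 2, 3, 4, 5]
y = [4, 6, 2, 4, 3]

# The title is passed straight to figure().
p = figure(width=500, height=400, title="Line and circles")

# Both glyphs read from the same x and y lists.
p.line(x, y, line_width=2)
p.scatter(x, y, marker="circle", size=8, fill_color="white")
```

Glyphs are drawn in the order they are added, so the circles sit on top of the line.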

With a list of categorical values (factors), we can create some bar charts!
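A minimal bar chart sketch, with a made-up list of fruit names as the factors and made-up counts:

```python
from bokeh.plotting import figure

# Hypothetical categories and values, for illustration only.
fruits = ["Apples", "Pears", "Nectarines", "Plums", "Grapes", "Strawberries"]
counts = [5, 3, 4, 2, 4, 6]

# Passing the list of factors as x_range makes the x-axis categorical.
p = figure(x_range=fruits, height=350, title="Fruit counts")

# One vertical bar per factor; width is a fraction of the category slot.
p.vbar(x=fruits, top=counts, width=0.9)

# Start the y-axis at zero so bar heights are comparable.
p.y_range.start = 0
```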

Sometimes we want to group bars together, instead of stacking them. Bokeh can handle up to three levels of nested (hierarchical) categories, and will automatically group output according to the outermost level. To specify nested categorical coordinates, the columns of the data source should contain tuples.

Values in other columns correspond to each item in x, exactly as in other cases. When plotting with these kinds of nested coordinates, we must tell Bokeh the contents and order of the axis range, by explicitly passing a FactorRange to figure. In the example below, this is seen as
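To make the tuple coordinates and the FactorRange concrete, here is a small sketch with two levels of nesting. The fruit and year data are invented for the example:

```python
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure

# Hypothetical data: counts per fruit for each year.
fruits = ["Apples", "Pears", "Plums"]
years = ["2015", "2016", "2017"]
data = {"2015": [2, 1, 4], "2016": [5, 3, 3], "2017": [3, 2, 4]}

# Nested coordinates are (outer, inner) tuples, one per bar.
x = [(fruit, year) for fruit in fruits for year in years]

# Flatten the counts in the same order as x.
counts = [data[year][i] for i in range(len(fruits)) for year in years]

source = ColumnDataSource(data=dict(x=x, counts=counts))

# FactorRange fixes the contents and order of the nested axis.
p = figure(x_range=FactorRange(*x), height=350, title="Fruit counts by year")
p.vbar(x="x", top="counts", width=0.9, source=source)
```

Bars that share the same outer factor (the fruit) are grouped together on the axis.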

See below for some more styling attributes that can be applied to Bokeh plots.
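The selection below is my own illustrative pick of commonly used styling attributes rather than an exhaustive list, with arbitrary values:

```python
from bokeh.plotting import figure

p = figure(width=500, height=350, title="Styled plot")
p.line([1, 2, 3, 4, 5], [4, 6, 2, 4, 3],
       line_width=3, line_dash="dashed", color="teal")

# Title styling.
p.title.text_font_size = "16pt"
p.title.align = "center"

# Axis labels.
p.xaxis.axis_label = "x values"
p.yaxis.axis_label = "y values"

# Background and grid lines.
p.background_fill_color = "#f5f5f5"
p.xgrid.grid_line_color = None   # hide the vertical grid lines
p.ygrid.grid_line_alpha = 0.5
```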

Bokeh and Pandas

What happens when the data you're working with comes in an external format and comprises thousands of rows? Pandas, a widely used data science library, is ideally suited to large-scale data manipulation and analysis, and just so happens to integrate seamlessly with Bokeh to create interactive visualizations of data.

Let's take a look at the first DataFrame we'll be using in this brief guide.

1. Categorical Data - Pokemon Stats

The Bokeh ColumnDataSource

Linking a Pandas DataFrame with Bokeh visualizations is relatively simple, and is accomplished with Bokeh's ColumnDataSource object. Its constructor accepts a Pandas DataFrame as an argument, and the resulting object can then be passed to glyph methods via the source parameter, as you'll see below. The glyph methods can then reference the DataFrame's columns by name. We import this, along with Bokeh's HoverTool, to further highlight Bokeh's interactivity.

So below we create a variable, source, which is assigned a ColumnDataSource object with our DataFrame as its argument. We also create a list of the Pokemon types, so that we can build a color palette with as many colors as there are unique Pokemon types for when we're graphing.

Next, we initiate our figure object and call the circle glyph method to plot our data. The source variable that holds our ColumnDataSource is passed as our data source to the glyph method, along with our chosen column names. It is important to note that column names can also be passed for other parameters like size, colors and legend.

The tooltips property of HoverTool takes a list of tuples. The first element of each tuple is the label to be displayed when the cursor hovers over a particular data point, and the second is a column name from the ColumnDataSource prefixed with @. We then add the tool to the plot using the add_tools() method.
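Put together, the steps above might look like the sketch below. The four-row DataFrame is a tiny stand-in for the Pokemon data (its values are for illustration only), and the choice of the Category10 palette and of factor_cmap for the color mapping are my assumptions, not necessarily what the original notebook used:

```python
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.palettes import Category10
from bokeh.plotting import figure
from bokeh.transform import factor_cmap

# Hypothetical stand-in for the Pokemon DataFrame.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle", "Charizard"],
    "Type 1": ["Grass", "Fire", "Water", "Fire"],
    "Attack": [49, 52, 48, 84],
    "Defense": [49, 43, 65, 78],
})

# Wrap the DataFrame so glyph methods can refer to columns by name.
source = ColumnDataSource(df)

# One color per unique type (Category10 covers 3 to 10 categories).
types = sorted(df["Type 1"].unique())
palette = Category10[len(types)]

p = figure(width=600, height=400, title="Attack vs Defense")
p.scatter(
    x="Attack", y="Defense", source=source, size=10,
    color=factor_cmap("Type 1", palette=palette, factors=types),
    legend_field="Type 1",
)

# Each tooltip is a (label, value) tuple; @ refers to a column, and
# braces are needed when the column name contains a space.
hover = HoverTool(tooltips=[("Name", "@Name"), ("Type", "@{Type 1}")])
p.add_tools(hover)
```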

Categorical Data and Bar Charts

Let's divide our data into groups by each Pokemon's Type 1. Perhaps we'd like to zero in on a particular attribute, like the average Defense of each Pokemon type. Type 1 here is our feature, or factor variable. In this section, we'll use categorical data as our x-axis values in Bokeh and create vertical bar charts.

So first we'll group our dataset by Type 1 and get the average of each type's scores.
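In pandas, this grouping step is a one-liner. Here is a sketch on a small made-up frame (the column names follow the ones mentioned in this guide; the rows are illustrative):

```python
import pandas as pd

# Hypothetical stand-in for the Pokemon DataFrame.
df = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Water", "Fire"],
    "Attack": [49, 52, 48, 84],
    "Defense": [49, 43, 65, 78],
})

# Average of each numeric attribute per type; "Type 1" becomes the index.
grouped = df.groupby("Type 1")[["Attack", "Defense"]].mean()
```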

Again, we'll create a ColumnDataSource with the grouped data as its argument and assign it to our source variable.

Notice that we have used one of Bokeh's larger palettes to assign a different color to each Pokemon type. You can read more about the available palettes on Bokeh's palettes page. We've also used math's pi to rotate the x-axis labels and make them legible.
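A sketch of such a bar chart follows. The grouped averages are made up, and Category20 is my assumption for the "larger palette"; the notebook may have used a different one:

```python
from math import pi

import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.palettes import Category20
from bokeh.plotting import figure
from bokeh.transform import factor_cmap

# Hypothetical per-type averages standing in for the grouped Pokemon data.
grouped = pd.DataFrame(
    {"Defense": [60.5, 49.0, 65.0]},
    index=pd.Index(["Fire", "Grass", "Water"], name="Type 1"),
)

# The named index becomes a "Type 1" column in the data source.
source = ColumnDataSource(grouped)
types = list(grouped.index)

p = figure(x_range=types, height=350, title="Average Defense by Type 1")
p.vbar(
    x="Type 1", top="Defense", width=0.8, source=source,
    # Category20 offers palettes for 3 to 20 categories.
    fill_color=factor_cmap("Type 1", palette=Category20[len(types)], factors=types),
)

# Rotate the x-axis labels by 45 degrees (pi / 4 radians).
p.xaxis.major_label_orientation = pi / 4
```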

To plot the average scores for each Type and for each attribute, we can use Bokeh's stacked bar chart, which is done using the vbar_stack glyph, and by setting the parameter stackers to the column names.
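Here is a minimal sketch of a stacked bar chart with vbar_stack, using two attributes and made-up averages. The first argument is the list of stackers, and any per-stack styling (here the colors and legend labels) is passed as lists of the same length:

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.palettes import Category10
from bokeh.plotting import figure

# Hypothetical per-type averages for two attributes.
grouped = pd.DataFrame(
    {"Attack": [68.0, 49.0, 48.0], "Defense": [60.5, 49.0, 65.0]},
    index=pd.Index(["Fire", "Grass", "Water"], name="Type 1"),
)
source = ColumnDataSource(grouped)
types = list(grouped.index)

# Columns to stack on top of one another, bottom first.
stackers = ["Attack", "Defense"]
colors = list(Category10[3][:len(stackers)])

p = figure(x_range=types, height=350, title="Average stats by Type 1")
renderers = p.vbar_stack(
    stackers, x="Type 1", width=0.8, source=source,
    color=colors, legend_label=stackers,
)
```

vbar_stack returns one renderer per stacked column, which is handy if you later want to attach a hover tool to individual layers.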

2. Time Series Data - Company Stocks

OK, time for another data set! Let's take a quick look at examples of plotting time series data with Bokeh. For this, I've downloaded some stock data from Bokeh's sample data library.

import bokeh.sampledata

bokeh.sampledata.download()

So we have stock data for Apple, Google and IBM, and have converted the dates to datetime format so that they integrate with Bokeh more smoothly. This is also a good example of how we can plot multiple different data sets on the same plot.

If you look at the code below, you'll see that each data set is stored in a separate ColumnDataSource object. We set the x_axis_type to datetime, and we add three separate line glyphs, each of which takes a separate data set as its source.
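The sketch below shows the shape of this setup. To keep it self-contained, the prices are made up rather than loaded from the sample data, and the helper name make_source and the column names date and close are assumptions:

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Made-up prices standing in for the downloaded stock data.
dates = pd.date_range("2008-01-01", periods=4, freq="D")

def make_source(prices):
    """Hypothetical helper: wrap one stock's prices in its own data source."""
    return ColumnDataSource(pd.DataFrame({"date": dates, "close": prices}))

aapl = make_source([10.0, 11.0, 10.5, 12.0])
goog = make_source([20.0, 19.5, 21.0, 22.0])
ibm = make_source([15.0, 15.5, 15.2, 16.0])

# A datetime x-axis formats the tick labels as dates.
p = figure(x_axis_type="datetime", width=700, height=350, title="Closing prices")

# One line glyph per data source.
p.line("date", "close", source=aapl, color="red", legend_label="AAPL")
p.line("date", "close", source=goog, color="green", legend_label="GOOG")
p.line("date", "close", source=ibm, color="blue", legend_label="IBM")
```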

Pretty cool. We can also add a HoverTool to each separate line. Now let's customise the plot a bit. What if we want to highlight a particular period in time, for example the global financial crisis? Using Bokeh's BoxAnnotation, we can specify a window with a start and an end date to bring it to the reader's attention.

We will also add some more interactivity by setting a click_policy on our legend, such that one can now click on each legend entry to show/hide that piece of data! The click_policy can also be set to mute instead of hide, which would mute the color rather than hide the glyph completely.
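A sketch of these customisations on a single made-up series follows. The start and end dates of the shaded window are my own illustrative choice, not values from the original notebook, and the hover formatter syntax assumes Bokeh 2.0 or later:

```python
import pandas as pd
from bokeh.models import BoxAnnotation, ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Made-up monthly prices for one hypothetical stock.
dates = pd.date_range("2007-01-01", periods=36, freq="MS")
source = ColumnDataSource(pd.DataFrame({"date": dates, "close": range(36)}))

p = figure(x_axis_type="datetime", width=700, height=350)
line = p.line("date", "close", source=source, legend_label="Stock A")

# Attach a hover tool to this line only, formatting the date column.
p.add_tools(HoverTool(
    renderers=[line],
    tooltips=[("date", "@date{%F}"), ("close", "@close")],
    formatters={"@date": "datetime"},
))

# Box bounds on a datetime axis are given in milliseconds since the epoch.
start = pd.Timestamp("2007-12-01").timestamp() * 1000
end = pd.Timestamp("2009-06-01").timestamp() * 1000
box = BoxAnnotation(left=start, right=end, fill_color="red", fill_alpha=0.1)
p.add_layout(box)

# Clicking a legend entry hides its glyph; "mute" would dim it instead.
p.legend.click_policy = "hide"
```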

That's it for today! This was again just another quick-start guide to one of Python's amazing plotting libraries. There are a multitude of other customisations we could make to the above plots, as well as other plots entirely. For further reading, take a look at Bokeh's documentation, which provides various plotting examples and covers the full range of Bokeh's glyphs.

In a future blog post I will take you through a more advanced guide to creating fully fledged web-based interactive applications and dashboards with Bokeh, a capability that really sets Bokeh apart from other plotting libraries.