Streaming data animation with Bokeh

Yogesh Dhande

Want to animate a chart so it looks like data is streaming in?

I wanted to build one to serve as a demo for a dashboard I am building. In this post, I'll share my approach step by step so we can build the following chart together.

streaming-example-3.gif

Let's start by getting the data we need to make the chart. I am using the California COVID-19 data from The New York Times because I have worked with it before, but the following code will work with any data that is read in as a pandas dataframe.

import pandas as pd

data = pd.read_csv(
    "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv")
data["date"] = pd.to_datetime(data["date"])
data["new_cases"] = data.groupby("state")["cases"].diff()

state = "California"
california_covid_data = data[data["state"] == state].copy()
california_covid_data.head()
          date       state  fips  cases  deaths  new_cases
5   2020-01-25  California     6      1       0        NaN
9   2020-01-26  California     6      2       0        1.0
13  2020-01-27  California     6      2       0        0.0
17  2020-01-28  California     6      2       0        0.0
21  2020-01-29  California     6      2       0        0.0
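
To make the groupby and diff step more concrete, here is a minimal sketch on a made-up two-state frame. The state names and counts are invented for illustration and are not taken from the NY Times file.

```python
import pandas as pd

# Toy cumulative counts; values are made up for illustration only.
toy = pd.DataFrame({
    "state": ["A", "B", "A", "B", "A"],
    "cases": [1, 10, 3, 10, 6],
})

# diff() runs within each state's rows, so counts from different
# states are never subtracted from each other.
toy["new_cases"] = toy.groupby("state")["cases"].diff()

new_cases_a = toy.loc[toy["state"] == "A", "new_cases"].tolist()
new_cases_b = toy.loc[toy["state"] == "B", "new_cases"].tolist()
print(new_cases_a)  # [nan, 2.0, 3.0]
print(new_cases_b)  # [nan, 0.0]
```

The first row of each state is NaN because there is no earlier row to subtract, which is why the 2020-01-25 row for California shows NaN in the new_cases column.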

The next step is to make the chart as it should look in its final state.

from bokeh import models, plotting, io
source = models.ColumnDataSource(california_covid_data)

# width/height replace the older plot_width/plot_height names, which Bokeh 3.0 removed
p = plotting.figure(
    x_axis_label="Date", y_axis_label="New Cases",
    width=800, height=250, x_axis_type="datetime", tools=["hover", "wheel_zoom"]
)

p.line(x="date", y="new_cases",
       source=source,
       legend_label=state,
       line_width=4,  # line thickness in screen pixels
       )
io.curdoc().add_root(p)
io.output_notebook()
io.show(p)

To create the animated version of this plot, we will register a periodic callback that Bokeh executes at a fixed interval to update the plot.

To get the streaming effect, each call will hand the plot a slightly larger slice of the data. A simple way to do that is to use row positions in the pandas dataframe. Let's create an iterator that will allow us to loop over those positions indefinitely.

from itertools import cycle

index_generator = cycle(range(len(california_covid_data.index)))

Calling next on this iterator will yield the next row position in the dataframe. We can use that position to slice out a subset of the data with iloc and update the source.data property with it.
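
As a quick illustration of how the iterator wraps around, here is a small sketch that uses a stand-in row count of 3 instead of the real dataframe length. The names n_rows and positions are only for this example.

```python
from itertools import cycle

# Stand-in for len(california_covid_data.index); 3 is arbitrary.
n_rows = 3
positions = cycle(range(n_rows))

# Draw more values than there are rows to show the wrap-around.
first_seven = [next(positions) for _ in range(7)]
print(first_seven)  # [0, 1, 2, 0, 1, 2, 0]
```

After the last position the iterator starts again at 0, which is what makes the animation restart on its own.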

def stream():
    index = next(index_generator)
    source.data = california_covid_data.iloc[:index]
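
One detail worth knowing is that an iloc slice excludes its stop position, so iloc[:index] holds exactly index rows. The sketch below, on a made-up four-row frame, compares the slice used in stream with an inclusive variant. The inclusive variant is my suggestion for anyone who wants the last row to appear, not something the code above does.

```python
import pandas as pd

# Toy frame with a non-contiguous index, like the filtered California data.
toy = pd.DataFrame({"new_cases": [1.0, 0.0, 4.0, 2.0]}, index=[5, 9, 13, 17])
n_rows = len(toy.index)

# Rows per frame with the slice used in stream(): iloc[:position]
sizes = [len(toy.iloc[:position]) for position in range(n_rows)]

# Rows per frame with an inclusive stop: iloc[:position + 1]
sizes_inclusive = [len(toy.iloc[:position + 1]) for position in range(n_rows)]

print(sizes)            # [0, 1, 2, 3]
print(sizes_inclusive)  # [1, 2, 3, 4]
```

With the exclusive slice the first frame is empty and the longest frame stops one row short of the full data. For a demo animation that is barely visible, but it matters if the final frame must match the static chart exactly.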

Finally, we can set a periodic callback on the Bokeh document to call the stream function every 10 milliseconds.

io.curdoc().add_periodic_callback(stream, 10)
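
To get a feel for how fast the animation will run, a rough estimate is the number of frames multiplied by the callback period. The helper below is hypothetical (it is not part of Bokeh), and it ignores the time each callback takes to execute, so treat the result as a lower bound.

```python
def loop_duration_seconds(n_frames: int, period_ms: float) -> float:
    """Approximate duration of one full pass through the data."""
    return n_frames * period_ms / 1000


# Example: 400 frames (an arbitrary count) at the 10 ms period used above.
estimate = loop_duration_seconds(400, 10)
print(estimate)  # 4.0
```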

Here's what our final code looks like when written as a single-file application:

# streaming/main.py
from bokeh import models, plotting, io
import pandas as pd
from time import sleep
from itertools import cycle


data = pd.read_csv(
    "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv")
data["date"] = pd.to_datetime(data["date"])
data["new_cases"] = data.groupby("state")["cases"].diff()

state = "California"
california_covid_data = data[data["state"] == state].copy()


source = models.ColumnDataSource(california_covid_data)

# width/height replace the older plot_width/plot_height names, which Bokeh 3.0 removed
p = plotting.figure(
    x_axis_label="Date", y_axis_label="New Cases",
    width=800, height=250, x_axis_type="datetime", tools=["hover", "wheel_zoom"]
)

p.line(x="date", y="new_cases",
       source=source,
       legend_label=state,
       line_width=4,  # line thickness in screen pixels
       )

io.curdoc().add_root(p)

index_generator = cycle(range(len(california_covid_data.index)))

def stream():
    index = next(index_generator)
    source.data = california_covid_data.iloc[:index]

io.curdoc().add_periodic_callback(stream, 10)

Let's serve the application by using the bokeh serve command.

$ bokeh serve streaming

You will notice that the animation loops very quickly. To get a nicer effect, we should pause for a few seconds on the last frame to let the viewer see the plot without any further changes. We can do that by adding a sleep call for 2 seconds. The updated stream function will look like this:

def stream():
    index = next(index_generator)
    if index + 1 == len(california_covid_data.index):
        sleep(2)

    source.data = california_covid_data.iloc[:index]
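
One thing to keep in mind is that sleep blocks the Bokeh server's event loop while it waits, so nothing else gets processed during the pause. A possible alternative, sketched here, is to repeat the last position inside the iterator itself so that stream needs no sleep call. The names positions_with_pause and pause_ticks are hypothetical.

```python
from itertools import chain, cycle, repeat


def positions_with_pause(n_rows: int, pause_ticks: int):
    """Cycle over 0..n_rows-1, repeating the last position pause_ticks extra times."""
    one_pass = chain(range(n_rows), repeat(n_rows - 1, pause_ticks))
    # cycle() stores the items of the first pass and replays them forever.
    return cycle(one_pass)


# 2 seconds at a 10 ms period would be 200 ticks; 2 is used here for readability.
positions = positions_with_pause(n_rows=3, pause_ticks=2)
first_eight = [next(positions) for _ in range(8)]
print(first_eight)  # [0, 1, 2, 2, 2, 0, 1, 2]
```

In main.py this would mean building index_generator with positions_with_pause(len(california_covid_data.index), 200) and keeping the original stream function, which should hold the final frame for roughly 2 seconds at the 10 ms period.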

Let's update our code and run the bokeh serve command one more time.

streaming-example-2.gif
