Generating Climate Temperature Spirals in Python

Posted on Thu 29 August 2019 in posts • 7 min read

Welcome to my first post!

In this post, we will recreate the famous animated visualization by climate scientist Ed Hawkins, which captivated audiences from the moment Hawkins tweeted it in 2016.
By the end of this post, you will be able to:

  • Manipulate data using simple Pandas and NumPy functions
  • Plot with polar coordinates using Matplotlib
  • Create an animation from multiple plots using Matplotlib's animation module

This project was inspired by Dataquest's post. To try the code yourself, kindly visit my GitHub repository. Any comments are welcome!

Let's start by taking a look at the original animation:

In [8]:
%%html
<style>.iframe-container {
  overflow: hidden;
  padding-top: 56.25%;
  position: relative;
}
 
.iframe-container iframe {
   border: 0;
   height: 100%;
   left: 0;
   position: absolute;
   top: 0;
   width: 100%;
}</style>
<div class="iframe-container">
<iframe src="https://www.youtube.com/embed/wXrYvd-LBu0" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div>

This visualization shows the deviations from the average temperature between 1850 and 2016. It was reshared millions of times over Twitter and Facebook. To understand the motivation behind this animation, check Ed Hawkins' website.

Exploring the Dataset

The underlying data was released by the Met Office in the United Kingdom, which does excellent work on weather and climate forecasting. The dataset can be downloaded directly here.
Let's first import the libraries needed in this project:

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

Let's download this dataset to our workspace:

In [3]:
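from urllib.request import Request, urlopen

url = "https://www.metoffice.gov.uk/hadobs/hadcrut4/data/current/time_series/HadCRUT.4.6.0.0.monthly_ns_avg.txt"
# Send a browser-like User-Agent header with the request.
req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
content = urlopen(req).read()
# Save the raw bytes locally; the handle is closed automatically, and file.name is reused when reading the data.
with open("hadcrut.txt", "wb") as file:
    file.write(content)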

Next, we need to read the dataset into a Pandas DataFrame:

In [4]:
# sep=r"\s+" splits on runs of whitespace (delim_whitespace is deprecated in recent pandas versions)
hadcrut=pd.read_csv(file.name,sep=r"\s+",usecols=[0,1],header=None)
hadcrut.head()
Out[4]:
0 1
0 1850/01 -0.700
1 1850/02 -0.286
2 1850/03 -0.732
3 1850/04 -0.563
4 1850/05 -0.327

This dataset contains two columns:

  • The first column represents the year/month of recording (for example, 1850/01)
  • The second column represents the deviations from average temperature

Data Cleaning

Now, we need to:

  • split column 0 into month and year columns
  • rename column 1 to value
  • keep every column except column 0
In [5]:
hadcrut["month"]=hadcrut[0].str.split("/").str[1].astype(int)
hadcrut["year"]=hadcrut[0].str.split("/").str[0].astype(int)
hadcrut.rename(columns={1:"value"},inplace=True)
hadcrut=hadcrut[["value","month","year"]].copy()
In [6]:
hadcrut.head()
Out[6]:
value month year
0 -0.700 1 1850
1 -0.286 2 1850
2 -0.732 3 1850
3 -0.563 4 1850
4 -0.327 5 1850
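
As a side note, the two split calls above could be collapsed into a single pass. The following is only a minimal sketch: the `sample` frame is a hypothetical stand-in that copies the first three rows shown earlier, not the real dataset.

```python
import pandas as pd

# Hypothetical sample mimicking the first rows of the raw file.
sample = pd.DataFrame({0: ["1850/01", "1850/02", "1850/03"],
                       1: [-0.700, -0.286, -0.732]})

# expand=True returns one column per piece, so year and month come out together.
parts = sample[0].str.split("/", expand=True).astype(int)
sample["year"] = parts[0]
sample["month"] = parts[1]

# Rename column 1 and keep only the three columns of interest.
sample = sample.rename(columns={1: "value"})[["value", "month", "year"]]
print(sample)
```

Either form gives the same columns; the one-pass version simply avoids splitting every string twice.
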
In [7]:
hadcrut["year"].value_counts(ascending=True).head()
Out[7]:
2019     6
1958    12
1959    12
1960    12
1961    12
Name: year, dtype: int64

To keep our data consistent and tidy, we will remove the rows from 2019, since it is the only year with 6 months of data instead of 12:

In [8]:
hadcrut=hadcrut.drop(hadcrut[hadcrut["year"]==2019].index)
hadcrut["year"].value_counts(ascending=True).head()
Out[8]:
1850    12
1957    12
1958    12
1959    12
1960    12
Name: year, dtype: int64
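
Dropping 2019 by name works here because we already know it is the only incomplete year. If the file were downloaded at a different time, another year could be the partial one. A more general filter might look like the sketch below, which assumes a hypothetical `toy` frame with the same year/month/value columns and keeps only years that have all 12 months.

```python
import pandas as pd

# Hypothetical toy data: 2018 is complete, 2019 has only six months.
toy = pd.DataFrame({
    "year": [2018] * 12 + [2019] * 6,
    "month": list(range(1, 13)) + list(range(1, 7)),
    "value": [0.5] * 18,
})

# Count the rows per year and broadcast that count back onto every row.
months_per_year = toy.groupby("year")["month"].transform("count")

# Keep only the rows that belong to a year with all 12 months.
complete = toy[months_per_year == 12]
print(complete["year"].unique())
```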

Lastly, let’s compute the mean of the global temperatures from 1850 to 1900 and subtract that value from the entire dataset. To make this easier, we’ll create a multiindex using the year and month columns:

In [9]:
hadcrut=hadcrut.set_index(["year","month"])
hadcrut.head(20)
Out[9]:
value
year month
1850 1 -0.700
2 -0.286
3 -0.732
4 -0.563
5 -0.327
6 -0.213
7 -0.125
8 -0.237
9 -0.439
10 -0.451
11 -0.187
12 -0.257
1851 1 -0.296
2 -0.356
3 -0.479
4 -0.441
5 -0.295
6 -0.197
7 -0.212
8 -0.157

This way, the subtraction only modifies the "value" column:

In [10]:
hadcrut -= hadcrut.loc[1850:1900].mean()
hadcrut.head()
Out[10]:
value
year month
1850 1 -0.386559
2 0.027441
3 -0.418559
4 -0.249559
5 -0.013559

Let's reset the index to its default layout:

In [11]:
hadcrut=hadcrut.reset_index()
hadcrut.head()
Out[11]:
year month value
0 1850 1 -0.386559
1 1850 2 0.027441
2 1850 3 -0.418559
3 1850 4 -0.249559
4 1850 5 -0.013559
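
A quick way to convince yourself that the re-baselining step does what it should: after the subtraction, the baseline period itself should average to roughly zero. The sketch below uses a hypothetical three-year `toy` frame and treats 1850-1851 as the baseline, purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series: three years, twelve months each.
idx = pd.MultiIndex.from_product([[1850, 1851, 1852], range(1, 13)],
                                 names=["year", "month"])
toy = pd.DataFrame({"value": np.linspace(-0.5, 0.5, len(idx))}, index=idx)

# Treat 1850-1851 as the baseline period, the same way 1850-1900 is used above.
baseline = toy.loc[1850:1851].mean()   # a Series with a single entry, "value"
toy -= baseline                        # aligned on the column name

# After the shift, the baseline period should average to (almost) zero.
print(toy.loc[1850:1851]["value"].mean())
```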

Preparing data for polar plotting

The key steps to recreate the visualization:

  • transforming the data for polar visualization
  • customizing the aesthetics of the plot
  • stepping through the visualization year-by-year and turning the plot into a GIF

Let's start by plotting the data for 1850 in polar coordinates.
First, we need to adjust the data so that it contains no negative values. Let's find the minimum temperature value:

In [12]:
hadcrut["value"].min()
Out[12]:
-0.6605588235294118

Let’s add 1 to all temperature values, so they’ll be positive but there’s still some space reserved around the origin for displaying text:

In [13]:
hc_1850=hadcrut[hadcrut["year"]==1850]
r=hc_1850["value"]+1
theta=np.linspace(0,2*np.pi,12)
In [14]:
fig=plt.figure(figsize=(8,8))
ax1=plt.subplot(111,projection="polar")
ax1.plot(theta,r)
plt.show()
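
One detail worth knowing about the angles: `np.linspace(0, 2*np.pi, 12)` includes both endpoints, so January (angle 0) and December (angle 2*pi) point in the same direction and the months are about 32.7 degrees apart. The sketch below compares that with an alternative spacing that leaves out the endpoint. The alternative is only an assumption for comparison, not what the plots in this post use.

```python
import numpy as np

# The spacing used in this post: 12 points from 0 to 2*pi inclusive,
# so the first and last point lie in the same direction.
theta_closed = np.linspace(0, 2 * np.pi, 12)

# Alternative (an assumption, not the post's choice): 12 evenly spaced
# directions, 30 degrees apart, with 2*pi itself left out.
theta_open = np.linspace(0, 2 * np.pi, 12, endpoint=False)

# Angular step between consecutive months, in degrees.
print(np.degrees(theta_closed[1] - theta_closed[0]))
print(np.degrees(theta_open[1] - theta_open[0]))
```

With the inclusive spacing each year's line ends where it started, which gives the loop-like look; with the open spacing each month gets its own direction and the line stops one step short of a full turn.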

Tweaking the Aesthetics

Let's remove the tick labels for both axes:

In [15]:
fig=plt.figure(figsize=(8,8))
ax1=plt.subplot(111,projection="polar")
ax1.plot(theta,r)
ax1.set_xticklabels([])
ax1.set_yticklabels([])
ax1.set_xticks([])
ax1.set_yticks([])
plt.show()

Next, let's tweak the color; we need the background color within the polar plot to be black, and the color surrounding the polar plot to be gray:

In [16]:
fig=plt.figure(figsize=(8,8))
ax1=plt.subplot(111,projection="polar")
ax1.plot(theta,r)
ax1.set_xticklabels([])
ax1.set_yticklabels([])
fig.set_facecolor("#323331")
ax1.set_facecolor("#000100")
ax1.set_xticks([])
ax1.set_yticks([])
plt.show()

Next, let's add the title:

In [17]:
fig=plt.figure(figsize=(8,8))
ax1=plt.subplot(111,projection="polar")
ax1.plot(theta,r)
ax1.set_xticklabels([])
ax1.set_yticklabels([])
fig.set_facecolor("#323331")
ax1.set_facecolor("#000100")
ax1.set_xticks([])
ax1.set_yticks([])
ax1.set_title("Global Temperature Change (1850-2018)",color="white",fontsize=25)
plt.show()

Lastly, let’s add the text in the center that specifies the current year that’s being visualized:

In [18]:
fig=plt.figure(figsize=(8,8))
ax1=plt.subplot(111,projection="polar")
ax1.plot(theta,r)
ax1.set_xticklabels([])
ax1.set_yticklabels([])
fig.set_facecolor("#323331")
ax1.set_facecolor("#000100")
ax1.set_title("Global Temperature Change (1850-2018)",color="white",fontsize=25)
ax1.text(0,0,"1850",color="white",size=30,ha="center")
ax1.set_xticks([])
ax1.set_yticks([])
plt.show()

Plotting the remaining years

It is important here to manually set the axis limit for r (or y in Matplotlib). This is because Matplotlib scales the size of the plot automatically based on the data that’s used. This is why, in the last step, the data for just 1850 was displayed at the edge of the plotting area.
To mimic the original animation, let’s calculate the maximum temperature value in the entire dataset and add a generous amount of padding:

In [19]:
hadcrut["value"].max()
Out[19]:
1.4244411764705882
In [20]:
ax1.set_ylim(0,3.25)
Out[20]:
(0, 3.25)

Next, let's loop over the rest of the data to generate the plots for the rest of the years:

In [21]:
fig=plt.figure(figsize=(8,8))
ax1=plt.subplot(111,projection="polar")

ax1.set_xticklabels([])
ax1.set_yticklabels([])
ax1.set_ylim(0,3.25)
fig.set_facecolor("#323331")
ax1.set_facecolor("#000100")
ax1.set_title("Global Temperature Change (1850-2018)",color="white",fontsize=25)
ax1.set_xticks([])
ax1.set_yticks([])

theta = np.linspace(0, 2*np.pi, 12)
years=hadcrut["year"].unique()

for year in years:
  r=hadcrut.loc[hadcrut["year"]==year,"value"]+1
  ax1.plot(theta,r)
plt.show()

Customizing the colors

Right now, the colors feel a bit random and don’t correspond to the gradual heating of the climate that the original visualization conveys well. In the original visualization, the colors transition from blue / purple, to green, to yellow. This color scheme is known as a sequential colormap, because the progression of colors has a meaning in the data.
Essentially, we use the color parameter (c) in the Axes.plot() method and draw colors from plt.cm.viridis(index*2) to progress from blue to green and eventually reach yellow:

In [22]:
fig=plt.figure(figsize=(8,8))
ax1=plt.subplot(111,projection="polar")

ax1.set_xticklabels([])
ax1.set_yticklabels([])
ax1.set_ylim(0,3.25)
fig.set_facecolor("#323331")
ax1.set_facecolor("#000100")
ax1.set_title("Global Temperature Change (1850-2018)",color="white",fontsize=25)
ax1.set_xticks([])
ax1.set_yticks([])

theta = np.linspace(0, 2*np.pi, 12)
years=hadcrut["year"].unique()

for index,year in enumerate(years):
  r=hadcrut.loc[hadcrut["year"]==year,"value"]+1
  ax1.plot(theta,r,c=plt.cm.viridis(index*2))
plt.show()
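
A note on how the colormap is indexed in the loop above. As far as I understand, Matplotlib colormaps treat an integer as a direct lookup into the colormap's color table (256 entries for viridis) and a float between 0 and 1 as a position along the whole range. With `index*2`, the lookup runs past the end of the table for the later years, so those years would all get the final yellow. If you would rather spread the colors evenly over every year, one option is to pass floats instead. This is a sketch of that alternative, not the approach used in this post; `n_years` and `positions` are names introduced here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

n_years = 169  # 1850 through 2018 inclusive

# Floats in [0, 1] are interpreted as positions along the colormap,
# so this spreads the years over the full blue-to-yellow range.
positions = np.linspace(0, 1, n_years)
colors = plt.cm.viridis(positions)   # array of RGBA rows, one per year

print(colors.shape)
print(colors[0], colors[-1])
```

Inside the plotting loop, `colors[index]` could then be passed as the `c` argument in place of `plt.cm.viridis(index*2)`.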

Adding Temperature Rings

At this stage, the viewer can't actually understand the underlying data at all. There is no indication of temperature values in the visualization.
Next, we will add temperature rings at 0.0, 1.5, and 2.0 degrees Celsius:

In [23]:
full_circle_thetas=np.linspace(0,2*np.pi,1000)
blue_one_radii=[0.0+1.0]*1000
red_one_radii=[1.5+1.0]*1000
red_two_radii=[2.0+1.0]*1000
In [24]:
fig=plt.figure(figsize=(8,8))
ax1=plt.subplot(111,projection="polar")

ax1.plot(full_circle_thetas, blue_one_radii, c='blue')
ax1.plot(full_circle_thetas, red_one_radii, c='red')
ax1.plot(full_circle_thetas, red_two_radii, c='red')
ax1.set_xticklabels([])
ax1.set_yticklabels([])
ax1.set_ylim(0,3.25)
fig.set_facecolor("#323331")
ax1.set_facecolor("#000100")
ax1.set_title("Global Temperature Change (1850-2018)",color="white",fontsize=25)
ax1.set_xticks([])
ax1.set_yticks([])

theta = np.linspace(0, 2*np.pi, 12)
years=hadcrut["year"].unique()

for index,year in enumerate(years):
  r=hadcrut.loc[hadcrut["year"]==year,"value"]+1
  ax1.plot(theta,r,c=plt.cm.viridis(index*2))
plt.show()
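
Since the three rings differ only in temperature and color, the ring-drawing step could also be wrapped in a small helper. The sketch below is one possible way to do that; `add_ring` and its parameters are hypothetical names introduced here, and the `offset` of 1.0 mirrors the +1 shift applied to the temperature values earlier.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def add_ring(ax, temperature, color, offset=1.0, n_points=1000):
    """Draw a full circle at the given temperature, shifted by `offset`."""
    thetas = np.linspace(0, 2 * np.pi, n_points)
    radii = np.full(n_points, temperature + offset)
    (line,) = ax.plot(thetas, radii, c=color)
    return line

fig = plt.figure(figsize=(8, 8))
ax = plt.subplot(111, projection="polar")
ax.set_ylim(0, 3.25)

# One (temperature, color) pair per ring, matching the rings drawn above.
rings = [(0.0, "blue"), (1.5, "red"), (2.0, "red")]
lines = [add_ring(ax, temp, color) for temp, color in rings]
```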

Next, we can add the text specifying each ring’s temperature value. All three labels sit at the 0.5*pi angle, at varying distances from the center:

In [25]:
fig=plt.figure(figsize=(8,8))
ax1=plt.subplot(111,projection="polar")

ax1.plot(full_circle_thetas, blue_one_radii, c='blue')
ax1.plot(full_circle_thetas, red_one_radii, c='red')
ax1.plot(full_circle_thetas, red_two_radii, c='red')
ax1.set_xticklabels([])
ax1.set_yticklabels([])
ax1.set_ylim(0,3.25)
fig.set_facecolor("#323331")
ax1.set_facecolor("#000100")
ax1.set_title("Global Temperature Change (1850-2018)",color="white",fontsize=25)
ax1.set_xticks([])
ax1.set_yticks([])
ax1.text(np.pi/2, 0.90, "0.0 C", color="blue", ha='center')
ax1.text(np.pi/2, 2.40, "1.5 C", color="red", ha='center', fontsize= 15,bbox=dict(facecolor='#000100', edgecolor='#000100'))
ax1.text(np.pi/2, 2.90, "2.0 C", color="red", ha='center', fontsize= 15,bbox=dict(facecolor='#000100', edgecolor='#000100'))

theta = np.linspace(0, 2*np.pi, 12)
years=hadcrut["year"].unique()

for index,year in enumerate(years):
  r=hadcrut.loc[hadcrut["year"]==year,"value"]+1
  ax1.plot(theta,r,c=plt.cm.viridis(index*2))

plt.show()