It's Getting Hotter in Here

After enduring the hottest Australian summer on record in 2019/20 sheltered inside with air-con and air purifier running as large parts of my home state of New South Wales burned, I decided to run some exploratory analysis on the Bureau of Meteorology’s fantastic ACORN-SAT 2.1 database.

The Australian Climate Observations Reference Network – Surface Air Temperature (ACORN-SAT) is the dataset used by the Bureau of Meteorology to monitor long-term temperature trends in Australia. ACORN-SAT uses observations from 112 weather stations in all corners of Australia, selected for the quality and length of their available temperature data.

Setup

Let’s get started by importing the required packages

import pandas as pd
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import chart_studio.plotly as py
import plotly.graph_objects as go
init_notebook_mode(connected=False)
import plotly.express as px
import numpy as np
import seaborn as sns
import string
import glob
import plotly.io as pio

Some Plotly visualisations require a Mapbox account - you can register for the basic account tier for free.

#You will need your own token
token = open(".mapbox_token").read() 

places = pd.read_csv("acorn_sat_v2.1.0_stations.csv")
places.head()

	stn_num	stn_name	lat	lon	elevation
0	1019	Kalumburu	-14.30	126.65	23
1	2079	Halls Creek	-18.23	127.67	409
2	3003	Broome	-17.95	122.24	7
3	4032	Port Hedland	-20.37	118.63	6
4	4106	Marble Bar	-21.18	119.75	182

We then create a dataframe from a CSV containing a list of weather station names and locations

The dataset of temperature readings comes split into ‘max’ and ‘min’ folders containing a CSV for each weather station. The below code loops through each CSV, creating a dataframe for each station and running some pre-processing, before concatenating them together.

#Loop through the weather station CSVs, run some preprocessing and add the dataframes to a temporary list
path = "max"
li = []
 
for count, f in enumerate(glob.glob(path + "/*.csv")):
    df = pd.read_csv(f, index_col=None, header=0)
    df['site number'] = df['site number'][0]
    df['site name'] = string.capwords(df['site name'][0])
    df['lat'] = places['lat'][count]
    df['lon'] = places['lon'][count]
    df = df.drop(df.index[0])
    df['year'] = pd.to_datetime(df['date']).dt.year
    df['long term average'] = df['maximum temperature (degC)'][(df['year']<1980) & (df['year']>1949)].mean()
    li.append(df)

#Concatenate list of dataframes into one large dataframe of nearly 4 million rows. 
df = pd.concat(li, axis=0, ignore_index=True)

#Do likewise with the weather station data from the 'min' dataset
path = "min"
li = []

for count, f in enumerate(glob.glob(path + "/*.csv")):
    df_min = pd.read_csv(f, index_col=None, header=0)
    df_min = df_min.drop(df_min.index[0])
    li.append(df_min)

df_min = pd.concat(li, axis=0, ignore_index=True)

#Add the minimum temperature data to the first dataframe
df['min'] = df_min['minimum temperature (degC)']
df.rename(columns = {"maximum temperature (degC)": "max"}, inplace=True)

df['date'] = pd.to_datetime(df['date'])
df['year'] = pd.to_datetime(df['date']).dt.year

Exploring the Australian Bureau of Meteorology data

We start by printing some basic statistics describing the dataset

df.head()

	date	max	site number	site name	lat	lon	year	long term average	min
0	1941-09-01	31.0	1019.0	Kalumburu	-14.3	126.65	1941	33.678851	20.5
1	1941-09-02	31.0	1019.0	Kalumburu	-14.3	126.65	1941	33.678851	20.5
2	1941-09-03	30.5	1019.0	Kalumburu	-14.3	126.65	1941	33.678851	19.3
3	1941-09-04	38.8	1019.0	Kalumburu	-14.3	126.65	1941	33.678851	21.0
4	1941-09-05	32.1	1019.0	Kalumburu	-14.3	126.65	1941	33.678851	19.6

df.dtypes

date                 datetime64[ns]
max                         float64
site number                 float64
site name                    object
lat                         float64
lon                         float64
year                          int64
long term average           float64
min                         float64
dtype: object

df.describe(include=[np.object])

	site name
count	3882887
unique	112
top	Cairns Aero
freq	40177

There are a not insignificant number of missing values in this dataset, however given we are exploring mean temperature values this is not especially critical.

pd.isnull(df['max']).value_counts()

False    3779214
True      103673
Name: max, dtype: int64

Visualising Extreme Heat Events

These are the days when you just can’t afford for your air-con to break. When koalas come down from their gum trees for a swim in the billabong. Using the BoM dataset let’s find out where in Australia you would be better off living underground, and whether the occurences of 40°C+ (104 °F) are on the rise.

dfheat = df.copy()

#Tag days where the temperature reading exceeded 40°C (104 °F)
dfheat['Average Days of 40C+ / Year'] = np.where(dfheat['max'] >= 40, 1, 0)

dfheat.head()

	date	max	site number	site name	lat	lon	year	long term average	min
0	1941-09-01	31.0	1019.0	Kalumburu	-14.3	126.65	1941	33.678851	20.5
1	1941-09-02	31.0	1019.0	Kalumburu	-14.3	126.65	1941	33.678851	20.5
2	1941-09-03	30.5	1019.0	Kalumburu	-14.3	126.65	1941	33.678851	19.3
3	1941-09-04	38.8	1019.0	Kalumburu	-14.3	126.65	1941	33.678851	21.0
4	1941-09-05	32.1	1019.0	Kalumburu	-14.3	126.65	1941	33.678851	19.6

Hottest Places in Australia (2019)

Let’s start by finding the Australian locations with the most days of extreme heat (40°C / 104 °F) in 2019

#Slice dataframe and select the top 10 locations
df_hottest = dfheat[dfheat['year'] == 2019]
df_hottest = df_hottest.groupby(["site name"]).sum()
df_hottest = df_hottest.sort_values('Average Days of 40C+ / Year', ascending=False)
df_hottest = df_hottest.reset_index().loc[:9, :]

df_hottest

	site name	max	site number	lat	lon	year	long term average	min	Average Days of 40C+ / Year
0	Marble Bar	13335.4	1498690.0	-7730.70	43708.75	736935	12923.318800	7473.3	154
1	Rabbit Flat	13120.6	5718090.0	-7365.70	47453.65	736935	12218.645090	5216.3	139
2	Karijini North	12920.3	1860770.0	-8139.50	43234.25	736935	11679.412979	7623.9	127
3	Tennant Creek Airport	12160.2	5524275.0	-7168.60	48975.70	736935	11404.563166	7292.2	92
4	Victoria River Downs	12802.4	5411125.0	-5986.00	47818.65	736935	12676.517644	7079.4	89
5	Camooweal Township	12520.6	13508650.0	-7270.80	50413.80	736935	11865.371179	6600.0	88
6	Halls Creek Airport	12645.8	758835.0	-6653.95	46599.55	736935	11876.945578	6926.7	77
7	Oodnadatta Airport	11400.5	6220695.0	-10059.40	49439.25	736935	10442.994556	5205.0	74
8	Birdsville Airport	11622.4	13879490.0	-9453.50	50862.75	736935	11004.352290	5546.8	74
9	Learmonth Airport	12208.1	1827555.0	-8117.60	41646.50	736935	11368.705666	6463.8	74

The fact Marble Bar (located in Western Australia) is at the top of this list shouldn’t be a surprise. Marble Bar has a hot desert climate with sweltering summers and warm winters. Most of the annual rainfall occurs in the summer. The town set a world record of most consecutive days of 100 °F (37.8 °C) or above, during a period of 160 days from 31 October 1923 to 7 April 1924.

#Create barchart visualisation using Plotly

colors = ['lightsalmon',] * 10
colors[0] = 'indianred'

fig = go.Figure([go.Bar(x=df_hottest['site name'], y=df_hottest["Average Days of 40C+ / Year"], marker_color=colors)])

fig.update_yaxes(title="Count")
fig.update_layout(title_text= "Australia's Hottest Towns: Most Frequent Days Over 40°C (104 °F)", height=450, width=800)

fig.show()

Occurence of Extreme Weather Events Over Time

Let’s look at whether the number of days over 40°C (104 °F) has increased over time.

#Group dataframe by year and obtain average of days over 40C across weather stations
dfheat = dfheat.groupby([pd.Grouper(key="date", freq="Y")]).mean()*365
dfheat = dfheat.reset_index()
dfheat['year'] = dfheat['date'].dt.year

#Select records from 1950 to present
dfheat = dfheat[dfheat['year'] >= 1950]

dfheat.head()

	date	max	site number	lat	lon	year	long term average	min	Average Days of 40C+ / Year
40	1950-12-31	8835.971305	1.525218e+07	-10921.469084	50952.185347	1950	8957.598089	4584.059041	3.966220
41	1951-12-31	9005.318544	1.511193e+07	-10866.890937	50891.841143	1951	8978.719144	4438.756589	7.076769
42	1952-12-31	8908.120587	1.514267e+07	-10879.187456	50851.405749	1952	8981.081792	4479.180043	8.558434
43	1953-12-31	8953.312144	1.520950e+07	-10895.896354	50861.229167	1953	8971.730128	4441.370431	6.604167
44	1954-12-31	8945.124239	1.520064e+07	-10886.287193	50861.239298	1954	8985.271339	4578.838965	5.784125

Visualising this data, it appears that there is indeed an upward trend showing increasing extreme weather events across Australia

#Visualise the data
fig = px.bar(dfheat, x="year", y="Average Days of 40C+ / Year",
            hover_data=['year', 'Average Days of 40C+ / Year'], 
             color_continuous_scale="hot_r",
             color='Average Days of 40C+ / Year',
             labels={'year':'Year', 'Average Days of 40C+ / Year': "Count"}, height=450)
fig.update_layout(title_text="Australia Average Days of 40°C+ (104°F) / Year")

fig.show()

Visualising Temperature Data By Season

Given that there seems to be an increasing trend of extreme weather events, let’s explore the averaged temperature data in more detail and whether there is any discernable difference between Summer & Winter records

#Set dataframe date range and group by 3 months
start = '1950-08-01'
end = '2019-12-31'
mask = (df['date'] > start) & (df['date'] <= end)
df_seasons = (
    df.loc[mask]
    .groupby(pd.Grouper(freq="3M", key="date"))
    .mean()
    .iloc[::2]
    .reset_index()
)
#Label grouped months as seasons
df_seasons.loc[1::2, 'season'] = 'Summer'
df_seasons.loc[::2, 'season'] = 'Winter'

#Create temperature range column & year column
df_seasons.loc[:, 'temp_range'] = df_seasons['max'] - df_seasons['min']
df_seasons.loc[:, 'year'] = pd.DatetimeIndex(df_seasons['date']).year

df_seasons.head()

	date	max	site number	lat	lon	year	long term average	min	season	temp_range
0	1950-08-31	19.407288	41655.182796	-29.998925	139.455806	1950	24.522322	11.402815	Winter	8.004473
1	1951-02-28	29.599024	41499.893617	-29.792340	139.485957	1951	24.579390	15.553474	Summer	14.045550
2	1951-08-31	18.503477	41499.893617	-29.792340	139.485957	1951	24.579390	8.935743	Winter	9.567734
3	1952-02-29	30.192992	41116.715789	-29.713474	139.264526	1952	24.657485	15.677567	Summer	14.515425
4	1952-08-31	18.738631	41669.875000	-29.851771	139.345833	1952	24.580083	9.362085	Winter	9.376546

In addition to the data from the Australian Bureau of Meteorology, I’ve added a dataframe with cold & warm episodes by season from the National Weather Service to identify when El Nino & La Nina events are occurring and the Oceanic Niño Index (ONI) score.

oni_values = pd.read_csv("ONI Values.csv", skiprows=2)
oni_values.head()

	Type	Season	Unnamed: 2	Unnamed: 3	JJA	JAS	ASO	SON	OND	NDJ	DJF	JFM	FMA	MAM	AMJ	MJJ
0	NaN	1950	-	1951	-0.5	-0.4	-0.4	-0.4	-0.6	-0.8	-0.8	-0.5	-0.2	0.2	0.4	0.6
1	ME	1951	-	1952	0.7	0.9	1.0	1.2	1.0	0.8	0.5	0.4	0.3	0.3	0.2	0.0
2	WE	1952	-	1953	-0.1	0.0	0.2	0.1	0.0	0.1	0.4	0.6	0.6	0.7	0.8	0.8
3	WE	1953	-	1954	0.7	0.7	0.8	0.8	0.8	0.8	0.8	0.5	0.0	-0.4	-0.5	-0.5
4	WL	1954	-	1955	-0.6	-0.8	-0.9	-0.8	-0.7	-0.7	-0.7	-0.6	-0.7	-0.8	-0.8	-0.7

#The ONI scores are now added to the corresponding season in the df_seasons dataframe
for count, f in enumerate(oni_values.values):
    df_seasons.loc[count*2, "oni_value"] = float(f[4])
    df_seasons.loc[count*2+1, "oni_value"] = float(f[10])

df_seasons = df_seasons.dropna()

Visualising Seasonal Average Daily Temperature

We can now create scatterplots showing average daily temperatures for Summer and for Winter, colour-coded with the ONI data. Red indicates an El Nino event and blue indicates La Nina.

#Create a scatterplot
fig = px.scatter(df_seasons, x="year", y="max", facet_row="season", opacity=1, 
                 labels=dict(year="Year", difference="Average Daily Temperature Range (°C)", oni_value='Oceanic Niño Index'),
                 trendline = "ols", trendline_color_override="red", color="oni_value", color_continuous_scale='bluered')

fig.update_layout(height=600, width=1000, title_text="Australian Seasonal Average Daily Temperature (1950-2020)")

fig.update_xaxes(nticks=10)
fig.update_yaxes(nticks=10, title_text="Temperature (°C)", matches=None)

fig.show()

Temperatures in Summer & Winter have both increased by approximately 1.5°C since 1950, which is also reflected in the number of extreme heat events in the previous section. The data shows that although there is no apparently correlation between ONI and temperature in Winter, Summer El Nino events appear to be correlated with with higher than average temperatures (record heat in 1973, 1983, 1997 & 2019-20 was driven by strong El Nino events), and likewise La Nina events often bring lower than average temperatures (1974-76, 1984, 2000 & 2011-12).

Visualising Average Daily Temperature Range

Using the same dataframe, we can assess whether the increase in temperatures is likely to bring about an increased daily temperature range (i.e. colder nights, hotter days) across the BoM weather station locations.

#Create a scatterplot showing average daily temperature range for Summer and one for Winter
fig = px.scatter(df_seasons, x="year", y="temp_range", facet_row="season", opacity=1, 
                 labels=dict(year="Year", temp_range="Temperature Range (°C)"),
                 trendline = "ols", trendline_color_override="red")

fig.update_layout(height=600, width=1000, title_text="Australian Average Daily Temperature Range (1950-2020)")

fig.update_xaxes(nticks=10)
fig.update_yaxes(nticks=10, matches=None)

fig.show()
pio.write_html(fig, file='figure4.html', auto_open=True)

The trendlines show a very slight increase in the average daily temperature range between 1950-2020, leading to approximately 0.5°C greater difference between min / max temperatures in both Summer & Winter.

Geospatial Historical Temperature Analysis

Given we have the coordinates of weather stations Australia-wide, let’s create an animated map showing temperature changes over time.

#Consolidate dataframe into yearly average temperatures and group by location.
dfyear = df.copy()[df['year'] >= 1950].groupby(['site name', pd.Grouper(key="date", freq="Y")]).mean()

#Using the long term average for each location between 1950-1980, we can calculate the difference in temperature for each year & location.
dfyear = (
    dfyear.dropna(subset=['max']).
    reset_index()
)
dfyear.loc[:,'+/- Long Term Average'] = (dfyear['max']-dfyear['long term average'])
dfyear = dfyear.round(1)

dfyear.head()

	site name	date	max	site number	lat	lon	year	long term average	min	+/- Long Term Average
0	Albany Airport	1950-12-31	20.1	9999.0	-34.9	117.8	1950	19.8	10.2	0.4
1	Albany Airport	1951-12-31	19.3	9999.0	-34.9	117.8	1951	19.8	10.1	-0.5
2	Albany Airport	1952-12-31	19.4	9999.0	-34.9	117.8	1952	19.8	9.4	-0.4
3	Albany Airport	1953-12-31	19.5	9999.0	-34.9	117.8	1953	19.8	10.0	-0.3
4	Albany Airport	1954-12-31	19.9	9999.0	-34.9	117.8	1954	19.8	10.0	0.1

It’s easy to create animations with Plotly’s Mapbox integration. For some of the Mapbox style types (like “satellite” used here), you’ll need a mapbox access token which can be obtained by registering for free.

fig = px.scatter_mapbox(dfyear, lat="lat", 
                        lon="lon", 
                        size="max",
                        size_max = 20,
                        hover_name="site name", 
                        hover_data=["max","min"],
                        color="+/- Long Term Average",
                        color_continuous_scale="blackbody_r", 
                        range_color =[-2,3],
                        zoom=3.8, height=800, width=1000, 
                        animation_frame="year", 
                        )
fig.update_layout(
    mapbox_style="satellite",
    mapbox_accesstoken=token,
        title={
        'text': 'AUS Surface Temperature 1950-2019<br>Source: <a href="http://www.bom.gov.au/climate/data-services/station-data.shtml">BoM</a>',
        'y':0.3,
        'x':0.05,
        'xanchor': 'left',
        'yanchor': 'top'},
        font=dict(
            color="grey",
        )
)

#Adjust pitch and bearing to adjust the rotation
fig.update_layout(margin={"r":0,"t":20,"l":0,"b":20},
                  mapbox=dict(
                      pitch=40,
                      bearing=0,
                      center=dict(
                      lat=-31,
                      lon = 134),
                  ),


                 )
fig.show()

It's Getting Hotter in Here

Exploring & Visualising Australian Historical Temperature Data in Pandas

It's Getting Hotter in Here

Exploring & Visualising Australian Historical Temperature Data in Pandas

Setup

Exploring the Australian Bureau of Meteorology data

Visualising Extreme Heat Events

Hottest Places in Australia (2019)

Occurence of Extreme Weather Events Over Time

Visualising Temperature Data By Season

Visualising Seasonal Average Daily Temperature

Visualising Average Daily Temperature Range

Geospatial Historical Temperature Analysis