After enduring the hottest Australian summer on record in 2019/20 sheltered inside with air-con and air purifier running as large parts of my home state of New South Wales burned, I decided to run some exploratory analysis on the Bureau of Meteorology’s fantastic ACORN-SAT 2.1 database.

The Australian Climate Observations Reference Network – Surface Air Temperature (ACORN-SAT) is the dataset used by the Bureau of Meteorology to monitor long-term temperature trends in Australia. ACORN-SAT uses observations from 112 weather stations in all corners of Australia, selected for the quality and length of their available temperature data.

Setup

Let’s get started by importing the required packages

import pandas as pd
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import chart_studio.plotly as py
import plotly.graph_objects as go
init_notebook_mode(connected=False)
import plotly.express as px
import numpy as np
import seaborn as sns
import string
import glob
import plotly.io as pio

Some Plotly visualisations require a Mapbox account - you can register for the basic account tier for free.

#You will need your own token
token = open(".mapbox_token").read() 
places = pd.read_csv("acorn_sat_v2.1.0_stations.csv")
places.head()
stn_num stn_name lat lon elevation
0 1019 Kalumburu -14.30 126.65 23
1 2079 Halls Creek -18.23 127.67 409
2 3003 Broome -17.95 122.24 7
3 4032 Port Hedland -20.37 118.63 6
4 4106 Marble Bar -21.18 119.75 182

We then create a dataframe from a CSV containing a list of weather station names and locations

The dataset of temperature readings comes split into ‘max’ and ‘min’ folders containing a CSV for each weather station. The below code loops through each CSV, creating a dataframe for each station and running some pre-processing, before concatenating them together.

#Loop through the weather station CSVs, run some preprocessing and add the dataframes to a temporary list
path = "max"
li = []
 
for count, f in enumerate(glob.glob(path + "/*.csv")):
    df = pd.read_csv(f, index_col=None, header=0)
    df['site number'] = df['site number'][0]
    df['site name'] = string.capwords(df['site name'][0])
    df['lat'] = places['lat'][count]
    df['lon'] = places['lon'][count]
    df = df.drop(df.index[0])
    df['year'] = pd.to_datetime(df['date']).dt.year
    df['long term average'] = df['maximum temperature (degC)'][(df['year']<1980) & (df['year']>1949)].mean()
    li.append(df)

#Concatenate list of dataframes into one large dataframe of nearly 4 million rows. 
df = pd.concat(li, axis=0, ignore_index=True)
#Do likewise with the weather station data from the 'min' dataset
path = "min"
li = []

for count, f in enumerate(glob.glob(path + "/*.csv")):
    df_min = pd.read_csv(f, index_col=None, header=0)
    df_min = df_min.drop(df_min.index[0])
    li.append(df_min)

df_min = pd.concat(li, axis=0, ignore_index=True)
#Add the minimum temperature data to the first dataframe
df['min'] = df_min['minimum temperature (degC)']
df.rename(columns = {"maximum temperature (degC)": "max"}, inplace=True)

df['date'] = pd.to_datetime(df['date'])
df['year'] = pd.to_datetime(df['date']).dt.year

Exploring the Australian Bureau of Meteorology data

We start by printing some basic statistics describing the dataset

df.head()
date max site number site name lat lon year long term average min
0 1941-09-01 31.0 1019.0 Kalumburu -14.3 126.65 1941 33.678851 20.5
1 1941-09-02 31.0 1019.0 Kalumburu -14.3 126.65 1941 33.678851 20.5
2 1941-09-03 30.5 1019.0 Kalumburu -14.3 126.65 1941 33.678851 19.3
3 1941-09-04 38.8 1019.0 Kalumburu -14.3 126.65 1941 33.678851 21.0
4 1941-09-05 32.1 1019.0 Kalumburu -14.3 126.65 1941 33.678851 19.6
df.dtypes
date                 datetime64[ns]
max                         float64
site number                 float64
site name                    object
lat                         float64
lon                         float64
year                          int64
long term average           float64
min                         float64
dtype: object
df.describe(include=[np.object])
site name
count 3882887
unique 112
top Cairns Aero
freq 40177

There are a not insignificant number of missing values in this dataset, however given we are exploring mean temperature values this is not especially critical.

pd.isnull(df['max']).value_counts()
False    3779214
True      103673
Name: max, dtype: int64

Visualising Extreme Heat Events

These are the days when you just can’t afford for your air-con to break. When koalas come down from their gum trees for a swim in the billabong. Using the BoM dataset let’s find out where in Australia you would be better off living underground, and whether the occurences of 40°C+ (104 °F) are on the rise.

dfheat = df.copy()

#Tag days where the temperature reading exceeded 40°C (104 °F)
dfheat['Average Days of 40C+ / Year'] = np.where(dfheat['max'] >= 40, 1, 0)

dfheat.head()
date max site number site name lat lon year long term average min Average Days of 40C+ / Year
0 1941-09-01 31.0 1019.0 Kalumburu -14.3 126.65 1941 33.678851 20.5 0
1 1941-09-02 31.0 1019.0 Kalumburu -14.3 126.65 1941 33.678851 20.5 0
2 1941-09-03 30.5 1019.0 Kalumburu -14.3 126.65 1941 33.678851 19.3 0
3 1941-09-04 38.8 1019.0 Kalumburu -14.3 126.65 1941 33.678851 21.0 0
4 1941-09-05 32.1 1019.0 Kalumburu -14.3 126.65 1941 33.678851 19.6 0

Hottest Places in Australia (2019)

Let’s start by finding the Australian locations with the most days of extreme heat (40°C / 104 °F) in 2019

#Slice dataframe and select the top 10 locations
df_hottest = dfheat[dfheat['year'] == 2019]
df_hottest = df_hottest.groupby(["site name"]).sum()
df_hottest = df_hottest.sort_values('Average Days of 40C+ / Year', ascending=False)
df_hottest = df_hottest.reset_index().loc[:9, :]

df_hottest
site name max site number lat lon year long term average min Average Days of 40C+ / Year
0 Marble Bar 13335.4 1498690.0 -7730.70 43708.75 736935 12923.318800 7473.3 154
1 Rabbit Flat 13120.6 5718090.0 -7365.70 47453.65 736935 12218.645090 5216.3 139
2 Karijini North 12920.3 1860770.0 -8139.50 43234.25 736935 11679.412979 7623.9 127
3 Tennant Creek Airport 12160.2 5524275.0 -7168.60 48975.70 736935 11404.563166 7292.2 92
4 Victoria River Downs 12802.4 5411125.0 -5986.00 47818.65 736935 12676.517644 7079.4 89
5 Camooweal Township 12520.6 13508650.0 -7270.80 50413.80 736935 11865.371179 6600.0 88
6 Halls Creek Airport 12645.8 758835.0 -6653.95 46599.55 736935 11876.945578 6926.7 77
7 Oodnadatta Airport 11400.5 6220695.0 -10059.40 49439.25 736935 10442.994556 5205.0 74
8 Birdsville Airport 11622.4 13879490.0 -9453.50 50862.75 736935 11004.352290 5546.8 74
9 Learmonth Airport 12208.1 1827555.0 -8117.60 41646.50 736935 11368.705666 6463.8 74

The fact Marble Bar (located in Western Australia) is at the top of this list shouldn’t be a surprise. Marble Bar has a hot desert climate with sweltering summers and warm winters. Most of the annual rainfall occurs in the summer. The town set a world record of most consecutive days of 100 °F (37.8 °C) or above, during a period of 160 days from 31 October 1923 to 7 April 1924.

#Create barchart visualisation using Plotly

colors = ['lightsalmon',] * 10
colors[0] = 'indianred'

fig = go.Figure([go.Bar(x=df_hottest['site name'], y=df_hottest["Average Days of 40C+ / Year"], marker_color=colors)])

fig.update_yaxes(title="Count")
fig.update_layout(title_text= "Australia's Hottest Towns: Most Frequent Days Over 40°C (104 °F)", height=450, width=800)


fig.show()

Occurence of Extreme Weather Events Over Time

Let’s look at whether the number of days over 40°C (104 °F) has increased over time.

#Group dataframe by year and obtain average of days over 40C across weather stations
dfheat = dfheat.groupby([pd.Grouper(key="date", freq="Y")]).mean()*365
dfheat = dfheat.reset_index()
dfheat['year'] = dfheat['date'].dt.year

#Select records from 1950 to present
dfheat = dfheat[dfheat['year'] >= 1950]

dfheat.head()

date max site number lat lon year long term average min Average Days of 40C+ / Year
40 1950-12-31 8835.971305 1.525218e+07 -10921.469084 50952.185347 1950 8957.598089 4584.059041 3.966220
41 1951-12-31 9005.318544 1.511193e+07 -10866.890937 50891.841143 1951 8978.719144 4438.756589 7.076769
42 1952-12-31 8908.120587 1.514267e+07 -10879.187456 50851.405749 1952 8981.081792 4479.180043 8.558434
43 1953-12-31 8953.312144 1.520950e+07 -10895.896354 50861.229167 1953 8971.730128 4441.370431 6.604167
44 1954-12-31 8945.124239 1.520064e+07 -10886.287193 50861.239298 1954 8985.271339 4578.838965 5.784125

Visualising this data, it appears that there is indeed an upward trend showing increasing extreme weather events across Australia

#Visualise the data
fig = px.bar(dfheat, x="year", y="Average Days of 40C+ / Year",
            hover_data=['year', 'Average Days of 40C+ / Year'], 
             color_continuous_scale="hot_r",
             color='Average Days of 40C+ / Year',
             labels={'year':'Year', 'Average Days of 40C+ / Year': "Count"}, height=450)
fig.update_layout(title_text="Australia Average Days of 40°C+ (104°F) / Year")

fig.show()

Visualising Temperature Data By Season

Given that there seems to be an increasing trend of extreme weather events, let’s explore the averaged temperature data in more detail and whether there is any discernable difference between Summer & Winter records

#Set dataframe date range and group by 3 months
start = '1950-08-01'
end = '2019-12-31'
mask = (df['date'] > start) & (df['date'] <= end)
df_seasons = (
    df.loc[mask]
    .groupby(pd.Grouper(freq="3M", key="date"))
    .mean()
    .iloc[::2]
    .reset_index()
)
#Label grouped months as seasons
df_seasons.loc[1::2, 'season'] = 'Summer'
df_seasons.loc[::2, 'season'] = 'Winter'

#Create temperature range column & year column
df_seasons.loc[:, 'temp_range'] = df_seasons['max'] - df_seasons['min']
df_seasons.loc[:, 'year'] = pd.DatetimeIndex(df_seasons['date']).year

df_seasons.head()
date max site number lat lon year long term average min season temp_range
0 1950-08-31 19.407288 41655.182796 -29.998925 139.455806 1950 24.522322 11.402815 Winter 8.004473
1 1951-02-28 29.599024 41499.893617 -29.792340 139.485957 1951 24.579390 15.553474 Summer 14.045550
2 1951-08-31 18.503477 41499.893617 -29.792340 139.485957 1951 24.579390 8.935743 Winter 9.567734
3 1952-02-29 30.192992 41116.715789 -29.713474 139.264526 1952 24.657485 15.677567 Summer 14.515425
4 1952-08-31 18.738631 41669.875000 -29.851771 139.345833 1952 24.580083 9.362085 Winter 9.376546

In addition to the data from the Australian Bureau of Meteorology, I’ve added a dataframe with cold & warm episodes by season from the National Weather Service to identify when El Nino & La Nina events are occurring and the Oceanic Niño Index (ONI) score.

oni_values = pd.read_csv("ONI Values.csv", skiprows=2)
oni_values.head()
Type Season Unnamed: 2 Unnamed: 3 JJA JAS ASO SON OND NDJ DJF JFM FMA MAM AMJ MJJ
0 NaN 1950 - 1951 -0.5 -0.4 -0.4 -0.4 -0.6 -0.8 -0.8 -0.5 -0.2 0.2 0.4 0.6
1 ME 1951 - 1952 0.7 0.9 1.0 1.2 1.0 0.8 0.5 0.4 0.3 0.3 0.2 0.0
2 WE 1952 - 1953 -0.1 0.0 0.2 0.1 0.0 0.1 0.4 0.6 0.6 0.7 0.8 0.8
3 WE 1953 - 1954 0.7 0.7 0.8 0.8 0.8 0.8 0.8 0.5 0.0 -0.4 -0.5 -0.5
4 WL 1954 - 1955 -0.6 -0.8 -0.9 -0.8 -0.7 -0.7 -0.7 -0.6 -0.7 -0.8 -0.8 -0.7
#The ONI scores are now added to the corresponding season in the df_seasons dataframe
for count, f in enumerate(oni_values.values):
    df_seasons.loc[count*2, "oni_value"] = float(f[4])
    df_seasons.loc[count*2+1, "oni_value"] = float(f[10])

df_seasons = df_seasons.dropna()

Visualising Seasonal Average Daily Temperature

We can now create scatterplots showing average daily temperatures for Summer and for Winter, colour-coded with the ONI data. Red indicates an El Nino event and blue indicates La Nina.

#Create a scatterplot
fig = px.scatter(df_seasons, x="year", y="max", facet_row="season", opacity=1, 
                 labels=dict(year="Year", difference="Average Daily Temperature Range (°C)", oni_value='Oceanic Niño Index'),
                 trendline = "ols", trendline_color_override="red", color="oni_value", color_continuous_scale='bluered')

fig.update_layout(height=600, width=1000, title_text="Australian Seasonal Average Daily Temperature (1950-2020)")

fig.update_xaxes(nticks=10)
fig.update_yaxes(nticks=10, title_text="Temperature (°C)", matches=None)

fig.show()

Temperatures in Summer & Winter have both increased by approximately 1.5°C since 1950, which is also reflected in the number of extreme heat events in the previous section. The data shows that although there is no apparently correlation between ONI and temperature in Winter, Summer El Nino events appear to be correlated with with higher than average temperatures (record heat in 1973, 1983, 1997 & 2019-20 was driven by strong El Nino events), and likewise La Nina events often bring lower than average temperatures (1974-76, 1984, 2000 & 2011-12).

Visualising Average Daily Temperature Range

Using the same dataframe, we can assess whether the increase in temperatures is likely to bring about an increased daily temperature range (i.e. colder nights, hotter days) across the BoM weather station locations.

#Create a scatterplot showing average daily temperature range for Summer and one for Winter
fig = px.scatter(df_seasons, x="year", y="temp_range", facet_row="season", opacity=1, 
                 labels=dict(year="Year", temp_range="Temperature Range (°C)"),
                 trendline = "ols", trendline_color_override="red")

fig.update_layout(height=600, width=1000, title_text="Australian Average Daily Temperature Range (1950-2020)")

fig.update_xaxes(nticks=10)
fig.update_yaxes(nticks=10, matches=None)

fig.show()
pio.write_html(fig, file='figure4.html', auto_open=True)

The trendlines show a very slight increase in the average daily temperature range between 1950-2020, leading to approximately 0.5°C greater difference between min / max temperatures in both Summer & Winter.

Geospatial Historical Temperature Analysis

Given we have the coordinates of weather stations Australia-wide, let’s create an animated map showing temperature changes over time.

#Consolidate dataframe into yearly average temperatures and group by location.
dfyear = df.copy()[df['year'] >= 1950].groupby(['site name', pd.Grouper(key="date", freq="Y")]).mean()

#Using the long term average for each location between 1950-1980, we can calculate the difference in temperature for each year & location.
dfyear = (
    dfyear.dropna(subset=['max']).
    reset_index()
)
dfyear.loc[:,'+/- Long Term Average'] = (dfyear['max']-dfyear['long term average'])
dfyear = dfyear.round(1)

dfyear.head()
site name date max site number lat lon year long term average min +/- Long Term Average
0 Albany Airport 1950-12-31 20.1 9999.0 -34.9 117.8 1950 19.8 10.2 0.4
1 Albany Airport 1951-12-31 19.3 9999.0 -34.9 117.8 1951 19.8 10.1 -0.5
2 Albany Airport 1952-12-31 19.4 9999.0 -34.9 117.8 1952 19.8 9.4 -0.4
3 Albany Airport 1953-12-31 19.5 9999.0 -34.9 117.8 1953 19.8 10.0 -0.3
4 Albany Airport 1954-12-31 19.9 9999.0 -34.9 117.8 1954 19.8 10.0 0.1

It’s easy to create animations with Plotly’s Mapbox integration. For some of the Mapbox style types (like “satellite” used here), you’ll need a mapbox access token which can be obtained by registering for free.

fig = px.scatter_mapbox(dfyear, lat="lat", 
                        lon="lon", 
                        size="max",
                        size_max = 20,
                        hover_name="site name", 
                        hover_data=["max","min"],
                        color="+/- Long Term Average",
                        color_continuous_scale="blackbody_r", 
                        range_color =[-2,3],
                        zoom=3.8, height=800, width=1000, 
                        animation_frame="year", 
                        )
fig.update_layout(
    mapbox_style="satellite",
    mapbox_accesstoken=token,
        title={
        'text': 'AUS Surface Temperature 1950-2019<br>Source: <a href="http://www.bom.gov.au/climate/data-services/station-data.shtml">BoM</a>',
        'y':0.3,
        'x':0.05,
        'xanchor': 'left',
        'yanchor': 'top'},
        font=dict(
            color="grey",
        )
)

#Adjust pitch and bearing to adjust the rotation
fig.update_layout(margin={"r":0,"t":20,"l":0,"b":20},
                  mapbox=dict(
                      pitch=40,
                      bearing=0,
                      center=dict(
                      lat=-31,
                      lon = 134),
                  ),


                 )
fig.show()