Open In Colab   Open in Kaggle

Tutorial 2: A Lot of Weather Makes Climate - Exploring the ERA5 Reanalysis#

Week 1, Day 2, Ocean-Atmosphere Reanalysis

Content creators: Momme Hell

Content reviewers: Katrina Dobson, Danika Gupta, Maria Gonzalez, Will Gregory, Nahid Hasan, Sherry Mi, Beatriz Cosenza Muralles, Jenna Pearson, Chi Zhang, Ohad Zivan

Content editors: Jenna Pearson, Chi Zhang, Ohad Zivan

Production editors: Wesley Banfield, Jenna Pearson, Chi Zhang, Ohad Zivan

Our 2023 Sponsors: NASA TOPS and Google DeepMind

Tutorial Objectives#

In the previous tutorial, we learned about ENSO, which is a specific atmosphere-ocean dynamical phenomena. You will now examine the atmosphere and the ocean systems more generally.

In this tutorial, you will learn to work with reanalysis data. These data combine observations and models of the Earth system, and are a critical tool for weather and climate science. You will first utilize two methods to access a specific reanalysis dataset (ECMWF’s ERA5; through PO.DAAC and through the web Copernicus API). You will then select and mask a region of interest, investigating how important climate variables change on medium length timescales (hours to months) within this region.

By the end of this tutorial, you will be able to:

  • Access and select reanalysis data of cliamtically-important variables

  • Plot maps to explore changes on various time scales.

  • Compute and compare timeseries of different variables from reanalysis data.

Setup#

# !pip install pythia_datasets
# !pip install cartopy
# !pip install geoviews
# imports
from intake import open_catalog
import matplotlib.pyplot as plt
import matplotlib
import xarray as xr
import fsspec
import numpy as np

import boto3
import botocore
import datetime
import numpy as np
import os
import pooch
import tempfile
import geoviews as gv
import holoviews
from geoviews import Dataset as gvDataset
import geoviews.feature as gf
from geoviews import Image as gvImage

from cartopy import crs as ccrs
from cartopy import feature as cfeature

# import warnings
# #  Suppress warnings issued by Cartopy when downloading data files
# warnings.filterwarnings('ignore')

Figure Settings#

# @title Figure Settings
import ipywidgets as widgets  # interactive display

%config InlineBackend.figure_format = 'retina'
plt.style.use(
    "https://raw.githubusercontent.com/ClimateMatchAcademy/course-content/main/cma.mplstyle"
)

Helper functions#

# @title Helper functions

def pooch_load(filelocation=None, filename=None, processor=None):
    shared_location = "/home/jovyan/shared/Data/tutorials/W1D2_StateoftheClimateOceanandAtmosphereReanalysis"  # this is different for each day
    user_temp_cache = tempfile.gettempdir()

    if os.path.exists(os.path.join(shared_location, filename)):
        file = os.path.join(shared_location, filename)
    else:
        file = pooch.retrieve(
            filelocation,
            known_hash=None,
            fname=os.path.join(user_temp_cache, filename),
            processor=processor,
        )

    return file

Video 1: ECMWF Reanalysis#

Section 1: What is Reanalysis Data?#

Reanalysis refers to the process of combining historical observations from a variety of sources, such as weather stations, satellite measurments, and ocean buoys, with numerical models to create a comprehensive and consistent record of past weather and climate conditions. Reanalysis data is a useful tool to examine the Earth’s climate system over a wide range of time scales, from seasonal through decadal to century-scale changes.

There are multiple Earth system reanalysis products (e.g. MERRA-2, NCEP-NCAR, JRA-55C, see extensive list here), and no single product fits all needs. For the purposes of this tutorial you will be using a product from the European Centre for Medium-Range Weather Forecasts (ECMWF) called ECMWF Reanalysis v5 (ERA5). This video from the ECMWF provides you with a brief introduction to the ERA5 product.

Section 1.1: Accessing ERA5 Data#

You will access the data through the an AWS S3 bucket of the data: ECMWF ERA5 Reanalysis. To do this you need the name of the bucket “era5-pds”, and the file location in the bucket. Note: you can open the AWS link and find a guided tutorial on how to explore the S3 bucket.

Let’s select a specific year and month to work with, March of 2018:

era5_bucket = "era5-pds"
client = boto3.client(
    "s3", config=botocore.client.Config(signature_version=botocore.UNSIGNED)
)  # initialize aws s3 bucket client
date_sel = datetime.datetime(
    2018, 3, 1, 0
)  # select a desired date and hours (midnight is zero)
prefix = date_sel.strftime("%Y/%m/")  # format the date to match the s3 bucket
metadata_file = "main.nc"  # filename on the s3 bucket
metadata_key = prefix + metadata_file  # file location and name on the s3 bucket
filepath = pooch_load(
    filelocation="http://s3.amazonaws.com/" + era5_bucket + "/" + metadata_key,
    filename=metadata_file,
)  # open the file

ds_meta = xr.open_dataset(filepath)  # open the file
ds_meta
<xarray.Dataset>
Dimensions:                                                                                   (
                                                                                               lat: 721,
                                                                                               time1: 744,
                                                                                               time0: 744,
                                                                                               lon: 1440,
                                                                                               lon_ocean: 720,
                                                                                               lat_ocean: 361,
                                                                                               nv: 2)
Coordinates:
  * lat                                                                                       (lat) float32 ...
  * time1                                                                                     (time1) datetime64[ns] ...
  * time0                                                                                     (time0) datetime64[ns] ...
  * lon                                                                                       (lon) float32 ...
  * lon_ocean                                                                                 (lon_ocean) float32 ...
  * lat_ocean                                                                                 (lat_ocean) float32 ...
Dimensions without coordinates: nv
Data variables: (12/19)
    sea_surface_temperature                                                                   (time0, lat, lon) float32 ...
    sea_surface_wave_mean_period                                                              (time0, lat_ocean, lon_ocean) float32 ...
    air_temperature_at_2_metres_1hour_Maximum                                                 (time1, lat, lon) float32 ...
    dew_point_temperature_at_2_metres                                                         (time0, lat, lon) float32 ...
    significant_height_of_wind_and_swell_waves                                                (time0, lat_ocean, lon_ocean) float32 ...
    sea_surface_wave_from_direction                                                           (time0, lat_ocean, lon_ocean) float32 ...
    ...                                                                                        ...
    lwe_thickness_of_surface_snow_amount                                                      (time0, lat, lon) float32 ...
    snow_density                                                                              (time0, lat, lon) float32 ...
    eastward_wind_at_10_metres                                                                (time0, lat, lon) float32 ...
    integral_wrt_time_of_surface_direct_downwelling_shortwave_flux_in_air_1hour_Accumulation  (time1, lat, lon) float32 ...
    air_temperature_at_2_metres                                                               (time0, lat, lon) float32 ...
    northward_wind_at_10_metres                                                               (time0, lat, lon) float32 ...
Attributes:
    source:       Reanalysis
    institution:  ECMWF
    tilte:        ERA5 forecasts

You just loaded an xarray dataset, as introduced at the first day. This dataset contains 19 variables covering the whole globe (-90 to +90 degrees in latitude, 0 to 360 degrees on longitude) along with their respective coordinates. With this dataset you have access to our best estimates of climate parameters with a temporal resolution of 1 hour and a spatial resolution of 1/4 degree (i.e. grid points near the Equator represent a ~25 km x 25 km region). This is a lot of data, but still just a fraction the data available through the full ERA5 dataset.

Section 1.2: Selecting Regions of Interest#

The global ERA5 data over the entire time range is so large that even just one variable would be too large to store on your computer. Here you will apply a method to load a region (i.e., a spatial subset) of the data. In this first example, you will load air surface temperature at 2 meters data for a small region in the Northeastern United States. In later tutorials you will have the opportunity to select a region of your choice and to explore other climate variables.

# the order of the lat lon range has to follow the convention of the data set.
# for this dataset, longitude ranges from 0 to 360 degrees and
# latitude ranges from 90 degrees Northto 90 degrees South .

# northeastern United States
lat_range = [55.2, 30.2]  # from north to south
lon_range = [270, 295]  # from west to east
# note this can take several minutes to download
selected_vars = [
    "air_temperature_at_2_metres",
    "northward_wind_at_10_metres",
    "eastward_wind_at_10_metres",
    "surface_air_pressure",
    "sea_surface_temperature",
]  # the variables we want
s3_data_ptrn = (
    "{year}/{month}/data/{var}.nc"  # path and filename format for this S3 bucket
)
year_s3 = date_sel.strftime("%Y")  # extract the year
month_s3 = date_sel.strftime("%m")  # extract the month
ERA5_select = []  # initialize the dataset, will save a complicated check later
for var in selected_vars:  # this will download 5 files of 500Mb each.
    s3_data_key = s3_data_ptrn.format(
        year=year_s3, month=month_s3, var=var
    )  # variable specific  key
    print("Downloading %s from S3..." % s3_data_key)
    filepath = pooch_load(
        filelocation="http://s3.amazonaws.com/" + era5_bucket + "/" + s3_data_key,
        filename=s3_data_key,
    )  # open the file
    ds_temp = xr.open_dataset(filepath)  # retrieve the variable from the bucket
    if (
        ERA5_select
    ):  # check if the dataset is empty or not (first iteration of the loop)
        ERA5_select = xr.merge(
            [ERA5_select, ds_temp]
        )  # if not empty, merge the new file
    else:
        ERA5_select = ds_temp  # if empty, just assign the new file

ERA5_allvars = ERA5_select.sel(lon=slice(lon_range[0], lon_range[1])).sel(
    lat=slice(lat_range[0], lat_range[1])
)
ERA5_allvars
Downloading data from 'http://s3.amazonaws.com/era5-pds/2018/03/data/air_temperature_at_2_metres.nc' to file '/tmp/2018/03/data/air_temperature_at_2_metres.nc'.
Downloading 2018/03/data/air_temperature_at_2_metres.nc from S3...
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
Cell In[8], line 20
     16 s3_data_key = s3_data_ptrn.format(
     17     year=year_s3, month=month_s3, var=var
     18 )  # variable specific  key
     19 print("Downloading %s from S3..." % s3_data_key)
---> 20 filepath = pooch_load(
     21     filelocation="http://s3.amazonaws.com/" + era5_bucket + "/" + s3_data_key,
     22     filename=s3_data_key,
     23 )  # open the file
     24 ds_temp = xr.open_dataset(filepath)  # retrieve the variable from the bucket
     25 if (
     26     ERA5_select
     27 ):  # check if the dataset is empty or not (first iteration of the loop)

Cell In[4], line 10, in pooch_load(filelocation, filename, processor)
      8     file = os.path.join(shared_location, filename)
      9 else:
---> 10     file = pooch.retrieve(
     11         filelocation,
     12         known_hash=None,
     13         fname=os.path.join(user_temp_cache, filename),
     14         processor=processor,
     15     )
     17 return file

File ~/miniconda3/envs/climatematch/lib/python3.10/site-packages/pooch/core.py:239, in retrieve(url, known_hash, fname, path, processor, downloader, progressbar)
    236 if downloader is None:
    237     downloader = choose_downloader(url, progressbar=progressbar)
--> 239 stream_download(url, full_path, known_hash, downloader, pooch=None)
    241 if known_hash is None:
    242     get_logger().info(
    243         "SHA256 hash of downloaded file: %s\n"
    244         "Use this value as the 'known_hash' argument of 'pooch.retrieve'"
   (...)
    247         file_hash(str(full_path)),
    248     )

File ~/miniconda3/envs/climatematch/lib/python3.10/site-packages/pooch/core.py:803, in stream_download(url, fname, known_hash, downloader, pooch, retry_if_failed)
    799 try:
    800     # Stream the file to a temporary so that we can safely check its
    801     # hash before overwriting the original.
    802     with temporary_file(path=str(fname.parent)) as tmp:
--> 803         downloader(url, tmp, pooch)
    804         hash_matches(tmp, known_hash, strict=True, source=str(fname.name))
    805         shutil.move(tmp, str(fname))

File ~/miniconda3/envs/climatematch/lib/python3.10/site-packages/pooch/downloaders.py:226, in HTTPDownloader.__call__(self, url, output_file, pooch, check_only)
    224     progress = self.progressbar
    225     progress.total = total
--> 226 for chunk in content:
    227     if chunk:
    228         output_file.write(chunk)

File ~/miniconda3/envs/climatematch/lib/python3.10/site-packages/requests/models.py:816, in Response.iter_content.<locals>.generate()
    814 if hasattr(self.raw, "stream"):
    815     try:
--> 816         yield from self.raw.stream(chunk_size, decode_content=True)
    817     except ProtocolError as e:
    818         raise ChunkedEncodingError(e)

File ~/miniconda3/envs/climatematch/lib/python3.10/site-packages/urllib3/response.py:628, in HTTPResponse.stream(self, amt, decode_content)
    626 else:
    627     while not is_fp_closed(self._fp):
--> 628         data = self.read(amt=amt, decode_content=decode_content)
    630         if data:
    631             yield data

File ~/miniconda3/envs/climatematch/lib/python3.10/site-packages/urllib3/response.py:567, in HTTPResponse.read(self, amt, decode_content, cache_content)
    564 fp_closed = getattr(self._fp, "closed", False)
    566 with self._error_catcher():
--> 567     data = self._fp_read(amt) if not fp_closed else b""
    568     if amt is None:
    569         flush_decoder = True

File ~/miniconda3/envs/climatematch/lib/python3.10/site-packages/urllib3/response.py:533, in HTTPResponse._fp_read(self, amt)
    530     return buffer.getvalue()
    531 else:
    532     # StringIO doesn't like amt=None
--> 533     return self._fp.read(amt) if amt is not None else self._fp.read()

File ~/miniconda3/envs/climatematch/lib/python3.10/http/client.py:466, in HTTPResponse.read(self, amt)
    463 if self.length is not None and amt > self.length:
    464     # clip the read to the "end of response"
    465     amt = self.length
--> 466 s = self.fp.read(amt)
    467 if not s and amt:
    468     # Ideally, we would raise IncompleteRead if the content-length
    469     # wasn't satisfied, but it might break compatibility.
    470     self._close_conn()

File ~/miniconda3/envs/climatematch/lib/python3.10/socket.py:705, in SocketIO.readinto(self, b)
    703 while True:
    704     try:
--> 705         return self._sock.recv_into(b)
    706     except timeout:
    707         self._timeout_occurred = True

KeyboardInterrupt: 

The magnitude of the wind vector represents the wind speed

(1)#\[\begin{align} ||u|| = \sqrt{u^2 + v^2} \end{align}\]

which you will use later in the tutorial and discuss in more detail in tutorial 4. We will calculate that here and add it to our dataset.

# compute ten meter wind speed
ERA5_allvars["wind_speed"] = np.sqrt(
    ERA5_allvars["northward_wind_at_10_metres"] ** 2
    + ERA5_allvars["eastward_wind_at_10_metres"] ** 2
)  # calculate the wind speed from the vectors
# name and units in the DataArray:
ERA5_allvars["wind_speed"].attrs[
    "long_name"
] = "10-meter wind speed"  # assigning long name to the windspeed
ERA5_allvars["wind_speed"].attrs["units"] = "m/s"  # assigning units
ERA5_allvars

Section 2: Plotting Spatial Maps of Reanalysis Data#

First, let’s plot the region’s surface temperature for the first time step of the reanalysis dataset. To do this let’s extract the air temperatre data from the dataset containing all the variables.

ds_surface_temp_2m = ERA5_allvars.air_temperature_at_2_metres

We will be plotting this a little bit differently that you have previously plotted a map (and differently to how you will plot in most tutorials) so we can look at a few times steps interactively later. To do this we are using the package geoviews.

holoviews.extension("bokeh")

dataset_plot = gvDataset(ds_surface_temp_2m.isel(time0=0))  # select the first time step

# create the image
images = dataset_plot.to(
    gvImage, ["longitude", "latitude"], ["air_temperature_at_2_metres"], "hour"
)

# aesthetics, add coastlines etc.
images.opts(
    cmap="coolwarm",
    colorbar=True,
    width=600,
    height=400,
    projection=ccrs.PlateCarree(),
    clabel="2m Air Temperature [K]",
) * gf.coastline

In the above figure, coastlines are shown as black lines. Most of the selected region is land, with some ocean (lower left) and a lake (top middle).

Next, we will examine variability at two different frequencies using interactive plots:

  1. Hourly variability

  2. Daily variability

Note that in the previous tutorial you computed the monthly variability, or climatology, but here you only have one month of data loaded (March 2018). If you are curious about longer timescales you will visit this in the next tutorial!

# interactive plot of hourly frequency of surface temperature
# this cell may take a little longer as it contains several maps in a single plotting function
ds_surface_temp_2m_hour = ds_surface_temp_2m.groupby("time0.hour").mean()
dataset_plot = gvDataset(
    ds_surface_temp_2m_hour.isel(hour=slice(0, 12))
)  # only the first 12 time steps, as it is a time consuming task
images = dataset_plot.to(
    gvImage, ["longitude", "latitude"], ["air_temperature_at_2_metres"], "hour"
)
images.opts(
    cmap="coolwarm",
    colorbar=True,
    width=600,
    height=400,
    projection=ccrs.PlateCarree(),
    clabel="2m Air Temperature [K]",
) * gf.coastline
# interactive plot of hourly frequency of surface temperature
# this cell may take a little longer as it contains several maps in a single plotting function holoviews.extension('bokeh')
ds_surface_temp_2m_day = ds_surface_temp_2m.groupby("time0.day").mean()
dataset_plot = gvDataset(
    ds_surface_temp_2m_day.isel(day=slice(0, 10))
)  # only the first 10 time steps, as it is a time consuming task
images = dataset_plot.to(
    gvImage, ["longitude", "latitude"], ["air_temperature_at_2_metres"], "day"
)
images.opts(
    cmap="coolwarm",
    colorbar=True,
    width=600,
    height=400,
    projection=ccrs.PlateCarree(),
    clabel="Air Temperature [K]",
) * gf.coastline

Question 2#

  1. What differences do you notice between the hourly and daily interactive plots, and are there any interesting spatial patterns of these temperature changes?

# to_remove explanation

"""
1. On hourly timescales, the largest changes are over land because it responds faster than the ocean to the diurnal cycle of solar radiation. This is because the ocean has a higher heat capacity than the land surface. On daily timescales, the surface atmospheric temperature shows comparable variations across both the ocean and land.
""";

Section 3: Plotting Timeseries of Reanalysis Data#

Section 3.1: Surface Air Temperature Timeseries#

You have demonstrated that there are a lot of changes in surface temperature within a day and between days. It is crucial to understand this temporal variability in the data when performing climate analysis.

Rather than plotting interactive spatial maps for different timescales, in this last section you will create a timeseries of surface air temperature from the data you have already examined to look at variability on longer than daily timescales. Instead of taking the mean in time to create maps, you will now take the mean in space to create timeseries.

Note that the spatially-averaged data will now only have a time coordinate coordinate, making it a timeseries (ts).

# find weights (this is a regular grid so we can use cos(lat))
weights = np.cos(np.deg2rad(ds_surface_temp_2m.lat))
weights.name = "weights"

# take the weighted spatial mean since the latitude range of the region of interest is large
ds_surface_temp_2m_ts = ds_surface_temp_2m.weighted(weights).mean(["lon", "lat"])
ds_surface_temp_2m_ts
# plot the timeseries of surface temperature

fig, ax = plt.subplots(1, 1, figsize=(10, 3))

ax.plot(ds_surface_temp_2m_ts.time0, ds_surface_temp_2m_ts)
ax.set_ylabel("2m Air \nTemperature (K)")
ax.xaxis.set_tick_params(rotation=45)

Questions 3.1#

  1. What is the dominant source of the high frequency (short timescale) variability?

  2. What drives the lower frequency variability?

  3. Would the ENSO variablity that you computed in the previous tutorial show up here? Why or why not?

# to_remove explanation
"""
1. The high frequency variability can largely be attributed to the diurnal cycle, or the differences in solar radiation between night and day. This causes large variations in surface temperature, particularly over land and shallow water.
2. The low frequency variability can be attributed to synoptic patterns (e.g., weather) which can move cold or warm air around on timescales of days to weeks.
3. We do not have a long enough time series for ENSO to show up, but ENSO could indirectly affect this timeseries by changing weather patterns on shorter timescales.
""";

Section 3.2: Comparing Timeseries of Multiple Variables#

Below you will calculate the timeseries of the surface air temperature which we just plotted, alongside timeseries of several other ERA5 variables for the same period and region: 10-meter wind speed, atmospheric surface pressure, and sea surface temperature.

ERA5_allvars_ts = ERA5_allvars.weighted(weights).mean(["lon", "lat"])
plot_vars = [
    "air_temperature_at_2_metres",
    "wind_speed",
    "surface_air_pressure",
    "sea_surface_temperature",
]

fig, ax_list = plt.subplots(len(plot_vars), 1, figsize=(10, 13), sharex=True)

for var, ax in zip(plot_vars, ax_list):

    ax.plot(ERA5_allvars_ts.time0, ERA5_allvars_ts[var])
    ax.set_ylabel(
        ERA5_allvars[var].attrs["long_name"] + ": " + ERA5_allvars[var].attrs["units"],
        fontsize=12,
    )
    ax.xaxis.set_tick_params(rotation=45)

Questions 3.2#

Which variable shows variability that is dominated by:

  1. The diurnal cycle?

  2. The synoptic [~5 day] scale?

  3. A mix of these two timescales?

  4. Longer timescales?

# to_remove explanation
"""
1. The 2-meter temperature is dominated by the diurnal cycle.
2. The surface pressure, which is usually associated with storms, is dominated by the synoptic scale.
3. The 10-meter wind speed shows influences from both the diurnal cycle and the synoptic scale.
4. The ocean surface temperature shows some sensitivity to the diurnal cycle but is dominated by longer timescale (>weeks) variations than the atmospheric variables.
""";

Bonus Section 1: Selecting a Different Spatial Region#

Define another spatial region, such as where you live, by selecting a longitude and latitude range of of your choosing. To find the longitude and latitude coordinates of your region, you can use Google Earth view, and read the position of your cursor in the lower right corner.

Bonus Section 1.1: Note About the Geographic Coordinate System and the Coordinates Used in This Dataset#

A point on Earth is described by latitude-longitude coordinates relative to the zero-meridian line going through Greenwich in London, UK (longitude = 0 degree) and the xero-latitude line along the equator (latitude = 0 degrees). Points east of Greenwich up to the dateline on the opposite side of the globe are referenced as 0 to +180 and points to the west of Greenwich are 0 to -180. -180 and +180 refer to the same longitude, the so-called dateline in the central pacific.

However, our data is referenced in a slightly different way where longitude runs from 0 to 360 rather than -180 to +180. Longitude increases as you move east of Greenwich, until you reach Greenwich again (0 or 360 degrees), rather than stopping at the dateline.

Summary#

In this tutorial, you learned how to access and process ERA5 reanalysis data. You are able to select specific regions within the reanalysis dataset and perform operations such as taking spatial and temporal averages.

You also looked at different climate variables to distinguish idenitfy the variability present at different timescales.

Resources#

Data for this tutorial can be accessed here.