Ensemble explorer

In this notebook you will see how to:

  • compute and plot ensemble mean and spread maps

  • create a stamp plot

  • create a spaghetti plot

  • compute and plot ensemble probability and percentile maps

  • generate a CDF (Cumulative Distribution Function) plot for a given location

Components of earthkit

This tutorial uses the following earthkit components - click any logo to open the package documentation:

earthkit-data earthkit-plots earthkit-transforms earthkit-geo

1. The input data

The data to work with is related to the St Jude wind storm from 2013 October. The windgust and the 850 hPa geopotential forecast were retrieved from MARS in GRIB format for a few steps on a low resolution grid to provide input for the exercises. For convenience, this data is available as an earthkit-data “sample”.

The data is loaded into 2 GRIB fieldlists.

[1]:
import earthkit.data as ekd

ds_fc = ekd.from_source("sample", "fc_storm_st_jude.grib") # hi-res forecast
ds_en = ekd.from_source("sample", "ens_storm_st_jude.grib") # ensemble forecast

Let’s start with a FieldList representation to look at the data

[2]:
fl_fc = ds_fc.to_fieldlist()
fl_en = ds_en.to_fieldlist()

You can get an overview of what the fieldlists contain using ls(). Notice the dataType for the ensemble data: “cf” means control forecast, “pf” means perturbed member forecast.

[3]:
fl_fc.ls(extra_keys=["metadata.marsType"])
[3]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type metadata.marsType
0 10fg3 2013-10-28 00:00:00 2013-10-25 3 days 00:00:00 0 surface 0 regular_ll fc
1 10fg3 2013-10-28 06:00:00 2013-10-25 3 days 06:00:00 0 surface 0 regular_ll fc
2 10fg3 2013-10-28 12:00:00 2013-10-25 3 days 12:00:00 0 surface 0 regular_ll fc
3 z 2013-10-28 00:00:00 2013-10-25 3 days 00:00:00 850 pressure 0 regular_ll fc
4 z 2013-10-28 06:00:00 2013-10-25 3 days 06:00:00 850 pressure 0 regular_ll fc
5 z 2013-10-28 12:00:00 2013-10-25 3 days 12:00:00 850 pressure 0 regular_ll fc
[4]:
fl_en.ls(extra_keys=["metadata.marsType"])
[4]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type metadata.marsType
0 10fg3 2013-10-28 00:00:00 2013-10-25 3 days 00:00:00 0 surface 0 regular_ll cf
1 10fg3 2013-10-28 06:00:00 2013-10-25 3 days 06:00:00 0 surface 0 regular_ll cf
2 10fg3 2013-10-28 12:00:00 2013-10-25 3 days 12:00:00 0 surface 0 regular_ll cf
3 10fg3 2013-10-28 00:00:00 2013-10-25 3 days 00:00:00 0 surface 1 regular_ll pf
4 10fg3 2013-10-28 00:00:00 2013-10-25 3 days 00:00:00 0 surface 2 regular_ll pf
... ... ... ... ... ... ... ... ... ...
301 z 2013-10-28 12:00:00 2013-10-25 3 days 12:00:00 850 pressure 46 regular_ll pf
302 z 2013-10-28 12:00:00 2013-10-25 3 days 12:00:00 850 pressure 47 regular_ll pf
303 z 2013-10-28 12:00:00 2013-10-25 3 days 12:00:00 850 pressure 48 regular_ll pf
304 z 2013-10-28 12:00:00 2013-10-25 3 days 12:00:00 850 pressure 49 regular_ll pf
305 z 2013-10-28 12:00:00 2013-10-25 3 days 12:00:00 850 pressure 50 regular_ll pf

306 rows × 9 columns

2. Compute the ensemble mean and spread

Some of the exercises will require the ensemble mean and spread of the windgust. We can carry out these computations in Xarray using earthkit-transforms. Since the ensemble fieldlist forms a full hypercube you can convert it to Xarray using the default options.

[5]:
# select windgust fields and convert them to Xarray
fl_fg = fl_en.sel({"parameter.variable": "10fg3"})
fg = fl_fg.to_xarray(ensure_dims="forecast_reference_time")

fg
[5]:
<xarray.Dataset> Size: 2MB
Dimensions:                  (member: 51, forecast_reference_time: 1, step: 3,
                              latitude: 31, longitude: 48)
Coordinates:
  * member                   (member) <U2 408B '0' '1' '10' '11' ... '7' '8' '9'
  * forecast_reference_time  (forecast_reference_time) datetime64[ns] 8B 2013...
  * step                     (step) timedelta64[ns] 24B 3 days 00:00:00 ... 3...
  * latitude                 (latitude) float64 248B 66.0 65.25 ... 44.25 43.5
  * longitude                (longitude) float64 384B -19.5 -18.75 ... 15.75
Data variables:
    10fg3                    (member, forecast_reference_time, step, latitude, longitude) float64 2MB ...
Attributes:
    Conventions:  CF-1.8
    institution:  ECMWF
[6]:
import earthkit.transforms as ekt

# compute ensemble mean and spread
fg_mean = ekt.ensemble.mean(fg)
fg_spread = ekt.ensemble.std(fg)

3. Create an ensemble plot

This example creates a 2x2 plot for a given timestep (=78h) of the windgust forecast. The maps in the plot show the following fields:

  • deterministic forecast

  • control forecast

  • ensemble mean

  • ensemble spread

[7]:
import earthkit.plots as ekp
[8]:
import cartopy.crs as ccrs
import datetime

# define step
step_hours = 78
step = datetime.timedelta(hours=step_hours)

figure = ekp.Figure(crs=ccrs.PlateCarree(),
                               domain=[-15,15,65,45], size=(7, 6), rows=2, columns=2)

gust_style = ekp.styles.Style(
    colors=["#85AAEE", "#208EFC", "#6CA632", "#FFB000", "#FF0000", "#7A11B1"],
    levels=[12, 15, 20, 25, 30, 35, 50],
    units="m s-1",
)

spread_style = ekp.styles.Style(colors="plasma")

# the hres forecast, GRIB
subplot = figure.add_map(0, 0)
subplot.contourf(fl_fc.sel({"parameter.variable": "10fg3", "time.step": step}), style=gust_style)
subplot.title("HRES {base_time:%Y-%m-%d %H} UTC (+{lead_time}h)")
subplot.legend(label="")

# the control forecast, GRIB
subplot = figure.add_map(0, 1)
subplot.contourf(fl_fg.sel({"metadata.marsType": "cf", "time.step": step}), style=gust_style)
subplot.title("CF {base_time:%Y-%m-%d %H} UTC (+{lead_time}h)")
subplot.legend(label="")

# the ensemble mean, Xarray
subplot = figure.add_map(1, 0)
subplot.contourf(fg_mean.sel(step=step),  style=gust_style)
subplot.title("MEAN")
subplot.legend(label="")

# the ensemble spread, Xarray
subplot = figure.add_map(1, 1)
subplot.contourf(fg_spread.sel(step=step), style=spread_style)
subplot.title("SPREAD")
subplot.legend(label="")

figure.land()
figure.coastlines()
figure.borders()

figure.show()
/var/folders/93/w0p869rx17q98wxk83gn9ys40000gn/T/ipykernel_53965/411038823.py:8: DeprecationWarning: The 'size' argument is deprecated and will be removed in a future release. Use 'figsize' instead.
  figure = ekp.Figure(crs=ccrs.PlateCarree(),
../_images/tutorials_ensemble_17_1.png

4. Creating a stamp plot

In a stamp plot all the ensemble members are plotted into a different map within the same plot. It provides us with a quick overview on how the forecast members differ from each other.

A stamp plot is about managing the layout. The code below creates an 8x8 grid and plots the ensmeble mean, the HRES forecast and the control forecast into the first row, and the perturbed members into the rest of the cells.

[9]:
import matplotlib.pyplot as plt
import cartopy.crs as ccrs

# define step
step_hours = 78
step = datetime.timedelta(hours=step_hours)

figure = ekp.Figure(crs=ccrs.PlateCarree(), domain=[-15,15,65,45], figsize=(7, 7), rows=8, columns=8)

gust_style = ekp.styles.Style(
    colors=["#85AAEE", "#208EFC", "#6CA632", "#FFB000", "#FF0000", "#7A11B1"],
    levels=[12, 15, 20, 25, 30, 35, 50],
    units="m s-1",
)

# ensemble mean, Xarray
subplot = figure.add_map(0, 5)
subplot.contourf(fg_mean.sel(step=step), style=gust_style)
subplot.title("MEAN")

# HRES forecast, GRIB
subplot = figure.add_map(0, 6)
subplot.contourf(fl_fc.sel({"parameter.variable": "10fg3", "time.step": step}), style=gust_style)
subplot.title("HRES")

# control forecast, GRIB
subplot = figure.add_map(0, 7)
subplot.contourf(fl_fg.sel({"metadata.marsType": "cf", "time.step": step}), style=gust_style)
subplot.title("CF")

# perturbed members, GRIB
for i, f in enumerate(fl_fg.sel({"metadata.marsType": "pf", "time.step": step})):
    subplot = figure.add_map(1+i//8, i%8)
    subplot.contourf(f, style=gust_style)
    subplot.title("PF {number}")

figure.land(resolution="low")
figure.coastlines(resolution="low")
figure.borders(resolution="low")

ax = plt.axes((0.05, 0.9, 0.5, 0.01))
legends = figure.legend(ax=ax, label="")
legends[0].ax.tick_params(labelsize=8)

figure.title(
    #"ECMWF Run: {base_time!1:%Y-%m-%d %H} UTC (+{lead_time!1}h)\n{variable_name!1}",
    "ECMWF Run: {base_time:%Y-%m-%d %H} UTC (+{lead_time}h)\n{variable_name}",
    fontsize=9, horizontalalignment="left", x=0, y=0.96,
)

figure.show()
/Users/cgr/git/earthkit-plots/src/earthkit/plots/metadata/formatters.py:582: UserWarning: No key "lead_time" found in layer metadata.
  warnings.warn(f'No key "{key}" found in layer metadata.')
../_images/tutorials_ensemble_20_1.png

5. Creating a spaghetti plot

In a spaghetti plot we pick an isoline value and plot that isoline into the same map from all the ensemble members. It is a useful tool e.g. to investigate the uncertainty of the location of a trough associated with a storm.

This exercise shows you how to generate a spaghetti plot for the 850 hPa geopotential forecast using the 12500 m2/s2 isoline.

First, the geopotential data is extracted for the selected timestep.

[10]:
z_fc = fl_fc.sel({"parameter.variable": "z", "vertical.level": 850, "time.step": step})
z_en = fl_en.sel({"parameter.variable": "z", "vertical.level": 850, "time.step": step})

Next, the perturbed members, the control forecast and the hi-res forecast are plotted into the same map.

[11]:
chart = ekp.Map(crs=ccrs.PlateCarree(), domain=[-15, 15, 65, 45], size=(7,7))

# the isoline value
cont_level = [12500]

# perturbed members
for f in z_en.sel({"metadata.marsType": "pf"}):
    chart.contour(f, levels=cont_level, linewidths=[0.2, 0.2],colors="blue", labels=False)

# control forecast
chart.contour(z_en.sel({"metadata.marsType": "cf"}), levels=cont_level, linewidths=[3, 3],colors="red", labels=False)

# hres forecasts
chart.contour(z_fc, levels=cont_level, linewidths=[3, 3],colors="orange", labels=False)

chart.land()
chart.coastlines()
chart.borders()
chart.gridlines()

chart.title(
    "ECMWF Run: {base_time:%Y-%m-%d %H} UTC (+{lead_time}h) {variable_name} {level} hPa",
    fontsize=9
)

chart.show()
../_images/tutorials_ensemble_25_0.png

6. Generating a probability map

This exercise shows you compute the probability of having a wind gust exceeding a given threshold for a given timestep.

The computations are carried out in Xarray.

[12]:
threshold = 28
prob = fl_fg.to_xarray(ensure_dims="forecast_reference_time") > threshold
prob = prob.mean(dim="member") * 100
[13]:
# define step
step=78

chart = ekp.Map(crs=ccrs.PlateCarree(), domain=[-15, 15, 65, 45], size=(7,7))

prob_style = ekp.styles.Style(
    levels=[10,20,30,40,50,60,72],
)

chart.contourf(prob.sel(step=datetime.timedelta(hours=step)), style=prob_style)
chart.land()
chart.coastlines()
chart.borders()
chart.gridlines()
chart.legend()
chart.title("{variable_name} (+" +str(step) + " h) probability (> " + str(threshold) + " m/s)",
            fontsize=9)

chart.show()
../_images/tutorials_ensemble_30_0.png

7. Generating a percentile map

Another way of looking at the probabilities is to use a percentile map, which gives you a more detailed view about the actual distribution of the ENS forecast values.

This example shows how to generate the percentile map for 80%. The value in each gridpoint will be the windgust value below which 80% of the ENS members fall.

The computations are carried out in Xarray.

[14]:
perc = 0.8 # 80%
perc_xr = fl_fg.to_xarray(ensure_dims="forecast_reference_time").quantile(perc, dim="member")
[15]:
# define step
step=78

chart = ekp.Map(crs=ccrs.PlateCarree(), domain=[-15, 15, 65, 45], size=(7,7))

chart.contourf(perc_xr.sel(step=datetime.timedelta(hours=step)), style=gust_style)
chart.land()
chart.coastlines()
chart.borders()
chart.gridlines()
chart.legend()
chart.title("{variable_name} (+" +str(step) + " h) percentile " + str(perc*100) + " %",
            fontsize=9)

chart.show()
../_images/tutorials_ensemble_35_0.png

8. Creating a CDF plot

CDF (Cumulative Distribution Function) curves can be used to study the forecast probabilities at a given location in detail. The CDF curve constructed form ENS data tells us the probability that the forecast will be less than or equal to a given value.

In this example the CDF plot is built for the location of Reading for 3 consecutive time steps (72, 78 and 84 h). The index of the grid point to extract from each field is determined by using earthkit.geo.distance.nearest_point_haversine().

[16]:
import earthkit.geo as ekg
[17]:
import matplotlib.pyplot as plt
import numpy as np

step_hours = 78
step = datetime.timedelta(hours=step_hours)

p_ref = (51.45, -0.97)
f = fl_fg[0]
latlon = fl_fg[0].geography.latlons()

lat = latlon[0]
lon = latlon[1]

idx, dist = ekg.distance.nearest_point_haversine(p_ref, (lat, lon))
idx

colours = {72: "blue", 78: "orange", 84: "gold"}
lines = []
for step in [72, 78, 84]:
    x = fl_fg.sel({"time.step": step}).to_numpy(index=idx, flatten=True)

    # form cdf
    y = np.arange(0, 101)
    x = np.percentile(x, y)

    # make line plot object
    line, = plt.plot(x, y, label=f"step={step}h", c=colours[step])
    lines.append(line)

plt.legend(handles=lines, loc='lower right')
plt.xlabel('Wind gust (m/s)')
plt.ylabel('Percentage (%)')
plt.show()
../_images/tutorials_ensemble_39_0.png
[ ]: