Ensemble explorer¶
In this notebook you will see how to:
compute and plot ensemble mean and spread maps
create a stamp plot
create a spaghetti plot
compute and plot ensemble probability and percentile maps
generate a CDF (Cumulative Distribution Function) plot for a given location
Components of earthkit¶
This tutorial uses the following earthkit components - click any logo to open the package documentation:
1. The input data¶
The data to work with is related to the St Jude wind storm from 2013 October. The windgust and the 850 hPa geopotential forecast were retrieved from MARS in GRIB format for a few steps on a low resolution grid to provide input for the exercises. For convenience, this data is available as an earthkit-data “sample”.
The data is loaded into 2 GRIB fieldlists.
[1]:
import earthkit.data as ekd
ds_fc = ekd.from_source("sample", "fc_storm_st_jude.grib") # hi-res forecast
ds_en = ekd.from_source("sample", "ens_storm_st_jude.grib") # ensemble forecast
Let’s start with a FieldList representation to look at the data
[2]:
fl_fc = ds_fc.to_fieldlist()
fl_en = ds_en.to_fieldlist()
You can get an overview of what the fieldlists contain using ls(). Notice the dataType for the ensemble data: “cf” means control forecast, “pf” means perturbed member forecast.
[3]:
fl_fc.ls(extra_keys=["metadata.marsType"])
[3]:
| parameter.variable | time.valid_datetime | time.base_datetime | time.step | vertical.level | vertical.level_type | ensemble.member | geography.grid_type | metadata.marsType | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10fg3 | 2013-10-28 00:00:00 | 2013-10-25 | 3 days 00:00:00 | 0 | surface | 0 | regular_ll | fc |
| 1 | 10fg3 | 2013-10-28 06:00:00 | 2013-10-25 | 3 days 06:00:00 | 0 | surface | 0 | regular_ll | fc |
| 2 | 10fg3 | 2013-10-28 12:00:00 | 2013-10-25 | 3 days 12:00:00 | 0 | surface | 0 | regular_ll | fc |
| 3 | z | 2013-10-28 00:00:00 | 2013-10-25 | 3 days 00:00:00 | 850 | pressure | 0 | regular_ll | fc |
| 4 | z | 2013-10-28 06:00:00 | 2013-10-25 | 3 days 06:00:00 | 850 | pressure | 0 | regular_ll | fc |
| 5 | z | 2013-10-28 12:00:00 | 2013-10-25 | 3 days 12:00:00 | 850 | pressure | 0 | regular_ll | fc |
[4]:
fl_en.ls(extra_keys=["metadata.marsType"])
[4]:
| parameter.variable | time.valid_datetime | time.base_datetime | time.step | vertical.level | vertical.level_type | ensemble.member | geography.grid_type | metadata.marsType | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10fg3 | 2013-10-28 00:00:00 | 2013-10-25 | 3 days 00:00:00 | 0 | surface | 0 | regular_ll | cf |
| 1 | 10fg3 | 2013-10-28 06:00:00 | 2013-10-25 | 3 days 06:00:00 | 0 | surface | 0 | regular_ll | cf |
| 2 | 10fg3 | 2013-10-28 12:00:00 | 2013-10-25 | 3 days 12:00:00 | 0 | surface | 0 | regular_ll | cf |
| 3 | 10fg3 | 2013-10-28 00:00:00 | 2013-10-25 | 3 days 00:00:00 | 0 | surface | 1 | regular_ll | pf |
| 4 | 10fg3 | 2013-10-28 00:00:00 | 2013-10-25 | 3 days 00:00:00 | 0 | surface | 2 | regular_ll | pf |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 301 | z | 2013-10-28 12:00:00 | 2013-10-25 | 3 days 12:00:00 | 850 | pressure | 46 | regular_ll | pf |
| 302 | z | 2013-10-28 12:00:00 | 2013-10-25 | 3 days 12:00:00 | 850 | pressure | 47 | regular_ll | pf |
| 303 | z | 2013-10-28 12:00:00 | 2013-10-25 | 3 days 12:00:00 | 850 | pressure | 48 | regular_ll | pf |
| 304 | z | 2013-10-28 12:00:00 | 2013-10-25 | 3 days 12:00:00 | 850 | pressure | 49 | regular_ll | pf |
| 305 | z | 2013-10-28 12:00:00 | 2013-10-25 | 3 days 12:00:00 | 850 | pressure | 50 | regular_ll | pf |
306 rows × 9 columns
2. Compute the ensemble mean and spread¶
Some of the exercises will require the ensemble mean and spread of the windgust. We can carry out these computations in Xarray using earthkit-transforms. Since the ensemble fieldlist forms a full hypercube you can convert it to Xarray using the default options.
[5]:
# select windgust fields and convert them to Xarray
fl_fg = fl_en.sel({"parameter.variable": "10fg3"})
fg = fl_fg.to_xarray(ensure_dims="forecast_reference_time")
fg
[5]:
<xarray.Dataset> Size: 2MB
Dimensions: (member: 51, forecast_reference_time: 1, step: 3,
latitude: 31, longitude: 48)
Coordinates:
* member (member) <U2 408B '0' '1' '10' '11' ... '7' '8' '9'
* forecast_reference_time (forecast_reference_time) datetime64[ns] 8B 2013...
* step (step) timedelta64[ns] 24B 3 days 00:00:00 ... 3...
* latitude (latitude) float64 248B 66.0 65.25 ... 44.25 43.5
* longitude (longitude) float64 384B -19.5 -18.75 ... 15.75
Data variables:
10fg3 (member, forecast_reference_time, step, latitude, longitude) float64 2MB ...
Attributes:
Conventions: CF-1.8
institution: ECMWF[6]:
import earthkit.transforms as ekt
# compute ensemble mean and spread
fg_mean = ekt.ensemble.mean(fg)
fg_spread = ekt.ensemble.std(fg)
3. Create an ensemble plot¶
This example creates a 2x2 plot for a given timestep (=78h) of the windgust forecast. The maps in the plot show the following fields:
deterministic forecast
control forecast
ensemble mean
ensemble spread
[7]:
import earthkit.plots as ekp
[8]:
import cartopy.crs as ccrs
import datetime
# define step
step_hours = 78
step = datetime.timedelta(hours=step_hours)
figure = ekp.Figure(crs=ccrs.PlateCarree(),
domain=[-15,15,65,45], size=(7, 6), rows=2, columns=2)
gust_style = ekp.styles.Style(
colors=["#85AAEE", "#208EFC", "#6CA632", "#FFB000", "#FF0000", "#7A11B1"],
levels=[12, 15, 20, 25, 30, 35, 50],
units="m s-1",
)
spread_style = ekp.styles.Style(colors="plasma")
# the hres forecast, GRIB
subplot = figure.add_map(0, 0)
subplot.contourf(fl_fc.sel({"parameter.variable": "10fg3", "time.step": step}), style=gust_style)
subplot.title("HRES {base_time:%Y-%m-%d %H} UTC (+{lead_time}h)")
subplot.legend(label="")
# the control forecast, GRIB
subplot = figure.add_map(0, 1)
subplot.contourf(fl_fg.sel({"metadata.marsType": "cf", "time.step": step}), style=gust_style)
subplot.title("CF {base_time:%Y-%m-%d %H} UTC (+{lead_time}h)")
subplot.legend(label="")
# the ensemble mean, Xarray
subplot = figure.add_map(1, 0)
subplot.contourf(fg_mean.sel(step=step), style=gust_style)
subplot.title("MEAN")
subplot.legend(label="")
# the ensemble spread, Xarray
subplot = figure.add_map(1, 1)
subplot.contourf(fg_spread.sel(step=step), style=spread_style)
subplot.title("SPREAD")
subplot.legend(label="")
figure.land()
figure.coastlines()
figure.borders()
figure.show()
/var/folders/93/w0p869rx17q98wxk83gn9ys40000gn/T/ipykernel_53965/411038823.py:8: DeprecationWarning: The 'size' argument is deprecated and will be removed in a future release. Use 'figsize' instead.
figure = ekp.Figure(crs=ccrs.PlateCarree(),
4. Creating a stamp plot¶
In a stamp plot all the ensemble members are plotted into a different map within the same plot. It provides us with a quick overview on how the forecast members differ from each other.
A stamp plot is about managing the layout. The code below creates an 8x8 grid and plots the ensmeble mean, the HRES forecast and the control forecast into the first row, and the perturbed members into the rest of the cells.
[9]:
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
# define step
step_hours = 78
step = datetime.timedelta(hours=step_hours)
figure = ekp.Figure(crs=ccrs.PlateCarree(), domain=[-15,15,65,45], figsize=(7, 7), rows=8, columns=8)
gust_style = ekp.styles.Style(
colors=["#85AAEE", "#208EFC", "#6CA632", "#FFB000", "#FF0000", "#7A11B1"],
levels=[12, 15, 20, 25, 30, 35, 50],
units="m s-1",
)
# ensemble mean, Xarray
subplot = figure.add_map(0, 5)
subplot.contourf(fg_mean.sel(step=step), style=gust_style)
subplot.title("MEAN")
# HRES forecast, GRIB
subplot = figure.add_map(0, 6)
subplot.contourf(fl_fc.sel({"parameter.variable": "10fg3", "time.step": step}), style=gust_style)
subplot.title("HRES")
# control forecast, GRIB
subplot = figure.add_map(0, 7)
subplot.contourf(fl_fg.sel({"metadata.marsType": "cf", "time.step": step}), style=gust_style)
subplot.title("CF")
# perturbed members, GRIB
for i, f in enumerate(fl_fg.sel({"metadata.marsType": "pf", "time.step": step})):
subplot = figure.add_map(1+i//8, i%8)
subplot.contourf(f, style=gust_style)
subplot.title("PF {number}")
figure.land(resolution="low")
figure.coastlines(resolution="low")
figure.borders(resolution="low")
ax = plt.axes((0.05, 0.9, 0.5, 0.01))
legends = figure.legend(ax=ax, label="")
legends[0].ax.tick_params(labelsize=8)
figure.title(
#"ECMWF Run: {base_time!1:%Y-%m-%d %H} UTC (+{lead_time!1}h)\n{variable_name!1}",
"ECMWF Run: {base_time:%Y-%m-%d %H} UTC (+{lead_time}h)\n{variable_name}",
fontsize=9, horizontalalignment="left", x=0, y=0.96,
)
figure.show()
/Users/cgr/git/earthkit-plots/src/earthkit/plots/metadata/formatters.py:582: UserWarning: No key "lead_time" found in layer metadata.
warnings.warn(f'No key "{key}" found in layer metadata.')
5. Creating a spaghetti plot¶
In a spaghetti plot we pick an isoline value and plot that isoline into the same map from all the ensemble members. It is a useful tool e.g. to investigate the uncertainty of the location of a trough associated with a storm.
This exercise shows you how to generate a spaghetti plot for the 850 hPa geopotential forecast using the 12500 m2/s2 isoline.
First, the geopotential data is extracted for the selected timestep.
[10]:
z_fc = fl_fc.sel({"parameter.variable": "z", "vertical.level": 850, "time.step": step})
z_en = fl_en.sel({"parameter.variable": "z", "vertical.level": 850, "time.step": step})
Next, the perturbed members, the control forecast and the hi-res forecast are plotted into the same map.
[11]:
chart = ekp.Map(crs=ccrs.PlateCarree(), domain=[-15, 15, 65, 45], size=(7,7))
# the isoline value
cont_level = [12500]
# perturbed members
for f in z_en.sel({"metadata.marsType": "pf"}):
chart.contour(f, levels=cont_level, linewidths=[0.2, 0.2],colors="blue", labels=False)
# control forecast
chart.contour(z_en.sel({"metadata.marsType": "cf"}), levels=cont_level, linewidths=[3, 3],colors="red", labels=False)
# hres forecasts
chart.contour(z_fc, levels=cont_level, linewidths=[3, 3],colors="orange", labels=False)
chart.land()
chart.coastlines()
chart.borders()
chart.gridlines()
chart.title(
"ECMWF Run: {base_time:%Y-%m-%d %H} UTC (+{lead_time}h) {variable_name} {level} hPa",
fontsize=9
)
chart.show()
6. Generating a probability map¶
This exercise shows you compute the probability of having a wind gust exceeding a given threshold for a given timestep.
The computations are carried out in Xarray.
[12]:
threshold = 28
prob = fl_fg.to_xarray(ensure_dims="forecast_reference_time") > threshold
prob = prob.mean(dim="member") * 100
[13]:
# define step
step=78
chart = ekp.Map(crs=ccrs.PlateCarree(), domain=[-15, 15, 65, 45], size=(7,7))
prob_style = ekp.styles.Style(
levels=[10,20,30,40,50,60,72],
)
chart.contourf(prob.sel(step=datetime.timedelta(hours=step)), style=prob_style)
chart.land()
chart.coastlines()
chart.borders()
chart.gridlines()
chart.legend()
chart.title("{variable_name} (+" +str(step) + " h) probability (> " + str(threshold) + " m/s)",
fontsize=9)
chart.show()
7. Generating a percentile map¶
Another way of looking at the probabilities is to use a percentile map, which gives you a more detailed view about the actual distribution of the ENS forecast values.
This example shows how to generate the percentile map for 80%. The value in each gridpoint will be the windgust value below which 80% of the ENS members fall.
The computations are carried out in Xarray.
[14]:
perc = 0.8 # 80%
perc_xr = fl_fg.to_xarray(ensure_dims="forecast_reference_time").quantile(perc, dim="member")
[15]:
# define step
step=78
chart = ekp.Map(crs=ccrs.PlateCarree(), domain=[-15, 15, 65, 45], size=(7,7))
chart.contourf(perc_xr.sel(step=datetime.timedelta(hours=step)), style=gust_style)
chart.land()
chart.coastlines()
chart.borders()
chart.gridlines()
chart.legend()
chart.title("{variable_name} (+" +str(step) + " h) percentile " + str(perc*100) + " %",
fontsize=9)
chart.show()
8. Creating a CDF plot¶
CDF (Cumulative Distribution Function) curves can be used to study the forecast probabilities at a given location in detail. The CDF curve constructed form ENS data tells us the probability that the forecast will be less than or equal to a given value.
In this example the CDF plot is built for the location of Reading for 3 consecutive time steps (72, 78 and 84 h). The index of the grid point to extract from each field is determined by using earthkit.geo.distance.nearest_point_haversine().
[16]:
import earthkit.geo as ekg
[17]:
import matplotlib.pyplot as plt
import numpy as np
step_hours = 78
step = datetime.timedelta(hours=step_hours)
p_ref = (51.45, -0.97)
f = fl_fg[0]
latlon = fl_fg[0].geography.latlons()
lat = latlon[0]
lon = latlon[1]
idx, dist = ekg.distance.nearest_point_haversine(p_ref, (lat, lon))
idx
colours = {72: "blue", 78: "orange", 84: "gold"}
lines = []
for step in [72, 78, 84]:
x = fl_fg.sel({"time.step": step}).to_numpy(index=idx, flatten=True)
# form cdf
y = np.arange(0, 101)
x = np.percentile(x, y)
# make line plot object
line, = plt.plot(x, y, label=f"step={step}h", c=colours[step])
lines.append(line)
plt.legend(handles=lines, loc='lower right')
plt.xlabel('Wind gust (m/s)')
plt.ylabel('Percentage (%)')
plt.show()
[ ]: