Data from different sources¶

In this notebook, we will compare global temperature anomalies from two datasets:

ERA5 reanalysis from ECMWF, available through the Climate Data Store.
HADCRUT5, a dataset of global historical surface temperature anomalies from the UK Met Office, available on their website as a direct download.

We will then compare the anomalies from each of these datasets for July 2024 - one of the hottest months on record globally.

The bigger picture¶

This notebook demonstrates that we can get data from:

Two different institutions (ECMWF and the UK Met Office)
From two different source (CDS and URL)
In two different formats (GRIB and netCDF)
And that the earthkit ecosystem will treat them as “equal citizens”, with earthkit tools working with the same API in both cases.

Components of earthkit¶

This tutorial uses the following earthkit components - click any logo to open the package documentation:

earthkit-data earthkit-transforms earthkit-plots

We first import all of the earthkit packages required for the tutorial.

[1]:

import earthkit.data as ekd
import earthkit.transforms as ekt
import earthkit.plots as ekp

1. Getting the data¶

Let’s do a comparison of the temperature anomalies for July 2024 (one of the hottest months on record globally) from ERA5 and HADCRUT5.

1.1 ERA5¶

In order to get temperature anomalies for July 2024 from ERA5, we need to acces the ERA5 monthly averaged reanalysis dataset from the CDS.

In order to access ERA5 renalysis data, you will need an account on the Copernicus Climate Data Store (CDS). If you do not have an account, please visit the CDS website and register for an account. Then, follow these instructions (step 1 only) to set up your API key.

HADCRUT5 data is an anomaly against the 1961-1991 average, so we need to retrieve:

Data for every July from 1961-1991. This is our reference period.
Data for July 2024, to calculate the anomaly.

[2]:

era5_reference_data = ekd.from_source(
    "cds",
    "reanalysis-era5-single-levels-monthly-means",
    {
        "product_type": "monthly_averaged_reanalysis",
        "variable": "2m_temperature",
        "year": list(range(1961, 1991)),
        "month": "07",
        "time": "00:00",
    },
)

era5_2024_data = ekd.from_source(
    "cds",
    "reanalysis-era5-single-levels-monthly-means",
    {
        "product_type": "monthly_averaged_reanalysis",
        "variable": "2m_temperature",
        "year": 2024,
        "month": "07",
        "time": "00:00",
    },
)

2026-04-15 17:57:47,764 INFO Request ID is e06287ea-b2db-4860-be5b-0578e2ecc096
2026-04-15 17:57:47,811 INFO status has been updated to accepted
2026-04-15 17:58:01,276 INFO status has been updated to running
2026-04-15 17:58:20,369 INFO status has been updated to successful
2026-04-15 17:58:22,632 INFO Request ID is a985f61a-b5f0-46b0-baaf-33e106636f46
2026-04-15 17:58:22,690 INFO status has been updated to accepted
2026-04-15 17:58:43,808 INFO status has been updated to successful

1.2 HADCRUT5¶

The HADCRUT5 dataset is available on the Met Office’s dedicated website. We can access the gridded monthly dataset directly from a URL.

[3]:

HADCRUT5_URL = "https://www.metoffice.gov.uk/hadobs/hadcrut5/data/HadCRUT.5.0.2.0/analysis/HadCRUT.5.0.2.0.analysis.anomalies.ensemble_mean.nc"
hadcrut5_data = ekd.from_source("url", HADCRUT5_URL)

2. Data analysis¶

3. Visualisation¶

We can use earthkit-plots to visualise these two datasets using the same principles.

First, let’s design a suitable style for these anomalies.

[9]:

style = ekp.styles.Contour(
    colors=[
        "#1B2C62",
        "#1F4182",
        "#2355A1",
        "#3978BB",
        "#519BD2",
        "#71B8E4",
        "#91D1F2",
        "#B0E1F8",
        "#CBEBF9",
        "#E3F4FB",
        "#F5FBFE",
        "#FEFBEA",
        "#FDF2BC",
        "#FCE18A",
        "#FDC659",
        "#FDA731",
        "#F9872D",
        "#F26429",
        "#E34128",
        "#D01F27",
        "#B31A21",
        "#921519",
    ],
    levels=range(-7, 8),
    ticks=range(-7, 8),
    extend="both",
    # the data is in Kelvin but we want to show celsius in the legend
    # if we use the `units` key it will attempt to convert units to C
    # units_label lets us just override the label
    units_label="°C",
)

Now we can plot them!

[10]:

import cartopy.crs as ccrs

figure = ekp.Figure(crs=ccrs.Robinson(), rows=1, columns=2, size=(8, 5.5))

# We can throw both datasets at the figure and it will iterate over subplots to plot them
figure.pcolormesh([hadcrut5_july_anomaly, era5_july_anomaly], style=style)

figure.coastlines(resolution="low")
figure.borders(resolution="low")

figure.legend(label="temperature anomaly ({units})")

# Add titles
figure[0].title("HADCRUT5")
figure[1].title("ERA5")
# We can use the "time" key once here as it should have the same value for ERA5 and HADCRUT5
figure.title("Global temperature anomaly during {time:%B %Y}", fontsize=15)

# Add shading to emphasise the missing data in HADCRUT5
# We can directly access the underlying matplotlib objects to do this
x = [-180, -180, 180, 180, -180]
y = [-90, 90, 90, -90, -90]
figure[0].ax.fill(
    x, y, transform=ccrs.PlateCarree(), hatch="///////", fill=False, zorder=0
)

figure.show()

/var/folders/td/yqnxcqpx39dc855vwjtv5hj40000gn/T/ipykernel_6793/2184806048.py:3: DeprecationWarning: The 'size' argument is deprecated and will be removed in a future release. Use 'figsize' instead.
  figure = ekp.Figure(crs=ccrs.Robinson(), rows=1, columns=2, size=(8, 5.5))

<Figure size 800x550 with 0 Axes>

../_images/tutorials_comparing-models-solutions_18_2.png

4. Repeat for another month and/or year¶

We will need to do a new retrieval of ERA5 from the CDS. This time, we need all the Januaries from 1961-1991 as our reference period, and January 2025 for our anomaly.

[11]:

era5_reference_data = ekd.from_source(
    "cds",
    "reanalysis-era5-single-levels-monthly-means",
    {
        "product_type": "monthly_averaged_reanalysis",
        "variable": "2m_temperature",
        "year": list(range(1961, 1991)),
        "month": "01",
        "time": "00:00",
    },
)

era5_2025_data = ekd.from_source(
    "cds",
    "reanalysis-era5-single-levels-monthly-means",
    {
        "product_type": "monthly_averaged_reanalysis",
        "variable": "2m_temperature",
        "year": 2025,
        "month": "01",
        "time": "00:00",
    },
)

2026-04-15 17:59:02,175 INFO Request ID is 116415fe-020d-47db-9859-339d903c35a5
2026-04-15 17:59:02,219 INFO status has been updated to accepted
2026-04-15 17:59:23,385 INFO status has been updated to running
2026-04-15 17:59:34,902 INFO status has been updated to successful
2026-04-15 17:59:37,113 INFO Request ID is 3a223fb7-1984-4a25-a304-423c87291598
2026-04-15 17:59:37,166 INFO status has been updated to accepted
2026-04-15 17:59:58,329 INFO status has been updated to successful

Now let’s do the anomaly calculations:

[12]:

reference = era5_reference_data.to_xarray()
jan_2025 = era5_2025_data.to_xarray(ensure_dims="forecast_reference_time")

climatology = ekt.climatology.mean(reference, frequency="month")
era5_jan_anomaly = ekt.climatology.anomaly(jan_2025, climatology)

For HADCRUT5, we simply need to extract January 2025 from the dataset:

[13]:

hadcrut5_jan_anomaly = hadcrut5.tas_mean.sel(time="2025-01-16")

And finally, we can plot this data using the exact same code as before (we don’t even need to update the title, as the date is extracted from the metadata!).

[14]:

figure = ekp.Figure(crs=ccrs.Robinson(), rows=1, columns=2, size=(8, 5.5))

# We can throw both datasets at the figure and it will iterate over subplots to plot them
figure.pcolormesh([hadcrut5_jan_anomaly, era5_jan_anomaly], style=style)

figure.coastlines(resolution="low")
figure.borders(resolution="low")

figure.legend(label="temperature anomaly ({units})")

# Add titles
figure[0].title("HADCRUT5")
figure[1].title("ERA5")
# We can use the "time" key once here as it should have the same value for ERA5 and HADCRUT5
figure.title("Global temperature anomaly during {time:%B %Y}", fontsize=15)

# Add shading to emphasise the missing data in HADCRUT5
# We can directly access the underlying matplotlib objects to do this
x = [-180, -180, 180, 180, -180]
y = [-90, 90, 90, -90, -90]
figure[0].ax.fill(
    x, y, transform=ccrs.PlateCarree(), hatch="///////", fill=False, zorder=0
)

figure.show()

/var/folders/td/yqnxcqpx39dc855vwjtv5hj40000gn/T/ipykernel_6793/3158689154.py:1: DeprecationWarning: The 'size' argument is deprecated and will be removed in a future release. Use 'figsize' instead.
  figure = ekp.Figure(crs=ccrs.Robinson(), rows=1, columns=2, size=(8, 5.5))

<Figure size 800x550 with 0 Axes>

../_images/tutorials_comparing-models-solutions_26_2.png

5. Zoomed-in version of this comparison over Europe¶

We can do this by simply replacing the crs argument with a domain argument.

[15]:

figure = ekp.Figure(domain="Europe", rows=1, columns=2, size=(8, 5.5))

# We can throw both datasets at the figure and it will iterate over subplots to plot them
figure.pcolormesh([hadcrut5_jan_anomaly, era5_jan_anomaly], style=style)

figure.coastlines(resolution="medium")
figure.borders(resolution="medium")

figure.legend(label="temperature anomaly ({units})")

# Add titles
figure[0].title("HADCRUT5")
figure[1].title("ERA5")
# We can use the "time" key once here as it should have the same value for ERA5 and HADCRUT5
figure.title("Global temperature anomaly during {time:%B %Y}", fontsize=15)

# Add shading to emphasise the missing data in HADCRUT5
# We can directly access the underlying matplotlib objects to do this
x = [-180, -180, 180, 180, -180]
y = [-90, 90, 90, -90, -90]
figure[0].ax.fill(
    x, y, transform=ccrs.PlateCarree(), hatch="///////", fill=False, zorder=0
)

figure.show()

/var/folders/td/yqnxcqpx39dc855vwjtv5hj40000gn/T/ipykernel_6793/2255799421.py:1: DeprecationWarning: The 'size' argument is deprecated and will be removed in a future release. Use 'figsize' instead.
  figure = ekp.Figure(domain="Europe", rows=1, columns=2, size=(8, 5.5))

<Figure size 800x550 with 0 Axes>

../_images/tutorials_comparing-models-solutions_28_2.png