Data from different sources

In this notebook, we will compare global temperature anomalies from two datasets:

  • ERA5 reanalysis from ECMWF, available through the Climate Data Store.

  • HADCRUT5, a dataset of global historical surface temperature anomalies from the UK Met Office, available on their website as a direct download.

We will then compare the anomalies from each of these datasets for July 2024 - one of the hottest months on record globally.

The bigger picture

This notebook demonstrates that we can get data from:

  • Two different institutions (ECMWF and the UK Met Office)

  • From two different source (CDS and URL)

  • In two different formats (GRIB and netCDF)

  • And that the earthkit ecosystem will treat them as “equal citizens”, with earthkit tools working with the same API in both cases.

Components of earthkit

This tutorial uses the following earthkit components - click any logo to open the package documentation:

earthkit-data earthkit-transforms earthkit-plots

We first import all of the earthkit packages required for the tutorial.

[1]:
import earthkit.data as ekd
import earthkit.transforms as ekt
import earthkit.plots as ekp

1. Getting the data

Let’s do a comparison of the temperature anomalies for July 2024 (one of the hottest months on record globally) from ERA5 and HADCRUT5.

1.1 ERA5

In order to get temperature anomalies for July 2024 from ERA5, we need to acces the ERA5 monthly averaged reanalysis dataset from the CDS.

In order to access ERA5 renalysis data, you will need an account on the Copernicus Climate Data Store (CDS). If you do not have an account, please visit the CDS website and register for an account. Then, follow these instructions (step 1 only) to set up your API key.

HADCRUT5 data is an anomaly against the 1961-1991 average, so we need to retrieve:

  • Data for every July from 1961-1991. This is our reference period.

  • Data for July 2024, to calculate the anomaly.

[2]:
era5_reference_data = ekd.from_source(
    "cds", "reanalysis-era5-single-levels-monthly-means",
    {
        "product_type": "monthly_averaged_reanalysis",
        "variable": "2m_temperature",
        "year": list(range(1961, 1991)),
        "month": "07",
        "time": "00:00",
    },
)

era5_2024_data = ekd.from_source(
    "cds", "reanalysis-era5-single-levels-monthly-means",
    {
        "product_type": "monthly_averaged_reanalysis",
        "variable": "2m_temperature",
        "year": 2024,
        "month": "07",
        "time": "00:00",
    },
)
2026-04-15 17:57:47,764 INFO Request ID is e06287ea-b2db-4860-be5b-0578e2ecc096
2026-04-15 17:57:47,811 INFO status has been updated to accepted
2026-04-15 17:58:01,276 INFO status has been updated to running
2026-04-15 17:58:20,369 INFO status has been updated to successful
2026-04-15 17:58:22,632 INFO Request ID is a985f61a-b5f0-46b0-baaf-33e106636f46
2026-04-15 17:58:22,690 INFO status has been updated to accepted
2026-04-15 17:58:43,808 INFO status has been updated to successful

1.2 HADCRUT5

The HADCRUT5 dataset is available on the Met Office’s dedicated website. We can access the gridded monthly dataset directly from a URL.

[3]:
HADCRUT5_URL = "https://www.metoffice.gov.uk/hadobs/hadcrut5/data/HadCRUT.5.0.2.0/analysis/HadCRUT.5.0.2.0.analysis.anomalies.ensemble_mean.nc"
hadcrut5_data = ekd.from_source("url", HADCRUT5_URL)

2. Data analysis

2.1 ERA5

Now we need to do some analysis with earthkit-transforms to calculate the ERA5 temperature anomalies.

First, let’s convert the GRIB data that we retrieved from the CDS to xarray.

[4]:
reference = era5_reference_data.to_xarray()
july_2024 = era5_2024_data.to_xarray(ensure_dims="forecast_reference_time")

Now let’s calculate a climatology from the reference data, and use it to calculate an anomaly for July 2024.

[5]:
climatology = ekt.climatology.mean(reference, frequency="month")
climatology
[5]:
<xarray.Dataset> Size: 8MB
Dimensions:    (month: 1, latitude: 721, longitude: 1440)
Coordinates:
  * latitude   (latitude) float64 6kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
  * longitude  (longitude) float64 12kB 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
  * month      (month) int64 8B 7
Data variables:
    2t         (month, latitude, longitude) float64 8MB 274.0 274.0 ... 218.5
Attributes:
    Conventions:  CF-1.8
    institution:  ECMWF

We calculate anomalies by finding the difference between some data at a point in time against the long-term average. Our climatology is our long-term average, and earthkit-transforms provides an anomaly method for conveniently calculating the difference between the two.

[6]:
era5_july_anomaly = ekt.climatology.anomaly(july_2024, climatology)

2.2 HADCRUT5

As for HADCRUT5, this data is already anomalies - so we just need to extract July 2024.

[7]:
hadcrut5 = hadcrut5_data.to_xarray()
hadcrut5
[7]:
<xarray.Dataset> Size: 44MB
Dimensions:           (time: 2102, latitude: 36, longitude: 72, bnds: 2)
Coordinates:
  * time              (time) datetime64[ns] 17kB 1850-01-16T12:00:00 ... 2025...
  * latitude          (latitude) float64 288B -87.5 -82.5 -77.5 ... 82.5 87.5
  * longitude         (longitude) float64 576B -177.5 -172.5 ... 172.5 177.5
    realization       int64 8B ...
Dimensions without coordinates: bnds
Data variables:
    tas_mean          (time, latitude, longitude) float64 44MB ...
    time_bnds         (time, bnds) datetime64[ns] 34kB ...
    latitude_bnds     (latitude, bnds) float64 576B ...
    longitude_bnds    (longitude, bnds) float64 1kB ...
    realization_bnds  (bnds) int64 16B ...
Attributes:
    comment:      2m air temperature over land blended with sea water tempera...
    history:      Data set built at: 2025-04-30T17:33:58+00:00
    institution:  Met Office Hadley Centre / Climatic Research Unit, Universi...
    licence:      HadCRUT5 is licensed under the Open Government Licence v3.0...
    reference:    C. P. Morice, J. J. Kennedy, N. A. Rayner, J. P. Winn, E. H...
    source:       CRUTEM.5.0.2.0 HadSST.4.0.1.0
    title:        HadCRUT.5.0.2.0 blended land air temperature and sea-surfac...
    version:      HadCRUT.5.0.2.0
    Conventions:  CF-1.7
[8]:
hadcrut5_july_anomaly = hadcrut5.tas_mean.sel(time="2024-07-16")
hadcrut5_july_anomaly
[8]:
<xarray.DataArray 'tas_mean' (time: 1, latitude: 36, longitude: 72)> Size: 21kB
[2592 values with dtype=float64]
Coordinates:
  * time         (time) datetime64[ns] 8B 2024-07-16T12:00:00
  * latitude     (latitude) float64 288B -87.5 -82.5 -77.5 ... 77.5 82.5 87.5
  * longitude    (longitude) float64 576B -177.5 -172.5 -167.5 ... 172.5 177.5
    realization  int64 8B ...
Attributes:
    long_name:     blended air_temperature_anomaly over land with sea_water_t...
    units:         K
    cell_methods:  area: mean (interval: 5.0 degrees_north 5.0 degrees_east) ...

3. Visualisation

We can use earthkit-plots to visualise these two datasets using the same principles.

First, let’s design a suitable style for these anomalies.

[9]:
style = ekp.styles.Contour(
    colors=[
        "#1B2C62", "#1F4182", "#2355A1", "#3978BB", "#519BD2", "#71B8E4",
        "#91D1F2", "#B0E1F8", "#CBEBF9", "#E3F4FB", "#F5FBFE", "#FEFBEA",
        "#FDF2BC", "#FCE18A", "#FDC659", "#FDA731", "#F9872D", "#F26429",
        "#E34128", "#D01F27", "#B31A21", "#921519",
    ],
    levels=range(-7, 8),
    ticks=range(-7, 8),
    extend="both",
    # the data is in Kelvin but we want to show celsius in the legend
    # if we use the `units` key it will attempt to convert units to C
    # units_label lets us just override the label
    units_label="°C",
)

Now we can plot them!

[10]:
import cartopy.crs as ccrs

figure = ekp.Figure(crs=ccrs.Robinson(), rows=1, columns=2, size=(8, 5.5))

# We can throw both datasets at the figure and it will iterate over subplots to plot them
figure.pcolormesh([hadcrut5_july_anomaly, era5_july_anomaly], style=style)

figure.coastlines(resolution="low")
figure.borders(resolution="low")

figure.legend(label="temperature anomaly ({units})")

# Add titles
figure[0].title("HADCRUT5")
figure[1].title("ERA5")
# We can use the "time" key once here as it should have the same value for ERA5 and HADCRUT5
figure.title("Global temperature anomaly during {time:%B %Y}", fontsize=15)

# Add shading to emphasise the missing data in HADCRUT5
# We can directly access the underlying matplotlib objects to do this
x = [-180, -180, 180, 180, -180]
y = [-90, 90, 90, -90, -90]
figure[0].ax.fill(x, y, transform=ccrs.PlateCarree(), hatch="///////", fill=False, zorder=0)

figure.show()
/var/folders/td/yqnxcqpx39dc855vwjtv5hj40000gn/T/ipykernel_6793/2184806048.py:3: DeprecationWarning: The 'size' argument is deprecated and will be removed in a future release. Use 'figsize' instead.
  figure = ekp.Figure(crs=ccrs.Robinson(), rows=1, columns=2, size=(8, 5.5))
<Figure size 800x550 with 0 Axes>
../_images/tutorials_comparing-models-solutions_18_2.png

4. Repeat for another month and/or year

We will need to do a new retrieval of ERA5 from the CDS. This time, we need all the Januaries from 1961-1991 as our reference period, and January 2025 for our anomaly.

[11]:
era5_reference_data = ekd.from_source(
    "cds", "reanalysis-era5-single-levels-monthly-means",
    {
        "product_type": "monthly_averaged_reanalysis",
        "variable": "2m_temperature",
        "year": list(range(1961, 1991)),
        "month": "01",
        "time": "00:00",
    },
)

era5_2025_data = ekd.from_source(
    "cds", "reanalysis-era5-single-levels-monthly-means",
    {
        "product_type": "monthly_averaged_reanalysis",
        "variable": "2m_temperature",
        "year": 2025,
        "month": "01",
        "time": "00:00",
    },
)
2026-04-15 17:59:02,175 INFO Request ID is 116415fe-020d-47db-9859-339d903c35a5
2026-04-15 17:59:02,219 INFO status has been updated to accepted
2026-04-15 17:59:23,385 INFO status has been updated to running
2026-04-15 17:59:34,902 INFO status has been updated to successful
2026-04-15 17:59:37,113 INFO Request ID is 3a223fb7-1984-4a25-a304-423c87291598
2026-04-15 17:59:37,166 INFO status has been updated to accepted
2026-04-15 17:59:58,329 INFO status has been updated to successful

Now let’s do the anomaly calculations:

[12]:
reference = era5_reference_data.to_xarray()
jan_2025 = era5_2025_data.to_xarray(ensure_dims="forecast_reference_time")

climatology = ekt.climatology.mean(reference, frequency="month")
era5_jan_anomaly = ekt.climatology.anomaly(jan_2025, climatology)

For HADCRUT5, we simply need to extract January 2025 from the dataset:

[13]:
hadcrut5_jan_anomaly = hadcrut5.tas_mean.sel(time="2025-01-16")

And finally, we can plot this data using the exact same code as before (we don’t even need to update the title, as the date is extracted from the metadata!).

[14]:
figure = ekp.Figure(crs=ccrs.Robinson(), rows=1, columns=2, size=(8, 5.5))

# We can throw both datasets at the figure and it will iterate over subplots to plot them
figure.pcolormesh([hadcrut5_jan_anomaly, era5_jan_anomaly], style=style)

figure.coastlines(resolution="low")
figure.borders(resolution="low")

figure.legend(label="temperature anomaly ({units})")

# Add titles
figure[0].title("HADCRUT5")
figure[1].title("ERA5")
# We can use the "time" key once here as it should have the same value for ERA5 and HADCRUT5
figure.title("Global temperature anomaly during {time:%B %Y}", fontsize=15)

# Add shading to emphasise the missing data in HADCRUT5
# We can directly access the underlying matplotlib objects to do this
x = [-180, -180, 180, 180, -180]
y = [-90, 90, 90, -90, -90]
figure[0].ax.fill(x, y, transform=ccrs.PlateCarree(), hatch="///////", fill=False, zorder=0)

figure.show()
/var/folders/td/yqnxcqpx39dc855vwjtv5hj40000gn/T/ipykernel_6793/3158689154.py:1: DeprecationWarning: The 'size' argument is deprecated and will be removed in a future release. Use 'figsize' instead.
  figure = ekp.Figure(crs=ccrs.Robinson(), rows=1, columns=2, size=(8, 5.5))
<Figure size 800x550 with 0 Axes>
../_images/tutorials_comparing-models-solutions_26_2.png

5. Zoomed-in version of this comparison over Europe

We can do this by simply replacing the crs argument with a domain argument.

[15]:
figure = ekp.Figure(domain="Europe", rows=1, columns=2, size=(8, 5.5))

# We can throw both datasets at the figure and it will iterate over subplots to plot them
figure.pcolormesh([hadcrut5_jan_anomaly, era5_jan_anomaly], style=style)

figure.coastlines(resolution="medium")
figure.borders(resolution="medium")

figure.legend(label="temperature anomaly ({units})")

# Add titles
figure[0].title("HADCRUT5")
figure[1].title("ERA5")
# We can use the "time" key once here as it should have the same value for ERA5 and HADCRUT5
figure.title("Global temperature anomaly during {time:%B %Y}", fontsize=15)

# Add shading to emphasise the missing data in HADCRUT5
# We can directly access the underlying matplotlib objects to do this
x = [-180, -180, 180, 180, -180]
y = [-90, 90, 90, -90, -90]
figure[0].ax.fill(x, y, transform=ccrs.PlateCarree(), hatch="///////", fill=False, zorder=0)

figure.show()
/var/folders/td/yqnxcqpx39dc855vwjtv5hj40000gn/T/ipykernel_6793/2255799421.py:1: DeprecationWarning: The 'size' argument is deprecated and will be removed in a future release. Use 'figsize' instead.
  figure = ekp.Figure(domain="Europe", rows=1, columns=2, size=(8, 5.5))
<Figure size 800x550 with 0 Axes>
../_images/tutorials_comparing-models-solutions_28_2.png