GPU-accelerated computations¶
In this notebook you will see how to:
compute an upstream accumulation of precipitation along a river network
change the array backend to use gpu-accelerated array computations
change the array backend for xarray computations also
plot easily the results
Components of earthkit¶
This tutorial uses the following earthkit components - click any logo to open the package documentation:
Note: some of the examples in this notebook requires an optional dependency - the Python package cupy. Prior to running the notebook, install it following the relevant instructions here.
[1]:
import earthkit.data as ekd
import earthkit.hydro as ekh
import earthkit.plots as ekp
1. Load a large dataset¶
First, we load a precipitation dataset on the EFAS domain, and convert it directly to an xarray representation.
[2]:
ds = ekd.from_source("sample", "R06a.nc").to_xarray()
ds
[2]:
<xarray.Dataset> Size: 2GB
Dimensions: (time: 40, lat: 2970, lon: 4530)
Coordinates:
* lat (lat) float64 24kB 72.24 72.22 72.21 72.19 ... 22.79 22.77 22.76
* lon (lon) float64 36kB -25.24 -25.23 -25.21 ... 50.21 50.22 50.24
* time (time) datetime64[ns] 320B 2024-11-14T06:00:00 ... 2024-11-24
Data variables:
R06a (time, lat, lon) float32 2GB ...
Attributes:
history: Wed Mar 8 12:00:44 2023: ncks -O --mk_rec_dmn time area.templa...
NCO: netCDF Operators version 4.9.7 (Homepage = http://nco.sf.net, C...[3]:
# create a smaller dataarray for illustration purposes
da = ds["R06a"].isel(time=slice(0,10))
da
[3]:
<xarray.DataArray 'R06a' (time: 10, lat: 2970, lon: 4530)> Size: 538MB
[134541000 values with dtype=float32]
Coordinates:
* lat (lat) float64 24kB 72.24 72.22 72.21 72.19 ... 22.79 22.77 22.76
* lon (lon) float64 36kB -25.24 -25.23 -25.21 ... 50.21 50.22 50.24
* time (time) datetime64[ns] 80B 2024-11-14T06:00:00 ... 2024-11-16T12:...
Attributes:
units: mm
standard_name: precipitation
long_name: precipitation2. Load a river network¶
Computations with earthkit-hydro require a river network. We load a precomputed network directly - the EFAS version 5 river network.
[4]:
net = ekh.river_network.load("efas", "5")
River network not found in cache (/dev/shm/_tmpdir_.ecm7348.39938383/tmplzbi5hz3_earthkit_hydro/1.3_0dc8123bbf944ff1cb86f41bc7506e891baaa990666d836fc0cf2edd503916db.joblib).
River network loaded, saving to cache (/dev/shm/_tmpdir_.ecm7348.39938383/tmplzbi5hz3_earthkit_hydro/1.3_0dc8123bbf944ff1cb86f41bc7506e891baaa990666d836fc0cf2edd503916db.joblib).
3. Conduct array-level computation using numpy¶
To compute an upstream accumulation in earthkit-hydro, we simply use ekh.upstream module. To begin, we specifically use ekh.upstream.array.sum, which is the array-based upstream summation.
By default, this will use numpy, which is CPU-only. This is our baseline, and we show the timings below.
[5]:
array_np = da.values # convert to array (numpy)
%timeit ekh.upstream.array.sum(net, array_np)
3.41 s ± 40.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
4. Conduct same operation with cupy¶
We can also easily leverage GPU to speed up computations. Instead of numpy, a straighforward replacement is to use the cupy library (note - different array backends like torch or jax are also supported). The library is analagous to numpy, but GPU-accelerated.
[6]:
import cupy as cp
net_gpu = ekh.river_network.load("efas", "5").to_device(device="cuda")
array_cp = cp.array(da.values)
%timeit ekh.upstream.array.sum(net_gpu, array_cp)
Loading river network from cache (/dev/shm/_tmpdir_.ecm7348.39938383/tmp_3rj4bfp_earthkit_hydro/1.3_0dc8123bbf944ff1cb86f41bc7506e891baaa990666d836fc0cf2edd503916db.joblib).
173 ms ± 12.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
That’s over a 10x speedup!
5. Xarray computation backed by numpy¶
Instead of lower level computations on numpy arrays, we can also compute on higher level data representations such as xarray. The analagous xarray version of the previous example is as follows:
[7]:
small_da = da.isel(time=slice(0,5))
%timeit ekh.upstream.sum(net, small_da)
1.8 s ± 38.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
6. Xarray computation backed by cupy¶
Under the hood, xarray uses numpy by default. However, we can change that easily and use cupy instead to achieve a speedup.
[8]:
small_da_gpu = small_da.copy(data=cp.asarray(small_da.data))
%timeit ekh.upstream.sum(net_gpu, small_da_gpu)
172 ms ± 4.68 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
7. Plotting¶
To finish, we take a look at what we’ve been computing using earthkit-plots!
[9]:
step0 = small_da.isel(time=0)
upstream_sum = ekh.upstream.sum(net, step0)
[ ]:
style = ekp.styles.Style(
colors="Blues",
levels=[0, 0.5, 1, 2, 5, 10, 50, 100, 500, 1000, 2000, 3000, 4000],
extend="max",
)
chart = ekp.Map()
chart.quickplot(upstream_sum, style=style)
chart.legend(label="{variable_name}")
chart.title("Upstream precipitation at {time:%H:%M on %-d %B %Y}")
chart.coastlines()
chart.gridlines()
chart.show()