(gentle-intro)=
# A gentle introduction

Many, but not all, useful array methods are wrapped by Xarray and accessible
as methods on Xarray objects. For example, `DataArray.mean` calls `numpy.nanmean`.
A very common use case is applying functions that expect and return NumPy arrays
(or other array types) to Xarray objects. This includes, for example, all of SciPy's API.
Applying many of these functions to Xarray objects involves a series of repeated steps.
`apply_ufunc` provides a convenient wrapper function that generalizes the steps
involved in applying such functions to Xarray objects.

```{tip}
Xarray uses `apply_ufunc` internally to implement much of its API, meaning that it is quite powerful!
```

Our goal is to learn how `apply_ufunc` automates aspects of applying computation functions designed for pure arrays (like NumPy arrays) to Xarray objects, including:
- Propagating dimension names, coordinate variables, and (optionally) attributes.
- Handling Dataset input by looping over data variables.
- Passing arbitrary positional and keyword arguments.


```{tip}
We'll reduce the length of error messages using `%xmode minimal`. See the [IPython documentation](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-xmode) for details.
```


## Setup

In [None]:
%xmode minimal

import numpy as np
import xarray as xr

# limit the amount of information printed to screen
xr.set_options(display_expand_data=False)
np.set_printoptions(threshold=10, edgeitems=2)

Let's load a dataset

In [None]:
ds = xr.tutorial.load_dataset("air_temperature")
ds

## A simple example: pure numpy

Simple functions that act independently on each value should work without any
additional arguments. 

Consider the following `squared_error` function

In [None]:
def squared_error(x, y):
    return (x - y) ** 2

````{tip}

This function uses only arithmetic operations. For such simple functions, you can pass Xarray objects directly and receive Xarray objects back.
Try
```python
squared_error(ds.air, 1)
```

We use it here only as a very simple example.
````

We can apply `squared_error` manually by extracting the underlying numpy array

In [None]:
numpy_result = squared_error(ds.air.data, 1)
numpy_result

To convert this result to a DataArray, we could do it manually

In [None]:
xr.DataArray(
    data=numpy_result,
    # propagate all the Xarray metadata manually
    dims=ds.air.dims,
    coords=ds.air.coords,
    attrs=ds.air.attrs,
    name=ds.air.name,
)

A shorter version uses [DataArray.copy](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.copy.html)

In [None]:
ds.air.copy(data=numpy_result)

```{caution}
Using `DataArray.copy` works for such simple cases but doesn't generalize that well. 

For example, consider a function that removes one dimension and adds a new one.
```
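
To make the caution concrete, here is a minimal sketch using a small synthetic array in place of `ds.air` (the shape and values are assumptions chosen for illustration). It covers only the simpler half of the scenario above, a function that removes a dimension, which is already enough for `DataArray.copy` to reject the new data.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for ds.air (assumed shape and names, for illustration only)
da = xr.DataArray(
    np.arange(24.0).reshape(2, 3, 4),
    dims=("time", "lat", "lon"),
    name="air",
)

# Averaging over the first axis ("time") removes one dimension,
# so the result is 2D while `da` is 3D
reduced = da.data.mean(axis=0)

try:
    da.copy(data=reduced)
    copy_failed = False
except ValueError as err:
    # copy requires the new data to have the same shape as the original
    copy_failed = True
    print(f"copy failed: {err}")
```

Because `copy` reuses the original dimensions and coordinates unchanged, it can only hold results with exactly the same shape as the input.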

## apply_ufunc

`apply_ufunc` can handle more complicated functions. Here's how to use it with `squared_error`

In [None]:
xr.apply_ufunc(squared_error, ds.air, 1)

## How does apply_ufunc work?


This line
```python
xr.apply_ufunc(squared_error, ds.air, 1)
```
is equivalent to `squared_error(ds.air.data, 1)`, with automatic propagation of Xarray metadata such as dimension names and coordinate values.


To illustrate how `apply_ufunc` works, let us write a small wrapper function. This will let us examine what data is received and returned from the applied function. 

```{tip}
This trick is very useful for debugging.
```

In [None]:
def wrapper(x, y):
    print(f"received x of type {type(x)}, shape {x.shape}")
    print(f"received y of type {type(y)}")
    return squared_error(x, y)


xr.apply_ufunc(wrapper, ds.air, 1)

We see that `wrapper` receives the underlying numpy array (`ds.air.data`), and the integer `1`. 

Essentially, `apply_ufunc` does the following:
1. extracts the underlying array data (`.data`), 
2. passes it to the user function, 
3. receives the returned values, and 
4. then wraps that back up as a DataArray
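
The four steps above can be sketched by hand for the simplest case: a single DataArray input and a function that preserves shape. The helper `manual_apply` and the synthetic array are hypothetical names introduced for illustration. This is not how Xarray implements `apply_ufunc`, which handles many more cases.

```python
import numpy as np
import xarray as xr


def squared_error(x, y):
    return (x - y) ** 2


def manual_apply(func, da, *args):
    """Hypothetical helper mimicking the four steps for one DataArray input."""
    raw = da.data  # 1. extract the underlying array
    out = func(raw, *args)  # 2. and 3. call the function and receive the result
    # 4. wrap the result; only valid when `func` preserves shape and dim order
    return xr.DataArray(out, dims=da.dims, coords=da.coords, name=da.name)


# Small synthetic array standing in for ds.air (assumed for illustration)
da = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("time", "lat"),
    coords={"lat": [10.0, 20.0, 30.0]},
    name="air",
)

manual = manual_apply(squared_error, da, 1)
automatic = xr.apply_ufunc(squared_error, da, 1)

# assert_equal compares values, dimensions, and coordinates
xr.testing.assert_equal(manual, automatic)
```

Like the default behaviour of `apply_ufunc`, this sketch does not copy `attrs` to the result.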

```{tip}
In typical use, `apply_ufunc` takes in at least one DataArray or Dataset and returns a DataArray or Dataset.
```

## Handling attributes

By default, attributes are omitted since they may now be inaccurate

In [None]:
result = xr.apply_ufunc(wrapper, ds.air, 1)
result.attrs

To propagate attributes, pass `keep_attrs=True`

In [None]:
result = xr.apply_ufunc(wrapper, ds.air, 1, keep_attrs=True)
result.attrs

## Handling datasets

`apply_ufunc` easily handles both DataArrays and Datasets. 

When passed a Dataset, `apply_ufunc` will loop over the data variables and sequentially pass those to `squared_error`.

So `squared_error` always receives a _single_ numpy array.

To illustrate this, let's create a new `Dataset` with two arrays. We'll add a new array `air2` that is 2D, with dimensions `time, lat`.

In [None]:
ds2 = ds.copy()
ds2["air2"] = ds2.air.isel(lon=0) ** 2

We see that `wrapper` is called twice

In [None]:
xr.apply_ufunc(wrapper, ds2, 1)

In [None]:
xr.apply_ufunc(squared_error, ds2, 1)
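
As a rough sketch of what looping over data variables means, the Dataset result can be reproduced by applying the function to each variable in turn and reassembling the outputs. The synthetic dataset `small` is a stand-in for `ds2`, assumed for illustration. The comparison only shows that the two approaches agree in this simple case, not that Xarray does it this way internally.

```python
import numpy as np
import xarray as xr


def squared_error(x, y):
    return (x - y) ** 2


# Synthetic stand-in for ds2: one 3D variable and one 2D variable
air = xr.DataArray(
    np.arange(24.0).reshape(2, 3, 4), dims=("time", "lat", "lon")
)
small = xr.Dataset({"air": air, "air2": air.isel(lon=0) ** 2})

# Apply the function to each data variable separately, then reassemble
looped = xr.Dataset(
    {
        name: xr.apply_ufunc(squared_error, var, 1)
        for name, var in small.data_vars.items()
    }
)

# Passing the whole Dataset gives the same values, dimensions, and coordinates
xr.testing.assert_equal(looped, xr.apply_ufunc(squared_error, small, 1))
```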

## Passing positional and keyword arguments

```{seealso}
See the Python tutorial on [defining functions](https://docs.python.org/3/tutorial/controlflow.html#defining-functions) for more on positional and keyword arguments.
```

`squared_error` takes two arguments named `x` and `y`.

In `xr.apply_ufunc(squared_error, ds.air, 1)`, the value of `1` for `y` was passed positionally. 

To use the keyword argument form, pass it using the `kwargs` keyword argument to `apply_ufunc`:
> kwargs (dict, optional) – Optional keyword arguments passed directly on to call func.

In [None]:
xr.apply_ufunc(squared_error, ds.air, kwargs={"y": 1})
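
Positional arguments and `kwargs` can also be combined in one call. The sketch below uses a hypothetical variant, `scaled_squared_error`, with an extra `scale` option that is not part of the original example, and a small synthetic array in place of `ds.air`.

```python
import numpy as np
import xarray as xr


def scaled_squared_error(x, y, scale=1.0):
    # Hypothetical variant of squared_error with an extra option
    return scale * (x - y) ** 2


# Small synthetic array (assumed for illustration)
da = xr.DataArray(np.array([0.0, 1.0, 2.0]), dims="time")

# `da` and `1` are passed positionally; `scale` is forwarded through `kwargs`
result = xr.apply_ufunc(scaled_squared_error, da, 1, kwargs={"scale": 2.0})
```

A reasonable convention is to pass the array-like inputs positionally and reserve `kwargs` for options that configure the function.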