# Xarray Fundamentals


## Learning Objectives

- Provide an overview of xarray
- Describe the core xarray data structures, the DataArray and the Dataset, and
  the components that make them up
- Load an xarray dataset from a netCDF file
- View and set attributes


## What Is Xarray?

- Unlabeled, N-dimensional arrays of numbers (e.g., NumPy's ndarray) are the
  most widely used data structure in scientific computing. However, they lack a
  meaningful representation of the metadata associated with their data.
  Implementing such functionality is left to individual users and
  domain-specific packages.

- Xarray expands on the capabilities of NumPy arrays by attaching labels
  (dimension names, coordinates, and attributes) to them, which streamlines
  data manipulation.

- Xarray's interface is based largely on the netCDF data model (variables,
  attributes, and dimensions), but it goes beyond the traditional netCDF
  interfaces to provide functionality similar to netCDF-java's Common Data Model
  (CDM).

- Xarray is motivated by weather and climate use cases but is **domain
  agnostic**.
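
To make the contrast with plain NumPy concrete, here is a minimal sketch on made-up data. The variable names, coordinate values, and the `degC` unit are assumptions chosen for illustration and do not come from the dataset used later in this notebook.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: 3 time steps on a 2 x 4 latitude/longitude grid
values = np.arange(24, dtype=float).reshape(3, 2, 4)

# With plain NumPy you have to remember that axis 0 is time
numpy_time_mean = values.mean(axis=0)

# With xarray the axes carry names, coordinate labels, and metadata
temperature = xr.DataArray(
    values,
    dims=("time", "lat", "lon"),
    coords={
        "time": [2000, 2001, 2002],
        "lat": [-10.0, 10.0],
        "lon": [0.0, 90.0, 180.0, 270.0],
    },
    name="temperature",
    attrs={"units": "degC"},
)

# Reduce over a named dimension instead of an axis number
xarray_time_mean = temperature.mean(dim="time")

# Select a single point by coordinate label instead of integer position
point = temperature.sel(time=2001, lat=10.0, lon=90.0)
print(point.item())
```

Both reductions give the same numbers; the xarray version simply records which dimensions remain and keeps their coordinates attached.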


## Core Data Structures

- Xarray has two fundamental data structures:

  - `DataArray`, which holds a single multi-dimensional variable and its
    coordinates
  - `Dataset`, which holds multiple variables that potentially share the same
    coordinates

![](../../images/xarray-data-structures.png)
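
As a rough sketch of how the two structures relate, the example below builds two `DataArray` objects on the same grid and combines them into a `Dataset`. The variable names (`sst`, `ice_fraction`) and all values are hypothetical.

```python
import numpy as np
import xarray as xr

lat = [-30.0, 0.0, 30.0]
lon = [0.0, 120.0, 240.0]

# Two hypothetical variables defined on the same lat/lon grid
sst = xr.DataArray(
    np.full((3, 3), 20.0),
    dims=("lat", "lon"),
    coords={"lat": lat, "lon": lon},
    name="sst",
)
ice = xr.DataArray(
    np.zeros((3, 3)),
    dims=("lat", "lon"),
    coords={"lat": lat, "lon": lon},
    name="ice_fraction",
)

# A Dataset groups the variables and stores the shared coordinates once
toy = xr.Dataset({"sst": sst, "ice_fraction": ice})

print(list(toy.data_vars))
print(list(toy.coords))
print(type(toy["sst"]).__name__)
```

Indexing the `Dataset` by variable name hands back a `DataArray`, so everything that works on a single variable also works on each member of the collection.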


### Loading data from netCDF

- NetCDF (network Common Data Form) is a file format for storing
  multidimensional array data
- NetCDF is self-describing, meaning that a netCDF file includes information
  about the data it contains, along with the necessary metadata such as the
  coordinate system used and attributes describing the data.
- NetCDF is used extensively in the geoscience communities
- Xarray's interface is based largely on the netCDF data model

Learn more about netCDF
[here](https://docs.unidata.ucar.edu/netcdf-c/current/faq.html#What-Is-netCDF).


In [None]:
import xarray as xr

%config InlineBackend.figure_format='retina'

In [None]:
# Load mean sea surface temperature dataset
ds = xr.open_dataset("../../data/sst.mnmean.nc", engine="netcdf4")

# xarray's HTML representation
ds

In [None]:
# If you prefer a text-based repr, uncomment the line below to set display_style="text"
# xr.set_options(display_style="text")

In [None]:
# Look at the netCDF representation
ds.info()
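
If the sample file is not at hand, the same kind of self-describing summary can be reproduced on a small in-memory dataset. This is only a sketch: the dataset below is invented, and it relies on `Dataset.info` accepting a `buf` argument so the text can be captured in a string.

```python
import io

import numpy as np
import xarray as xr

# Hypothetical dataset with one variable, two coordinates, and one global attribute
toy = xr.Dataset(
    data_vars={"sst": (("time", "lat"), np.zeros((2, 3)), {"units": "degC"})},
    coords={"time": [0, 1], "lat": [-10.0, 0.0, 10.0]},
    attrs={"title": "hypothetical toy dataset"},
)

# Write the ncdump-style summary into a string buffer instead of stdout
buffer = io.StringIO()
toy.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```

The summary lists dimensions, variables with their attributes, and global attributes, which are the three building blocks of the netCDF data model.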

### `Dataset`

- Xarray's `Dataset` is a dict-like container of labeled arrays (`DataArrays`)
  with aligned dimensions.
- It is designed as an in-memory representation of a netCDF dataset.
- In addition to the dict-like interface of the dataset itself, which can be
  used to access any `DataArray` in a `Dataset`, datasets have the following
  key properties:

| Attribute   | Description                                                                                                                              |
| ----------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| `data_vars` | Dict-like container of `DataArray` objects corresponding to data variables.                                                              |
| `dims`      | Mapping from dimension names to the fixed length of each dimension (e.g., `{"lat": 6, "lon": 6, "time": 8}`).                            |
| `coords`    | Dict-like container of arrays (coordinates) that label each point (e.g., 1-dimensional arrays of numbers, datetime objects, or strings). |
| `attrs`     | Dictionary holding arbitrary metadata pertaining to the dataset.                                                                         |


In [None]:
# variables in our dataset
ds.data_vars

In [None]:
# select one variable and pick the first entry along the first axis (time)
ds.sst[0]

In [None]:
# Plot one timestep
ds.sst[0].plot();

In [None]:
# dataset dimensions
ds.dims

In [None]:
# dataset coordinates
ds.coords

In [None]:
# dataset global attributes
ds.attrs
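
The dict-like behaviour described above can be sketched on a small synthetic dataset. The variable names and shapes are assumptions made for illustration. The sketch uses `Dataset.sizes` for the name-to-length mapping, which recent xarray releases recommend for that purpose.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset: two variables that share the lat/lon dimensions
toy = xr.Dataset(
    data_vars={
        "sst": (("time", "lat", "lon"), np.zeros((2, 3, 4))),
        "mask": (("lat", "lon"), np.ones((3, 4))),
    },
    coords={"lat": [-10.0, 0.0, 10.0], "lon": [0.0, 90.0, 180.0, 270.0]},
    attrs={"title": "hypothetical toy dataset"},
)

# Membership tests and key lookup work as they do on a dict
print("sst" in toy)
print(sorted(toy.data_vars))

# Mapping from dimension name to length
print(dict(toy.sizes))

# Iterate over the data variables like dict items
for name, array in toy.data_vars.items():
    print(name, array.dims)
```

Note that `mask` has no `time` dimension: variables in a `Dataset` need aligned dimensions, not identical ones.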

### `DataArray`

The DataArray is xarray's implementation of a labeled, multi-dimensional array.
It has several key properties:

| Attribute | Description                                                                                                                              |
| --------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| `data`    | `numpy.ndarray` or `dask.array` holding the array's values.                                                                              |
| `dims`    | Dimension names for each axis, for example (`x`, `y`, `z`) or (`lat`, `lon`, `time`).                                                    |
| `coords`  | Dict-like container of arrays (coordinates) that label each point (e.g., 1-dimensional arrays of numbers, datetime objects, or strings). |
| `attrs`   | Dictionary holding arbitrary attributes/metadata (such as units).                                                                        |
| `name`    | An arbitrary name for the array.                                                                                                         |


In [None]:
# Extract the sst Variable/DataArray
ds["sst"]  # Equivalent to ds.sst

In [None]:
# The actual (numpy) array data
ds.sst.data

In [None]:
# DataArray/Variable dimensions
ds.sst.dims

In [None]:
# DataArray/Variable coordinates
ds.sst.coords

In [None]:
# DataArray/Variable attributes
ds.sst.attrs
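
The same properties can be inspected on a hand-built array, which makes it easier to see where each one comes from. All values below are made up for illustration.

```python
import numpy as np
import xarray as xr

# Each constructor argument maps onto one of the properties in the table
da = xr.DataArray(
    np.array([[1.0, 2.0], [3.0, 4.0]]),
    dims=("lat", "lon"),
    coords={"lat": [0.0, 10.0], "lon": [100.0, 110.0]},
    attrs={"units": "degC"},
    name="sst",
)

print(type(da.data).__name__)  # the underlying array type
print(da.dims)                 # tuple of dimension names
print(list(da.coords))         # names of the coordinate variables
print(da.attrs)                # metadata dictionary
print(da.name)                 # name of the array
```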

## Coordinates vs dimensions


- DataArray objects inside a Dataset may have any number of dimensions but are
  presumed to share a common coordinate system.
- Coordinates can also have any number of dimensions but denote
  constant/independent quantities, unlike the varying/dependent quantities that
  belong in data.
- A dimension is just the name of an axis, such as "time".


In [None]:
ds.dims

In [None]:
ds.coords

In [None]:
# extracting a coordinate variable
ds.sst.lon

In [None]:
# extracting a coordinate variable from .coords
ds.coords["time"]

## Attributes

Attributes can be used to store metadata. What metadata should you store? It
depends on your domain and your needs.


In [None]:
# Look at global attributes
ds.attrs

In [None]:
# Look at variable specific attributes
ds.sst.attrs

In [None]:
# Set some arbitrary attribute on a data Variable/DataArray
ds.sst.attrs["my_custom_attribute"] = "Foo Bar"
ds.sst.attrs
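
Because `attrs` behaves like an ordinary dictionary, the usual dictionary operations apply. The sketch below uses a made-up array and also shows `assign_attrs`, which returns a new object rather than modifying the original in place.

```python
import numpy as np
import xarray as xr

# Hypothetical array with one initial attribute
da = xr.DataArray(np.zeros(3), dims=("x",), name="sst", attrs={"units": "degC"})

# Add, update, and delete entries as with any dict
da.attrs["long_name"] = "sea surface temperature"
da.attrs.update(source="hypothetical example")
del da.attrs["source"]

# assign_attrs returns a new object; the original attrs are left untouched
labeled = da.assign_attrs(standard_name="sea_surface_temperature")

print(sorted(da.attrs))
print(sorted(labeled.attrs))
```

In-place assignment is convenient for interactive work, while `assign_attrs` fits method chains where the original object should be left unmodified.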

## Going Further

- Xarray documentation on data structures:
  https://docs.xarray.dev/en/stable/data-structures.html
- Xarray documentation on reading and writing files:
  https://docs.xarray.dev/en/stable/user-guide/io.html


<div class="alert alert-block alert-success">
  <p>Next: <a href="02_indexing.ipynb">Indexing</a></p>
</div>
