Exploitation System: JupyterLab

AVL JupyterLab user guide#

Basic usage#

This section provides a brief introduction for users to the basic features of the JupyterLab environment as deployed on the AVL system. For more in-depth documentation on the various components, see the links in the ‘Further information’ section.

Logging in and starting the JupyterLab environment#

To use the AVL JupyterLab environment, navigate to https://jupyter.agriculturevlab.eu/ with a web browser (a recent version of Firefox, Chrome, or Safari is recommended).

AVL uses a single sign-on (SSO) system, so if you have already logged in to use another component of the AVL (e.g. the thematic processing system), you will already be automatically logged in to JupyterLab. Otherwise, you will be briefly redirected to the AVL SSO service (provided using Keycloak) to log in. If your Jupyter server is not already running, you may be presented with a menu of user environments to use for your session; the first of these is marked as ‘recommended’, and you should choose this one unless you have a specific need for an alternative environment. After choosing your environment, you will see a progress bar appear for a few seconds while it is started for you. The JupyterLab interface will then appear in your web browser, ready for use.

Choosing a user environment#

If you don't already have a running AVL Jupyter session, you will be presented with a list of available user environments when you log in. A user environment provides a particular combination of pre-installed package versions and settings. The standard AVL environment is updated frequently to add new packages and update existing packages to newer versions. In general, it's best to use the most recent stable environment (which will be marked ‘recommended’). If you have specific needs, you might also choose an older environment version (if you have code that requires specific, older package versions), an experimental version (if you need newer features and can accept the risk of potential problems due to the environment not yet being fully tested), or a special-purpose version (e.g. for a particular course or workshop).

If you have already started your session and need to change environment, you can do this by selecting ‘Hub control panel’ from the ‘File’ menu within JupyterLab. Then click the ‘Stop my server’ button and wait for your current server to shut down. When the ‘Start my server’ button appears, you can click on it to return to the user environment menu.

Listing and opening datasets#

AVL data is available, mostly in Zarr format, in several object storage buckets, which can be accessed via predefined xcube data store objects:

store name	description
`lab_store`	File data in your JupyterLab environment (also visible in the file chooser on the left).
`user_store`	Your personal, private object storage. Only you can read and write data here.
`public_store_write`	Your publicly shared object storage. Only you can write to it, but all AVL users can read it.
`public_store_read`	Everyone’s publicly shared object storage. You can read both your own and other users’ publicly shared data here.
`scratch_store`	Insecure, temporary shared storage. All AVL users can read and write freely, and data are deleted automatically after two days.
`data_store`	Pre-processed, standard data sets made available for all users by the AVL project.
`staging_store`	A staging area for the `data_store` store. Data here are migrated to `data_store` once they have been thoroughly tested.
`test_store`	A pre-staging area for `staging_store` and `data_store`. Data here are migrated to `staging_store` after some initial testing.

The datasets in a bucket can be listed using the associated xcube data store:

list(staging_store.get_data_ids())

This will produce a list of dataset identifiers within the store, for example:

path1/path2/dataset1.zarr
path1/path2/dataset2.zarr
path2/dataset3.zarr

A dataset from this list can then be opened using the store object:

cube = staging_store.open_data('path1/path2/dataset1.zarr')
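Because data identifiers are plain path-like strings, ordinary Python string handling is usually enough to narrow down a long listing before opening anything. The following is a minimal sketch rather than part of the xcube API: `select_data_ids` is a hypothetical helper, and the hard-coded list stands in for the result of `list(staging_store.get_data_ids())`.

```python
from fnmatch import fnmatchcase


def select_data_ids(data_ids, pattern):
    """Return the identifiers matching a shell-style pattern, sorted."""
    return sorted(d for d in data_ids if fnmatchcase(d, pattern))


# Stand-in for list(staging_store.get_data_ids()); in AVL, use the real store.
data_ids = [
    "path1/path2/dataset1.zarr",
    "path1/path2/dataset2.zarr",
    "path2/dataset3.zarr",
]

# Keep only the Zarr datasets whose identifier starts with "path1/".
selected = select_data_ids(data_ids, "path1/*.zarr")
print(selected)
```

Any identifier returned by the helper can be passed unchanged to the store's `open_data` method.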

To read data from non-AVL S3 buckets, you can create a new xcube data store:

from xcube.core.store import new_data_store
my_store = new_data_store('s3', root='my-bucket-name',
                          max_depth=10, storage_options=dict(anon=True))
list(my_store.get_data_ids())

A dataset can also be opened directly from an S3 path without instantiating a store object, as below. Note that there should be no trailing slash after the Zarr name.

from xcube.core.dsio import open_cube
cube = open_cube('s3://bucket/path/to/dataset.zarr', s3_kwargs=dict(anon=True))
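If S3 paths are assembled programmatically, a trailing slash can slip in easily. One way to guard against it is sketched below; `normalize_zarr_path` is a hypothetical helper introduced only for illustration, not an xcube function.

```python
def normalize_zarr_path(path):
    """Strip trailing slashes so the path ends with the Zarr name itself."""
    return path.rstrip("/")


# The bucket and path are the placeholders used in the example above.
cube_path = normalize_zarr_path("s3://bucket/path/to/dataset.zarr/")
print(cube_path)
```

The cleaned path can then be passed to `open_cube` as shown above.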

AVL also includes support for xcube data stores that interface to other data providers. You can create such a store as follows:

my_store = new_data_store(store_id)

Here, store_id is a string specifying the data provider. Supported providers include:

cds: Copernicus climate data store
cciodp: ESA Climate Change Initiative data (from ODP)
ccizarr: ESA Climate Change Initiative data (from Zarr)
sentinelhub: Sentinel Hub
cmems: Copernicus Marine Environment Monitoring Service

See the xcube store example notebooks for more information about how to work with these data stores.

Uploading data#

JupyterLab runs remotely on an AVL server, and can work directly with files stored in your user area on the server. To work with a file stored on your local computer, you must first upload it to the server. You can do this by clicking on the Upload (⇪) icon near the top left of the JupyterLab interface, or simply by dragging the file from your file manager to the file list along the left side of the JupyterLab interface. After the upload, the file will be directly accessible in the notebook environment.

Creating a notebook#

You can create a new notebook from the JupyterLab File menu (File → New → Notebook). If you are prompted to select a kernel, choose ‘Python 3 (ipykernel)’. You can also create a notebook by clicking on the ‘Python 3 (ipykernel)’ icon under the heading ‘Notebook’ in the JupyterLab launcher. The new notebook will open in the main part of the JupyterLab interface with an empty input cell at the top, ready for your first input to the Python interpreter.

Importing Python libraries#

The AVL Python environment includes a large number of preinstalled scientific libraries to support common use cases in data processing and analysis of EO and agricultural data. A brief list of these libraries can be found in the software reuse file for the exploitation subsystem. You can view a full and current list of installed packages in the notebook itself by entering this command into an input cell in a notebook:

!mamba list

Installed libraries can be imported using the standard Python import command.

If you require a library that isn’t already installed in AVL, please contact AVL support to request it; in most cases it’s quick and easy to add a new library to the environment. This is the preferred method of adding libraries to AVL, but if you require a library urgently, and if it's available in a conda channel such as conda-forge, you can also install it yourself. For example, to install a package called example_package from the conda-forge distribution channel:

import sys
!mamba install --yes --channel conda-forge --prefix {sys.prefix} example_package

If a package is not available in any conda channel, it can also be installed with pip:

import sys
!{sys.executable} -m pip install example_package

⚠ Note: pip installation should only be used if the package is not available in a conda channel, since it can cause conflicts with the AVL’s existing conda-based package management.
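Before requesting or installing a package, it can be worth checking whether it is already present in the current environment. The sketch below uses only the Python standard library; `installed_version` is a hypothetical helper name, and `example_package` is the same placeholder used above. Note that the lookup uses the distribution name, which can differ from the name used in an `import` statement.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# Report the status of a few distributions; the names are illustrative only.
for name in ("numpy", "example_package"):
    version = installed_version(name)
    print(name, version if version is not None else "not installed")
```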

Working with the Jupyter notebook#

For more information, see The JupyterLab Interface in the JupyterLab documentation.

The Jupyter scientific notebook combines features of an interactive terminal environment (like, for instance, the bash or ipython shell) with features of a programmer’s text editor. Within the notebook you can interact with the Python environment by entering commands or expressions; your command history and the associated output are stored and can be edited, re-run, rearranged, annotated, saved, and shared.

You interact with a Jupyter notebook by typing or pasting an expression or command into an input cell. When you press shift-enter or click the ▶ icon above the notebook, the contents of the cell are evaluated by the Python interpreter, and the result is displayed in a new cell below your input – depending on the command this may be text, an image, or an interactive widget. A new input cell is created below the displayed result, ready for your next input.

By clicking the ▸▸ icon, you can run the entire notebook from start to finish – not unlike a traditional Python script, but with the results from every input cell evaluation interleaved into the notebook.

You can also comment and document your notebooks by including cells that contain not Python code but Markdown source. Markdown is a simple markup language which lets you add symbols to plain text to indicate common formatting operations such as headings, bold or italic text, tables, and lists. In addition to Markdown, you can include LaTeX-style mathematical formatting by enclosing text between $ characters. To use an input cell for Markdown rather than code, use the drop-down menu at the top of the notebook on the right and change its setting from ‘Code’ to ‘Markdown’. After editing, press shift-enter as for a code cell; for a Markdown cell the source text will be hidden and the input cell will show the formatted Markdown until it is opened for editing again.

Saving results#

The notebook can write to the server-side storage associated with your AVL account, and any file writing functions in your Python code will write to this area. The resulting files will appear in the file chooser in the left-hand column of the JupyterLab environment. For instance, the following code writes a table to CSV and saves a PNG image of a graph:

import numpy as np
from matplotlib import pyplot as plt

table = np.transpose(np.array([np.arange(10), np.arange(10) ** 2]))
np.savetxt('table.csv', table, delimiter=',')
plt.plot(table[:,0], table[:,1])
plt.savefig('figure.png')
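When a notebook produces many output files, collecting them in a dedicated folder keeps the file chooser tidy. The following is a minimal sketch using only the standard library; the `save_table` helper and the `results` folder name are assumptions made for illustration.

```python
import csv
from pathlib import Path


def save_table(rows, directory, filename="table.csv"):
    """Write rows to a CSV file inside directory, creating it if needed."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / filename
    with out_path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)
    return out_path


# Same numbers as the table above: x and x squared for x = 0..9.
rows = [(x, x ** 2) for x in range(10)]
saved = save_table(rows, "results")
print(saved, saved.stat().st_size, "bytes")
```

The new folder and the file inside it should then appear in the file chooser like any other saved file.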

Downloading results#

Saved data files and figures – and the saved notebooks themselves – can be downloaded to your local computer. Right-click on the file in the file chooser at the left and select ‘Download’ from the context menu which appears. Alternatively, select the file in the file chooser, open the ‘File’ menu from the JupyterLab menu bar, and select ‘Download’.

Starting an on-demand compute cluster#

If you are working with larger amounts of data, the computing power available to you in your notebook environment may not be enough; instead, you should create an on-demand dask cluster for processing. You can do this with the following commands:

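from dask.distributed import Client
from xcube.util.dask import new_cluster  # assumed import location
cluster = new_cluster(name='cluster_demo', n_workers=8)
client = Client(cluster)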

After carrying out your cluster computations, shut down the cluster like this:

cluster.close()
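If a computation raises an error partway through, the `close()` call may never be reached and the cluster would keep running. One way to make the shutdown reliable is to wrap the work in `contextlib.closing`, as in the sketch below. `StandInCluster` and `run_with_cluster` are hypothetical names used only so that the example runs anywhere; in AVL you would pass the object returned by `new_cluster(...)` instead.

```python
from contextlib import closing


class StandInCluster:
    """Hypothetical stand-in for a cluster object; only close() matters here."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def run_with_cluster(cluster, task):
    """Run task(cluster) and close the cluster afterwards, even on error."""
    with closing(cluster):
        return task(cluster)


demo = StandInCluster()
result = run_with_cluster(demo, lambda c: "done")
print(result, demo.closed)
```

The same effect can be achieved with an explicit `try`/`finally` block; `closing` is simply a more compact way to express it.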

For an example of using a cluster in the AVL, see demo notebook 6 (out-of-core computation) in your JupyterLab environment. For more information on cluster-based computation with dask, see the documentation on the dask website.

Logging out#

To log out, select ‘Log out’ from the ‘File’ menu within JupyterLab. Since AVL uses a single sign-on system, this will also log you out of any other AVL components that you’re currently logged in to.

Note that your JupyterLab session will continue in the background even after you have logged out, but will eventually be terminated due to inactivity, usually within an hour or so. If you wish to stop your session explicitly, you can use the hub control panel as described in the ‘Choosing a user environment’ section above.

Further information#

The AVL demo notebooks: available in the demo-notebooks folder of your JupyterLab environment. You can also view them online (but not run them) in their GitHub repository.
The JupyterLab documentation: an in-depth user guide for the JupyterLab interface.
How to Use JupyterLab: a short introductory video tutorial.
Markdown cells: a guide to writing Markdown in Jupyter notebooks.
The xcube documentation: user guide and API reference for the xcube libraries.
The geoDB documentation: user guide and API reference for the geoDB feature database.