Data Access

Learn how to download and work with data from the DestinE Data Portfolio using the HDA API.

Note

Authentication Required: All data access operations require a valid access token. See Authentication & Quotas for setup instructions.

Overview

The HDA API provides several ways to access data:

  1. Direct download - Get complete files

  2. Streaming access - Work with cloud-optimized formats

  3. Asset-specific access - Download individual components

  4. Bulk operations - Handle multiple files efficiently

Prerequisite info

For more information about the tools and libraries used in this guide see the relevant references:

Download data

Each data item may contain multiple data assets.

EODAG provides a convenient way to search and download data from the HDA API.

It natively supports downloading full products and individual assets, managing authentication and local storage for you.

# Download first product
product = search_results[0]

# Download the full product
downloaded_path = product.download()
print(f"Downloaded to: {downloaded_path}")

# Download a specific asset
downloaded_asset_path = product.assets["AOT_10m"].download()

To download assets without EODAG, use the requests library with an item from a pySTAC search:

import requests
import os
from urllib.parse import urlparse

# Authentication headers
# Check the Authentication API guide for how to get the access token
headers = {
    "Authorization": f"Bearer {dedl_access_token}"
}

# Get first item from search results
item = next(items) # From your pySTAC search results

# Download all assets
for asset_key, asset in item.assets.items():
    if asset.href:
        response = requests.get(asset.href, headers=headers, stream=True)
        response.raise_for_status()  # Stop on HTTP errors (e.g. an expired token)
        filename = os.path.basename(urlparse(asset.href).path) or f"{asset_key}.data"

        with open(filename, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
        print(f"Downloaded {asset_key}: {filename}")

# Download a specific asset
asset_key = "AOT_10m"
asset = item.assets[asset_key]
response = requests.get(asset.href, headers=headers, stream=True)
response.raise_for_status()  # Stop on HTTP errors (e.g. an expired token)
filename = f"{asset_key}.data"

with open(filename, 'wb') as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)
print(f"Downloaded {asset_key}: {filename}")

The same downloads can be scripted with curl and jq:

# Set your access token
# Check the Authentication API guide for how to get the access token
ACCESS_TOKEN="dedl_access_token"

# First, perform a STAC search and save results
curl -H "Authorization: Bearer $ACCESS_TOKEN" \
     -H "Content-Type: application/json" \
     -X POST \
     "https://hda.data.destination-earth.eu/stac/v2/search" \
     -d '{
       "collections": ["EO.ESA.DAT.SENTINEL-2.MSI.L2A"],
       "limit": 1
     }' \
     -o search_results.json

# Extract all asset URLs from the first item using jq
jq -r '.features[0].assets | to_entries[] | "\(.key):\(.value.href)"' search_results.json > asset_urls.txt

# Download all assets
while IFS=':' read -r asset_key asset_url; do
    echo "Downloading $asset_key..."
    curl -H "Authorization: Bearer $ACCESS_TOKEN" \
         -L \
         -o "${asset_key}.data" \
         "$asset_url"
done < asset_urls.txt

# Or download a specific asset (e.g., WVP_10m)
SPECIFIC_ASSET_URL=$(jq -r '.features[0].assets.WVP_10m.href' search_results.json)
curl -H "Authorization: Bearer $ACCESS_TOKEN" \
     -L \
     -o "WVP_10m.data" \
     "$SPECIFIC_ASSET_URL"

# Clean up
rm search_results.json asset_urls.txt
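
The Python example above names each downloaded file after the last path segment of the asset URL and falls back to the asset key when the URL has none. A minimal sketch of that naming rule as a standalone function is shown here. `local_filename` is a hypothetical helper name (not part of the HDA API or any library), and the URLs are made up for illustration.

```python
import os
from urllib.parse import urlparse


def local_filename(asset_key: str, href: str) -> str:
    """Return a local file name for an asset (hypothetical helper).

    Uses the last path segment of the URL and falls back to
    "<asset_key>.data" when the URL path is empty or ends with a slash.
    """
    name = os.path.basename(urlparse(href).path)
    return name or f"{asset_key}.data"


# Illustrative URLs only; real asset hrefs come from the STAC item
print(local_filename("AOT_10m", "https://example.invalid/data/AOT_10m.jp2?token=abc"))
print(local_filename("WVP_10m", "https://example.invalid/downloadlink/"))
```

Query strings are ignored because only the parsed URL path is used, so a signed or tokenized URL should still yield a clean file name.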

Ordering and downloading ECMWF data

ECMWF datasets in the DestinE Data Lake require an ordering workflow. You submit an order with parameters directly to the collection, monitor the returned STAC item status, and download the asset when ready.

Note

Order Workflow: ECMWF data follows this process: POST order with parameters → monitor STAC item order:status → download asset when succeeded. The STAC item will always contain one asset with the requested data.

EODAG simplifies the ECMWF ordering process by handling the order, polling, and download automatically.

from eodag import EODataAccessGateway

# Configure EODAG
dag = EODataAccessGateway()
dag.set_preferred_provider("dedl")

params = {
    "ecmwf:data_format": "grib",
    "ecmwf:day": 1,
    "ecmwf:month": 1,
    "ecmwf:pressure_level": 30,
    "ecmwf:product_type": "monthly_mean",
    "ecmwf:variable": "carbon_dioxide",
    "ecmwf:year": 2020
}

# Search with ECMWF parameters - this creates the order request
search_results = dag.search(
    productType="CAMS_GREENHOUSE_EGG4_MONTHLY",
    **params
)

# Download the first result
# EODAG will automatically:
# 1. Submit the order with your search parameters
# 2. Poll the order status until completion
# 3. Download the data when ready
downloaded_path = search_results[0].download()
print(f"Downloaded to: {downloaded_path}")

Without EODAG, you can submit and monitor the order yourself with the requests library:

import requests
import time

# Authentication
# Check the Authentication API guide for how to get the access token
headers = {"Authorization": f"Bearer {dedl_access_token}"}
base_url = "https://hda.data.destination-earth.eu/stac/v2"
collection_id = "EO.ECMWF.DAT.CAMS_GLOBAL_GREENHOUSE_GAS_REANALYSIS_MONTHLY_AV_FIELDS"

# Order parameters
order_params = {
    "data_format": "grib",
    "day": 1,
    "month": 1,
    "pressure_level": 30,
    "product_type": "monthly_mean",
    "variable": "carbon_dioxide",
    "year": 2020
}

# Submit order request with parameters
order_response = requests.post(
    f"{base_url}/collections/{collection_id}/order",
    headers=headers,
    json=order_params
)
order_response.raise_for_status()  # Stop on HTTP errors (e.g. invalid parameters)
order_item = order_response.json()

print(f"Order submitted, initial status: {order_item['properties']['order:status']}")

# Get the self link to poll the STAC item
self_link = next(link["href"] for link in order_item["links"] if link["rel"] == "self")
print(f"Polling URL: {self_link}")

# Monitor order status via STAC item
while True:
    item_response = requests.get(self_link, headers=headers)
    item_data = item_response.json()

    order_status = item_data["properties"].get("order:status")
    print(f"Order status: {order_status}")

    if order_status == "succeeded":
        print("Order completed! Asset is now available for download.")
        break
    elif order_status == "failed":
        print("Order failed")
        raise Exception("Order processing failed")

    time.sleep(30)  # Wait 30 seconds

# Download the asset from the STAC item (ECMWF items always have one asset)
assets = item_data["assets"]
asset_key = list(assets.keys())[0]  # Get the first (and only) asset
asset_url = assets[asset_key]["href"]

print(f"Downloading asset: {asset_key}")
file_response = requests.get(asset_url, headers=headers, stream=True)
filename = f"{asset_key}.data"

with open(filename, 'wb') as f:
    for chunk in file_response.iter_content(chunk_size=8192):
        f.write(chunk)
print(f"Downloaded: {filename}")

The same ordering workflow can be scripted with curl and jq:

# Set your access token
ACCESS_TOKEN="your_access_token_here"
BASE_URL="https://hda.data.destination-earth.eu/stac/v2"

# Order parameters
ORDER_BODY='{
  "data_format": "grib",
  "day": 1,
  "month": 1,
  "pressure_level": 30,
  "product_type": "monthly_mean",
  "variable": "carbon_dioxide",
  "year": 2020
}'

# Submit order request with parameters
ORDER_RESPONSE=$(curl -X POST \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d "$ORDER_BODY" \
  "$BASE_URL/collections/EO.ECMWF.DAT.CAMS_GLOBAL_GREENHOUSE_GAS_REANALYSIS_MONTHLY_AV_FIELDS/order")

# The response is a STAC item with order:status "shipping"
echo "Order submitted, initial status:"
echo "$ORDER_RESPONSE" | jq '.properties["order:status"]'

# Get the self link to poll the STAC item
SELF_LINK=$(echo "$ORDER_RESPONSE" | jq -r '.links[] | select(.rel == "self") | .href')
echo "Polling URL: $SELF_LINK"

# Monitor order status by polling the STAC item
while true; do
  ITEM_RESPONSE=$(curl -H "Authorization: Bearer $ACCESS_TOKEN" "$SELF_LINK")
  ORDER_STATUS=$(echo "$ITEM_RESPONSE" | jq -r '.properties["order:status"]')

  echo "Order status: $ORDER_STATUS"

  if [ "$ORDER_STATUS" = "succeeded" ]; then
    echo "Order completed! Asset is now available for download."
    break
  elif [ "$ORDER_STATUS" = "failed" ]; then
    echo "Order failed"
    exit 1
  fi

  sleep 30  # Wait 30 seconds before checking again
done

# Download the asset from the STAC item (ECMWF items always have one asset)
ASSET_URL=$(echo "$ITEM_RESPONSE" | jq -r '.assets | to_entries[0] | .value.href')
ASSET_KEY=$(echo "$ITEM_RESPONSE" | jq -r '.assets | to_entries[0] | .key')

echo "Downloading asset: $ASSET_KEY"
curl -H "Authorization: Bearer $ACCESS_TOKEN" \
  -L -o "${ASSET_KEY}.data" "$ASSET_URL"

STAC Item Properties for Orders

  • order:status: Current status (shipping = in progress, succeeded = completed, failed = error)

  • Assets: Present on the STAC item only once order:status is succeeded

  • Single asset: ECMWF STAC items always contain exactly one asset with the requested data
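
The properties listed above can be checked in code before attempting a download. The following is a minimal sketch, assuming the STAC item has already been fetched and parsed into a Python dict. `ready_asset_href` is a hypothetical helper name, and the asset key and URL in the sample items are invented for illustration.

```python
from typing import Optional


def ready_asset_href(item: dict) -> Optional[str]:
    """Return the asset href once the order has succeeded, else None.

    Hypothetical helper: raises if the order failed or if the item does
    not contain exactly one asset after success.
    """
    status = item.get("properties", {}).get("order:status")
    if status == "failed":
        raise RuntimeError("Order processing failed")
    if status != "succeeded":
        return None  # e.g. "shipping": still in progress
    assets = item.get("assets", {})
    if len(assets) != 1:
        raise ValueError(f"Expected exactly one asset, found {len(assets)}")
    (asset,) = assets.values()
    return asset["href"]


# Hand-written items for illustration; real ones come from the order response
shipping = {"properties": {"order:status": "shipping"}, "assets": {}}
done = {
    "properties": {"order:status": "succeeded"},
    "assets": {"example_asset": {"href": "https://example.invalid/result.grib"}},
}
print(ready_asset_href(shipping))
print(ready_asset_href(done))
```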

Order Considerations

  • Monitor the order:status property to track progress

  • Assets are only downloadable when status is succeeded

  • Large orders may take significant time to process

  • Orders may expire after a certain time period
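
Because large orders can take a long time and may expire, it can help to bound the polling loop with a timeout. The sketch below shows one way to do that. `wait_for_order` and its parameters are hypothetical, and the default timeout of one hour is an arbitrary assumption rather than a documented limit. The status source is passed in as a function, so the example runs here with a simulated sequence instead of real requests.

```python
import time
from typing import Callable


def wait_for_order(
    get_status: Callable[[], str],
    timeout: float = 3600.0,
    interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll until the order succeeds; raise on failure or timeout.

    Hypothetical helper: get_status should return the current value of
    the item's order:status property.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "succeeded":
            return
        if status == "failed":
            raise RuntimeError("Order processing failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Order still '{status}' after {timeout} seconds")
        sleep(interval)


# Simulated status sequence, so the sketch runs without network access
statuses = iter(["shipping", "shipping", "succeeded"])
wait_for_order(lambda: next(statuses), timeout=5, interval=0, sleep=lambda s: None)
print("Order completed")
```

In real use, `get_status` would be a small function that fetches the item's self link with the authentication headers and returns `properties["order:status"]`.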

Load data to xarray

Open products and individual assets directly as xarray datasets:

EODAG loads data into xarray lazily for cloud-native formats, so you can work with large datasets without reading everything into memory at once.

For ECMWF datasets: EODAG handles the ordering workflow automatically. When you call to_xarray() or download(), it submits the order, polls the status, and downloads the data.

Supported formats: All raster formats except .nat (EUMETSAT native format) and CovJSON. This includes GeoTIFF, NetCDF, HDF5, GRIB, Zarr, and cloud-optimized formats like COG. EODAG uses rasterio and other backends to provide comprehensive format support.

# Install eodag-cube
# pip install eodag-cube

product = search_results[0]  # Get the first product from an EODAG search result

# Load the full product as a dictionary of xarray Datasets
xarray_dict = product.to_xarray()

# Load a single asset as an xarray Dataset
asset_xarray = product.assets["AOT_10m"].to_xarray()

Use stackstac or odc-stac libraries to load STAC items directly into xarray datasets with efficient lazy loading and cloud-optimized access.

Supported formats: All GDAL-supported raster formats including GeoTIFF, NetCDF, HDF5, GRIB, COG (Cloud Optimized GeoTIFF), and Zarr. Best performance with cloud-optimized formats like COG and Zarr for remote data access.

import stackstac
import odc.stac
import rasterio

# Configure authentication for protected assets
# Both libraries read data through rasterio (GDAL), so the bearer token is
# passed as a GDAL HTTP header via rasterio.Env.
# Assumption: GDAL >= 3.6, which introduced the GDAL_HTTP_HEADERS option.
gdal_auth = {"GDAL_HTTP_HEADERS": f"Authorization: Bearer {dedl_access_token}"}

# Option 1: Using stackstac (good for single-band analysis)
# pip install stackstac
items = list(search.items())  # From your pystac_client search

# Load specific bands as an xarray DataArray with authentication
# Loading is lazy: the same settings must be active when the data is computed
with rasterio.Env(**gdal_auth):
    stack = stackstac.stack(
        items,
        assets=["AOT_10m", "WVP_10m"],  # Select specific assets
        resolution=1000,  # Set resolution in meters
    )

# Option 2: Using odc-stac (good for multi-dimensional analysis)
# pip install odc-stac

# Load with odc-stac using the same authentication settings
with rasterio.Env(**gdal_auth):
    ds = odc.stac.load(
        items,
        bands=["AOT_10m", "WVP_10m"],
        resolution=1000,
        bbox=(-180, -90, 180, 90),  # Optional bbox
    )

# Access individual bands
aot_data = ds.AOT_10m
wvp_data = ds.WVP_10m

Next Steps

Now that you can access data, explore these advanced topics: