Promote user data to become DestinE data
As a DestinE user, you may want to share a dataset you have created with the DestinE community. The DestinE Data Lake (DEDL) provides a mechanism for sharing your dataset so that any DEDL user can access it via the HDA API and the DEDL Web Portal.
This document describes how to prepare a new DestinE dataset and submit it for inclusion in the standard DestinE offering.
Step 1 Contact the DestinE Help Desk
The first step to bring your data to the DestinE community is to get in touch with the DestinE Help Desk and to express your wish to share your dataset.
In your DestinE Help Desk ticket, describe the dataset and the benefit it will bring to the DestinE community. Important information to include covers, among other things:
a general description of the dataset,
the temporal and spatial extent of the data,
the format of the data,
the input data used to generate the dataset,
any other characteristics relevant to potential users.
The DEDL HDA can restrict datasets to specific users. If the dataset should be available only to specific users, state this in your ticket. One or more new IAM roles may then be created to handle this visibility.
Please also mention in your ticket any licensing preferences to be applied to the new dataset.
A board will examine your request and respond to you directly in the ticket.
Step 2 Prepare your dataset for ingestion by the DEDL HDA
It is the responsibility of the user to provide a dataset compatible with the DEDL HDA.
The DEDL HDA exposes a STAC API, so the dataset must be provided in STAC format. This means that you need to generate STAC metadata that references your dataset. To generate this metadata, you can use PySTAC, an open-source Python library for working with STAC developed by the STAC community.
The following resources will help you create the STAC metadata associated with your dataset:
STAC community tutorial: create STAC collection with PySTAC.
STAC specifications for in-depth knowledge of the STAC standard.
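As a rough illustration of what such metadata contains, the sketch below builds one minimal STAC Collection and one STAC Item as plain Python dictionaries, using only the standard library. It is neither PySTAC nor the DEDL team's tooling. The field names follow the core STAC specification (version 1.0.0 is an assumption), while the identifiers, extent, license value, asset key and file reference are hypothetical placeholders. PySTAC can generate the same kind of documents with less manual work.

```python
import json
from datetime import datetime, timezone

# Assumption: the STAC version expected by the DEDL HDA is not stated in this guide.
STAC_VERSION = "1.0.0"


def make_collection(collection_id, description, bbox, start, end, license_id):
    """Return a minimal STAC Collection describing the whole dataset."""
    return {
        "type": "Collection",
        "stac_version": STAC_VERSION,
        "id": collection_id,
        "description": description,
        "license": license_id,
        "extent": {
            # One overall bounding box: [west, south, east, north]
            "spatial": {"bbox": [list(bbox)]},
            # One overall time interval: [start, end]
            "temporal": {"interval": [[start.isoformat(), end.isoformat()]]},
        },
        "links": [],
    }


def make_item(item_id, collection_id, bbox, when, data_href):
    """Return a minimal STAC Item referencing one data file."""
    west, south, east, north = bbox
    # Closed polygon ring matching the bounding box
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {
        "type": "Feature",
        "stac_version": STAC_VERSION,
        "id": item_id,
        "collection": collection_id,
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "bbox": list(bbox),
        "properties": {"datetime": when.isoformat()},
        # The "collection" field requires a matching link; the relative
        # href assumes items live one folder below collection.json.
        "links": [{"rel": "collection", "href": "../collection.json",
                   "type": "application/json"}],
        # Hypothetical asset key and href
        "assets": {"data": {"href": data_href, "roles": ["data"]}},
    }


# Placeholder values for illustration only
bbox = [-10.0, 35.0, 30.0, 70.0]
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 12, 31, tzinfo=timezone.utc)

collection = make_collection(
    "MY_DATASET_ID", "Example dataset description", bbox, start, end, "proprietary"
)
item = make_item(
    "ITEM_1_ID", "MY_DATASET_ID", bbox, start, "../../data/2024/01/01/ITEM_1_ID.nc"
)

print(json.dumps(collection, indent=2))
print(json.dumps(item, indent=2))
```

The license value and the spatial and temporal extent should reflect what was agreed in the Help Desk ticket.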
The dataset should be organized as follows:
MY_DATASET_ID/
    metadata/
        collection.json
        items/
            ITEM_1_ID.json
            ITEM_2_ID.json
            ...
    data/
        ...
If possible, the data folder should be organized by date: ./year/month/day
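The layout described above can be sketched with the Python standard library as follows. This is only an illustration: it assumes that items/ sits under metadata/, that the date folders are zero-padded (YYYY/MM/DD), and it uses hypothetical helper names, a hypothetical data file name, and stub metadata in place of real STAC documents.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def data_dir_for(dataset_root: Path, day: date) -> Path:
    """Return the data sub-folder for a day: data/YYYY/MM/DD (zero-padding assumed)."""
    return dataset_root / "data" / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}"


def write_layout(base: Path, dataset_id: str, collection: dict, items: dict) -> Path:
    """Write metadata/collection.json and one metadata/items/<ITEM_ID>.json per item."""
    dataset_root = base / dataset_id
    items_dir = dataset_root / "metadata" / "items"
    items_dir.mkdir(parents=True, exist_ok=True)
    collection_path = dataset_root / "metadata" / "collection.json"
    collection_path.write_text(json.dumps(collection, indent=2))
    for item_id, item in items.items():
        (items_dir / f"{item_id}.json").write_text(json.dumps(item, indent=2))
    return dataset_root


# Demonstration in a temporary directory, with stub metadata
base = Path(tempfile.mkdtemp())
root = write_layout(
    base,
    "MY_DATASET_ID",
    {"id": "MY_DATASET_ID"},
    {"ITEM_1_ID": {"id": "ITEM_1_ID"}, "ITEM_2_ID": {"id": "ITEM_2_ID"}},
)

# Place one (empty, hypothetical) data file in its date folder
day_dir = data_dir_for(root, date(2024, 1, 5))
day_dir.mkdir(parents=True, exist_ok=True)
(day_dir / "ITEM_1_ID.nc").write_bytes(b"")

# List the resulting tree relative to the temporary base directory
for path in sorted(root.rglob("*")):
    print(path.relative_to(base).as_posix())
```

Grouping data files by date keeps folders small and makes it easy to match each file to the datetime recorded in its item metadata.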
Warning
Make sure to upload both the data and the metadata to your DEDL user space. Otherwise, the DEDL team will not be able to upload your dataset to the data lake.
Step 3 Follow the ticket exchanges
Once the data is ready, the DEDL team will upload your dataset to the data lake. The team will keep you informed of the status of the process and will contact you if there is any issue with the data or its format.
Once the data is fully uploaded and available, you will be notified in the ticket and will be able to access the newly added dataset directly from the DEDL HDA API and the DEDL Web Portal.