Promote user data to become DestinE data

As a DestinE user, you may want to share a dataset you created with the DestinE community. The DestinE Data Lake (DEDL) provides a mechanism for sharing your dataset so that any DEDL user can access it via the HDA API and the DEDL Web Portal.

This document describes how to prepare a new DE dataset and submit it for inclusion in the standard DE offering.

Step 1 Contact the DestinE helpdesk support

The first step in bringing your data to the DestinE community is to contact the DestinE helpdesk and state that you wish to share your dataset.

In your helpdesk ticket, describe the dataset and the benefit it will bring to the DestinE community. Useful information to include covers, among other things:

  • general description of the dataset,

  • the temporal and spatial extent of the data,

  • the format of the data,

  • the input data used to generate the dataset.

The DEDL HDA can restrict access to a dataset to specific users. In your ticket, state the licensing preferences you want to apply to the new dataset.

If the dataset is to be available only to specific users, you can also request the creation of one or more new IAM roles to represent that user group.

A board will examine your request and respond to you directly in the ticket.

Step 2 Prepare your dataset for ingestion by the DEDL HDA

It is the responsibility of the user to provide a dataset compatible with DEDL HDA.

The DEDL HDA exposes a STAC API, so the dataset must be provided in STAC format. This means you need to generate STAC metadata that references your data. To generate this metadata, you can use PySTAC, an open-source Python library for working with STAC that is developed by the STAC community.
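To make the shape of the required metadata concrete, the sketch below builds a minimal STAC Collection and one Item as plain dictionaries using only the Python standard library. It is an illustration of the core STAC fields, not the DEDL ingestion format: the IDs, extents, licence value, asset path, and the relative link to collection.json are all hypothetical. In practice, PySTAC can build and validate equivalent objects for you.

```python
import json
from datetime import datetime, timezone

STAC_VERSION = "1.0.0"


def make_collection(collection_id, description, bbox, start, end, license_id):
    """Return a minimal STAC Collection as a dict."""
    return {
        "type": "Collection",
        "stac_version": STAC_VERSION,
        "id": collection_id,
        "description": description,
        "license": license_id,
        "extent": {
            "spatial": {"bbox": [list(bbox)]},
            "temporal": {"interval": [[start.isoformat(), end.isoformat()]]},
        },
        "links": [],
    }


def make_item(item_id, collection_id, bbox, when, asset_href):
    """Return a minimal STAC Item (a GeoJSON Feature) as a dict."""
    west, south, east, north = bbox
    # Footprint polygon derived from the bounding box (closed ring).
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {
        "type": "Feature",
        "stac_version": STAC_VERSION,
        "id": item_id,
        "collection": collection_id,
        "bbox": list(bbox),
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"datetime": when.isoformat()},
        # Assumed relative location: metadata/items/<item>.json -> metadata/collection.json
        "links": [{"rel": "collection", "href": "../collection.json", "type": "application/json"}],
        "assets": {"data": {"href": asset_href}},
    }


# Hypothetical example values.
bbox = (5.0, 45.0, 15.0, 55.0)
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 1, 31, tzinfo=timezone.utc)

collection = make_collection(
    "MY_DATASET_ID", "Example user dataset", bbox, start, end, "proprietary"
)
item = make_item(
    "ITEM_1_ID", "MY_DATASET_ID", bbox, start, "../../data/2024/01/01/file_1.nc"
)

print(json.dumps(collection, indent=2))
print(json.dumps(item, indent=2))
```

Each dictionary can be written to disk with json.dump as collection.json or as one file per item.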

The following resources will help you create the STAC metadata associated with your dataset:

The dataset should be organized as follows:

MY_DATASET_ID/
    metadata/
        collection.json
        items/
            ITEM_1_ID.json
            ITEM_2_ID.json
            ...
    data/
        ...

If possible, the data folder should be organized by date: ./year/month/day
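The layout above can be created with a short script. The following is a minimal sketch that creates the metadata folders and one date-based data folder per day. The function names are hypothetical, and the zero-padded year/month/day folder names are an assumption, since only the year/month/day ordering is specified here.

```python
from datetime import date
from pathlib import Path


def data_dir_for(root, day):
    """Return the date-based data folder for one day: <root>/data/YYYY/MM/DD."""
    # Zero-padding (e.g. 03 rather than 3) is an assumption, chosen so folders sort correctly.
    return Path(root) / "data" / f"{day.year:04d}" / f"{day.month:02d}" / f"{day.day:02d}"


def create_skeleton(root, days):
    """Create metadata/items/ and one data folder per day; return the data folders."""
    root = Path(root)
    (root / "metadata" / "items").mkdir(parents=True, exist_ok=True)
    created = []
    for day in days:
        folder = data_dir_for(root, day)
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Show the folder that would hold data for 7 March 2024 (nothing is created here).
print(data_dir_for("MY_DATASET_ID", date(2024, 3, 7)))
```

Calling create_skeleton("MY_DATASET_ID", [date(2024, 3, 7)]) creates the folders on disk. The collection.json and item files still have to be written into metadata/ separately.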

Warning

Make sure to upload both the data and the metadata to your DEDL user space. Otherwise, the DEDL team will not be able to upload your dataset to the data lake.

To better understand how the data you submit will be handled, see How to upload DE user-generated data to DE user community.
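Before uploading, a quick local check can confirm that both parts are present. The sketch below is a hypothetical helper, not a DEDL tool: it only verifies that the expected files exist in the layout described above and does not validate the STAC content itself.

```python
from pathlib import Path


def check_dataset_layout(root):
    """Return a list of problems found in the dataset folder (empty if none)."""
    root = Path(root)
    problems = []

    # The collection description must exist.
    if not (root / "metadata" / "collection.json").is_file():
        problems.append("missing metadata/collection.json")

    # At least one item description is expected.
    items_dir = root / "metadata" / "items"
    if not (items_dir.is_dir() and any(items_dir.glob("*.json"))):
        problems.append("no item JSON files in metadata/items/")

    # At least one data file is expected somewhere below data/.
    data_dir = root / "data"
    if not (data_dir.is_dir() and any(p.is_file() for p in data_dir.rglob("*"))):
        problems.append("no files found under data/")

    return problems
```

An empty list from check_dataset_layout("MY_DATASET_ID") indicates that the metadata and data folders are both populated.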

Step 3 Follow the ticket exchanges

Once the data is ready, the DEDL team will upload your dataset to the data lake. The team will keep you informed of the status of the process and will contact you if there is any issue with the data or its format.

Once the data is fully uploaded and available, you will be notified in the ticket and will be able to access the new dataset directly from the DEDL HDA API and the DEDL Web Portal.