Training: Federating and serving distributed data for computational users with EGI DataHub
Conveners
- Lukasz Dutka (CYFRONET)
- Andrea Manzi (EGI.eu)
Description
EGI operates the DataHub service, based on the Onedata technology from CYFRONET. DataHub is a high-performance data management solution that offers unified data access across globally distributed environments and multiple types of underlying storage. It allows researchers to easily share data, collaborate, and perform computations on the stored data.
Users can bring data close to their community or to the compute facilities they use, in order to exploit it efficiently. This is as simple as selecting which (subset of the) data should be available at which supporting provider.
This tutorial will show users and scientific communities how to publish, share, discover and reuse data with the EGI DataHub service.
The main features of DataHub are:
- Discovery of data spaces via a central portal.
- Policy-based data access.
- Replication of data across providers for resiliency and availability purposes.
- Integration with EGI Check-in, which allows access using community credentials, including from other EGI services and components.
- File catalog to track replication of data and manage logical and physical files.
With EGI DataHub, communities can implement various access policies for the data they share:
- Unauthenticated, open access
- Access after user registration
- Access restricted to members of a scientific community
This tutorial presents the EGI DataHub fundamentals, followed by a live, user-oriented demo.
Target audience: This tutorial is designed for scientific communities and IT service providers who are interested in processing large datasets in hybrid cloud scenarios.
Agenda:
Intro
- Basics - Onedata 101
- Intro to EGI DataHub
- DataHub current features
Onedata roadmap for DataHub
- Directory capacity
- Data archive
Hands-on
- Using the Web interface in EGI DataHub