19–23 Jun 2023
Novotel Poznań Centrum
Europe/Amsterdam timezone

Onedata hands-on workshop: Workflow Processing, Dataset Management and Archiving

23 Jun 2023, 09:00
1h 30m
Novotel Poznań Centrum

pl. Andersa 1 61-894 Poznań Poland
Workshop/Training

Speaker

Lukasz Opiola (CYFRONET)

Description

Onedata [1] is a high-performance data management system with a distributed, global infrastructure that enables users to access storage resources worldwide. It supports various use cases ranging from personal data management to data-intensive scientific computations. Onedata has a fully distributed architecture that facilitates the creation of a hybrid-cloud infrastructure with private and commercial cloud resources. Users can collaborate, share, and publish data, as well as perform high-performance computations on distributed data using POSIX-compliant data access applications. The latest Onedata release integrates a workflow execution engine powered by OpenFaaS [2]. This integration enables the creation of complex data processing pipelines that can leverage transparent access to organizationally distributed data. In addition, the new software version offers several new features and improvements that enhance its capabilities in managing distributed datasets throughout their lifecycle.

This hands-on workshop will focus on the latest Onedata release, version 21.02.1. Participants will explore its features through interactive exercises, with a special focus on data processing using automation workflows, distributed dataset management, and archive preservation. Other covered topics will include directory size statistics and the Space Marketplace. The training materials will correspond to the scenarios from the Onedata demonstration, presented during another session. The workshop will be conducted on the Onedata services at EGI DataHub, so that EGI users can easily reproduce the exercises afterwards.

Keywords: workflows, distributed dataset, data lifecycle, data access, distributed systems, file sharing.

Acknowledgments: This work was supported in part by 2018-2020’s research funds in the scope of the co-financed international projects framework (project no. 5145/H2020/2020/2).

[1] Onedata project website. https://onedata.org.
[2] OpenFaaS - Serverless Functions Made Simple. https://www.openfaas.com/.

Key Topic: Data Spaces

Primary authors

Lukasz Opiola (CYFRONET), Michal Orzechowski, Bartosz Kryza, Lukasz Dutka
