30 September 2024 to 4 October 2024
Hilton Garden Inn, Lecce, Italy
Europe/Amsterdam timezone

Running Multi-Cloud Workloads on Distributed Datasets with Onedata

2 Oct 2024, 14:30
30m
Hilton Garden Inn, Lecce, Italy

Demonstrations & Tutorials Demonstrations & Posters

Speaker

Lukasz Opiola (CYFRONET)

Description

[In a nutshell]

This demo, run on the production EGI DataHub service, covers a multi-cloud distributed data management scenario. We will show how data can be ingested, managed, processed repeatably with automation workflows, and accessed through middleware and scientific tools. Everything takes place in a distributed environment, underlining Onedata’s capabilities for collaborative data sharing across organizational borders.

Join us to see the latest developments in Onedata: the S3 interface, an evolved GUI, improved automation workflows, developer tools, and lightweight Python client libraries.


[Full abstract]

Onedata continues to evolve with subsequent releases within the 21.02 line, enhancing its capabilities and solidifying its position as a versatile distributed data management system. Key improvements include the rapid development of the automation workflow engine, the maturation of the S3 interface, and powerful enhancements to the web UI for a smoother user experience and greater control over the distributed data.

In addition, significant effort has gone into enhancing the interoperability of the platform. Onedata can be easily integrated as a back-end storage solution for various scientific tools, data processing and analysis platforms, and domain-specific solutions, providing a unified logical view of otherwise highly distributed datasets. This is achieved thanks to the S3, POSIX, and Pythonic data interfaces and to tools that enable effortless inclusion of Onedata as a third-party solution in CI/CD pipelines. For example, the "demo mode" makes it straightforward to develop and test arbitrary middleware against a fully functional, zero-configuration Onedata backend. With the ability to integrate with SSO and IAM services and to reflect fine-grained federated VO structures, Onedata can serve as a comprehensive data management solution in federated, multi-cloud, and cross-organizational environments. It currently serves this purpose in the ongoing EuroScienceGateway, EUreka3D, and Dome EU-funded projects.
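As a minimal sketch of what the POSIX interface means for downstream tools, the snippet below treats a mounted Onedata space as an ordinary directory tree and summarizes it with the Python standard library only. The mount path `/mnt/onedata/my-space` and the function name `summarize_dataset` are assumptions made for illustration; they are not part of Onedata, and the mounting step itself is not shown.

```python
from pathlib import Path


def summarize_dataset(root: Path) -> dict:
    """Count regular files and total bytes under `root`.

    `root` is assumed to be a directory exposed through a POSIX mount;
    nothing here is specific to Onedata beyond that assumption.
    """
    file_count = 0
    total_bytes = 0
    for path in root.rglob("*"):
        if path.is_file():
            file_count += 1
            total_bytes += path.stat().st_size
    return {"files": file_count, "bytes": total_bytes}


if __name__ == "__main__":
    # Hypothetical mount point of a Onedata space (illustrative only).
    mount = Path("/mnt/onedata/my-space")
    if mount.is_dir():
        print(summarize_dataset(mount))
    else:
        print(f"{mount} is not mounted; nothing to summarize")
```

Because the code only relies on regular file-system calls, the same function works unchanged on a local directory, which is the point of a POSIX-compatible view of distributed data.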

Automation workflows in Onedata can streamline data processing, transformation, and management tasks by automating repetitive actions and running user-defined logic tailored to the user's requirements. The integrated automation engine runs containerized jobs on a scalable cluster next to the data provider's storage systems. Data management and processing steps are thus tightly integrated, which allows efficient handling of large-scale datasets across distributed environments.

During the demonstration, we will present a comprehensive use case, based on the EGI DataHub environment, that shows Onedata's capabilities in managing and processing distributed data. It will showcase a pipeline that embraces the user's federated identity and VO entitlements, automated data processing workflows, the wide range of Onedata's tools for data management, and interoperability with scientific tools and middleware, with a special focus on the S3 interface.

Join us for the demo to see how Onedata empowers organizations to manage and process federated and multi-cloud data efficiently, driving collaboration and accelerating scientific discovery.

Topic Data innovations: Data Management/Integration/Exchange

Primary authors

Lukasz Dutka (CYFRONET)
Lukasz Opiola (CYFRONET)
