30 September 2024 to 4 October 2024
Hilton Garden Inn, Lecce, Italy
Europe/Amsterdam timezone

Leveraging Federated Data Infrastructure for a European Open Web Index

2 Oct 2024, 12:00
15m
San Martino (Hilton Garden Inn)

Speaker

Mr Mohamad Hayek (Leibniz Supercomputing Centre)

Description

In an era where web search serves as a cornerstone of the global digital economy, the need for an impartial and transparent web index has never been greater, not only in Europe but worldwide. At present, the landscape is dominated by a few gatekeepers who provide their web search services with minimal public scrutiny. Moreover, web data has become a pivotal element in the development of AI systems, particularly Large Language Models, whose efficacy depends on both the quantity and the quality of the data available. Consequently, restricted access to web data and search capabilities severely curtails innovation potential, particularly for smaller innovators and researchers who lack the resources to operate petabyte-scale platforms.

In this talk, we present the OpenWebSearch.eu project, which is currently developing the core of a European Open Web Index (OWI) as a basis for a new Internet search in Europe. We focus mainly on the setup of a Federated Data Infrastructure that leverages geographically distributed data and compute resources at top-tier supercomputing centres across Europe. We then detail the use of the LEXIS platform to orchestrate and automate the execution of complex preprocessing and indexing of crawled data at each of the centres. Finally, we present the effort to adhere to the FAIR data principles and to make the data available to the general public.

Topic Data innovations: Data Analytics, Sensitive Data/FAIR Data

Primary authors

Prof. Michael Granitzer (University of Passau)
Mr Mohamad Hayek (Leibniz Supercomputing Centre)

Co-authors

Dr Andreas Wagner (CERN)
Dr Martin Golasowski (VSB - Technical University of Ostrava)
Dr Megi Sharikadze (Leibniz Supercomputing Centre)
Mr Michael Dinzinger (University of Passau)
Ms Noor Afshan Fathima (CERN)
Mr Saber Zerhoudi (University of Passau)
Mr Stavros Moiras (CERN)
Dr Stephan Hachinger (Leibniz Supercomputing Centre)
