EGI Conference 2020

Europe/Amsterdam
URLs assigned per session (Zoom)

Description

The 2020 edition of the EGI Conference has ended; we thank you all for your participation. Recorded sessions can be found on our YouTube playlist: go.egi.eu/egi2020videos

Please share your feedback with us: go.egi.eu/egi2020feedback

Please consult the timetable for the full programme and make sure to check which Zoom room each session will be held in. More information on Zoom rooms can be found on the dedicated page (see left bar).

With our theme “Federated infrastructures for connected communities” we aim to bring together science, computing, and (international) collaboration through a diverse and interactive programme.

Are you new to our annual conferences and interested in how EGI can help your team or community use advanced computing systems? Then the 'New communities' track of the conference is a must-attend for you! The track starts with an introduction to EGI, then goes into detail on the four technical areas of EGI: Computing, Data management, Authentication-authorisation, and Data analytics. Make sure to consult the timetable; these sessions are all coloured light blue.

Additionally, we have a co-located event on Thursday 5 November organised by the EOSC Synergy project, presenting and discussing the Dutch data landscape during a half-day workshop. Please find all information on the dedicated subpage and tick the box in the registration form should you be interested in attending.

Also on the 5th (and 6th): FitSM training. Don't forget to register if you're interested in joining!

Please find all information regarding the programme, speakers, posters, registration and more on this conference page.

For further questions, do not hesitate to contact: events@egi.eu.

----------------------------------------------------------------------------------------------------------------------------------------------------------------------

Registration remains open for the duration of the conference.

----------------------------------------------------------------------------------------------------------------------------------------------------------------------

    • EGI 101: introduction to EGI

      Room: http://go.egi.eu/zoom1

      The EGI Federation is an international e-Infrastructure set up to provide advanced computing and data analytics services for research and innovation. EGI federates compute resources, support teams and various online services from over 20 countries and international institutes, and makes these available to researchers, innovators and educators. This session will give a basic introduction to the EGI Federation, covering all the fundamental topics that you need to know about EGI before deep-diving into the conference. The session will also include time for Q&A.

      Main target audience
      Scientists, scientific community representatives and compute service providers.

      Convener: Gergely Sipos (EGI.eu)
    • 9:30 AM
      Coffee break
    • Opening plenary

      Room: http://go.egi.eu/zoom1

      Officially kicking off the three-day virtual conference! We are grateful for the opportunity to still be able to meet, albeit through Zoom. In the opening plenary, EGI Foundation director Tiziana Ferrari will give an update on current activities as well as some future plans - make sure you don't miss it! Speaking of the future, Arjen van Rijn (EGI Federation Council chair) and Sergio Andreozzi (Head of Strategy, Innovation and Communications, EGI Foundation) will tell us all about the strategy for the coming years.
      To conclude the session, Grazyna Piesiewicz, Head of Unit C1: eInfrastructure and Science Cloud, will present the future role of e-Infrastructures in 2021-2027.

      Conveners: Arjen van Rijn (NIKHEF), Tiziana Ferrari (EGI.eu), Sergio Andreozzi (EGI.eu), Liina Munari (European Commission)
      • 1
        Welcome to the conference and EGI federation new members
        Speaker: Arjen van Rijn (NIKHEF)
      • 2
        EGI Federation status and strategy 2021-2024

        In this presentation we will showcase the status and future directions of the EGI Federation addressing the role and future of scientific computing in the coming decade. We will inform the audience about the status of the EGI Federation, our success stories, the status of the EGI strategy implementation, and our collaborations.

        Speakers: Sergio Andreozzi (EGI.eu), Tiziana Ferrari (EGI.eu)
      • 3
        EC future directions for e-Infrastructures
        Speaker: Liina Munari (European Commission)
    • 11:00 AM
      Coffee break
    • Authentication and Authorisation services in EGI - Overview and use cases

      Room: http://go.egi.eu/zoom1

      This session provides an overview of Check-in, the EGI security service that enables users to log in and be authorised on different services across EGI, and even beyond the EGI federation.

      Check-in is operated by GRNET for the EGI community and its partners as a central hub that connects federated Identity Providers (IdPs) with Service Providers (SPs). Check-in allows users to log in to services using their preferred IdP (either institutional or social) in a uniform and easy way. Check-in is connected to all EGI services and is also used as an authentication and authorisation system in the European Open Science Cloud.
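
      For those curious about the plumbing: Check-in implements standard OpenID Connect, so a service can discover its endpoints programmatically. A minimal sketch in Python (the discovery URL follows the standard OIDC layout but should be verified against the Check-in documentation):

        # Fetch the OpenID Connect discovery document of EGI Check-in.
        # The URL is an assumption based on the standard OIDC discovery layout.
        import requests

        resp = requests.get(
            "https://aai.egi.eu/oidc/.well-known/openid-configuration",
            timeout=10)
        resp.raise_for_status()
        config = resp.json()

        # A connected service (SP) reads these endpoints to start an auth flow.
        print(config["authorization_endpoint"])
        print(config["token_endpoint"])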

      During this session we will introduce Check-in and its technical components. We will present various options on how Check-in can support research communities. We will also feature a few research communities who will report on their experiences in using Check-in to enable their research workflows in distributed environments.

      The session is of introductory level, aiming to serve new communities that want to engage with and use the EGI Check-in service.

      Main target audience
      Scientists, representatives of scientific communities

      Convener: Valeria Ardizzone (EGI.eu)
    • Container Security: what could possibly go wrong?

      Room: http://go.egi.eu/zoom4

      Containers are an emerging technology that is becoming more and more popular as an alternative to full virtualization. The flexibility of containers has enabled several new approaches in IT, such as DevOps. However powerful and attractive the technology is, we should not ignore the security implications that are inherently linked to its principles and main implementations.

      The goal of this training session is to discuss selected aspects of containers and present potential security threats. We will focus mainly on Docker and its typical use cases. The main principles, however, also apply to other container technologies. The workshop is not meant to be an exhaustive training session covering every possible aspect of the technology. Its purpose is to point out some typical problems that are important to consider, some of which are inspired by real-world security incidents.
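
      As a taste of the kind of problem the session covers, the hedged sketch below (not official workshop material) uses the Docker SDK for Python to flag two classic misconfigurations on a running host:

        # Flag running containers with risky privileges.
        # Requires the docker SDK for Python and access to the Docker socket.
        import docker

        client = docker.from_env()
        for container in client.containers.list():
            host_config = container.attrs.get("HostConfig", {})
            if host_config.get("Privileged"):
                print(f"{container.name}: runs privileged (full host access)")
            binds = host_config.get("Binds") or []
            if any("/var/run/docker.sock" in b for b in binds):
                print(f"{container.name}: mounts the Docker socket "
                      f"(potential container escape)")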

      The workshop will be organized as a mixture of technical presentations, interleaved with shorter sessions where attendees will be able to practise the topics just covered in a couple of hands-on exercises.

      Conveners: Daniel Kouril (CESNET), Michaela Doležalová (Masaryk University)
    • EGI Federation partnerships with research communities, industry and infrastructure providers

      Room: http://go.egi.eu/zoom2

      As EGI is a multi-organisation federation, there are a number of ways to formally participate. This session is dedicated to providing an update on how key stakeholders, from research and academia as well as private entities, can partner with EGI.

      Organisations representing national e-infrastructures join EGI with a mission to provide distributed data and computing services for international user communities. Joining the EGI Federation means becoming part of a well established community serving international research and innovation.

      Regarding industry, the term Digital Innovation Hub is increasingly used to describe public-private partnerships such as the EGI business engagement programme, which has been in operation since 2015 but dates back to the mid-2000s. The newly branded EGI Digital Innovation Hub offers an opportunity for private entities to formally engage with EGI to conduct innovation activities, such as running pilots and proofs of concept, accessing EGI technical services such as compute, and taking advantage of the wealth of expertise and networking opportunities that the EGI community offers.

      Attendees will hear directly from organisations regarding their experience and benefits in having partnered with EGI. In addition, there will be dedicated time allocated for discussion and questions.

      Convener: Giuseppe La Rocca (EGI.eu)
    • Experience from Around the World with Federated Clouds and Data: Lessons for a European Cloud Federation

      Room: http://go.egi.eu/zoom3

      In 2021-2027, the European Commission will invest in a High Impact Project on European data spaces and federated cloud infrastructures. Research communities have built many successful federations of cloud infrastructure, services and data. What can we learn from them? Research leaders will share their experience and highlight best practices and lessons learned. Panelists will discuss scope, structure and governance, as well as technical, architectural and service management topics.

      Convener: Mark Dietrich (Senior Advisor, EGI.eu)
      • 14
        ENES climate data infrastructure

        In the last decade, the continuous increase in the volume of scientific data has forced a shift in the data analysis approach, leading to the development of a multitude of data analytics platforms and systems capable of handling this data deluge. All these innovations have propelled the community towards the definition of novel virtual environments for efficiently dealing with complex scientific experiments, while abstracting from the underlying infrastructure complexity.
        In this context, the ENES Climate Analytics Service (ECAS) aims to enable scientists to perform data analysis experiments over large multi-dimensional data volumes, providing a workflow-oriented, PID-supported, server-side and distributed computing approach. Two instances of ECAS are running at CMCC and DKRZ, respectively, in the scope of the European Open Science Cloud (EOSC) platform, under the EU H2020 EOSC-Hub project. ECAS builds on top of the Ophidia High Performance Data Analytics framework, which has been integrated with AAI solutions (e.g. EGI Check-in, IAM), data access and sharing services (e.g. EGI DataHub, EUDAT B2DROP/B2SHARE), and with the EGI federated cloud infrastructure.
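
        To give a flavour of the server-side approach, the hedged sketch below uses PyOphidia, the Python bindings of the Ophidia framework; the server address and credentials are placeholders, and the exact parameters should be checked against the ECAS documentation:

          # Submit array operations to a (hypothetical) Ophidia/ECAS server;
          # only commands and summaries travel over the network, not the data.
          from PyOphidia import cube

          cube.Cube.setclient(server="ecas.example.org", port="11732",
                              username="demo", password="demo")
          tos = cube.Cube.importnc(src_path="/data/tos.nc", measure="tos",
                                   imp_dim="time")   # import a NetCDF variable
          mean = tos.reduce(operation="avg")         # server-side reduction
          mean.exportnc2(output_path="/tmp")         # fetch only the result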

        The ECASLab virtual environment, based upon ECAS and the JupyterHub service, aims to provide a user-friendly data analytics environment to support scientists in their daily research activities, particularly in the climate change domain, by integrating analysis tools with scientific datasets (e.g., from the ESGF data archive) and computing resources (i.e., Cloud and HPC-based).

        ECAS is one of the platform configurations made available to users from the EGI Applications on Demand (AoD) service. Thanks to the integration into the Elastic Cloud Compute Cluster (EC3) platform, operated by UPV, researchers can very easily deploy and configure a full ECAS environment on the EGI FedCloud. The EC3 service not only takes care of managing the setup and contextualization of the entire ECAS cluster, but also manages the elasticity of the environment by scaling up/down the cluster size on the cloud resources based on the current workload. This integration will effectively support scientists and help advance their research by exploiting a custom ready-to-use environment without the burden of the platform setup. With respect to security and data access, a stronger integration with EGI services will be part of the future work to provide an even smoother experience to ECAS users.
        This talk will present the ECAS environment and the integration activities performed in the context of EOSC-Hub, with a special focus on the integration with the EGI federated cloud infrastructure.

        Speaker: Dr Stephan Kindermann (Deutsches Klimarechenzentrum (DKRZ))
      • 15
        European Weather Cloud – a federated approach
        Speaker: Dr Martin Palkovic (Director of Computing ECMWF)
      • 16
        SeaDataNet: Developments towards harmonised marine data access from the cloud
        Speaker: Peter Thijsse (Project Manager, Maris BV)
      • 17
        Developing Standards for Genomic Exchange in GA4GH
        Speaker: Dr Rishi Nag (Technical Programme Manager, Global Alliance for Genomics and Health)
      • 18
        Panel discussion
        Speakers: Mark Dietrich, Martin Palkovic (Director of Computing ECMWF), Peter Thijsse (Project Manager Maris BV), Rishi Nag (Technical Programme Manager Global Alliance for Genomics and Health), Stephan Kindermann (DKRZ)
    • Lunch break
    • Achieving a Photon and Neutron community federated cloud in EOSC

      Room: http://go.egi.eu/zoom3

      Using Photon and Neutron sources to investigate samples of matter at the molecular or atomic level applies to very diverse science disciplines, ranging from chemistry and life science to palaeontology and art history. The scientific communities represented by PaNOSC (for European Research Infrastructures, RIs) and ExPaNDS (for national RIs) are extremely wide-ranging, embracing over 40,000 researchers; due to this breadth of subjects, the data produced is extremely diverse.

      One of the shared high-level objectives of our projects is to give scientists the means to make the most of this wealth of data, created every year by the PaN community, and we identified EOSC as the perfect tool for achieving this goal.

      During this session we will present our contributions to:

      • Enabling our facilities to produce FAIR data
      • Federating our data catalogues on the 'PaN portal'
      • Enabling the transfer of large quantities of data
      • Sharing knowledge with the open PaN e-learning platform
      • Providing remote access to PaN facilities' instruments and services

      Conveners: Patrick Fuhrmann (DESY), Sophie Servan (DESY)
    • Clinic: Authentication-Authorisation services

      Room: http://go.egi.eu/zoom2

      This session will provide technical support for existing and new users of the EGI Check-in service and of all the AAI services connected to it, both for user enrollment and management and for the integration of services: VOMS, Perun, COmanage, RCauth and MasterPortal. During the session, experts will share technical information and usage tips and tricks about the AAI services, and will answer questions from the audience. The session will be interactive - a perfect opportunity to bring questions and to deep-dive into the EGI Authentication and Authorisation services!

      EGI Check-in is a proxy service that operates as a central hub to connect federated Identity Providers (IdPs) with Service Providers (SPs). Check-in allows users to select their preferred IdP so that they can access and use EGI, EOSC and other services in a uniform and easy way.

      Main target audience
      Scientists, representatives of scientific communities, software and platform developers.

      Convener: Valeria Ardizzone (EGI.eu)
    • Highlights from EGI participants and partners - Part 1

      Room: http://go.egi.eu/zoom4

      This session offers a space for EGI Federation Participants and Partners to share the latest strategic and infrastructure developments in their domain.

      The first part will focus on three presentations from initiatives in Bulgaria, Italy and Germany.

      Convener: Sergio Andreozzi (EGI.eu)
      • 32
        New developments in the Bulgarian National Centre for High Performance and Distributed Computing

        The Bulgarian National Centre for High Performance and Distributed Computing (NCHDC, http://nchdc.acad.bg/en/) was established as part of the National Roadmap for Research Infrastructures in 2014 and has continued support in the updated roadmap for 2017-2023. The main resource of the centre is the supercomputer Avitohol, which comprises 150 servers with dual Intel CPUs and dual Intel Xeon Phi coprocessors, interconnected with non-blocking InfiniBand. After a significant reconstruction of the datacenter at IICT, new hardware and software has been acquired, and more is on the way.

        For several years Avitohol has provided resources for operational meteorological forecasting, so that prognoses for atmospheric pollution in Sofia can be made available to citizens and policymakers.
        The centre supports a diverse set of national user communities and forms the cornerstone of the partners' participation in large-scale international collaborations.
        The most intensive usage comes from scientists working in computational chemistry and drug research, climate modelling and computational physics. Attractive new resources optimized for AI and Big Data processing are becoming operational.

        The long-term strategy for expanding the capabilities of the centre is supported through a set of national and EU funding schemes, complemented with soft measures.

        E. Atanassov and A. Karaivanova
        Institute of Information and Communication Technologies, Bulgarian Academy of Sciences

        Acad. G. Bonchev Str., bl. 25-A, 1113 Sofia, Bulgaria
        (emanouil,anet)@parallel.bas.bg

        Speaker: Emanouil Atanassov (IICT-BAS)
      • 33
        INFN-Cloud, an easy to use, distributed, user-centric Cloud infrastructure and solutions toolbox

        Following up on 20 years of successful development and operation of the largest Italian research e-infrastructure through the Grid, the Italian National Institute for Nuclear Physics (INFN) recently created INFN-Cloud, an integrated and comprehensive cloud-based set of solutions delivered through distributed and federated infrastructures. INFN-Cloud consists of two main types of resources: the “INFN-Cloud backbone”, spanning the two main INFN computing sites of CNAF and Bari, and a set of distributed, federated cloud infrastructures connected to the backbone. It provides a large and customizable set of services, ranging from simple IaaS to specialized SaaS solutions, centred on a PaaS layer built upon flexible authentication and authorization services offered via INDIGO-IAM, and on optimized resource and service orchestration.

        This talk will describe the INFN-Cloud architecture and implementation. Services offered via INFN-Cloud are instantiated through TOSCA templates. Currently, INFN-Cloud provides a set of about 20 ready-to-use templates that can be used to deploy services on any of its federated cloud resources. This is implemented via a PaaS layer based on the INDIGO-DataCloud Orchestrator. All services are presented to users via an easy-to-use web dashboard, but can also be instantiated via a command-line interface.
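
        As a hedged illustration of this mechanism (the endpoint, token and parameter names are placeholders to be checked against the public INDIGO Orchestrator API), a deployment request boils down to POSTing a TOSCA template:

          # Submit a TOSCA template to an INDIGO-DataCloud PaaS Orchestrator.
          import requests

          ORCHESTRATOR = "https://paas.example.infn.it/orchestrator"  # placeholder
          TOKEN = "<IAM access token>"   # obtained via INDIGO-IAM

          with open("my_service.tosca.yaml") as fh:
              template = fh.read()

          resp = requests.post(
              f"{ORCHESTRATOR}/deployments",
              json={"template": template, "parameters": {"num_cpus": 2}},
              headers={"Authorization": f"Bearer {TOKEN}"},
              timeout=30)
          resp.raise_for_status()
          print(resp.json()["uuid"])   # deployment id, useful to poll for status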

        The INFN-Cloud PaaS Layer also handles the federation of resources, based on a lightweight approach minimizing the technical barriers for joining the INFN-Cloud federation. Using this approach, INFN-Cloud solutions address data-locality, SLAs, auto-scalability and elastic allocation of the resources even in widely distributed environments. In fact, INFN-Cloud can easily federate with other Cloud infrastructures, such as the EGI FedCloud, other research or academic resources or projects in the context of the EOSC, HPC resources, as well as with public Cloud providers, such as Amazon Web Services, and is being proposed as a blueprint to the Italian Cloud and Data Infrastructure (ICDI) to create the Italian national research cloud.

        The INFN-Cloud architecture is designed to exploit highly heterogeneous cloud resources in terms of hardware (including CPUs, GPUs, low-latency networks and disks), cloud technologies (such as OpenStack, Mesos and Kubernetes), deployment models (supporting private and public clouds) and service delivery (supporting generic workloads, as well as GDPR-related and sensitive data processing). The talk will discuss the general technical solutions adopted to implement PaaS-level automation, federation mechanisms, the web dashboard and the services already implemented. More details are provided in other contributions proposed to the EGI conference, tailored to describe some specific operational and technical solutions.

        The talk will also describe the organizational and operational structure of INFN-Cloud, as well as its Rules of Participation, which define the operational and security requirements, policies, processes and procedures that must be implemented by all sites joining the INFN-Cloud federation. These rules aim to implement service provisioning best practices and common procedures in order to guarantee the high quality of the services provisioned through INFN-Cloud.

        Finally, the expected evolutions and the potential impact of the INFN-Cloud architecture will be covered, especially in the context of the ongoing integration and collaboration between the public and private sectors and of multi-disciplinary trans-national federation of heterogeneous resources.

        Speaker: Doina Cristina Duma (INFN)
      • 34
        Helmholtz Federated IT and Accessible Compute Resources for Applied AI Research

        Research is increasingly driven by close cooperation between teams from multiple institutions, and often disciplines, leveraging rich scientific expertise and capacities. This puts pressure on the seamless and highly performant interaction of heterogeneous IT services. Such services include large data transfers, high performance computing, and post-processing and analysis pipelines; add to that documentation services and collaboration tools of all kinds, whose importance has been demonstrated forcefully this year.

        The Helmholtz Association, with its more than 40,000 employees, performs outstanding research covering the research fields Energy, Earth & Environment, Health, Aeronautics, Space & Transport, Matter, and Key Technologies. The Helmholtz Information & Data Science Incubator [1] was founded in 2016 to combine and enhance the diverse, decentralised expertise of the Association in the pioneering field of Information and Data Science.

        Embedded in this context, the platform “Helmholtz Federated IT Services” (HIFIS) [2] has been established to provide such common access to IT resources, as well as training and support for professional and sustainable scientific software development. From the very beginning, the requirements of the whole Helmholtz scientific community have been surveyed extensively, now allowing services to be shaped and operated according to their needs.

        An organizational and technological key to common access to IT services is the establishment of an authentication and authorization infrastructure, named Helmholtz AAI, which is based on the European AARC blueprint architecture. Building on this infrastructure, cloud services for data management, collaboration and scientific work are offered as prototypes and will go into production by the end of 2020. The full establishment of such a federated IT infrastructure in Helmholtz will allow closer partnering with European IT and research communities such as EGI and EOSC.

        Access to powerful compute resources is crucial for the machine learning (ML) and artificial intelligence (AI) community. For Helmholtz, this has been pushed forward by the “Helmholtz AI computing resources” (HAICORE) initiative, with resources being installed and operated at two research sites. A first HAICORE resource, comprising the latest NVIDIA DGX A100 systems, has recently been installed - one of the first such installations in Europe [3]. Common access to these capacities is established using the AAI service offered by HIFIS.

        Fostering collaborative research in applied ML/AI, crosslinking all of Helmholtz's research fields, is the main objective of the cooperation platform “Helmholtz AI”. In close cooperation with HIFIS, and exploiting the capacities of HAICORE, project calls are delivered by Helmholtz AI. Starting in 2020, this has allowed the funding of 19 AI projects overall [4], with durations of two to three years, involving all Helmholtz research fields.

        In this talk, we will present the interaction of these new infrastructural and scientific initiatives and the advantages of their design as long-term, sustainable platforms, and we will demonstrate the benefits of integrating the needs of the scientific communities from the very beginning in order to sustainably foster cross-institutional and multi-disciplinary research.

        [1] https://www.helmholtz.de/en/research/information-data-science/helmholtz-incubator/
        [2] https://www.hifis.net
        [3] https://www.kit.edu/kit/english/pi_2020_056_super-fast-ai-system-installed-at-kit.php
        [4] https://www.helmholtz.ai/themenmenue/our-model/funding-lines/project-call-2019/index.html

        Speaker: Dr Uwe Jandt (DESY, on behalf of HIFIS, HAICORE, Helmholtz AI initiatives of the Helmholtz Association)
    • Workflow Management solutions

      Room: http://go.egi.eu/zoom1

      This session will include Workflow Management related presentations that were submitted to the conference and selected by the programme committee.

      Convener: Sorina POP (CNRS)
      • 35
        Workflow Platform for Machine Learning and Non-machine Learning Applications

        ML research usually starts with a prototype on a single machine with CPU only. As a project grows, it typically has to go through two major transitions: from laptop to data centre, and from data centre to clouds. Major rework is often required for each transition. Researchers often have little expertise in either core facilities or clouds. Many projects experience unnecessary growing pains or even fail to reach production-quality products.

        Kubeflow has largely unified the computing across these three different environments. We still need federated data access to unify the storage as well. This is particularly important for data-intensive ML applications, such as image classification, as well as for non-ML applications, such as genomic sequence analysis. On the one hand, the algorithms require large amounts of data, in the TB to PB range. On the other hand, many implementations assume that all data is stored on a local disk. We therefore use Onedata, a FUSE-based utility, to present data to workflows as files in a local POSIX-like file system. When we shift the computing between the three environments, we neither copy data into the respective environments nor change the implementation for different data storage.
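
        The point is easiest to see in code. A minimal sketch (paths are illustrative) of what POSIX-like access through a oneclient mount means for a training script:

          # Once oneclient has mounted the Onedata space, code reads files
          # exactly as from a local disk, so the same script runs unchanged
          # on a laptop, in the data centre, or in the cloud.
          from pathlib import Path

          DATA_ROOT = Path("/mnt/onedata/MyImagingSpace")  # assumed mount point

          images = sorted(DATA_ROOT.glob("cardiomyocytes/*.tiff"))
          print(f"{len(images)} images visible as ordinary POSIX files")
          with images[0].open("rb") as fh:
              header = fh.read(8)   # plain file I/O, no storage-specific API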

        We have run an image classification notebook workflow in all three environments. The source images are cardiomyocyte tissues from the Image Data Repository (IDR). The results clearly demonstrate how training and verification become significantly faster when moving from a single machine to a local data centre with CPU only, and then to a cloud with GPUs. Without making any changes to the Python script in the notebook, we are able to make use of more resources in the core facility and of GPUs in the Google cloud to accelerate training and validation significantly. With much improved throughput, we were able to experiment quickly with various image augmentations, many different image classification models, and hyperparameters, both in the core facility and in the clouds with different GPU models.

        We have also run a hand-crafted non-ML pipeline for classic variant calling on the same platform. We express the directed acyclic graph (DAG) as Python functions and function calls with the DSL. The functions are backed by our custom containers, created with any programming language. The Kubeflow DSL compiler turns them into the highly detailed YAML files for Argo on Kubernetes. The time and effort needed to create a pipeline from scratch is greatly reduced.
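
        A hedged sketch of this pattern with the kfp v1 DSL (image names and commands are placeholders, not our actual pipeline):

          # Each step is a container; Python function calls define the DAG;
          # the DSL compiler emits Argo-compatible YAML for Kubernetes.
          import kfp.dsl as dsl
          from kfp.compiler import Compiler

          @dsl.pipeline(name="variant-calling",
                        description="Classic non-ML pipeline")
          def variant_calling(sample: str = "NA12878"):
              align = dsl.ContainerOp(name="align",
                                      image="example/bwa:latest",
                                      command=["align.sh"], arguments=[sample])
              call = dsl.ContainerOp(name="call-variants",
                                     image="example/gatk:latest",
                                     command=["call.sh"], arguments=[sample])
              call.after(align)   # an edge in the DAG

          Compiler().compile(variant_calling, "variant_calling.yaml")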

        Kubeflow, as a brand-new ML workflow platform, is little known to the Computational Biology and Bioinformatics communities. We have successfully enhanced it with an integration with FUSE-based utilities. This gives us the flexibility to leverage more computing resources, faster networks on internet backbones, and the latest GPU models without changing our implementation. It allows us to use commercial cloud resources in a cost-efficient manner, where GPUs may be charged by the second instead of being reserved for the whole duration of a batch job. It also allows us to combine ML workflows and classic non-ML workflows on a single unified platform.

        Speaker: Dr David Yuan (European Bioinformatics Institute)
      • 36
        Towards FAIR CryoEM workflows in EOSC

        Scipion is an application framework developed by the Instruct Image Processing Center (I2PC) in Madrid to help the Structural Biology community process CryoEM data. Scipion is a plugin-based workflow management system that integrates the most relevant solutions in the community, allowing scientists to use them without worrying about formats and conversions.

        CryoEM processing starts at the microscope facilities, where specific preprocessing workflows are run in Scipion to obtain the first quality data out of the raw microscope images as they are produced. When the acquisition finishes, biologists leave the facility with both the raw data and the Scipion project, and ideally continue processing with Scipion in their home labs or computing centres.

        In the case of Instruct-funded projects, scientists benefit from grants to obtain data at Instruct facilities, but they also have to comply with the Instruct Data Policy Management Plan, which states that all data obtained has to be made public after a certain embargo period, including both results and raw data.

        Due to the recent evolution of CryoEM techniques and algorithms, processing demands more and more CPU and, more importantly, GPU power, which implies that research institutions have to invest in expensive hardware and in qualified staff to administer it. Cloud computing appears as a natural solution to overcome this problem, providing not only the best hardware but also packaged images containing scientific software that allow a complete processing environment to be deployed in a very short time.

        With the above in mind, and in the context of former European projects such as WestLife, MoBrain and other Instruct-funded projects, I2PC developed the ScipionCloud images and made them available in the EGI AppDB and on AWS. However, users still need to know how to deploy and manage instances in the cloud and how to optimize the use of cloud resources.

        This has been the main motivation to create a ScipionCloud service within EOSC that will use standard EOSC services to facilitate cloud deployment and user access, as well as an optimized usage of cloud resources. This service is currently being implemented as a thematic service in the EOSC-Synergy project and in the first phase will only be available to Instruct users, with the potential to be opened to other users by means of access control. In parallel, and in the context of the EOSC-Life project, a different aspect is targeted: FAIR data and workflow compliance, while addressing the Instruct DPMP to guarantee data publication after the required embargo.

        This complex scenario involves moving to container technology (Docker) and using existing EOSC core services such as EGI Check-in for access control, and the Infrastructure Manager and EC3 for elastic cluster deployments. Data produced at the facility will be sent to cloud storage where the service can access it. Moreover, to ensure data FAIRness, an existing Scipion plugin used to deposit workflows and data in the EBI EMPIAR database will be enhanced to produce a CWL file with the workflow description and submit it, packed into an RO-Crate, to the EOSC-Life WorkflowHub repository.

        Speaker: Laura del Cano (CSIC)
      • 37
        Workflow Orchestration on Tightly Federated Computing Resources: the LEXIS approach

        Keywords:

        • Scientific application workflow modelling and orchestration
        • Multi-site Cloud and HPC federation
        • Federated security

        Summary:

        This presentation aims to illustrate the key ideas and technologies behind the LEXIS Orchestration Service and the LEXIS AAI. Starting from a review of the state of the art in computing resource federation, the innovative approach of the LEXIS platform will be motivated. Finally, a detailed illustration of the LEXIS orchestrator, along with the main relevant aspects of dynamic resource allocation and of the LEXIS AAI, will be provided.

        Speakers: Dr Alberto Scionti (LINKS Foundation), Mr Marc Levrier (Atos)
    • 2:30 PM
      Coffee break
    • Demos 1

      Room: http://go.egi.eu/zoom1

      Some things are best demonstrated, especially when it comes to technical services. Just before the coffee break, we offer a 30-minute slot for submitted demos to show these services, outputs, or any other activity relevant to the conference's theme.

      • 38
        Running batch jobs opportunistically across dynamic hybrid multi-clouds

        The use of clouds for running scientific workloads, in particular HTC applications, is well established. In recent years the use of clouds for also running HPC applications, which typically require low-latency interconnects, has been gaining momentum. As such, many platforms have been developed for creating clusters in clouds or for enabling local batch systems to burst into clouds. Generally, however, these platforms assume that only one or a few large clouds, where resources are guaranteed, will be used. With the increasing deployment of clouds within national e-infrastructures and increasing access to public clouds, it is becoming more and more important to provide ways for users to easily run their workloads across any number of clouds.

        PROMINENCE is a platform, originally developed within the Fusion Science Demonstrator in EOSCpilot and currently being extended through the EGI Strategic and Innovation Fund, that allows users to transparently run both HTC and HPC applications on clouds. It was designed from the ground up not to be restricted to a single cloud but to be able to use multiple clouds simultaneously in a dynamic way, including many small clouds used opportunistically. From the user's perspective it appears like a normal batch system, and all infrastructure provisioning and failure handling is totally invisible. All jobs are run in containers to ensure they will run reliably anywhere and are reproducible. POSIX-like access to data can be provided (leveraging technologies such as Onedata), or data can be staged in and out of jobs from object storage.
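
        What "appears like a normal batch system" means in practice: a job is a small JSON document handed to PROMINENCE. The hedged sketch below submits one over a REST API; the URL, token handling and exact field names are assumptions to be checked against the PROMINENCE documentation:

          import requests

          API = "https://prominence.example.org/v1"   # placeholder endpoint
          TOKEN = "<OIDC access token>"   # e.g. obtained via EGI Check-in

          job = {
              "name": "pi-estimate",
              "resources": {"nodes": 1, "cpus": 4, "memory": 4, "disk": 10},
              "tasks": [{"image": "example/pi:latest",   # placeholder image
                         "runtime": "singularity",
                         "cmd": "python pi.py"}],
          }
          resp = requests.post(f"{API}/jobs", json=job,
                               headers={"Authorization": f"Bearer {TOKEN}"},
                               timeout=30)
          resp.raise_for_status()
          print("job id:", resp.json().get("id"))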

        This demonstration will begin with a walk-through of the features provided by PROMINENCE and further show how easy it is to run jobs and workflows across multiple cloud providers.

        Speaker: Andrew Lahiff (CCFE / UK Atomic Energy Authority)
      • 39
        Cloud benchmarking and validation test suite

        The Helix Nebula Science Cloud (HNSciCloud) project developed a hybrid cloud linking commercial cloud service providers and research organisations' in-house resources via the GÉANT network. Following HNSciCloud, the Open Clouds for Research Environments (OCRE) project leverages its experience in the exploitation of commercial cloud services, currently being considered by the European research community as part of a hybrid cloud model to support the needs of their scientific programmes. In parallel to OCRE, the Archiving and Preservation for Research Environments (ARCHIVER) project, led by CERN, combines multiple information and communications technologies, including extreme data-scaling, network connectivity, service interoperability and business models, in a hybrid cloud environment to deliver end-to-end archival and preservation services that cover the full research lifecycle.

        During the testing phases of HNSciCloud it became evident how challenging such tasks can be, as cloud service providers offer a variety of products that are often unknown to the research community. In order to test and compare the performance and adequacy of services for multiple scientific domains (High Energy Physics, Life Sciences, Photon-Neutron Sciences, Astronomy), an automated testing suite that runs a set of tests and benchmarks across all cloud stacks is needed. Such a framework strongly supports the testing activities of both the ARCHIVER and OCRE projects.

        CERN IT developed a test-suite that builds on the testing activities of HNSciCloud, where the European research organisations involved - CERN being one of them - representing multiple use cases from several scientific domains, put together more than thirty tests and benchmarks. The tool was designed to be as modular and autonomous as possible, utilising open-source technologies that are well established in industry and allow easy deployment and transparent assessment. It relies first on Terraform for resource provisioning, followed by Ansible for the configuration of the compute instances and the bootstrapping of a Kubernetes cluster, which provides the abstraction layer so that the tests eventually run in Docker containers. Finally, results are pushed to an S3 bucket hosted on the CERN OpenStack cloud. The test catalogue offers functional and performance benchmarks in several technical domains, such as compute, storage, HPC, GPUs, network connectivity performance, and advanced containerised cloud application deployments, but also covers upper levels of the stack, for example evaluating the degree of “FAIRness” of data repository services or federated AAI protocols. The concept is expected to be expanded and to become a best practice for the on-boarding of commercial services for research environments in the context of the European Open Science Cloud, ensuring that cloud offerings conform to the requirements and satisfy the needs of the research community.
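
        The provisioning chain can be pictured as a thin driver around the tools named above (a conceptual sketch; the playbook and manifest names are invented):

          # Terraform provisions instances, Ansible bootstraps Kubernetes,
          # and the benchmarks then run as containers on the cluster.
          import subprocess

          def run(cmd):
              print("$", " ".join(cmd))
              subprocess.run(cmd, check=True)

          run(["terraform", "init"])
          run(["terraform", "apply", "-auto-approve"])
          run(["ansible-playbook", "-i", "hosts", "k8s-cluster.yaml"])
          run(["kubectl", "apply", "-f", "benchmarks/"])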

        In this demonstration the test-suite will be run end to end, from the configuration of the tool to the gathering of results, using as examples a variety of networking tests and CPU benchmarks, as these two areas are relevant for most domains.

        Speaker: Ignacio Peluaga (CERN)
    • Demos 2

      Room: http://go.egi.eu/zoom4

      Some things are best demonstrated, especially when it comes to technical services. Just before the coffee break, we offer a 30-minute slot for submitted demos to show these services, outputs, or any other activity relevant to the conference's theme.

      • 40
        Fusion's FAIR Data Portal

        While Fusion is one of the oldest 'official' scientific communities, it has not fully embraced modern technical solutions to ease user access and permit interoperability and findability between data generated at different experimental sites. This has been due to a number of very good reasons: different funding models, national strategic importance, differing legal frameworks. However, the FAIR4Fusion project aims to help fusion data become at least FAIR within the community by making data findable, accessible and citable by any fusion researcher. The project has gathered a community from 5 large European tokamaks (ASDEX Upgrade, JET, MAST-U, TCV and WEST), including data managers and users, who have provided a coherent set of requirements; these have been augmented with requirements from FAIR to allow us to improve present practices and enable better interactions with the EOSC. To this end, we are embarking on the parallel development of two demonstrator projects, one making use of existing tools within the fusion community and one looking at alternative technologies, both of which will feed a reference architecture for future implementation.

        The first demonstrator being prepared makes use of existing tools within the community that have been extended by the F4F project and dockerised: the JET dashboard forms the basis for the data query layer; EUROfusion's Catalog QT provides the underlying information storage and indexing power; the ITER-developed Unified Data Access layer handles information transport and mapping; and the ITER/EUROfusion Interface Data Structure (IDS) provides a basis for metadata modelling. The second demonstrator, also dockerised, is designed to accommodate experimentation with technologies and approaches, again using a container-based approach, that could influence future developments. Its UI is also based on the JET dashboard, whereas its backend technologies are loosely coupled to, but not dependent on, the ITER technologies. By making data more accessible, it is anticipated that more use could be made of services provided by the EGI Foundation, particularly federated cloud resources and notebooks for applications and data processing, which would further improve the FAIRness of the work undertaken by the fusion community. As it integrates with joint EUROfusion and ITER infrastructure tools, it will be extendable towards future exploitation at an extended set of European devices as well as ITER.

        This demonstration will show the progress we have made so far in terms of making the Fusion FAIR Data Portal a reality. We are seeking inputs from others who have been on a similar journey.

        Speakers: Mr George Gibbons (UKAEA), Dr Iraklis Klampanos (NCSR Demokritos)
      • 41
        FAIR-Cells: An interactive Jupyter extension for containerizing scientific code as services for eScience work

        Abstract: Researchers nowadays often rapidly prototype and share their experiments using notebook environments such as Jupyter. To scale experiments to large data volumes or high-resolution models, researchers often employ cloud infrastructures to enhance notebooks (e.g., JupyterHub) or to execute their experiments as a distributed workflow. In many cases, a researcher needs to encapsulate subsets of their code (namely, cells in Jupyter) from the notebook into the workflow. However, it is usually time-consuming and burdensome for the researcher to encapsulate those code subsets and integrate them with a workflow. As a result, the Findability, Accessibility, Interoperability and Reusability (FAIR) of those components is often limited.

        To address this issue, we propose and have developed a tool called FAIR-Cells that can be integrated into Jupyter notebooks as an extension to help scientists and researchers improve the FAIRness of their code. FAIR-Cells can encapsulate user-selected cells of code as standardized RESTful API services, and allows users to containerize such Jupyter code cells and publish them as reusable components via community repositories.

        We demonstrate the features of FAIR-Cells using an application from the ecology domain. Ecologists currently process various point-cloud datasets derived from Light Detection and Ranging (LiDAR) to extract metrics that capture the vertical and horizontal structure of vegetation. A novel open-source software package called 'Laserchicken' allows the processing of country-wide LiDAR datasets in a local environment (e.g., the Dutch national ICT infrastructure SURF). However, users have to employ the Laserchicken application as a whole to process the LiDAR data. Moreover, the volume of data that Laserchicken can process is limited by the capacity of the given infrastructure. In this work, we demonstrate how a user can use the FAIR-Cells extension to interactively create RESTful services for the components of the Laserchicken software in a Jupyter environment, to automate the encapsulation of those services as Docker containers, and to publish the services in a community catalogue (e.g., LifeWatch) via its API (based on GeoNetwork). We also demonstrate how those containers can be assembled into a workflow (using the Common Workflow Language) and deployed on a cloud environment (offered by the EOSC early adopter programme for ENVRI-FAIR) to process much bigger data sets than in a local environment. The results suggest that our approach can achieve FAIRness and exhibit good parallelism when executing Jupyter-based code over large, distributed volumes of data.
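
        For illustration, the kind of wrapper FAIR-Cells generates around a selected cell can be pictured as a small Flask service (names here are invented; the real extension produces and containerises this automatically):

          from flask import Flask, jsonify, request

          app = Flask(__name__)

          def vegetation_metric(points):    # stand-in for a Laserchicken step
              return {"n_points": len(points)}

          @app.route("/run", methods=["POST"])
          def run():
              payload = request.get_json(force=True)
              return jsonify(vegetation_metric(payload.get("points", [])))

          if __name__ == "__main__":
              # Same entrypoint inside the Docker container built by the tool.
              app.run(host="0.0.0.0", port=8080)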

        Speakers: Dr Spiros Koulouzis (University of Amsterdam), Mrs Yuandou Wang (Multiscale Networked Systems, University of Amsterdam), Dr W. Daniel Kissling (LifeWatch ERIC, vLab&Innovation Center, Institute for Biodiversity and Ecosystem Dynamics (IBED)), Dr Zhiming Zhao (Multiscale Networked Systems, University of Amsterdam)
    • Demos 3

      Room: http://go.egi.eu/zoom3

      Some things are best demonstrated, especially when it comes to technical services. Just before the coffee break, we offer a 30-minute slot for submitted demos to show these services, outputs, or any other activity relevant to the conference's theme.

      • 42
        EOSC support to transparent access to Copernicus Sentinel image data for wider uptake in science

        Earth Observation (EO) is a relatively new domain in the European Open Science Cloud. EO data access has long suffered from restrictive licenses and opaque, proprietary distribution systems, which has, by and large, hindered wide uptake in science, in particular beyond the traditional remote sensing and geospatial analysis disciplines. Massive new EO data streams that are distributed under a full, free and open license include those from the European Copernicus programme's Sentinel sensors since 2014 and from US Landsat since 2010. Currently, multiple petabytes of high-resolution Sentinel-1 (SAR) and Sentinel-2 (optical) sensor data are available for thematic research and monitoring applications in maritime and land science disciplines.

        Still, even with open licenses, EO data access remains complex, and combining such data with geospatial reference data for targeted analysis is hard for novice users. Extensive knowledge of sensor-specific data organization, map projections and formats is often required. Some data, for instance Sentinel-1, require complex processing to create “analysis ready” data sets. Similarly, feature data sets suffer from a plethora of outdated data formats that only (still) exist due to long-deprecated proprietary solutions. The old paradigm in EO data analysis was that 80% of a researcher's time was spent on data pre-processing and preparation, and 20% on analysis. This radically changed with the introduction of Google Earth Engine (GEE), Google's cloud infrastructure that hosts complete sensor data collections closely coupled to its massive parallel processing capacity. By abstracting data access and integrating ever more sophisticated analysis methods in its library of geospatial analysis routines, science users are able to compose their analysis in scripts that can be executed interactively or in batch. With that, the paradigm is more than inverted: 95% of research time is now spent on programming and testing the analytical logic that underlies scalable and reproducible science methods.

        In this demonstration, we show how EOSC resources can be used to emulate some of GEE's functionality. We have developed this in an Early Adopter Project, which inherits developments originally implemented on the Copernicus DIAS, a European cloud infrastructure that is closely coupled to the Sentinel data archives. One of the DIAS instances (CloudFerro) is federated in EOSC. Amongst others, we demonstrate hybrid solutions that combine pre-extracted time series from PostgreSQL/PostGIS with direct access to image subsets. The time series can be analyzed and visualized in Jupyter notebooks or in client-side Python, including machine learning routines, through the use of RESTful services. Image extracts are used both in pre-configured visualization and for full-resolution higher-level image processing (e.g. segmentation, structural analysis). Our development further provides pointers on how optimized data formats, smart caching and prediction, and support for “on the fly” processing may further advance DIAS utility in this domain, and achieve the overall goal to “take space data out of the space domain” and integrate it deeper into applied science.
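
        A hedged sketch of the hybrid access pattern (the service URL and parameters are hypothetical): pre-extracted time series are fetched over REST and analysed client-side, so no raw imagery needs to be downloaded:

          import requests

          resp = requests.get(
              "https://dias.example.eu/api/timeseries",   # placeholder service
              params={"parcel_id": 12345, "band": "NDVI",
                      "start": "2019-01-01", "end": "2019-12-31"},
              timeout=30)
          resp.raise_for_status()
          series = resp.json()   # e.g. a list of {"date": ..., "value": ...}
          print(len(series), "observations for parcel 12345")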

        Speaker: Guido Lemoine (European Commission, Joint Research Centre)
      • 43
        EOSC-Synergy Jenkins Pipeline Library

        The Jenkins Pipeline Library (JPL) [1] is one of the core components of the EOSC-Synergy software and services quality assurance as a service (SQAaaS) platform, aimed at fostering the adoption of EOSC services through a quality-based approach. The JPL, whose previous version was used in recently finished EINFRA-related research projects, has been refactored during the ongoing EOSC-Synergy project, being the first component of the SQAaaS platform to be released. It is, however, a self-contained component that can be used standalone for the creation and execution of CI/CD pipelines.

        The library facilitates the creation of Jenkins pipelines by using a YAML description to dynamically compose the stages that need to be present in the CI/CD pipeline. The actions in the YAML configuration file are aligned with the criteria compiled in the software and service quality baselines [2][3] supported by the EOSC-Synergy project, and rely on Docker Compose to orchestrate the set of services needed during the quality assessment process. A minimal (single-stage) Jenkins pipeline definition (Jenkinsfile) is needed to dynamically compose the stages defined as actions in the YAML description. This means that using the library does not limit the researcher to the criteria as defined in the baselines: additional stages can be added directly in the Jenkinsfile. Once this file layout is placed in the application's source code repository, the pipelines are automatically constructed and executed by a Jenkins CI/CD system. The approach followed by the new JPL release lowers the barriers that hinder the adoption of the good practices that enable quality-based and sustainable software and service development in research environments.
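
        To make the idea concrete, the hedged sketch below mimics how stages could be composed from such a description (the schema shown is simplified and invented; consult the JPL documentation for the real one):

          # Each criterion in the YAML description becomes a pipeline stage.
          import textwrap
          import yaml

          description = textwrap.dedent("""\
              sqa_criteria:
                qc_style:
                  repos:
                    my-app:
                      container: testing
                      commands: ["flake8 ."]
              """)
          doc = yaml.safe_load(description)

          for criterion, spec in doc["sqa_criteria"].items():
              print(f"compose stage for {criterion}: {spec}")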

        In the context of the EOSC-Synergy SQAaaS platform, the library will be used to enable the on-demand, dynamic composition of Jenkins pipelines that perform the several steps of the envisaged quality assurance. These steps will implement the quality validation actions defined in the EOSC-Synergy software and services quality criteria.

        The demonstration will highlight the features and capabilities of the library in practice, showing how to easily create pipelines that implement and comply with the good practices that are expected during the software lifecycle, from development to production. This is particularly relevant to developers and managers of research services both at the infrastructure and thematic levels.

        [1] https://github.com/indigo-dc/jenkins-pipeline-library
        [2] http://hdl.handle.net/10261/160086
        [3] https://digital.csic.es/handle/10261/214441

        Speaker: Pablo Orviz (CSIC)
    • Coffee break
    • Data Analytics and thematic services - part 1

      Room: http://go.egi.eu/zoom3

      This session will include Data Analytics related presentations that were submitted to the conference and selected by the programme committee.

      Convener: Wolfgang zu Castell (Helmholtz Zentrum München, German Research Center for Environmental Health)
      • 44
        Generating on-demand coastal forecasts using EOSC resources: the OPENCoastS service

        The OPENCoastS service, developed within the EOSC-hub project, enables the creation of on-demand operational coastal circulation forecast systems through a user-friendly web site, requiring only the computational grid for the selected area. Deploying a new forecast is straightforward, following seven simple guided steps. Several modelling options are available, depending on the relevant physics of the coastal system: the user may choose to simulate 2D tide and river hydrodynamics, 3D baroclinic circulation, or 2D wave-current interaction hydrodynamics. The web site is a one-stop-shop portal for the whole modelling procedure, providing forecast deployment setup, management and visualization components, including a detailed viewer where the daily outputs are available for probing and data comparison.
        The worldwide usage of OPENCoastS requires the availability of considerable and reliable computational resources and core services to guarantee the timely delivery of modelling products. As the simulations in the OPENCoastS service are performed with an MPI-based model (SCHISM, Zhang et al., 2016), it is possible to scale their performance (and the computational time) by taking advantage of several nodes in the EGI Cloud and High-Throughput Compute services, through the INCD and IFCA cloud infrastructures. Besides access to computing power, OPENCoastS is also integrated with the following EOSC core services:

        • EGI Check-in: the authentication and authorization infrastructure (AAI).
        • EGI Workload Manager (DIRAC4EGI): the workload management system.

        This paper presents the integration procedure with all these services and the development required to meet the very stringent conditions of OPENCoastS real-time operation. In particular, the work developed to integrate OPENCoastS with the DIRAC4EGI workload management service is detailed (a flavour of such a submission is sketched below). The choices undertaken proved vital to support more than 200 international users and applications to coastal systems on all 5 continents.

        Zhang, Y., Ye, F., Stanev, E.V., Grashorn, S. (2016) Seamless cross-scale modeling with SCHISM, Ocean Modelling, 102, 64-81.
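
        A hedged sketch of a DIRAC submission from Python (OPENCoastS drives DIRAC4EGI through its own backend; the job details below are invented for illustration):

          from DIRAC.Core.Base.Script import parseCommandLine
          parseCommandLine()   # initialise the DIRAC client environment

          from DIRAC.Interfaces.API.Dirac import Dirac
          from DIRAC.Interfaces.API.Job import Job

          job = Job()
          job.setName("opencoasts-forecast")
          job.setExecutable("run_schism.sh", arguments="deployment_42")
          job.setNumberOfProcessors(32)   # scale the SCHISM MPI run
          print(Dirac().submitJob(job))   # S_OK dict with the job ID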
        Speaker: Mr João Rogeiro (LNEC)
      • 45
        Digital Platform Integrating Complex Software Solutions for Decision Support System Dedicated to the Implementation of Sustainable Mobility Measures in Port Cities

        Port cities are complex urban agglomerations where mobility issues have many specific aspects. The proposed presentation summarizes the results obtained in the process of structuring a digital platform for decision support in the implementation of sustainable mobility measures in port cities. The platform has been conceived to integrate a dedicated database able to accommodate traffic data and air quality data, tools for processing data (including Big Data tools), and specific software for modelling and simulating traffic flows in the city and the port, for modelling the dispersion of pollutants, and for analysing specific aspects related to mobility.
        Special algorithms have been conceived for the analysis and optimization of traffic flows, taking into consideration different objective functions such as access time to specific areas of the city or the port, fuel consumption, pollutant emissions, integration of Renewable Energy Sources, and similar criteria.
        The application includes dedicated tools for impact evaluation based on social and behavioral analyses of the selected sample groups of citizens. Statistical analyses and AI tools are integrated for performing complex evaluations.
        The entire platform has been developed as a web-based service with extended and configurable functionalities.
        The results of the research and innovation activities have been obtained within the PORTIS project, funded under the Horizon 2020 programme of the EU. Within the project, complex activities were developed for promoting sustainable mobility in 5 EU port cities: Antwerp, Aberdeen, Trieste, Constanta and Klaipeda. The port city of Ningbo in China has been associated with the project.
        The digital platform for decision support has been developed and validated for the case of Constanta, a port city on the Black Sea.

        Speaker: Prof. Eden Mamut (Ovidius University of Constanta)
      • 46
        Using the EGI infrastructure for REPROLANG2020: reproducibility in the context of Natural Language Processing

        Introduction

        REPROLANG, the Shared Task on the Reproduction of Research Results in Science and Technology of Language, was organized by ELRA with the technical support of CLARIN. This initiative aimed to elicit and motivate the spread of scientific work on reproducibility in the area of Natural Language Processing. It built on the previous pioneering LREC workshops on reproducibility, 4REAL2016 and 4REAL2018, and also followed the initiative of the Language Resources and Evaluation journal. In this presentation we describe how the computational resources of the EGI Infrastructure were used to support this initiative.

        Scientific methodology

        This shared task is a new type of challenge: it is partly similar to the usual competitive shared tasks, in the sense that all participants share a common goal, but partly different, in that its primary focus is on seeking support and confirmation of previous results rather than on surpassing them. Instead of a competitive shared task, this is thus a cooperative shared task, with participants striving to reproduce as closely as possible the results of an original, complex research experiment, thereby reinforcing the reliability of those results through their eventually convergent outcomes.

        Technical approach

        Each submission provided a link to the input dataset and to a GitLab repository containing a Docker image associated with a tag. The association of the tag to the image was ensured by building the Docker image via the GitLab CI pipeline on each tag. The structure of the GitLab repository, the entrypoint script and parameters, and the mount points for the input and output datasets were all predefined (a sketch of this run convention follows below).
        With all container images available and a well-defined process in place to run the submissions, we provisioned four virtual private servers on the EGI infrastructure. Some of the submissions ran without issues and some had obvious errors, but others had subtle, unexpected issues, such as a dependency on specific CPU instructions which one of the VPS instances did not have. Another experiment failed due to a lack of GPU memory: our instances had 8 GB of GPU memory; after provisioning a new instance with 12 GB on commercial infrastructure we were able to run the submission successfully.
        To check for possible hard-coded results, we reran the experiments that had finished successfully before the review deadline with ablated input data.
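
        The following hedged sketch illustrates the kind of run convention described above; the registry path, tag and mount points are hypothetical, not the actual REPROLANG values.

          # Hypothetical sketch: run one submission's tagged Docker image with the
          # predefined read-only input mount and writable output mount.
          import subprocess

          IMAGE = "registry.gitlab.com/reprolang/submission-a:v1.0"  # built by CI on tag v1.0
          subprocess.run(
              ["docker", "run", "--rm",
               "-v", "/data/input:/input:ro",   # predefined input mount point
               "-v", "/data/output:/output",    # predefined output mount point
               IMAGE],                          # entrypoint script is baked into the image
              check=True,
          )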

        Conclusion

        Through the efforts made in the context of REPROLANG2020, we learnt that a meticulous replication process requires substantial efforts from all sides involved. In this paper we focused on the technical replication, highlighting the importance of having access to flexible and adequate computing resources. While in the end we were able to successfully complete the exercise, we also hope that our experience can contribute to further improvements of the EGI infrastructure, such as better pre-configured GPGPU instances with more dedicated GPU RAM, and the inclusion of more information about CPU models and supported instruction sets in the provisioning systems.

        Speakers: Dieter van Uytvanck (CLARIN ERIC), Willem Elbers (CLARIN ERIC)
      • 47
        Integration of WORSICA’s thematic service in EOSC - challenges and achievements

        The thematic service Water mOnitoRing SentInel Cloud plAtform (WORSICA) is a one-stop-shop for the detection of water using images from satellites and Unmanned Aerial Vehicles (UAVs), and in-situ data. WORSICA can be used for the detection of coastlines, coastal inundation areas and the limits of inland water bodies. It can also be applied to a range of other purposes, from the determination of flooded areas (from rainfall, storms, hurricanes or tsunamis) to the detection of large water leaks in major water distribution networks. This freely available service enables the user communities to generate maps of water presence and water delimitation lines in coastal and inland regions. In particular, the service helps to promote 1) the preservation of lives during an emergency, supporting emergency rescue operations of people in dangerously inundated areas, and 2) the efficient management of water resources targeting water saving in drought-prone areas.
        The WORSICA thematic service builds on several components developed in both national and European projects, such as the European projects EOSC-hub and EOSC-Synergy and the nationally funded INCD, which is included in the Portuguese infrastructure roadmap. In this work, a brief demonstration of the main challenges encountered during the integration of the service into the EOSC infrastructure is presented, using IT services from the EOSC marketplace catalogue and others developed in the scope of the EOSC-Synergy project.
        WORSICA's workflow consists of processing large imagery datasets for the detection of water, which includes several processing steps (e.g. download of the images, atmospheric correction, classification and clustering of the observed features) and, finally, the presentation of the resulting products in a user-friendly web portal. These inputs are quite large, resulting in substantial computational costs and processing times. Therefore, to minimize these costs, the use of different hardware technologies, such as HPC, GPUs and cloud computing, is necessary, as well as robust and efficient IT tools, developed and provided by the EOSC and EOSC-Synergy projects.
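
        For illustration, one common water-detection step in this family of workflows (not necessarily the exact classifier WORSICA uses) is thresholding a normalised difference water index (NDWI) computed from the green and near-infrared bands:

          # Illustrative sketch: NDWI-based water masking of a satellite scene.
          # WORSICA's actual classification/clustering pipeline may differ.
          import numpy as np

          def water_mask(green, nir, threshold=0.0):
              """Return a boolean mask that is True where water is likely present."""
              ndwi = (green - nir) / np.clip(green + nir, 1e-6, None)  # avoid /0
              return ndwi > threshold

          green = np.random.rand(256, 256).astype(np.float32)  # stand-in for a real band
          nir = np.random.rand(256, 256).astype(np.float32)
          mask = water_mask(green, nir)
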
        WORSICA's architecture was developed to connect several components from the backend, locally and remotely, with IT services provided by EOSC partners, such as Dataverse (for the FAIR data principles), EGI Check-in AAI (federated authentication), workload monitoring/resource managers (HPC, GPUs and cloud computing), data storage and the EOSC-hub OPENCoastS service. WORSICA uses Docker containers to create sub-services and connect them to each other: i) the portal (web frontend), where users specify their options in the workflow, and ii) the intermediate (service backend), which connects the required EOSC core services, receives the simulation requests, and then generates and submits the jobs to the core services for processing.
        As WORSICA is EOSC-Synergy's reference thematic service, its software and service quality will be validated using the SQAaaS platform. At the end of the project, WORSICA will be certified, assuring high quality standards for the community.
        The integration of the WORSICA service into the EOSC infrastructure will boost the usage of the service at a European level, taking advantage of robust, well-designed and efficient IT tools to work with massive amounts of data on these first-class European computing infrastructures.

        Speaker: Mr Ricardo Martins (Laboratório Nacional de Engenharia Civil - LNEC)
    • Data Management Solutions - Part 1 Room: http://go.egi.eu/zoom2

      Room: http://go.egi.eu/zoom2

      This session will include Data Management related presentations that were submitted to the conference and selected by the programme committee.

      Convener: Patrick Fuhrmann (DESY)
      • 48
        ARCHIVER overview and results of the design phase

        ARCHIVER - Archiving and preservation for research environments - is a project coordinated by CERN using the EC PCP instrument to procure, assess and validate R&D on cloud-based data archiving and preservation services. The project activities combine multiple ICT technologies, including extreme data-scaling, network connectivity, service interoperability and business models, in a hybrid cloud environment to deliver end-to-end archival and preservation services that cover the full research lifecycle. By acting as a coalition of publicly funded organisations (CERN, EMBL-EBI, DESY and PIC), ARCHIVER will create an ecosystem for specialist ICT companies active in archiving and data preservation who would like to introduce new, open services capable of supporting the expanding needs of research communities.

        The project started in 2019 with a large Open Market Consultation aiming to improve the mutual understanding of the R&D challenges across procurer research performing organisations and industry. During this process, potential R&D bidders assessed the innovation potential to address the project use-cases while the ARCHIVER consortium performed a gap analysis in preservation services offered to the public sector. This analysis has been documented and the results will be presented at the EGI2020 conference.

        The Request for Tenders (RfT) for the development of innovative preservation services in the context of the ARCHIVER project was based on this analysis. Published on January 31st 2020, the RfT was downloaded 150 times. In total, 15 offers were received, grouping 43 companies and organisations. Following the review process, five consortia were selected for the Design Phase (in alphabetical order):

        • Arkivum – Google
        • GMV – PIQL – AWS – SafeSpring
        • Libnova – CSIC – University of Barcelona – Giaretta Associates
        • RHEA System Spa – DEDAGROUP – GTT
        • T-Systems International – GWDG – Onedata

        These consortia are now competitively developing services to meet the requirements of the coalition of public funded organisations engaged in the project.

        In parallel, to encourage wide deployment of solutions outside the consortium, namely in the context of the EOSC, the ARCHIVER project engaged with a group of Early Adopters. Early Adopters are public organisations with a need for innovative digital archiving and preservation solutions that the services resulting from the ARCHIVER project could satisfy. Early Adopters are entitled to several benefits, such as assessing the services resulting from the project, shaping the R&D carried out in the project, contributing use cases, and profiting from the same commercialisation conditions as the organisations participating in the project should they purchase services developed after the end of the project. Currently, eleven organisations are enrolled in the programme, representing different fields of research and different regions, and others are in the process of becoming Early Adopters.

        This presentation will give an overview of the project, the Early Adopters programme and the R&D proposals of the consortia selected for the project design phase. In addition, the selected consortia's architectures will be briefly shown.

        Speaker: Jakub Urban (CERN)
      • 49
        Next-gen research data management, archiving and digital preservation

        With a total procurement budget of 3.4 million euros, the project ARCHIVER will use a Pre-Commercial Procurement (PCP) approach to competitively procure R&D services for archiving and digital preservation. ARCHIVER will introduce significant improvements in the area of archiving and digital preservation services, supporting the IT requirements of European scientists and providing end-to-end archival and preservation services, cost-effective for data generated in the petabyte range with high, sustained ingest rates, in the context of scientific research projects. The project is managed by a consortium of procurer research organisations (CERN, DESY, EMBL-EBI and PIC) and experts (Addestino and Trust-IT) and receives funding from the European Union’s Horizon 2020 research and innovation programme. T-Systems is amongst the five winning consortia selected to perform the solution design.
        T-Systems' approach for ARCHIVER is to provide research organisations and the European science community with a next-generation architecture that will seamlessly integrate archiving and digital preservation into existing science workflows, and to provide an OAIS-compliant, open, easy-to-use, extendable, cost- and energy-efficient solution.
        The solution follows a full open-source and cloud-agnostic approach, building on pre-existing and proven components for data preservation, data and workflow management. The core components include Archivematica, Onedata and Flowable, and have been selected based on functionality, integration, cloud-adoption, maturity, size of user community and relations with EOSC. The modular approach is supported by a large set of APIs that will enable users to extend and integrate the components with other preferred services.
        The service offer will include a wide range of petabyte-scale storage options in compliance with standards like OAIS, PREMIS, METS and BagIt. The service will integrate innovative new functions for distributed data and workflow management and data processing, and will provide a set of new visual tools for advanced data services, including search and discovery, data representation and scientific analysis.
        The core components will be provided as one integrated service offer that will be available to end-users, integrators and cloud providers for deployments on local and public cloud infrastructures. T-Systems will operate the services as part of its Open Telekom Cloud portfolio, a leading European public cloud service based on OpenStack. GWDG will extend its portfolio with the service offer for its established public and academic community. Both will support other candidates that will adapt the solution and join the community.

        Speaker: Jurry de la Mar (T-Systems International GmbH)
      • 50
        Making Fusion Data FAIR as a prerequisite to using Open Cloud Resources

        The EOSC, together with EuroHPC, provides potential opportunities in terms of scalability and flexibility which have never previously been available to European researchers, including those in the fusion community. For this community, the uptake of resources provided by the cross-infrastructure projects, including those of EGI and EUDAT, has been hampered by a number of policy, political and infrastructure-related issues. However, recent drives by EUROfusion and some national funding agencies are starting to bridge these gaps. While federated storage, particularly for long-term archival, will remain an issue as much data will remain commercially sensitive with restricted access, external HPC and cloud resources offer the community access to larger-scale and newer technologies. In this presentation we will show what the community is doing to enable the use of these resources, look at some preliminary investigations that have been performed, and consider future opportunities in the computational and workflow areas, including increased uptake of cloud facilities making use of the planned FAIR data portal for fusion.

        Speaker: Marcin Plociennik (ICBP)
      • 51
        Processes Behind the Research Data Management Life Cycle

        Without understanding the need for compatibility between research data management systems and research processes, it is not possible to design and develop efficient and user-friendly data services for researchers. In this paper we describe the research data processes behind RDM lifecycles and provide an insight into the actual implementations of these services all around the world in different research systems.

        Speaker: Ville Tenhunen
      • 52
        Federated Research Data Management in LEXIS

        Within the LEXIS project (Large-scale EXecution for Industry & Society), a platform for optimized execution of distributed Cloud-HPC workflows for simulation and analysis of Big Data is being developed. A user-friendly web portal, as a unique entry point, will provide access to data as well as workflow-handling and remote visualization functionality. The LEXIS platform federates European computing and data centers. It uses advanced orchestration solutions (Bull Ystia Orchestrator, based on TOSCA and Alien4Cloud), the High-End Application Execution Middleware (HEAppE), and new hardware systems (e.g. Burst Buffers with fast GPU- and FPGA-based data reprocessing). LEXIS heavily relies on its data-storage backend, an EUDAT-based “Distributed Data Infrastructure” for flexible, federated data and metadata management within workflows. All LEXIS systems are co-developed with three application Pilots, which represent demanding HPC/Cloud-Computing use cases in Industry (SMEs) and Science: i) simulations of complex turbomachinery and gearbox systems in aeronautics, ii) earthquake and tsunami simulations which are accelerated to enable accurate real-time analysis, and iii) weather and climate simulations where massive amounts of in situ data are assimilated to improve forecasts.

        This contribution focuses on the LEXIS Distributed Data Infrastructure (DDI). With EUDAT-B2SAFE and thus iRODS (the Integrated Rule-Oriented Data System) as a basis, a data grid with a unified view on LEXIS datasets has been realized. The core of our iRODS federation comprises the supercomputing centers IT4I (CZE) and LRZ (DEU). It can be extended to include further partners at any time. Leveraging EUDAT-B2SHARE, data in the DDI can be accessed via GridFTP. This follows our policy of adapting European federated computing and data-handling concepts for building a specialized, but open simulation-workflow environment, with focus on Cloud-HPC-Big Data convergence. This may be extended on the computing side by federating LEXIS computing resources e.g. with the EGI federated cloud. In general, we aim at seamlessly immersing LEXIS in the European computing and data landscape, with a focus on EOSC partners and the Big Data Value Association (BDVA).

        On a technical level, the DDI features significant adaptations to the LEXIS ecosystem with respect to a pure iRODS or EUDAT-B2SAFE system. With an iRODS OpenID plugin extended to handle large tokens, it connects to the Keycloak-based LEXIS AAI (or any AAI compatible with OpenID Connect and SAML). Using a redundant setup and appropriate policies (iRODS rules), data safety and quick data availability will be ensured. As a further feature in LEXIS, innovative REST APIs on top of the DDI ensure a smooth interaction with the LEXIS orchestration layer and portal. They provide, for example, the (meta-)data catalogue of the DDI, and offer an endpoint to trigger asynchronously executed data transfers to/from the HPC and Cloud systems involved in LEXIS (sketched below).
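
        As a hedged illustration of such a staging endpoint, the sketch below posts a transfer request and polls its status; the base URL, paths, payload fields and token handling are assumptions, not the actual DDI API.

          # Hypothetical sketch of triggering an asynchronous DDI staging transfer.
          import requests

          API = "https://ddi.lexis.example/api/v1"   # hypothetical base URL
          headers = {"Authorization": "Bearer <OIDC access token from the LEXIS AAI>"}

          resp = requests.post(f"{API}/transfers",   # hypothetical endpoint
                               json={"source": "ddi://project/dataset-123",
                                     "target": "hpc://it4i/scratch/run-42"},
                               headers=headers)
          transfer_id = resp.json()["id"]            # poll this ID for completion
          status = requests.get(f"{API}/transfers/{transfer_id}", headers=headers).json()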

        Besides performance-oriented functionality, the DDI features fundamental capabilities for Research Data Management (RDM) following the FAIR principles ("Findable, Accessible, Interoperable, Reusable"). Metadata is kept with the data, and PIDs will be acquired via EUDAT-B2HANDLE. This serves to disseminate open LEXIS (meta-)data to search facilities (B2FIND, BASE, web search engines, etc.), and thus to contribute to general data sharing and re-use.

        Speaker: Mr Mohamad Hayek (Leibniz Supercomputing Centre (LRZ) - Bavarian Academy of Sciences & Humanities)
    • Highlights from EGI participants and partners - Part 2 Room: http://go.egi.eu/zoom4

      Room: http://go.egi.eu/zoom4

      This session offers a space for EGI Federation Participants and Partners to share the latest strategic and infrastructure developments in their domain.

      The second part will focus on three presentations from initiatives in Russia, the Iberian region (Spain and Portugal) and Slovenia.

      Convener: Sergio Andreozzi (EGI.eu)
      • 53
        Future developments of the JINR computing infrastructure for large scale collaborations

        The experiments at the Large Hadron Collider (LHC) at CERN (Geneva, Switzerland) play a leading role in scientific research. Data processing and analysis are carried out using high-performance (Grid) complexes, academic, national and commercial cloud computing resources, supercomputers and other resources. JINR is actively involved in the integration of distributed heterogeneous resources and the development of Big Data technologies to support modern large-scale projects. JINR is also working on the construction of the unique NICA accelerator complex, which requires new approaches to the implementation of a distributed infrastructure for the processing and analysis of experimental data.
        The report provides an overview of major integrated infrastructures supporting large-scale projects and trends in their evolution. It also presents the main results of the Laboratory of Information Technologies (LIT) of the Joint Institute for Nuclear Research (JINR) in the development of distributed computing, together with a brief overview of LIT's distributed-computing projects carried out with partners in Russia, at CERN, in the USA, Europe, China and the JINR Member States.

        The Joint Institute for Nuclear Research is an international intergovernmental organization and a world-famous scientific centre, a unique example of the integration of fundamental theoretical and experimental research with the development and application of cutting-edge technology and university education. JINR's standing in the world scientific community is very high.

        Dr. Vladimir V. Korenkov is the Director of the Laboratory of Information Technologies (LIT) at JINR.

        Speaker: Dr Vladimir Korenkov (Joint Institute for Nuclear Research)
      • 54
        IBERGRID roadmap towards expanding capacities and capabilities for scientific service provisioning

        For the past 12 years IBERGRID has been the forum for common activities and the sharing of knowledge between Spain and Portugal in the area of distributed computing. IBERGRID federates infrastructures from Iberian research and academic organizations, mainly focused on Grid, Cloud computing and data processing. As such, it enables the joint participation of Spain and Portugal in international distributed-computing initiatives such as the EGI Federation, and in data repositories in the framework of EOSC.
        IBERGRID also provides regional operations coordination for the computing and data processing activities of several international user communities, including ESFRIs. In this presentation we will discuss the challenges ahead regarding service development and deployment, and co-development activities with research communities of Iberian interest.

        Speaker: Isabel Campos (CSIC)
      • 55
        Integrating and scaling up SLING: perspectives for scientific computing in Slovenia

        The Slovenian national supercomputing network is available to researchers at universities, research institutions and industrial development centres who need significant computing capacity: to run compute-intensive and massively parallel algorithms (HPC/supercomputing), to distribute the processing of large numbers of tasks across several clusters (compute grid, HTC/high-throughput computing), and for big data processing.

        Speaker: Jan Javorsek (JSI)
    • The SoBigData Research Infrastructure Room: http://go.egi.eu/zoom1

      Room: http://go.egi.eu/zoom1

      The workshop will present an overview of the important aspects a community must take into consideration in order to design and implement a distributed, pan-European, multi-disciplinary research infrastructure for big social data analytics such as SoBigData RI (www.sobigdata.eu).

      This RI is the result of a first project, SoBigData, which ended in 2019, and is the basis of the new SoBigData++ project, started in January 2020 with the objective of consolidating and enriching the platform for the design and execution of large-scale social mining experiments, accessible seamlessly on computational resources from the European Open Science Cloud (EOSC) and on supercomputing facilities. SoBigData++ integrates a community of 31 key excellence centres at the European level in Big Data analytics and social mining. We will use this experience to present the following key aspects in the workshop:
      Data Science, Multidisciplinarity & AI.

      We believe that the necessary starting point to tackle the challenges is to observe how our society works, and the big data originating from the digital breadcrumbs of human activities offer a huge opportunity to scrutinize the ground truth of individual and collective behaviour in unprecedented detail and at a global scale.

      Ethics & Privacy.
      There is an urgency to develop strategies that allow the protection of personal information and fundamental human rights to coexist with the safe usage of information for scientific purposes by different stakeholders with diverse levels of knowledge and needs. There is a need to democratise the benefits of data science and Big Data within an ethical responsibility framework that harmonizes individual rights and the collective interest.
      Training the next generation of Data Scientists for Social Goods.

      There is an urgency to thoroughly exploit this opportunity for scientific advancement and social good as currently the predominant exploitation of Big Data revolves around either commercial purposes (such as profiling and behavioural advertising) or – worse – social control and surveillance. The main obstacle towards the exploitation of Big Data for scientific advancement and social good – besides the scarcity of data scientists – is the absence of a large-scale, open ecosystem where Big Data and social mining research can be carried out.
      TagMe: A success story of how the integration boosted a research result.

      SoBigData has succeeded in boosting the usage of some of the services it offers, with highly relevant peaks of daily accesses, for example in the TagMe system for the automated semantic annotation of short texts (a hedged example call is sketched below). The future direction is to develop more tools that are similarly simple and effective to use.
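
      As a hedged illustration of how simple such a service is to use, the sketch below calls the TagMe annotation endpoint; the URL and parameter names are assumptions based on the service's public documentation, and the token is a placeholder.

        # Hedged sketch: annotate a short text with TagMe (token is a placeholder).
        import requests

        resp = requests.get("https://tagme.d4science.org/tagme/tag",
                            params={"text": "big data analytics for social mining",
                                    "lang": "en",
                                    "gcube-token": "<YOUR_TOKEN>"})
        for ann in resp.json().get("annotations", []):
            print(ann.get("spot"), "->", ann.get("title"))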

      The workshop will include a round table with the participants, going into the details of the presented aspects and opening up discussion about them.

      Conveners: Beatrice Rapisarda, Mrs Francesca Pratesi (CNR), Mr Luca Pappalardo (CNR), Mr Mark Cotè (King's College London), Mr Paolo Ferragina (University of Pisa), Mr Roberto Trasarti (CNR)
    • Networking cocktail: Amsterdam, drinks and fun! Room: http://go.egi.eu/zoom1

      Room: http://go.egi.eu/zoom1

      Unfortunately, we couldn’t welcome you at the beautiful venue of the Royal Tropical Institute in Amsterdam this year. But chin up, we’ll ensure you feel as connected with the venue and the city as possible. What do we ask of you? Bring your own drink and prepare for a fun, virtual quiz! PS: there are prizes to be won, you wouldn’t want to miss this!

      Convener: Dimple Sokartara
      • 56
        Short intro
        Speaker: Dimple Sokartara
      • 57
        Presentation by Mark Schneider, CEO of the Royal Tropical Institute (KIT Amsterdam)
      • 58
        Quiz
    • Keynote: GAIA-X - Europe's sovereign technology ecosystem Room: http://go.egi.eu/zoom1

      Room: http://go.egi.eu/zoom1

      Europe risks losing its digital sovereignty in the ongoing digitalization of its industry and public sector. Currently, there is no scalable European technology ecosystem. However, cloud platforms for data storage and computing resources with additional software ecosystems are key for digitalization, which in turn drives innovation. The ecosystems provided by hyperscalers are technically powerful, yet are not tailored towards European industry’s demands on distributedness, openness, and interoperability. Europe needs a sovereign ecosystem to steer digitalization the European way and support its industry and public sector on the journey to having a sustainable digital economy. To this end, GAIA-X promises a solution to regain control of European data under European standards.

      "With GAIA-X, representatives from politics, business and science from France and Germany, together with other European partners, create a proposal for the next generation of a data infrastructure for Europe: a secure, federated system that meets the highest standards of digital sovereignty while promoting innovation. This project is the cradle of an open, transparent digital ecosystem, where data and services can be made available, collated and shared in an environment of trust".
      The keynote will introduce the GAIA-X initiative, its compliance framework and the supported use cases.

      Conveners: Maximilian Ahrens (CTO at T-Systems), Volker Guelzow (DESY)
    • Compute services in EGI - Overview and use cases Room: http://go.egi.eu/zoom1

      Room: http://go.egi.eu/zoom1

      Description
      This session provides an overview of the baseline computing services of EGI that deliver a distributed computing infrastructure to perform any kind of data analytics for research and innovation.

      The EGI service portfolio provides various computing solutions to match your needs: Virtual Machine-based computing for long-running services and for data analytics platforms; container orchestration powered by Kubernetes and Docker; and facilities for massively parallel workloads. During this session we will introduce these services, compare them to advise you on the most suitable choice for particular problems, and feature a few research communities reporting on their experience of using these services in real-life research workflows.

      The session is of introductory level aiming to serve new communities who want to engage with and use EGI compute services.

      Main target audience
      Scientists, representatives of scientific communities

      Convener: Enol Fernandez (EGI.eu)
      • 59
        Overview of the EGI Computing services

        Introduction to the EGI Computing services covering:
        * EGI Cloud Compute
        * EGI Cloud Container Compute
        * EGI High Throughput Compute
        * Using GPUs on EGI Cloud/HTC
        * CVMFS

      • 60
        NextGEOSS EO data processing campaigns on EGI.eu Federated Cloud

        EO data is a unique source of global measurements over decades. When calibrated and combined with other sources, it empowers validation and interpolation models. The NextGEOSS DataHub and Platform users are Earth science practitioners who create, share and reuse software assets and data products in a collaborative process.
        Terradue supports these partners by giving them access to an application integration environment - Ellip - and to a set of production environments on Cloud Providers that are part of the EGI.eu Federated Cloud.
        We present the NextGEOSS use cases and cloud resources involved for the data production campaigns, as well as the perspectives opened by this multi-partner collaboration effort.

        Speaker: Herve Caumont (Terradue Srl)
      • 61
        AiiDAlab - an ecosystem for developing, executing, and sharing scientific workflows

        In my talk, I will present AiiDAlab, a web platform that enables computational scientists to package scientific workflows and computational environments and share them with their collaborators and peers for further use. I will start by motivating the creation of the service, then describe the interface and give a quick demonstration of how AiiDAlab works. Finally, I will give some examples of how AiiDAlab is used in real-life research.

      • 62
        Simulation of high power laser experiments by using high performance computer

        In this talk I will briefly introduce the high-power laser system available at ELI-NP and the necessity of performing numerical simulations for experiments with these lasers. The simulation method involved requires massive parallelization; the idea of how this is done will be introduced, and a short example of parallelization using the Message Passing Interface (MPI) will be shown.
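
        As a minimal illustration of the MPI pattern (not the actual simulation code), the sketch below splits a 1-D domain across ranks and reduces a partial sum back to rank 0, using mpi4py for brevity:

          # Minimal MPI sketch: domain decomposition plus a global reduction.
          from mpi4py import MPI
          import numpy as np

          comm = MPI.COMM_WORLD
          rank, size = comm.Get_rank(), comm.Get_size()

          n = 1_000_000
          local = np.arange(rank, n, size, dtype=np.float64)    # this rank's slice
          total = comm.reduce(local.sum(), op=MPI.SUM, root=0)  # combine partial sums
          if rank == 0:
              print("global sum:", total)
          # run with e.g.: mpirun -n 4 python sum.py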

        Speaker: Jian Fuh Ong (ELI-NP)
      • 63
        Q & A
    • Data analytics and thematic services - part 2 Room: http://go.egi.eu/zoom3

      Room: http://go.egi.eu/zoom3

      Convener: Alex Upton (ETH ZURICH)
      • 64
        Machine learning and Deep Learning services for the EOSC

        The DEEP-HybridDataCloud project offers a development framework for all users, including non-experts, enabling the transparent training, sharing and serving of Artificial Intelligence, Machine Learning and Deep Learning models, both locally and on hybrid cloud systems, in the context of the European Open Science Cloud. The DEEP solution is based on Docker containers that already package all the tools needed to deploy and run the models in a way that is transparent to the users.

        In this session we will present the current service offer, allowing scientists to share and publish models ready to be used (through the DEEP marketplace); to develop, build and train models (through the DEEP training facility); and to deploy them as services (through DEEP as a Service).
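
        As a hedged sketch of what serving a model as a service can look like, the snippet below talks to a DEEPaaS-style REST API; the host, port and model name are assumptions, and the endpoint layout follows our reading of the DEEPaaS API (v2).

          # Hedged sketch: discover deployed models and request a prediction.
          import requests

          BASE = "http://localhost:5000"                     # e.g. a local 'deepaas-run'
          models = requests.get(f"{BASE}/v2/models").json()  # list available models

          r = requests.post(f"{BASE}/v2/models/demo_app/predict",  # 'demo_app' is hypothetical
                            files={"data": open("image.png", "rb")})
          print(r.json())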

        Speaker: Alvaro Lopez Garcia (CSIC)
      • 65
        Quality and Capacity expansion of Thematic Services in EOSC-SYNERGY

        EOSC-SYNERGY thematic services aim to increase the acceptance of EOSC by building capacities and introducing improved platform and infrastructure services. EOSC-SYNERGY has identified ten thematic services addressing four scientific areas (Earth Observation, Environment, Life Sciences and Astrophysics). These thematic services are heterogeneous, address a wide range of requirements and have different maturity levels, targets and usage models. In the field of Earth Observation, the services deal with monitoring coastal changes and inundations, processing satellite image data and estimating forest mass. In the field of Environment, they include stratospheric ozone monitoring and the protection and recovery of the ozone layer, the forecast of sand and dust storms, the simulation of water distribution networks and untargeted mass-spectrometry analysis for toxics. In Astrophysics, the project will set up a European service for the Latin American Giant Observatory, and in Life Sciences, EOSC-SYNERGY covers both a platform for supporting community-led scientific benchmarking efforts and the processing of cryo-electron microscopy imaging.
        These thematic services will be improved in terms of authentication and authorisation, resource management, job scheduling, data management and accounting. Not all services have identified gaps in all aspects, so each thematic service will focus on those that are relevant according to their bottlenecks.
        The thematic services have several technical similarities and differences. Common to all thematic services is the need for a robust authentication and authorisation infrastructure compatible with those used by the users' institutions. EGI Check-in is a widely accepted choice, although services like ELIXIR AAI - soon to be upgraded to Life Science AAI - are also important assets. With respect to resource management, all services have an interest in provisioning processing resources dynamically. The Infrastructure Manager and the Elastic Compute Clusters in the Cloud have been identified by most of them as candidate technologies for this gap. Regarding job management, most services use batch queues, which could be extended to support containerised jobs. The use of Kubernetes to orchestrate microservices and containerised job queues is also being considered. The most challenging part is data management: thematic services have identified issues in transferring and accessing large amounts of data, requiring smart caching, advanced data transfer and persistent massive data storage.
        The thematic services expect a workload between 400 and 46,500 CPU hours per week (a cumulative 71,000 CPU hours per week), consumed by up to 10,000 jobs per week requiring a median of 16 GB RAM and 15 GB of storage per job. The persistent storage requirements range from 2 GB to 500 GB (a median of 100 GB and a total of 1 PB).
        The thematic services have also defined a set of performance metrics grouped into five impact categories (Users, Service Capacity and Capability, Scientific Outreach, Service Usability and Cross-Fertilization). These metrics can provide quantitative indicators of the performance and improvement of the thematic services.
        Thematic services constitute a key activity to evaluate the impact of the capabilities in EOSC-SYNERGY with respect to adopting mature and scalable services, software and service quality assurance, increased resource capacity and improved user skills.

        Speaker: Ignacio Blanquer (UPVLC)
      • 66
        RECAS-BARI: new high-level services for eScience researchers

        In the last 5 years the RECAS-BARI datacenter has been offering its users an increasing amount of compute and storage resources through the local batch system and the on-premise OpenStack-based cloud infrastructure. Our users come from different scientific communities (HEP, bioinformatics, medical physics, etc.) and some local SMEs: they need to execute workloads with different requirements and ask for different levels of technical support.
        In the last period, the emergence of ML techniques applied to diverse research areas has increased the need to access specialised hardware devices like GPUs and Infiniband.
        In parallel, containers are gaining traction among users as this lightweight virtualization technology dramatically simplifies the distribution and deployment of their software encapsulating the runtime dependencies in a single package.
        Since GPUs are not yet available in our cloud, users have been accessing these specialized hardware devices through our batch system (Slurm-based). They had to learn how to interact with the system and access the datacenter LAN from a bastion host in order to submit their jobs.
        One of the most common wishes of our users is, of course, to access resources in a transparent and easy way. On the other side, our admins and support team want to limit the manual operations and configurations needed to operate the cluster and support users in their daily activities. To this end we have decided to install and manage some of the new GPU-equipped compute nodes acquired through the IBISCO PON under a Mesos cluster, and to adopt the INDIGO/DEEP solutions to provide high-level interfaces to the end users. We have developed a set of Ansible roles for setting up the Mesos cluster with the needed configuration, including support for OpenID Connect authentication and for exploiting GPUs. The Mesos cluster has been integrated with the RECAS-BARI PaaS Orchestration system to facilitate user interaction: the Orchestrator and its dashboard hide the complexity of managing Mesos tasks, offering transparent access to almost all the functionality provided by the cluster. Currently we are working on hardening user isolation in a natively multi-tenant environment and on addressing the security aspects related to the use of Docker containers.

        Speaker: Marica Antonacci (INFN)
    • How to make your service more secure? Room: http://go.egi.eu/zoom4

      Room: http://go.egi.eu/zoom4

      Convener: Valeria Ardizzone (EGI.eu)
      • 67
        Making Identity Assurance and Authentication Strength Work for Federated Infrastructures

        In higher Research and Education (R&E) as well as in research- and e-infrastructures (in short: infrastructures), federated access and single sign-on by way of national federations (operated in most cases by NRENs) are used as a means to provide users with access to a variety of services. Whereas in national federations institutional accounts (e.g. provided by a university) are typically used to access services, many infrastructures also accept other sources of identity: ‘community identity providers’, social identity providers, or governmental IDs. Hence, the quality of a user identity, for example with regard to identity proofing, enrolment and authentication, may differ - which has an impact on the service provider's risk perception and thus on their authorization decision.

        In order to communicate qualitative information on both identity vetting and the strength of the authentication tokens used between identity providers and service providers, assurance information is exchanged - with the strength expressed by different Levels of Assurance (LoA) or ‘assurance profiles’ that combine the various elements in community-specific ways. While assurance frameworks such as NIST 800-63-3 or the Kantara IAF have been established in the commercial sector, these are often considered too heavy, with strict requirements that are not appropriate for the risks encountered in the R&E community. This is why a more lightweight solution is necessary in the R&E space.

        The REFEDS Assurance Suite comprises orthogonal components on identity assurance (the REFEDS Assurance Framework RAF) and authentication assurance (Single Factor Authentication profile, Multi Factor Authentication Profile) and provides profiles for low and high risk use cases. The Suite is applicable in many scenarios, like identity interfederations (cross-national collaborations) or for exchanging assurance information between identity providers and Infrastructure Proxies (according to AARC Blueprint Architecture). This presentation serves as a guidance on how the assurance values can be assessed and introduced into existing AAI scenarios.

        This 15-minute talk starts with a short overview of existing assurance frameworks, such as NIST 800-63 and Kantara, and the standards introduced in the R&E sector. We will discuss their relationships and dependencies and how they relate to the management of risks. Following that, use cases of the REFEDS Assurance Suite will be presented to show how the REFEDS specifications can be used to exchange identity and authentication assurance in cross-collaborative scenarios. The focus of this talk lies in providing basic recommendations to facilitate the adoption of exchanging assurance information. The recommendations will cover both the identity side, i.e. what can be said about the quality of the identity assurance based on the processes employed, and the service side, i.e. what identity assurance is required or expected based on the services provided and their use cases.
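
        To make this concrete, the hedged sketch below checks REFEDS assurance profile values as they might arrive in the eduPersonAssurance attribute; the profile URIs are taken from the REFEDS Assurance Framework, while the surrounding attribute-handling code is an assumption.

          # Hedged sketch: service-side check of REFEDS assurance profiles.
          CAPPUCCINO = "https://refeds.org/assurance/profile/cappuccino"  # low-risk profile
          ESPRESSO = "https://refeds.org/assurance/profile/espresso"      # high-risk profile

          def meets_profile(edu_person_assurance, high_risk):
              required = ESPRESSO if high_risk else CAPPUCCINO
              return required in edu_person_assurance

          attrs = ["https://refeds.org/assurance",
                   "https://refeds.org/assurance/IAP/medium",
                   CAPPUCCINO]
          allow = meets_profile(attrs, high_risk=False)  # True: cappuccino is asserted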

        Speaker: Mrs Jule Anna Ziegler
      • 68
        Orpheus - Managing differences in the variety of OpenID Providers

        Background

        OpenID Connect is widely used in modern Authentication and Authorization Infrastructures, including the infrastructures of multiple EU projects like the European Open Science Cloud and EGI. The non-academic world is also moving to OpenID Connect (e.g. Google, Apple, IBM).

        Despite its wide adoption, OpenID Connect is very complex. OpenID Connect is an identity layer on top of OAuth2; there is a core profile for OpenID Connect, but also additional profiles; there are extensions for OAuth2; and there are several draft extensions for both OAuth2 and OpenID Connect. Any of these might or might not be supported by a given OpenID Connect Provider. And because OpenID Connect finds wide adoption, there are naturally a lot of different providers, all supporting different aspects. Some might even support certain features, but with small violations of the specification or draft. All of this makes it difficult to deal with multiple providers.

        Orpheus

        Orpheus is a web-based tool for analysing and characterising OIDC Provider and Relying Party implementations of OpenID Connect.

        To this end, Orpheus provides features specifically targeted at developers and operators of OpenID Connect based infrastructures:

        • Comparison of OIDC providers
        • Analysis of the supported features of OIDC providers
        • Live testing capabilities for verifying the claimed features, with no implementation effort, covering many different flows:
          • Authorization Code Flow
          • Device Code Flow
          • Refresh Flow
          • Token Revocation Flow
        • Debugging functionality:
          • Perform a working OIDC flow.
          • All relevant data, including all the communication between the parties, is shown.
        • User-developer interaction:
          • Helps debugging OIDC-related problems.
          • There are multiple possible reasons for a failed authorization: a misconfigured OIDC client, the attributes released by the home identity provider, attributes missing from the user's account, ...
          • These are hard to debug, because they are linked to an account and a real identity.
          • The user can perform an OIDC flow against Orpheus and share all relevant data in a privacy-compliant way with the developer.

        Orpheus follows a universal approach so that large numbers of OpenID providers can be supported. It is easy to add new providers and to extend the list of comparable features (a sketch of the kind of comparison involved is shown below).
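
        As a hedged illustration of such provider comparison, the sketch below fetches each provider's standard OpenID Connect discovery document and diffs the advertised capabilities; the issuer URLs are examples only, and Orpheus's actual implementation may differ.

          # Sketch: compare advertised capabilities via OIDC discovery documents.
          import requests

          issuers = ["https://accounts.google.com",
                     "https://aai.egi.eu/oidc"]   # example issuers only

          for iss in issuers:
              meta = requests.get(f"{iss}/.well-known/openid-configuration").json()
              grants = meta.get("grant_types_supported") or []
              print(iss)
              print("  grant types:", grants)
              print("  device flow:",
                    "urn:ietf:params:oauth:grant-type:device_code" in grants)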

        Future Work

        The current development focuses on making Orpheus more modular, so that it can be used more easily for the different use cases. More features will also be added in the future. One idea for a future extension is a public API that provides information on which features are supported by the different providers.

        Speaker: Dr Uros Stevanovic (KIT-G)
      • 69
        Attacking disjoined federation the old way

        Earlier this year, our communities were the victims of attacks that managed to compromise systems across different sites and infrastructures. Without going into details, this presentation aims to explain the mechanisms that were used to spread, and to draw a parallel with clouds and cloud federations: does moving into the cloud change anything?

        Speaker: Vincent Brillault (CERN)
      • 70
        Avoiding Operational nightmares by adhering to basic security guidelines

        Every system administrator is affected by a security incident sooner or later. The timing of such events is invariably right on the spot: Your service is hit exactly at the moment when it is needed most urgently for your project.
        In this short presentation we will show examples of when this happened within our environment, give some guidelines on how to reduce the likelihood as well as the impact of security incidents, and offer best practices for preventing mishaps in the first place. The talk is aimed at administrators of systems or services, but the core lessons are applicable and valuable for regular users and their personal computers as well.

        Speaker: Sven Gabriel (NIKHEF)
    • Open Science Policy in Europe: state of play and EGI contribution Room: http://go.egi.eu/zoom2

      Room: http://go.egi.eu/zoom2

      Over the last decade, the discourse on open science has grown considerably and has become a primary topic among the various actors involved in research production and dissemination. The EC has played a key role in mobilising the European research community towards defining policies and adapting more open practices. This session aims to present the state of play on Open Science policy in Europe based on the last report from the Open Science Policy Platform and to highlight the key contribution from the EGI Federation in implementing Open Science. The presentations will be followed by position statements and a Q&A session to reflect on the way forward.

      Following this session, there are two sessions on the topic of the Global Open Science Cloud.

      Convener: Eva Méndez (Universidad Carlos III de Madrid, Chair OSPP Mandate 2 )
      • 71
        Introduction to the EU-Open Science Policy Platform

        The presentation will provide an overview of the Open Science Policy Platform with a summary of the activities covered in its two mandates.

        Short bio: Eva Méndez holds a PhD in Library and Information Sciences (LIS) and is an expert in metadata. She defines herself in her Twitter profile as an ‘open knowledge militant’ (@evamen). She has been a lecturer at Universidad Carlos III de Madrid (UC3M), LIS department, since 1997. She has been an active member of several international research teams, advisory boards and communities including: DCMI, OpenAire, Metadata2020, RDA, etc. In 2005-06 she was awarded a Fulbright Research Scholarship at the University of North Carolina at Chapel Hill (USA). She has taken part in and led several research projects and acted as advisor to many more in fields related to standardisation, metadata, semantic web, open data, digital repositories and libraries, in addition to information policies for development in several countries. In 2015 she won the Young Researcher of Excellence award of her University. In November 2017 she was named “Open Data Champion” by SPARC Europe. She is currently Deputy Vice President for Scientific Policy-Open Science at UC3M and member of the EU-OSPP (European Open Science Policy Platform) on behalf of YERUN (Young European Research Universities Network). She is the OSPP chair for the 2nd mandate of the platform.

        Speaker: Eva Mendéz
      • 72
        EU-Open Science Policy Platform - From Recommendations to Practical implementation

        The EU Open Science Policy Platform (OSPP) is a High-Level Advisory Group established by the Directorate-General for Research and Innovation (RTD) of the European Commission (EC) in May 2016, comprising 26 expert representatives of the broad constituency of European science stakeholders. In May 2020, the OSPP published its final report gathering the work achieved by the OSPP during its two mandates, helping the EC to discuss and practically implement Open Science in the European research landscape. This report reviews the Practical Commitments for Implementation (PCIs) of Open Science practices made by each of the actors in the system and discusses potential blockers of (and next steps for) progress. It then goes on to call on all European Member States and other relevant actors from the public and private sectors to help co-create, develop and maintain a ‘Research System based on shared knowledge’ by 2030. This presentation will provide a summary of the key points and issues addressed by the report.

        Short bio:
        Rebecca Lawrence is Managing Director of F1000 Research Ltd. She was responsible for the launch of F1000Research in 2013 and has subsequently led the initiative behind the launches of many funder- and institution-based publishing platforms that aim to provide a new trajectory in the way scientific findings and data are communicated.

        She was a member of the European Commission’s Open Science Policy Platform, chairing their work on next-generation indicators and their integrated advice: OSPP-REC, and Editor of their final report. She is also a member of the US National Academies (NASEM) Committee on Advanced and Automated Workflows. She has been co-Chair of many working groups on data and peer review, including for Research Data Alliance (RDA) and ORCID, and is an Advisory Board member for DORA (San Francisco Declaration on Research Assessment). She has worked in STM publishing for over 20 years, is an Associate of the Royal College of Music, and holds a PhD in Pharmacology.

        Speaker: Rebecca Lawrence (F1000 Research)
      • 73
        EGI contribution to Open Science

        After having contributed to developing the OSPP recommendations for open science, the EGI Federation has also taken action to implement some of them (with "PCI" statements, practical commitments for implementation). This presentation will provide an update on the EGI PCIs and will prepare the floor to discuss what we can do next.

        Short bio:
        Sergio is Head of Strategy, Innovation and Communications of the EGI Foundation. In his role, Sergio contributes to strategic planning and execution, governance, and business models. Other responsibilities comprise contributing to developing project proposals to implement the EGI strategy or innovative ideas, leading activities in projects, authoring external communication messages as well as organising and participating in meetings, forums and conferences. Sergio was also a member of the EC Open Science Policy Platform. He holds an Executive Master in Management of Research Infrastructures (University of Milano-Bicocca), a PhD in Computer Science (University of Bologna) and a MSc in Computer Science Engineering (University of Pisa).

        Speaker: Sergio Andreozzi (EGI.eu)
      • 74
        Position statements

        Two representatives from the research communities will provide statements on:
        - what the EGI Federation has done so far for their communities to support open science
        - what more the EGI Federation could do moving forward

        The speakers are:
        - Sorina Pop, CNRS Research Engineer at CREATIS & BIOMED VO Manager representing the BIOMED Research Community
        - Isabel Campos, Researcher at the Spanish Research Council (CSIC), representing the IBERGRID Infrastructure

        Speakers: Isabel Campos (CSIC), Sorina POP (CNRS)
      • 75
        Q&A
    • Coffee break
    • Authentication-Authorisation solutions - Part 1 Room: http://go.egi.eu/zoom3

      Room: http://go.egi.eu/zoom3

      This session will include Authentication-Authorisation related presentations that were submitted to the conference and selected by the programme committee.

      Convener: Antonella Fresa (Promoter S.r.l.)
      • 76
        oidc-agent: Your OpenID Connect tokens on the command line

        Background

        OpenID Connect is widely used in modern Authentication and Authorization Infrastructures, including the infrastructures of multiple EU projects like the European Open Science Cloud and EGI. Due to their nature, OpenID Connect Access Tokens were not straightforward to use from the command line: they have a high character count and are short-lived, so they cannot be learnt by heart like a password. Copying the access token from a web service whenever needed is clearly suboptimal in a command-line based process. However, retrieving an access token on the command line without oidc-agent requires substantial effort that is both time-consuming and cannot be expected of the average user.

        Considering this insufficient usability from the command line, our goal was to overcome it by developing a tool that manages OpenID Connect tokens. It should allow users to obtain access tokens on the command line as easily as possible, so that it can be integrated into their workflows.

        OIDC-AGENT

        oidc-agent is the Swiss-army-knife tool for OpenID Connect in any non-web environment.

        The design of oidc-agent is modelled on ssh-agent, providing the user with a familiar way to handle OIDC tokens. Essentially, oidc-agent supports several flows to obtain the Refresh Token, which it uses whenever an Access Token is required. All credentials are stored encrypted (both on disk and in RAM).

        In summary, oidc-agent supports a wide range of features (a usage sketch follows the list):
        - Handles all communication with the OpenID Provider
        - Registers the OIDC client and initialises the configuration
        - Stores encrypted configurations
        - Provides Access Tokens for:
          - command-line usage (the syntax allows easy integration)
          - other applications
        - Easy to use, with hidden complexity
        - Libraries for various languages, so other applications can obtain tokens directly from the agent:
          - C
          - Go
          - Python
        - Integrated with Xsession for autostart at login and availability throughout a session
        - Agent forwarding to obtain tokens on remote computers
        - Tested to work with many OIDC providers:
          - EGI Check-in, IAM, B2Access, Keycloak, Human Brain, ...
        - Support for restricted access tokens:
          - scope
          - audience
        - Privacy- and security-focused design:
          - privilege separation
          - strong cryptography
          - memory obfuscation
          - a local application run by the user on their own machine
          - no data collection, no calling home
        - Open source code under the MIT license
        - Available for different platforms:
          - Debian/Ubuntu via PPA (a process has been started to include oidc-agent in the official Debian package repository)
          - CentOS as a prebuilt package
          - Gentoo
          - Fedora and EPEL are planned
          - MacOS via Homebrew
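
        As an illustration of the command-line integration, the following minimal Python sketch fetches a token from a running agent by shelling out to the oidc-token command; the account short-name 'egi' is a placeholder that must already be configured in the agent:

          import subprocess

          def get_access_token(account: str) -> str:
              """Ask a running oidc-agent for a valid access token via the
              oidc-token CLI; the agent refreshes the token when needed."""
              result = subprocess.run(
                  ["oidc-token", account],
                  capture_output=True, text=True, check=True,
              )
              return result.stdout.strip()

          # 'egi' is a placeholder short-name configured in the agent.
          token = get_access_token("egi")
          # The token is then used as a Bearer credential, e.g.:
          #   requests.get(url, headers={"Authorization": f"Bearer {token}"})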

        Speaker: Dr Marcus Hardt (Karlsruhe Institute of Technology)
      • 77
        Efficient AAI for research communities using the IdP/SP Proxy

        Many Authentication and Authorization Infrastructures (AAI) are adopting the AARC Blueprint Architecture, which relies heavily on the IdP/SP Proxy. This model has been verified in real deployments and its technical feasibility is beyond doubt. Its main advantages, such as attribute harmonization, protocol translation, and the provision of a single identifier regardless of authentication method, are well known and used in most Proxies.
        Over the time of operating a proxy solution, we have realized that the proxy concept offers much more. We have therefore developed additional features on top of this standard set, to make the whole AAI more efficient and to improve the end-user experience of AAI workflows.

        Examples of such features are:

        “Automatic account validity extension”
        On every user access, the Proxy updates the “last access” timestamp of the user's corresponding digital identity in the backend IAM system. Users may have multiple registered identities; the IAM system therefore knows which digital identity is actually used and can extend the account validity automatically based on that information.

        “Delegated authorization”
        Proprietary software, or software that is not capable of performing authorization itself, can delegate it to the Proxy. The Proxy checks every access to the service and compares it with data stored in the IAM system to determine whether the user is allowed to access the service. If access is denied, the Proxy shows information on how the user can obtain it.

        “Acceptable Usage Policy management”
        When the user accesses the service through the Proxy, a check is made in the backend IAM system to verify whether the user has accepted the latest version of the AUP. If not, the user must accept the current version, which is presented by the Proxy.

        “Multi-factor authentication”
        Some services need the user to be reliably authenticated. The Proxy can request multi-factor authentication from the upstream IdP, or can provide it itself in case the upstream IdP is not able to. All the data about the authentication can be delivered to the service, which can decide whether the mechanism used satisfies its needs.

        “Manually assigned affiliations”
        External sources often do not provide all the information a service might need. For these purposes, the Proxy is able to generate this additional information based on the data in the IAM system. Usually there is a trusted user who provides the missing data; for example, a trusted representative of an organization can manually assign an affiliation with that organization to any user.

        “Identity provider enforcement requested by services”
        Sometimes a service might require an account in a specific organization, or may be offered only to users coming from a specific identity provider. The Proxy has a mechanism by which the service can require a user to log in through a specific identity provider.
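
        To make the interplay of these checks concrete, here is a minimal, self-contained Python sketch of the order in which a proxy might apply them during login; all names and the IAM interface are hypothetical illustrations, not the actual ELIXIR/BBMRI/CESNET implementation:

          from dataclasses import dataclass

          @dataclass
          class User:
              identity: str
              accepted_aup: int   # version of the AUP this user last accepted

          class IAM:
              """Hypothetical stand-in for the backend IAM system."""
              current_aup = 2
              authorised = {("alice@example.org", "wiki")}

              def touch_last_access(self, identity):
                  # the "last access" timestamp drives automatic validity extension
                  print(f"updated last-access timestamp for {identity}")

              def is_authorised(self, identity, service):
                  return (identity, service) in self.authorised

          def proxy_login(user: User, service: str, iam: IAM) -> str:
              iam.touch_last_access(user.identity)        # validity extension
              if not iam.is_authorised(user.identity, service):
                  return "DENIED: show how to request access"   # delegated authorization
              if user.accepted_aup < iam.current_aup:     # AUP management
                  return "REDIRECT: present latest AUP for acceptance"
              return "OK: release harmonised attributes to the service"

          print(proxy_login(User("alice@example.org", 1), "wiki", IAM()))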

        In our presentation, we will explain in detail how the features work and what added value they bring. The use-cases demonstrating the individual features will be taken from the production environments of the ELIXIR, BBMRI and CESNET AAIs, where the features are already deployed and in use.

        Speaker: Slavek Licehammer (CESNET)
      • 78
        Developing a Trust and Security Framework for IRIS

        Driven by the physics communities supported by UKRI-STFC, the eInfrastructure for Research and Innovation for STFC, or IRIS, is a collaboration of STFC’s science activities, computing facilities, and its national computing centres at universities. The vision of IRIS is to develop a single federated national computing infrastructure for STFC science. To enable this vision, IRIS requires clear rules of engagement. The IRIS Trust and Security Framework delivers a policy platform within which service providers can offer resources, and users can perform their work, in a safe and secure manner.

        The EU H2020-funded AARC projects, building on existing work for infrastructures including EGI, addressed the challenges involved in integrating identity services across different infrastructures, thereby allowing research communities to securely share data and resources. The result of this work hinged on the AARC Blueprint Architecture, which allows federations of services and identity providers to connect via one or more proxies, such as the IRIS IAM discussed in a parallel abstract. In addition to the AARC technical architecture documents and guidelines, a policy team created a set of template policies published as the AARC Policy Development Kit (PDK) which, following the completion of the AARC projects, will find a long-term home under the Security for Collaborating Infrastructures (SCI) working group of the Wise Information Security for Collaborating e-Infrastructures (WISE) community. Building on existing practice, the PDK aims to help Research Infrastructures efficiently bootstrap the operation of an authentication and authorisation infrastructure in line with the AARC Blueprint Architecture, making them accessible to researchers in an easy and secure fashion.

        We will present the current status of work to bootstrap a trust framework of security policies for IRIS, based on the PDK, in consultation with the IRIS community. We will also discuss the future directions of this work, both in the context of IRIS and in the wider development of federated infrastructure security policy under the WISE community as part of a global collaboration.

        Speaker: David Crooks (STFC)
    • Clinic: Compute Services - EGI Cloud, ARC-CE

      This session will provide technical support for existing and new users of EGI compute services. During the session, experts will share technical information and usage tips and tricks about the EGI Cloud and ARC CE technologies, and will answer questions from the audience. The session will be interactive: a perfect opportunity to bring questions and to deep-dive into EGI Cloud and ARC-CE!

      The EGI Cloud offers a multi-cloud Infrastructure-as-a-Service federation that brings together research clouds into a scalable computing platform for data- and compute-intensive applications and platforms. The EGI Cloud is based on OpenStack, with various extra tools developed within the EGI Community.

      The ARC Compute Element (CE) is a Grid front-end on top of a conventional computing resource (e.g. a Linux cluster or a standalone workstation). ARC CE is used in EGI compute centres to offer High Throughput Compute services for compute intensive data processing applications.

      Main target audience
      Scientists, representatives of scientific communities, software and platform developers.

      Convener: Enol Fernandez (EGI.eu)
      • 79
        ARC-CE Room: http://go.egi.eu/zoom4

        Room: http://go.egi.eu/zoom4

        Speaker: Balazs Konya (EMI project)
      • 80
        EGI Cloud Room: http://go.egi.eu/zoom5

        Room: http://go.egi.eu/zoom5

        Speaker: Enol Fernandez (EGI.eu)
    • Data management solutions - Part 2 Room: http://go.egi.eu/zoom2

      Room: http://go.egi.eu/zoom2

      This session will include Data Management related presentations that were submitted to the conference and selected by the programme committee.

      Convener: Maria Girone (CERN)
      • 81
        Hubdrive: Enhancing HUBzero© for Offline Data Sharing

        HUBzero© is a framework for creating instances of virtual research environments and/or science gateways, so-called hubs under HUBzero©. The strategy behind HUBzero© started over two decades ago with nanoHUB, filling the need for a framework that lets developers easily integrate tools and simulations for nanotechnology in a web browser user interface. One of HUBzero©’s strategies since then has been to extend the framework for general use, opening it up to a diversity of research domains and adding features for research and teaching. HUBzero© applies cutting-edge and successful concepts and environments such as Jupyter, RStudio and shell environments in the user interface, and gives federated access to computing and data infrastructures.

        Following this strategy of supporting cutting-edge concepts and embedding them into the framework, HUBzero© is always looking to streamline, simplify and remove barriers in doing collaborative research. To this end, HUBzero© consistently evaluates new innovations as they appear on the technology landscape.

        One area that is active from a technological as well as a cultural and community standpoint is innovation in peer-to-peer (p2p) technologies. P2p technology allows computers (including desktops, mobiles and laptops) to communicate directly without an intermediate server. This is extremely interesting not only from a privacy standpoint: it also provides an additional level of resilience and natural pruning (the removal of incorrect or no longer relevant information) over a traditional client-server architecture.

        Researchers using HUBzero© have long requested simpler file and data sharing that fits more naturally into a day-to-day workflow. HUBzero© recognized an opportunity to fulfill this request by leveraging a decentralized data-sharing network and protocol called the “hypercore protocol” (formerly the dat protocol). It allows files and folders to be accessible to any other peer holding the proper read encryption keys, without having to upload the files or folders to a centralized server. HUBzero© can, of course, add additional resilience via “pinning services”, effectively leveraging its own infrastructure as a participating peer as well, but this is not strictly necessary.

        HUBzero© is working to integrate this functionality into its CMS. HUBzero© projects are the means by which researchers collaborate on the HUBzero© platform, providing an online location to gather and share resources, including datasets, images, PDFs, etc., and ultimately to create publications with a title, abstract, authors and attached file assets from the collaborative project, submitting the publication via the publications component for an associated DOI (Digital Object Identifier).

        Hubdrive is a downloadable executable application, or app, and requires no formal technical expertise on behalf of the user. The Hubdrive p2p client application allows this collaboration to fit seamlessly into the researcher’s workflow: rather than visiting the website to manually upload files through a web form, users just save, drag and modify files in a local desktop folder.

        Speaker: Sandra Gesing (University of Notre Dame)
      • 82
        Integrated, heterogeneous data access in INFN-Cloud and beyond

        INFN-Cloud integrates an object storage service as its main data backend for end user applications as well as for internal use.

        The INFN-Cloud Object Storage Service is a geographically distributed OpenStack Swift instance, instantiated over the INFN-Cloud backbone, where data is replicated across two different data centers about 600 km apart. In the current deployment, different replica policies can be applied, depending on both the characteristics of specific sets of data and the requirements of their owners.

        High availability, resilience, ubiquitous and authenticated access, as well as ease of use and support for multiple technologies, are the highlights of the service described in this talk.

        The INFN-Cloud Object Storage service has been coupled with high-level tools and facilities by taking advantage of the OpenStack Swift and S3 APIs. With this approach, Nextcloud, ownCloud, Minio, Duplicati, Rclone, the AWS CLI, S3fs and many other similar tools act as contact points between the backend storage service and end-user applications deployed on the INFN-Cloud infrastructure at the IaaS or PaaS levels. The variety of supported storage products, each with its distinct characteristics and data access paradigms, makes it possible to implement ad-hoc solutions that satisfy requirements coming from different scientific communities.
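
        As an illustration of the S3-compatible access path, any standard S3 client can address a Swift deployment that exposes the S3 API; in this minimal boto3 sketch the endpoint URL, credentials and bucket name are placeholders rather than actual INFN-Cloud values:

          import boto3

          # Placeholder endpoint and credentials: an S3-compatible client
          # talking to an OpenStack Swift service that exposes the S3 API.
          s3 = boto3.client(
              "s3",
              endpoint_url="https://swift.example.infn.it",
              aws_access_key_id="ACCESS_KEY",
              aws_secret_access_key="SECRET_KEY",
          )

          # Upload a backup object and list the bucket contents.
          s3.upload_file("results.tar.gz", "my-bucket", "backups/results.tar.gz")
          for obj in s3.list_objects_v2(Bucket="my-bucket").get("Contents", []):
              print(obj["Key"], obj["Size"])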

        Some typical requests that INFN-Cloud is addressing within specific use cases relate to scientific data archival and distribution, remote and encrypted data backup, and personal and shared data storage. Besides that, the INFN-Cloud Object Storage Service is also used internally as an image and software repository and for the backup of its own core services.

        The talk will provide details about the storage integration capabilities already implemented in INFN-Cloud, as well as future directions. In addition, some representative use cases will be described, dealing with Jupyter notebooks with persistent storage deployed on Kubernetes clusters, the integration of the entire data workflow of some physics experiments, and the use of sync&share solutions for scientific data management, with the goal of highlighting the role and impact of the INFN-Cloud Object Storage Service in the solutions provided to scientific user communities.

        Speaker: Stefano Stalio (INFN)
    • Global Open Science Cloud -- Part 1: GOSC, Concept and Landscape Room: http://go.egi.eu/zoom1

      Room: http://go.egi.eu/zoom1

      The workshop page is at https://indico.egi.eu/event/5255/

      The digital revolution has transformed the way in which data, information and knowledge are acquired, managed, repurposed, analysed, used and disseminated. We are at the threshold of an era with unprecedented opportunities for cross-disciplinary and cross-border collaboration for research and innovation. A new research paradigm is emerging which applies increasingly automated approaches and Machine Learning, and which harnesses the most advanced computing facilities and software, to handle huge and disparate cross-disciplinary data. The advanced infrastructure needed for this new paradigm and Open Science is emerging: it needs to be on demand, as a service, ubiquitous and seamless. In pursuit of this vision, infrastructures are beginning to emerge at institutional, national and regional levels, such as the European Open Science Cloud from the European Commission, the CSTCloud from the Chinese Academy of Sciences, the ARDC e-infrastructure in Australia, the African Open Science Platform, etc.

      Is it possible to share experiences and make a global framework to align and federate such Open Science clouds and platforms? Is there a way to better support research collaborations across continents to resolve global science challenges, such as the UN Sustainable Development Goals (SDGs), climate change, infectious diseases and pandemics, COVID-19, coordinated and global disaster risk reduction, and so on? At the moment, a global, fully connected digital infrastructure is not in place, making it difficult for scientists to access digital resources across countries and continents.

      The idea of a Global Open Science Cloud (GOSC) was initiated during the CODATA 2019 Beijing conference. The mission of GOSC is to connect different international, national and regional open science clouds and platforms to create a global digital environment for borderless research and innovation. It aims to provide better ways to harness digital resources from around the world, help bridge the division in infrastructure, technique and capacity building among different countries, support global science collaborations and foster truly international science.

      There are many challenges and difficulties, such as inconsistent access policies from country to country; a lack of common standards for building a global-level data and e-infrastructure federation; differences in language and culture; and highly varied funding schemes.

      The workshop will gather representatives of international initiatives, research communities and public digital infrastructure providers, to review the existing work in GOSC, and to develop consensus about an initial concept model, framework, and roadmap for GOSC. We will discuss the needs and typical use cases from research community representatives, examine available resources and possible contributions from international e-infrastructure providers, identify the key barriers in policy, governance, standard and technique, and identify possible funding opportunities.

      We welcome all GOSC stakeholders to join and contribute to the discussion. We invite attendance by:

      -- Research community and research infrastructure representatives with needs and experience supporting global collaborations;

      -- Digital infrastructure representatives open to participating in a global resource federation;

      -- Experts on standards and technology developing and operating solutions for federated access to data, computing, software and applications;

      -- Policy researchers and policy makers who can identify the key policy barriers and provide plausible solutions;

      -- Funders who have the vision and interests of investment in the implementation of GOSC.

      Convener: Simon Hodson (Executive Director CODATA; Vice Chair, UNESCO Open Science Advisory Committee)
      • 83
        The UNESCO Recommendation on Open Science
        Speaker: Ana Persic (Senior Programme Specialist, Chief of Section a.i., UNESCO)
      • 84
        International Science Council: Open Science in the Council's 2019-2021 Action Plan
        Speaker: Geoffrey Boulton (Governing Board, International Science Council)
      • 85
        The ISC CODATA Decadal Program: Making Data Work for Cross Domain Grand Challenges
        Speaker: Simon Hodson (Executive Director CODATA; Vice Chair, UNESCO Open Science Advisory Committee)
      • 86
        GOSC, landscape and vision
        Speakers: Jianhui Li (Director of CSTCloud department in CNIC, CAS), Hussein Sherief (AASCTC)
      • 87
        From EOSC out: sharing lessons and co-building a global open research commons

        This talk will reflect on progress made towards a European Open Science Cloud (EOSC) from an Executive Board perspective. Common challenges which need to be pursued in global fora will be explored to discuss how EOSC is looking out to global peers and seeking to co-build an interoperable set of services and data which facilitate collaboration across disciplinary and geographic boundaries.

        Speaker: Sarah Jones (UG)
      • 88
        Coordination of global activities on the development of Open Science platforms - the RDA Global Open Research Commons (GORC) IG

        The so-called “Open Science Commons” or “Data Commons” provide a shared virtual space or platform that serves as a marketplace for data and services. Examples include the European Open Science Cloud, the Australian Research Data Commons, the African Open Science Platform, open government portals, and initiatives outside traditional research contexts. The goal is to coordinate across these initiatives to enable a network of interoperable data commons. The Interest Group works to reach a shared understanding of what a “Commons” is in the research data space: what functionality, coverage and characteristics such an initiative requires, and how this can be coordinated at a global level. Collaborations will be sought with parallel initiatives in other spaces, whether in national/regional contexts or in other fora such as the OECD, the G7 Open Science Working Group, the UN’s Expert Advisory Group on a Data Revolution for Sustainable Development, CODATA, GO FAIR and others. Recognising the broad scope, the IG will focus initially on Data Commons and extend to Open Science Commons as work progresses.

        Speaker: Corina Pascu (Co-Chair of RDA GORC IG, Policy Officer of the European Commission)
      • 89
        Panel

        The discussion will be driven by the following questions to the panel and the audience:
        - What is GOSC: its concept and scope?
        - What are the main challenges?
        - What should be on the GOSC Landscape?
        - What should GOSC do and how should it go about it?

    • 12:30 PM
      Lunch break
    • 12:45 PM
      Lunch break
    • Cloud computing - Part 1 Room: http://go.egi.eu/zoom4

      Room: http://go.egi.eu/zoom4

      This session will include Cloud Computing related presentations that were submitted to the conference and selected by the programme committee.

      Convener: Kostas Koumantaros (GRNET)
      • 90
        Managed Services for Accelerating Science in the 2020s - a SURF perspective

        In this talk we will present Spider, a managed service by SURF. Spider is a versatile high-throughput data-processing platform aimed at processing large structured datasets. It runs on top of our in-house elastic Cloud. In combination with a superb network and hierarchical storage, this allows processing on Spider to scale from many terabytes to petabytes.

        In recent years, Cloud technology has given IT users and providers near-limitless possibilities in terms of customizable and dedicated infrastructure for computing. In parallel, however, researchers have faced increased publication pressure (publish or perish), explosive growth in data, and a diversion of funding towards personal grants (rather than institutional/structural funding). This has decreased the effective time that researchers can afford to spend on designing IT solutions for their scientific problems, and has steadily widened the IT knowledge gap between researchers/research institutes and IT providers. Hence, although using Cloud technology to deploy tailored computing environments scales technically, this scalability also requires support, automation and fault tolerance, which bring many new challenges that researchers cannot tackle and often are not interested in.

        A new balance has to be found to more fully support researchers with IT-intensive problems. Accepting the paradigm shift described above means that any effective solution for sustainable data processing has to go beyond the virtualization layer. We believe that this solution can be found in providing researchers with managed, persistent and, where possible, shared services. Such a solution would not only accelerate science but also reduce its carbon footprint.

        In our model, the researcher focuses on the scientific algorithms and the IT provider is responsible for the infrastructure and the platforms built on top of it. These platforms are built on generic solutions and are tailored to the needs of a particular user community only where required. In this vision, the infrastructure itself remains Cloud-native to preserve the proven strengths of this technology, such as rapid deployment, robust adaptation and dynamic scaling.

        Managed services, we believe, pave the road towards unburdening researchers, and allow users and providers to focus on their respective strengths and achieve increased synergy. Furthermore, armed with modern technology (e.g. containers, virtual environments, shared & local filesystems, role-based access, collaborative spaces, private nodes/partitions and secure networks), managed services can flourish and fulfill the requirements of a broad and diverse set of research communities.

        Spider combines these technologies in an effort to provide a low-threshold, managed data-processing platform that appeals to a broad set of scientific disciplines. Here we discuss its technical setup and the possibilities for customization, consider its potential within a distributed computing federation, and share some of the many current use-cases. The deployment and integration of managed services on the EGI infrastructure does not feature in the current EGI service model; through this talk we also aim to start a discussion on the need to include such services as part of this infrastructure and the European Open Science Cloud.

        Speaker: Raymond Oonk (SURFsara BV)
      • 91
        The INFN-Cloud PaaS Dashboard

        INFN-Cloud is a distributed, federated infrastructure built on top of heterogeneous and geographically distant cloud sites, offering compute and storage resources that are locally managed by frameworks like OpenStack, Apache Mesos and Kubernetes. The resources, spread across different administrative boundaries, are federated at the PaaS level through the AAI system, based on INDIGO-IAM, and the INDIGO PaaS Orchestrator. This ensures transparent, flexible and efficient access to the distributed resources.

        A rich collection of services can be self-instantiated by the INFN-Cloud end users, ranging from the provisioning of pure IaaS services (e.g. virtual machines and block or object storage) to the deployment of complex services and virtualized clusters using e.g. Kubernetes, Spark, Mesos or HTCondor. The deployment workflow is managed by the INDIGO-DataCloud Orchestrator that coordinates the selection of the best provider/site to allocate the needed resources, depending on the available SLAs, the monitoring metrics and the user requirements.

        The topology of the services to be instantiated is defined through TOSCA, the standard templating language used to describe services and applications deployed in cloud environments. TOSCA templates can be submitted using the Orchestrator REST API or via the command-line tool orchent. However, handling TOSCA templates is not a simple task: it requires familiarity with the TOSCA language and with technical details that most researchers and scientific community users are not necessarily interested in. To overcome this, the INFN-Cloud Dashboard provides a simple and user-friendly graphical web interface that allows users to 1) authenticate with INFN-Cloud, 2) select the service to deploy from a catalogue of pre-defined templates, 3) configure and customize the deployment through a simple form, 4) monitor and manage the deployments through dedicated menus and views, and finally 5) get notified as soon as the deployment is complete.
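
        For illustration, the following minimal Python sketch submits a TOSCA template to an Orchestrator-style REST endpoint, much as orchent does under the hood; the URL, token, template file and payload shape are assumptions for the example rather than the exact INFN-Cloud interface:

          import requests

          ORCHESTRATOR = "https://orchestrator.example.org/orchestrator"  # placeholder
          token = "..."  # an OIDC access token obtained from INDIGO-IAM

          # A TOSCA template describing the desired service topology.
          with open("vm.yaml") as f:
              template = f.read()

          # Payload shape assumed for the example; check the Orchestrator docs.
          resp = requests.post(
              f"{ORCHESTRATOR}/deployments",
              headers={"Authorization": f"Bearer {token}"},
              json={"template": template, "parameters": {"num_cpus": 2}},
          )
          resp.raise_for_status()
          print("Deployment submitted:", resp.json().get("uuid"))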

        When submitting a deployment request through the dashboard, a user can also decide to bypass the automatic scheduling mechanism implemented by the INFN-Cloud Orchestrator and send the request to a specific site, chosen from a drop-down list. Among the advanced features implemented in the dashboard, a notable one is the integration with the Secrets Manager based on HashiCorp Vault, which allows user data such as SSH keys and credentials to be stored safely.

        The talk will describe and show in practice the actual working of the INFN-Cloud Dashboard, highlighting its flexibility, in particular with regard to the incorporation and customization of new services, and discussing its possible evolution within INFN-Cloud and beyond.

        Speaker: Marica Antonacci (INFN)
      • 92
        Implementing Multi-Cloud Interoperability by Means of Jelastic PaaS

        Most organizations need to retain control over data and workload placement by diversifying their choice of cloud vendors. Combining public cloud, private cloud and on-premise infrastructure can provide highly sophisticated and customized environments with an ideal mix of performance, cost and functionality. Multi-cloud adoption enables companies to ensure data sovereignty and meet modern data protection rules such as GDPR, CCPA, LGPD, etc. It also improves high availability and disaster recovery across multiple data centers. At the same time, organizations require a governance layer to orchestrate their multi-cloud infrastructure via a single consolidated management panel, reducing complexity and improving cyber security. We’ll demonstrate real cases of multi-cloud interoperability implemented across hyperscalers like AWS and Azure and domestic cloud infrastructure providers, with the help of the Jelastic Platform-as-a-Service.

        Speaker: Mr Ruslan Synytsky (Jelastic)
      • 93
        Data Challenges at the Square Kilometre Array (SKA)

        The upcoming Square Kilometre Array (SKA) observatory transports data products to SKA Regional Centres (SRC). Realizing a global SRC network is of utmost importance for radio astronomy and, beyond that, opens unique opportunities for developing generic infrastructure components that are of interest to other communities as well.

        The resolution power of sensors is increasing steadily, resulting in larger and larger data volumes. For reasons of sustainability, everybody sooner or later reaches the point where only a tiny fraction of the generated data can be stored in the long term.

        Experiments archive their data and analyze them over and over again. This traditional method has led to unexpected discoveries. For example, "Fast Radio Bursts (FRB)" are highly luminous signals from rare cosmic events that were detected in 2007 by analyzing data taken a few years earlier, in 2001. The raw data volumes at SKA are so large that only a small fraction can be stored in archives. The necessary strong data reduction has to be performed nearly in real time. This very time constraint results in fundamental challenges.

        1. Data Irreversibility
          Experiments generally archive all data in order not to lose any information. The real-time constraint, however, limits the effectiveness of this approach, simply because there is not enough time to process workflows in full detail. Missing information cannot be recovered later on, whereby an "arrow of time" is introduced, an essential characteristic of irreversibility. To reduce irreversibility effects, feedbacks should be integrated into the global workflow. Firstly, the outcome of a "fast analysis" of online data could be used to optimize the control of sensors.

        2. Dynamic Archives
          Archived datasets have to be characterized by quality measures that are adjusted regularly based on simulations. The outcome of this "slow analysis" could be used for steering the sensors via a further feedback loop. In other words, archives will no longer be static but dynamic entities. Accordingly, metadata schemes should be dynamically extendable to keep up with increasing knowledge.

        3. Data Monster
          The great number of antennas at SKA provides images of the cosmos at unprecedented resolution. Single images may be as large as one petabyte. Analyzing objects of such size requires a paradigm shift: from the current processor-centric to memory-based computing architectures. It should, however, be noted that further efforts are needed. Speedup in parallel computing relies on the Divide&Conquer principle which, in turn, is based on the assumption that the problem class of a split dataset is equivalent to that of the original dataset. Medical image processing indicates that each Divide&Conquer step may need careful justification.

        In the presentation, the impact of these three challenges on future data infrastructures is elucidated, in general as well as for the global SRC network of SKA, based on discussions within the German SKA community, which is organized by the "Association of data-intensive Radio Astronomy (VdR)". The connection to related work is clarified, e.g. to the concept of "data lakes" in high-energy physics and to the outcome of the Big Data and Exascale Computing (BDEC) project.

        Speaker: Prof. Hermann Heßling (Verein für datenintensive Radioastronomie (VdR), and University of Applied Sciences (HTW) Berlin)
      • 94
        Policy Management in Cloud Environments: An introduction to PolicyCLOUD

        The PolicyCLOUD project will improve policy making for public administrations across Europe by harnessing the potential of digitisation, big data and cloud technologies to support transparent, democratic and evidence-based decision making around the creation and implementation of social and economic policy.

        Speaker: Ricard Munne Caldes (ATOS)
    • Data transfer workshop - Part 1 Room: http://go.egi.eu/zoom2

      Room: http://go.egi.eu/zoom2

      Description
      EGI recently launched a ‘Data Transfer Working Group’ to drive the technical evolution of Data Transfer services in the context of the EGI federation.
      This workshop is organised by the working group with the aim of engaging with scientific communities and technology/service providers, to present and discuss use cases, user and operational requirements, and state-of-the-art solutions. The input gathered during this double-session will be used by the working group to define and run technology pilots, new services and test cases.

      The workshop is relevant for scientific users and communities who need to transfer large amounts of data among institutes at national, international or inter-continental scale. The session is also a good opportunity for developers and operators of data transfer services to collect requirements and propose solutions for EGI providers and user communities.

      Main target audience
      Scientists, representatives of scientific communities, data providers, compute/data centre operators.

      Convener: Andrea Manzi
      • 95
        Intro to Data Transfer WG session
        Speaker: Andrea Manzi
      • 96
        HIFIS transfer service: FTS for everyone
        Speaker: Mr Tim Wetzel (DESY)
      • 97
        PaNOSC Data Transfer Use cases
        Speaker: Jean-François Perrin (Institut Laue-Langevin)
      • 98
        EMBL-EBI Data Transfer Use cases
        Speaker: Andrea Cristofori (EMBL-EBI)
      • 99
        The BioCommons Platform
        Speaker: Guido Aben (AARNET)
      • 100
        RapidXfer - Proposed Data Transfer Framework for Square Kilometre Array

        The Square Kilometre Array will be the largest radio telescope, and it comes with huge data challenges [1]. SKA’s host sites are in South Africa and Australia [2]. Each host site is estimated to produce data at different rates. Very high-performance central supercomputers (one in South Africa and another in Australia) process the extremely voluminous data produced by the SKA. The initial data products [3] generated by the SKA’s Science Data Processors (SDP) are not suitable for immediate imaging. The data delivery architecture [4] facilitates the transfer of data from the SKA-SA (SKA South Africa) CHPC (Centre for High-Performance Computing) to the IDIA [5] Regional Science Data Centres using a dedicated Globus endpoint. The partially preprocessed data are sent to SKA Regional Centres around the world for further processing. SKA Regional Centres play a key role in the transfer of data from SKA’s sites to CERN’s Tier 1 sites and further to other Tier 2 sites. The SRC forms an intrinsic part of SKA operations [6], yet its model is still in its infancy. Rucio [7] provides a generic, scalable approach to transferring data for high-energy physics experiments and is still being evaluated for SKA. We propose our rapid data transfer framework "RapidXfer", the solution we are currently using to transfer data from MeerKAT IDIA to DiRAC’s Logical File Catalogue (LFC) for further processing. The RapidXfer framework makes use of Globus online transfer through a dedicated Globus endpoint. The "Grid File Access Library", known as "gfal", is used to transfer data from high-memory machines to physical storage, where each file is given a "Physical File Name" and then registered in DiRAC’s "Logical File Catalogue" [8]. This DiRAC register makes it possible to create as many replicas as needed on the preferred Storage Elements. The RapidXfer framework halved the time for data transfer from South Africa’s SDP to the IRIS machine [9] compared to traditional SCP transfer and direct file transfer to the LFC from IDIA (using DiRAC’s "dirac-dms-add-file" feature). Our future work focuses on transferring MeerKAT datasets of different sizes to evaluate the framework's efficiency and scalability.
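
        As a sketch of the gfal-based copy step, using the Python bindings of the Grid File Access Library (source and destination URLs below are placeholders), a transfer like the one RapidXfer performs could look as follows, with registration in DiRAC's LFC done afterwards, e.g. via dirac-dms-add-file:

          import gfal2

          # Placeholder URLs: a local MeerKAT dataset and a Grid storage element.
          SRC = "file:///data/meerkat/obs1234.ms.tar"
          DST = "gsiftp://se.example.ac.uk/dpm/example/home/ska/obs1234.ms.tar"

          ctx = gfal2.creat_context()        # note: 'creat_context' is the API name
          params = ctx.transfer_parameters()
          params.overwrite = True
          params.timeout = 3600              # seconds

          ctx.filecopy(params, SRC, DST)     # perform the copy to the storage element
          print("copy done; now register the file in the Logical File Catalogue")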

        Speaker: Dr Priyaa Thavasimani (The University of Manchester)
    • EGI Core Services Roadmap - Part 1

      The EGI Core Services are a set of central services that support the operation of the distributed end-user services of the EGI federation. The Core services act as ‘the glue’ that keeps the data and compute centres together, and make them manageable and usable as integrated sites.

      The Core services include:
      - Accounting service - EGI Accounting stores user accounting records from various services offered by EGI, such as Cloud, HTC and storage usage.
      - Configuration database - supports management of the configuration information of federated e-infrastructure assets and their functional relations.
      - Helpdesk - Single point of contact to ask for support for any service or location across the EGI Federation.
      - Messaging service - A backend service which enables the scalable transfer of messages between arbitrary service components
      - Monitoring service - Service to enable monitoring of the performance of services across the EGI Federation.
      - Operations Portal - Central portal for operations management of the EGI federated infrastructure providing a comprehensive array of management and communication tools

      The plans for extending the capabilities of these services will be presented and discussed in this double-session. At the end of each talk there will be an opportunity to collect further requirements and to discuss the prioritisation of the planned activities. The session is relevant to NGI managers and operators, resource centre administrators, and scientific communities and users who want to learn about the EGI Core services, provide feedback on them, and influence their development plans.

      Convener: Matthew Viljoen (EGI.eu)
    • Global Open Science Cloud -- Part 2: Research Community and Co-Design Room: http://go.egi.eu/zoom1

      Room: http://go.egi.eu/zoom1

      The workshop page is at: https://indico.egi.eu/event/5255/

      This session continues the Global Open Science Cloud (GOSC) workshop; see Part 1 above for the full workshop description and intended audience.

      Convener: Jianhui Li (Director of CSTCloud department in CNIC, CAS)
      • 106
        Overview by the Chair
      • 107
        Global Science Collaboration in the Photon and Neutron Community
        Speaker: Rudolf Dimper (IT Advisor to the European Synchrotron Radiation Facility (ESRF) Directorate)
      • 108
        Global EISCAT
        Speaker: Ingemar Haggstrom (EISCAT)
      • 109
        Potential collaboration and challenge between SYISR and EISCAT

        EISCAT operates a multiple Incoherent Scatter Radar (ISR) system in Europe that has played a significant role in the space physics community over the past decades. An upgrade of the system, named EISCAT-3D, is now being planned. The Sanya ISR (SYISR) is an ISR under development in low-latitude China and is almost complete. At the same time, we are turning to the development of SYISR phase 2, which will double SYISR and add two remote receivers. Both EISCAT-3D and the SYISR tristatic system use phased-array antennas and will generate huge amounts of scientific data. Scientifically, the two ISRs will complement each other thanks to their different geographic locations. In the talk, I will describe both radar systems, potential future collaboration, and the main challenges.

        Speaker: Xinan YUE (Institute of Geology and Geophysics, CAS)
      • 110
        Regional Collaborations on Disaster Mitigation
        Speaker: Eric Yen (AS)
      • 111
        Virtual Observatory and Science Platforms in Astronomy

        The Virtual Observatory (VO) aims to provide a research environment that opens up new possibilities for scientific research based on data discovery, efficient data access and interoperability. The talk will introduce the basic concepts of the VO and the current status of the IVOA. Several examples of whole-lifecycle data management science platforms will be given, including NADC (China), NOAO Data Lab (US), CANFAR (Canada) and SciServer (US).

        Speaker: Chenzhou CUI (PI of the Chinese Virtual Observatory (China-VO), chair of the International Virtual Observatory Alliance (IVOA), and deputy director of the National Astronomy Data Center in China (NADC))
      • 112
        Big Data Analytics needs for the Earth Observation Science Community

        Earth Observation (EO) data from open-access sensors, such as those from the European Copernicus Sentinel fleet, are streaming in at rates of multiple terabytes per day. Comprehensive processing of these data streams, and their analysis and integration into scientific maritime and land disciplines, requires the adoption of Big Data Analytics. EO use cases cover a wide range of data processing patterns across varying access profiles and have long-term data curation requirements. Effective uptake relies on petabyte-scale storage solutions coupled with massively parallel processing and access to efficient, state-of-the-art geospatial data analysis routines. Open cloud solutions are expected to make major contributions by providing consistent long-term storage, facilitating rapid on-demand data staging, and marshalling advanced compute resources to apply open-source algorithms. However, the uptake of cloud solutions in science also requires efforts in education, in order to apply scalable and reproducible science methods in disciplines beyond the “space data” domain.

        Speaker: Guido Lemoine (European Commission, Joint Research Centre)
      • 113
        Supporting Global Open Science with Collaboration in Geoinformatics
        Speaker: Kerstin Lehnert (Doherty Senior Research Scientist at the Lamont-Doherty Earth Observatory of Columbia University, Director of the NSF-funded data facility IEDA (Interdisciplinary Earth Data Alliance))
      • 114
        Panel

        Focused questions:
        -- Who is interested in GOSC, and what are its benefits?
        -- What are the community's needs for GOSC?
        -- What are the main functions of GOSC? What are the big challenges for GOSC from the perspective of your own research community and experience?

    • 2:30 PM
      Coffee break

    • 2:35 PM
      Coffee break


    • Demos 4 Room: http://go.egi.eu/zoom4

      Room: http://go.egi.eu/zoom4

      Some things are best demonstrated, especially when it comes to technical services. Just before the coffee break, we offer a 30-minute slot for submitted demos to show these services, outputs, or any other activity relevant to the conference's theme.

      • 115
        Using the Advanced dCache API (ADA) tool for Big Data processing

        This demo will go through ADA (Advanced dCache API), a tool for interacting with dCache, a powerful data storage platform offered by SURF and tailored to data-intensive applications. dCache is optimised for processing huge datasets, from many terabytes to petabytes; datasets this large often include instrument data from sensors, DNA sequencers, telescopes and satellites.

        Several communities from various scientific domains use the SURF dCache service to achieve high-throughput data analysis. Our SURF dCache service is used, among others, by the CERN and LIGO-Virgo experiments in the high-energy physics domain, the LOFAR radio telescope community in astronomy, and Project MinE ALS research in the life sciences. Lately we have noticed increasing demand from Earth Observation projects, for example Tropomi S5P and other projects dealing with Sentinel mission data in the Copernicus programme.

        The growing demand for our SURF dCache service has increased the need to simplify access and data transfer methods for dCache storage, while enabling easy and secure ways to collaborate on the data. As a result, SURF developed a new tool that enables users to access dCache from anywhere. The new tool is called ADA (Advanced dCache API) and is based on the dCache API and WebDAV.

        Inspired by the first computer programmer, Ada Lovelace, our ADA tool enables users to access and process their data on dCache from any platform and with various authentication methods, using industry-standard tools. For several years, dCache was mainly accessible through Grid storage clients and protocols (SRM, GridFTP, Xrootd) and x509 certificate authentication, which limited usage to Grid computing experts on Grid-enabled platforms. ADA was developed to lift the burden of Grid infrastructure dependencies and offer a portable solution to explore the storage space.

        Although ADA supports various authentication methods (x509, LDAP, OpenID Connect), this demo will cover our recommended authentication method: macaroons. Macaroons are tokens that can be used to grant access to dCache data in a very granular way. This gives data managers the autonomy to share their dCache data with project members and external collaborators at local, national and international levels. Finally, we will demonstrate ADA's event-driven features for triggering tasks automatically when data is uploaded or staged from tape to disk, as an option for automating workflows in High Throughput Computing applications.
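
        As a hedged illustration of the macaroon workflow (host, path and credentials below are placeholders, not the SURF service endpoints), a data manager can request a macaroon from a dCache WebDAV door and hand it to a collaborator, who then uses it as a bearer token:

          import json
          import requests

          # Ask the WebDAV door for a macaroon restricted to downloads/listing,
          # valid for 12 hours (caveat and validity syntax as in dCache docs).
          resp = requests.post(
              "https://dcache.example.org:2880/users/alice/shared/",
              headers={"Content-Type": "application/macaroon-request"},
              data=json.dumps({"caveats": ["activity:DOWNLOAD,LIST"],
                               "validity": "PT12H"}),
              cert=("usercert.pem", "userkey.pem"),  # x509 here; other methods work too
          )
          macaroon = resp.json()["macaroon"]

          # Anyone holding the macaroon can now list that directory:
          listing = requests.request(
              "PROPFIND",
              "https://dcache.example.org:2880/users/alice/shared/",
              headers={"Authorization": f"Bearer {macaroon}", "Depth": "1"},
          )
          print(listing.status_code)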

        Speaker: Natalie Danezi (SURF)
      • 116
        Hubdrive: Supporting Seamlessly Peer-to-Peer Data Sharing

        HUBzero© has been developed to support the whole research lifecycle: creating data, sharing data, running simulations and workflows, and publishing research results with DOIs (Digital Object Identifiers). The framework belongs to the family of virtual research environments and/or science gateways, and instances are so-called hubs under HUBzero©. The HUBzero© team is conducting research into the viability of collaborative systems built on peer-to-peer (p2p) network infrastructure for the development and sharing of scientific research online. P2p means communication between client computers (desktops, mobiles, laptops) without the need for a server. Decentralized/p2p systems simultaneously offer both resilience and pruning (expiration of no longer relevant information) over a traditional client-server architecture.

        This research is manifesting as a prototype client application: Hubdrive. Hubdrive is a standalone application with a user experience similar to network-based file and folder management tools like Google Drive, Dropbox and iCloud, with additional collaborative file-sharing features. There is no size limit to file sharing in the p2p world, so files can be papers or documents, but also very large datasets.

        Hubdrive is novel compared to traditional file sharing solutions in that it leverages a decentralized data-sharing network and protocol called the “hypercore protocol” (formerly the dat protocol). Data is therefore peer- or user-owned and managed, and simply pinned on centralized servers to assist with offline availability and redundancy.

        The features most beneficial to the end user include:

        1. Offline managing of data and files, which syncs to collaborators and
          peer researchers when connectivity is re-established
        2. Redundancy by leveraging peer devices if centralized systems
          (e.g. HUBzero©) are unavailable
        3. Out-of-the-box version control

        The demonstration will show how this prototype seamlessly integrates with HUBzero©’s CMS, making the user experience around collaboration on projects and publications (two components that have long been part of the HUBzero© offering) as easy as managing a folder on one’s desktop: simply drag, drop, move and rename as one would any normal file. The demo will present three features:

        1. Adding a folder and containing files and directory structure to
          Hubdrive
        2. How Hubdrive integrates with HUBzero© projects
        3. How the files in Hubdrive are available to be attached to HUBzero©
          project-based publications

        Attendees of the demonstration will learn how to create a folder on a user’s local computer that is discoverable both by peer collaborators and by the HUBzero© infrastructure. The folders are accessible from within the HUBzero© CMS. The HUBzero© CMS uses a project-based approach to organize collaboration, and these folders exist within the online project space. Within this project context, users can create publications and submissions that ultimately result in DOIs after adding authors, abstracts and a title, and selecting relevant attachments from the project in the form of files representing datasets, images, PDFs, etc. The demonstration will show that modifying the connected folder’s content automatically syncs to the project, without the time-consuming need to go to a website and manually upload files. Thus, Hubdrive effectively enables seamless p2p data sharing.

        Speaker: James Bryan Graves (University of California, San Diego)
    • Demos 5 Room: http://go.egi.eu/zoom2

      Room: http://go.egi.eu/zoom2

      Some things are best demonstrated, especially when it comes to technical services. Just before the coffee break, we offer a 30-minute slot for submitted demos to show these services, outputs, or any other activity relevant to the conference's theme.

      • 117
        JENNIFER2 Cloud demonstrator

        Cloud infrastructures enable physics experiments to expand their computing infrastructure following the paradigm of elastic computing, or by utilizing opportunistic resources. In this demonstrator, we summarize the techniques explored in the context of JENNIFER2, a project funded under the Horizon 2020 programme of the European Union as a Marie Skłodowska-Curie Action of the RISE programme, under grant no. 822070. In particular, we focus on one of the ongoing tasks of the 'Computing and common techniques' work package, dedicated to the deployment of common tools for computing and data handling for the Belle II, T2K and Hyper-K experiments.
        One of the key points in setting up the demonstrator is the selection of software components sufficiently flexible to deal with the demands of the different communities and able to optimize the usage of the available computing resources.
        A full analysis of the computing infrastructure of the three experiments highlighted a set of common tools already in use, among them the workload management system of the DIRAC framework and CVMFS for software distribution. Taking advantage of this starting point, we focused on the adoption of the virtual machine life-cycle manager VCYCLE [1] as a building block for the constitution of the JENNIFER2 demonstrator.
        The designed setup provides a single centralized instance of VCYCLE which manages multiple clouds. For each endpoint we can define different Virtual Machine profiles, configured with the appropriate OS and the contextualization environment needed to run Belle II, T2K or HK pilots. A set of parameters controls the upper limit on the number of concurrent instances per cloud and per machine type. By tuning these numbers, we can balance the usage of concurrent resources among the experiments, increasing or reducing each share in response to demand (see the sketch below).
        A pilot version of the system is currently integrated in the DIRAC infrastructure of Belle II and in the GridPP DIRAC service used by T2K and Hyper-K. Over this testbed we are running MC production jobs for the three experiments. The demonstrator is expanding in terms of available endpoints and possible services, with the support of the EGI Federated Cloud infrastructure and other providers.
        This experience is developing a transversal synergy between European and Japanese collaborations on the topic of computing, and is going to define an efficient and sustainable environment for managing cloud resources on which to deploy other services and exploit additional technologies.
        [1] A. McNab et al 2015 J. Phys.: Conf. Ser. 664 022031
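
        As an illustration of the share-balancing logic described above, the sketch below encodes hypothetical per-experiment limits in plain Python; it is not VCYCLE's real configuration format or API.

            # Illustrative share balancing across experiments (hypothetical numbers;
            # not VCYCLE's actual code or configuration).
            MAX_MACHINES = {"belle2": 40, "t2k": 30, "hyperk": 20}  # upper limits per type
            RUNNING = {"belle2": 38, "t2k": 12, "hyperk": 20}       # currently running VMs

            def next_vm_type():
                """Start a VM for the experiment with the most spare capacity, if any."""
                spare = {exp: MAX_MACHINES[exp] - RUNNING[exp] for exp in MAX_MACHINES}
                exp, headroom = max(spare.items(), key=lambda kv: kv[1])
                return exp if headroom > 0 else None

            print(next_vm_type())  # -> "t2k", the profile with the largest headroom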

        Speaker: Dr Silvio Pardi on behalf of the JENNIFER2 computing group
      • 118
        Deep Learning for everybody: The DEEP-Hybrid-DataCloud approach

        Deep Learning (DL) is nowadays at the forefront of Artificial Intelligence, shaping tools that are being used to achieve very high levels of accuracy in many different research fields. Worried about the learning curve to introduce Deep Learning in your research? Don’t be. The DEEP-Hybrid-DataCloud project offers a framework for all users, including non-experts, enabling the transparent training, sharing and serving of Deep Learning models both locally and on hybrid cloud systems.

        The DEEP as a Service (DEEPaaS) approach developed within the DEEP-HybridDataCloud project (https://deep-hybrid-datacloud.eu/) allows users to train, serve and share a DL model in a user-friendly way, lowering the entry barrier for non-experts. This demo shows how to deploy some of the DEEP-HybridDataCloud modules (https://marketplace.deep-hybrid-datac...) both in a Cloud environment and on HPC, using different methods.
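
        As a taste of the DEEPaaS approach, a served model can be queried over plain HTTP; the sketch below assumes a local deepaas-run instance and a hypothetical model name, and the /v2/... paths should be checked against the DEEPaaS API documentation.

            # Sketch: query a running DEEPaaS API instance over HTTP.
            # Host, port and model name are assumptions for illustration.
            import requests

            BASE = "http://localhost:5000"                       # hypothetical deployment

            print(requests.get(f"{BASE}/v2/models").json())      # list the served models

            with open("input.jpg", "rb") as f:                   # hypothetical input file
                resp = requests.post(f"{BASE}/v2/models/demo-model/predict",
                                     files={"data": f})
            print(resp.json())                                   # the model's prediction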

        Speakers: Alvaro Lopez Garcia (CSIC), Lara Lloret (IFCA CSIC)
    • Demos 6 Room: http://go.egi.eu/zoom3

      Room: http://go.egi.eu/zoom3

      Some things are best demonstrated, especially when it comes to technical services. Just before the coffee break, we offer a 30-minute slot for submitted demos to show these services, outputs, or any other activity relevant to the conference's theme.

      • 119
        SAPS: Estimating the Evolution of Forest Masses and Crops using Cloud Resources

        SAPS is a service to estimate evapotranspiration (ET) and other environmental data that can be applied, for example, to water management and the analysis of the evolution of forest masses and crops. SAPS allows the integration of energy balance algorithms (e.g. SEBAL and SEB) to compute the estimations, which are of special interest for researchers in Agricultural Engineering and Environment. These algorithms can be used to increase knowledge of the impact of human and environmental actions on vegetation, leading to better forest management and analysis of risks.

        SAPS uses containers on top of a cloud back-end to facilitate the deployment of customizable versions of energy balance algorithms, which are broken into a three-stage pipeline: input data download, input preprocessing, and evapotranspiration estimation. SAPS comes with a number of implementations for these stages. In particular, it provides two different versions of the input download stage that use different data sources. The reference input download implementation uses multiple data providers. Landsat imagery is downloaded from the Google Earth Engine (GEE) platform. Meteorological information is provided by the National Centers for Environmental Information, and elevation data is provided by the Consortium for Spatial Information. All data is downloaded from mirror servers of these services managed by the Federal University of Campina Grande (UFCG). The alternative implementation works similarly to the reference implementation, but downloads Landsat imagery from the USGS service instead of GEE.

        In the context of EOSC-Synergy, SAPS is being integrated with several services offered by EOSC. This will make it easier for European scientists to exploit the evapotranspiration estimation services based on remote sensing imagery. Currently, the service relies on the EOSC computing resources, dynamically managed by the EC3 tool. The demo will show SAPS in action, deployed on top of an elastic Kubernetes cluster over EOSC resources, whose horizontal elasticity will be automatically orchestrated by EC3 in response to changes in the workload submitted to the service.
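
        Schematically, the three-stage pipeline can be pictured as three chained functions; the bodies below are placeholders (in the real service each stage runs as a container).

            # Schematic of the SAPS three-stage pipeline; stage bodies are placeholders.
            def download_inputs(scene_id):
                """Stage 1: fetch Landsat imagery, meteorological and elevation data."""
                return {"scene": scene_id, "landsat": "...", "weather": "...", "dem": "..."}

            def preprocess(inputs):
                """Stage 2: prepare the inputs for the energy balance algorithm."""
                return {**inputs, "prepared": True}

            def estimate_et(prepared):
                """Stage 3: run an energy balance algorithm (e.g. SEBAL) to estimate ET."""
                return {"scene": prepared["scene"], "et_map": "..."}

            result = estimate_et(preprocess(download_inputs("LC08_L1TP_215065")))
            print(result)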

        Speaker: Amanda Calatrava (UPVLC)
      • 120
        Reproducible Open Data analysis with Binder and DataHub

        In recent years, the vision of Open Science has emerged as a new paradigm for transparent, data-driven science capable of accelerating competitiveness and innovation. Notebooks can support Open Science as they are documents that make it easy to share concepts, ideas and working applications, capturing the full analytical methodology, connections to data and descriptive text to interpret those data. In this demo we showcase how EGI can support the execution of reproducible analyses, using the EGI DataHub as an Open Data store and the EGI Notebooks as a platform for executing and reproducing Jupyter-based notebooks, and how these can be easily shared in Zenodo, an open-access repository for research publications, scientific data and other 'research objects'.
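
        To give a flavour of the workflow, a notebook cell can pull an openly shared dataset straight from Zenodo's public REST API; the record id below is hypothetical.

            # Sketch: download the files of a (hypothetical) Zenodo record.
            import requests

            record = requests.get("https://zenodo.org/api/records/1234567").json()
            for f in record["files"]:
                with open(f["key"], "wb") as out:                # f["key"] is the filename
                    out.write(requests.get(f["links"]["self"]).content)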

        Speakers: Andrea Manzi, Enol Fernandez (EGI.eu), Giuseppe La Rocca (EGI.eu)
    • Coffee break
    • Clinic: Compute Services - CVMFS, HTCondor-CE

      This session will provide technical support for existing and new users of EGI compute services. During the session, experts will share technical information and usage tips and tricks about the HTCondor-CE and CVMFS technologies, and will answer questions from the audience. The session will be interactive - a perfect opportunity to bring questions and to deep-dive into HTCondor-CE and CVMFS!

      HTCondor-CE is a special configuration of the HTCondor software designed to be a job gateway solution for computing grids (e.g. in EGI or in the US-based Open Science Grid). It is configured to use a job router daemon to delegate jobs from the users to batch systems deployed in distributed compute centres.

      The CernVM File System (CVMFS) provides a scalable, reliable and low-maintenance software distribution service. CVMFS is implemented as a POSIX read-only file system in user space (a FUSE module), and it is used by various scientific communities in and beyond EGI to distribute software to distributed compute resources over wide area networks.
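
      To make the pairing concrete, the sketch below uses the HTCondor Python bindings to submit a job whose executable is served read-only from a CVMFS repository; the repository path is hypothetical.

          # Sketch: submit a job via the HTCondor Python bindings; the software
          # comes from a (hypothetical) CVMFS repository path.
          import htcondor

          sub = htcondor.Submit({
              "executable": "/cvmfs/sw.example.org/bin/analysis",  # distributed via CVMFS
              "arguments": "--input data.root",
              "output": "job.out",
              "error": "job.err",
              "log": "job.log",
          })

          schedd = htcondor.Schedd()   # on a grid site, the CE routes jobs to the batch system
          result = schedd.submit(sub)  # returns a SubmitResult with the new cluster id
          print(result.cluster())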

      Main target audience
      Scientists, representatives of scientific communities, software and platform developers, scientific data providers.

      Convener: Catalin Condurache (EGI.eu)
    • Combined use of HPC, Cloud and HTC systems Room: http://go.egi.eu/zoom6

      Room: http://go.egi.eu/zoom6

      Description
      The combined usage of HPC, Cloud and HTC systems is a common requirement for multiple research disciplines. EGI recently started an HPC-integration working group that brings together user communities, HPC providers, and technology providers to assess and plan the technical integration among various types of EGI services and HPC systems. This session will provide an overview of integration experiences, integration possibilities and some of the use cases that motivate the work itself. The session will be interactive, and will collect further input from the audience into the technical integration plan for EGI. The session will be relevant for scientific communities that need to run workloads across heterogeneous compute systems; and for compute service providers who want to support such heterogeneous workloads.

      Main target audience
      User communities with HPC requirements, HPC providers.

      Convener: Enol Fernandez (EGI.eu)
      • 123
        Introduction to the session
      • 124
        Strengthening HPC Competences in Europe - the EuroCC and CASTIEL approach

        Currently 33 National Competence Centres (NCCs) for HPC (and associated technologies) are being set up within the frame of the EuroCC project. This is accompanied by the CASTIEL activity, which supports the work at a European level by maximising the synergies between the nations and thus boosting the evolution of the NCCs.

        Speaker: Bastian Koller
      • 125
        The transcontinuum cyberinfrastructure

        This presentation gives a short overview of the transcontinuum concept that influences the current European roadmap design for the future HPC and HPDA cyberinfrastructure.

        This concept captures the need to provide application developers with an end-to-end abstraction of the infrastructures (HPC centres, Cloud, Fog, sensors) needed to combine HPC computations, data life-cycle management, AI everywhere, IoT, etc. in complex workflows.

        The content of this presentation is a synthesis of the EXDCI-2 European project (https://exdci.eu/about-exdci) roadmapping and international collaboration (https://www.exascale.org/bdec/) efforts.

        Speaker: Prof. François Bodin (EXDCI2 project)
      • 126
        INFN HPC Experience
        Speaker: Giacinto Donvito (INFN)
      • 127
        HPC AAI Integration
        Speakers: Isabel Campos (CSIC), Dr Marcus Hardt (Karlsruhe Institute of Technology)
      • 128
        Dynamic deployment of the Ophidia HPDA framework on HPC and Cloud environments

        This talk introduces the Ophidia High-Performance Data Analytics (HPDA) framework for scientific multi-dimensional data analysis, its multi-layered architecture, as well as the main challenges related to its deployment. In particular, it presents the solution developed for dynamic deployment of Ophidia over different infrastructures (i.e., HPC and Cloud clusters).
        Furthermore, the talk provides a brief overview of the integrated solution adopted in the frame of the EOSC-Hub project for the deployment of the ENES Climate Analytics Service (ECAS) on the EGI FedCloud, with the ECAS service using Ophidia data analytics features.
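
        For a flavour of the framework, here is a minimal sketch based on the PyOphidia client; the connection details are hypothetical and the argument names should be verified against the PyOphidia documentation.

            # Sketch with the PyOphidia client (connection details hypothetical).
            from PyOphidia import cube

            cube.Cube.setclient(username="oph-user", password="secret",
                                server="ophidia.example.org", port="11732")

            # Import a NetCDF variable as a datacube, with "time" as the implicit dimension.
            mycube = cube.Cube(src_path="/data/tasmax.nc", measure="tasmax",
                               imp_dim="time", ncores=4)

            avg = mycube.reduce(operation="avg")    # reduce along the implicit dimension
            avg.exportnc2(output_path="/data/out")  # write the result back to NetCDF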

        Speaker: Mr Donatello Elia (Euro-Mediterranean Center on Climate Change (CMCC) Foundation and University of Salento)
      • 129
        Fusion experience
        Speaker: Shaun de Witt (UKAEA)
      • 130
        Discussion
    • Data transfer workshop - Part 2 Room: http://go.egi.eu/zoom2

      Room: http://go.egi.eu/zoom2

      Description
      EGI recently launched a ‘Data Transfer Working Group’ to drive the technical evolution of Data Transfer services in the context of the EGI federation.
      This workshop is organised by the working group with the aim of engaging with scientific communities and technology/service providers, to present and discuss use cases, user and operational requirements, and state-of-the-art solutions. The input gathered during this double session will be used by the working group to define and run technology pilots, new services and test cases.

      The workshop is relevant for scientific users and communities who need to transfer large amounts of data among institutes at national, international or intercontinental scale. The session is also a good opportunity for developers and operators of data transfer services to collect requirements and propose solutions for EGI providers and user communities.

      Main target audience
      Scientists, representatives of scientific communities, data providers, compute/data centre operators.

      Convener: Andrea Manzi
    • EGI Core Services Roadmap - Part 2 Room: http://go.egi.eu/zoom3

      Room: http://go.egi.eu/zoom3

      The EGI Core Services are a set of central services that support the operation of the distributed end-user services of the EGI federation. The Core services act as ‘the glue’ that keeps the data and compute centres together, and make them manageable and usable as integrated sites.

      The Core services include:
      Accounting service - EGI Accounting stores user accounting records from various services offered by EGI, such as Cloud, HTC and storage usage.
      Configuration database - supports management of the configuration information of federated e-infrastructure assets and their functional relations.
      Helpdesk - single point of contact to ask for support for any service or location across the EGI Federation.
      Messaging service - a backend service which enables the scalable transfer of messages between arbitrary service components.
      Monitoring service - enables monitoring of the performance of services across the EGI Federation.
      Operations Portal - central portal for operations management of the EGI federated infrastructure, providing a comprehensive array of management and communication tools.

      The plans for extending the capabilities of these services will be presented and discussed in this double-session. At the end of each talk there will be the opportunity to collect further requirements and to discuss prioritisations of the planned activities. The session is relevant to NGI managers and operators, resource centre administrators, scientific communities and users who want to learn about the EGI Core services, who want to provide their feedback on the aforementioned services, and want to influence their development plans.

      Convener: Matthew Viljoen (EGI.eu)
    • Keynote: Diving into the Galaxy: an accessible and reproducible workbench with an European-wide distributed compute network Room: http://go.egi.eu/zoom1

      Room: http://go.egi.eu/zoom1

      Björn Grüning will take us on a journey, introducing Galaxy and sharing his 10+ years of experience and know-how in just 45 minutes. Best practices, tools, workflow development - just a few of the topics that will be covered, not to forget how these contribute to transparent and reproducible research. Come and join us on Tuesday to learn more about the European Galaxy server and Galaxy-based data analytics.

      Conveners: Björn Grüning, Gergely Sipos (EGI.eu)
    • Keynote: The Destination Earth Initiative Room: http://go.egi.eu/zoom1

      Room: http://go.egi.eu/zoom1

      The Destination Earth (DestinE) initiative underpins two major priorities of the European Commission: the European Green Deal and the Digital Transformation. It will demonstrate the important role of digital technologies and digital infrastructures for the green ambitions of the EU.

      DestinE will build on Copernicus and our European Earth Observation capacity, an area where Europe is truly a global leader. It will use the wealth of available data to serve as the platform providing public authorities with evidence-based policy and decision-making, expanding our capacity to understand and tackle environmental challenges; for example, DestinE will provide the ability to predict and manage environmental disasters and will support Green Deal priority actions on climate change, biodiversity, deforestation, and many others.

      The distinguishing features of DestinE are:
      • Fusion of advanced modelling/simulation capabilities driven by (pre)exascale computing power with Earth observation data analytics (machine learning, “AI”), resulting in a number of thematically different digital twins of the Earth (natural disasters, climate change, environment, biodiversity…), and, ultimately, an integrated Digital Twin of the Earth.
      • The integration of numerous different types of data sources, from Earth system data (land-cover, ocean, atmosphere) to socio-economic data, as well as new sources of information such as IoT devices and smart sensor technologies deployed on satellites, aircraft and drones, and in farms, cities and cars.
      • The requirement to accompany each scenario prediction with a quality assessment label indicating the maturity of the underlying models/data and thus the usability of the integrated result for decision support purposes (building trust in “science for policy”).

      The resulting very high precision digital model of the Earth will enable users not only to better understand the interplay of climate, environment and human activities, but also to unlock the full potential of modelling and prediction capabilities to support trusted evidence-based decision-making in Europe and beyond. We will have at our disposal "the health monitor of the planet" to better understand how to predict the socio-economic effects of climate change or natural emergencies with higher accuracy and transparency.

      By mid-to-late 2022, we strive to have in place an operational cloud platform with the first two Digital Twins offering concrete services, further extendable with more models and data. The operational part will be realised through the Digital Europe Programme. In parallel, we will continue to support the linked continuous research efforts through the EU’s Horizon Europe programme and improve the state of the art towards more precise and comprehensive models/data.

      Convener: Christian Kirchsteiger (European Commission)
    • AI strategy in Europe Room: http://go.egi.eu/zoom2

      Room: http://go.egi.eu/zoom2

      In recent years, Artificial Intelligence (AI) has brought about a technological revolution, becoming a key driver of economic development worldwide. In order to reach sufficient scale and avoid fragmentation of the single market, the adoption of a common European approach is required.

      The White Paper published in February 2020 presented the policy options to enable a trustworthy and secure development of AI in Europe, in full respect of the values and rights of EU citizens. In this document the European Commission puts forward a European approach to AI based on three pillars: (1) being ahead of technological developments and encouraging uptake by the public and private sectors; (2) being prepared for socio-economic changes brought about by AI; (3) ensuring an appropriate ethical and legal framework.

      This session will introduce the AI strategy of the European Commission and its funding priorities within the next Digital Europe Programme, as well as some of the current initiatives playing a relevant role in the AI excellence ecosystem. Finally, we will have the opportunity to discuss the EGI AI strategy and how the EGI service offer can evolve with AI to better contribute to the Digital Transformation in Research.

      Main target audience
      Service providers, researchers, policy makers.

      Convener: Elisa Cauhe (EGI.eu)
      • 141
        AI landscape: The new Partnerships in AI, Data and Robotics in Europe

        The AI, Data and Robotics PPP is a candidate contractual Public Private Partnership under the Horizon Europe Programme. The vision of the Partnership is to boost European competitiveness, societal wellbeing and environmental protection, leading the world in researching, developing and deploying value-driven, trustworthy AI, Data and Robotics based on European fundamental rights, principles and values. In this talk, I will give an overview of the Partnership as well as cover the Strategic Research, Innovation and Deployment Agenda (SRIDA), which defines the vision, overall goals, main technical and non-technical priorities, investment areas and a research, innovation and deployment roadmap for this new European Public Private Partnership.

        Speaker: Sonja Zillner
      • 142
        The AI4EU Ecosystem, Barriers and Opportunities
        Speaker: Gabriel Gonzalez Castañe
      • 143
        EGI Strategy towards AI: How AI contributes to the Digital Transformation in Research.

        The development of data science (DS), machine learning (ML) and artificial intelligence (AI) is currently a hot topic in science, and data is a game changer. It is also a challenge for research infrastructures, including e-infrastructures. As a modern federated e-infrastructure, the EGI Foundation will give its answer at the strategic level, as well as with services and projects based on that strategy.

        Speaker: Ville Tenhunen
      • 144
        Q&A
        Speaker: Elisa Cauhe
    • Coordination of infrastructure management Room: http://go.egi.eu/zoom4

      Room: http://go.egi.eu/zoom4

      The maintenance of computer systems is an essential activity to guarantee the availability of services and to respect the commitments made to users. It requires tools that are compatible with all middleware deployed by the resource providers. Previously, YAIM was used to simplify the configuration of grid middleware. However, it is no longer supported and cannot be used with other types of software, like OpenStack or OpenNebula.

      In order to overcome this issue and fulfil production requirements, the teams in charge of system administration have developed sets of recipes for configuration management tools such as Puppet or Ansible. These developments have been done separately and without coordination. The UMD team is using such recipes for the verification step, updating existing ones and writing new ones from scratch where nothing is available. EGI.eu has also put effort into providing guidelines on how to write Ansible roles for EGI.

      The main objective of this session is to provide a place that allows for a more thorough discussion about how to set up common tools and repositories to facilitate the sharing of configuration recipes for Grid and Cloud sites. After the presentation of configuration management tools, a round table will take place to exchange views and experience, and to discuss the coordination required to develop a common place to share recipes and documentation for these tools.

      Convener: Jerome Pansanel (CNRS)
    • Data services in EGI - Overview and use cases Room: http://go.egi.eu/zoom1

      Room: http://go.egi.eu/zoom1

      Description
      This session gives an overview of the data services/technologies that EGI is offering and planning to offer in the near future. The EGI service portfolio provides a set of services for data storage, metadata handling and data transfer, which can be integrated with research communities' frameworks in order to implement efficient data management. During this 75-minute session we will introduce the services and bring two selected research communities to report on their experience in using the services and the technologies on which they are built. The session is relevant to scientific communities who would like to understand how services from EGI can be relevant for the storage and management of data.

      Main target audience
      Scientists, representatives of scientific communities, service providers, data providers, scientific software/platform developers.

      Convener: Andrea Manzi
    • Earth Observation data, cloud services and analytics tools for climate and environment protection Room: http://go.egi.eu/zoom3

      Room: http://go.egi.eu/zoom3

      The session will present various initiatives and EU projects that are focusing on the provisioning of Earth Observation data in different sectors:
      - Research: the openEO project (a Federated Open Earth Observation Platform - https://openeo.org/) will demonstrate how ESA is investing in the realisation of a federated open infrastructure (open-source software, data, compute platform) building upon existing EO processing platforms.
      The session will also feature a presentation of the recently approved project C-SCALE (Copernicus - eoSC AnaLytics Engine). The project aims to federate European EO infrastructure services, such as the Copernicus DIAS and others. The federation shall capitalise on the EOSC capacity and capabilities to support Copernicus research and operations with large and easily accessible European computing environments. That would allow the rapid scaling and sharing of EO data among a large community of users by increasing the service offering of the EOSC Portal. The project will deliver a blueprint setting up an interaction model between service providers to facilitate interoperability between commercial (e.g. DIASes) and public cloud infrastructures.
      - Policy making: the NextGEOSS project has created a data hub that demonstrates the potential of new advances in Information and Communications Technology (ICT) to help develop and deploy new services requiring a wide variety of data sources, creating a solid foundation for capacity building through GEOSS community platforms.

      The session will conclude with a panel discussion on how Destination Earth can take advantage of EGI and other EOSC capabilities, leveraging the national capacities they federate.

      Convener: Diego Scardaci (EGI.eu)
      • 154
        Unlock the potential of EO data and services with NextGEOSS

        NextGEOSS has developed a data hub and a suite of user-centric services to accelerate the development of EO services. The project is funded by the EU as a contribution to GEO in support of the UN SDGs, and it continues its support through the Next-EOS community activity. This session will introduce the NextGEOSS services and explain how you can maximize the benefits for your research or business activities.

        Speaker: Koushik Panda (DEIMOS)
      • 155
        openEO platform

        openEO Platform will build a new European platform based on EOSC, five DIASes and commercial clouds, VITO’s Mission Exploration platform and EODC, using the unified openEO API to connect these platforms and make them usable with client software. In addition, the ESA data cube project and the national Austrian Data Cube (ACube) activity will be linked to the openEO Platform. The openEO Platform ensures federated data access, federated computing environments, flexible clients and powerful interfaces. Large-scale use cases from different application areas will demonstrate its feasibility and success.

        Speaker: Christian Briese (EODC)
      • 156
        C-SCALE project - Federated EO infrastructure services into EOSC

        The proposed C-SCALE (Copernicus - eoSC AnaLytics Engine) project aims to federate European EO infrastructure services, such as the Copernicus DIAS and others. The federation shall capitalise on the European Open Science Cloud’s (EOSC) capacity and capabilities to support Copernicus research and operations with large and easily accessible European computing environments. That would allow the rapid scaling and sharing of EO data among a large community of users by increasing the service offering of the EOSC Portal.

        Speaker: Diego Scardaci (EGI.eu)
      • 157
        Panel: Destination Earth - Architecture aspects and potential collaborations with EGI and EOSC

        Panelists:
        - Christian Kirchsteiger (EC DG-CNECT)
        - Albrecht Schmidt (ESA)
        - Bryan Lawrence (IS-ENES3)
        - Christian Briese (EODC)

    • Coffee break
    • AI and Machine Learning experiences Room: http://go.egi.eu/zoom1

      Room: http://go.egi.eu/zoom1

      The impact of Artificial Intelligence (AI) and Machine Learning (ML) is shaping a new future in which we will be able to create more competitive industries, more accurate diagnosis and treatment in healthcare, a better understanding of the Earth for life-science researchers, and more efficient energy sources.

      Europe wants and needs to harness the potential of AI, not only as a consumer but also as a leading generator of tools and developments. Europe has a good position in R&D, with strong computing infrastructures essential to support AI technology. It holds a world-leading position in robotics for manufacturing across different fields and services (from automotive to healthcare, energy, water management, agriculture or financial services), and it has developed multiple innovative and well-connected networks of start-ups. In addition, the wide range of datasets from the public and private sectors will contribute to boosting this data-driven digital transformation.

      The session will present several AI and ML experiences in different fields related to the EGI communities. These experiences, combining research-related and industry-oriented cases, will give an overview of some of the applications of AI and of how computing infrastructures and data services support their execution.

      Main target audience
      Scientists, software developers, service providers.

      Convener: Elisa Cauhe (EGI.eu)
      • 158
        AI and ML in Astronomy

        The expected volume of data from the new generation of scientific facilities such as the Square Kilometre Array (SKA) has motivated the expanded use of semi-automatic and automatic machine learning algorithms for scientific discovery in astronomy. In this field, the robust and systematic use of machine learning faces a number of specific challenges including (1) a paucity of labelled data for training - paradoxically, although we have too much data, we don't have enough; (2) a clear understanding of the effect of biases introduced due to observational and intrinsic astrophysical selection effects in the training data, and (3) the quantitative statistical representation of outcomes from decisive AI applications that can be used in scientific analysis. I will discuss a range of AI applications currently in use and under development in astronomy, highlighting the practical aspects of these applications from a computational perspective and looking to the future.

        Speaker: Anna Scaife
      • 159
        AI and ML in Health: Large resource requirements for medical image analysis

        This presentation will show results of two EU projects, notably PROCESS and ExaMode, in which we work on large-scale medical image analysis. The projects work on histopathology images, which are increasingly becoming digital. These very large images (~100,000 x 100,000 pixels at 40x magnification) are produced in large quantities and require massive computational power for the development of machine learning algorithms. In particular, deep convolutional neural networks require GPUs for computation and face tight memory limits (GPUs generally have less than 32 GB of RAM). Another challenge is the bandwidth needed to distribute the large images when developing scalable solutions.
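
        To illustrate why such images cannot be processed in one pass, the sketch below tiles a slide into GPU-sized patches (sizes hypothetical):

            # Sketch: iterate over a whole-slide image in patches, since
            # ~100,000 x 100,000 pixels cannot fit in GPU memory at once.
            H, W, PATCH = 100_000, 100_000, 512  # hypothetical slide and patch sizes

            def iter_patches(h, w, patch):
                for y in range(0, h, patch):
                    for x in range(0, w, patch):
                        yield y, x, min(patch, h - y), min(patch, w - x)

            n = sum(1 for _ in iter_patches(H, W, PATCH))
            print(f"{n} patches per slide")  # ~38,400 patches, i.e. ~38,400 forward passes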

        Speaker: Henning Mueller
      • 160
        AI and ML in Manufacturing: The DIGITbrain project - use cases and challenges on ICT infrastructures

        The DIGITbrain project aims to enable customised industrial products and to facilitate cost-effective distributed and localised production for manufacturing SMEs, by means of leveraging edge-, cloud- and HPC-based modelling, simulation, optimisation, analytics, and machine learning tools and by means of augmenting the concept of digital twin with a memorising capacity towards recording the provenance and boosting the cognition of the industrial product over its full lifecycle, and empowering the network of DIHs to implement the smart business model “Manufacturing as a Service”.

        As the project description suggests, DIGITbrain wants to span from edge computing to high-performance computing in order to create and utilise digital twins more efficiently, rendering manufacturing more agile. Part of DIGITbrain are use cases, so-called application experiments. The talk will introduce the project’s motivation, goals and planned architecture, exemplify some of the use cases and hint at challenges for ICT architectures in this setting.

        Speaker: Andre Stork
      • 161
        Q&A
        Speaker: Elisa Cauhe
    • Clinic: Data services : DPM, Dynafed and Onedata

      This session will provide technical support for existing and new users of EGI data services. During the session experts will share technical information, usage tips and tricks about DPM, Dynafed and Onedata and will answer questions from the audience. The session will be interactive - A perfect opportunity to bring questions, and to deep-dive into DPM, Dynafed and Onedata!

      The Disk Pool Manager (DPM) is a distributed storage system offering multi-protocol access to a scalable data store, with support for the "grid standard" X509/VOMS authentication/communication protocols, recently extended to OpenID Connect. It is the most common storage solution in the EGI infrastructure.

      The Dynamic Federations system (Dynafed) makes it possible to expose, via HTTP and WebDAV, a very fast dynamic namespace, built on the fly by merging and caching (in memory) metadata items taken from a number of (remote) endpoints. It natively supports HTTP, WebDAV, S3 and MS Azure.
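
      Because Dynafed speaks standard WebDAV, the namespace can be browsed with a plain PROPFIND request; the endpoint below is hypothetical.

          # Sketch: list one level of a (hypothetical) Dynafed namespace via WebDAV.
          import requests

          resp = requests.request("PROPFIND",
                                  "https://dynafed.example.org/myfed/",
                                  headers={"Depth": "1"})  # immediate children only
          print(resp.status_code)  # 207 Multi-Status on success
          print(resp.text[:500])   # XML multistatus body describing the entries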

      Onedata is a high-performance data management solution that offers unified data access across globally distributed environments and multiple types of underlying storage, allowing users to share, collaborate and perform computations on the stored data easily. Onedata is the technology behind the EGI DataHub service.

      Main target audience
      Representatives of scientific communities, software and platform developers, scientific data providers and site administrators.

      Conveners: Andrea Manzi, Catalin Condurache (EGI.eu)
      • 162
        DPM and Dynafed clinic Room: http://go.egi.eu/zoom5

        Status of the projects, new features and roadmap, followed by discussion of topics brought by the audience.

        Speaker: Fabrizio Furano (CERN)
      • 163
        Onedata Clinic Room: http://go.egi.eu/zoom2

        The session will begin with an overview of the project status and roadmap, and a discussion of new features such as QoS and Harvester.
        The rest of the session will be devoted to topics raised by the audience.

        Speaker: Lukasz Dutka (CYFRONET)
    • Global Open Science Cloud -- Part 3: Global e-Infrastructures: Challenges and Opportunities in Achieving the GOSC Vision Room: http://go.egi.eu/zoom3

      Room: http://go.egi.eu/zoom3

      The workshop page is at: https://indico.egi.eu/event/5255/

      The digital revolution has transformed the way in which data, information and knowledge are acquired, managed, repurposed, analysed, used and disseminated. We are at the threshold of an era with unprecedented opportunities for cross-disciplinary and cross-border collaboration for research and innovation. A new research paradigm is emerging which applies increasingly automated approaches and Machine Learning, and which harnesses the most advanced computing facilities and software, to handle huge and disparate cross-disciplinary data. The advanced infrastructure needed for this new paradigm and Open Science is emerging: it needs to be on demand, as a service, ubiquitous and seamless. In pursuit of this vision, infrastructures are beginning to emerge at institutional, national and regional levels, such as the European Open Science Cloud from the European Commission, the CSTCloud from the Chinese Academy of Sciences, the ARDC e-infrastructure in Australia, and the African Open Science Platform.

      Is it possible to share experiences and create a global framework to align and federate such Open Science clouds and platforms? Is there a way to better support research collaborations across continents to resolve global science challenges, such as the UN Sustainable Development Goals (SDGs), climate change, infectious diseases and pandemics such as COVID-19, and coordinated, global disaster risk reduction? At the moment, a global, fully connected digital infrastructure is not in place, making it difficult for scientists to access digital resources across countries and continents.

      The idea of a Global Open Science Cloud (GOSC) was initiated during the CODATA 2019 Beijing conference. The mission of GOSC is to connect different international, national and regional open science clouds and platforms to create a global digital environment for borderless research and innovation. It aims to provide better ways to harness digital resources from around the world, help bridge the divide in infrastructure, technology and capacity building among different countries, support global science collaborations and foster truly international science.

      There are many challenges and difficulties, e.g. inconsistent access policies from country to country, the lack of common standards for building a global-level data and e-infrastructure federation, differences in language and culture, and highly varied funding schemes.

      The workshop will gather representatives of international initiatives, research communities and public digital infrastructure providers, to review the existing work in GOSC and to develop consensus on an initial concept model, framework and roadmap for GOSC. We will discuss the needs and typical use cases from research community representatives, examine available resources and possible contributions from international e-infrastructure providers, identify the key barriers in policy, governance, standards and technology, and identify possible funding opportunities.

      We welcome all GOSC stakeholders to join and contribute to the discussion. We invite attendance by:
      -- Research community and research infrastructure representatives with needs and experience supporting global collaborations;
      -- Digital infrastructure representatives open to participating in a global resource federation;
      -- Experts on standards and technology developing and operating solutions for federated access to data, computing, software and applications;
      -- Policy researchers and policy makers who can identify the key policy barriers and provide plausible solutions;
      -- Funders who have the vision and interest to invest in the implementation of GOSC.

      The full workshop agenda is at https://indico.egi.eu/event/5255/

      Convener: Mark Dietrich (EGI.eu)
      • 164
        Overview by the Chair
      • 165
        African Infrastructure for GOSC: Challenges and Opportunities (Gold and Diamonds)
        Speaker: Happy Sithole (Center Manager, National Integrated Cyber-Infrastructure at CSIR-NICIS, South Africa)
      • 166
        The ARDC’s Nectar Research Cloud: Challenges and Opportunities for the GOSC
        Speaker: Rosie Hicks (CEO, Australian Research Data Commons (ARDC))
      • 167
        Research e-infrastructure federation in China
        Speaker: Lili Zhang (International Project Manager, CSTCloud Department, CNIC, CAS)
      • 168
        Open Science in the Context of the Globalizing World

        Digital transformation is stimulating research towards a more collaborative, global and open ecosystem, shifting the new paradigm of science towards openness, participation, transparency and social impact. Even though this shift started a few years ago, it is still unclear how we can take it to, and sustain it at, the global level, as we are missing consensus on essential elements of the ecosystem, specifically on the connecting elements. We are asking researchers to share, but this will only happen if we develop the right environment for them, with incentives and services. This presentation focuses on how OpenAIRE is building bridges within Europe and with regional infrastructures around the world to connect scholarly communication initiatives: by sharing and putting forward best practices on policy and services, by assisting communities to develop with open science at their core, by enabling all actors across the research spectrum to commit to local infrastructure, and by putting in place the connecting elements for a global effect.

        Speaker: Natalia Manola (Managing Director, OpenAIRE; Member of EOSC Executive Board)
      • 169
        EGI experience in supporting international scientific collaborations

        In the past 15 years, the EGI Federation has been supporting multiple research communities and scientific collaborations thanks to collaboration agreements with peer e-Infrastructure operators around the world, which allow the EGI Federation to be part of an integrated system of international research infrastructures by endorsing common policies and interoperability best practices and by coordinating service delivery activities. This presentation will present the approaches and lessons learnt from this experience.

        Speaker: Tiziana Ferrari (EGI.eu)
      • 170
        Global Open Science -- Support and lessons from global networks
        Speaker: Erik Huizer (CEO, GÉANT)
      • 171
        Panel
    • How to measure the impact of user engagement? Room: http://go.egi.eu/zoom4

      Room: http://go.egi.eu/zoom4

      The EGI e-infrastructure comprises hundreds of publicly funded service providers spread across Europe and worldwide. Monitoring the use of these resources, and reporting on this use and its impact to the various funding agencies, is an increasingly important aspect of service operation. In this session we will share experiences of impact monitoring and assessment from various communities, and will discuss ways in which we could, as a community, improve the way we monitor the impact of EGI on science and innovation. Topics will include:
      • Ways of measuring service usage (through logins; from service logs; via accounting systems; etc.) - see the sketch below
      • Tracking scientific impact (through repositories; through acknowledgments; through SLAs; etc.)
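
      As a toy illustration of the log-based approach mentioned above, the sketch below counts distinct users per month from a hypothetical access log with "date user action" lines.

          # Toy example: count distinct users per month from a hypothetical log.
          from collections import defaultdict

          users_per_month = defaultdict(set)
          with open("service-access.log") as log:      # hypothetical log file
              for line in log:
                  date, user, _action = line.split(maxsplit=2)
                  users_per_month[date[:7]].add(user)  # key on "YYYY-MM"

          for month, users in sorted(users_per_month.items()):
              print(month, len(users))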

      Main target audience
      Service providers, research support staff, policy makers.

      Conveners: Gergely Sipos (EGI.eu), Giuseppe La Rocca (EGI.eu)
      • 172
        Intro - why measuring impact is important
        Speaker: Gergely Sipos (EGI.eu)
      • 173
        How is impact measured in EGI today
        • Use of OpenAIRE (publications)
        • Use of Ops portal (num. of users)
        • Use of accounting portal (CPU consumption)
        • Use of customer interviews (free form response)
        • REST API from Customers’ services
        • Council interview
        Speaker: Giuseppe La Rocca (EGI.eu)
      • 174
        Monitoring scientific outputs with OpenAIRE
        • Best use of OpenAIRE to track publications;
        • Live demo to showcase how OpenAIRE can support the needs of the communities;
        • Understanding current limitations and present future plans.
        Speaker: Paolo Manghi (Istituto di Scienza e Tecnologie dell'Informazione - CNR)
      • 175
        Discussion - how can we improve?
    • 12:30 PM
      Lunch break
    • 12:45 PM
      Lunch break
    • Cloud computing - Part 2 Room: http://go.egi.eu/zoom4

      Room: http://go.egi.eu/zoom4

      Convener: Jerome Pansanel (CNRS)
      • 176
        Using OpenStack to share hardware between Big Data, AI, HTC and HPC workloads

        At StackHPC we work with many public and private institutions to build clouds that work well for their scientific computing needs. At Cambridge University, we have helped build their new Arcus cloud. It supports VMs, containers and baremetal instances within a single cloud. This enables a diverse set of communities to share a single pool of hardware resources, including Kubernetes-based environments (such as JupyterHub, Kubeflow and Pangeo) and traditional batch-job HPC clusters (typically Slurm with low-latency networking), and allows science communities to consume infrastructure directly and run their own custom science platforms. This is all powered by the ongoing convergence of the hardware needed by these various workloads.
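
        By way of illustration, with openstacksdk the flavor chosen at boot decides whether an instance lands on a VM or on a baremetal node; the cloud, image, flavor and network names below are hypothetical.

            # Sketch with openstacksdk: one API, VM or baremetal depending on flavor.
            import openstack

            conn = openstack.connect(cloud="arcus")  # assumed entry in clouds.yaml

            server = conn.create_server(
                name="k8s-worker-0",
                image="ubuntu-20.04",
                flavor="baremetal.gpu",              # swap for e.g. "vm.medium"
                network="project-net",
                wait=True,
            )
            print(server.status)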

        In this talk we look back at the lessons Cambridge University has learnt over the years running a wide variety of workloads across OpenStack and Slurm. We then take a detailed look at how they are currently using OpenStack to provision all of their new baremetal servers, rather than xCAT. This means the same infrastructure-as-code automation can be used to create baremetal and VM-based platforms. Good practices and industry-standard tools like Terraform and Ansible are being adopted to make it easier to port these platforms both to other OpenStack clouds and to non-OpenStack clouds.

        Finally, we look at some of the active development currently in flight, including work to add a temporal dimension to quotas using OpenStack Blazar. The aim is to reduce the overhead of rebalancing allocations between multiple competing workloads.

        Speakers: John Garbutt (StackHPC), Paul Browne (Cambridge University)
      • 177
        Experience with cloud vouchers in OCRE

        CERN investigated the use of cloud vouchers to provide IaaS resources to researchers in the Helix Nebula Science Cloud (HNSciCloud) project. HNSciCloud was a €5.3 million project that established a hybrid cloud platform combining commercial services with existing publicly funded on-premise resources, to support the deployment of high performance computing and big-data capabilities for scientific research. Part of the procurement budget was dedicated to procuring cloud vouchers. This exercise demonstrated that:

        • cloud vouchers facilitate the distribution of free-at-the-point-of-use cloud resources to individual researchers
        • they are particularly suitable for small-scale projects with defined costs, and for exploring innovative architectures before procuring them at scale

        Leveraging lessons learned from HNSciCloud, CERN is currently using cloud vouchers in the context of the Open Clouds for Research Environments (OCRE) project. This project aims to accelerate cloud adoption in the European research community, by bringing together commercial cloud providers and the research and education community. The mechanism for this purpose is a pan-European tender and framework agreements with cloud service providers that meet the specific requirements of the research community.

        Cloud vouchers were identified as a powerful tool to encourage consumption of digital services by the so-called Long-Tails-of-Science within the project. In this context, CERN is responsible for:

        • identifying individual scientists in need of cloud services for their research
        • analysing their requirements
        • allocating and distributing cloud vouchers
        • collecting feedback

        To carry out these tasks, CERN established links with two organisations representing the Long-Tails-of-Science in Europe: the European Council of Doctoral Candidates and Junior Researchers (Eurodoc) and the Marie Curie Alumni Association (MCAA). A comprehensive survey was jointly created to scope the needs of individual researchers. The survey was distributed through the Eurodoc and MCAA networks and advertised at international conferences. Between 1 April and 1 September 2019, the survey generated 81 answers, of which 72 were valid.
        After analysing the requirements of individual researchers based on the responses received, CERN launched the distribution of a first wave of pre-paid vouchers. These vouchers were procured by GÉANT from three suppliers already contracted via the pre-existing GÉANT IaaS framework: a Microsoft reseller, an AWS reseller and CloudSigma, an independent service provider. The total amount of vouchers procured by GÉANT is €500,000.

        Between November 2019 and June 2020, CERN distributed cloud vouchers with a €500 face value to 70 individual researchers representing the Long-Tail-of-Science in Europe. Researchers were selected by representatives of Eurodoc and MCAA within their networks, taking into account diversity across researcher types, gender, scientific disciplines and geographical location. Currently, CERN is in the process of collecting the researchers’ feedback. Additionally, EGI.eu, as coordinator of EOSC-hub, is another distribution channel of OCRE cloud vouchers. Granted 75 vouchers, this channel targets Earth Observation researchers in the EOSC-hub Early Adopter Programme.

        This presentation gives an overview of the lessons learned on vouchers from HNSciCloud. It presents the outcome of the requirement analysis and details the distribution process of the vouchers in OCRE. Finally, the feedback from researchers so far will be presented.

        Speaker: Marion Devouassoux
      • 178
        Dynamic DNS service for EGI Federated Cloud

        The Dynamic DNS service is critical for application and infrastructure services that are dynamically deployed in the EGI Federated Cloud because it can:

        • Improve user experience, by providing memorable, sensible hostnames for the services hosted in the Cloud.

        • Reduce the cost of development/deployment, by using predictable, reusable URLs for services: service URLs can be pre-set for clients and servers, which greatly simplifies service deployment.

        • Improve the security of services deployed in the Cloud, as SSL certificates can be obtained in advance.

        • Promote the federated approach, as services become independent of the location of the hosting VM in the Federated Cloud.

        • Reduce cost for users and site admins, as the Dynamic DNS service is very easy to use and no action is required from site admins.

        The Dynamic DNS service provides unified, federation-wide Dynamic DNS support for VMs in the EGI Federated Cloud. Users can register their chosen meaningful and memorable DNS entries (hostnames) in given domains (e.g. my-server.vo-name.fedcloud.eu) and assign them to the public IPs of their servers hosted in the EGI Federated Cloud. By using Dynamic DNS, users can host services in the EGI Federated Cloud under meaningful service names, can freely move VMs from site to site without modifying server/client configurations (the federated approach), and can request valid server certificates in advance (critical for security), among many other advantages.

        The service is currently hosted by the Institute of Informatics SAS at https://nsupdate.fedcloud.eu/. More information is available at https://wiki.egi.eu/wiki/Dynamic_DNS_tutorial.
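
        For illustration, nsupdate-based services typically accept updates in the dyndns2 style shown below; the hostname and secret are hypothetical, so check the tutorial above for the exact interface.

            # Sketch: refresh a Dynamic DNS entry in the dyndns2 style used by
            # nsupdate-based services (hostname and secret are hypothetical).
            import requests

            resp = requests.get(
                "https://nsupdate.fedcloud.eu/nic/update",
                params={"myip": "203.0.113.7"},                    # the VM's public IP
                auth=("my-server.vo-name.fedcloud.eu", "s3cret"),  # host + update secret
            )
            print(resp.text)  # e.g. "good 203.0.113.7" on success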

        Speaker: Viet Tran (IISAS)
      • 179
        Monitoring and accounting in the INFN-Cloud infrastructure

        INFN-Cloud aims at exploiting the power of a federation of cloud sites to provide cloud resources to a heterogeneous public of end users, who, in the cloud context, are supposed to manage the whole stack or a part of it. This means that they need full control of what happens on their servers, with almost no need to ask the site administrator for help.

        This talk will describe in detail the architecture of the INFN Cloud monitoring and accounting services, which have been designed to provide users with complete feedback, and thus control, on the resources they use and on how they use them.

        Several levels of monitoring checks have been implemented, ranging from a very low-level view of the resources to high-level application monitoring, keeping simplicity, clarity and effectiveness in mind.
        The INFN-Cloud status page provides a quick, high-level overview of the overall status of the cloud federation in terms of service availability (e.g. operational, under maintenance or degraded). The monitoring system is integrated with the services instantiated by the users through the INFN-Cloud Dashboard, enabling them to completely self-manage their instantiations.

        On the other hand, INFN-Cloud also implements an accounting system aimed at monitoring how resources are used by the different user communities or by specific users. The collected resource usage information is then provided through a user-friendly web interface.

        Besides being user-oriented services, the INFN-Cloud monitoring and accounting architectures are also instrumental in allowing an easy and effective integration of new cloud providers into the INFN-Cloud federation, providing each cloud site's administrators and the central INFN-Cloud operations team with dedicated notifications and statistics. The talk will conclude with an overview of the enhancements expected in the INFN-Cloud monitoring and accounting system over the next months, allowing further integration with hybrid cloud deployments.

        Speaker: Vincenzo Spinoso (INFN)
    • Data analytics services in EGI - Overview and use cases Room: http://go.egi.eu/zoom1

      Room: http://go.egi.eu/zoom1

      This session provides an overview of the data analytics services that are provided by EGI or are hosted within the EGI infrastructure.

      Data analytics services offer generic or discipline-specific environments for researchers to perform data-intensive analysis. EGI provides such analyses within its own portfolio (for example Jupyter Notebooks, Chipster Next Generation Sequencing), and also supports scientific communities in offering scalable data analytics on top of the baseline EGI compute, data and security services (for example the WeNMR portals for structural biology).

      This session will provide an overview of the existing analytics services, detailing their features and access modes, and will explain how similar setups can be achieved by scientific communities who wish to use EGI as a scalable hosting environment for their applications/services/frameworks. The session will feature a few research communities to report on their experience in offering data analytics services in EGI.

      The session is introductory, aiming to serve researchers and research communities who want either to use the existing analytics services or to make their applications scalable through EGI.

      Main target audience
      Scientists, representatives of scientific communities, scientific software/platform developers.

      Convener: Giuseppe La Rocca (EGI.eu)
      • 180
        Overview of the EGI Analytics services
        • The EGI Notebooks service
        • The EGI Workload Manager
        • The EGI Applications on Demand services pool
          • EC3,
          • Science Software on Demand (SSoD),
          • AppDB/VMOps, and
          • Chipster
        Speaker: Giuseppe La Rocca (EGI.eu)
      • 181
        EGI Notebooks for D4Science
        Speaker: Andrea Manzi
      • 182
        EISCAT_3D report with the EGI Workload Manager
        Speaker: Ingemar Haggstrom (EISCAT)
      • 183
        Q&A
        Speaker: Giuseppe La Rocca (EGI.eu)
    • Global Open Science Cloud -- Part 4: realizing the vision of GOSC Room: http://go.egi.eu/zoom3

      Room: http://go.egi.eu/zoom3

      The workshop page is at: https://indico.egi.eu/event/5255/

      The digital revolution has transformed the way in which data, information and knowledge are acquired, managed, repurposed, analysed, used and disseminated. We are at the threshold of an era with unprecedented opportunities for cross-disciplinary and cross-border collaboration for research and innovation. A new research paradigm is emerging which applies increasingly automated approaches and Machine Learning, and which harnesses the most advanced computing facilities and software, to handle huge and disparate cross-disciplinary data. The advanced infrastructure needed for this new paradigm and Open Science is emerging: it needs to be on demand, as a service, ubiquitous and seamless. In pursuit of this vision, infrastructures are beginning to emerge at institutional, national and regional levels, such as the European Open Science Cloud from the European Commission, the CSTCloud from the Chinese Academy of Sciences, the ARDC e-infrastructure in Australia, and the African Open Science Platform.

      Is it possible to share experiences and create a global framework to align and federate such Open Science clouds and platforms? Is there a way to better support research collaborations across continents to resolve global science challenges, such as the UN Sustainable Development Goals (SDGs), climate change, infectious diseases and pandemics such as COVID-19, and coordinated, global disaster risk reduction? At the moment, a global, fully connected digital infrastructure is not in place, making it difficult for scientists to access digital resources across countries and continents.

      The idea of a Global Open Science Cloud (GOSC) was initiated during the CODATA 2019 Beijing conference. The mission of GOSC is to connect different international, national and regional open science clouds and platforms to create a global digital environment for borderless research and innovation. It aims to provide better ways to harness digital resources from around the world, help bridge the divide in infrastructure, technology and capacity building among different countries, support global science collaborations and foster truly international science.

      There are many challenges and difficulties, e.g. inconsistent access policies from country to country, the lack of common standards for building a global-level data and e-infrastructure federation, differences in language and culture, and highly varied funding schemes.

      The workshop will gather representatives of international initiatives, research communities and public digital infrastructure providers, to review the existing work in GOSC and to develop consensus on an initial concept model, framework and roadmap for GOSC. We will discuss the needs and typical use cases from research community representatives, examine available resources and possible contributions from international e-infrastructure providers, identify the key barriers in policy, governance, standards and technology, and identify possible funding opportunities.

      We welcome all GOSC stakeholders to join and contribute to the discussion. We invite attendance by:
      -- Research community and research infrastructure representatives with needs and experience supporting global collaborations;
      -- Digital infrastructure representatives open to participating in a global resource federation;
      -- Experts on standards and technology developing and operating solutions for federated access to data, computing, software and applications;
      -- Policy researchers and policy makers who can identify the key policy barriers and provide plausible solutions;
      -- Funders with the vision and interest to invest in the implementation of GOSC.

      The full workshop agenda is at https://indico.egi.eu/event/5255/

      Convener: Tiziana Ferrari (EGI.eu)
      • 184
        Overview by the Chair
      • 185
        Introduction to Chinese Academy of Sciences International Cooperation and GOSC
        Speaker: Yan ZHUANG (Division Director, Bureau of International Cooperation, CAS headquarters)
      • 186
        EC support to the European Open Science Cloud and perspectives on international cooperation
        Speaker: Kostas Glinos (European Commission, Head of Unit for Open Science)
      • 187
        Perspectives on open science and open data from the US National Science Foundation
        Speaker: Manish Parashar (Director, Office of Advanced Cyberinfrastructure (OAC) National Science Foundation)
      • 188
        Q&A to Funders and short panel discussion
      • 189
        Panel
    • Service design WS (part 1): How can EGI bring your service to the world? Room: http://go.egi.eu/zoom2

      Room: http://go.egi.eu/zoom2

      The goal of this session is to present how EGI can help service providers reach new markets/users and, at the same time, innovate the EGI service offer. The session gives details on the EGI approach to marketing and matchmaking, and an overview of the process that leads to the publishing of services in the EGI portfolio. The session will include an introduction to IT Service Management (ITSM) as the underlying approach to service portfolio management.

      Specific attention will be given to those aspects of ITSM that ensure high quality, proper resourcing and availability of online services in a predictable manner, maintaining and growing user confidence in the EGI-branded services in general. The Service Management System and especially the requirements of the Service Portfolio Management (SPM) process will be discussed in some depth.

      A proposal to create a new EGI catalogue for external/community services will be presented for discussion. The idea is to develop an alternative way for service providers to join the EGI service offer, preserving their branding and without the need to comply with very strict requirements but, at the same time, guaranteeing a quality similar to that of the EGI-branded services. This community catalogue would be open to all services that can provide added value/additional capabilities on top of the current federation offer.

      Finally, a concrete example of what the key part of the onboarding process looks like from the service provider perspective will be presented by reviewing the “Service Design and Transition Package” (SDTP) of one of the existing EGI services.

      Conveners: Diego Scardaci (EGI.eu), Matti Heikkurinen (EGI.eu)
    • 2:30 PM
      Coffee break

    • Demos 7 Room: http://go.egi.eu/zoom1

      Room: http://go.egi.eu/zoom1

      Some things are best demonstrated, especially when it comes to technical services. Just before the coffee break, we offer a 30-minute slot for submitted demos to show these services, outputs, or any other activity relevant to the conference's theme.

      • 194
        A demonstration of the VIP platform interoperability capabilities

        VIP (Virtual Imaging Platform) is a web portal for the simulation and processing of massive data in medical imaging. VIP users can access applications as a service, as well as significant amounts of computing resources and storage (provided by the biomed EGI VO), with no technical skills required beyond the use of a web browser. In this demonstration, we will show that VIP enables i) application interoperability across execution environments and ii) data interoperability between storage and execution platforms.

        We will begin with a short run-through of the main features of the VIP web portal, focusing on selecting an application and launching it. Then, after an introduction to the Boutiques tool (https://github.com/boutiques/boutiques), we will use it to easily create and integrate a new application in VIP, making it usable by the whole VIP community. We will then go further and publish this application in the open research repository Zenodo (https://zenodo.org/) with a single click through VIP and Boutiques. Openly available on Zenodo in the Boutiques format, and with a DOI attached, the application can then be referenced in papers, and anybody interested can use the Boutiques tools to fetch it and run it locally or in a VIP-like platform. Through all these features, this first part will demonstrate how VIP encourages including applications and software as first-class elements in research projects and making them more open and interoperable.
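
        For readers unfamiliar with Boutiques, the following is a minimal sketch of this descriptor-driven workflow from Python. It is an illustration rather than VIP's integration code: it assumes the boutiques package's bosh entry point accepts an argv-style list (as in its documentation examples), and the file names are hypothetical placeholders.

            # Illustrative Boutiques workflow (pip install boutiques).
            # File names are hypothetical placeholders.
            from boutiques import bosh

            # Validate a tool descriptor: a JSON file describing the tool's
            # inputs, outputs and command-line template
            bosh(["validate", "my-tool-descriptor.json"])

            # Run the tool locally, supplying input values in a JSON "invocation"
            bosh(["exec", "launch", "my-tool-descriptor.json", "my-invocation.json"])

            # Publish the descriptor to Zenodo, which attaches a citable DOI
            # (exact flags may differ; see the Boutiques documentation)
            bosh(["publish", "my-tool-descriptor.json"])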

        In the second part, we will present the VIP REST API, based on the CARMIN specification (https://github.com/CARMIN-org/CARMIN-API), and we will use it to launch an execution on VIP without going through the VIP web portal. Then we will use the most recent CARMIN feature, implemented in VIP, to reference external storage platforms as inputs and outputs of an execution. To demonstrate that, we will use an instance of a Girder server (https://github.com/girder/girder) hosting the inputs (and outputs) of the execution. We will make the VIP REST API requests from the Girder web portal thanks to a plugin we developed, which allows Girder users to launch large-scale processing on their files from the same tool they use to manage their research data. The same CARMIN REST requests could be submitted to any other CARMIN execution platform hosting the same application (for instance a Boutiques application), allowing for data interoperability across storage platforms and all CARMIN execution platforms.
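
        To give a concrete feel for such calls, here is a hedged sketch using Python's requests library. The endpoint paths follow the CARMIN specification linked above, but the base URL, the authentication header and the input values are hypothetical placeholders; the authoritative schema is the specification itself.

            # Hedged sketch of CARMIN-style REST calls (pip install requests).
            # Base URL, auth header and identifiers are placeholders.
            import requests

            BASE = "https://vip.example.org/rest"   # hypothetical platform endpoint
            HEADERS = {"apikey": "YOUR-API-KEY"}    # auth scheme is an assumption

            # List the pipelines (applications) the platform exposes
            pipelines = requests.get(f"{BASE}/pipelines", headers=HEADERS).json()

            # Launch an execution; inputs can reference an external storage
            # platform (e.g. a Girder server), as shown in the demo
            execution = requests.post(
                f"{BASE}/executions",
                headers=HEADERS,
                json={
                    "name": "demo-run",
                    "pipelineIdentifier": pipelines[0]["identifier"],
                    "inputValues": {"input_file": "girder:/collection/demo/image.nii"},
                },
            ).json()

            print(execution["identifier"], execution["status"])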

        Speaker: Axel Bonnet (CNRS)
      • 195
        OpenID Connect plugin for OpenStack Clouds

        Although the initial authentication and authorization mechanisms of the EGI Federated Cloud were based on X.509 certificates and VOMS proxies, these have proven to be an obstacle for the integration of additional components, such as Platform and Software as a Service components or simply web portals. Nowadays, EGI.eu is transitioning its Authentication and Authorization infrastructure from X.509 certificates and proxies towards the use of EGI Check-in and the OpenID Connect (OIDC) standard. The most widely used Cloud Management Framework in the EGI Federated Cloud is OpenStack, an open-source cloud software system whose development is community driven. The Identity component of the OpenStack cloud distribution (codenamed Keystone) is a REST service that leverages the Apache HTTP server and a third-party module named “mod_auth_openidc” to provide OpenID Connect authentication to an OpenStack Cloud. Due to the current status of these components, the OIDC standard is not implemented in full, which makes it impossible to configure two different providers at a single resource centre for use from command-line tools.

        Supported by the EGI Strategic and Innovation Fund, the IFCA Advanced Computing and e-Science group has implemented a Keystone plugin that enables OpenID Connect configurations in a standard manner, which also makes it possible to consume OAuth 2.0 tokens and to query the corresponding OAuth 2.0 introspection endpoints even from a command-line interface. Furthermore, it removes the limitation of configuring only one provider per resource centre.

        The proposed demonstration will show how to install, configure and deploy this plugin in an OpenStack instance.

        Plugin available at: https://github.com/IFCA/keystone-oidc-auth-plugin
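
        To give a flavour of the client side, the sketch below authenticates to Keystone with an OIDC access token (for example one obtained from EGI Check-in) using the standard keystoneauth1 library. This illustrates the general federation mechanism rather than the plugin's own configuration; the endpoint, identity provider, protocol and project names are placeholder assumptions.

            # Sketch: OpenStack authentication with an OIDC access token via
            # keystoneauth1 (pip install keystoneauth1). Names are placeholders.
            from keystoneauth1 import session
            from keystoneauth1.identity import v3

            auth = v3.OidcAccessToken(
                auth_url="https://keystone.example.org:5000/v3",  # hypothetical
                identity_provider="egi.eu",    # as registered in Keystone (assumption)
                protocol="openid",             # federation protocol name (assumption)
                access_token="eyJhbGciOi...",  # e.g. obtained from EGI Check-in
                project_name="my-project",
                project_domain_name="Default",
            )

            sess = session.Session(auth=auth)
            print(sess.get_token())  # scoped token usable by OpenStack clients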

        Speakers: Dr Alvaro Lopez Garcia (IFCA-CSIC), Dr Fernando Aguilar (IFCA-CSIC)
    • Demos 8 Room: http://go.egi.eu/zoom3

      Room: http://go.egi.eu/zoom3

      Some things are best demonstrated, especially when it comes to technical services. Just before the coffee break, we offer a 30-minute slot for submitted demos to show these services, outputs, or any other activity relevant to the conference's theme.

      • 196
        Customizable Elastic Kubernetes on EGI Cloud Compute

        The EKaaS (Elastic Kubernetes as a Service) is an on-demand service to deploy elastic Kubernetes clusters on the EGI Cloud Compute. EKaaS was partially funded in the second EGI Strategic and Innovation call and aims at providing a convenient service for the provisioning and customization of self-managed Kubernetes clusters.
        The service is fully operational at http://servproject.i3m.upv.es/ec3-ltos and provides the following functionality:
        - Full integration with EGI Check-in and support for members of the vo.access.egi.eu VO. Any user belonging to this VO is entitled to deploy a cluster in any of the sites that support it.
        - Integration with the AppDB information system for the retrieval of the available endpoints, Virtual Machine base images and instance flavours, so that the cluster fits the configuration of the target cloud compute site.
        - Self-elasticity according to the workload, thanks to CLUES (Cluster Energy Savings). A minimum number of nodes is deployed to deal with the workload of the cluster. As the workload increases (provided that the Kubernetes objects make use of the “resources/limits” attributes; see the sketch after this list), new nodes are powered on and added to the cluster, up to a maximum number fixed by the user at deployment time. If the workload decreases, the cluster is shrunk automatically.
        - Deployment of a Kubernetes Dashboard (clusterurl/dashboard/#/login) and a Kubeapps dashboard (clusterurl/kubeapps/#/login) to customize the deployed clusters with custom and official Helm charts from the Bitnami and Google repositories. The user can easily add services such as databases, web servers, application development tools, key-value stores, logging, visualization, networking and data analytics, among others, through a graphical interface, without writing the Kubernetes specifications for deploying those applications. The four official repositories included offer over 300 software components.
        The service is open to any user under the above conditions and is released under the Apache 2.0 open-source license at https://github.com/grycap/ec3, so it can be easily customized and deployed for a different VO.
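
        The sketch below illustrates the elasticity trigger mentioned in the list above: it creates a deployment whose pods declare “resources/limits”, which is what allows CLUES to detect pending workload and power on additional nodes. It uses the standard Kubernetes Python client against the kubeconfig of an already-deployed cluster; the image, sizes and replica count are arbitrary examples.

            # Sketch: a deployment with resource requests/limits, so that CLUES
            # can observe pending pods and grow the cluster (pip install kubernetes).
            from kubernetes import client, config

            config.load_kube_config()  # kubeconfig of the EKaaS-deployed cluster

            container = client.V1Container(
                name="worker",
                image="busybox",
                command=["sh", "-c", "while true; do :; done"],  # dummy CPU load
                resources=client.V1ResourceRequirements(
                    requests={"cpu": "500m", "memory": "256Mi"},
                    limits={"cpu": "1", "memory": "512Mi"},
                ),
            )

            deployment = client.V1Deployment(
                metadata=client.V1ObjectMeta(name="stress"),
                spec=client.V1DeploymentSpec(
                    replicas=8,  # more than the initial nodes can host
                    selector=client.V1LabelSelector(match_labels={"app": "stress"}),
                    template=client.V1PodTemplateSpec(
                        metadata=client.V1ObjectMeta(labels={"app": "stress"}),
                        spec=client.V1PodSpec(containers=[container]),
                    ),
                ),
            )

            client.AppsV1Api().create_namespaced_deployment("default", deployment)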

        Speaker: Ignacio Blanquer (UPVLC)
      • 197
        Accounting for New Types of Resource Consumption in a Federated Cloud

        As infrastructures and cloud services evolve, resource consumption becomes more flexible and users are often allowed to reserve resources without actually consuming them. Relevant standardization bodies have developed new types of accounting record specifications, and new or updated tools are required to keep track of resource usage. The response to this is the development of a new accounting tool – GOAT.

        GOAT – GO Accounting Tool – is a service running in the background and waiting for connections from compatible clients. A client connects to a cloud management framework; extracts computing data about projects, servers, networks and storage; filters them accordingly; and sends them to the server for further processing. Multiple clients can use the server at once. When the server receives accounting data, it transforms them into the configured format and writes them to the destination file. The consumer collects the data into a central accounting database, where they are processed to generate statistical summaries.
        For now, the GOAT project supports two cloud computing platforms on the client side – OpenNebula and OpenStack. Thanks to its use of standard accounting record formats it can work with different consumers; those used in production are APEL and Prometheus.

        This demonstration shows how the GOAT client extracts accounting data from a cloud management platform and sends them to the GOAT server, where they are transformed, and how Prometheus and Grafana process them and present various views of resource usage.
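
        As a purely illustrative sketch (not GOAT's actual code), the final step of such a pipeline can be pictured as exposing aggregated usage values for Prometheus to scrape and Grafana to chart, for example with the standard prometheus_client library. The metric names and labels here are invented.

            # Illustrative only: exposing cloud usage metrics for Prometheus.
            # Metric names/labels are invented; GOAT's real output follows
            # standard accounting record specifications.
            import time
            from prometheus_client import Gauge, start_http_server

            cpu_hours = Gauge("cloud_project_cpu_hours",
                              "Accumulated CPU hours", ["project"])
            storage_gb = Gauge("cloud_project_storage_gb",
                               "Allocated storage (GB)", ["project"])

            start_http_server(8000)  # Prometheus scrapes http://host:8000/metrics

            while True:
                # In a real pipeline these values would come from the accounting server
                cpu_hours.labels(project="biomed").set(1234.5)
                storage_gb.labels(project="biomed").set(80.0)
                time.sleep(60)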

        Speaker: Ms Lenka Svetlovská (CESNET)
    • Coffee break
    • Authentication-Authorisation solutions - Part 2 Room: http://go.egi.eu/zoom3

      Room: http://go.egi.eu/zoom3

      Convener: Nicolas Liampotis (GRNET)
      • 198
        EGI Check-in for TRIPLE and its innovative services

        The TRIPLE (Targeting Researchers through Innovative Practices and Linked Exploration) project aims at designing and developing the European discovery platform dedicated to Social Sciences and Humanities (SSH) resources, providing SSH researchers a single point of access for easily finding data, publications, profiles and projects, in at least 9 languages. This EC-funded project started in October 2019, will last for 42 months and counts 19 partners from 13 countries. TRIPLE will become a core service of OPERAS, a European Research Infrastructure for open scholarly communication in the SSH.
        TRIPLE is designed to be an open infrastructure that can be integrated with new or existing services, either from EOSC or implemented by research centres and private companies, that can enhance the user experience and provide meaningful functionalities. In addition to the TRIPLE Core platform, a set of innovative tools (visualisation, annotation, trust-building system, crowdfunding, social network and recommender system) are being deployed.
        In this presentation, we will introduce the overall integration strategy of OPERAS to use EGI Check-in, discussing challenging issues identified during TRIPLE implementation.
        Among TRIPLE's innovative services, the Trust Building System (TBS) and the Pundit Annotation Tool are existing external tools which need to be integrated with the TRIPLE Core and aligned with the single sign-on provided by Check-in. The TBS, developed by MEOH, is a social platform fostering federated communities and transdisciplinary cooperation between a variety of stakeholders such as scientists, policy makers, public services, business networks and civil society. Powered by Semantic Web technologies, Pundit (developed by Net7) is a tool for web annotation that allows users to easily highlight, comment on and create semantic annotations on text fragments of any web page. TBS and Pundit are expanding their authentication and registration features to include Check-in. This has significantly increased the possibility of adoption of these tools: on the one hand, researchers all over Europe (and beyond) can be easily onboarded by simply using their organisation credentials; at the same time, casual users can use the social IDs, e.g. Google and Facebook, that are enabled by Check-in.
        In performing this integration, an issue was identified: when a new user registers with TBS or Pundit, s/he also needs to register as an EGI member. This is because Check-in is designed for more demanding AAI needs and has to satisfy the EGI e-Infrastructure service access policy. However, both TBS and Pundit want to be as open (public access) as possible. There is a need to make the registration process for new users as simple and immediate as possible, while maintaining compliance with authentication and identification standards. To address this specific requirement from TRIPLE, the new development version of EGI Check-in (available in production in September 2020) integrates a much simpler registration process for users, which can greatly help public-access platforms similar to TRIPLE that want to ease the user onboarding process (in particular for non-academic users). This will allow TRIPLE users to become active on the services very quickly.

        Speakers: Luca De Santis (Net7), Yoann Moranville (DARIAH), Valeria Ardizzone (EGI.eu), Yin Chen (EGI.eu)
      • 199
        FEUDAL: Federated User Credential Deployment Portal

        Distributed federated infrastructures contain services that require the deployment, i.e. creation, of accounts before users may access them. Examples include unix accounts or accounts in web-based systems such as mailing lists. Often, site-local policies (e.g. on usernames) have to be respected, and federated authorisation (Virtual Organisations) is mandatory. Which services are provisioned should be selected by the users themselves. Sometimes questions need to be asked back to the user (e.g. a conflicting username, or the primary group to use) before an account can be provisioned. Once provisioned, login information needs to be displayed to the user. Once users no longer need their services, they must be able to trigger the removal of their accounts. A particular challenge is the automatic removal of service deployments belonging to users who, for example, leave a VO and as a result lose the authorisation to use the service.

        With the Federated User Credential Deployment Portal (FEUDAL) we implemented a system that addresses these requirements. Its key features are a user-oriented web page and instantaneous deployment to the services of a VO. Users are presented with a list of available services and can select which ones they want to use. Publish-subscribe is used to communicate deployments, minimising latency for the user. Depending on the service, this enables us to immediately prompt for more information and to display e.g. service credentials after the user has selected a service for deployment.
        Any third-party service can be supported via an adapter approach; the adapters are run by the service administrators, not directly by FEUDAL. Adapters for provisioning users into unix systems, LDAP instances and dCache are already implemented. FEUDAL is integrated with all major AAIs, such as EGI Check-in, Unity and eduTEAMS. An HTTP API is available both for users and for integration with other AAIs.
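
        The following is a purely hypothetical sketch of that adapter pattern, not FEUDAL's real message schema or transport: an adapter can be pictured as a small site-local handler that maps deployment messages, received over publish-subscribe, onto account operations.

            # Purely hypothetical sketch of the adapter pattern described above;
            # FEUDAL's actual message schema and transport differ.
            import json

            def handle_message(raw: str) -> dict:
                """Map one deployment message onto a site-local account operation."""
                msg = json.loads(raw)
                user, state = msg["user"], msg["state"]
                if state == "deployment_pending":
                    # Site-local policy: derive a unix username; in a real adapter
                    # a conflict would trigger a question back to the user
                    username = user["preferred_username"][:32]
                    # ... create the account here (useradd, LDAP entry, etc.) ...
                    return {"state": "deployed", "credentials": {"username": username}}
                if state == "removal_pending":
                    # User left the VO or deselected the service: remove the account
                    # ... delete the account here ...
                    return {"state": "not_deployed"}
                return {"state": "unchanged"}

            # Example message as it might arrive over the pub-sub channel
            print(handle_message(json.dumps({
                "user": {"preferred_username": "jdoe"},
                "state": "deployment_pending",
            })))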

        Speaker: Lukas Burgey (KIT)
      • 200
        IRIS IAM: Federated Access Management for IRIS

        Driven by the physics communities supported by UKRI-STFC (UK Research and Innovation Science and Technology Facilities Council), the eInfrastructure for Research and Innovation for STFC, or IRIS, is a collaboration of UKRI-STFC, science activities and provider entities in the UK. Over the last three years, the UK's IRIS collaboration and the IRIS 4x4 project have worked to deploy hardware and federating tools across the range of physics supported by STFC. Providing a coherent framework for accessing HTC, HPC and OpenStack cloud resources, the IRIS IAM, a deployment of the INDIGO IAM software, provides federated access to resources based on the AARC blueprint architecture, removing friction for scientific communities and promising to facilitate a new generation of workflows across diverse resources.

        Development of the IRIS IAM has proceeded in parallel with other community Authentication and Authorization activities, such as FIM4R and the WLCG authorization project, in order to ensure that the IRIS solution aligns with and supports the work undertaken elsewhere. The IRIS IAM is now an established production service, providing access to a number of IRIS services, including OpenStack clouds, accounting dashboards and security portals. However, work is still underway to enhance the service, including the range and scope of clients the IAM provides access to. This talk will touch on progress thus far, notable challenges, and the next steps and plans for the IRIS IAM service.

        The talk will also present recent work investigating methods to provide federated command-line access to resources, utilising the IRIS IAM and the OIDC flow. This will include details of the various technologies investigated and an overview of the currently favoured technical solution: an extension of an existing OIDC PAM module to support authorization based on both the preferred_username and groups claims.

        Speakers: Mr Tom Dack (UKRI - STFC), Mr Will Furnell (UKRI - STFC)
    • Clinic: data analytics services (multiple zoom rooms)

      This session will provide technical support for existing and new users of EGI data analytics services, and for those who wish to integrate additional analytics services with EGI for scalable hosting. During the session, experts will share technical information and usage tips and tricks about the EGI Notebooks service, the EGI Workload Manager, the EGI AppDB/VMOps dashboard, the Science Software on Demand Science Gateway (SSoD) and the web-based Chipster service, and will answer questions from the audience. The session will be interactive – a perfect opportunity to bring questions, and to deep-dive into the EGI Notebooks, the EGI Workload Manager, the EGI VMOps dashboard, SSoD and the Chipster service!

      The EGI Notebooks is a browser-based tool for interactive analysis of data using EGI storage and compute services. Notebooks is based on the JupyterHub technology.

      The EGI Workload Manager service is based on DIRAC technology and is suitable for users who need to exploit distributed resources in a transparent way. The service has a user-friendly interface and also allows easy extensions for the needs of specific applications via its APIs.
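
      For readers new to DIRAC, job submission through its Python API looks roughly like the sketch below. This is a generic DIRAC example rather than Workload Manager-specific code, and it assumes an installed and configured DIRAC client with a valid proxy.

          # Generic DIRAC job submission sketch (assumes a configured DIRAC
          # client and a valid proxy); not specific to the EGI setup.
          from DIRAC.Core.Base import Script
          Script.parseCommandLine()  # initialise the DIRAC configuration

          from DIRAC.Interfaces.API.Dirac import Dirac
          from DIRAC.Interfaces.API.Job import Job

          job = Job()
          job.setName("hello-egi")
          job.setExecutable("/bin/echo", arguments="Hello from the Workload Manager")
          job.setOutputSandbox(["StdOut", "StdErr"])

          dirac = Dirac()
          result = dirac.submitJob(job)
          print(result)  # {'OK': True, 'Value': <job id>} on success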

      The EGI VMOps dashboard is a framework that allows users to perform Virtual Machine (VM) management operations on the EGI Federated Cloud.

      The Science Software on Demand Science Gateway is a programmable interface to a RESTful API server, compliant with the CSGF APIs specifications, able to provide easy access to the PaaS layer by leveraging recent Web technologies.

      The Chipster service is user-friendly analysis software for next-generation sequencing data such as scRNA-seq, RNA-seq, ChIP-seq, 16S, etc.

      Main target audience
      Scientists, representatives of scientific communities, software and platform developers, scientific data providers.

      Convener: Giuseppe La Rocca (EGI.eu)
      • 201
        EGI VMOps dashboard, Science Software on Demand (SSoD) and Chipster Room: http://go.egi.eu/zoom5

        Room: http://go.egi.eu/zoom5

        The session will comprise an initial overview of the EGI VMOps dashboard, the SSoD and the Chipster services, their main features and future roadmaps.

        The rest of the session will include a discussion about topics raised by the audience.

        Speakers: Alexander Nakos (IASA), Kimmo Mattila (CSC), Riccardo Bruno (INFN)
      • 202
        The EGI Notebooks and the Elastic Cloud Compute Cluster (EC3) Room: http://go.egi.eu/zoom4

        Room: http://go.egi.eu/zoom4

        The session will comprise an initial overview of the EGI Notebooks and EC3 services, their main features and future roadmaps.

        The rest of the session will include a discussion about topics raised by the audience.

        Speakers: Amanda Calatrava (UPVLC), Enol Fernandez (EGI.eu), Miguel Caballer (UPVLC)
      • 203
        The EGI Workload Manager Room: http://go.egi.eu/zoom2

        Room: http://go.egi.eu/zoom2

        The session will comprise an initial overview of the EGI Workload Manager, which is based on DIRAC technology, its main features and future roadmap.

        The rest of the session will include a discussion about topics raised by the audience.

        Speaker: Andrei Tsaregorodtsev (CNRS)
    • Service Design WS (part 2): New services - proposals for EGI Room: http://go.egi.eu/zoom6

      Room: http://go.egi.eu/zoom6

      This session will present a set of interesting service candidates to innovate the EGI service offer. The goal is an interactive session where participants can use a brainstorming approach to identify new advantages and promotional aspects that the service providers could include in their service descriptions.
      The individual service presentations cover both the “business case” for the service (problem solved, target audience, unique aspects of the service) and the key information to be included in the “Service Design and Transition Package” (SDTP) of the service.
      The session can be seen as a practice-oriented follow-up to the previous session on the topic. However, it is possible to participate in this session without having attended the previous one.

      Conveners: Diego Scardaci (EGI.eu), Matti Heikkurinen (EGI.eu)
    • Closing plenary. EGI contributions to the European Open Science Cloud: from EOSC-hub to EGI-ACE and EOSC Future Room: http://go.egi.eu/zoom1

      Room: http://go.egi.eu/zoom1

      The EGI Federation has been actively contributing to the implementation of the European Open Science Cloud since its launch. From January 2021, the activities of the EOSC-hub project will be continued and expanded through the EGI-ACE and EOSC Future projects, for the implementation of the EOSC Compute Platform and the delivery of various components of the EOSC Core services. This presentation will introduce the EGI contributions to both projects, explaining the relationship between EGI and EOSC.

      ABOUT EGI-ACE
      Invited for grant preparation by the EC in response to the call INFRAEOSC-07 A1, EGI-ACE empowers researchers from all disciplines to collaborate in data- and compute-intensive research across borders through services that are free at the point of use. Building on the distributed computing integration in EOSC-hub, it delivers the EOSC Compute Platform and contributes to the EOSC Data Commons through a federation of cloud compute and storage facilities, PaaS services and data spaces with analytics tools and federated access services.

      The Platform is built on the EGI Federation, the largest distributed computing infrastructure for research. The EGI Federation delivers over 1 exabyte of research data and 1 million CPU cores, which supported the discovery of the Higgs boson and the first observation of gravitational waves, while remaining open to new members. The Platform pools the capacity of some of Europe's largest research data centres, leveraging ISO-compliant federated service management. Over 30 months, it will provide more than 82 million CPU hours and 250,000 GPU hours for data processing and analytics, and 45 PB/month to host and exploit research data. Its services address the needs of major research infrastructures and communities of practice engaged through the EOSC-hub project.

      The Platform advances beyond the state of the art through a data-centric approach, where data, tools, and compute and storage facilities form a fully integrated environment accessible across borders thanks to Virtual Access. The Platform offers heterogeneous systems to meet different needs, including state-of-the-art GPGPUs and accelerators supporting AI and ML, making the Platform an ideal innovation space for AI applications. The data spaces and analytics tools are delivered in collaboration with tens of research infrastructures and projects, to support use cases for Health, the Green Deal and fundamental sciences.

      The consortium builds on the expertise and assets of the EGI federation members, key research communities and data providers, and collaborating initiatives.

      ABOUT EOSC Future
      EOSC Future integrates, consolidates, and connects e-infrastructures, research communities, and initiatives in EOSC to develop the EOSC-Core and EOSC-Exchange. EOSC Future will expand the EOSC ecosystem, integrate existing but disparate initiatives, and hand over key project outputs to the EOSC Association. EOSC Future will unlock the potential of European research through a vision of Open Science for Society. In EOSC Future, EGI partners will contribute to the EOSC architecture and its interoperability framework; the development and operations of EOSC Core components such as the Portal, accounting, monitoring and AAI; integration with thematic services from research infrastructures; and training and outreach.

      EOSC Future bridges the gap from project-based EOSC development to a new model, connected to the recently launched EOSC Association. It integrates the services from the INFRAEOSC-07 projects such as EGI-ACE and is intended to leave a valuable and robust EOSC ecosystem behind when it concludes in 2023.

      Conveners: Gergely Sipos (EGI.eu), Ivan Maric (SRCE), Owen Appleton (EGI.eu)
    • EOSC Synergy Dutch Data Landscape Workshop Room: go.egi.eu/egi2020synergy

      Room: go.egi.eu/egi2020synergy

      This (virtual) meeting aims to present the findings of the Dutch landscape analysis, to validate the results, and to discuss how researchers can best be provided with the data services and digital infrastructure that meet their demands. After an introduction to EOSC Synergy and a presentation of the Dutch report, feedback from the audience will be requested in a final discussion.

      Chair: Elisa Cauhé (EGI Foundation & EOSC Synergy).

      10:00 - Welcome, Introduction to the workshop - Luděk Matyska (CESNET & member of the EOSC Governing Board).

      10:05 - Short overview of the EOSC Synergy countries report - Valentino Cavalli (EGI Foundation & EOSC Synergy).

      10:25 - Main results of the Netherlands landscape analysis - Peter Doorn (DANS).

      10:55 - Feedback on landscape report & discussion.

      11:25 - Wrap-up and follow-up.

      11:30 - Workshop close.