Digital Infrastructures for Research 2018

Name: Digital Infrastructures for Research 2018
Start: 2018-10-09T09:00:00+01:00
End: 2018-10-11T17:00:00+01:00
Location: Lisbon

9 Oct 2018, 09:00 → 11 Oct 2018, 17:00 Europe/Lisbon

Lisbon

ISCTE, University of Lisbon

Sinead Ryan (TCD), Volker Guelzow (DESY)

Description

Find out more about DI4R on the conference website: https://www.digitalinfrastructures.eu/

Tuesday, 9 October ¶
- 09:00 → 11:00
  Opening Plenary¶ Main Auditorium
  
  Main Auditorium
  
  Lisbon
  
  ISCTE, University of Lisbon
  
  Convener: Prof. Sinead Ryan (Trinity College Dublin)
  - 09:00
    
    Opening¶ 45m
    
    Including a "first-comers" introduction to the organising RIs and DI4R
    
    Speaker: Prof. Manuel Heitor (Minister of Science and Higher Education of Portugal)
  - 09:45
    
    Research infrastructures for climate prediction: Current data-centric challenges¶ 45m
    
    Weather and climate prediction and high-performance computing have gone hand in hand in the last few decades. Current activities in this field rely on different digital infrastructures needed both for running computationally expensive global and regional climate models (HPC infrastructures), and for storing and making available the resulting scientific data and metadata (GRID infrastructures). For instance, the Earth System Grid Federation (ESGF) is a key international effort building on different national infrastructures (e.g. ENES in Europe, http://portal.enes.org) to provide a distributed data platform enabling free worldwide access to climate data (moving from Peta- to Exa-scale). ESGF provides archiving and access services for the multi-model multi-scenario climate projections obtained in successive Climate Model Intercomparison Projects (CMIPs and CORDEX), which are the basis for climate change studies (including the IPCC reports). These studies typically require accessing and post-processing huge amounts of data, for instance to harmonize and post-process climate change information for a particular region. They therefore require new data-centric infrastructures that facilitate post-processing services (including machine learning). Some ongoing initiatives are exploring the use of cloud services to deploy efficient data processing services, based on a data-as-a-service approach. An example is the Data and Information Access Services (DIAS) being developed by Copernicus in Europe. In this talk I will introduce the main ongoing international collaborations on climate prediction (focusing on the IPCC, the Intergovernmental Panel on Climate Change) and describe the challenges that the new data-centric approach poses for the existing digital infrastructures in this field.
    
    Speaker: Jose Manuel Gutierrez (CSIC and IPCC author)
  - 10:30
    
    Mapping Species Knowledge– the value of data and digital infrastructure to address the extinction crisis¶ 30m
    
    The wealth of species on our planet is critical for our survival. However, we are losing species at a rate equivalent to a mass extinction event. During the last century, the vertebrate extinction rate has been about 100 times higher than would be expected during stable geological periods. The design of species conservation strategies and policies relies directly on accessible information, such as species demography, genetics, habitat, threats, and legislation. Biodiversity databases are expanding, and new ones are emerging, but despite technological advances in digital infrastructure we still struggle to map the information available for each described species. Such a map would allow managers and policy makers to improve their decision processes. The Species-Index is an initiative driven by both the Species360 Conservation Science Alliance and CPop at the University of Southern Denmark. It aims to develop partnerships to map information and to generate development platforms, workflow systems, and storage between open biodiversity repositories. We have developed the concept based on the Species-Index on Demography, built from 22 data repositories and the Zoological Information Management System (ZIMS). We started by indexing demographic information because species recovery strategies depend heavily on birth and death data to gain a deeper understanding of population dynamics, e.g. to assess species extinction risk or to establish sustainable harvesting quotas for species trade. We found that ZIMS can provide a unique source of demographic knowledge to fill data gaps. As a result, we are developing routines on the SDU ABACUS 2.0 supercomputer (https://escience.sdu.dk/) to estimate demographic measures that can be used by conservation scientists and policymakers, with a significant focus on fighting the illegal wildlife trade.
    
    Speaker: Prof. Dalia A. Conde (Species360 CSA & CPop University of Southern Denmark)
- 11:00 → 11:30
  
  Coffee Break 30m
- 11:30 → 13:00
  EuroHPC a new player in the game - PRACE/GEANT organizer session¶ Main Auditorium
  
  Main Auditorium
  
  Lisbon
  
  ISCTE, University of Lisbon
  By the end of this year the new EuroHPC Joint Undertaking will be founded. EuroHPC will pool European resources to develop top-of-the-range exascale supercomputers for processing big data, based on competitive European technology. Its aim is to acquire and provide a world-class pre-exascale supercomputing infrastructure to Europe's scientific and industrial users, matching their demanding application requirements by 2020, and to develop exascale supercomputers based on competitive EU technology that the Joint Undertaking could acquire around 2022/2023. EuroHPC will create a truly European HPC landscape together with two partners: GÉANT, which has unified the national research and education networks (NRENs) into a pan-European research network interconnecting universities, research institutes, experiments and HPC centers; and PRACE, the Partnership for Advanced Computing in Europe, which aims for high-impact scientific discovery and engineering research and development across all disciplines to enhance European competitiveness for the benefit of society. New services, also open to industry and public services, will become available and complement the HPC offer that already exists today. Find out how this will benefit new service users and what the opportunities will be.
  
  Proposed programme:
  1. 30 min Introduction EuroHPC by EC
  2. 15 min PRACE perspective
  3. 15 min GEANT perspective
  4. 30 min Round table EuroHPC and EOSC – perspectives and opportunities. How can we work together? How will exascale computing benefit users?
  slides
- 11:30 → 13:00
  Persistent Identifiers in use: Exchanging ideas about new developments in the field of PID services¶ Auditorium B104
  
  Auditorium B104
  
  Lisbon
  
  ISCTE, University of Lisbon
  
  Convener: Eliane Fankhauser (DANS-KNAW)
  - 11:30
    
    Persistent Identifiers in use: Exchanging ideas about new developments in the field of PID services¶ 15m
    
    Persistent identifiers (PIDs) such as DOIs for articles or ORCIDs for researchers are a core component of open science, as they improve discovery, navigation, retrieval, and access of research resources. FREYA, a 3-year EU-funded project, aims to extend the PID infrastructure by cross-linking PID services, facilitating the development of new PID types, and creating a community of practice. Engagement with stakeholders and the wider PID community is an important means of exchanging knowledge and getting feedback about the development of new PID types and services. Currently, FREYA is establishing the PID Forum, a user community whose members collectively oversee the development and deployment of new services. Anyone with an interest in PIDs is invited to join this session, exchange ideas and contribute to the discussions. At this World Café session, the PID Forum will be introduced, and some of the work done in the first few months of the project will be presented and discussed with the audience in a workshop. The workshop will focus on two current FREYA activities: (i) mapping the identifier landscape and (ii) understanding how stakeholders operate within the landscape. We would like to discuss both activities with the user community and get its feedback on them. FREYA has recently surveyed the current identifier landscape and would like to share key findings with the user community. Moreover, FREYA would like feedback from the community on the user stories that have already been collected. Questions like “Is there broader value to be gained from addressing the user story?” or “What is needed to deliver the value identified in the user story?” will be addressed. Finally, FREYA is eager to connect with any stakeholders in the user community to learn about their user stories and identify gaps where research resources could be better connected and services extended or built.
    
    Speakers: Eliane Fankhauser (DANS-KNAW), Christine Ferguson (EMBL-EBI)
  - 11:45
    
    Persistent identifiers and their services in digital infrastructures¶ 15m
    
    The reliable identification and location of digital objects that play a role in research is a fundamental requirement for digital infrastructures in general and the European Open Science Cloud (EOSC) in particular. Digital objects are not the only things that need identification: real-world entities such as people (researchers), funding bodies and research equipment need to be identified and linked with other entities to address the use cases that are common to so many disciplines and research environments: facilitating reuse of data and other research outputs, assessing impact and contributing to long-term preservation. Persistent identifiers also play a vital role in implementing the FAIR principles, enabling stability of reference, publication and dissemination of services and access to resources. Some persistent identifiers and their supporting infrastructures are already very well established. DOIs and ORCIDs have gained enormous traction and are indispensable elements of the research landscape. Work is proceeding in several forums to develop identifiers for further entities. The question arises: how can all this be brought together into a coherent and sustainable foundation for the vast distributed ecosystem of data and services that is the EOSC? The FREYA project, funded under the EC’s Horizon 2020 programme (project number 777523; see https://www.project-freya.eu), aims to develop the infrastructure for persistent identifiers as a core component of open science, in the EU and globally. FREYA will develop new PID services and new PID types, and validate them in a diverse set of applications. FREYA's vision rests on three key concepts needed to achieve its goals:
    • A technical framework (PID Graph). The PID Graph connects and integrates PID systems, representing a map of the relationships across a network of PIDs and serving as a basis for new services. It will need to address common formats and metadata, interoperability between PID providers, interlinking and harvesting.
    • A community forum (PID Forum), a stakeholder community whose members collectively oversee the development and deployment of the PID Graph; it will be strongly linked to the Research Data Alliance (RDA).
    • A governance model (PID Commons), concerned with the sustainability and growth of the PID infrastructure resulting from FREYA beyond the lifetime of the project itself, defining the roles, responsibilities and structures for good self-governance based on consensual decision-making.
    The presentation will introduce the concepts of FREYA, providing an update on the current thinking, illuminated by examples from the project’s pilot applications, and consider how it will affect the further development of the EOSC.
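As a rough illustration of the PID Graph idea in the abstract above, the following Python sketch treats PIDs as nodes and typed relationships as edges, then collects everything reachable from one researcher within a given number of links. It is a toy model under stated assumptions, not FREYA's design: the class `PidGraph`, the relation names and every identifier shown are hypothetical placeholders.

```python
from collections import defaultdict, deque


class PidGraph:
    """Toy in-memory graph whose nodes are persistent identifiers."""

    def __init__(self):
        # pid -> list of (relation, other_pid)
        self._edges = defaultdict(list)

    def link(self, source, relation, target):
        """Record a typed link in both directions so traversal is symmetric."""
        self._edges[source].append((relation, target))
        self._edges[target].append((f"inverse:{relation}", source))

    def connected(self, start, max_hops=2):
        """Return the PIDs reachable from `start` within `max_hops` links."""
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            pid, hops = queue.popleft()
            if hops == max_hops:
                continue  # do not expand beyond the hop limit
            for _relation, other in self._edges[pid]:
                if other not in seen:
                    seen.add(other)
                    queue.append((other, hops + 1))
        seen.discard(start)
        return seen


# Hypothetical identifiers, for illustration only.
graph = PidGraph()
graph.link("orcid:0000-0000-0000-0001", "authored", "doi:10.1234/article-a")
graph.link("doi:10.1234/article-a", "cites", "doi:10.1234/dataset-b")
graph.link("doi:10.1234/dataset-b", "funded_by", "funder:example-001")

one_hop = graph.connected("orcid:0000-0000-0000-0001", max_hops=1)
two_hops = graph.connected("orcid:0000-0000-0000-0001", max_hops=2)
```

With one hop only the article is found; with two hops the dataset cited by that article is included as well, which is the kind of cross-PID question a shared graph makes possible.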
    
    Speaker: Mr Simon Lambert (STFC)
- 11:30 → 13:00
  Thematic Services: Life Sciences, Neuroscience and Astronomy¶ Auditorium JJLaginha
  
  Auditorium JJLaginha
  
  Lisbon
  
  ISCTE, University of Lisbon
  
  Convener: Dr Gergely Sipos (EGI.eu)
  - 11:45
    
    Using Onedata for data caching in hybrid-cloud environments¶ 15m
    
    Onedata [1] is a global high-performance data management system that provides easy and unified access to globally distributed storage resources and supports a wide range of use cases, from personal data management to data-intensive scientific computations. Onedata enables the creation of complex hybrid-cloud deployments using private and commercial cloud resources. It allows users to share, collaborate on and publish data, as well as perform high-performance computations on distributed data. The Onedata system consists of zones (Onezone), which enable the creation of federations of data centres and users; storage providers (Oneprovider), which expose storage resources; and clients (Oneclient), through which users can access their data via a virtual POSIX file system. Onedata introduces the concept of a space: a virtual volume, owned by one or more users, where the data is stored. Each space can be supported by a dedicated amount of storage supplied by one or multiple storage providers. Storage providers deploy a Oneprovider instance near the storage resources and register it in a selected Onezone service to become part of a federation and expose those resources to users. Onedata supports multiple types of storage backends, such as POSIX, S3, Ceph, OpenStack Swift and GlusterFS.

    In large-scale hybrid-cloud deployments, it is often the case that data maintained in the private cloud has to be processed on demand in the public cloud. While deploying remote jobs is today fairly straightforward, and can be automated using several orchestration platforms, making the data available for processing in the remote cloud is a significant challenge. Onedata makes this easy by enabling automatic, on-demand, block-based data prefetching based on the POSIX requests from user applications, and by automatically caching files based on an analysis of file popularity. In most cases, prestaging is not necessary at all, as the data blocks are fetched on the fly when requested for reading. However, Onedata also provides a REST API for controlling data replication manually or integrating it with third-party services.

    Currently, Onedata is used in Helix Nebula Science Cloud [2], eXtreme DataCloud [3], PLGrid [4], European Open Science Cloud Hub [5], and European Open Science Cloud Pilot [6], where it provides a data transparency layer for computation deployed on hybrid clouds. In EOSC-hub [5] it serves as the basis of the EGI Open Data Platform, supporting open science use cases such as open data curation (metadata editing), publishing (DOI registration) and discovery (OAI-PMH protocol).

    1. Onedata project website. http://onedata.org.
    2. Helix Nebula Science Cloud (Europe’s Leading Public-Private Partnership for Cloud). http://www.helix-nebula.eu.
    3. eXtreme DataCloud (Developing scalable technologies for federating storage resources). http://www.extreme-datacloud.eu.
    4. PL-Grid (Polish Infrastructure for Supporting Computational Science in the European Research Space). http://projekt.plgrid.pl/en.
    5. European Open Science Cloud Hub (Bringing together service providers to create a contact point for European researchers and innovators). https://www.eosc-hub.eu.
    6. European Open Science Cloud Pilot (Development of the EOSC-hub). https://eoscpilot.eu.
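The on-demand, block-based fetching described in the abstract above can be pictured with a small Python sketch. This is only a conceptual model, assuming a fixed block size and an in-memory cache; `RemoteStore`, `BlockCache` and `BLOCK_SIZE` are hypothetical names and do not correspond to the Onedata API or its actual caching policy.

```python
from collections import Counter

BLOCK_SIZE = 4  # bytes per block; tiny so the example is easy to follow


class RemoteStore:
    """Stands in for storage in the private cloud; counts block fetches."""

    def __init__(self, files):
        self._files = files
        self.fetches = 0

    def read_block(self, name, index):
        self.fetches += 1
        start = index * BLOCK_SIZE
        return self._files[name][start:start + BLOCK_SIZE]


class BlockCache:
    """Fetches only the blocks a read touches and keeps them locally."""

    def __init__(self, remote):
        self._remote = remote
        self._blocks = {}            # (name, block index) -> bytes
        self.popularity = Counter()  # name -> number of read calls

    def read(self, name, offset, size):
        self.popularity[name] += 1
        if size <= 0:
            return b""
        first = offset // BLOCK_SIZE
        last = (offset + size - 1) // BLOCK_SIZE
        data = b""
        for index in range(first, last + 1):
            key = (name, index)
            if key not in self._blocks:
                # Cache miss: pull just this block from the remote side.
                self._blocks[key] = self._remote.read_block(name, index)
            data += self._blocks[key]
        skip = offset - first * BLOCK_SIZE
        return data[skip:skip + size]


remote = RemoteStore({"sample.dat": b"abcdefghijklmnop"})
cache = BlockCache(remote)
first_read = cache.read("sample.dat", 5, 6)   # touches blocks 1 and 2
fetches_after_first = remote.fetches
second_read = cache.read("sample.dat", 6, 2)  # served from cached block 1
fetches_after_second = remote.fetches
```

The second read overlaps a block that is already cached, so no further remote fetch happens. The per-file read counter is the kind of signal a popularity-based caching policy could use, although the policy itself is not modelled here.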
    
    Speakers: Dr Lukasz Dutka (CYFRONET), Michal Orzechowski (CYFRONET)
    
    Slides
  - 12:00
    
    Frictionless Data Exchange Across Research Data, Software and Scientific Paper Repositories¶ 15m
    
    A single scientific repository is, if considered by itself, of limited value. Real benefits come from the ability to exchange information effectively and in an interoperable way, enabling the development of a wide range of global cross-repository services. However, exchanging metadata and content across scientific repositories is mostly based on a 15-year-old technology, symbolized by the OAI-PMH protocol. This protocol: 1. is unsuitable when there is a need to exchange large quantities of metadata, 2. suffers from inconsistent implementations across providers and 3. was only designed for metadata transfer, omitting the much needed support for content exchange. In light of these issues, the COAR Next Generation Repositories Working Group recommends the adoption of ResourceSync across repository platforms. As a result, it is important that we fully understand how ResourceSync performs against OAI-PMH. This work is being conducted under the umbrella of the European Open Science Cloud Pilot project, from which we received funding to run an experimental pilot providing a fast and highly scalable exchange of data across repositories. The work will assess how scholarly communication resources, i.e. research datasets, scientific manuscripts (research papers, theses, monographs, etc.) and scientific software, can be effectively, regularly and reliably exchanged across systems using the ResourceSync protocol. The underlying aim of this work is to provide an argument and evidence for modernising the legacy communication mechanisms routinely used by thousands of research repositories.

    This will be achieved by running a set of experiments/benchmarks comparing OAI-PMH with ResourceSync along a set of dimensions, scenarios and implementation setups, including:

    **Architectural**
    - 1-to-1 synchronization
    - 1-to-many synchronization (master copy or mirror)
    - many-to-1 synchronization (aggregator)

    **Conceptual**
    - Baseline synchronization
    - Metadata
    - Metadata and content
    - Incremental synchronization
    - Selective synchronization (PMH Sets, RS capability lists)

    We will also compare/evaluate the efficacy of ResourceSync against OAI-PMH in terms of:
    - speed (time)
    - complexity (steps required to complete)
    - reliability (recall)
    - freshness (e.g. average time gap between syncs)

    The evaluation will also consider different implementation setups, such as sequential vs parallelized implementation of a ResourceSync client. The proposed talk will concentrate on presenting the first set of results from the evaluation.
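Two of the evaluation criteria listed above, reliability (recall) and freshness (average time gap between syncs), are simple to state in code. The sketch below is one plausible reading of those measures, not the authors' benchmark code; the function names and the sample record identifiers are assumptions made for illustration.

```python
from datetime import datetime


def recall(source_ids, synced_ids):
    """Fraction of source records that are present in the synced copy."""
    source = set(source_ids)
    if not source:
        return 1.0  # nothing to miss
    return len(source & set(synced_ids)) / len(source)


def average_sync_gap(sync_times):
    """Mean time between consecutive syncs; None if fewer than two syncs."""
    ordered = sorted(sync_times)
    if len(ordered) < 2:
        return None
    # The sum of consecutive gaps equals last minus first.
    return (ordered[-1] - ordered[0]) / (len(ordered) - 1)


# Hypothetical data: the harvested copy misses "r3" and holds a stale "r9".
sync_recall = recall(["r1", "r2", "r3", "r4"], ["r1", "r2", "r4", "r9"])

sync_gap = average_sync_gap([
    datetime(2018, 10, 1, 0, 0),
    datetime(2018, 10, 1, 6, 0),
    datetime(2018, 10, 1, 18, 0),
])
```

The same two functions can be applied to an OAI-PMH harvest and to a ResourceSync run over the same source, which keeps the comparison like for like.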
    
    Speaker: Dr Petr Knoth (KMi, The Open University)
  - 12:15
    
    Data Challenges at the Square Kilometre Array (SKA)¶ 15m
    
    The Square Kilometre Array (SKA) will be a radio telescope distributed over two continents: approx. 190 parabolic antennas will be built in South Africa, and more than 100,000 dipole antennas in Australia. Computing at SKA has to cope with next-generation big data analytics challenges: so much data will be taken that only a tiny fraction can be stored in long-term archives. Extracting the relevant astronomical information from huge data streams has to be done nearly in real time. Due to the complexity of the workflows, enormous computing power is needed. Moreover, the fantastic resolution of the antennas ultimately results in 3D images of the universe, which may become as large as one petabyte: traditional computing architectures are not designed for analyzing objects of such size.

    The signals from the antennas are "interfered" in local stations and sent to two computing centers, in South Africa and Australia respectively. The antennas generate a 24/7 stream of "raw data" of the order of 2 Pb/s, which is more than the global internet traffic (~360 Tb/s, Cisco 2016). In both computing centers the incoming data are analyzed iteratively by complicated workflows to strongly reduce the data volumes. The outputs of the central data centers are called *science data products* and will be transported to a few "Regional Centres". In Europe there will be one virtual Regional Centre that is physically distributed over the European SKA member states. The community of astronomers can access SKA data only via the Regional Centres. The project AENEAS (Advanced European Network of E-infrastructures for Astronomy with the SKA) is developing a design for the European Regional Centre. The talk will give an overview of the current status of AENEAS.

    Huge data objects (~1 PB / object) can only be analyzed sufficiently fast if they are stored "in-memory". This needs a radical change in the design of computing infrastructures, away from "processor-centric computing" and towards "memory-driven computing". The talk will give an overview of the results of two recent workshops, where the need for a paradigm shift was discussed by big data analytics experts (see http://bigdata.htw-berlin.de):
    - *Exascale Data Center*, Berlin, Jan. 30, 2018
    - *Memory-driven Computing for Big Data Analytics*, Berlin, May 30, 2018

    Finally, it will be indicated that the big data analytics challenges at SKA are not just a "do it bigger and do it faster business" (G. Longo). Almost all data (more than 99.999 % of the raw data) are already rejected before a human researcher has had the chance to start their analysis. This requires in particular the development of highly parallelizable machine learning techniques, which are currently not available. Suitable statistical procedures are needed for evaluating the quality of the remaining data, as is done in exemplary fashion in high-energy physics at the Large Hadron Collider (LHC). Moreover, developing a scalable distributed memory-driven computing infrastructure is an interdisciplinary challenge, where scientists of different disciplines and industry have to cooperate.
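To put the quoted figures in proportion, the short Python calculation below works through the arithmetic. It assumes that "Pb/s" and "Tb/s" in the abstract mean petabits and terabits per second, and it treats "more than 99.999 %" as exactly 99.999 %, so the retained volume is an upper bound. The variable names are chosen here for illustration.

```python
RAW_RATE_BITS_PER_S = 2e15          # ~2 Pb/s raw stream (petabits assumed)
INTERNET_RATE_BITS_PER_S = 360e12   # ~360 Tb/s, the Cisco 2016 figure quoted
REJECTED_FRACTION = 0.99999         # "more than 99.999 %" of the raw data
SECONDS_PER_DAY = 86_400

# How many times larger the raw stream is than the quoted internet traffic.
ratio_to_internet = RAW_RATE_BITS_PER_S / INTERNET_RATE_BITS_PER_S

# Raw volume per day in bytes (8 bits per byte).
raw_bytes_per_day = RAW_RATE_BITS_PER_S / 8 * SECONDS_PER_DAY

# Upper bound on what survives the rejection step each day, in bytes.
kept_bytes_per_day = raw_bytes_per_day * (1 - REJECTED_FRACTION)

print(f"raw stream is about {ratio_to_internet:.1f}x the quoted internet traffic")
print(f"raw volume per day: about {raw_bytes_per_day / 1e18:.1f} EB")
print(f"kept per day (upper bound): about {kept_bytes_per_day / 1e12:.0f} TB")
```

Under these assumptions the raw stream is roughly 5.6 times the quoted internet traffic and amounts to about 21.6 EB per day, of which at most about 216 TB per day remains after rejection.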
    
    Speaker: Prof. Hermann Hessling (Univ. of Applied Sciences (HTW) Berlin)
  - 12:30
    
    West-Life Virtual Folder - connecting data and computation for structural biology¶ 15m
    
    West-Life is an H2020 project that aims to deliver a virtual research environment to support integrative research in structural biology. Structural biology involves multiple techniques: X-ray crystallography, cryo-EM, NMR, mass spectroscopy and others. West-Life aims to address two main challenges:
    - Allow discovery of, and deliver, multiple software tools and techniques to the user, lowering the effort needed for installation, configuration and integration.
    - Aggregate scattered data into a virtual folder view and allow them to be processed through a uniform interface.

    The West-Life Virtual Folder allows users to register data storage providers in one place and aggregates them when the data are needed. It supports registering Dropbox, EUDAT’s B2DROP service, or any data storage service that offers a WebDAV interface. It relies heavily on the WebDAV interface and protocol to give other services and web sites a standard method for downloading and uploading data. It is also possible to integrate a proprietary solution offered by an individual data storage provider for better performance. West-Life virtual machine templates bring a uniform configuration for launching software and processing data. They leverage CernVM-FS technology for distributing software suites, so the suites are not included in the VM template itself but are downloaded to the VM cache and executed on demand, which keeps the initial VM image very small (18 MB). The Virtual Folder inside the virtual machine integrates the user's data with software deployed on the computation node, ready for processing. Currently the software suites CCP4, CCPEM, SCIPION and others are available to users of the VM.
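Since the Virtual Folder relies on WebDAV, a folder listing boils down to a `PROPFIND` request and a multi-status XML reply. The Python sketch below builds such a request without sending it, parses a hand-written sample reply, and merges listings into one view. It is an illustration under stated assumptions, not West-Life code: the URL, the provider name, the sample reply and the helper functions are all hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

DAV_NS = {"d": "DAV:"}  # XML namespace used by WebDAV elements


def build_listing_request(folder_url):
    """Build (but do not send) a PROPFIND request for one folder level."""
    return urllib.request.Request(
        folder_url,
        method="PROPFIND",
        headers={"Depth": "1"},  # the folder itself plus its direct children
    )


def parse_listing(multistatus_xml):
    """Extract the href of every entry in a 207 Multi-Status body."""
    root = ET.fromstring(multistatus_xml)
    return [href.text for href in root.findall("d:response/d:href", DAV_NS)]


def aggregate(listings):
    """Merge listings from several providers into one virtual folder view."""
    view = {}
    for provider, hrefs in listings.items():
        for href in hrefs:
            view[f"/{provider}{href}"] = provider
    return view


# Hand-written sample reply, not captured from a real server.
SAMPLE_RESPONSE = """<d:multistatus xmlns:d="DAV:">
  <d:response><d:href>/data/</d:href></d:response>
  <d:response><d:href>/data/model.pdb</d:href></d:response>
</d:multistatus>"""

request = build_listing_request("https://webdav.example.org/data/")
entries = parse_listing(SAMPLE_RESPONSE)
view = aggregate({"b2drop": entries})
```

Prefixing each path with the provider name is one simple way to keep entries from different storage services apart in a single view; a real aggregator would also need authentication and error handling, which are left out here.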
    
    Speaker: Dr Tomas Kulhanek (STFC)
    
    Slides
- 11:30 → 13:00
  Training: The Scientific Scavenger Hunt¶ Auditorium B203
  
  Auditorium B203
  
  Lisbon
  
  ISCTE, University of Lisbon
  
  Convener: Peter Kraker (Open Knowledge Maps)
  - 11:30
    
    The Scientific Scavenger Hunt: Improve your discovery skills¶ 1h 30m
    
    The open science revolution has dramatically increased the accessibility of scientific knowledge. But what about discoverability? Discovery is in many ways the departure point of research; whether you are starting out in your PhD, initiating a research project or venturing into a different discipline: in many cases, you want to get an overview of an unknown field of research and the most relevant projects therein. The quality of this overview often decides whether research gets reused or duplicated, whether collaborations are formed or such opportunities are missed. However, with 2.5 million papers published every year, and thousands of research projects launched every day, discovery becomes increasingly difficult. Traditional approaches involving search engines providing long, unstructured lists of scientific outputs are not sufficient. We can also see this reflected in the numbers: the vast majority of datasets are not reused, and even in application-oriented disciplines such as medicine, only a minority of results ever gets transferred to practice. But not to worry, open science is here to help: new and innovative tools for exploring scientific knowledge are bridging the gap between accessibility and discoverability. In this workshop, you will learn to improve your discovery skills with two open science tools enabling visual discovery: Open Knowledge Maps (https://openknowledgemaps.org/search), which provides knowledge maps of research topics in any discipline, and VIPER (https://openknowledgemaps.org/viper), which builds on the EOSC via OpenAIRE to enable visual discovery of research projects. You will learn how to get an overview of a scientific field, to identify relevant concepts and to separate relevant from irrelevant content with respect to your information need. This training will be given in the form of an innovative, hands-on format: the Scientific Scavenger Hunt. 
    The Scientific Scavenger Hunt is a fun and fast-paced mix between a pub quiz and a virtual scavenger hunt. In groups, participants try to complete tasks on knowledge maps within a given time limit. They follow hints on knowledge maps that lead them to the correct answer. On the way, they learn what makes a guerilla archivist and why the city of Athens is almost synonymous with insomnia in some communities. And they may even win a prize in the end! We have already conducted this workshop around the world. More than 1,000 people have taken part in this fun, hands-on activity at events such as the Open Science Fair and OpenCon, and we would love to bring it to DI4R.

    *More information on Open Knowledge Maps: Open Knowledge Maps is based on the principles of open science: we share our source code, content and data under an open license. As a community-driven initiative, we are developing our services together with our advisors, collaboration partners and users. Currently, more than 30,000 users per month from all around the world use our openly accessible discovery tool for their research, writing and studies. For more information, please visit https://openknowledgemaps.org*
    
    Speakers: Maxi Schramm (Open Knowledge Maps), Peter Kraker (Open Knowledge Maps)
- 13:00 → 14:15
  
  Lunch 1h 15m
- 14:15 → 15:45
  EOSC Service Architecture¶ Main Auditorium
  
  Main Auditorium
  
  Lisbon
  
  ISCTE, University of Lisbon
  
  Convener: Dr Giacinto Donvito (INFN)
  slides
  
  2018-10-09-DEEP-DI4R-EOSC-services.pdf
  
  AppDB-openaire.v2.pptx
  
  DI4R_EOSCpilot_Interoperability__final_(1).pptx
  
  EIC_APIs@DI4R_20181009_(1).pptx
  
  EOSC-hub_AAI_-_Service_Architecture.pptx
  
  EOSC_HUB_architecture_WP7_TS-v2.pptx
  
  EOSC_Hub_service_Architecture_DI4R.pdf
  - 14:15
    EOSC Service Architecture: how the services could support the use communities¶ 1h 30m
    
    The EOSC-Hub project is actively working on proposing a new service architecture. It starts from the services already available when the proposal was prepared, and also considers the new services provided by user communities working in the EOSC-Hub project and those implemented by external projects. The final goal of this activity is to support end-user communities with powerful and easy-to-exploit services. In the context of EOSC-Hub we are already working together with the user communities to gather their requirements and propose a coherent and effective service architecture. We will report on this activity with the aim of helping other communities to exploit the services in different contexts as well. The session will present an updated view of the EOSC-Hub service architecture, as released in the deliverable planned for the end of September. One technical talk will show the service architecture from the point of view of the end-user community: how those services can be composed and used by users to build their own services. We will provide information about the interaction between the services in the EOSC Service Catalogue and how they can be used together even if they come from different environments. We will place the EOSC effort to build the service architecture in the context of other projects running in parallel (EOSCpilot, EINFRA-21 project, GEANT4-2, OpenAIRE-Advance, e-InfraCentral, etc.), in order to present to the user community a coherent view of the possibilities available and how they will evolve. The agenda will include:
    - one talk describing the EOSC-Hub effort in the context of the service architecture
    - one talk from external projects that are providing, or willing to provide, new services to the EOSC Service Catalogue
    - one talk from EOSCpilot about the work done in the context of the service catalogue
    - one talk representative of other efforts in the same context
    - one example of user communities exploiting services in the service catalogue to build a brand-new service usable by end users

    In the World Café session, we will also dedicate a short slot to updating the audience on the status of the service roadmap and the planned updates.
    
    Speaker: Dr Giacinto Donvito (INFN)
    
    Status and overview of EOSC-Hub service architecture¶ 15m
    
    AAI integration activities in the context of EOSC-Hub¶ 15m
    
    AppDB integration activities in the context of joint effort with OpenAIRE-Advance¶ 10m
    
    EOSCPilot and interoperability architecture¶ 15m
    
    eInfraCentral and service catalog interaction through APIs¶ 10m
    
    DEEP-HybridDataCloud and eXtreme DataCloud contribution to the EOSC Service Architecture¶ 15m
    
    Thematic Services in the context of EOSC-Hub¶ 10m
- 14:15 → 15:45
  EOSCpilot Policy Recommendations¶ Auditorium B104
  
  Auditorium B104
  
  Lisbon
  
  ISCTE, University of Lisbon
  
  Convener: Dale Robertson (Jisc)
  - 14:15
    
    Introduction¶ 10m
    
    Speaker: Natalia Manola (University of Athens, Greece)
  - 14:25
    
    Policies in the EOSC Through the Lens of Research Infrastructures: The EOSCpilot Policy Recommendations¶ 1h 20m
    
    Background: The EOSCpilot project supports the first phase in the development of the European Open Science Cloud (EOSC) governance. Its objectives include establishing the policy environment required for the effective operation, access and use of the EOSC to foster research and Open Science. Building on a high-level landscape review of European policies of relevance to the EOSC, the EOSCpilot project has developed draft policy recommendations aimed primarily at funders/ministries, research infrastructures and research producing organisations. The policies have been formulated using information from a range of sources, including the EOSCpilot Science Demonstrator pilots, and cover the areas of Open Science and Open Scholarship, Data Protection, Procurement and Ethics. These draft recommendations will be the subject of consultation with stakeholders from July onwards in order to validate them and produce a final set of policy recommendations by the end of 2018.

    This session will:
    - present an overview of the proposed policy recommendations, focusing on those of most relevance to RIs, and including feedback already received
    - provide an opportunity for RIs to discuss the draft policy recommendations, including their suitability for supporting the implementation of the EOSC and considerations relating to their implementation by research infrastructures and other stakeholders, examining also aspects such as timescales, costs and collaboration requirements
    - focus on key issues and barriers to implementing the EOSC and gather further suggestions for policy actions which would help deliver the EOSC

    Who is it for: This session will target digital data infrastructures, i.e. research infrastructures and e-Infrastructures, who envisage being involved in the EOSC and need to align with the emerging EOSC developments.

    Why is it important: The policy environment will support and complement the EOSC Rules of Participation and have implications for the EOSC governance activities and developments. Policy formation is key to establishing the EOSC and achieving its aims. The adaptation and adoption of appropriate policies by key stakeholders is of key importance and a very timely activity, with the EOSC launch in November 2018 around the corner.

    Objective: The session is an important step in the process of validating the draft policy recommendations to produce a set of final policy recommendations for the EOSC covering Open Science, Data Protection, Procurement and Ethics. It aims to build on prior consultation input, discuss the most important issues for research infrastructures with respect to the EOSC and agree on those policy actions which would most appropriately address them.

    Format: This will be an interactive session involving a small panel including representatives of the EOSCpilot, research infrastructures and e-Infrastructures. It will be facilitated to encourage contributions from the audience to a shared collaborative document capturing their views and suggestions for the “EOSC of the future”, helping to support a constructive and focussed discussion of the desired environment and the steps needed to achieve it.
    
    Speakers: Alex Vermeulen (ICOS ERIC), Bob Jones (CERN), Dale Robertson (Jisc), Iryna Kuchma (EIFL), Natalia Manola (University of Athens, Greece), Dr Pascal Kahlem (ELIXIR)
    
    Slides
- 14:15 → 15:45
  Open Science: Services¶ Auditorium JJLaginha
  
  Convener: Paolo Manghi (Istituto di Scienza e Tecnologie dell'Informazione - CNR)
  - 14:15
    
    Repackaging OpenAIRE Services¶ 15m
    
    In the new phase of OpenAIRE, one of the project’s goals is to repackage OpenAIRE services and provide them as complete products to end users. OpenAIRE is working to bundle the current services into products to address specific stakeholders’ needs and Product Management processes. Each product has an assigned product manager who oversees the development and implementation of the product and communicates its vision and functionality to the stakeholders. The products will take the form of dashboards, allowing different stakeholders to benefit from tailor-made solutions addressing specific needs. These dashboards are:
    - Data Provider Manager Dashboard. Already piloted in OpenAIRE2020, it gathers all functionalities that data providers (repository managers, OA journals, CRISs, aggregators) use to interact with OpenAIRE: registration and validation, aggregation process and status, registration and visualization of usage statistics and other metrics, interaction with the OpenAIRE Broker (subscription and notification of metadata enrichment).
    - Research Community Dashboard. Being developed in the framework of the OpenAIRE-Connect project, this dashboard includes functionalities used by research communities to configure and deploy on-demand services: restrict the search, browse, and navigation related outputs, authoritatively tune up backend text-mining algorithms, reliably monitor and report the research impact of their scientific production, authoritatively provide links between artefacts.

    In addition to the two dashboards mentioned above, three monitoring dashboards will enable the aggregation of results by type of stakeholder, as well as compliance checking:
    - Funder Dashboard: it supports monitoring of research results through aggregated and summarised statistics, drill-down queries by specific facets (timelines, subjects, countries), compliance with OA mandates (with a breakdown for gold/green), correlations with other funding streams, and research trends.
    - Project Dashboard: all project research results are gathered in one place, displaying the compliance with open access mandates, timelines, related data providers, and correlations with other projects.
    - Institutional Dashboard: it monitors all research outcomes related to an institution, their compliance with funders’ OA mandates, and much more.

    The dashboards described above are also part of the OpenAIRE service catalogue deployed in collaboration with the eInfraCentral and EOSCpilot projects.
    
    Speaker: Pedro Principe (University of Minho)
    
    Slides
  - 14:30
    
    The OpenAIRE ScholeXplorer Service: aggregation and resolution of literature-dataset links¶ 15m
    
    **OpenAIRE** OpenAIRE is the European infrastructure in support of Open Science. It fosters and monitors the adoption of Open Science across Europe and beyond, at the national and international level and at the research community level. It advocates the importance and the uptake of Open Science-oriented research life-cycles and publishing workflows, in support of reproducible science, transparent assessment, and omni-comprehensive scientific reward. To this aim OpenAIRE leverages the required cultural shift via a pervasive network of people in Europe (NOADs - National Open Access Desks) and beyond (“global alignment” via CORE), and facilitates the technological shift by providing technical services and interoperability guidelines.

    **Scholix** Under the international forum of the Research Data Alliance and in collaboration with relevant stakeholders in the field, such as DataCite, CrossRef, World Data System, and Elsevier, OpenAIRE has participated in a [Working Group][1] for the definition of the [Scholix framework][2] (Scholarly Link eXchange). The goal of Scholix is to establish a high-level interoperability framework for exchanging information about the links between scholarly literature and data. It aims to enable an open information ecosystem to understand systematically what data underpins literature and what literature references data. Scholix maintains an evolving set of Guidelines consisting of: (i) an information model (conceptual definition of what a Scholix scholarly link is), (ii) a link metadata schema (set of metadata fields representing a Scholix link), and (iii) corresponding XML and JSON schemas. Scholix is currently adopted as the export format for links by DataCite and CrossRef via the [CrossRef EventData][3] service, by EuropePMC, and by OpenAIRE via the ScholeXplorer service.

    **ScholeXplorer** ScholeXplorer is an OpenAIRE production service that since 2017 has offered access to a unique collection of links between publications and datasets collected from publishers (EventData), data centres (DataCite), and institutional and thematic repositories (OpenAIRE). The collection is constantly populated and features 31 million bi-directional links between 880,000 articles and 5,840,000 datasets from an overall 13,000 providers. The resulting graph of links can be accessed via the [ScholeXplorer portal or via the APIs][4], which support third-party services in resolving publication/dataset PIDs to obtain related datasets or publications. The content is also made available as a [JSON dump via Zenodo.org][5]. Since the beginning of 2018 the service has counted around 700 million requests for PID resolution by third-party services (mainly Elsevier ScienceDirect), which have integrated ScholeXplorer in their workflows to show the list of datasets (publications) linked to their publications (datasets). In this presentation we shall present the benefits of Scholix and the technical challenges underlying ScholeXplorer as a production service (i.e. aggregation, resolution, deduplication of link metadata) and the solutions adopted to achieve the quality of service agreed on with data centers and publishers, which are today using the service as their main link-exchange channel.

    [1]: https://goo.gl/F6WEzS
    [2]: http://www.scholix.org
    [3]: https://goo.gl/BQ377S
    [4]: http://scholexplorer.openaire.eu/
    [5]: https://goo.gl/1oN2Vm
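
As an illustration of the aggregation, deduplication and resolution steps mentioned in the abstract, the following minimal Python sketch keeps an in-memory index of bi-directional literature-dataset links and resolves a PID to the objects linked to it. It is not the ScholeXplorer implementation or its API: the class names (`ScholarlyObject`, `Link`, `LinkIndex`), the field names and the example identifiers are hypothetical, and the fields only loosely mirror the idea of a Scholix link.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ScholarlyObject:
    pid: str   # persistent identifier, e.g. a DOI (example values are made up)
    kind: str  # "publication" or "dataset"


@dataclass(frozen=True)
class Link:
    source: ScholarlyObject
    target: ScholarlyObject
    relationship: str  # e.g. "references"
    provider: str      # who asserted the link


class LinkIndex:
    """Hypothetical in-memory index resolving a PID to its linked objects."""

    def __init__(self):
        self._seen = set()               # keys of links already aggregated
        self._by_pid = defaultdict(set)  # pid -> objects linked to that pid

    def add(self, link):
        """Aggregate one link; return False if it duplicates a known link."""
        # Deduplicate on (source, target, relationship), ignoring the provider.
        key = (link.source.pid, link.target.pid, link.relationship)
        if key in self._seen:
            return False
        self._seen.add(key)
        # Index both directions so either end can be used as the query PID.
        self._by_pid[link.source.pid].add(link.target)
        self._by_pid[link.target.pid].add(link.source)
        return True

    def resolve(self, pid):
        """Return the objects linked to the given PID, sorted by PID."""
        return sorted(self._by_pid.get(pid, ()), key=lambda obj: obj.pid)


article = ScholarlyObject("10.1234/article-1", "publication")
dataset = ScholarlyObject("10.1234/dataset-1", "dataset")
index = LinkIndex()
index.add(Link(article, dataset, "references", "provider-a"))
index.add(Link(article, dataset, "references", "provider-b"))  # same link, second provider
linked_to_article = index.resolve(article.pid)
```

Deduplicating on the link itself rather than on the provider is one possible design choice; a production service would also need to reconcile differing identifiers for the same object, which this sketch does not attempt.
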
    
    Speaker: Paolo Manghi (Istituto di Scienza e Tecnologie dell'Informazione - CNR)
    
    Slides
  - 14:45
    
    Open Science for the Neuroinformatics community¶ 15m
    
    OpenAIRE-Connect is a European project which aims at providing services enabling uniform exchange of research artefacts (literature, data, and methods), with semantic links between them, across research communities and content providers in scientific communication. The Neuroinformatics community in OpenAIRE-Connect is represented by members of the France Life Imaging (FLI) collaboration. Some of the FLI members are also connected to INCF, the International Neuroinformatics Coordinating Facility, to integrate solutions at a global level. In this context, we aim at leveraging OpenAIRE-Connect services to give our community members the possibility to easily publish and exchange research artefacts from FLI platforms, such as VIP (for processing) and Shanoir (for data management). This will enable open and reproducible science, since literature, data, and methods can be linked, retrieved, and replayed by all the members of the community. VIP (Virtual Imaging Platform) is a web portal (https://vip.creatis.insa-lyon.fr) for the simulation and processing of massive data in medical imaging. By effectively leveraging the computing and storage resources of the EGI e-infrastructure, VIP offers its users high-level services enabling them to easily execute medical imaging applications. As of June 2018, VIP has more than 1000 registered users and about 20 applications open to all its users. Shanoir is an open source neuroinformatics platform designed to share, archive, search and visualize neuroimaging data. It provides user-friendly, secure web access and offers an intuitive workflow to facilitate the collecting and retrieving of neuroimaging data from multiple sources. Shanoir comes with many features, such as anonymization of data and support for multi-center clinical studies on subjects or groups of subjects.
By leveraging OpenAIRE-Connect services and integrating them into VIP and Shanoir, we aim at providing the neuroinformatics community with Open Science tools to enhance the impact of science and research.
    
    Speaker: Sorina POP (CNRS)
    
    Slides
  - 15:00
    
    OpenAIRE Open Science publishing for Research Infrastructures: the EPOS use-case¶ 15m
    
    OpenAIRE is the European infrastructure in support of Open Science. It fosters and monitors the adoption of Open Science across Europe and beyond, at the National and international level and at the research community level. It advocates the importance and the uptake of Open Science-oriented research life-cycles and publishing workflows, in support of reproducible science, transparent assessment, and omni-comprehensive scientific reward. To this aim OpenAIRE leverages the required cultural shift via a pervasive network of people in Europe (NOADs) and beyond (“global alignment” via CORE), and facilitates the technological shift by providing technical services and interoperability guidelines. Among its technical services OpenAIRE provides the Research Community Dashboard (RCD), which offers research communities the functionality to publish, aggregate, and discover their research outputs via a set of underlying OpenAIRE services that interlink publications, datasets, software, experiments and other products to produce a fully-fledged view of a specific scholarly discipline. The [European Plate Observing System][1] (EPOS) is a pan-European distributed Research Infrastructure for solid Earth science to support a safe and sustainable society. Through the integration of National research infrastructures and data, EPOS will allow scientists to make a step change in developing new geo-hazards and geo-resources concepts and Earth science applications to help address key societal challenges. CNR-IREA is an Italian service provider of EPOS whose portfolio includes satellite Earth Observation services aimed at generating value-added products for Solid Earth applications & natural disaster analysis, prevention and mitigation. In collaboration with OpenAIRE, CNR-IREA will integrate its EPOS services with the RCD service in order to ensure publishing of research products and experiments in a way that supports their use, reuse and reproducibility. 
This presentation will describe the use-case selected to drive the integration: an EPOS user interested in Solid Earth analyses through satellite applications. Such a user can benefit from the on-demand EPOSAR service, which implements an advanced Synthetic Aperture Radar interferometric technique to retrieve Earth surface displacements. In particular, EPOSAR allows the user to select from the Copernicus Programme repositories a dataset of Sentinel-1 satellite images in order to generate ground displacement time series and velocity maps suitable to investigate both natural (earthquakes, volcanic unrests, landslides) and man-made (tunnelling excavations, aquifer exploitation, oil and gas storage and extraction, infrastructures monitoring) hazards. The EPOSAR workflow will interoperate with an EPOS RCD to allow the users to publish in Zenodo.org: the list of processed satellite images as Input Dataset; the output results as Datasets; and the configuration of the EPOSAR service, with links to input and output Datasets, as Experiment. Each of these products will have its own DOI, citation metadata and, if needed, semantic links with other products, and will be discoverable through the EPOS RCD. It is of course up to the users to decide when their experiment is mature enough to be published in OpenAIRE as a citable and preserved Experiment object, and eventually to cite the object from any articles they produce.

    [1]: https://www.epos-ip.org/
    
    Speaker: Paolo Manghi (Istituto di Scienza e Tecnologie dell'Informazione - CNR)
    
    Slides
  - 15:15
    
    UBORA: A digital infrastructure for collaborative research and development of open-source medical devices¶ 15m
    
    Digital infrastructures are already making a real impact in the way we develop innovative products. Platforms for sharing computer-aided designs have emerged in parallel to the maker movement with the advent of rapid prototyping by 3D printing. Besides, manufacturers of industrial components are also keen to share the CAD files of their products, so as to support designers with engineering design. However, in the biomedical field and in bioengineering research, information sharing is not so common, in some cases due to patient privacy protection, but in most cases due to industrial growth strategies, in spite of the benefits that collaborative approaches and the related promotion of open innovation could bring to patients and society. The UBORA digital infrastructure, presented in this study, has been developed to promote collaborative research and development in biomedical engineering, especially regarding the collaborative engineering design of biomedical devices. This infrastructure includes:
    - a) A section for promoting open innovation, in which healthcare professionals and patients can propose needs for novel medical devices.
    - b) A section for project development, through which designers can showcase their proposals or select those from healthcare professionals and develop them, in a guided way, as projects in collaboration with members of the UBORA community.
    - c) A library in the form of a wiki for sharing all the information of the developed biomedical device projects, hence fostering open-source strategies.
    - d) A section providing resources for supporting project development and bioengineering design education for all.

    UBORA has already enabled the creation of a community of more than 200 developers of biomedical devices and showcased around 10 complete projects and 40 concepts of innovative biodevices, and the community and wiki are continuously growing.
Main conceptual decisions, taken during the design of this digital infrastructure and key decisions during implementation, together with current challenges, are presented. Potential synergies and collaborative activities with EOSC and EDI are also analyzed.
    
    Speaker: Andres Diaz Lantada (Universidad Politecnica de Madrid)
- 14:15 → 15:45
  Training Data Management Planning¶ Auditorium B203
  
  Convener: Maria Francesca Iozzi (SIGMA)
  - 14:15
    
    Planning early, following through: Data Management Planning in the EOSC¶ 1h 30m
    
    **Background** The Open Science paradigm strongly contributes towards lifting the barriers that restrict access and re-use of research data. Aligned with the paradigm, funders and agencies, at European and national level, increasingly promote the adoption of strategies of FAIR and open research data. Such strategies, covering all datasets utilised or generated in the course of a research project, are embodied in the Data Management Plan (DMP).

    **Why are DMPs important?** Data Management Plans are important for individual researchers, community or institutional data managers and fundholders. Most H2020 proposals now require a DMP as specified in Article 29.3 of the Grant Agreement, and many countries also require this for nationally funded research. The aim of a DMP is to:
    - engage researchers to plan sustainable, result-oriented and cost-effective research strategies during and beyond the project lifetime,
    - enable research communities to discover and utilise invaluable, trustworthy data and,
    - allow funders to assess their strategy and actions in a multitude of directions.

    When applicable, open access to the data, complemented with an effective citation mechanism, guarantees visibility of the scientific results, for the benefit of the researcher, of the scientific community and of society as a whole.

    **What will you learn?** The training is aimed at people who support research projects or research infrastructures. Specifically, you will be guided through real use case scenarios and the use of two emerging DMP tools to learn:
    - Essential background information on the data lifecycle and the rationale of a DMP;
    - Procedures and policies to ensure high availability and discoverability of data used/generated (e.g., FAIR, GDPR);
    - How to effectively implement the FAIR principles to ensure open and reproducible science;
    - How to comply with the H2020 grant requirements and community best practices concerning research data management;
    - How to relate to existing infrastructures, to ultimately ensure interoperability, reproducibility and long-term preservation of all artefacts, be it data, publications or software.

    Moreover, you will gain an overview of EOSC services engaged in the open research data management lifecycle (storage, access, retrieval, archival etc.) and learn how to interoperate with them at an early stage.

    **Who is it for?** The training is aimed at supporters of research projects and research infrastructures, and other stakeholders that are managing research data. The training material is developed by the EOSC-hub and OpenAIRE-Advance projects.
    
    Speakers: Adil Hasan (SIGMA), George Kakaletris (Athena Research & Innovation Center), Maria Francesca Iozzi (SIGMA), Shaun de Witt (UKAEA)
- 15:45 → 16:15
  
  Coffee 30m
- 16:15 → 17:45
  Data Management Services: Part I¶ Auditorium B104
  
  Convener: Dr Johannes Reetz (Max Planck Computing and Data Facility (MPCDF))
  
  slides
  - 16:15
    
    A single data ecosystem as framework for a joint infrastructure data world¶ 15m
    
    There exists a need for better solutions for cross-domain data sharing and research collaboration, especially the need to process a multitude of different data types created in different contexts. To address this challenge the Research Data Alliance (RDA) together with GEDE (Group of European Data Experts) has recently suggested a Data Object (DO) architecture to create a network of DOs linked to persistent identifiers (PIDs) pointing to associated metadata descriptions connecting multitudes of data repositories. This DO architecture will enable a global data environment that may cause fundamental changes in data practices, leading to more efficient data processing, data sharing and simpler data process automation, and will certainly affect the area of infrastructure service provision. Inspired by biology, the concept of the ecosystem provides a framework that helps to comprehend the intertwined and highly interdependent nature of increasingly complex data infrastructures involved in cross-domain research and Open Science. To maximize the impact of the DO architecture on the data environment, we need to overcome the fragmentation in data ecosystem concepts and work with only a single data ecosystem for the emerging global data infrastructure. Although different kinds of data ecosystems have already been described, such as ecosystems for big data, climate data, biomedical data, open data and personal data, we propose to apply the concept of a single data ecosystem for the whole data world, covering all the different data types used in research, with the DO architecture as its main structural component. The advantage is that such a concept of a single, global data ecosystem may allow unique access to novel ideas and analysis methods that may be useful to further the development and evolution of European research infrastructures and e-infrastructures and improve their operations in the global context.
Because research questions have become bigger and more complex, the need for cross-domain data sharing has to be addressed. Seeing each infrastructure as a separate ecosystem is too restrictive; one should use a single, extended data ecosystem that covers open and protected data, big data and micro data, as well as data from all different research domains. Ecosystems are not fixed structures but change and evolve, going through cycles of growth and reorganization. In this way, they can represent a framework to generate ideas to be applied to data ecosystems and their infrastructure components to study their functioning, further development and sustainability. To support the analysis of the data ecosystem we adapt concepts from ecological analysis, such as using data generators, data consumers, data flows, data re-users, and feedback loops as basic components. In summary, the consistent exploitation of the concept of a single data ecosystem may produce novel solutions for the described limitations in existing data sharing processes, because the interoperation of research infrastructures and e-infrastructures is a prerequisite for streamlining cross-disciplinary processes and global cooperation, which will become easier and more transparent through the proper application of the DO architecture as part of a global data infrastructure.
    
    Speaker: Dr Wolfgang Kuchinke (Heinrich-Heine University Duesseldorf)
  - 16:30
    
    RDM: A library perspective of versioning, curating and archiving research data from diverse domains¶ 15m
    
    Libraries are the vanguards for RDM and digital curation. However, beyond archival preservation, versioning and digital curation of research data add value to knowledge assets insofar as these can be extended across domains to create services that are useful to the research community. At Bielefeld University, the DFG-funded Conquaire project, a collaboration between CITEC and the Bielefeld University library, has created a generic RDM framework that ensures research data quality using continuous integration (CI) principles in order to ease the process of publishing research data to PUB, our institutional repository, which is based on the free and open-source LibreCat software. The Conquaire RDM system (RDMS) automates the analytical reproducibility process by unobtrusively monitoring researchers' data stored within a GitLab repository and validating the data quality of CSV files. Researchers receive automated quality assessments via email whenever they upload research data into their repository, which is automatically monitored using the built-in GitLab CI. Furthermore, the continuous integration principle standardizes technology (platforms and tools), which enhances cross-domain data interoperability in an RDM service. A curated digital dataset that validates against standardized formats will mitigate digital obsolescence, thereby making the research data accessible, reusable, and archivable for users indefinitely. Among research artifacts, the software source code used for the analysis is an integral part of a research project and can be considered a form of data: research publications without the code used to process and visualise the research data cannot be analytically reproduced. The source code also needs to be properly versioned, curated and archived in order to fulfill the FAIR (Findable, Accessible, Interoperable and Reusable) data principles.
Currently, in addition to the data quality framework, we are in the process of implementing a generic CI system that automates and aids the data validation system based on the technical stack used by the partner groups. In order to understand the nine research partner groups' software toolkits and data analysis process, we undertook independent reproducibility experiments (ReX) that entailed analytically reproducing one result from a paper already published by these groups. Our research experience during the ongoing collaboration with the case study partners has highlighted the technical challenges that diverse research projects throw up during the process of creating a generic data quality framework. These range from finding common document formats to analyse tools used among the various research groups partnering in the Conquaire project. Finding a balance between this diversity (both technical and data-wise) without disturbing the existing workflow of each research group has thrown up cross-domain challenges that need to be addressed.
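
To make the idea of an automated CSV quality check in a CI job more concrete, here is a minimal Python sketch. It is not the Conquaire implementation: the function name `check_csv` and the three rules it applies (non-empty header names, no duplicate header names, same number of fields in every row) are assumptions chosen purely for illustration.

```python
import csv
import io


def check_csv(text):
    """Return a list of human-readable problems found in CSV text.

    The rules below are illustrative assumptions, not the project's rules.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]
    if any(not name.strip() for name in header):
        problems.append("header contains an empty column name")
    if len(set(header)) != len(header):
        problems.append("header contains duplicate column names")

    # Row numbers count parsed CSV rows, with the header as row 1.
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {row_no}: expected {len(header)} fields, found {len(row)}"
            )
    return problems


report = check_csv("subject,score\ns01,0.91\ns02\n")
```

In a CI setting, a wrapper script could run such a check on every committed CSV file and exit with a non-zero status when the returned list is non-empty, which is what would trigger a notification to the researcher.
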
    
    Speaker: Mrs Vidya Ayer (Bielefeld University)
    
    Slides
  - 16:45
    
    Addressing sustainable long-term preservation wall for scientific data: the European Trusted Digital Repository (ETDR) service¶ 15m
    
    Digital data preservation should be a key feature of all research projects. Some research data are unique and cannot be replaced if lost or destroyed; scientific results can be considered trustworthy only if they refer to verifiable data. In addition to a bit-stream preservation service that ensures data integrity technically, Trusted Digital Repositories (TDRs) provide a quality of service that preserves information over a long period of time. This requires extra and certified capabilities in the areas of curation, metadata, file formats, long-term preservation, diverse data access levels, data quality assessment based on the FAIR principles, etc. During EUDAT and EUDAT2020, TDRs already used to preserve research data were assessed, and a generic, innovative European service with high added value was developed: the European Trusted Digital Repository (ETDR). This constellation of TDRs and other service providers can offer scientific communities important guarantees that enable data reuse. The three main guarantees given by the ETDR concern:
    - data integrity (i.e. bit-stream preservation),
    - hardware and software readability (i.e. file formats, emulation…),
    - and understandability of the information over time (i.e. metadata, information classification …).

    The ETDR front-office will offer access to EUDAT, EGI and OpenAIRE distributed data storage services. Data that needs to be preserved for the long term will be automatically ingested into the distributed ETDR back-office infrastructure. In addition, front-ends can be featured in discipline-specific research infrastructures or researcher deposit platforms that do not yet have access to a certified TDR service. The ETDR also provides customer support on data management, including data management planning and requirements for long-term preservation.
Within EUDAT2020, Herbadrop and ICEDIG, the ETDR has already been used by about ten national institutions that belong to DiSSCo, the e-infrastructure for natural sciences. During the EOSC-Pilot and EUDAT2020 projects, a three-partner association (CERN, CINECA, CINES) demonstrated the genericity, scalability and accessibility of the ETDR architecture. The coming years should see both a growing number of research communities served and an expansion of the ETDR network.
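
The data-integrity guarantee (bit-stream preservation) is commonly implemented with fixity checks: a digest is recorded at ingest and recomputed later to detect silent corruption. The Python sketch below illustrates that general technique only; it is an assumption that a repository would work this way, and the names `sha256_of` and `verify`, the manifest layout and the in-memory example store are hypothetical rather than taken from the ETDR.

```python
import hashlib
import io


def sha256_of(stream, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a binary stream, reading in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify(manifest, open_object):
    """Return the sorted names of objects whose digest no longer matches.

    manifest: dict mapping object name -> digest recorded at ingest.
    open_object: callable returning a binary stream for an object name.
    """
    return sorted(
        name
        for name, expected in manifest.items()
        if sha256_of(open_object(name)) != expected
    )


# Hypothetical in-memory "archive" standing in for real storage.
store = {"scan-001.tif": b"original bytes", "scan-002.tif": b"more bytes"}
manifest = {name: sha256_of(io.BytesIO(data)) for name, data in store.items()}

store["scan-002.tif"] = b"m0re bytes"  # simulate silent corruption
damaged = verify(manifest, lambda name: io.BytesIO(store[name]))
```

Reading in chunks keeps memory use constant for large files; the other two guarantees listed in the abstract (readability and understandability) depend on curation and metadata and cannot be checked by a digest.
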
    
    Speaker: Marion Massol (CINES)
    
    Slides
  - 17:00
    
    Documenting Heritage Science: A CIDOC CRM-based System for Modelling Scientific Data¶ 15m
    
    The paper presents a complete system for documenting scientific data produced in heritage sciences, based on a data model intended to generate valuable information and suitable metadata for them to be stored, accessed, queried, shared and reused in various contexts and research scenarios. The system is built around the concept of a general meta-model, flexible enough to provide descriptions, in a formal language, of all the entities and issues encountered documenting heritage sciences results. It is inspired by CIDOC CRM principles for data modelling and maintains a full compatibility with CIDOC CRM and its extensions, especially CRMsci, CRMdig and CRMpe. The use of a wide set of thesauri and vocabularies for the standard and unambiguous description of all the entities will guarantee internal coherence at data level. Thus, our system is capable of identifying and modelling physical and digital objects, events, activities and actors, i.e. people and teams involved in the various research events and of providing straightforward connection and interoperability with the general documentation of cultural heritage, tightly linking scientific analyses to their heritage context. The metadata model supported by our system will also stand at the very foundations of DIGILAB, a digital infrastructure developed by the E-RIHS European initiative to facilitate virtual access to tools, services and data for heritage research. DIGILAB infrastructure will rely on a network of federated repositories and will enable finding and accessing data through an advanced semantic search system operating on a registry containing metadata describing individual datasets. Thus, our data model is designed to make DIGILAB compliant with the EU policies and strategies concerning scientific data, including the FAIR data principles, the Open Research Data policy, and the EOSC strategy. 
It will guarantee data interoperability and will foster re-use of information and services to process the data according to specific research questions and use requirements. First tests have been carried out on datasets resulting from various scientific analyses performed by different research institutions, including the Italian National Research Council (CNR), the Istituto Superiore per la Conservazione ed il Restauro (ISCR), the Opificio delle Pietre Dure (OPD) and the National Institute of Nuclear Physics (INFN). A specific subset of information derived from the activity of the INFN-CHNet network (Cultural Heritage Network of the Italian National Institute for Nuclear Physics) is reported in this paper. The heterogeneity of the network's analytical techniques examined and encoded has provided a good test bench for the developed model, proving its effectiveness and delineating a solid path for its future developments. The encoding of scientific information by means of our system demonstrates the validity of this approach in different cases and offers an overview of the whole model and of how information encoded by means of its classes and properties will benefit the implementation of the DIGILAB infrastructure.
    
    Speaker: Dr Lisa Castelli (INFN)
    
    Slides
  - 17:15
    
    Building interoperable systems for SeaDataNet community¶ 15m
    
    The SeaDataNet project (https://www.seadatanet.org/) offers a robust and state-of-the-art Pan-European infrastructure to harmonise metadata and data from marine data centers in Europe, and offers the technology to make these data accessible. In the existing SeaDataNet common data index service (http://seadatanet.maris2.nl/v_cdi_v3/search.asp), data is residing at the data centers and is offered on demand across the user requests. However, the process is quite slow and also it was not easy to evaluate the quality of data, as data sets are directly uploaded by the data centers. To overcome this problem, the SeaDataNet community has partnered with the EUDAT CDI in the SeaDataCloud project to move its data to the EUDAT cloud storage and offer data directly from the cloud. Moreover, the community wants to perform quality checks on the data residing in the cloud before making it available for users. To implement the new upgraded system the existing SeaDataNet systems and EUDAT services have to be interoperable. This abstract discusses the solutions chosen for making different existing systems interoperable and the new infrastructure developed for the SeaDataNet common data index service. REST APIs are chosen to enable interaction between EUDAT services and community’s existing systems. Defining REST interfaces facilitated the understanding of different systems and helped in realizing the seamless communication between different systems. B2STAGE REST APIs (https://www.eudat.eu/b2stage) are used for all the interactions between the systems, such as uploading data to EUDAT cloud storage, downloading data from EUDAT cloud storage, performing transformations on the data in the cloud, etc. Moreover, in order to perform additional actions on the data in the cloud, such as checking the quality of data, performing transformations on data and for analyzing the data sets the existing EUDAT services are extended with new components. 
    The EUDAT B2HOST service (https://www.eudat.eu/services/userdoc/b2host) is extended to provide a container cluster that supports automatic data management in the cloud. The container cluster is managed through the EUDAT B2STAGE service, which allows systems to run different tools on the data automatically by interacting with its API. The technologies used to realise data management in the cloud are Docker containers, the Rancher container platform, RabbitMQ, and Elasticsearch, Logstash and Kibana (the ELK stack). The ability to offer automated data management in the cloud could be valuable for other research communities as well. Moreover, the technical solutions chosen for developing the SeaDataNet common data index system could serve as reference solutions for building interoperable systems across different communities.
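
    As a rough illustration of the REST-based interaction described in this abstract, the sketch below prepares (but does not send) HTTP requests for uploading and downloading a file. The base URL, the path layout, the bearer token and the helper names `build_upload_request` and `build_download_request` are assumptions made for this example; they are not taken from the abstract or from the B2STAGE documentation.

```python
import urllib.request
from urllib.parse import quote

# Hypothetical endpoint: a real deployment would publish its own base URL.
BASE_URL = "https://b2stage.example.org/api/registered"


def _url_for(remote_path: str) -> str:
    """Build the resource URL for a path in the (assumed) cloud storage."""
    return f"{BASE_URL}/{quote(remote_path.lstrip('/'))}"


def build_upload_request(remote_path: str, payload: bytes, token: str) -> urllib.request.Request:
    """Prepare a PUT request that would store `payload` at `remote_path`."""
    return urllib.request.Request(
        _url_for(remote_path),
        data=payload,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/octet-stream",
        },
    )


def build_download_request(remote_path: str, token: str) -> urllib.request.Request:
    """Prepare a GET request that would fetch the file at `remote_path`."""
    return urllib.request.Request(
        _url_for(remote_path),
        method="GET",
        headers={"Authorization": f"Bearer {token}"},
    )
```

    A prepared request could then be passed to `urllib.request.urlopen`. Keeping request construction separate from sending makes the interface between the community systems and the storage service easy to inspect and test.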
    
    Speaker: Chris Ariyo (CSC)
    
    Slides
- 16:15 → 17:45
  Have a CoP of T in our Cafe¶ Auditorium B203
  
  Convener: Iryna Kuchma (EIFL)
  - 16:15
    Have a CoP of T in our café!¶ 1h 30m
    
    The session is organised by a group of people who coordinate training programmes of research and e-infrastructures and who took the initiative to start a Community of Practice (CoP) for training coordinators and training managers. Through the CoP we aim to map out the training activities of various pan-European, EOSC-related initiatives and to strengthen their training capacity through improved alignment, sharing of experiences and good practices, and cross-infrastructure training activities. ARDC, CESSDA, DARIAH, EGI, ELIXIR, EOSCpilot, EOSC-hub, EUDAT, FOSTER, FREYA, GÉANT, OpenAIRE and PRACE have already expressed interest in participating in the CoP. The workshop is a follow-up to a training workshop organised by the EUDAT training team in January in Porto and a presentation at the RDA in March. The Café is an ideal format to discuss offline some of the questions living in the group, share experiences of what has and has not worked, and share ideas and strategies to help multi-domain knowledge transfer across borders. Over the coming years, cross-domain data-driven science will bring new challenges. The unprecedented access to data and the computational ability to process it will produce new, accelerated breakthroughs. During this session we will focus on exciting new developments, address urgent gaps and, in the end, try to highlight common strategies that can be adopted to improve knowledge transfer within the group.

    Agenda:
    1. Introduction of the Community of Practice and to the session - Iryna Kuchma, OpenAIRE
    2. Open badges: What are they and how are they used? - Giuseppe La Rocca, EGI Foundation
    3. Skills and competences frameworks, including the EOSCpilot consultation on its Skills and Capability Framework - Angus Whyte, DCC
    4. How to make training materials discoverable? - Ellen Leenarts, DANS
    5. Making an impact that matters - Irina Mikhailava, GÉANT
    6. Organising Summer Schools and reflecting on the approach

    What’s in it for you? You should join this session if you want to meet your fellow training coordinators and become part of the Community. Let’s have a “CoP of T” together!
    
    Speakers: Angus Whyte (UE), Mrs Ellen Leenarts (DANS), Dr Giuseppe La Rocca (EGI.eu), Iryna Kuchma (EIFL)
    
    Slides
    
    1. Introduction to the session.pptx
    
    2. OpenBadges.pptx
    
    3. EOSCpilotSkills.pptx
    
    4. Improve_discovery_training_materials_DI4R_CoP.pptx
    
    5. Making impact.pptx
- 16:15 → 17:45
  Open Science: Skills and Credits¶ Auditorium JJLaginha
  
  Convener: Paolo Manghi (Istituto di Scienza e Tecnologie dell'Informazione - CNR)
  - 16:15
    
    The Open Science training hub FOSTER Plus - new resources and courses¶ 15m
    
    The EU-funded project FOSTER Plus (2017-2019) (www.fosteropenscience.eu) offers a range of training opportunities to help researchers move beyond simply being aware of Open Science (OS) approaches to being able to apply them in their daily workflows. The existing FOSTER portal is becoming an OS training hub, where users can find training materials, advanced-level and discipline-specific courses and resources that build capacity for the practical implementation of OS and promote a change in culture. The project developed a Toolkit consisting of ten new OS online courses (https://www.fosteropenscience.eu/toolkit) addressing key OS topics to enable researchers to put OS into practice. The courses do not provide comprehensive coverage of all possible issues that may fall under a given topic but rather provide focused, practical and, where relevant, discipline-specific examples to try to answer some of the burning questions researchers may have about practicing OS. Courses include interactive content to ensure that the training is engaging and that capability can be assessed for the issue of a badge upon completion. The courses developed are: What is OS?; Best practices; Ethics & data protection; Open access publishing; Open peer review; Managing & sharing research data; Open source software & workflows; OS & innovation; Sharing preprints; and Licensing. In addition to these stand-alone courses, there are learning pathways (www.fosteropenscience.eu/badges) through the content to help researchers hone their skills in specific areas, such as the reproducible research practitioner, the responsible data sharer, the open peer reviewer, the open access author and the open innovator. Furthermore, the project provides a learning management system to facilitate moderated OS courses.
    We are reusing and reshaping training content deposited in the FOSTER portal during the first phase of the project (2014-2016) and working with our discipline-specific partners representing the arts and humanities, social sciences, and life sciences to provide relevant examples. All content is openly licensed and easy to download. Apart from creating new courses, FOSTER follows a train-the-trainer approach to multiply training forces. The project provides training, infrastructure and materials to support people seeking to organise OS training in their own institutions. We initiated an OS trainer bootcamp and the writing of an OS training handbook to equip future trainers with methods, instructions, exemplary training outlines and inspiration for their own OS training. Additionally, the project gives recommendations for OS training and provides the infrastructure to conduct moderated courses and to upload or download materials for re-use. Users can also promote their training events in a calendar and maintain their trainer profiles. These profiles are discoverable via a trainers directory and enable users looking for a speaker or advice to directly contact OS trainers from their region or with specific expertise. The FOSTER portal is a hub for people who want to learn about OS as well as for people delivering OS training. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 741839.
    
    Speakers: Helene Brinken (Georg-August University Göttingen), Mrs Maria Antónia Correia (University of Minho)
  - 16:30
    
    Skills for dealing with research software as an element of open science¶ 15m
    
    Software plays a crucial role in the research lifecycle. Moreover, software is, alongside text and data, an essential element of open science. In this sense, the FAIR (findable, accessible, interoperable, and reusable) principles apply not only to data but also to research software. In conjunction with text and data, making research software FAIR contributes to making research output comprehensible, verifiable, reproducible, and reusable. The process of making research software FAIR involves many aspects, ranging from software development issues and documentation to legal aspects such as licensing. The skills of all those involved play a major role here. Developing, expanding and contextualising skills for dealing with research software is an important contribution to increasing awareness of the importance of software in the research process and to establishing research software as an element of open science. **Developing** IT skills in higher education is important across disciplines. For qualification works (Bachelor, Master, PhD thesis) where research software development plays a role, expertise from the scientific disciplines and from computer science needs to go hand in hand. Moreover, cooperation between higher education bodies, such as between universities and research institutions, makes education paths more interoperable. Especially for researchers who develop software as part of their research activity but have no IT background, introductory courses on software development and on dealing with research software throughout the research life cycle are an important starting point. Formats include, for instance, seminars, workshops, PhD schools and online courses.
    **Expanding** skills concerns both expert scientists, who deepen their knowledge about dealing with research software, and IT specialists, who master state-of-the-art techniques and tools and gain a better understanding of discipline-specific knowledge. Formats include workshops, hacky hours, hackathons and software carpentry. **Contextualising** the skills by means of a technical and human infrastructure is vital. By providing a technical infrastructure to (collaboratively) develop, test, review, publish and archive research software, research institutions and universities can create an environment that encourages researchers to apply FAIR principles to research software. By fostering professional networks and communities of practice, the formal acquisition of skills is complemented by a practice-based exchange of knowledge and experiences. Finally, providing career opportunities that take into account the multifaceted skills needed for dealing with research software is a means to increase incentives and rewards for efforts put into dealing with research software. In this presentation we want to discuss approaches that can foster open science with a focus on skills for dealing with research software. We want to provide general arguments and give specific examples from initiatives in the Helmholtz Association in Germany.
    
    Speaker: Kaja Scheliga (Helmholtz Association)
    
    Slides
  - 16:45
    
    Data2Paper: Giving Researchers Credit for their Data¶ 15m
    
    Data papers cover methodological detail that is not otherwise captured and published in traditional journal articles and/or dataset metadata. As such, they can improve the findability and reusability of the underlying dataset, but they also address some deeper underlying concerns. A number of disciplines are experiencing a “crisis of reproducibility” as a result of the inadequacy of information provided by traditional papers and data publication alone, leading to increased retractions and reduced credibility. At the same time, the lack of an avenue for publishing negative results from failed methodological approaches leads to unnecessary repeated efforts at a time when funders are pressing for increased efficiency in the use of experimental resources. Arising from the Jisc Data Spring Initiative,[1] a team of stakeholders (publishers, data repository managers, coders) has developed a simple ‘one-click’ process for submitting data papers related to material in a DataCite/ORCID compliant repository. DataCite and ORCID information is transferred from a data repository to the cloud-based Data2Paper app, which is based on the Fedora/Samvera platform. In the app, the text of the data paper is combined with existing metadata drawn from DataCite and ORCID to generate a package suitable for automated transfer into a journal submission platform without further user interaction. By reusing metadata that has already been entered and curated, the process is both simplified and made less error-prone. Currently, a small number of repositories have developed specific connections to a small number of journals, but the cost of maintaining those links does not scale in the longer term. Data2Paper aims to provide a single connection point for a partner journal or repository and to manage the process of metadata and paper submission.
    In addition, Data2Paper supports submission to preprint archives either in conjunction with a (possibly later) journal submission or as a publication route in its own right. Data2Paper represents a logical extension of the RDM workflow in EOSC services that currently ends with the deposit of data in a suitable repository and the generation of a DataCite DOI with accompanying metadata. It also integrates with the OpenAIRE SCHOLIX hub to detect completion of the publication process, or to encourage authors to chase publishers if necessary! The presentation will discuss the history of the project, including the results of an initial feasibility study, along with a demonstration of the current pilot implementation with targeted groups. We will outline the current work being done to transition to an operating service with a sustainable business model and consider how the service might develop in the future in conjunction with various other activities in the area, such as the Research Graph, RDA areas of activity (Data Journals Publishing Policy, Credit and Attribution, and Exposing Data Management Plans), issues of impact, reproducibility, FAIR Data, persistent identifiers and new metrics by various national and international bodies.
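
    The metadata-reuse step described in this abstract can be pictured with a minimal sketch. It assumes the repository has already supplied a dataset DOI, title and creator ORCID iDs; the `Creator` and `DatasetRecord` classes, the `build_submission_package` helper and all field names are hypothetical and do not reflect the actual Data2Paper implementation.

```python
from dataclasses import dataclass


@dataclass
class Creator:
    name: str
    orcid: str  # ORCID iD as already recorded by the repository


@dataclass
class DatasetRecord:
    doi: str        # DataCite DOI of the deposited dataset
    title: str
    creators: list  # list of Creator objects


def build_submission_package(paper_text: str, dataset: DatasetRecord, journal: str) -> dict:
    """Combine newly written paper text with metadata that was already curated.

    Nothing is re-typed by the author: names, ORCID iDs and the dataset DOI
    are copied from the repository record.
    """
    return {
        "journal": journal,
        "manuscript": paper_text,
        "authors": [{"name": c.name, "orcid": c.orcid} for c in dataset.creators],
        "related_dataset": {"doi": dataset.doi, "title": dataset.title},
    }
```

    Because the author only writes the paper text, any correction made to the repository record would propagate to later submissions without manual re-entry, which is the error-reduction argument made above.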
    
    Speaker: Neil Jefferies (Jemura Ltd)
  - 17:00
    
    Ellip: a collaborative workplace for EO Open Science¶ 15m
    
    Earth observations from satellites produce vast amounts of data. In particular, the new Copernicus Sentinel missions are playing an increasingly important role as a reliable, high-quality and free open data source for scientific, public sector and commercial activities. Recent developments in Information and Communication Technology (ICT) facilitate the handling of such large volumes of data, and European initiatives (e.g. EOSC, DIAS) are flourishing to deliver on it. In this context, Terradue is pursuing an approach that resolutely promotes an Open Cloud model of operations, along with Cloud Services (the new ‘Ellip’ solutions) for cross-domain cooperation and applied innovation, supporting users with a collaborative work environment on the Platform. With solutions to transfer EO processing algorithms to Cloud infrastructures, the Terradue Cloud Platform is optimising the connectivity of data centres with integrated discovery and processing methods. This is, for example, the case with NextGEOSS, the European Data Hub and Platform, an EC contribution in support of the Group on Earth Observations initiatives and communities, and with the Geohazards Exploitation Platform, an R&D activity funded by ESA. By implementing a Hybrid Cloud model and using Cloud APIs based on international standards, the Terradue Platform meets its users' growing needs by leveraging the capabilities of several Public Cloud providers. Operated according to an “Open Cloud” strategy, it involves partnerships complying with a set of best practices and guidelines:
    - Open APIs. Embrace Cloud bursting APIs that can be easily plugged into the Platform’s codebase, so as to expand the Platform offering with Providers that bring complementary strategic advantages for different user communities.
    - Developer community. Support and nurture Cloud communities that collaborate on evolving open source technologies, including at the level of the Platform engineering team when it comes to delivering modular extensions.
    - Self-service provisioning and management of resources. The Platform’s end-users are able to self-provision their required ICT resources and to work autonomously.
    - Users' rights to move data as needed. By supporting distributed instances of its EO Data management layer, the Platform delivers the required level of data locality to ensure high-performance processing at optimised cost, and guarantees that value-added chains can be built on top of intermediate results.
    - Federated Cloud operations. The Platform’s collaborative environment and business processes help users to seamlessly deploy apps and data from a shared marketplace and across multiple cloud environments.

    Moreover, Terradue has learned from past activities (2012-2017) how to manage user communities in many scientific domains and how to support their collaborative work in accessing Open Data, using Open Source software and contributing research products as part of the Open Science principles. Ellip is the new Terradue Cloud Platform, a development stemming from this experience, which incorporates open notebook science (based on the Jupyter Notebook open-source application) for the design, integration, testing, deployment and monitoring of scalable EO data processing chains.
    
    Speakers: Dr Cesare Rossi (Terradue), Fabrice Brito (Terradue), Herve Caumont (Terradue)
    
    Slides
- 16:15 → 17:45
  Rules of Participation for EOSC¶
  The European Open Science Cloud (EOSC) brings together research and e-Infrastructure providers to provide a world-class infrastructure environment for excellent science. The aim of EOSC is to support research communities and scientists to discover, request, access and use services and resources they need to pursue their research in an open framework.
  
  The notion of Rules of Participation has been proposed to specify the conditions under which any service provider may participate in EOSC. According to the SWD EOSC Implementation Roadmap published by the European Commission in March 2018, these rules are expected “to set out in a transparent and inclusive manner the rights, obligations and accountability of the different stakeholders taking part in the initiative (e.g. data producers, service providers, data and service users)”. However, it is also foreseen that “these rules will apply differently to EOSC participants, depending on their maturity and role (service providers vs. users, scientists or innovators), location (EU vs. global research partners), and would need to respect the specificities of different scientific disciplines”.
  
  Several initiatives have started to address this important topic. The EOSCpilot project has delivered a minimal set of rules following a consultation process with e-Infrastructure and research infrastructure stakeholders. The EOSC-hub project is approaching the topic from a service provisioning perspective, in order to set up common principles for federating service providers as part of the Hub. The 2nd High Level Expert Group on EOSC has just launched an open consultation and is gathering further input from stakeholders with a view to propose an initial set of rules which could be taken up by EOSC.
  
  In this World Cafe session, we will review the current state of the discussion regarding these EOSC Rules of Participation, by presenting the work from the EOSCpilot project, the EOSC-hub project and the 2nd High Level Expert Group on EOSC on this important topic. We will identify commonalities and differences between the three initiatives and discuss the next steps.
  
  Via a panel discussion, valuable feedback will be collected from the audience and from the presenters on the current status and direction taken in designing the rules.
  
  Target audience
  - Community representatives interested in joining or using the EOSC
  - Service providers interested in offering services via EOSC
  - Representatives from funding agencies interested in the rules of participation of EOSC
  Convener: Mark Sanden (SURFsara BV)
  - 16:15
    
    The experience from the EOSC HLEG and the EOSCpilot open consultation¶ 20m
    
    Speaker: Dr Isabel Campos (CSIC)
    
    Slides
  - 16:35
    
    Minimal set of Rules of Participation for Service Providers and Users in EOSC¶ 20m
    
    Speaker: Pascal Kahlem (ELIXIR)
    
    Slides
  - 16:55
    
    Rules of Participation: from theory to practice¶ 20m
    
    Speaker: Mark Sanden (SURFsara BV)
    
    Slides
  - 17:15
    
    eInfraCentral Service Description Template¶ 10m
    
    Speaker: Jorge Sanchez (JNP)
    
    Slides
  - 17:25
    
    Panel discussion¶ 20m
    
    Slides
- 17:45 → 18:45
  
  Zapping Session¶ Main Auditorium
  
  This session features one-minute lightning talks from poster presenters. All the posters listed at https://indico.egi.eu/indico/event/3973/page/1, together with their presenters, will be pitched.
  After the zapping session, a competition for the best poster will be launched on Twitter. The poster with the highest number of votes will receive an award.
  
  Convener: Ms Sara Garavelli (Trust-IT Services ltd)
- 19:00 → 20:30
  
  Networking Cocktail 1h 30m Lunch place (Refeitorio)
  
Wednesday, 10 October ¶
- 08:45 → 11:00
  Plenary¶ Main Auditorium
  
  Convener: Jorge Gomes (LIP)
  - 08:45
    
    Pl@ntNet: towards the recognition of the world's flora¶ 30m
    
    Automated identification of plants and animals has improved considerably in the last few years, in particular thanks to recent advances in deep learning. In 2017, a challenge on 10,000 plant species (PlantCLEF) resulted in impressive performance, with accuracy values reaching 90%. One of the most popular plant identification applications, Pl@ntNet, now works on 18K plant species. It has millions of users all over the world and already has a strong societal impact in several domains, including education, landscape management and agriculture. The big challenge now is to train such systems at the scale of the world’s biodiversity. Therefore, we built a training set of about 12M images illustrating 275K species. Training a convolutional neural network on such a large dataset can take up to several months on a single node equipped with four recent GPUs. Moreover, to select the best-performing architecture and optimise the hyper-parameters, it is often necessary to train several such networks. Overall, this becomes a highly compute-intensive task that has to be distributed on large HPC infrastructures. To address this problem, we used the deep learning framework Intel CAFFE coupled with the Intel MLSL library. The experiments were carried out on two French national supercomputers, with access provided by GENCI. The first experiment was carried out on Occigen@CINES, a 3.5 Pflop/s Tier-1 cluster based on Broadwell-14cores@2.6GHz nodes. The second used the Tier-0 «Joliot-Curie»@TGCC, a BULL-Sequana-X1000 cluster integrating 1656 Intel Skylake8168-24cores@2.7GHz nodes. We will report our experience using these two platforms.
    
    Speaker: Alexis Joly (Pl@ntNet)
    
    Slides
  - 09:15
    
    INFRAEOSC-02-2019: Prototyping new innovative services for EOSC¶ 30m
    
    The forthcoming call for proposals INFRAEOSC-02-2019 aims at designing and prototyping novel digital services that cover diverse aspects of the research data cycle and will be accessible through the EOSC portal. The services will address current gaps in the offering, foster interdisciplinary research and serve the evolving needs not only of researchers but also of industry and the public sector. Consortia should consider innovative models of collaboration and incentive mechanisms for a user-oriented open science approach. The participation of SMEs in the consortia is encouraged. The call will open on 16 October 2018, with a deadline of 29 January 2019 and a total budget of 28.5 million euro. More information: https://ec.europa.eu/research/participants/portal/desktop/en/opportunities/h2020/topics/infraeosc-02-2019.html
    
    Speakers: Dr Augusto Burgueño Arjona (Head of Unit "eInfrastructure & Science Cloud", Directorate General for Communications Networks, Content and Technology, European Commission), Ms Georgia Tzenou (Programme Officer, Unit "eInfrastructure & Science Cloud", Directorate General for Communications Networks, Content and Technology, European Commission)
    
    Slides
  - 09:45
    
    Scientific Panel "E-infrastructure: what is it and how does it help you?"¶ 1h 15m
    
    Panellists: Alexandre Bovin, Sorina Camarasu-Pop, Wolfgang zu Castell, Matthew Dovey, Andy Goetz, Erik Huizer, Kristel Michielsen
    
    Speaker: Prof. Sinead Ryan (Trinity College Dublin)
- 11:00 → 11:30
  
  Coffee 30m
- 11:30 → 13:00
  Computing Services: Part I¶ Auditorium B104
  
  Convener: Dr Jakob Tendel (DFN)
  - 11:30
    
    Adaptive, Trustworthy, Manageable, Orchestrated, Secure, Privacy-assuring, Hybrid Ecosystem for REsilient Cloud Computing - ATMOSPHERE¶ 15m
    
    Trust, which is built on guarantees, previous successful experiences, transparency and accountability, is considered a key challenge for applications dealing with data on cloud services. It is neither absolute nor constant. A service will have some degree of trust, which could be sufficient for its specific usage within a particular context. Trust is hard to build but easy to lose. It requires a priori certification and continuous verification and assurance. Evaluating trust involves many metrics (e.g., scalability, availability, performance, robustness, security, privacy assurance, dependability, etc.), but there is currently a lack of technologies and frameworks to build trust in cloud and Big Data applications, both from the self-evaluation and the dynamic adaptation perspectives. To cover this gap, we present the Adaptive, Trustworthy, Manageable, Orchestrated, Secure, Privacy-assuring, Hybrid Ecosystem for REsilient Cloud Computing (2017-2019) (hereinafter “ATMOSPHERE”), a 24-month Research and Innovation Action funded by the European Commission under the H2020 Programme and by the Secretary of Politics of Informatics (SEPIN) of the Brazilian Ministry of Science, Technology, Innovation and Communication (MCTIC). ATMOSPHERE aims at designing and developing a framework and a platform to implement trustworthy cloud services on a federated intercontinental hybrid resource pool. To achieve cloud computing trust services, ATMOSPHERE focuses on providing four components:
    - A dynamically reconfigurable federated infrastructure providing isolation, high availability, Quality of Service and flexibility for hybrid resources, including virtual machines and containers.
    - Trustworthy Distributed Data Management services that maximise privacy when accessing and processing sensitive data.
    - Trustworthy Distributed Data Processing services to deploy adaptive applications for Data Analytics, providing high-level trustworthiness metrics for computing fairness and explainability properties.
    - A trustworthiness evaluation and monitoring framework, to compute trustworthiness measures from the metrics provided by the different layers, and able to trigger adaptation measures when needed.

    The different trustworthiness properties identified need to be considered at different layers:
    - The federated cloud platform will provide isolation, stability and Quality of Service guarantees. The cloud platform will enable the dynamic reconfiguration of resource allocation to applications running on federated networks on an intercontinental shared pool.
    - The Trustworthy Distributed Data Management services will provide privacy risk analysis of the processing of sensitive data by proprietary algorithms on enclaves, guaranteeing that neither the application developer sees the data nor the data owner sees the processing code.
    - The Trustworthy Distributed Data Processing services will provide a Virtual Research Environment to compute the fairness (i.e. the bias towards ethically affected data such as sex, race, education, etc.) of the Data Analytics models, and the explainability of such models, maximising transparency.
    - The trustworthiness evaluation and monitoring framework will provide quantitative scores of the trustworthiness of an application running on the ATMOSPHERE platform.

    More information about ATMOSPHERE can be found on the website (https://www.atmosphere-eubrazil.eu/), on Twitter (@AtmosphereEUBR) and on LinkedIn (https://www.linkedin.com/in/atmosphere/).
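
    To make the idea of quantitative trustworthiness scores and adaptation triggers more concrete, here is a minimal sketch. It is not the ATMOSPHERE framework: the demographic-parity gap is one common fairness measure chosen here for illustration, and the weighted-average aggregation, the metric names, the weights and the 0.7 threshold are all assumptions.

```python
def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups.

    0.0 means every group receives positive predictions at the same rate.
    """
    totals, positives = {}, {}
    for pred, group in zip(predictions, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if pred else 0)
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


def trust_score(metrics, weights):
    """Weighted average of per-layer metrics, each normalised to [0, 1]."""
    total_weight = sum(weights[name] for name in metrics)
    return sum(metrics[name] * weights[name] for name in metrics) / total_weight


def needs_adaptation(score, threshold=0.7):
    """Signal that an adaptation measure should be triggered."""
    return score < threshold


if __name__ == "__main__":
    gap = demographic_parity_gap([1, 1, 0, 0, 1, 0], ["a", "a", "a", "b", "b", "b"])
    metrics = {"availability": 0.99, "privacy": 0.80, "fairness": 1.0 - gap}
    weights = {"availability": 1.0, "privacy": 2.0, "fairness": 2.0}
    score = trust_score(metrics, weights)
    print(f"fairness gap={gap:.2f} score={score:.2f} adapt={needs_adaptation(score)}")
```

    A monitoring loop could recompute the score whenever a layer reports new metrics and call the adaptation logic only when the score drops below the chosen threshold.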
    
    Speaker: Dr Ignacio Blanquer (UPVLC)
    
    Slides
  - 11:45
    
    HNSciCloud – Large-scale data processing and HPC for science with T-Systems hybrid cloud¶ 15m
    
    As the result of joint R&D work with 10 of Europe’s leading public research organisations, led by CERN and funded by the EU, T-Systems provides a hybrid cloud solution that enables science users to seamlessly extend their existing e-Infrastructures with one of the leading European public cloud services based on OpenStack: the Open Telekom Cloud. With this new approach, large-scale data-intensive and HPC-type scientific use cases can now be run more dynamically, reaping the benefits of the on-demand availability of commercial cloud services at attractive costs. Over the course of the last year, prototyping and piloting have confirmed that science users can get seamless, high-performance, secure and fully automated access to cloud resources over the GÉANT network, simplified by identity federation with eduGAIN and ELIXIR AAI. Users can work in a cloud-native way, keeping their existing toolsets or choosing other OpenStack- and S3-compatible tools from a large and fast-growing community, e.g. Ansible and Terraform, to run and manage applications. Users remain in full control and have access to all native functions of the cloud resources, through a web browser, APIs or the CLI. Cloud Management Platforms or Broker solutions are not needed, but may be added if further abstraction is required. The extensive service menu of the Open Telekom Cloud, based on OpenStack, opens up new functionality and performance for scientific use cases, with built-in support for, e.g., Docker, Kubernetes, MapReduce, Data Management, Data Warehouse and Data Ingestion services. The services can be combined with a wide range of compute and storage options. Compute can consist of any combination of containers and virtual, dedicated or bare-metal servers. Server types can be optimised for disk-intensive, large-memory, HPC or GPU applications.
    The extensive network and security functions enable users to maintain a private and secure environment, in which access to services can make full use of 10G networking. This is extended with the new Hybrid service, which provides the user with a dedicated, fully managed on-premise cloud as a complement to the public cloud service. The presentation will give an overview of the performance and scale of the use cases that have been successfully deployed. It will address how large-scale data can be processed at new performance levels with hundreds of containers, and how data can be processed intelligently by pre-fetching the data or leaving it remote at the existing infrastructure, making use of the state-of-the-art Onedata Data Management solution from Cyfronet. Furthermore, the newly developed high level of transparency and budget control will be demonstrated. Ten of Europe’s leading public research organisations, led by CERN, launched the Helix Nebula Science Cloud (HNSciCloud) Pre-Commercial Procurement to establish a European hybrid cloud platform that will support the high-performance, data-intensive scientific use cases of this “Buyers Group” and of the research sector at large.
    
    Speaker: Jurry de la Mar (T-Systems International GmbH)
  - 12:00
    
    Rootless containers with udocker¶ 15m
    
    Technologies based on Linux containers have become very popular among software developers and system administrators. The main reason behind this success is the flexibility and efficiency that containers offer when it comes to packing, deploying and running software. A containerized version of a given piece of software can be created including all its dependencies, so that it can be executed seamlessly regardless of the Linux distribution on the target hosts. Linux containers are also very well suited to the heterogeneous run-time environments that researchers face today when running complex applications across computing resources such as laptops, desktops, Linux interactive clusters, cloud providers, throughput computing and high performance computing infrastructures. udocker is a tool developed by LIP in the context of the INDIGO-DataCloud project that addresses the problem of executing Docker containers in user space, i.e. without installing additional system software, without requiring any administrative privileges and in a way that respects resource usage policies, accounting and process controls. udocker aims to empower users to execute applications encapsulated in Docker containers easily on any Linux system, including computing clusters, regardless of whether Docker or Linux namespaces are locally available. udocker provides a command line interface similar to Docker's and implements a subset of its commands aimed at searching, pulling, importing, loading and executing containers in a Docker-like manner, respecting much of the container metadata. The self-installation allows a user to transfer the udocker Python script, execute it and automatically pull the required tools and libraries, which are then stored in the user's directory. This allows udocker to be easily deployed and upgraded by users themselves without system administrator intervention. All required binary tools and libraries are provided with udocker and compilation is not required.
udocker is an integration tool that incorporates several execution methods, giving users the best possible options to execute their containers according to the target host capabilities. Several interchangeable execution modes are available that exploit different technologies and tools, which are integrated by udocker to enable execution on both older and newer Linux distributions. Currently four execution modes are available, which can be selected dynamically, namely:

* system call interception and pathname rewriting via PTRACE using a modified PRoot
* dynamic library call interception and pathname rewriting via LD_PRELOAD using a modified Fakechroot
* Linux unprivileged namespaces using runC
* Linux namespaces using Singularity, where available

Each approach has its own advantages and limitations, and therefore an integration tool offers flexibility and freedom of choice to adapt to the application and host characteristics. udocker has been successfully used to support the execution of high throughput computing, high performance computing (MPI) and GPGPU-based applications in many datacenters and infrastructures, including EGI. udocker has more than 300 stars on GitHub (https://github.com/indigo-dc/udocker). This presentation will provide further information about udocker and will highlight several use cases.
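
As a rough illustration of the workflow described in this abstract, the following Python sketch assembles the udocker command lines a user might run to pull an image, create a container, select an execution mode and run a command. The helper `udocker_commands`, the image name and the container name are illustrative assumptions. The mode codes (P1, F3, R1, S1) follow the udocker documentation as far as we are aware and should be checked against the installed release. The sketch only builds the commands and does not execute them.

```python
from typing import List

# Execution-mode codes as commonly documented for udocker; treat these as an
# assumption and verify them against the installed release.
EXEC_MODES = {
    "proot": "P1",        # system call interception via PTRACE (PRoot)
    "fakechroot": "F3",   # library call interception via LD_PRELOAD
    "runc": "R1",         # unprivileged namespaces via runC
    "singularity": "S1",  # Singularity, where available on the host
}


def udocker_commands(image: str, name: str, engine: str,
                     command: List[str]) -> List[List[str]]:
    """Return the udocker invocations for a pull/create/setup/run cycle."""
    if engine not in EXEC_MODES:
        raise ValueError(f"unknown engine: {engine}")
    return [
        ["udocker", "pull", image],
        ["udocker", "create", f"--name={name}", image],
        ["udocker", "setup", f"--execmode={EXEC_MODES[engine]}", name],
        ["udocker", "run", name] + command,
    ]
```

Each returned list could be passed to `subprocess.run` on a host where udocker is installed. Switching the engine changes only the `setup` step, which mirrors the idea that execution modes are interchangeable.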
    
    Speaker: Jorge Gomes (LIP)
  - 12:15
    
    Composition and Deployment of Complex Container-Based Application Architectures on Multi-Clouds¶ 15m
    
    Cloud computing has been established in recent years as the key technology to offer on-demand access to computing and storage resources. This has been exemplified by both public Cloud providers and on-premises Cloud Management Platforms such as OpenNebula and OpenStack, out of which federated large-scale Cloud infrastructures to support scientific computing have been established, such as the EGI Federated Cloud. Indeed, the European Open Science Cloud (EOSC) is foreseen to consist of a federating core which provides seamless access to a wide range of publicly funded services supplied at national, regional, and institutional levels for science and innovation. Recent years have witnessed the rise of the OASIS TOSCA (Topology and Orchestration Specification for Cloud Applications) standard, adopted by several European projects. This standard allows one to specify the components that underpin an application architecture using a high-level YAML-based language, which can be extended with additional components to satisfy the requirements of a wide variety of applications. However, recent advances in computing have revealed two major trends that can greatly benefit application delivery and computational performance: Linux containers and GPU computing. To this end, the Horizon 2020 DEEP-Hybrid DataCloud project is developing innovative services to facilitate the composition and deployment of complex cloud application architectures across multiple Clouds (both private and public ones). In this presentation we therefore describe the adoption of a visual composition approach for TOSCA templates (based on Alien4Cloud), in order to facilitate the widespread adoption of the standard, and its integration with the INDIGO-DataCloud Orchestrator, which is already part of the EOSC-hub service catalogue.
With this approach, the user can visually compose complex applications that involve, for example, the dynamic deployment of a container orchestration platform on an IaaS Cloud site that executes a highly available Docker-based application to facilitate application delivery. Users can also deploy an Apache Mesos cluster with GPU support that contains a deep learning application for the recognition of certain plant species, offered as a service to a community of users. This introduces unprecedented flexibility, from visual composition to automated application delivery, using a graphical interface that is already integrated with an Orchestrator layer that performs resource provisioning from multiple Clouds and application configuration. The integration of easy-to-use graphical interfaces builds a bridge between the users and the orchestration services. It also represents a step forward in fostering the adoption of innovative computing services that are hidden from users, who can then focus on the high-level description of the service requirements and definition instead of working on their technical implementation.
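
To make the idea of a TOSCA template more concrete, here is a minimal Python sketch that builds the kind of structure a visual composer might emit: one compute node and one software component hosted on it. It assumes the normative node types of the TOSCA Simple Profile (`tosca.nodes.Compute`, `tosca.nodes.SoftwareComponent`). The node names and the helpers `minimal_template` and `hosted_on` are hypothetical, and the custom node types actually used by the project are not shown.

```python
import json


def minimal_template(num_cpus: int, mem_size: str) -> dict:
    """Build a small TOSCA-style topology: one compute node hosting one component."""
    return {
        "tosca_definitions_version": "tosca_simple_yaml_1_0",
        "description": "Illustrative template: a host and an application on it",
        "topology_template": {
            "node_templates": {
                "container_host": {
                    "type": "tosca.nodes.Compute",
                    "capabilities": {
                        "host": {
                            "properties": {"num_cpus": num_cpus, "mem_size": mem_size}
                        }
                    },
                },
                "application": {
                    "type": "tosca.nodes.SoftwareComponent",
                    # The application declares which node it must be hosted on.
                    "requirements": [{"host": "container_host"}],
                },
            }
        },
    }


def hosted_on(template: dict, node: str) -> str:
    """Return the name of the node that `node` declares as its host."""
    nodes = template["topology_template"]["node_templates"]
    for requirement in nodes[node].get("requirements", []):
        if "host" in requirement:
            return requirement["host"]
    raise KeyError(f"{node} declares no host requirement")


def to_text(template: dict) -> str:
    """Serialise as JSON, which YAML parsers generally accept as well."""
    return json.dumps(template, indent=2)
```

An orchestrator would walk such `requirements` links to decide the deployment order, here provisioning `container_host` before `application`.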
    
    Speaker: Andy S Alic (UPVLC)
    
    Slides
    
    Video
  - 12:30
    
    Applications Database: New features for user communities¶ 15m
    
    The EGI Applications Database (AppDB) is a central service that stores and provides information about software solutions in the form of native software products and virtual appliances, about the scientists involved, and about publications derived from these solutions. Furthermore, through its VMOps Dashboard, it enables users to deploy and manage Virtual Machines on the EGI Cloud infrastructure. **Persistent Identifiers and OpenAIRE integration** AppDB’s development process has always been focused on providing a solid user experience by adding new features and improving existing ones. In this light, AppDB has recently been extended with *support for persistent identifiers* (PIDs), via GRNET’s HANDLE.NET service, for each registered solution. This makes sharing, documenting, and referencing solutions easier and more consistent, both for end users and between services. As an example of the latter, this new feature allows for tighter, two-way integration with OpenAIRE. AppDB has been working on improving its existing integration, which offers OpenAIRE data about projects and organizations within its portal, and on exporting data about software solutions and virtual appliances back to OpenAIRE through its new OAI-PMH service. **Improvements to VA management and VM operations** As cloud-related services are rapidly proliferating, a versatile, friendly user experience is essential to their success. Up until now, the AppDB portal required that users maintain the information for each release of a virtual appliance manually. This may become cumbersome for VA authors who use automated services or continuous integration processes to develop and build new VAs. In order to integrate with such automated release flows, a continuous delivery policy has been introduced.
When this policy is enabled and configured for a VA, it allows the AppDB backend to monitor for new virtual appliance releases and automatically publish them in the AppDB registry, without requiring any user interaction through the portal. Moreover, with respect to VM operations through the VMOps dashboard, some of AppDB’s latest developments have focused on giving users *more control over the resources acquired by their deployed VMs*. Among other features, users can now request and release public IP addresses and attach new block storage at any point of the VM lifecycle. Finally, with the upcoming use of OpenID Connect, users may authenticate to AppDB’s backend services and access their deployed VMs without the need for an intermediate proxy certificate. **Consolidation of backend services** Stable and well-tuned backends are crucial to a satisfactory end result for frontend services such as the AppDB portal and its VMOps dashboard. To this end, a new information system service has been developed to harvest and correlate infrastructure information from resource providers and other external services. Its goal is to unify and sanitize infrastructure information and provide simple query interfaces from a single access point. Furthermore, as OCCI is becoming obsolete and difficult for resource providers to maintain, efforts are being made to populate VM image access information through the native API of each available Cloud Management Framework (CMF), instead of relying on OCCI semantics.
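
For readers unfamiliar with OAI-PMH, the protocol mentioned above for exporting records to OpenAIRE, the sketch below shows how a harvester might construct a `ListRecords` request. The verb and argument names come from the OAI-PMH specification. The base URL in the usage note is a placeholder, not the real AppDB endpoint, and `list_records_url` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # When continuing a partial list, resumptionToken is the only
        # argument allowed besides the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            params["from"] = from_date  # selective harvesting by datestamp
    return base_url + "?" + urlencode(params)
```

For example, `list_records_url("https://appdb.example.org/oai", from_date="2018-01-01")` would request all Dublin Core records changed since that date from the placeholder endpoint.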
    
    Speaker: Marios Chatziangelou (IASA)
    
    Slides
- 11:30 → 13:00
  Shaping the EOSC service roadmap¶ Auditorium B203
  
  Auditorium B203
  
  Lisbon
  
  ISCTE, University of Lisbon
  
  Convener: Sergio Andreozzi (EGI.eu)
  - 11:30
    Shaping the EOSC service roadmap: what users need¶ 1h 30m
    
    The EOSC is an ambitious initiative aiming at the federation of existing and planned digital infrastructures for research. It seeks to remove barriers among disciplines and countries and make it easier for researchers to share and access the digital resources they need. It will mobilise service providers from both public and private sectors, funders, research communities and other relevant stakeholders. In order to be successful, it needs to meet the current and emerging needs of researchers and to rely on sound business models that stimulate service providers to join the ecosystem, users to utilise the services, and funders to support them. In this area, the EOSCpilot project explores possible services to be part of the EOSC service catalogue by collaborating with a number of science demonstrators that explore, use and evaluate services and service concepts that are already available at resource providers. The experiences of the demonstrators are fed back to the project. The EOSC-hub project, which started in January 2018, has launched an initial catalogue of services pre-selected via an open call mechanism and publicly accessible to researchers. Looking forward to the evolution of the EOSC, it is essential to understand and prioritise which services are most needed and should be added to the future EOSC service portfolio and, importantly, which criteria are to be used for uptake in the service portfolio. The goal of this session is to take advantage of the collective knowledge of the audience to extract high-level needs for services and identify priorities for the coming years in order to develop a service roadmap. The approach is to use the world cafe style where, after setting the context, small discussion groups are created (e.g. homogeneous by stakeholder category) and are asked to answer specific questions (see more about the world cafe session format: http://www.theworldcafe.com/wp-content/uploads/2015/07/Cafe-To-Go-Revised.pdf).
At the end of the discussion phase, outputs per group will be collected and summarised and will be used later on as inputs to the service roadmapping activities of the EOSCpilot and EOSC-hub projects. This session is a natural follow-up to the session “EOSC Service Architecture: how the services could support the user communities”, which presents the state of the art.
    
    Speakers: Jelena Angelis (European Future Innovation System (EFIS) Centre), Nuno Ferreira (SURFsara BV), Sergio Andreozzi (EGI.eu)
    
    Introduction¶ 15m
    
    Group discussion: round 1¶ 20m
    
    Group discussion: round 2¶ 20m
    
    Group discussion: round 3¶ 20m
    
    Final comments and wrap-up¶ 15m
- 11:30 → 13:00
  The Who and the How of Open Science: A user journey in Open Science through the lens of OpenAIRE¶
  
  For OS to succeed, multiple stakeholders, services and communities need to converge. OpenAIRE will facilitate this session which will explore who Open Science benefits and how it can benefit a range of communities. It will set the scene for good OS practice and give practical examples where OS is being implemented by a range of stakeholders (funders, managers, research administrators) in different settings (national, European, institutional). This session will also present a broad portfolio of key dashboards, services and products, and how they interplay to guide certain user types.
  
  Convener: Inge Van Nieuwerburgh (University of Gent)
  - 11:30
    
    The WHO¶ 25m
    
    This talk will touch on scholarly communication workflows and how they interplay with OS. It will present why OS is important in an institution, who OS workflows apply to, and what the realities are for institutions integrating OS into their day-to-day workflows. An institution also needs to understand its own scientific output. This can only be done with open infrastructures and standards. The talk will also highlight the importance of managing the long tail of research.
    
    Speaker: Prof. Eva Mendez (University of Madrid)
  - 11:55
    
    The HOW¶ 20m
    
    Different stakeholders have different needs in monitoring and supporting Open Science. There is a need for monitoring systems, which brings a lot of challenges. Paolo will also present the content acquisition policy of OpenAIRE.
    
    Speaker: Paolo Manghi (Istituto di Scienza e Tecnologie dell'Informazione - CNR)
    
    Slides
  - 12:15
    
    Panel session and Questions¶ 45m
    
    1. Funder perspective: Joao Nuno Ferreira, FCT, Portugal. The speaker will outline their situation and need for OS, including the need to monitor content output. Why are synchronized information systems important for funders?
    2. National perspective: Mojca Kotar, University of Ljubljana. The speaker will outline their situation and need for OS, and the reality of implementing OS at the national level. What are the needs, and what infrastructures need to be in place and to converge at national level? Where does the national setting fit into EOSC?
    3. RI or research manager perspective: João Dias, ISCTE-IUL, Portugal. The speaker will outline their situation and need for OS, how to implement it within research groups at a local level, and what the barriers are.
    
    Speaker: Inge Van Nieuwerburgh (University of Gent)
- 11:30 → 13:00
  Towards an AAI service for research communities¶ Auditorium JJLaginha
  
  Auditorium JJLaginha
  
  Lisbon
  
  ISCTE, University of Lisbon
  
  Convener: Mr Nicolas Liampotis (GRNET)
  slides
  
  2018-10-10-DI4R-DARIAH.pdf
  
  DI4R2018-AAI.pptx
  
  DI4R_2018_-_ELIXIR_AAI_Advanced_Features.pdf
  
  FIM4R_GEANT_presentation_2018_DI4R.pptx
  - 11:30
    
    ELIXIR AAI Advanced Features¶ 15m
    
    ELIXIR AAI has been in production since 2016, and it is continuously extended with new features based on the requirements of end users. In the talk, we will present some of the features that have recently been made available to the ELIXIR community. The Permission API enables service providers and users to leverage advanced authorization for access to sensitive human data. Another feature is an implementation of the Bona Fide Researcher concept in ELIXIR AAI, together with automatically or manually assigned user affiliations. Service providers can use this data to perform authorization based on the user's affiliation. Allowing access to cloud machines via SSH or VNC has long been an issue when a federated identity is the only authentication option available. ELIXIR AAI provides a user-friendly mechanism based on QR codes, which can be used in non-web environments to provide authentication and delegation for computational, cloud and storage services.
    
    Speaker: Michal Prochazka (Masaryk University)
  - 11:45
    
    Applications of the AARC Blueprint Architecture - Migration to an IdP-SP-proxy in the DARIAH AAI¶ 15m
    
    The DARIAH Research Infrastructure (RI) provides, among other things, digital services for researchers in the arts and humanities. It offers an authentication and authorisation infrastructure (AAI), the DARIAH AAI, to enable researchers to log in to these services with their own campus account, providing a Single Sign-On experience. The DARIAH AAI supplies additional information, such as group memberships specific to the DARIAH community, which can be used by services for authorisation decisions. Historically, the DARIAH AAI is composed of different components:

    - a self-service interface which allows for the registration of DARIAH accounts;
    - the DARIAH IdP, which serves both as an Identity Provider (IdP) for these DARIAH accounts and as an attribute authority (AA) that releases DARIAH-specific attributes for users authenticating via their home organisation’s IdP;
    - a group membership management system for both these types of accounts;
    - and the DARIAH service providers (SP) that each need to query the AA and check for DARIAH-specific attributes.

    The Blueprint Architecture (BPA), which was developed in the EC-funded AARC (Authentication and Authorisation for Research and Collaboration) project, recommends an IdP-SP-proxy, which serves as a gateway between service providers of the research infrastructure and identity providers and attribute sources. This approach takes away a lot of the complexity services would have to deal with in a traditional full mesh federation, and allows for a central place for policy decisions. It thus offers a scalable solution to problems such as aggregation of attributes from different sources, and account linking.
In order to allow services to connect to the DARIAH AAI in a much simpler fashion, to allow for interoperability with other e-infrastructures and research infrastructures, and to create the foundation for new features such as account linking in the future, the DARIAH AAI was recently extended with an AARC BPA-compliant IdP-SP-proxy component. Since the DARIAH AAI is already largely based on Shibboleth products, we decided to implement this proxy solution based on Shibboleth, as opposed to SimpleSAMLphp or SATOSA, which offer proxy functionality by default. While this solution integrates nicely into the existing DARIAH AAI ecosystem, turning the Shibboleth products into an IdP-SP-proxy posed some technical challenges. This talk will illustrate the main advantages of, and experiences with, the adoption of the AARC BPA from the point of view of the DARIAH research community and showcase our technical solution based on Shibboleth (i.e. how the Shibboleth IdP and SP can be used to build a proxy component). We will also give insight into how to migrate from an existing AAI to a proxy-based infrastructure while ensuring backwards compatibility with legacy use cases.
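
The role of an IdP-SP-proxy in attribute aggregation can be sketched in a few lines of Python. This is a conceptual illustration only, not the Shibboleth-based implementation presented in the talk: the in-memory `COMMUNITY_GROUPS` table stands in for the community attribute authority, and the function names, user identifier and group names are all hypothetical.

```python
from typing import Dict, List

Attributes = Dict[str, List[str]]

# Hypothetical stand-in for the community attribute authority: group
# memberships keyed by the identifier released by the home organisation.
COMMUNITY_GROUPS: Dict[str, List[str]] = {
    "alice@uni.example.org": ["dariah-users", "text-editions"],
}


def proxy_release(home_attrs: Attributes,
                  groups: Dict[str, List[str]]) -> Attributes:
    """Merge home-IdP attributes with community group memberships.

    The proxy acts as an SP towards the home IdP and as an IdP towards the
    services, so every service receives one consistent attribute set.
    """
    user_ids = home_attrs.get("eduPersonPrincipalName", [])
    if len(user_ids) != 1:
        raise ValueError("expected exactly one user identifier from the home IdP")
    released = {name: list(values) for name, values in home_attrs.items()}
    released["isMemberOf"] = list(groups.get(user_ids[0], []))
    return released


def authorise(released: Attributes, required_group: str) -> bool:
    """Service-side check that relies only on what the proxy released."""
    return required_group in released.get("isMemberOf", [])
```

With this arrangement a service no longer queries the attribute authority itself. It trusts a single upstream party, the proxy, which is the simplification the Blueprint Architecture aims for.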
    
    Speaker: Mr David Hübner (DAASI International GmbH)
    
    Slides
  - 12:00
    
    Towards the EOSC AAI service for research communities¶ 1h
    
    The European Open Science Cloud (EOSC) will provide an Authentication and Authorisation Infrastructure (AAI) through which communities can gain seamless access to services and resources across disciplinary, social and geographical borders. To this end, the EOSC-hub and the GÉANT (GN4-2) project AAIs build on existing AAI services and provide a consistent, interoperable system with which communities can integrate. This session will introduce the main concepts for meeting research community needs for AAI access to EOSC. It will outline how the AARC Blueprint Architecture model (i) leverages eduGAIN to enable users to use their own home organisation credentials to access services and, (ii) underpins community AAI services in EOSC-Hub and complementary projects. By implementing policies that are harmonised and compliant with global frameworks such as the REFEDS Research and Scholarship entity category and Sirtfi, communities are supported in receiving and releasing consistent attributes, as well as in following good practices in operational security, incident response, and traceability. Complementary to this, users without an account on a federated institutional Identity Provider are still able to use social media or other external authentication providers for accessing services. Thus, access can be expanded outside the traditional user base, opening services to all user groups including researchers, people in higher-education, and members of business organisations and citizen scientists. Research communities can use the Community AAI services in EOSC-hub for managing their users and their respective roles and other authorisation-related information. At the same time, the adoption of standards and open technologies, including SAML 2.0, OpenID Connect, OAuth 2.0 and X.509v3, facilitates interoperability and integration with the existing AAIs of other e-Infrastructures and research communities. 
Development of these technologies has been and continues to be shaped by the requirements defined by the users of the AAI services. With the recent publication of FIM4R version 2 and further requirements gathering work performed through the AARC2 and EOSC-hub AAI surveys, the question of how research infrastructures respond to these requirements has become a topic of significant interest for many research communities. This will be an interactive session where researchers, research infrastructures and e-infrastructures present their use cases and, more generally, describe their response to the obstacles researchers face when accessing resources used in their daily work. You shouldn’t miss this if you are a researcher or representative of a scientific community interested in gaining access to EOSC federated services and resources in a secure and user-friendly way. Draft agenda:

- How the EOSC AAI services help communities to access resources
- Introduction of the evolved view of the AARC Blueprint Architecture
- Common requirements for Federated Identity Management for Research (including findings from FIM4R version 2.0 and requirements gathering activities performed through the AARC2 and EOSC-hub AAI surveys)
- Community AAI deployments and experiences
- Life Science AAI
    
    Speakers: Chris Atherton (GÉANT), Christos Kanellopoulos (GÉANT), Mr Nicolas Liampotis (GRNET)
    
    Slides
- 13:00 → 14:30
  
  Lunch 1h 30m
- 14:30 → 16:00
  Data Management Services: Part II¶ Auditorium JJLaginha
  
  Auditorium JJLaginha
  
  Lisbon
  
  ISCTE, University of Lisbon
  
  Convener: Giuseppe Fiameni (CINECA - Consorzio Interuniversitario)
  - 14:30
    
    dCache: storage for XFEL scientific use-cases and beyond¶ 15m
    
    The dCache project provides open-source storage software deployed internationally to satisfy ever more demanding scientific storage requirements. Its multifaceted approach provides an integrated way of supporting different use cases with the same storage, from high-throughput data ingest, through wide access, to easy integration with existing systems. In supporting new communities, such as medical research, photon science/XFEL and microbiology, dCache is evolving to provide new features and access to new technologies. Whatever the use case, for federated storage to work well, some knowledge from each storage system must exist outside that system. This is needed to allow coordinated activity. To support such scenarios, dCache provides a stream of internally generated events. In this approach the storage systems (rather than the clients) become the coordinating service, notifying interested parties of key events. Storage events are also useful in other contexts: catalogues are notified whenever data is uploaded or deleted, tape becomes more efficient because analysis can start immediately after the data is on disk, and caches can be "smart", fetching new datasets pre-emptively and removing cached content when the source is deleted. In this paper we will present work done at DESY in building a low-latency compute cloud facility for various XFEL workflows. This was achieved by combining dCache storage events with various open-source projects, such as Apache Kafka, Apache OpenWhisk and Kubernetes. The resulting "serverless" cloud service is similar to AWS Lambda or Google Cloud Functions. It allows the infrastructure to deploy additional resources automatically, seamlessly scaling to match demand.
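
The event-driven pattern described in this abstract, where the storage notifies interested parties instead of being polled, can be sketched as follows. This is an in-memory illustration under stated assumptions: the event fields (`type`, `path`) and the classes `EventRouter` and `Catalogue` are hypothetical and do not reproduce the dCache event format or the Kafka and OpenWhisk integration used at DESY.

```python
from typing import Callable, Dict, List, Set

Event = Dict[str, str]
Handler = Callable[[Event], None]


class EventRouter:
    """Route storage events to the handlers subscribed to each event type."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event: Event) -> int:
        """Deliver one event and return the number of handlers called."""
        handlers = self._handlers.get(event["type"], [])
        for handler in handlers:
            handler(event)
        return len(handlers)


class Catalogue:
    """Toy catalogue kept in sync by events rather than by polling storage."""

    def __init__(self) -> None:
        self.paths: Set[str] = set()

    def on_upload(self, event: Event) -> None:
        self.paths.add(event["path"])

    def on_delete(self, event: Event) -> None:
        self.paths.discard(event["path"])
```

In a real deployment the router role would be played by a message broker and the handlers by functions started on demand, which is what gives the "serverless" behaviour mentioned above.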
    
    Speakers: Mr Michael Schuh (DESY), Patrick Fuhrmann (DESY)
  - 14:45
    
    Next Generation Data Management Services: the eXtreme DataCloud project¶ 15m
    
    The aim of the H2020-funded eXtreme-DataCloud (XDC) project is the development of new scalable technologies for federating storage resources and managing data in the current and next-generation e-Infrastructures deployed in Europe, such as the European Open Science Cloud (EOSC), the European Grid Infrastructure (EGI) and the Worldwide LHC Computing Grid (WLCG). The high-level objective of the project is the semi- or fully automated placement of scientific data in the Exabyte region, exploiting the resources made available by modern, cloud-based e-Infrastructures. XDC is focused on providing enriched high-level data management services to access heterogeneous storage resources and services. It enables scalable data processing on distributed infrastructures using established interfaces, allowing the use of legacy applications without the need to rewrite them from scratch. The project will address high-level topics that include: i) federation of storage resources with standard protocols, ii) smart caching solutions to transparently access data stored in remote locations, iii) policy-driven data management based on Quality of Service, iv) data lifecycle management, v) metadata handling and manipulation, vi) data preprocessing during ingestion, vii) optimized data management based on access patterns. The solutions implemented by the XDC project are targeted at the real-life use cases provided by the different scientific communities represented within the project, such as astrophysics (CTA), Photon Science (European X-FEL), High Energy Physics (WLCG), Life Science (LifeWatch) and Medical Science (ECRIN). The XDC solutions are based on already well-established data management components such as dCache, FTS, EOS, the INDIGO PaaS Orchestrator and ONEDATA, to mention just some of them. These services will be enriched with new functionalities and organized in a coherent architecture to address the user requirements.
For a better understanding of the nature and the scope of the project, the high level architecture overview and related interfaces specification will be presented and described. Moreover, implementation examples on specific use cases will be presented.
    
    Speakers: Alessandro Costantini (INFN), Daniele Cesini (INFN)
  - 15:00
    
    EUXDAT e-Infrastructure for Sustainable Development¶ 15m
    
    EUXDAT proposes an e-Infrastructure for sustainable development. The project partners form a cross-domain group of agricultural experts together with software engineers and technology experts. Agriculture, land monitoring and energy efficiency are addressed in order to support planning policies, as opposed to simply increasing current productivity. One of the major challenges in achieving our goals is the management and processing of huge amounts of heterogeneous data, with the added requirement of data and computational scalability, given that the amounts of data will only increase, and so will the complexity of processing them. The EUXDAT e-Infrastructure builds on existing components and provides an advanced frontend for users to develop applications. The frontend provides monitoring information, visualization, various parallelized data analytics tools, and data and process catalogues, enabling Large Data Analytics-as-a-Service. A large set of data connectors will be supported, including unmanned aerial vehicles (drones), Copernicus data, and field sensors, for scalable analytics. The infrastructure resources are based on HPC and Cloud; however, the choice and usage of physical resources are transparent to the user. EUXDAT aims at optimizing data and resource usage by, on the one hand, supporting data management linked to data quality evaluation and, on the other, proposing a hybrid orchestration of task execution that identifies whether the best target is an HPC center or a Cloud provider. The latter will be achieved by using monitoring and profiling information and deciding based on trade-offs related to cost, data constraints, efficiency and availability of resources. Throughout the 3-year project, EUXDAT will be in contact with scientific communities in order to identify new trends and datasets, which will help guide the evolution of the e-Infrastructure.
The project aims to result in an integrated e-Infrastructure which will encourage and facilitate end users to create new applications for sustainable development.
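
The hybrid orchestration decision described above, choosing between an HPC center and a Cloud provider based on cost, data constraints, efficiency and availability, could be approximated by a simple scoring rule. The following Python sketch is one hypothetical way to express such a trade-off. The `Target` fields, the transfer penalty and the cost formula are assumptions for illustration and are not taken from the EUXDAT design.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Target:
    name: str
    cost_per_hour: float      # monetary cost, arbitrary units
    est_runtime_hours: float  # estimated from profiling information
    available: bool           # taken from monitoring information
    holds_data: bool          # whether the input data is already at this site


def choose_target(targets: List[Target],
                  transfer_penalty_hours: float = 1.0) -> Target:
    """Pick the available target with the lowest estimated total cost."""
    candidates = [t for t in targets if t.available]
    if not candidates:
        raise RuntimeError("no target is currently available")

    def total_cost(t: Target) -> float:
        # Moving data to a site that does not hold it adds time, hence cost.
        extra = 0.0 if t.holds_data else transfer_penalty_hours
        return (t.est_runtime_hours + extra) * t.cost_per_hour

    return min(candidates, key=total_cost)
```

With these numbers, a fast but expensive HPC site that lacks the data (2 hours at 4 units) loses to a slower Cloud site that already holds it (5 hours at 1 unit). The outcome flips once the data is available at the HPC site.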
    
    Speakers: Mr Francisco Javier Nieto De Santos (Atos Research & Innovation), Spiros Michalakopoulos (Atos)
  - 15:15
    
    Project “UNEKE”: composing storage infrastructures for research data, a roadmap for higher education institutions¶ 15m
    
    While various scientific communities have started to develop and establish mature infrastructures (e.g. repositories) to support researchers’ data management, other research areas are still facing the challenge of establishing suitable infrastructures. Thus, researchers in these disciplines rely on the technical opportunities offered by their local research institution (Becker et al., 2012). In order to support adequate research data management, research institutions have carried out surveys to investigate researchers’ requirements. While most of these investigations are restricted to individual institutes or have small sample sizes (Rudolph et al., 2015), the literature shows that research institutions face two types of barriers that must be taken into account when establishing suitable infrastructures: technical barriers (e.g. infrastructure, security) and non-technical barriers (e.g. ethics, management) (Wilms et al., 2018). Therefore, we present the results of the research project UNEKE, which aims to find out more about the technical and non-technical requirements of several research areas. In this work, we present the results of a qualitative research investigation including focus group interviews with 91 researchers from different research areas. For this exploratory, qualitative approach, 12 focus group workshops with 91 employees from University Duisburg-Essen and RWTH Aachen University were conducted in late 2018. This allowed us to gain insights into the attitudes, thoughts and experiences that researchers hold about RDM and how these affect their daily use of RDM tools and infrastructures. We expected that research data itself and its handling might be highly specific to different research areas. In order to monitor disciplinary differences, the participants were divided into groups of researchers from natural sciences, engineering, life sciences, humanities and social sciences.
These focus groups were conducted at both universities and structured into an introduction followed by a division into smaller subgroups of 2-4 researchers. In these subgroups, the question “What needs should be considered when developing and introducing an infrastructure for research data management?” was discussed. The results of these discussions were then compiled, presented to, and discussed by the entire group. After the discussion, the group structured the topics to create a thematic mapping. These mappings form the basis for the categorical system that is being developed within UNEKE. First results show that requirements are field-specific and that the set of categories resulting from the analysis is similar at both participating universities, thus indicating its validity. While the field-specific requirements are often technical, non-technical ones such as governance guidelines and training show significant overlap. Becker, J., Knackstedt, R., Lis, L., Stein, A. and Steinhorst, M. (2012) ‘Research Portals: Status Quo and Improvement Perspectives’, International Journal of Knowledge Management, 8(3), pp. 27–46. Rudolph, D., Thoring, A. and Vogl, R. (2015) ‘Research Data Management: Wishful Thinking or Reality?’, PIK - Praxis der Informationsverarbeitung und Kommunikation, 38(3–4), pp. 113–120. Wilms, K., Stieglitz, S., Buchholz, A., Vogl, R. and Rudolph, D. (2018) ‘Do Researchers Dream of Research Data Management?’, Proceedings of the 51st Hawaii International Conference on System Sciences, pp. 4411–4420.
    
    Speaker: Bela Brenger (RWTH Aachen University)
  - 15:30
    
    Federated engine for information exchange (Fenix)¶ 15m
    
    The neuroscience community has to cope with various data sources, each with its own formats, modalities, and spatial and temporal scales (i.e. from multi-electrode array measurements to brain simulations), and with no fixed relationship between them. Thus, the scientific approaches and workflows of this community are typically a moving target, which is much less the case in other disciplines, e.g., high-energy physics. Furthermore, the community is experiencing an increasing demand for computing resources to process data. However, at present, solutions to federate different data sources and couple them with high-end computing capabilities do not exist, or are very limited. Fenix (https://fenix-ri.eu/) is based on a consortium of five European supercomputing and data centres (BSC, CEA, CINECA, CSCS, and JSC), which agreed to deploy a set of infrastructure services (IaaS) and integrated platform services (iPaaS) to allow the creation of a federated infrastructure and to facilitate access to scalable compute resources, data services, and interactive compute services. The implementation of the Fenix infrastructure is guided by the following considerations:

    - It is based on a co-design approach with a set of diverse domain-specific use cases, which guide both the design of the architecture and its validation.
    - Data need to be brought in close proximity to the processing resources at different infrastructure service providers to take advantage of high bandwidth with data repositories and services.
    - Federating multiple data resources shall enable easy replication of data at multiple sites to improve resilience, availability and access performance of data.
    - Services are being implemented in a cloud-like manner that is compatible with the work cultures in scientific computing and data science. Specifically, this entails developing interactive computing capabilities next to the extreme-scale computing and data platforms of the participating data centres.
    - The level of integration should be kept as low as possible to reduce operational dependencies between the sites (to avoid, e.g., the need for coordinated maintenance and upgrades) and to allow the local infrastructures to evolve following different technology roadmaps.

    Based on the above principles, the Fenix federated infrastructure includes these main components:

    - Scalable Compute Services;
    - Interactive Compute Services;
    - Active Data Repositories based on fast memory and active storage tiers;
    - Archival Data Repositories for long-term preservation; and
    - Information/catalogue services to allow findability and recovery of data.

    The major advantages of the Fenix federated architecture are the use-case-driven design, the scalability of the services, and the easy extensibility, which will make it possible in the future to move to new state-of-the-art solutions or to enable workflows for other scientific communities. The first steps towards realisation of the Fenix infrastructure will be taken within the Interactive Computing E-Infrastructure (ICEI) project, funded by the EC within the Human Brain Project (HBP, https://www.humanbrainproject.eu/). The users of the HBP will be the prime consumers of the resources provided through the infrastructure. Additional resources will be provided to European researchers at large via PRACE (http://www.prace-ri.eu/).
    
    Speaker: Giuseppe Fiameni (CINECA - Consorzio Interuniversitario)
- 14:30 → 16:00
  Organiser session: The EOSC-hub service portfolio and applicable policies for service providers and users¶ Main Auditorium
  
  TITLE: The EOSC-hub service portfolio and applicable policies for service providers and users
  
  The European Open Science Cloud (EOSC) is taking shape by federating generic and thematic services into a single integrated service portfolio. Since its start in January, EOSC-hub has succeeded in creating: (1) an EOSC service portfolio management process, which defines through processes and policies which services and resources can be admitted to the portfolio, and (2) different approaches for integrating services and resources into the overall service management system of EOSC.
  
  During this session we will present the EOSC-hub service catalogue, its delivery channel (the EOSC-hub marketplace), and the rules of participation for becoming a service provider of the Hub. The hub is a meeting point for supply and demand, where communities can play a double role: as service providers, they aim to improve the quality of their services and increase the number of users; as service consumers, they are searching for the tools that best suit their needs, taking into account economic, legal and technical constraints.
  
  Agenda and purpose of the session:
  
  1) General presentation of EOSC-hub: current status of the services and plans for adding further providers from the consortium. A description of the EOSC-Portal is also planned.
  
  2) Guide to EOSC-hub for users and providers. Walk-through of the new engagement description and procedures.
  
  3) Thematic Services integration: major achievements, blocking factors, technology gaps and added value. Panel discussion with our experts and the audience.
  
  Feedback from the session participants will help us to draw a more detailed picture of the European Open Science Cloud, evolve it and offer concrete opportunities for participation.
  
  Convener: Dr Isabel Campos (CSIC)
  - 14:30
    
    EOSC-hub Service Catalogue and the marketplace¶ 15m
    
    Speaker: Sergio Andreozzi (EGI.eu)
    
    Slides
  - 14:45
    
    EOSC-Hub Engagement opportunities¶ 15m
    
    Speaker: Dr Tiziana Ferrari (EGI.eu)
    
    Slides
  - 15:00
    
    EOSC-hub and eInfraCentral cooperation framework¶ 5m
    
    Speaker: Jorge Sanchez (JNP)
    
    Slides
  - 15:05
    Thematic Services integration in EOSC-Hub¶ 55m
    
    Speakers: Alexandre Bonvin (eNMR/WeNMR (via Dutch NGI)), Anabela Oliveira (National Laboratory for Civil Engineers), Daniele Spiga (INFN), Sandro Fiore (CMCC Foundation)
    
    Slides
    
    DODAS-eosc-hub-panel_spiga-v1.pptx
    
    ECAS_eosc-hub-panel.pptx
    
    eosc-hub-panel-OPENCoastS.pptx
    
    EOSC-hub-panel-WeNMR.pptx
- 14:30 → 16:00
  Research with Sensitive Personal Data¶ Auditorium B104
  
  Convener: Maria Iozzi (UIO)
  - 14:30
    
    Sensitive data activities in EOSC-hub¶ 30m
    
    Speaker: Abdulrahman Azab (UIO)
  - 15:00
    
    Research with Sensitive Personal Data in the EOSC¶ 1h
    
    In the present era of digital data explosion, the Open Science paradigm, together with the FAIR principles, offers scientists and technology providers a new vision and methods for enhancing research by fostering cross-disciplinary access to and (re)use of data and data technologies. Data sharing and data cross-linking are the basis for innovative research in many fields of knowledge, including health and medical science, but this research often involves personal or sensitive data that must be handled with due consideration of the legitimate right to personal privacy. Whilst several solutions have been developed to facilitate research involving sensitive data in compliance with privacy regulations, there is still a need to implement platforms and protocols that effectively allow cross-border, interdisciplinary research on sensitive personal data. Mechanisms for authentication and vetting, authorization, register data sharing, and analysis of aggregated datasets (possibly from different sources) are all still open questions, at both the technological and the policy level. Solutions require coordinated efforts, transversally involving service providers, scientists and communities. The session will bring together service providers and communities dealing with sensitive data. We will explore state-of-the-art solutions currently in use in regional or community-specific settings and now also offered in the EOSC. We will investigate solutions developed and/or adopted by large, advanced communities to work on and share sensitive personal data transversally across research infrastructures. We will identify gaps between science communities' needs and the current offering of e-infrastructures (regional or community-based). We will discuss the requirements for an effective research infrastructure for sensitive data that allows data use and re-use within the EOSC. 
We will explore possible strategies to enhance interoperability between existing e-infrastructures, with the eventual goal of enabling cross-border, Europe-wide user scenarios in the EOSC framework. The roles of funders and policy makers in this enabling process will also be discussed.
    
    Speakers: Abdulrahman Azab (UIO), Chris Aryio, Maria Francesca Iozzi (SIGMA), Rob Baxter (University of Edinburgh), Susheel Varma (EMBL EBI)
- 14:30 → 16:00
  
  Training: EGI Notebooks¶ Auditorium B203
  
  Convener: Dr Enol Fernandez (EGI.eu)
  
  notes
  
  slides
- 16:00 → 16:30
  
  Coffee 30m
- 16:30 → 18:00
  Computing Services Part II¶
  
  Convener: Florian Berberich (JUELICH)
  - 16:30
    
    Delivering added value services for deep learning in the EOSC¶ 15m
    
    Much hope has recently been placed on deep learning as a machine learning technique that enables scientists to develop novel hypotheses and analyse large and complex datasets. Deep learning techniques have emerged from two major technological developments. First, the evolution of the internet has led to the creation and global availability of large datasets. Second, through large-scale computing power, in particular with readily available GPU resources, optimization of large-scale networks with several layers of highly interconnected nodes has become feasible. Deep learning in scientific practice offers opportunities and challenges. The following three use cases serve to illustrate this: model training and re-training, model transfer, and model use and sharing. Model training: For model training, a scientist may want to address a scientific problem or task by developing a new deep learning model on a complex scientific dataset. Apart from the advanced machine learning expertise that is needed to design a suitable network architecture, the scientist is faced with a variety of nontrivial technological challenges. First, training of deep learning models is highly compute-intensive; thus, the scientist needs access to adequate computing resources. Second, training of effective deep learning models requires access to very large datasets that need to be transferred close to the computing resources. Model re-training: Deep convolutional neural networks (CNNs) are used to classify images into predefined taxonomic categories. CNNs decompose an image into a hierarchy of increasingly informative features. The features at the lower levels represent colors, contours, etc., whereas the features at the higher levels represent domain entities such as plant leaves or structures of biological cells or tissues. Parts of a CNN model that has learned to classify plant structures may be re-trained to classify cell or tissue structures. This is called transfer learning. 
The fundamental technological needs of re-training are similar to those needed for training a model from scratch, with the additional task of model transfer, including the transfer of relevant software libraries and transfer and integration of data. Model use and sharing: A major scientific benefit of already existing deep learning models lies in sharing a model across the relevant scientific communities. This facilitates the scientific debate about the knowledge captured by the model and allows the community to use the model for relevant scientific tasks. Sharing and using a trained deep learning model with the scientific community may be realized as a web application. But in order to offer the model as a service, the model typically has to be transferred from a development environment towards a production environment, which is capable of offering the service in a sustainable way. In this presentation we will showcase how the DEEP-Hybrid-DataCloud is developing services that will enable next generation e-Infrastructures to support machine learning and in particular deep learning applications covering the three aforementioned cases, and how these solutions can be used to bring knowledge closer to the users and citizens in the framework of the European Open Science Cloud.
    
    Speaker: Mr Alvaro Lopez Garcia (CSIC)
  - 16:45
    
    Deep Learning for Predicting the Popularity of Datasets¶ 15m
    
    Accessing datasets stored on tape drives is comparatively time-consuming. Therefore, a certain fraction of all datasets is usually provided on a cache storage built of hard disks. Caching algorithms are used to identify popular datasets and to move them in advance from tape drives to the cache storage. In general, there is a considerable gap between the effectiveness of traditional caching algorithms and the optimal (or Belady) caching algorithm. It seems unlikely that the gap can be reduced significantly by optimizing traditional caching algorithms. The aim of our project is to explore whether popular datasets can be identified more accurately by applying deep learning methods. Training a neural network is time-consuming, in particular if the training sets are large. The ATLAS experiment at the Large Hadron Collider (LHC) stores every access to datasets in log files (many parameters are saved, such as the name of the file, the name of the dataset the file belongs to, the tool used for accessing the file, and the access time). In total, log data on the order of 0.5 TB are stored per month. Applying deep learning techniques to large datasets requires a scalable infrastructure. Several approaches have been proposed to speed up the training of neural networks, for example the use of specialized processors like GPUs or TPUs. We designed a cluster of containers for running neural networks in parallel. The cluster allows us to investigate different distributed deep learning strategies, e.g. data parallelism and model parallelism. To distribute files across the nodes of the cluster and to train neural networks in parallel, the big data analytics frameworks Apache Flink and Apache Spark are used. The talk gives an overview of the current status of our project. The machine learning workflow running on the cluster system is presented. First results obtained by applying a Convolutional Neural Network to a small subset of ATLAS log data are shown. 
The speedup of different parallelization strategies is evaluated. An outlook on ongoing work will be given.
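
The gap between a traditional caching policy and the optimal (Belady) policy mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy access trace and the cache capacity are assumptions chosen for illustration, and LRU stands in for "traditional caching algorithms". Belady's policy evicts the item whose next request lies furthest in the future, so it needs the complete future trace and can only serve as an offline reference.

```python
from collections import OrderedDict


def lru_hits(trace, capacity):
    """Count cache hits for a least-recently-used (LRU) cache."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for item in trace:
        if item in cache:
            hits += 1
            cache.move_to_end(item)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used item
            cache[item] = True
    return hits


def belady_hits(trace, capacity):
    """Count cache hits for Belady's policy, which knows the full future trace."""
    # For every position, precompute when the same item is requested next.
    next_use = [0] * len(trace)
    upcoming = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = upcoming.get(trace[i], float("inf"))
        upcoming[trace[i]] = i

    cache = {}  # item -> position of its next request
    hits = 0
    for i, item in enumerate(trace):
        if item in cache:
            hits += 1
        elif len(cache) >= capacity:
            # Evict the cached item that is needed furthest in the future.
            victim = max(cache, key=cache.get)
            del cache[victim]
        cache[item] = next_use[i]
    return hits


# Hypothetical dataset access trace (dataset names are placeholders).
trace = ["A", "B", "C", "A", "B", "D", "A", "B", "C", "D"]
print("LRU hits:   ", lru_hits(trace, capacity=3))
print("Belady hits:", belady_hits(trace, capacity=3))
```

A learned popularity predictor would sit between these two extremes: it replaces the exact `next_use` knowledge of the Belady sketch with an estimate derived from past access logs.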
    
    Speaker: Mrs Nina Zimmermann (Univ. of Applied Sciences (HTW) Berlin)
  - 17:00
    
    On the Implementation of MPI Cluster as a Service on Supercomputer System¶ 15m
    
    The vast majority of HPC users rely heavily on MPI middleware for their applications. Historically, MPI was mainly configured on Supercomputer Systems and the applications lived within the boundaries set by the system administrators. This led to various issues, including but not limited to problems with application distribution, environment configuration, resource allocation and filesystem permissions. Recently, the expansion of Cloud Computing has drawn the attention of many HPC users to offerings like Infrastructure as a Service, Platform as a Service, Software as a Service and HPC as a Service. These services give power users much more granular control over the provided resources. For the last year we have been researching a variety of Linux operating-system-level virtualization technologies, aiming to bring the flexibility, isolation and resource management provided by Cloud Computing to the world of Supercomputer Systems without compromising performance. Our research led us to Linux containers, which have gained popularity and been widely adopted due to their small footprint, distribution form, runtime isolation and relatively negligible performance overhead. These properties make them a very good candidate for implementing virtual Supercomputer Systems. In this talk we present the approach that we used to give our users an easy way to deploy virtualized MPI clusters and the power to control their configurations through the associated lifecycle operations. We packaged all of this as an MPI Cluster as a Service solution. For the platform implementation we used the Supercomputer System Avitohol at IICT-BAS, which is the core of the scientific computing infrastructure in Bulgaria and currently the most powerful supercomputer in the region, with 150 computational servers, each equipped with two Intel Xeon Phi coprocessors, and a theoretical peak performance of 412.3 TFlop/s in double precision. 
Our experiments showed that the performance overhead of executing MPI applications inside a Linux container-based MPI cluster is close to zero. Hardware capacity is used more effectively by many concurrent users. MPI programs can be developed and sanity-tested on a local computer and easily transferred to the Supercomputer Systems. By using Linux containers, we have improved the overall Quality of Service for Avitohol users of scientific computing. The application domain of this design is not limited to HPC; it also covers IoT, meteorology, traffic control and trading systems, in other words almost any MPI application available today.
    
    Speaker: Mr Teodor Simchev (IICT-BAS)
  - 17:15
    
    Addressing Energy Wall for Exascale Computing: Whole System Design implementation at CINES for Energy Efficient HPC¶ 15m
    
    CINES has initiated the deployment of the “Whole System Design for Energy Efficient HPC” solution on its 3.5 Pflops Tier-1 production system (OCCIGEN). This solution, developed within the PRACE-3IP PCP (a joint Pre-Commercial Procurement involving CINECA, CSC, EPCC, GENCI and JUELICH), is the result of R&D services for improving the energy efficiency of HPC systems, to address the energy wall of Exascale Computing. As such, the PRACE PCP combines elements of conventional hardware procurement with the provision of funding for research and product development. It was set up to procure and develop highly energy-efficient HPC systems available for general use, i.e. able to run real applications and to be operated within a conventional HPC computing centre, while nevertheless achieving very high total-system energy efficiency. In addition to the technical goals, the PCP intended to develop the HPC vendor ecosystem within the European Economic Area (EEA), and as such it is expected to result in commercially viable products. As a result, ATOS integrated into its roadmap an energy-optimization-oriented suite developed during the PCP (BEO, BDPO, HDEEVIZ, SLURM energy-saving plugins), which is part of the Atos-Bull Supercomputer Suite (SCS5 R2), available since Q1 2018. While hosting one of the PRACE-3IP PCP prototypes, CINES has collaborated with EoCoE (Energy Oriented Center of Excellence) and PRACE 4IP WP7 (application enabling and optimization) to assess and provide guidance to the PCP R&D development from ATOS. CINES has set up a monitoring architecture and tools to complement fine-grained monitoring with coarse-grained datacenter data collection and analysis. The implementation of a “Whole System Design for Energy Efficient HPC” in a production environment is a key step, in collaboration with GENCI, towards a new paradigm for application and HPC efficiency, changing from time-to-solution towards energy-to-solution optimisation. 
The global collection of energy and resource consumption data is a key repository of application behaviour and profiles for data analysis, and provides guidance for upcoming procurements such as PPI4HPC (2019/2020) and the next CINES Tier-1 system (2020), as well as input for EuroHPC platforms (2022/2023).
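
The shift from time-to-solution to energy-to-solution described in this abstract can be made concrete with a small sketch. Energy-to-solution is the integral of power over the run time of a job. The function name, the sampling format and the two runs below are hypothetical and are not taken from the CINES monitoring tools; the numbers are invented purely to show that the faster run is not necessarily the one that consumes less energy.

```python
def energy_to_solution(samples):
    """Integrate (time_s, power_w) samples with the trapezoidal rule.

    Returns the consumed energy in joules.
    """
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        energy += 0.5 * (p0 + p1) * (t1 - t0)
    return energy


# Hypothetical power traces of the same job at two CPU frequency settings.
fast_run = [(0, 300.0), (50, 300.0), (100, 300.0)]  # 100 s at a constant 300 W
slow_run = [(0, 200.0), (60, 200.0), (120, 200.0)]  # 120 s at a constant 200 W

for label, run in (("fast", fast_run), ("slow", slow_run)):
    duration = run[-1][0] - run[0][0]
    print(f"{label}: time-to-solution {duration} s, "
          f"energy-to-solution {energy_to_solution(run) / 1000:.1f} kJ")
```

In this invented example the slower run wins on energy even though it loses on time, which is the kind of trade-off an energy-to-solution metric exposes.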
    
    Speaker: Mr Eric BOYER (CINES (Centre Informatique National de l'Enseignement Supérieur), FRANCE)
  - 17:30
    
    Driving data analysis through the Jupyter Notebook at European XFEL¶ 15m
    
    Computational science based on simulation or experimental data typically requires data analysis to extract insight from potentially large data sets. In this project, we explore the suitability of the Jupyter Notebook to drive the processing chain from raw data to figures used in publications and reports. Computational science is emerging as a key tool in academia and industry. For example, in the field of magnetism, simulations of nanostructures have become well established and are used widely. In photon science, the analysis of experimental data is essential and central to understanding correlations, adjusting experiment parameters, and exploiting an instrument's full potential. Given this increasing importance in science, there is growing concern about the reproducibility of scientific results obtained mainly from computational data analysis [1]. Ideally, any scientist should be able to recreate, for example, central figures in publications. This requires tracking and publishing all steps taken during the analysis, including all experiment and simulation runs, the data and simulation results used from each, and all metadata, parameters and processing steps. We study the utility of the Jupyter Notebook as a virtual research environment for this common scenario through data analysis in both computational modelling of magnetic devices and photon science. For the former, we have taken a well-established micromagnetic simulation package [2] based on C++ and added a Python interface [3] to allow convenient control of the package through the Jupyter Notebook [4]. Of particular interest for both application domains is that within the Jupyter Notebook, we can carry out simulation, data analysis, and specialised post-processing within a single document, making the work more easily reproducible and distributable. 
A special case is the creation of figures in publications: by creating each central figure in a publication within a Jupyter Notebook, we can publish the notebooks together with the manuscript, and thus make the key data elements of the publication reproducible. Emerging developments such as the European Open Science Cloud (EOSC) demand that the whole computational analysis process can be driven remotely. The method of driving computational science through the Jupyter Notebook provides remote execution elegantly: by hosting the Jupyter Notebook server where the data and simulation capability are, and connecting the user's web browser to the Jupyter Notebook server via HTTPS, we avoid common problems experienced with remote desktops or X forwarding. Driving computational analysis through the Jupyter Notebook can provide a flexible cloud-enabled data analysis infrastructure. This project is part of the Jupyter-OOMMF activity in the OpenDreamKit [5] project and we acknowledge the financial support from the Horizon 2020 European Research Infrastructures project (676541). The work is also supported by the EPSRC CDT in Next Generation Computational Modelling EP/L015382/1. [1] M. Baker, Nature 533, 452 (2016). [2] M. J. Donahue and D. G. Porter, OOMMF User’s Guide, Version 1.0, Interag. Rep. NISTIR 6376, NIST Gaithersburg (1999). [3] M. Beg et al. AIP Advances 7, 056025 (2017). [4] https://github.com/joommf [5] https://opendreamkit.org
    
    Speaker: Dr Marijan Beg (European XFEL GmbH)
  - 17:45
    
    DIRAC Services for EGI users¶ 15m
    
    The DIRAC services have been available to EGI users since 2014 and, since 2018, have been part of the EOSC-hub service portfolio. The services provide a versatile Workload Management System which can replace the gLite WMS service. It gives access to all the EGI grid and cloud resources used for intensive computations. Users can also specify their own computing and storage resources which are not part of the EGI infrastructure. Higher-level functionality can also be made available at the request of particular communities, for example services for managing complex workflows involving massive job submissions. DIRAC also provides basic tools for managing user data, with easy access to configurable storage elements and a powerful file catalog. The catalog makes it possible not only to store file replica information but also to define complex access control rules customizable for a given community. It is possible to define user metadata for easy searches of the necessary datasets. The DIRAC functionality is available via several interfaces, including the command line, a RESTful interface and a comprehensive Web Portal. The basic services are available to all EGI communities. More advanced features can be offered as part of the support provided in the framework of particular Competence Centers. In this presentation we will describe the functionalities offered by the DIRAC services to EGI users as well as the experience of running the services for various EGI communities. An outlook on the further evolution of the service will also be presented.
    
    Speaker: Andrei Tsaregorodtsev (CNRS)
- 16:30 → 18:00
  EOSC from Theory to Practice¶ Auditorium B104
  
  Convener: Dr Isabel Campos (CSIC)
  - 16:30
    
    Building the EOSC together: The role of eInfraCentral, EOSC-hub and OpenAIRE-Advance¶ 15m
    
    The concept behind promoting a joint presentation of the three projects – eInfraCentral, EOSC-hub, OpenAIRE-Advance – is a fitting approach to help drive forward and implement the EOSC and interlink **People, Data, Services and Training, Publications, Projects and Organisations.** **1. The challenge for research communities** **People & Training:** Due to a fragmented e-infrastructure landscape, end-users, such as researchers, innovators or industry actors, often are unaware of the e-infrastructure services available in Europe that could aid their work. Similarly, service providers and data producers have difficulty reaching out to potential users due to the lack of coordination and harmonisation across various e-infrastructures in order to support them in core EOSC-related activities such as open science. Even if users find out about the availability of a certain e-service, it is difficult to gather further information and compare it with other existing services. Service providers also lack user feedback on the ways they could improve their offerings. This leads to inefficient funding patterns through the emergence of overlapping efforts and as such, slower rates of innovation due to the lack of competition in the field. **2. Bringing the solution to EOSC** **Projects, Publications & Services:** eInfraCentral, EOSC-hub and OpenAIRE-Advance are the core initiatives in the implementation of the EOSC, actively contributing to the building of the EOSC service catalogue and portal. eInfraCentral creates a unified online service catalogue where users can search, browse, compare and access e-services. The **eInfraCentral standard Service Description Template (SDT) and catalogue** will provide the foundation for the catalogue of services to be accessed via the EOSC Portal to be launched in November. The SDT contains the prerequisites and attributes that are essential for the creation of customer-centric service descriptions. 
In addition, eInfraCentral facilitates the development of a shared language to describe services, fostering cooperation between infrastructure projects, communities and initiatives as well as sharing and reusing scholarly communication outputs (e.g. publications, research data, software) to support reproducible and transparently assessable science. **3. Jointly implementing the EOSC Portal** **Data, Services & Training:** The three projects will outline their collaboration around the building of the EOSC Portal. It is important to clarify that each project brings different elements and will use its previous outputs to further the implementation of EOSC. The presentation will highlight how eInfraCentral’s already existing service description template, service catalogue, and portal will help build the EOSC Portal, along with the EOSC-hub marketplace. The presenters will also distinguish between the EOSC and eInfraCentral catalogues, as there is a difference in their scope. **People & Organisations:** The teams of eInfraCentral, EOSC-hub and OpenAIRE-Advance believe that the DI4R audience could greatly benefit from learning about the collaboration between these projects, as this will clarify any confusion around the development of the catalogue(s) of services and portal(s). Our session will conclude with an open discussion with the audience to understand what their **value-proposition** is and what they can bring to the table in helping build the EOSC together.
    
    Speaker: Jelena Angelis (European Future Innovation System (EFIS) Centre)
    
    Slides
  - 16:45
    
    The EOSCpilot Science Demonstrators as a demonstration of the EOSC in practice¶ 15m
    
    The EOSCpilot project (2017-2018) has the purpose of supporting the first phase of development of the European Open Science Cloud (EOSC). One of its objectives is to develop a number of demonstrators functioning as high-profile pilots that integrate services and infrastructures to show interoperability and its benefits in selected scientific domains. To meet this objective, the project selected and funded 15 Science Demonstrators (SDs) in different disciplines (Life Sciences, Environmental and Earth Sciences, Energy, Physics and Social Sciences, to name a few) to demonstrate the effectiveness of the EOSC approach: a digital environment where researchers could use federated services to perform their research projects. This presentation will showcase the experience of the Science Demonstrators within the EOSCpilot; in particular, it will present the recommendations they provided during the project regarding service interoperability and the use of standard protocols, user-friendly interfaces, open source components, and much more. Being pilots within the pilot, the Science Demonstrators show the relevance and usefulness of the EOSC Services and their role in enabling data reuse, and thereby help drive the EOSC development. The responses from the SDs to the Consultation Platform on the Rules of Participation will also provide discussion topics during the event and will be the main takeaways of the session. Presenting these results at DI4R would be an exciting opportunity to reach out, on the one hand, to researchers, who are the primary users of the EOSC, and, on the other, to potential service providers or research infrastructures not yet involved, thus enlarging the service offering.
    
    Speaker: Ilaria Fava (Göttingen State and University Library)
    
    Slides
  - 17:00
    
    ELIXIR Cloud Analysis Platform for EOSC¶ 15m
    
    The aim of the ELIXIR Cloud Analysis Platform is to co-develop and implement an integrated cloud platform that is compliant with relevant global standards/specifications, such as those coming out of the Global Alliance for Genomics and Health (GA4GH). To date, six national nodes (EMBL-EBI, ELIXIR-FI, ELIXIR-DE, ELIXIR-CH, ELIXIR-UK and ELIXIR-IT) have committed resources to develop and implement a standards-compliant cloud federation service. The project will leverage multiple emerging specifications from GA4GH’s different work stream areas, namely Cloud (TRS, DOS, WES and TES), Discovery (Search, Service Registry), DURI (DUO, BonaFide), LSG (htsget, RefSeq) and Data Security (AAI). ![ELIXIR Cloud Analysis Platform][1] This presentation will showcase the technical AAI integration between EOSC and ELIXIR AAI to deploy a federated GA4GH-compliant workflow analysis service. The service, once provisioned using EOSC credentials, can subsequently be used by life science researchers to submit standardised workflow descriptions (CWL) to be executed by a Workflow Execution Service, which can further leverage Europe-wide distributed task execution services stationed in a number of ELIXIR national nodes. The presentation will also showcase the distributed Reference Dataset Distribution Service developed within the EOSC-Hub project to allow bulk site-to-site transfer of large reference datasets for analysis by computational pipelines. It is hoped that this prototype integration between these two key technical infrastructures will provide dynamic, data-locality-based optimisation of workflow task distribution in a federated environment like ELIXIR and EOSC.
The specific scientific drivers for this ELIXIR EOSC collaboration are the large-scale challenges in analysing Marine Metagenomics/Transcriptomics, distributed computational services to address workflow access to sensitive data (EGA, Local EGA), and support for large-scale, on-demand, industry-driven research workflow execution for Protein homology/analogy recognition As A Service. [1]: https://github.com/EMBL-EBI-TSI/TESK/raw/master/documentation/img/project-architecture.png
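As a rough illustration of what submitting a CWL workflow description to a Workflow Execution Service might look like, the Python sketch below only assembles the parts of a run request and sends nothing over the network. The path and form field names follow our reading of the GA4GH WES 1.0 run-workflow request and should be checked against the specification; the base URL, workflow URL, input parameters and token are hypothetical placeholders, not details of the ELIXIR deployment.

```python
import json


def build_wes_run_request(base_url, workflow_url, params, token):
    """Assemble (but do not send) a WES-style 'run workflow' request."""
    return {
        "method": "POST",
        # Path assumed from the WES 1.0 specification.
        "url": base_url.rstrip("/") + "/ga4gh/wes/v1/runs",
        "headers": {"Authorization": "Bearer " + token},
        # WES expects these as multipart form fields; workflow_params is JSON text.
        "form": {
            "workflow_url": workflow_url,
            "workflow_type": "CWL",
            "workflow_type_version": "v1.0",
            "workflow_params": json.dumps(params),
        },
    }


# All values below are hypothetical placeholders.
request = build_wes_run_request(
    base_url="https://wes.example.org/",
    workflow_url="https://example.org/workflows/metagenomics.cwl",
    params={"reads": {"class": "File", "location": "https://example.org/data/sample.fastq"}},
    token="EXAMPLE_TOKEN",
)
print(json.dumps(request, indent=2))
```

In a real deployment the bearer token would come from the AAI login flow, and the assembled request would be posted with an HTTP client of choice.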
    
    Speaker: Susheel Varma (EMBL EBI)
    
    Slides
  - 17:15
    
    Analysis of National Nodes as foundation for the European Open Science Cloud¶ 15m
    
    Both the European Open Science Cloud (EOSC) and the European Data Infrastructure (EDI) are envisaged as federated initiatives that will be built on top of country-level counterparts in order to succeed. The e-Infrastructure Reflection Group (e-IRG) addressed this point already in its 2016 Roadmap and recommended that national governments and funding agencies should reinforce their efforts to: 1) embrace e-Infrastructure coordination at the national level and build strong national e-Infrastructure building blocks, enabling coherent and efficient participation in European efforts; 2) together analyse and evaluate their national e-Infrastructure funding and governance mechanisms, identify best practices, and provide input to the development of the European e-Infrastructure landscape. Also in the Competitiveness Council conclusions (28/29 May 2018) the Member States are encouraged to “invite their relevant communities, such as e-infrastructures, research infrastructures, Research Funding Organisations (RFO’s) and Research Performing Organisations (RPO’s), to get organised so as to prepare them for connection to the EOSC.” However, the current situation across several Member States (MS) and Associated Countries (AC) is that there are different speeds and levels of access and integration to the European initiatives. To proceed, it is imperative that these differences are identified early on and specific actions are taken at national and European levels. e-IRG is working to address this challenge; the first step has been to collect information from each MS/AC about the current status of their e-Infrastructure, based on a survey addressed to the national ministries. The second step is conducting an analysis which will be the core of e-IRG’s next policy document. In the survey the word e-Infrastructure is assumed to cover various 'layers' or components, in particular: networking, computing, data and tools & services. 
The questions focus on acquiring information about the organizations responsible for providing e-infrastructure services, their governance model, their funding methods, and their access policies. We also collected information on national domain-specific e-Infrastructures or other domain areas of particular interest in each country and whether they use the horizontal e-Infrastructure services. The scope of the presentation is thus to present the preliminary analysis of the survey results, along with a first set of recommendations for the different stakeholders, namely e-Infrastructure providers, funders, policy makers and users, and to gather some initial feedback. We have clustered the countries from which we received replies based on the existence of few, several or many providers at a national level. The results show that there is fragmentation among the national providers in several countries. It can also be seen that fragmentation of service access and provision exists even in countries with advanced e-infrastructure services. In addition, because in some cases the number of providers in each domain (network, computing, data or other) differs within a cluster, we proceed to a further categorization based on the number of organizations with similar service domains.
    
    Speaker: Sverker Holmgren (VR-SNIC)
    
    Slides
  - 17:30
    
    IBERGRID towards EOSC¶ 15m
    
    IBERGRID was born out of the Iberian Common Plan for distributed infrastructures released in 2007, but the origins can be traced back to the Portuguese and Spanish participation in joint projects since 2002. Since then, IBERGRID has been federating infrastructures from Iberian research & academic organisations mainly focused on grid, cloud computing and data processing. The IBERGRID infrastructure comprises 12 computing and data centers in Spain and Portugal. A number of replicated services guarantees integrity and resilience. The infrastructure has provided 984 million processing hours since 2006 to support the HEP experiments and several user communities. This includes 19 million hours on biomedical applications and ~6 million hours on computational chemistry. Strictly on cloud support, more than 216,000 Virtual Machines have been instantiated providing more than 2 million cloud processing hours to LifeWatch in the last year. On the R&D side, service integration activities are taking place in numerous areas. An example is OPENCoastS, a service to provide on-demand circulation forecast systems as a service for the Atlantic coasts. The service is deployed at the computing site NCG-INGRID-PT, part of the EGI Federation, but it is being integrated into EOSC-hub as a Thematic Service in collaboration with LIP, LNEC, INCD, UNICAN, CNRS, and CSIC. On the software development side, IBERGRID is contributing in many areas. CSIC has developed OpenStack support for VOMS authorization and authentication, cloud pre-emptible instances (OPIE) as well as CPU Cloud accounting. The Technical University of Valencia developed and maintains the Infrastructure Manager (IM), a key service to support the instantiation of tailored clusters now part of the EOSC-hub service catalogue. Support to user-level container execution has been developed and is maintained by the IBERGRID software teams at LIP. 
Udocker is an extremely successful software product, with more than 310 stars on GitHub, and is being recommended in many computing centers around the world as the best solution for users to execute containers without requiring the intervention of system administrators. Software Quality Assurance has generated an enormous amount of activity in the Iberian area. LIP, CSIC, CESGA and UPVLC are in charge of ensuring the quality of the UMD software deployed by EGI. The Accounting Portal of EGI is maintained and developed by CESGA for the EGI community. Organized since 2007, the IBERGRID conference series is a major opportunity to gather the community and share experiences from the user, infrastructure, policy and research perspectives. IBERGRID looks to the future EOSC with optimism. On the user support side, the main assets are a well-consolidated user base and well-reputed user engineering and support teams. From the technical point of view, IBERGRID counts on worldwide-recognised teams with the expertise and technical background to address the specific requirements of scientific communities in the EOSC era. IBERGRID is a key Operations Centre of the EGI Federation. The resources made available by IBERGRID sites have been instrumental in supporting the four largest scientific collaborations based at the Large Hadron Collider (ALICE, ATLAS, CMS, LHCb).
    
    Speaker: Mario David (LIP)
    
    Slides
  - 17:45
    
    Science Gateways Community Institute: Developing Strategies for Sustainability of Projects via Bootcamps¶ 15m
    
    Sustainability of academic software in general and of virtual research environments (VREs) and of science gateways particularly is a major concern for many academic projects. Solicitations for funding mostly support novel developments and novel research to accelerate science but little to sustain existing computational solutions. The importance of software for science and its sustainability have been recognized in the last decade though and is reflected in the founding of the [UK Software Sustainability Institute (SSI)][1] in 2010 and in the [US Science Gateways Community Institute (SGCI)][2] in 2016 to support academic software and science gateways beyond traditional funding cycles. SGCI serves user communities and science gateway creators to support the growth and success of science gateways in multiple ways and one example is the [Science Gateway Bootcamp][3] organized by the SGCI Incubator service area. The bootcamp is a week-long, intensive workshop for leaders and creators of gateways who want to further develop and scale their work. It addresses sustainability strategies from diverse angles: 1. Core business strategy skills as they apply to leading an online digital presence, such as understanding stakeholder and user needs; business, operations, finance, and resource planning; and project management; 2. Technology best practices, including the principles of cybersecurity; software architecture, development practices, and tools that ensure implementation of strong software engineering methods; usability and 3. Long-term sustainability strategies, such as alternative funding models; case studies of successful gateway efforts; licensing choices and their impact on sustainability. Participants engage in hands-on activities to help them articulate the value of their work to key stakeholders, to create a strong sustainability plan and work closely with one another. 
The concept is to define actionable items for three to six months and to form cohorts who keep in contact and support each other in the continuous process of achieving sustainability. SGCI offers two bootcamps per year in the US, with a maximum of ten teams accepted for each event. To date, three such bootcamps have taken place, with one planned for August 2018. Based on the success of the three events and on lessons learned from them, SGCI's Incubator service area organized a two-day mini-bootcamp in June 2018 in Edinburgh, UK, in collaboration with SSI. Here, too, the feedback was mostly very positive, recognizing that two days can only provide a well-thought-out selection of topics in appropriate depth. A future goal is to develop further, shorter bootcamps on specific topics and to collaborate closely at the international level in order to spread the concept further and train the trainers to scale the support of sustainability via bootcamps. International observers can attend the bootcamps in the US, and discussions are underway with European projects to offer such bootcamps. [1]: https://software.ac.uk/ [2]: https://sciencegateways.org/ [3]: https://ieeexplore.ieee.org/document/8109182/
    
    Speakers: Dr Michael Zentner (Purdue University), Dr Sandra Gesing (University of Notre Dame)
    
    Slides
- 16:30 → 18:00
  Lightning Talks¶ Main Auditorium
  
  Main Auditorium
  
  Lisbon
  
  ISCTE, University of Lisbon
  
  Convener: Karl Meyer (GÉANT)
  - 16:40
    
    WeNMR activities in the EOSC-Hub¶ 5m
    
    Structural biology deals with the characterization of the structural (atomic coordinates) and dynamic (fluctuation of atomic coordinates over time) properties of biological macromolecules and adducts thereof. Since 2010, the WeNMR project has implemented numerous web-based services to facilitate the use of advanced computational tools by researchers in the field, using the grid computational infrastructure provided by EGI [1]. These services have been further developed in subsequent initiatives, such as the West-Life VRC (www.west-life.eu). In particular, the latter project developed an implementation of a cloud storage solution, called VirtualFolder [2], which allows users to connect to their accounts on B2DROP or on public clouds. This solution has been implemented in several thematic portals in order to allow input data to be downloaded from, and calculation results to be uploaded to, the user's cloud storage. Regarding AAI, the thematic portals are transitioning, also in response to the GDPR, to the EGI SSO or other systems that are compatible with it. Finally, all the thematic portals that send calculations to the grid infrastructure are now making use of DIRAC [3]. [1] Wassenaar TA, et al. WeNMR: Structural biology on the Grid. J. Grid. Computing 10:743-767, 2012 [2] https://portal.west-life.eu/virtualfolder/ [3] https://github.com/DIRACGrid/DIRAC
    
    Speaker: Antonio Rosato (CIRMMP)
    
    Slides
  - 16:45
    
    NeIC Dellingr project: long-term cross-border resource sharing.¶ 5m
    
    The NeIC Dellingr project is investigating how a lightweight framework for sharing High Performance Computing (HPC) resources can be implemented between participating countries. These resources will be open to eligible researchers from the participating countries who wish to access resources in other participating countries. This resource sharing includes the case where the computing project is performed in an HPC centre outside the home country of the researcher. National computing centres for academic research are generally funded by ministries responsible for scientific research and higher education, the same ministries that fund universities. The roles of computing centres and universities are distinct: universities carry out scientific research and provide education, and computing centres help them to reach their goals in these functions. Money allocated to computing centres should yield results that are better than, or at least comparable to, those obtained from the same amount allocated to universities. Resource exchange can advance scientific research and education in three ways. First, it can open new research opportunities. Users may have certain technical requirements regarding CPU performance and efficiency, memory size and bus bandwidth, disk storage size and speed, as well as type and speed of interconnects. Other factors users may consider are the type and version of compilers, software and system administration support, and also social and political factors such as available certifications of the system, the source of the electricity and the terms of service of a particular resource provider. Users may also prefer one system to another simply based on the perceived ease of use, level of user support and other intangible or subjective attributes of a particular system. If one centre does not have certain hardware or software that a research group needs, the group can ideally use suitable resources made available from other countries.
Secondly, resource exchange can balance temporary resource shortages, for example during computer procurements. In the time between the decommissioning of a cluster and the availability of its replacement, it is helpful if users do not need to wait for the newly commissioned system but can instead “borrow” resources from another provider. The long-term pool of shared resources can serve as temporary resources for these users. Thirdly, another sharing scenario to consider is when the HPC resources in one country are constantly “overbooked” and queuing times become unacceptably long for the users, while in some other countries there might be an excess of free resources. This lightning talk will present the resource sharing models and the legal and policy issues involved. The results of a first resource sharing pilot and a proposed second pilot will also be presented.
    
    Speaker: John White (NeIC Nordic e-Infrastructure Collaboration)
    
    Slides
  - 16:50
    
    Preventing security incidents in the EOSC-hub era - by evolving Software Vulnerability Handling¶ 5m
    
    The EGI Software Vulnerability Group (SVG) has been handling software vulnerabilities in order to help prevent security incidents in the EGI infrastructure and its predecessor, the EGEE series of projects, for more than a decade. While the procedure has evolved somewhat it has remained focussed on the fairly well defined Grid and later cloud technologies having a fairly standard configuration, with vulnerabilities mostly investigated and risk assessed by the SVG 'Risk Assessment Team' or 'RAT'. During the last year it has become clear that major changes are needed to the way the SVG handles software vulnerabilities due to the proliferation of software and technology, other collaborating infrastructures, lack of homogeneity and above all the services in the EOSC-hub service catalogue. The current 'RAT' cannot be experts in all the various types of software and services, and how software is configured and deployed. Those selecting software or deploying services will need to take responsibility for investigating vulnerabilities in software used to enable their services and the risk to those services. This talk will describe how we plan to evolve the SVG issue handling procedure so that those who select and deploy software and services have a greater role in vulnerability handling, while aiming for a consistent risk assessment so that the most serious vulnerabilities get priority in their resolution. This will include plans for smooth communication with relevant parties such as experts in specific software, infrastructure providers, as well as those providing specific services in the EOSC-hub catalogue which depend on specific pieces of software. It will also inform service providers what they should do to help SVG to help them ensure that their services are as free from vulnerabilities as possible, to minimize the risk of security incidents due to software vulnerabilities concerning their services.
    
    Speaker: Linda Cornwall (STFC)
    
    Slides
  - 16:55
    
    Growing the data science community by expanding the CODATA/RDA school model¶ 5m
    
    Various reports have commented on the shortage of individuals skilled in Research Data Science worldwide, which limits the transformative effect of the data revolution. Given the extent of the shortage, models are required to rapidly increase the cohort of researchers equipped to do data science and to empower them to be ambassadors for their fields in teaching others. The CODATA-RDA School for Research Data Science has established a successful two-week curriculum to provide a foundational level of data science skills to Early Career Researchers from a wide range of disciplinary backgrounds. The course covers the principles and practice of Open Science, research data management, using data platforms and infrastructures, data annotation, analysis, statistics, visualisation and modelling techniques. Students are taught in a computer lab setting with many hands-on exercises using open source tools, allowing them to learn new technologies and return home with access to the software they need. Since the inaugural school in Trieste, Italy in August 2016, annual events have continued in Italy and other regional hubs have been established in Latin America and Africa. In collaboration with the International Centre for Theoretical Physics (ICTP) and its sister sites, we are primarily bringing data schools to researchers from Lower and Middle Income Countries, with the intention of reducing the digital divide. There is, however, considerable demand for these schools across Europe, North America and Australasia too, prompting us to consider business models to increase the delivery of schools and grow the community of data scientists worldwide. The schools have helped many individuals to take their learning further. We run a student helper programme in which participants return as classroom assistants to support the tutors in facilitating hands-on exercises. This has offered new perspectives and increased the insights gained, enhancing their learning.
(See: https://researchdata.springernature.com/users/81866-sara-el-jadid/posts/29719-enriching-my-learning-by-helping-others and https://researchdata.springernature.com/users/81847-marcela-alfaro-cordoba/posts/29656-my-journey-towards-open-science) Many have also gone on to run their own schools locally, increasing the reach of the schools on a train-the-trainer basis. This year, two previous students ran an Urban Data Science school in India, applying the lessons from the CODATA school to their peers. (See: https://shailygandhi.github.io/Urban-Data-Science-Curriculum-Development) This paper will report on the use of the CODATA school model by others and our plans to expand this. A one-week school was run in Australia this June, taking the course materials as a base. In order to scale up this provision, we are establishing a set of requirements and a process for others to replicate the content (e.g. to retain agreed core elements of the curriculum, to adopt recommended teaching styles, to use open tools so participants retain access etc). A fee structure is also being proposed to endorse / badge these affiliated schools as following the CODATA/RDA model, and to provide a sustainability model for the core LMIC provision.
    
    Speaker: Sarah Jones (UG)
  - 17:00
    
    Sustainable Research Software – Managing a Common Problem of SSH Infrastructures¶ 5m
    
    Research software that enables digital scholarly tools and services is a major building block of open science in today’s research environment. This is also apparent for Research Infrastructures (RI) in the Social Sciences and Humanities (SSH) community.
    
    These RIs, such as *CESSDA*, *CLARIN* and *DARIAH*, have been set up to support scholars in their research and have a long tradition of supporting open science, particularly through their FAIR data management solutions. One of the major challenges emerging from the operation of these digital RIs is the sustainable management of the research software used to build the components. While general consensus reigns about the need to apply state-of-the-art software engineering principles and industry standards to the development and maintenance of software and services, the implementation proves hard.
    
    Continuing from a joint workshop in 2017 we are currently undertaking measures to align existing efforts towards a common understanding of technical requirements and recommendations. This includes the Software Maturity Modelling developed by CESSDA, the Software Quality Guidelines developed by CLARIAH and the Technical Reference originating from DARIAH.
    
    Building upon these technical foundations, we also want to help promote software best practices in teaching and education, ideally as part of curricula, to widen awareness of software quality requirements throughout the research community and among its software engineers. While adding further requirements to software projects invariably leads to increased development cost and time, re-usability of software and thus reproducibility of the results must become an everyday research practice. Just as classic publications and, increasingly, research datasets are subject to quality assurance, the software used to create them must be as well, in order to fully support the research process and advance scholarly and scientific knowledge through open science.
    
    This cooperation is being streamlined under the umbrella EURISE Network, where research infrastructures meet research software engineers, to strengthen the combined foundations for future collaborations of e-infrastructures and the emerging EOSC. We present the current state of this initiative and explain ongoing efforts towards a common set of guidelines and evaluation criteria. We explain why and how our emphasis on improving software quality will ultimately benefit openness and re-usability of science and research data.
    
    Speakers: Dr Carsten Thiel (CESSDA ERIC), Dr Tibor Kalman (GWDG)
    
    Slides
  - 17:05
    
    Federated Identity Management for Research¶ 5m
    
    Granting researchers access to our Digital Infrastructures is a fundamental step in Serving the User Base - this year’s conference theme. However, providing a secure, user-friendly, reliable Authentication and Authorisation Infrastructure (AAI) is not a walk in the park for Research Communities. Challenges range from attribute release, to operational support, to non-web access, with many Communities looking outside to technology providers and generic e-Infrastructures to find a sustainable solution for their critical components. 2018 saw over 20 Research Fields come together and set out their common requirements for Federated Identity from the wider community. These requirements, and a related set of recommendations, can be found at [https://fim4r.org/documents/][1] and are already being incorporated into the road maps of future projects. We present an overview of the insights collected by the FIM4R Research Communities and look to the future. How will the recommendations help to shape the evolution of Federated Identity Management? [1]: https://fim4r.org/documents/
    
    Speaker: Ms Hannah Short (CERN)
    
    Slides
  - 17:10
    
    Interdisciplinary research data management service for the whole universities and research institutions in Japan that emphasizes research integrity¶ 5m
    
    This research describes the development progress, as of 2018, of GakuNin RDM (1), a nationwide research data management (RDM) service promoted by the Cabinet Office of the Japanese government. GakuNin (2) is the academic access management federation in Japan, and RDM stands for research data management. First, in Japan, the results of publicly funded research are, in principle, becoming public. In addition, researchers are obliged to submit a data management plan and to preserve research data for ten years from the viewpoint of research integrity. Although infrastructure and guidelines for unified research data management do not yet exist, the government has requested that academic institutions develop them as soon as possible. Second, the National Institute of Informatics (NII) of the Research Organization of Information and Systems (ROIS) in Japan provides research data infrastructure to 850 domestic academic institutions. For example, SINET (3) is a 100 Gbps high-speed network for science that is connected to GÉANT and Internet2. GakuNin is an authentication federation that is compatible with eduGAIN, GakuNin Cloud is a cloud consulting service for universities, CiNii (4) Research is a discovery service for academic information and research data, and JAIRO Cloud is a SaaS institutional repository based on the repository software WEKO (5). OpenAIRE harvests all research article metadata from the institutional repository database operated by NII. Third, the Research Center for Open Science and Data Platform (RCOS) of NII began developing GakuNin RDM in 2016 in response to requests for research data management support from academic institutions' IT centers, libraries, university research administrators, legal and intellectual property departments, and boards. We have adopted the Open Science Framework (OSF) (6) as the core system of GakuNin RDM. Open science promotion in Japan focuses on preventing scientific misconduct.
In particular, we extended the system with a function that applies commercial timestamps to file operations, and we provided operation logs to store aggregated research data. We have also developed functions for administrators to customize the user interface for each university and research institution. These include functions to control the use of GakuNin RDM add-ons, to create usage statistics reports and announce them to administrators, and to let institution managers make announcements to end users. Furthermore, we developed several add-ons not found in OSF and strengthened GakuNin RDM as a research data infrastructure. Examples include an add-on for WEKO, which is used by more than 500 organizations in Japan, and add-ons for integration with JupyterHub, a data analysis platform, and the workflow tool Galaxy. In this research, we introduce the Japanese research data management service and discuss whether we can collaborate with European research data infrastructure. **References** (1): https://doi.org/10.1109/IIAI-AAI.2017.144 (2): https://doi.org/10.1109/SAINT.2010.14 (3): https://doi.org/10.1109/ICUFN.2016.7536928 (4): http://dl.acm.org/citation.cfm?id=1670638.1670658 (5): https://doi.org/10.1007/978-3-319-23207-2_40 (6): https://doi.org/10.5195/jmla.2017.88
    
    Speaker: Dr Yusuke Komiyama (National Institute of Informatics)
  - 17:15
    
    A campus wide ePosters management system: KAUST Library initiative to build Digital Infrastructure to promote Open Access and Digital Preservation¶ 5m
    
    King Abdullah University of Science and Technology (KAUST), established in 2009 as an international research University in Saudi Arabia, has adopted the first Open Access mandate for scientific publications in the region and leads with a well-established research repository managed and promoted by the University Library. Although several scientific poster events are held on campus each year, with hosting supported by the Library, printed posters have remained static, highly localized and short-lived. These characteristics are at odds with what is often the first formal communication of scientific research and, as such, would be of great interest to other researchers. Addressing these limitations was a major motivator behind the trialing of an ePoster alternative at KAUST. This project was conceived, piloted and will be implemented and managed by the University Library, in collaboration with IT Services. In addition to digitally capturing research content for display and preservation, ePoster functionality changes the engagement dynamics whilst helping to bridge the gap between academia and professional practice. ePosters have been extensively embraced by international professional organizations; academic institutions, however, remain bound to printed posters. This project identified a short-list of companies that met the criteria KAUST set as requirements for its campus-wide ePoster management system. The evaluation process included student and researcher participation, as well as webinars and demonstrations, and culminated in site visits to the company headquarters of the two finalists. The preferred supplier was then involved in several pilot conferences at KAUST to demonstrate their system’s capabilities and, as importantly, to expose academic staff to ePosters in operational settings. Surveys of conference participants, academic staff, students and conference organizers were conducted to obtain feedback on and reactions to this approach.
Advantages were both obvious and embraced by respondents; they appreciated the functionality, which included ongoing editing and/or updating of content by authors, the ability of organizers to monitor the progress of submissions and control content display, and the availability of statistics via a dashboard. ePoster presentations engage the audience better; they are more interactive, dynamic and informative as a result of incorporating high-resolution images and videos (with associated zoom capabilities) and audio. In addition, the elimination of print and poster mounting aligns with KAUST’s commitment to environmental stewardship, and a direct upload of content to the Research Repository supports open access to scientific output. Interest in ePosters is expanding; this has seen the Library involved in associated skills training and outreach. Academia is notably behind this practitioner-driven trend. KAUST Library believes that, by rolling out an ePoster system to the University, KAUST is the first campus in the world to offer this as a campus-wide solution, truly reflecting KAUST’s vision of a digital smart campus.
    
    Speaker: Garry Hall (King Abdullah University of Science and Technology)
  - 17:25
    
    De-provisioning in context of AAI¶ 5m
    
    De-provisioning of users’ data from the end service is an important yet often neglected aspect of the whole AAI lifecycle. Services need to be notified when a user leaves the organization or project so that they can initiate the clean-up processes. De-provisioning becomes a big issue especially in the context of the management of private data (incl. GDPR) and of services which hold users’ persistent data. Moreover, de-provisioning mechanisms can be used as part of the security incident mitigation process to disable or suspend compromised accounts. In the lightning talk, we will emphasize the critical aspects of de-provisioning processes and demonstrate the requirements on particular use cases.
    
    Speaker: Slavek Licehammer (CESNET)
    
    Slides
  - 17:30
    
    Challenges in building Virtual Research Environments¶ 5m
    
    Virtual Research Environments (VREs) are trending. As data and processing become bigger, more distributed and more collaborative, more and more research communities call for a VRE to execute data-driven science on the cloud. The advantages are obvious: processing is no longer bound by the computing power and memory of the user's laptop. Large datasets do not have to be downloaded to local disk before they can be processed. This is an advantage especially for researchers from institutions or locations where access to good hardware, large network bandwidth or performant computing facilities is difficult to obtain. To be attractive and useful to users, VREs need to provide efficient access to interesting datasets. In conjunction with Open Data, accessed efficiently through a VRE, they can be a catalyst for Open Science. If designed with this intention, processing results can easily be shared and openly published in their turn. Similarly, the processing workflows can often be made available to and reproducible by others. This encourages the FAIRness of not only the data, but also the processing services. Developing such a VRE holds some challenges, mainly because of the multitude of actors and tools. Within this lightning talk, examples from the geosciences will be used to highlight specific challenges and solutions. On a desktop, every researcher puts together their own collection of resources, tools and applications. A VRE generally tries to replace that desktop environment with an online environment. As such, it is usually aimed at a larger group of different users and thus has to cater to varied needs. The tools that researchers use for their work exist already and need to be incorporated into the VRE. They may be quite diverse, written in different programming languages, built on different frameworks, etc.
As re-implementing the tools is of course not an option, a way must be found to integrate diverse existing applications efficiently into a common VRE and to keep the VRE extensible so that future services can be included. Another challenge is the development mode. Often, VREs are not commercial software products developed by commercial software companies; they are developed in research communities or in consortia between research institutions and research infrastructure providers. This leads to distributed, heterogeneous development teams, which require additional effort for management and effective communication. Closely related to this is the funding scheme. Not being commercial products, VREs need their development and operation to be funded through other mechanisms. Typical ones are H2020, national or even cross-institutional funding. Such programmes usually fund development efforts, but operations and hardware acquisition are often not sufficiently funded.
    
    Speaker: Merret Buurman (German Climate Computing Centre (DKRZ))
  - 17:35
    
    eInfraCentral – Helping users focus on being users¶ 5m
    
    eInfraCentral is a coordination and support action funded under the EU’s Horizon 2020 framework programme. Its mission is to ensure that by 2020 **a broader and more varied set of users benefits from European e-Infrastructures**. eInfraCentral is one of the key initiatives **driving implementation of the European Open Science Cloud.** The talk will consist of three parts: **1. The challenge for research communities:** Due to a fragmented e-infrastructure landscape, end-users, such as researchers, innovators or industry actors, often are unaware of the e-infrastructure services available in Europe that could aid their work. Similarly, service providers and data producers have difficulty reaching out to potential users due to the lack of coordination and harmonisation across various e-infrastructures. Even if users find out about the availability of a certain e-service, it is difficult to gather further information and compare it with other existing services. Service providers also lack user feedback on the ways they could improve their offerings. This leads to inefficient funding patterns through the emergence of overlapping efforts and as such, slower rates of open innovation due to the lack of competition in the field. **2. eInfraCentral brings the solution:** eInfraCentral is one of the core initiatives in the implementation of the European Open Science Cloud, actively contributing to the building of the EOSC service catalogue and portal. eInfraCentral creates a unified online service catalogue where users can search, browse, compare and access e-services. Users can also rate services, helping service providers to improve their offerings, which is also aided by the availability of usage statistics on the service level. The eInfraCentral’s standard Service Description Template and catalogue were designed via an open and guided discussion with the e-Infrastructure community. 
This joint approach to defining and monitoring e-infrastructure services helps increase their uptake and enhances understanding of where improvements can be made in delivering and professionalising services. Moreover, eInfraCentral also facilitates the development of a shared language to describe services across the e-infrastructure community, fostering cooperation between infrastructure projects, communities and initiatives. eInfraCentral helps to initiate new service offerings and to engage with a broader set of users and needs, thus speeding up the creation of innovation through Open Science. **3. Call to action:** The audience will be invited to engage with eInfraCentral in a number of ways, such as i) exploring the updated version of the eInfra Portal and leaving feedback that will help the project team improve it; ii) learning about eInfraCentral through the poster and website; and iii) following project developments by signing up to the newsletter and engaging with it through social media updates. The eInfraCentral team believes that the audience could greatly benefit from learning about eInfraCentral, as many of the conference participants could utilise the project outcomes (the portal, the service catalogue and the standard Service Description Template that will be fed into the development of the EOSC) in their daily work, both from the service provider and the end-user/researcher side.
    
    Speaker: Jelena Angelis (European Future Innovation System (EFIS) Centre)
    
    Slides
  - 17:45
    
    EOSC-hub market research and business model analysis: call to action¶ 5m
    
    The EOSC-hub project is conducting a market analysis to increase the understanding of the demand for digital services and resources for research over the coming years. It also seeks to understand which business and procurement models would reduce time, effort and risk while increasing cost effectiveness, especially for organisations that lack procurement experience. The goal of this lightning talk is to advertise the current activities and stimulate engagement from the community, who can contribute by attending face-to-face interviews that will be conducted during the event or by filling in online surveys.
    
    Speaker: Sergio Andreozzi (EGI.eu)
    
    Slides
- 16:30 → 18:00
  Tools to support researchers¶ Auditorium B203
  
  Auditorium B203
  
  Lisbon
  
  ISCTE, University of Lisbon
  
  Convener: Dr Alvaro Lopez Garcia (CSIC)
  - 16:30
    Open, Effective and Innovative tools to support researchers in Worldwide Infrastructures¶ 1h 25m
    
    Modern science is increasingly becoming computational. Therefore, for the future advance of science it will be indispensable to provide scientists with the proper computational tools, breaking down the technological barrier they have been facing so far. Due to the advent of the cloud computing model and orchestration tools, the resources once identified as sites in the e-Infrastructures have become “liquid” and highly dynamic. Sites can be created, destroyed, attached to and detached from the infrastructure with a few mouse clicks, at a rate inconceivable only a few years ago. Nowadays, use cases requiring sites with a customized configuration that need to interact with the rest of the infrastructure are becoming more and more frequent. Relevant examples are the use of resources temporarily available in HPC centers, or the creation of diskless sites to cope with peak user activity. To address these computational needs, new functionalities in the field of data management and new paradigms involving hybrid computational resources have to be developed and implemented. As an example, the vision of bringing hybrid cloud solutions into applications is further pushed by additional use case scenarios, such as moving data from closely shielded HPC systems towards more open cloud systems, or applying advanced machine learning algorithms on top of large data streams (e.g. in intrusion detection systems). These recent advances offer potential solutions to the technological challenges represented by intensive computing use cases. Container technology allows moving entire computer applications over the internet so that they can be executed on various hardware platforms. Appropriate orchestrator solutions able to run applications on a hybrid cloud environment (i.e. different infrastructures and environments, including GPUs) are now available.
The development and adoption of new solutions for data lifecycle management, together with the federation of storage resources using standard protocols and smart caching technologies, will explicitly reduce data movements and improve access latency. Moreover, new storage models based on policy-driven data management and Quality of Service, metadata handling and manipulation, and data processing during ingestion will enable data distribution according to specific and complex policies aimed at speeding up the analysis by exploiting various storage types. This World Cafe session will cover the issues mentioned, showing technological advances in operation from a user perspective. In particular, we aim to show how a user community could benefit from the services released by the EU-funded DEEP-Hybrid DataCloud and eXtreme DataCloud projects to better implement their user stories, with a more powerful and easy-to-exploit approach. In order to make the European Open Science Cloud (EOSC) a viable vision, those services are expected to become a reliable part of the final solutions available in the EOSC Service Catalogue and made available to researchers.

    Tentative agenda:
    - Understanding modern research requirements
    - Advanced services on Hybrid DataClouds
    - Advanced services on data management for distributed e-infrastructures
    - Common use case scenarios
    - Solutions adopted by external communities
    - Discussion
    
    Speaker: Mr Alvaro Lopez Garcia (CSIC)
    
    Slides
    
    2018-10-10-DEEP-technology.pdf
    
    2018-10-10-world-cafe-intro.pdf
    
    2018-10-10-xdc-users.pdf
    
    DEEP-WorldCafe_DI4R_Lisbon_Ca.pdf
    
    XDC_DI4R_Oct2018_DEEP_Joint.pdf
Thursday, 11 October ¶
- 09:00 → 11:00
  Plenary¶ Main Auditorium
  
  Main Auditorium
  
  Lisbon
  
  ISCTE, University of Lisbon
  
  Convener: Volker Guelzow (DESY)
  - 09:15
    
    Michael Wise - SKA (Keynote 2)¶ 45m
  - 10:00
    
    Mike Payne - EPSRC (Topical 4)¶ 30m
    
    Slides
  - 10:30
    
    HPC Simulation of and Simulation on Quantum Computers and Quantum Annealers¶ 30m
    
    A quantum computer (QC) is a device that performs operations according to the rules of quantum theory. There are various types of QCs, of which nowadays the two most important ones considered for practical realization are the gate-based QC and the quantum annealer (QA). Practical realizations of gate-based QCs consist of fewer than 100 qubits, while QAs with more than 2000 qubits are commercially available. We present results of simulating on the IBM Quantum Experience devices with 5 and 16 qubits, on the CAS-Alibaba device with 11 qubits and on the D-Wave 2X QA with more than 1000 qubits. Simulations of both types of QCs are performed by first modeling them as quantum systems of interacting spin-1/2 particles and then emulating their dynamics by solving the time-dependent Schrödinger equation. Our software allows for the simulation of a 48-qubit gate-based universal QC on the Sunway TaihuLight and K supercomputers.

    References:
    - K. Michielsen, M. Nocon, D. Willsch, F. Jin, T. Lippert, H. De Raedt, Benchmarking gate-based quantum computers, Comp. Phys. Comm. 220, 44 (2017)
    - D. Willsch, M. Nocon, F. Jin, H. De Raedt, K. Michielsen, Gate error analysis in simulations of quantum computers with transmon qubits, Phys. Rev. A 96, 062302 (2017)
    - H. De Raedt, F. Jin, D. Willsch, M. Nocon, N. Yoshioka, N. Ito, S. Yuan, K. Michielsen, Massively parallel quantum computer simulator, eleven years later, arXiv:1805.04708
    - D. Willsch, M. Nocon, F. Jin, H. De Raedt, K. Michielsen, Testing quantum fault tolerance on small systems, arXiv:1805.05227
    - K. Michielsen, F. Jin, and H. De Raedt, Solving 2-satisfiability problems on a quantum annealer (in preparation)
    
    Speaker: Kristel Michielsen (FZ Juelich)
    
    Slides
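
The abstract above concerns simulating gate-based quantum computers on classical machines. As a rough illustration of why this needs supercomputers, the following is a minimal state-vector sketch in plain Python. It is not the authors' simulator (which, per the abstract, emulates the dynamics by solving the time-dependent Schrödinger equation); it only applies ideal gates to a vector of 2^n complex amplitudes. The function names and the figure of 16 bytes per amplitude (one double-precision complex number) are assumptions made for this example.

```python
import math


def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target` of a state vector (list of complex)."""
    stride = 1 << target
    new_state = list(state)
    # Amplitudes come in pairs whose indices differ only in bit `target`.
    for base in range(0, len(state), 2 * stride):
        for offset in range(stride):
            i0 = base + offset
            i1 = i0 + stride
            a0, a1 = state[i0], state[i1]
            new_state[i0] = gate[0][0] * a0 + gate[0][1] * a1
            new_state[i1] = gate[1][0] * a0 + gate[1][1] * a1
    return new_state


def apply_cnot(state, control, target):
    """Flip qubit `target` on every basis state where qubit `control` is 1."""
    new_state = list(state)
    for i in range(len(state)):
        if (i >> control) & 1 and not (i >> target) & 1:
            j = i | (1 << target)
            new_state[i], new_state[j] = state[j], state[i]
    return new_state


def state_vector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory for a full state vector; 16 bytes per amplitude is an assumption."""
    return (1 << n_qubits) * bytes_per_amplitude


# Hadamard gate as a 2x2 matrix.
HADAMARD = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
            [1 / math.sqrt(2), -1 / math.sqrt(2)]]

# Prepare a two-qubit Bell state starting from |00>.
state = [0j] * 4
state[0] = 1 + 0j
state = apply_single_qubit_gate(state, HADAMARD, target=0)
state = apply_cnot(state, control=0, target=1)
probabilities = [abs(amplitude) ** 2 for amplitude in state]

# Memory needed for 48 qubits under the 16-bytes-per-amplitude assumption.
pebibytes_for_48_qubits = state_vector_bytes(48) / 2 ** 50
```

Under that assumption a full 48-qubit state vector occupies 2^52 bytes (4 PiB), and each extra qubit doubles the requirement. A simulator that stores amplitudes more compactly would need proportionally less.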
- 11:00 → 11:30
  
  Coffee 30m
- 11:30 → 13:00
  
  Building better collaborative national networks to support Open Science¶ Auditorium B203
  
  Auditorium B203
  
  Lisbon
  
  ISCTE, University of Lisbon
  
  This session puts forward an interactive world-cafe style workshop which will build on previous joint workshops and meetings (during the Open Science Fair and DI4R in 2017, and as a webinar in 2018) to bring together national representatives of identified infrastructures, including OpenAIRE, EOSC-hub, GEANT, PRACE and RDA Europe. Project representatives and national coordinators will discuss how to work together and align on a range of activities, including community engagement, support, outreach, FAIR data, training, open science and policy. After a series of short introductions by the infrastructures, there will be a chance for updates on the collaboration and a few lightning talks by national representatives to present existing collaborations among their national nodes, highlighting good practices for others. The audience will then have the chance to discuss the status of collaborations inside national nodes; national activities and future possibilities for collaboration, integration and harmonisation; experiences in managing networks and engaging with research communities; and gaps in the support of Open Science. They can also suggest ways of filling these gaps through national or international initiatives, particularly ones supported by the organisers OpenAIRE-Advance, EOSC-hub, GEANT, PRACE and RDA Europe.
  
  Notes-DiscussionGroup-East
- 11:30 → 13:00
  Innovation for Open Science with SMEs¶ Auditorium B104
  
  Auditorium B104
  
  Lisbon
  
  ISCTE, University of Lisbon
  
  Convener: Jelena Angelis (European Future Innovation System (EFIS) Centre)
  - 11:30
    
    Business or Research Project? A case study of the evolving business model of HUBzero¶ 15m
    
    Sustainability is a state that many science gateway efforts strive toward; however, this is still elusive. The HUBzero® platform has seen several phases of evolution on its sustainability path since it was first founded in 2007, and earlier existed as the infrastructure running the nanoHUB.org science gateway since 2002. A key learning is that there are several turning points that gradually take an effort from a project more toward operating as a business as it becomes self sustainable. The science gateway nanoHUB.org was created under the vision of Professor Mark Lundstrom at Purdue University in 1998 as a focused functionality site for submitting simulation jobs to high performance computing resources and downloading results. In 2002, nanoHUB.org became the online delivery vehicle for the newly funded Network for Computational Nanotechnology. As users desired more functionality, a larger software team was built with expertise in middleware, web front end, database, and operations. All of these functions one would begin to recognize as development and operations within a commercial software enterprise. Coupled with the growth of the software team, Professor Lundstrom added Professor Gerhard Klimeck to the team as technical director, and later as director. Professor Klimeck took nanoHUB.org beyond the visionary founding, and worked within the nano communities to scale up the user base. As the nanoHUB.org team grew, an annual National Science Foundation review panel suggested that the infrastructure could be used to run many science gateways, not just nanoHUB.org. At the same time, the software team was large enough that a career path beyond one project was desirable. In 2007, the team was therefore relocated from the research project to the Research Computing group in Information Technology, under the leadership of Dr. Michael McLennan and became the HUBzero group. The unit became responsible for its own revenue and began scaling out across many communities. 
Additional personnel were added to develop and run a reliable infrastructure with high uptime, to provide front line customer service, and to handle additional development tasks. The Purdue University model for operating such a group is a “recharge center”, where the group is allowed to run in a non-profit manner. In 2015, Dr. McLennan left, and Dr. Michael Zentner became director. By this time, the HUBzero group had operated more than 30 science gateways, and many others were using the open source HUBzero platform to run gateways. A key learning was that the recharge center model hindered platform innovation. The original costs of operation did not include several essential functions: internal research and development to continue innovating the platform and to replace aging functionality, sales and marketing to continue to grow the community, and helping HUBzero clients sustain their science gateways beyond their initial funding period. Today the HUBzero team comprises 25 full-time professionals, has operated cash flow positive for 3 consecutive years, and is addressing these needs by altering the team composition and adapting its platform and business offerings, including OneSciencePlace.org to sustain gateways.
    
    Speakers: Dr Michael Zentner (Purdue University, HUBzero), Dr Sandra Gesing (University of Notre Dame), Silvia Olabarriaga (University of Amsterdam)
  - 11:45
    
    Towards a common approach on KPIs from e‑Infrastructures¶ 15m
    
    Key Performance Indicators (KPIs) will play an important role in monitoring the development of projects and services of research infrastructures and e-Infrastructures, and their commitment to the principles of Open Science. They help measure the effectiveness of investments in infrastructure, and can provide funders with convincing arguments for sustainable support. The joint presentation of the EU-funded projects e-IRGSP5 and eInfraCentral will explain how the projects work together on developing methodologies to collect and aggregate KPIs and other performance-related information from European e-Infrastructures and several key projects. e-IRGSP5 places a strong emphasis on financial and policy indicators and is interested in developing a broad overview of metrics used by e-infrastructures and related projects, whereas eInfraCentral is focused on operational KPIs. For objective criteria to exist and for any comparison to be meaningful, there needs to be an agreement among the e-infrastructure community on a lightweight and easy-to-use framework based on reliable data and meaningful metrics. Currently, some e‑Infrastructures have existing methodologies to assess their own performance. However, due to a lack of consensus on how KPIs should be categorized and calculated, this information is difficult to interpret and compare. Thus, a process to obtain, categorize and present them to the public has been defined and is being implemented, fostering collaboration across the e-infrastructure community. Along with the metrics gathered and their initial analysis, some state-of-the-art KPI examples will be presented. These were obtained from specific projects tasked to suggest financial and policy-related KPIs, such as e-Fiscal and LEARN, as well as the GEANT and EGI Compendia.
From our analysis of these state-of-the-art KPIs, we suggest a basic minimum set of general financial and policy KPIs for projects to adhere to, and possibly to expand upon with more project-specific KPIs. In addition, we will suggest a common vocabulary to define KPIs and related metrics, in an effort to standardize terminology across performance monitoring efforts in e-infrastructures. We will also present the preliminary results of an active discussion with e-Infrastructures with the goal of developing a common approach that can be used to collect data and analyse KPIs. We believe that European e-Infrastructure projects and the EOSC can benefit from a structured exchange of know-how and practices on KPIs and related metrics in providing robust information to funders and policy makers on their success and in developing and improving the services offered.
    
    Speaker: Dr Fotis Karagiannis (Independent)
    
    Slides
  - 12:00
    
    Enlighten Your Research – travels around the world¶ 15m
    
    Enlighten Your Research (EYR) is a programme designed to increase the use and awareness of e-infrastructure resources in various fields of research. The goal of EYR is to provide access to and support for network, compute, and storage resources to meet the growing data needs of research, in addition to inspiring new collaborations and understanding existing ones between Europe and other major regions of the world, such as India with the NKN Network or the Eastern European Partnership (EaP) countries. The first EYR programme was started by SURFnet, the Dutch research and education network, to promote the adoption of point-to-point network connections for research collaborations. Over a couple of iterations of the Dutch EYR programme, and in an effort to further meet the needs of researchers, resources from other e-infrastructures (such as high performance computing hours, or programming expertise to process researchers’ data) were also included in the programme ‘awards’. The idea of the EYR programmes has now been taken up by GÉANT to foster international research collaborations and to promote the use of GÉANT’s global links connecting European e-infrastructure resources. This Lightning Talk will feature the challenges of running the international EYR programme as regional editions, with the objective of initiating challenging international research collaborations in Networking and Data-Intensive Research and fostering cooperation between the pan-European e-infrastructure GÉANT and NREN Associations from other regions of the world.
    
    Speakers: Dr Leonie Schäfer (DFN e.V.), Mary Hester (SURFnet BV)
    
    Slides
- 11:30 → 13:00
  The Frontier of Data Discovery¶ Main Auditorium
  
  Main Auditorium
  
  Lisbon
  
  ISCTE, University of Lisbon
  Open Science reproducibility and the optimal use and reuse of research data can only be realised if data is consistently maintained according to the FAIR principles (findable, accessible, interoperable and re-usable) within a secure and trustworthy environment. In the current era, in which data produced through science is growing exponentially, is generated in more automated ways, and increasingly needs to be shared among fellow researchers within and across scientific disciplines, making research data discoverable is an essential step. Scientific communities and data providers have adopted very different standards to describe scientific output. This makes it difficult to extract enough content-related information to enable cross-disciplinary search and to link scientific output to publications. Scientific output is stored in a highly distributed way across European, national, regional, institutional and/or community-based repositories. Via OpenAIRE and EUDAT, cross-disciplinary data discovery services are being provided in which metadata from many of these repositories is harvested and presented in a simple and user-friendly way.
  
  In data discovery, there is a high reliance on data providers and on the quality of the information provided. The semantics in which relationships between datasets and publications are described are heterogeneous across communities. The granularity at which datasets are described is perceived in different ways across disciplines. A dataset can consist of a single object or a few objects, or of a large number of objects amounting to Terabytes or even Petabytes of data. For example, bio-databases of sequences can bear millions of links between one publication and millions of sequences, and there is no formal way to identify sets of sequences. There are still many challenges to overcome, for example: lack of standards for, or use of, licenses; poor descriptive metadata; heterogeneous ways to refer to formats and schemas; how to link datasets to research communities; how versioned datasets can be referenced or discovered; how to handle deduplication of links when information is collected at different places; and how the quality of data can be measured in terms of access (usage stats), liveliness (#versions), citations or feedback from users. Where can we provide added value to the individual researcher, the research community and other stakeholders active within the science domain?
  
  In this World Cafe session, we present the current state of data discovery through the work of the OpenAIRE and EOSC-hub projects and from the perspective of a community. Via a panel discussion, valuable feedback will be collected from the audience and presenters on the current status and future directions for improving data discovery.
  
  Target audience
  - Community representatives and data managers with an interest in extending data discovery
  - Data repository owners who want to make research data findable through EOSC
  Convener: Mark Sanden (SURFsara BV)
  - 11:30
    
    OpenAIRE Research Community Dashboard¶ 20m
    
    Speaker: Paolo Manghi (Istituto di Scienza e Tecnologie dell'Informazione - CNR)
    
    Slides
  - 11:50
    
    EUDAT B2FIND¶ 20m
    
    Speaker: Claudia Martens (Deutsches Klimarechenzentrum / German Climate Computing Center)
    
    Slides
  - 12:10
    
    Metadata in ICOS¶ 15m
    
    Speaker: Alex Vermeulen (ICOS ERIC)
    
    Slides
  - 12:25
    
    Metadata in Astronomy - LOFAR¶ 15m
    
    Speaker: Oonk Oonk (SURFsara BV)
    
    Slides
  - 12:40
    
    Panel discussion¶ 20m
- 11:30 → 13:00
  Training: Security Incident Management¶ Room C103
  
  Room C103
  
  Lisbon
  
  ISCTE, University of Lisbon
  
  Convener: Dr Sven Gabriel (NIKHEF)
  - 11:30
    
    Security Incident Management in the EOSC era Part-1¶ 1h 30m
    
    The security training proposed here would be split into two sessions, focusing on different areas of incident handling. An important area that will be highlighted is the close collaboration of experts necessary for the successful resolution of a security incident in the EOSC era. The first session targets the more technically oriented attendees. Here, after an introduction to forensics, the participants will have to analyse images provided by a security team of a FedCloud site. The results of the investigations will be used as input for the second session, where the case will be handled within a role-play involving the various service providers active in the EOSC-hub project, including identity providers, SIRTFI, the service catalogue, and the infrastructures coordinated by EGI and EUDAT. The goals of this training are twofold. The first goal is to explore the collaboration between project members with a managerial background and those with a technical background. The second goal is to examine the existing set of policies and procedures in order to challenge them and identify possible issues. It is hoped that this will help to prioritize the security-related activities within the EOSC-hub project.
    
    Speaker: Daniel Kouril (CESNET)
- 13:00 → 14:30
  
  Lunch 1h 30m
- 14:30 → 16:00
  Digital Innovation Hubs for Industry Engagement¶ Auditorium B104
  
  Auditorium B104
  
  Lisbon
  
  ISCTE, University of Lisbon
  
  Convener: Sy Holsinger (EGI.eu)
  - 14:30
    
    Session Intro¶ 5m
    
    Digital Innovation Hubs is a concept developed by the European Commission under the Digital Single Market as a mechanism for private companies to collaborate with public sector institutions in order to access technical services, research data, and human capital. There is a network of Digital Innovation Hubs in place across Europe, already supporting sectors such as manufacturing, internet of things, cybersecurity or cognitive computing. The EC aims to support a Pan-European network of DIHs and has earmarked 500M€ from Horizon 2020 budget to support the development of DIHs, of which 300M€ for WP2018-2020. The EOSC-hub Digital Industry Hub (DIH) will enrich the network by bringing private companies into the European Open Science Cloud through piloting concrete business cases. The EOSC-hub DIH builds on individual public e-Infrastructures business engagement programmes and outreach activities in place for several years. The added value brought through a joint effort is in packaging a wider variety of services and expertise into a more coherent offer that would otherwise have to be accessed individually or compiled on their own. In addition to supporting individual companies, one of the key activities of the EOSC-hub DIH is to connect with regional and pan-European networks of Digital Innovation Hubs. Therefore, this session is designed to 1.) showcase the EOSC-hub Digital Industry Hub (DIH) structure and engagement model, promote the availability of services for industry and highlight the variety of business pilots that will be starting to produce results and create new value 2.) gather existing European DIHs and initiatives to facilitate a closer collaboration with the European Open Science Cloud and further implement the EC objective of creating a pan-European network of DIHs.
    
    Speaker: Sy Holsinger (EGI.eu)
    
  - 14:35
    
    EC DIH Initiatives: the wider context¶ 10m
    
    Speaker: Roberta Piscitelli (EGI.eu)
    
  - 14:45
    
    EOSC Digital Innovation Hub (DIH): Digitizing Industry through EOSC-hub¶ 10m
    
    Speaker: Sy Holsinger (EGI.eu)
    
  - 14:55
    
    EOSC-hub Commercialisation support services¶ 10m
    
    Speaker: Nuno Varandas (F6S)
    
  - 15:05
    
    CloudiFacturing: the first wave of manufacturing SMEs supported by DIHs¶ 15m
    
    The mission of the H2020 CloudiFacturing consortium is to optimize the production processes and producibility of manufacturing SMEs using Cloud/HPC-based modelling and simulation. The supported experiments leverage online factory data and advanced data analytics. In this way, the CloudiFacturing project partners, including Digital Innovation Hubs (DIHs), contribute to the competitiveness and resource efficiency of SMEs and ultimately foster the vision of Factories 4.0 and the circular economy. In CloudiFacturing, more than 20 cross-border application experiments will be conducted in three waves. The seven experiments comprising the first wave have already been supported, while participation in the second and third waves is organized via Open Calls. The experiments run across national borders. In order to increase the impact of the experiments, the project relies on its DIH network, and each experiment is accompanied by a dedicated DIH. The presentation will summarize the experiences with the first wave, the current achievements of the DIHs involved, and the future plans and collaboration opportunities to maximize the impact with the assistance of DIHs.
    
    Speaker: Dr Robert Lovas (MTA SZTAKI)
    
  - 15:20
    
    Distributed Compute Protocol: Credit-based monetisation of idle compute¶ 15m
    
    Modern-day research requires extensive computing power. Researchers compete for resources that are limited in availability or by cost. The Distributed Compute Protocol (DCP) connects existing compute resources to researcher projects. Compute providers receive Distributed Compute Credits (DCC) in exchange for computing those projects. Credits can then be used to deploy new compute projects, or sold in DCP's global marketplace. By soaking up otherwise idle compute resources, DCP aims to support researchers and industry at a fraction of the cost of current commercial cloud computing services, disrupting existing market powers and accelerating compute-enabled research, innovation and discovery.
    
    Speaker: Dr Daniel Desjardins (Kings Distributed Systems Ltd.)
    
  - 15:35
    
    Panel / Discussion / Q&A¶ 25m
- 14:30 → 16:00
  OpenAIRE services for Research Communities¶ Auditorium B203
  
  Convener: Pedro Principe (University of Minho)
  
  - 14:30
    
    OpenAIRE service for Research Communities: Open Science as-a-Service¶ 20m
    
    OpenAIRE-Connect fosters transparent evaluation of results and facilitates reproducibility of science for research communities by enabling a scientific communication ecosystem that supports the exchange of artefacts, software, packages of artefacts, and links between them across communities and across content providers. To this aim, OpenAIRE-Connect is introducing and implementing the concept of Open Science as a Service (OSaaS) on top of the existing OpenAIRE infrastructure (www.openaire.eu), by delivering out-of-the-box, on-demand deployable tools in support of Open Science. OpenAIRE-Connect is realizing and promoting the uptake of two new services that build on and extend the existing OpenAIRE technical and networking infrastructure, to stimulate a technical and cultural shift towards a scholarly communication ecosystem supporting more effective and transparent evaluation and reproducibility of research results. The first service enables research communities to (i) publish research artefacts (packages and links), and (ii) monitor their research impact. The second service engages and mobilizes content providers, and offers them facilities for notification-based exchange of research artefacts, to support their transition towards Open Science paradigms. Both services will be served on demand according to the OSaaS approach, and hence be reusable by different disciplines and providers, each with different practices and maturity levels.
    This World Café session will present the new OpenAIRE service for Research Communities, showcasing real use cases from five pilot communities (i: Neuroinformatics from France Life Imaging national infrastructure; ii: European Marine Science from Pangaea and Atlas community; iii: Cultural Heritage and Digital Humanities from the PARTHENOS research infrastructure; iv: Fisheries and aquaculture management from the BlueBridge and MARBEC infrastructures; and v: Environment & Economy from the national/EU node of the United Nations Sustainable Development Solutions Network), addressing community-based solutions, and will also demonstrate the features available for research initiatives and infrastructures. The session will discuss the future challenges and the next steps to extend the OSaaS tools for research communities. The OpenAIRE Research Community Dashboard, to be presented and discussed during the World Café session, is the service that offers access to a virtual space (a graph) including metadata descriptions of all products relevant to the community as well as links between such products. The graph is built (i) by scientists depositing their products (via Zenodo) or claiming products and links (associating a DOI to the community, specifying a link between products), or (ii) by services collecting product metadata and links from a number of content providers, ranging from publication repositories to data repositories and repositories of other kinds of products.
    
    Speakers: Paolo Manghi (Istituto di Scienza e Tecnologie dell'Informazione - CNR), Pedro Principe (University of Minho)
    
- 14:30 → 16:00
  Thematic Services: Environmental Sciences¶ Main Auditorium
  
  Convener: Bjorn Backeberg (EGI.eu)
  - 14:30
    
    The Urban TEP – Analysis of Multi-Source Data for Innovative Urban Monitoring¶ 15m
    
    Settlements and urban areas represent the cores of human activity and development. Besides climate change, urbanization represents one of the most relevant developments related to the human presence on the planet. Both global trends challenge our environmental, societal and economic development. In this context, the availability of and access to accurate, detailed and up-to-date information will impact decision-making processes all over the world. The suite of Sentinel Earth Observation (EO) satellites, in combination with their free and open access data policy, contributes to a spatially and temporally detailed monitoring of the Earth’s surface. At the same time, a multitude of additional sources of open geo-data is available, e.g. from national or international statistics or land surveying offices, volunteered geographic information or social media. However, the capability to effectively and efficiently access, process, and jointly analyze these mass data collections poses a key technical challenge. The Urban Thematic Exploitation Platform (U-TEP), funded by the European Space Agency (ESA), is being developed to provide end-to-end and ready-to-use solutions for a broad spectrum of users (experts and non-experts) to extract the unique information and indicators required for urban management and sustainability. The key component of the system is an open, web-based portal, which is connected to distributed high-level computing infrastructures and provides key functionalities for i) high-performance data access and processing, ii) modular and generic state-of-the-art pre-processing, analysis, and visualization, iii) customized development and sharing of algorithms, products and services, and iv) networking and communication.
    U-TEP aims at opening up new opportunities to facilitate effective and efficient urban management and the safeguarding of livable cities by systematically exploring the unique EO capabilities in Europe in combination with the big data perspective arising from the constantly growing sources of geo-data. The capabilities for participation and knowledge sharing through new media and ways of communication will help to boost interdisciplinary applications with an urban background. The services and functionalities are intended to enable any interested user to easily exploit and generate thematic information on the status and development of the environment based on EO data and technologies. The innovative character of the U-TEP platform in terms of available data and processing and analysis functionalities has already attracted a large and diverse user community (>300 institutions from >40 countries), including users from science, public institutions, NGOs and industry.
    
    Speaker: Dr Felix Bachofer (German Aerospace Center (DLR))
    
  - 14:45
    
    Development of the new Research Infrastructure for Europe’s Natural Science Collections using novel building blocks in EOSC¶ 15m
    
    DiSSCo (http://www.dissco.eu), a Distributed System of Scientific Collections, is a Research Infrastructure (RI) included in the ESFRI 2018 Roadmap, with over a hundred self-sustaining partners in Europe, aiming to provide unified physical and digital (data) access to the approximately 1.5 billion biological and geological specimens in collections distributed across Europe. DiSSCo will transform the currently scattered provision of collection data across the continent into one set of services providing unified specimen data at the scale, quality and FAIRness (Findable, Accessible, Interoperable, Reusable) required for excellent research. It will repackage specimen data as Digital Specimen Digital Objects (DSDOs) to integrate and link these with data from other domains in the future Internet of FAIR Data and Services (IFDS) supporting the European Open Science Cloud (EOSC). In the European landscape of environmental Research Infrastructures, the effectiveness of services that aim at aggregating, monitoring, analysing and modelling geo-diversity information relies on the primary description of the bio- and geo-diversity. It also relies on the availability of this primary reference data, which today is scattered and disconnected. Many RIs in environment and other fields have links to biodiversity, and biodiversity loss is frequently cited as one of the biggest societal challenges. DiSSCo provides the required bio-geographical, taxonomic and species trait data at the level of precision and accuracy required to enable and speed up research towards achieving the Targets of the Sustainable Development Goals for Life on Earth, Life below Water and Climate Action. Novel building blocks in EOSC are required for the development and successful operation of DiSSCo to deliver data at the economies of scale and scope needed.
    Examples of such building blocks are portable research data packaging formats, a distributed file system like IPFS (InterPlanetary File System) that can scale, verification and audit mechanisms to control FAIRness and what needs to be stored, plus novel index, discovery and linkage mechanisms. RDA (Research Data Alliance) and groups like C2Camp (a Go-FAIR Implementation Network, https://www.go-fair.org/implementation-networks/) are already working on recommendations, guidelines and test implementations in this area towards an infrastructure of Digital Objects, but further development of TDWG standards, practices developed in the CETAF (Consortium of European Taxonomic Facilities) network and novel technological approaches, e.g. for large-scale digitisation, are also needed to deliver data at the economies of scale and scope needed. In the presentation, we (i) discuss technical barriers for interoperability and possible action lines to overcome these, including practices and technologies to underpin the FAIR data principles; (ii) outline the unified DiSSCo API (Application Programming Interface) services to provide data suitable for thematic services in environmental Research Infrastructures like LifeWatch and eLTER (European Long-Term Ecosystem and socio-ecological Research Infrastructure), as well as RIs in other domains such as E-RIHS (European Research Infrastructure for Heritage Science) in the field of social sciences; and (iii) explain the DiSSCo strategy to align project outcomes and standards development towards a common unified research infrastructure.
    
    Speaker: Wouter Addink (Naturalis Biodiversity Center)
    
  - 15:00
    
    DARE as a platform to support Climate Data Analytics using Cloud Infrastructures¶ 15m
    
    Supporting data analytics in climate research with respect to data access is a challenge due to increasing data volumes, especially for end users, as the whole climate data archive is expected to reach a volume of 30 PB in 2018 and up to 2000 PB in 2022. Several international and European initiatives have emerged and provide standalone solutions that offer potential for interoperability. The DARE e-science platform (http://project-dare.eu) is designed for efficient and traceable development of complex experiments and domain-specific services on the Cloud. In Europe, the IS-ENES (https://is.enes.org) consortium has developed a platform that is a component of the ENES CDI (Climate Data Infrastructure), to ease access to climate data for the climate impact community (C4I: https://climate4impact.eu). One of the important aspects of the C4I platform is that it enables users to perform on-demand data analysis calculations through its backbone, which is based on a collection of OGC WPS (Web Processing Service) services. These, coupled with authorization mechanisms based on access tokens, enable the delegation of the calculations onto distributed infrastructures and the controlled management of the results. These characteristics have been further extended with provenance integration, especially to obtain the traceable calculation of climate impact indicators, in the context of the FP7-CLIPC project. The solution is based on a standard representation (W3C-PROV) and a set of lineage management and workflow tools that will scale to other computational use cases and be interoperable with ongoing European initiatives. In the DARE project, the provenance system will also be built on top of W3C-PROV, ensuring interoperability. DARE will also integrate services from the EUDAT CDI, enabling generic access and cross-domain interoperability, as well as providing compliance and integration with the future EOSC platform.
    As DARE will use containerization technologies, it will be easily deployable on heterogeneous architectures. A scientific pilot has been designed within the DARE project for the ENES community (climate domain). The objective is to enable delegation of on-demand, computationally intensive calculations to the DARE platform. In the presented Use Case, on-demand data analytics will be initiated on the IS-ENES C4I platform by end users of climate data, in a seamless fashion. A schematic of the architecture and Use Case will be presented, along with the initial development status.
    
    Speaker: Christian Page (CERFACS)
    
  - 15:15
    
    Enabling Reproducible Computing on the EPOS ICS-D¶ 15m
    
    The EPOS-IP project is implementing solutions to enable user-driven reproducible computations exploiting the large wealth of data, data products and software discoverable through its Centralised Integrated Core Services (ICS-C) catalogue. The actual data is accessible through the web services that are managed by geographically distributed and interdisciplinary RIs organised in Thematic Communities called “Thematic Core Services”. The variety of methodologies and interoperability requirements between data and software suggests the need for identifying and implementing general use cases supported by flexible and scalable e-Science solutions. These must be integrated in the EPOS architecture with the preliminary objective of assisting the users in basic tasks, such as allocation of computational and storage resources and data staging, incrementally accommodating more complex computational scenarios and reusable workflows. We will present the approach envisaged for the integration of processing functionalities within the EPOS ICS portal. It will allow users to develop and execute new data-intensive methods and workflows within dedicated processing environments that are implemented as Jupyter notebooks and that are associated with contextual workspaces. Users of the EPOS ICS portal will select the data to be staged from one of their workspaces, after having populated it with search results of interest obtained from the ICS catalogue. Such a service requires the data to be staged to remote computational facilities that adopt software containerisation and infrastructure orchestration technologies (Docker Swarm, Kubernetes) to dynamically allocate and prepare the needed resources. These will be heterogeneous and managed by national and European e-Infrastructures that will constitute the EPOS Distributed Integrated Core Services (ICS-D).
    We envisage that, beyond staging, many common operations could be encoded as configurable scientific workflows that will automatically preprocess the data before repurposing it to the researcher for further analysis, suggesting the need for a workflow as a service (WaaS) interface. Once data is staged and preprocessed, users can then define and evaluate their own methods via traditional scripting or by again adopting advanced workflow technologies. Thanks to containerisation, special attention is dedicated to portability and reproducibility of the processing environments, thereby allowing users to explicitly save, trace and access the different stages of their progress. Moreover, we will illustrate the approach for the adoption and integration of scientific workflow tools (CWL, dispel4py), which include validation and monitoring services. These are implemented on top of a provenance model and management system (S-ProvFlow) that allows the exploration of large lineage collections describing the obtained results. The system offers access to multi-layered, context-rich provenance information through interactive tools. We will discuss the importance of the communication of such a service with the EPOS ICS-C catalogue and how it will contribute to producing and ultimately delivering research data that comply with the FAIR principles (Findable, Accessible, Interoperable and Reusable). The activities will also be presented in the scope of the cooperation with ongoing H2020 initiatives such as the newly funded project DARE (Delivering Agile Research Excellence on European e-Infrastructures).
    
    Speaker: Alessandro Spinuso (KNMI)
    
  - 15:30
    
    OPENCoastS On-demand Operational Coastal Circulation Forecast Service¶ 15m
    
    Seas and oceans are important drivers for the European economy and they need to be preserved and developed in a sustainable way. OPENCoastS provides on-demand coastal circulation forecast systems that are useful in research and in many other areas of human activity. The forecast systems can be set up by the end users for a given region of interest of the European Atlantic coast. They run daily and predict water levels, 2D velocities and wave parameters for periods of 48 to 72 hours. The service was developed by the Portuguese National Civil Engineering Laboratory (LNEC) in 2010 as WIFF (Water Information Forecast Framework) and uses the SCHISM modeling system. The deployment of forecast systems requires strong knowledge of coastal processes and IT, along with access to significant computational and storage resources. The OPENCoastS service offers a user-friendly web interface and a back-end that takes care of all the complexity. This approach reduces the barriers to the adoption and use of coastal circulation forecast systems, making them available to a much broader audience. The system has been producing 48-hour forecasts on a daily basis for the Portuguese coast and is running on High-Throughput Compute and storage resources provided by the Portuguese National Distributed Computing Infrastructure (INCD). In the context of EOSC-hub, LNEC is working with LIP, INCD, University of Cantabria and University of La Rochelle to open the service to users from other European countries so they can also benefit from this innovative service. To cope with the internationalisation, the system is being enhanced to include federated AAI, resilient scheduling to distributed computing resources, accounting, data management and long-term data storage. The system is now integrated with EGI Check-in for federated AAI, enabling simpler user authentication.
    The front-end has been split into components that can be instantiated in IaaS cloud systems such as the EGI FedCloud; the use of INDIGO orchestration for cloud services deployment is planned. For increased compute capacity and resilience, the simulations can be scheduled to the EGI High Throughput Computing service via a DIRAC scheduling system also provided by EGI within EOSC-hub. To provide independence and encapsulation, the application components are packaged in Linux containers. The use of EUDAT services for long-term storage and/or data preservation is also being considered. In this presentation we will describe the details and challenges of adapting and deploying a complex application service that is both compute and data intensive, and that exploits multiple computing paradigms such as cloud computing and high throughput computing across multiple locations, taking advantage of pan-European services made available by several infrastructures and technology providers.
    
    Speakers: Anabela Oliveira (National Laboratory for Civil Engineers), Joana Teixeira (LNEC - Laboratório Nacional de Engenharia Civil), Joao Rogeiro (LNEC - Laboratório Nacional de Engenharia Civil)
    
- 14:30 → 16:00
  Training: Security Incident Management¶ Room C103
  
  Convener: Dr Sven Gabriel (NIKHEF)
  - 14:30
    
    Security Incident Management in the EOSC era Part-2¶ 1h 30m
    
    The security training is split into two sessions, focusing on different areas of incident handling. An important area that will be highlighted is the close collaboration of experts necessary for the successful resolution of a security incident in the EOSC era. The first session targets the more technically oriented attendees. Here, after an introduction to forensics, the participants will have to analyse images provided by a security team of a FedCloud site. The results of the investigations will be used as input for the second session, where the case will be handled within a role-play involving the various service providers active in the EOSC-hub project, including identity providers, SIRTFI, the service catalogue, and the infrastructures coordinated by EGI and EUDAT. The goals of this training are twofold. The first goal is to explore the collaboration between project members with a managerial background and those with a technical background. The second goal is to examine and challenge the existing set of policies and procedures in order to identify possible issues. It is hoped that this will help to prioritize the security-related activities within the EOSC-hub project.
    
    Speakers: Daniel Kouril (CESNET), Dr David Crooks (UG), David Groep (NIKHEF), Dr Sven Gabriel (NIKHEF), Urpo Kaila (CSC), Vincent Brillault (CERN)
- 16:00 → 16:30
  
  Closing Plenary and Awards¶ Main Auditorium
  
  Wrap-up of DI4R 2018
  
  Convener: Prof. Sinead Ryan (Trinity College Dublin)
- 16:30 → 17:00
  
  Farewell Coffee 30m
