Digital Infrastructures for Research 2017

Europe/Brussels
The Square Meeting Centre

Mont des Arts, 1000 Brussels, Belgium
Description

The official website of DI4R 2017 is: https://www.digitalinfrastructures.eu/

The official Twitter feed of DI4R 2017 is: @DI4R_eu

Europe's leading e-infrastructures, EGI, EUDAT, GÉANT, OpenAIRE, PRACE and RDA Europe, invite all researchers, developers and service providers for two days of brainstorming and discussions at the Digital Infrastructures for Research 2017 event, from 30 November to 1 December 2017. 

Under the theme “Connecting the building blocks for Open Science”, the 2017 edition of the DI4R conference will showcase the policies, processes, best practices, data and services that, leveraging today’s initiatives – national, regional, European and international – are the building blocks of the European Open Science Cloud and European Data Infrastructure. The overarching goal is to demonstrate how open science, higher education and innovators can benefit from these building blocks, and ultimately to advance integration and cooperation between initiatives.

The event is co-located with the EOSCpilot 1st Stakeholder Engagement Event, taking place on 28 & 29 November 2017.

  • Thursday, 30 November
    • Opening Plenary Copper Room

      Copper Room

      The Square Meeting Centre

      Mont des Arts, 1000 Brussels, Belgium
      Convener: Laurent Ghys (Belgian Science Policy Office)
      • 1
        Welcome message, Belgian Federal Science Policy Office - BELSPO
        Speaker: Laurent Ghys (Belgian Science Policy Office)
        Slides
      • 2
        The Big Data challenge of the Human Brain Project – building an infrastructure to address human brain complexity
        Speaker: Prof. Katrin Amunts (Forschungszentrum Jülich/ RWTH Aachen University / Heinrich-Heine University Duesseldorf)
        Slides
      • 3
        eInfrastructures and the EOSC: state of play and outlook for FP9
        Speaker: Thomas Skordas (Director for Digital Excellence and Science Infrastructure, European Commission)
        Slides
    • 10:30
      Coffee Break The Square, Brussels Meeting Centre

      The Square, Brussels Meeting Centre

    • Data science and skills presentations 214 & 216 (The Square, Brussels Meeting Centre)

      214 & 216

      The Square, Brussels Meeting Centre

      Convener: Marjan Grootveld (DANS-KNAW)
      • 4
        CODATA/RDA Research Data Science Schools for Low and Middle Income Countries
        The ever-increasing volume and variety of data being generated impacts academia and the private sector. Contemporary research and evidence-based decision making cannot be done effectively without a range of data-related skills, such as, but not limited to, the principles and practice of Open Science, research data management and curation, implementation of data platforms and infrastructures, data analysis, statistics, visualisation and modelling techniques, software development, and annotation. We define 'Research Data Science' as the ensemble of these skills. While the international, collective ability to create, share, and analyse vast quantities of data is profound, there remains a worldwide shortage of individuals skilled in Research Data Science, which limits this transformative effect. With the appropriate data training, however, the 'Data Revolution' offers great opportunities for students and professionals with modern data skills, whether entering a job market where these skills are in demand or conducting research. The CODATA-RDA School for Research Data Science brings these concepts and tools to communities that may not have been introduced to the wide range of open resources currently available. This two-week course builds core data science skills and introduces open tools and resources for researchers. We will introduce the foundational schools that will take place this year in Trieste, Italy and Sao Paulo, Brazil to encourage participation from communities worldwide. We will also discuss the framework of course materials and the structure that allows regional instances of the School for Research Data Science, as well as touching on future sustainability models. The CODATA/RDA Schools for Research Data Science have partnered with EGI and EUDAT for curriculum, student, and instructor support.
        Speaker: Rob Quick (IU (OSG))
        Slides
      • 5
        The Life Cycle of Structural Biology Data
        Research data is acquired, interpreted, published, reused, and sometimes eventually discarded. Understanding this life cycle better will help the development of appropriate infrastructural services, ones which make it easier for researchers to preserve, share, and find data. Structural biology is a discipline within the life sciences, one that investigates the molecular basis of life by discovering and interpreting the shapes and motions of macromolecules. Structural biology has a strong tradition of data sharing, expressed by the founding of the Protein Data Bank (PDB) in 1971. The culture of structural biology is therefore already in line with the perspective that data from publicly funded research projects are public data. This presentation is based on the data life cycle as defined by the UK Data Archive. It identifies six stages: creating data, processing data, analysing data, preserving data, giving access to data, and re-using data. For clarity, 'preserving data' and 'giving access to data' are discussed together. A final stage of the life cycle, 'discarding data', is also discussed. The presentation concludes with recommendations for future improvements to the IT infrastructure for structural biology.
        Speaker: Chris Morris (STFC)
      • 6
        ResOps: Research DevOps across clouds
        As more and more researchers are hitting the limits of traditional computing facilities, interest in using public and private clouds keeps growing. However, many researchers are unwilling or unable to leave behind their existing infrastructure, meaning science is in a transitional period. At the moment, many research groups across Europe are struggling with the same basic questions when exploring the potential of cloud facilities: How do we move away from our traditional scheduling environment? How do we move and keep our data on the cloud? How do we run our workflows on the cloud efficiently and at low cost? After a number of successful migrations at EMBL-EBI of both new and existing workflows into the cloud, we have combined the lessons learned from these processes to create an interactive, hands-on workshop that aims to guide researchers getting their feet wet with clouds for the first time. The basics of cloud computing are explained thoroughly, after which the case for moving towards cloud computing is made using benchmarks and real-life experiences. The course then dives deeper into the practical application of porting an existing HTC workflow to a cloud environment, discussing hybrid cloud setups where a traditional scheduler environment is emulated on cloud resources, but also outlining how to move to a fully cloud-aware environment to make optimal use of the resources available. Given this theoretical underpinning, participants are then able to get hands-on experience through a set of practicals which guide them through basic infrastructure deployment with the Terraform deployment tool (see the sketch after this entry) and subsequent configuration of this infrastructure with tools such as Puppet and Ansible. They are then guided through emulating a traditional job scheduler, resulting in a hybrid environment suitable for porting a traditionally scheduled workflow. So far, four iterations of the course have been given and feedback has been overwhelmingly positive. More courses are being planned, in which we hope also to reach non-bioinformatics researchers. This presentation will make the case for creating this course, give a summarised overview of the topics covered, and discuss further needs and potential additional training identified through experiences and feedback from the course.
        Speaker: Erik van den Bergh (EMBL)
        Slides
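        To make the infrastructure-deployment practical concrete, here is a hypothetical sketch (not actual workshop material) of driving the Terraform CLI from Python; it assumes Terraform is installed and that a directory of *.tf files describes the cloud cluster:
        ```python
        # Hypothetical sketch: initialise and apply a Terraform configuration,
        # as in the workshop's basic infrastructure-deployment practical.
        # The directory name is an illustrative assumption.
        import subprocess

        def deploy(workdir: str) -> None:
            """Run `terraform init` and `terraform apply` in the given directory."""
            subprocess.run(["terraform", "init"], cwd=workdir, check=True)
            subprocess.run(["terraform", "apply", "-auto-approve"], cwd=workdir, check=True)

        if __name__ == "__main__":
            deploy("cluster-config")  # e.g. *.tf files describing the cloud cluster
        ```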
      • 7
        OpenScience MOOC: Community-driven Framework for Open Research Practices
        Research practices are getting a radical makeover, in part thanks to the power of the internet and the tools it provides, and in part due to a growing demand for accountability in research (e.g., reproducibility and data provenance). In order to achieve this, global policies are emerging at different levels that include some aspect of 'Open Research', 'Open Scholarship', 'Open Education' or 'Open Science'. In addition, research evaluation frameworks and models are under pressure to include more diverse outputs, behaviours, and practices that go beyond traditional methods of counting publications, citations, and patents. However, many universities and research institutes are failing to support researchers in these crucial matters, in terms of both training and providing incentives to effectively make their research more accessible, reproducible and transparent. This raises the question of whether current graduate students will be prepared to perform research to the high standards of a 21st-century research environment. Open Science is about increased rigour, dissemination, accountability, reproducibility, and also long-term efficiency. It is based on principles of inclusion, fairness, and sharing, and it extends across most domains of research. However, there are numerous socio-technological barriers yet to be overcome. Although technological solutions are maturing, cultural or ideological change is frequently the bottleneck. This bottleneck could be widened by teaching new entrants into research and academia to adopt transparent research practices. Expanding awareness of Open Science could drive changes in the scientific community to promote reproducible and transparent science. We have previously proposed the development of an Open Science Massive Open Online Course (MOOC), designed to equip students, teachers and researchers with the knowledge and the skills they need in order to participate in an Open Science environment. It is bringing together the efforts and resources of hundreds of researchers who have volunteered their time and expertise to create a curriculum aimed at making research more efficient and transparent. The MOOC is aimed at researchers at all levels, but especially students at the undergraduate and postgraduate level. The content of the MOOC is distilled into 10 core modules that complement existing aspects of research training programmes. Each module will comprise a complete range of resources including videos, research articles, dummy datasets and code to ensure that students are equipped to perform high-quality research. 'Homework' tasks also reinforce learning by doing. We describe the evolving context of the initiative and the progress made so far. We focus as well on issues such as the technology stack, community engagement strategies, collaboration with other initiatives, curriculum development and sustainability. We propose that this activity can support the development of the European Open Science Cloud, and provides a robust and open model for developing the skills of both researchers and educators in the 21st century.
        Speaker: Bruce Becker (South African National Grid)
      • 8
        Supporting Open Science Oriented skills building by Virtual Research Environments
        According to the EC's Open Science Skills Working Group Report, "When all researchers are aware of Open Science, and are trained, supported and guided at all career stages to practice Open Science, the potential is there to fundamentally change the way research is performed and disseminated, fostering a scientific ecosystem in which research gains increased visibility, is shared more efficiently, and is performed with enhanced research integrity". Supporting these practices can be very complex without appropriate technological support for instructors and students. The scattered nature of facilities and data of potential interest for a course or an instructor, the wish to cover the entire lifecycle of a research activity since open science is pervasive, the ever-growing multidisciplinary nature of courses, and the aim to reduce as much as possible the distance between the way scientists work, the latest research results, and what students are provided with are all factors posing requirements and challenges on how skills-building courses take place and on the environments supporting them. Today, all the steps in preparing course-supporting environments require manual work, which is repeated each time a new course is held: the training environment needs to be reset and reconfigured for the next course. Available data processing services and models come in heterogeneous programming languages, which usually require installing complex software on users' computers. Furthermore, executing models on users' data usually demands a long phase of data preparation and powerful hardware, not always available in the institutions' laboratories. Sharing data, parameters and results of experiments is difficult, and collaboration between teachers and students is limited to the duration of the physical course. Furthermore, no instrument currently exists that bridges the scientific and training environments. This presentation introduces the solution implemented and evaluated in the context of the BlueBRIDGE project to overcome the above issues. This solution largely exploits e-infrastructure capacity and customised Virtual Research Environments (VREs). The VREs support both (i) instructors, in easily defining and developing web-based supporting environments for their courses that allow students to execute, in parallel, data- and processing-intensive experiments; and (ii) students, in partaking in a course and being provided with "real life" and "state of the art" data, services and tools. Beyond the discipline-specific data and tools, by exploiting the dedicated VREs instructors and students are transparently provided with automatic facilities suggesting Open Science practices in all the phases and activities occurring in a course. The presentation will provide an overview of the solution implemented and of the lessons learned during a series of training courses (http://www2.bluebridge-vres.eu/training-courses) run by a number of international organisations and academic institutions, dedicated to data scientists operating in the fisheries, aquaculture, and marine biodiversity domains.
        Speaker: Leonardo Candela (ISTI-CNR)
        Slides
    • EOSC building blocks presentations 211 & 212 (The Square, Brussels Meeting Centre)

      211 & 212

      The Square, Brussels Meeting Centre

      Convener: Paolo Manghi (Istituto di Scienza e Tecnologie dell'Informazione - CNR)
      • 9
        SeaDataCloud – further developing the pan-European SeaDataNet infrastructure for marine and ocean data management
        SeaDataCloud is a new project funded by the European Commission to considerably advance the SeaDataNet services, increase their usage, and adopt cloud and High Performance Computing technology for better performance. The SeaDataCloud project involves a network of 56 partners across 29 countries bordering the European seas. The Consortium is made up of 41 data centres and major research institutes whose activities comprise collecting and managing marine and ocean data. The SeaDataCloud project has recently entered into a strategic and technical cooperation with the EUDAT Collaborative Data Infrastructure consortium, with which it will collaborate to build upon the state of the art in Information and Communications Technology (ICT) and e-infrastructures for data, computing and networking for the SeaDataNet community. EUDAT is participating in SeaDataCloud through five sites which joined the project as partners: CSC (Finland), CINECA (Italy), DKRZ (Germany), GRNET (Greece), and STFC (United Kingdom). These five EUDAT centres will establish a common cloud and computing environment that will be integrated with the SeaDataNet data management infrastructure to provide central caching and cloud computing facilities. EUDAT is bringing in its common services, which will be adopted and, where needed, adapted for upgrading and optimising the SeaDataNet Common Data Index (CDI), Data Discovery and Access service and for establishing the new SeaDataNet Virtual Research Environment (VRE). Acquisition of oceanographic and marine data is expensive, with an annual cost in Europe estimated at €1.4 billion (€1.0 billion for in-situ observations; €0.4 billion for satellites). Therefore, professional data management is required, with agreements on standardisation, quality control protocols, archiving, catalogues, and access. Collect once; use many times! The new partnership with EUDAT's five data centres will provide a data cloud environment delivering data and computing services that will advance the integration, quality and availability of data for the SeaDataNet community. All services, upgraded and/or newly established in the SeaDataCloud project, will be accessible from the SeaDataNet portal. The SeaDataNet infrastructure will be extended to leverage EUDAT capabilities, principally allowing: (i) individual data centres to replicate data onto EUDAT resources, improving data archiving and conservation (an illustrative replica-verification sketch follows this entry); (ii) automated quality control checks on replicated data; (iii) improved data access; and (iv) accounting and monitoring of data usage and distribution. The aim of the partnership with EUDAT is to considerably increase the number of users and transactions by expanding and engaging potential user communities through promotion, dissemination, development and demonstration of use cases, and the organisation of workshops, encouraging use of the online tools and services and adoption of common principles such as data sharing and collaborative research.
        Speaker: Mr Chris Ariyo (CSC)
        Slides
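        As a purely illustrative sketch of capability (i) above - and of the integrity checking that replication implies - the following Python fragment compares checksums of an original dataset and its replica; the file paths are hypothetical and this is not SeaDataCloud code:
        ```python
        # Illustrative replica verification: compare SHA-256 checksums of an
        # original dataset and its copy at a replication site. Paths are hypothetical.
        import hashlib

        def sha256sum(path: str) -> str:
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):  # stream in 1 MiB chunks
                    h.update(chunk)
            return h.hexdigest()

        source = "cdi/dataset_001.nc"        # original at the data centre
        replica = "replica/dataset_001.nc"   # copy cached at a EUDAT centre
        assert sha256sum(source) == sha256sum(replica), "replica differs from source"
        ```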
      • 10
        Cloud Orchestration at Application Level
        Overview. The H2020 COLA (Cloud Orchestration at the Level of Application) project is developing the MiCADO (Microservices-based Cloud-application Level Orchestrator) platform. The platform supports dynamic application-level orchestration of cloud applications on multiple heterogeneous, federated clouds. It enables execution of these applications based on their specific needs, such as cost, resources, security requirements, time constraints, etc., in two phases: optimised deployment (1) and run-time orchestration (2). In phase 1, application developers create a high-level description of their application in TOSCA. This description, besides the application topology, also includes various QoS (Quality of Service) parameters such as cost and performance requirements, and security policies. The description is passed on to the Coordination/Orchestration component of MiCADO, which collaborates with the Security facilitator, converting user-defined security policies into specific security solutions, and with the Optimisation decision maker, translating cost- and performance-related parameters into actual deployment values. Following this, the Coordination/Orchestration component passes the deployment instructions to MiCADO's Deployment executor, which deploys the services required to run the application on the targeted cloud infrastructure. After the application is deployed in the cloud, run-time orchestration starts (phase 2). MiCADO continuously collects various metrics from the running application and passes them on to the Coordination/Orchestration component. These data are analysed from performance/cost aspects by the Optimisation decision maker and from a security-enforcement point of view by the Security facilitator. If adjustment is required, the Deployment executor is called to scale the infrastructure up or down (see the sketch after this entry). Users can also adjust any requirement, whether security or performance/cost related, during runtime: if the user provides a modified description with updated QoS parameters, it is passed on to the Coordination/Orchestration component, which analyses the received data and instructs the Deployment executor to modify the infrastructure. MiCADO runs applications on a cluster, dynamically allocating and attaching, or detaching and releasing, cloud resources to optimise resource usage. This cluster consists of two main components: the Master node and the Worker nodes. The Master node is the head of the cluster, performing the collection of information on microservices, the calculation of optimised resource usage, and the making and realisation of decisions related to handling resources and scheduling microservices. Worker nodes represent execution environments for microservices and are continuously allocated/released based on the dynamically changing requirements of the running microservices. Once a new worker node is allocated and attached to the cluster, the Master node utilises its resources by allocating microservices on it.

        Conference themes and track topic. The presentation will address the “thematic building blocks to the EOSC” topic (topic 5), outlining an orchestration approach that allows running applications on federated clouds. This approach enables application-specific deployment and scaling up/down according to user requirements.

        Targeted audience. The presentation addresses application developers, who can describe applications in TOSCA using a GUI such as Alien4Cloud, and deploy and run them on federated clouds through the MiCADO platform.
        Speaker: Gabor Terstyanszky (University of Westminster)
        Slides
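        The following is a hypothetical sketch - not the MiCADO API - of the kind of scaling decision taken during run-time orchestration (phase 2): a metric collected from the running application is compared against user-supplied QoS thresholds, and the worker pool is resized accordingly. All names and thresholds are illustrative:
        ```python
        # Hypothetical phase-2 scaling logic: resize the worker pool based on a
        # collected metric and QoS thresholds. Not the actual MiCADO implementation.
        def scaling_decision(cpu_load: float, workers: int,
                             scale_up_at: float = 0.8, scale_down_at: float = 0.3,
                             min_workers: int = 1, max_workers: int = 10) -> int:
            """Return the new worker-node count for the cluster."""
            if cpu_load > scale_up_at and workers < max_workers:
                return workers + 1   # deployment executor attaches a worker node
            if cpu_load < scale_down_at and workers > min_workers:
                return workers - 1   # deployment executor releases a worker node
            return workers

        print(scaling_decision(cpu_load=0.92, workers=3))  # -> 4
        ```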
      • 11
        Hybrid Cloud: Integration challenges of Big Data Science
        Cloud computing is now everywhere, and is heralded by many as the solution to most - if not all - compute and storage needs across the board. But how true is this, especially in science? Should we abandon on-premise datacentres and transfer years' worth of effort to cloud-based environments? Or should the two be integrated to exploit the best of both worlds? EMBL-EBI has been piloting the adoption of cloud resources at many levels of its operations, ranging from shifting entire workloads into independent "compute islands", to hybrid scenarios where cloud compute becomes an integral part of our on-premise resources, all the way down to disaster recovery. This has required adapting - or defining from scratch - policies to cope with these new scenarios, in particular considering the non-trivial issues around - quite obviously - data privacy and procurement. It has also helped to define best practices in porting science applications, which are now at the basis of our Research Operations (ResOps) training, because boarding the cloud requires new concepts and practices to take full advantage of the benefits it can offer. Most pipelines - and the infrastructures underpinning them - will need to be reworked to fully unlock the benefits of this fundamentally different environment, where flexibility is everything and efficiency is key to a reasonable and sustainable bill. This presentation will provide insights on the lessons we have learned - and taught - in our efforts to reach for the clouds, and our experience in EU projects such as "Helix Nebula - The Science Cloud", which will soon deliver important results on how commercial clouds can be procured and exploited.
        Speaker: Dr Dario Vianello (EMBL)
        Slides
      • 12
        Federated digital services for Open Science in Southeast Europe and the Eastern Mediterranean
        Building a federated Open Science e-Infrastructure and related services is an imperative that will boost research capabilities and make Europe a more competitive player in global research. Such a common infrastructure should stretch to cover as wide an area as possible, both in the scientific fields supported and geographically. Bridging the digital divide and supporting scientists from less-developed regions will integrate their intellectual potential into the joint European efforts to keep a leading role in science and technology. In this presentation we will describe the innovative approach developed by the VI-SEEM project, focusing on: - the integrated state-of-the-art platform, built jointly by e-Infrastructure providers and end users, consisting of computational resources (HPC, cloud, Grid), data storage and management, visualisation tools, discipline-specific services, etc., provided via an integrated Virtual Research Environment; - the fact that the platform serves a large geographical area (South Eastern Europe and the Eastern Mediterranean) of about 300 million inhabitants with a diversity of e-Infrastructures, with the aim of enabling high-calibre research in strategic areas for the region, namely the Life Sciences, Climatology and Meteorology, and Digital Cultural Heritage. Wishing to adopt a service-orientated approach, the project consortium has developed a catalogue of fully managed services using the FitSM methodology. The services themselves have been designed and developed in an innovative collaboration process between researchers, service providers and operators. Such an approach leads to targeted services that are highly usable to the scientific community and, at the same time, highly optimised and strongly supported by the service providers and operators. Additionally, the diversity of services (listed above) can be used as building blocks to perform advanced scientific workflows and produce relevant datasets and results, which are published back into the Virtual Research Environment. The developed service catalogue is fully compatible with other e-Infrastructure projects (EUDAT), enabling easier integration into a single or federated service catalogue, an essential component of the EOSC. Access to the services is provided using an eduGAIN-compatible AAI (compatible with the EGI RCIAM solution), allowing seamless access for scientists as well as strong authentication and authorisation on the service providers' side. The integrated Virtual Research Environment is accompanied by a comprehensive set of training material, facilitating its adoption by an even larger number of scientists and students, but also SMEs and other relevant actors. The scientific excellence of the platform and its supporting consortium is demonstrated by the significant number of publications directly supported by the regional e-Infrastructure and related services.
        Speaker: Anastas Mishev (on behalf of the VI-SEEM consortium)
        Slides
      • 13
        Federated engine for information exchange (Fenix)
        In contrast to the experimental high-energy physics community and others that already operate federated data infrastructures, neuroscience has to cope with a diverse set of data sources, each with specific formats, modalities, spatial and temporal scales, and coverage (from high-resolution microscopy and magnetic-resonance data to electrophysiological data, from multi-electrode array measurements to brain simulations on HPC or neuromorphic systems), and with no fixed relationship between them. Thus, the scientific approaches and workflows of this community are a much faster-moving target compared to, e.g., high-energy physics. Furthermore, there is the need to use HPC resources for processing these data. However, robust solutions for the federation of data services that can be readily adopted to fulfil the requirements of the neuroscience community do not currently exist. Fenix is based on a consortium of five European supercomputing and data centres (BSC, CEA, CINECA, CSCS, and JSC), which have agreed to deploy a set of infrastructure services (IaaS) and integrated platform services (iPaaS) to allow the creation of a federated infrastructure and to facilitate access to scalable compute resources, data services, and interactive compute services. The setup of this federated data infrastructure is guided by the following considerations: - Data are brought in close proximity to the data processing resources at different compute and data infrastructure service providers, in order to take advantage of high-bandwidth active data repositories as well as data archival services. - Federating multiple data resources enables easy replication of data at multiple sites. This capability can be exploited to improve data resilience and data availability as well as data access performance. - Services are implemented in a cloud-like manner that is compatible with the work cultures in scientific computing and data science. Specifically, this entails developing interactive supercomputing capabilities on the extreme computing and data platforms of the participating data centres. - The level of integration is kept as low as possible, in order to reduce operational dependencies between the sites (to avoid, e.g., the need for coordinated maintenance and upgrades) and to allow the site-local infrastructures to evolve following different technology roadmaps. The Fenix federated infrastructure includes as main components: - Scalable Compute Services; - Interactive Compute Services; - Active Data Repositories based on fast memory and active storage tiers; - Archival Data Repositories; and - Information/catalogue services. The major advantages of the proposed architecture are its use-case-driven design (it is being co-designed with continuous analysis and consideration of scientific neuroscience use cases), the scalability of the services, and its easy extensibility, which allows moving to new state-of-the-art solutions in the future or enabling workflows for other scientific communities. The Fenix infrastructure will primarily offer resources to the neuroscience community as part of the Human Brain Project (https://www.humanbrainproject.eu/), but it is meant to grow into a more generic provider. The goal of this presentation is to report the status of the infrastructure, the technological choices made so far, and the future plans.
        Speaker: Giuseppe Fiameni (CINECA - Consorzio Interuniversitario)
        Slides
    • Interoperability presentations 213 & 215 (The Square, Brussels Meeting Centre)

      213 & 215

      The Square, Brussels Meeting Centre

      Convener: Dr Gergely Sipos (EGI.eu)
      • 14
        Bringing Europeana and CLARIN together: Dissemination and exploitation of cultural heritage data in a research infrastructure
        We present the joint work by Europeana (http://www.europeana.eu), a European cultural heritage (CH) infrastructure, with CLARIN (http://www.clarin.eu), a European research infrastructure, to make promptly available for research use the vast data resources that Europeana has aggregated in the past years. Europeana provides access to digitised cultural resources from a wide range of institutions all across Europe. It seeks to enable users to search and access knowledge in all the languages of Europe, either directly via its web portals, or indirectly via third-party applications leveraging its data services. The Europeana service is based on the aggregation and exploitation of (meta)data about digitised objects from very different contexts. The Europeana Network has defined the Europeana Data Model (EDM) as its model for interoperability of metadata, in line with the vision of linked open vocabularies. One of Europeana's lines of action is to facilitate research on the digitised content of Europe's galleries, libraries, archives and museums, with a particular emphasis on digital humanities. CLARIN (Common Language Resources and Technology Infrastructure) is a networked federation of language data repositories, service centres and centres of expertise. CLARIN aggregates metadata from resource providers (CLARIN centres and selected "external" parties), and makes the underlying resources discoverable through the Virtual Language Observatory (VLO) to provide a uniform experience and consistent workflow. The VLO can also serve as a springboard to carry out natural language processing tasks via the Language Resource Switchboard (LRS), allowing researchers to invoke tools on the selected resources directly from its user interface. The potential inclusion of many new CH resources by harvesting metadata from Europeana opens up new applications for CLARIN's processing tools. CLARIN and Europeana do not share a common metadata model, so a semantic and structural mapping had to be defined, and a conversion implemented on the basis of this. CLARIN's ingestion pipeline was then extended to retrieve a set of selected collections from Europeana and apply this conversion in the process (see the harvesting sketch after this entry). Several infrastructure components had to be adapted to accommodate the significant increase in the amount of data to be handled and stored. Currently about 775 thousand Europeana records can be found in the VLO, with several times more expected in the foreseeable future. Of these, about 10 thousand are technically suitable for processing via the LRS. Relatively straightforward improvements to the metadata on the side of Europeana and/or its data providers could substantially increase this number, and CLARIN is working with Europeana to implement such improvements. More tools are also expected to be connected to the LRS in the short to mid-term, which should further increase 'coverage'. As a next step, CLARIN can extend and refine the selection of included resources, and Europeana can adapt its data and metadata to optimally serve the research community. CLARIN's experience, and potentially part of its implementation work, can be applied to integrate Europeana with other resource infrastructures.
        Speaker: Dr Maria Eskevich (CLARIN ERIC)
        Slides
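        As an illustration of the harvesting step described above, the sketch below pulls records over OAI-PMH with the Sickle client (pip install sickle). The endpoint URL, set name and metadata prefix are placeholders, not CLARIN's actual ingestion configuration:
        ```python
        # Sketch of an OAI-PMH harvesting step. Endpoint, set and metadataPrefix
        # are illustrative placeholders.
        from sickle import Sickle

        harvester = Sickle("https://example.org/oai")  # hypothetical OAI-PMH endpoint
        records = harvester.ListRecords(metadataPrefix="edm", set="some_collection")
        for record in records:
            print(record.header.identifier)  # next: convert EDM metadata and ingest into the VLO
        ```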
      • 15
        O2A - Data Flow Framework from Sensor Observations to Archives
        The Alfred Wegener Institute (AWI) coordinates German polar research and is one of the most productive polar research institutions worldwide, with scientists working in both polar regions - a task that can only be successful with the help of excellent infrastructure and logistics. Conducting research in the Arctic and Antarctic requires research stations staffed throughout the year as the basis for expeditions and data collection. It needs research vessels, aircraft and long-term observatories for large-scale measurements, as well as sophisticated technology. The AWI also provides this infrastructure and competence to national and international partners. To meet the challenge, the AWI has been progressively developing and sustaining an e-infrastructure for coherent discovery, visualisation, dissemination and archival of scientific information and data. Most of the data originates from research activities carried out on a wide range of sea-, air- and land-based research platforms. Archival and publishing in the PANGAEA repository, along with DOI assignment to individual datasets, is the pursued end-of-line step. Within AWI, a workflow for data acquisition from vessel-mounted devices, along with ingestion procedures for the raw data into the institutional archives, has been well established. However, the increasing number of ocean-based stations and respective sensors, along with heterogeneous project-driven requirements towards satellite communication, sensor monitoring, quality control and validation, processing algorithms, visualisation and dissemination, has recently led us to build a more generic and cost-effective framework, hereafter named O2A (observations to archives). The main strengths of our framework (https://www.awi.de/en/data-flow) are the seamless flow of sensor observations to archives and the fact that it complies with internationally used OGC standards, assuring interoperability in an international context (e.g. SOS/SWE, WMS, WFS, etc.). O2A comprises several extensible and exchangeable modules (e.g. controlled vocabularies and gazetteers, file type and structure validation, aggregation solutions, processing algorithms, etc.) as well as various interoperability services. We provide integrated tools for standardised platform, device and sensor descriptions following SensorML (https://sensor.awi.de), automated near-real-time and "big data" data streams supporting SOS and O&M (see the sketch after this entry), and dashboards allowing data specialists to monitor their data streams for trends and early detection of sensor malfunctions (https://dashboard.awi.de). Also, in the context of the "Helmholtz Data Federation" and with an outlook towards the European Open Science Cloud, we are developing a cloud-based workspace providing user-friendly solutions for data storage at petabyte scale and state-of-the-art computing solutions (Hadoop, Spark, Notebooks, rasdaman, etc.) to support scientists in collaborative data analysis and visualisation activities, including geo-information systems (http://maps.awi.de). Our affiliated repositories offer archival and long-term preservation as well as publication solutions for data, data products, publications, presentations and field reports (https://www.pangaea.de, https://epic.awi.de).
        Speaker: Dr Angela Schäfer (Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research)
        Slides
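        A minimal sketch of consuming such a standards-based observation stream with OWSLib (pip install owslib) is shown below; the endpoint, offering and observed property are assumptions, not AWI's actual service configuration:
        ```python
        # Illustrative SOS query via OWSLib; all identifiers are placeholders.
        from owslib.sos import SensorObservationService

        sos = SensorObservationService("https://example.org/sos", version="2.0.0")
        response = sos.get_observation(
            offerings=["ship_underway_system"],            # hypothetical offering
            observedProperties=["sea_water_temperature"],  # hypothetical property
            responseFormat="http://www.opengis.net/om/2.0",
        )
        print(response[:500])  # raw O&M payload, to be parsed downstream
        ```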
      • 16
        International Image Interoperability Framework @ KU Leuven (Belgium). Current applications and future projects
        There is an increased use of digital resources in humanities research and the expectations towards accessibility are high. Researchers want their material to be available faster, in higher resolution and in multiple formats. They want tools for swift deep-zooming on artworks and manuscripts, for browsing through complex objects in a heartbeat, and for comparing multiple image-based resources stored in digital repositories all over the world. In this context LIBIS, the library information service of the University of Leuven (Belgium), implemented IIIF and the Mirador viewer for the interoperability and visualisation of the digitised manuscripts stored in the university's long-term preservation repository. IIIF, or the International Image Interoperability Framework, is a community-developed framework for sharing high-resolution images in an efficient and standardised way across institutional boundaries, offering a set of shared API specifications for interoperable functionality in digital image repositories. Using a IIIF manifest URL, a researcher can simply pull the images and related contextual information, such as the structure of a complex object or document, metadata and rights information, into any IIIF-compliant viewer such as Mirador (see the sketch after this entry). Simply put, a researcher can access a digital resource from the British Library and from the KU Leuven Libraries in a single viewer for research, while allowing the institutions to exert control over the quality and context of the resources offered. KU Leuven implemented IIIF in 2015 in the framework of the idemdatabase.org project and has since been using it in a number of Digital Humanities projects with a focus on high-resolution image databases for research. By now the IIIF community has grown considerably, with institutions such as the J. Paul Getty Trust and the Bibliotheca Vaticana implementing it as a standard and providing swift and standardised access to thousands of resources. Its potential has, however, not reached its limits, with ongoing work on aspects such as IIIF resource discovery and harvesting of manifest URLs, annotation functions, and an extension to include audio-visual material. The Mirador viewer has also generated an increasing amount of interest and willingness among Digital Humanities researchers to preserve research material in the university's digital archive and make it available for reuse. But with the increasing number of requests to use Mirador for research and digital collection showcases, it is important to keep investing in the infrastructure to ensure speed, quality and continued innovation. New collaborative projects with humanities researchers will continue to define the development roadmap and finance enhancements of the viewer, such as the possible addition of multi-spectral image viewing options. The presentation will introduce IIIF and its concepts, highlight KU Leuven projects and viewers, and give an overview of current and future application options for image-based research in the Digital Humanities. More info: http://www.libis.be/mirador-iiif-en/ ; http://iiif.io ; http://projectmirador.org
        Speaker: Roxanne Wyns (KU Leuven - LIBIS)
        Slides
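        The sketch below shows the manifest-driven access pattern described above: fetching a IIIF Presentation API v2 manifest with Python and listing its canvases and image URLs, much as a viewer like Mirador does internally. The manifest URL is a placeholder - any IIIF-compliant manifest will do:
        ```python
        # Fetch a IIIF Presentation v2 manifest and list the image URL per canvas.
        import requests

        manifest = requests.get("https://example.org/iiif/manuscript/manifest.json").json()
        print(manifest["label"])                  # human-readable title
        for canvas in manifest["sequences"][0]["canvases"]:
            image = canvas["images"][0]["resource"]
            print(canvas["label"], image["@id"])  # deep-zoomable image resource
        ```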
      • 17
        Toward FAIR semantic resources
        The objective of producing FAIR scientific data (Findable, Accessible, Interoperable, Reusable) is increasingly supported by the use of domain-specific ontologies and thesauri. Generalising this approach to all scientific domains and building the necessary multi-disciplinary semantic tools and services will require finding and reusing multi-disciplinary semantic resources. These resources are heterogeneous and scattered across the web, either as individual entities or within various specialised repositories. The lack of discoverability and the technical and metadata heterogeneity of the semantic resources (ontologies, thesauri and specialised repositories) pose a challenge for their effective integration. In this presentation, we argue that we need to work toward building FAIR semantic resources. To achieve this objective, we are considering two main challenges: the interoperability of the semantic resources, both at the metadata level and the API level, and the discoverability of these resources. We will first introduce an international and multi-disciplinary collaborative effort initiated by the EUDAT Semantic Working Group and driven by the RDA Vocabulary and Semantic Service Interest Group. This collaborative effort aims to address the interoperability and governance challenges of semantic resources. We will then present the proof-of-concept service developed by EUDAT to support discoverability and aggregation. This service, called the Semantic Look Up service (Goldfarb et al., 2017), allows semantic providers to publish a description of their resource, which is then used to index its content. This semantic index provides a central point of access to multi-disciplinary concepts for semantic tools and services (both academic and commercial) and a unique resource for analysing the resources. We will briefly discuss the potential impacts of the Semantic Look Up service and hope to trigger discussions regarding the future of this effort during the DI4R meeting. This presentation is open to all participants.
        Speaker: Dr Yann Le Franc (e-Science Data Factory)
        Slides
      • 18
        Establishing and Extending Open Services for the Scholarly Community: The Bielefeld Academic Search Engine (BASE) use case
        Starting as a library-oriented search engine around 2004, BASE has integrated many of the new open services and developments emerging in the academic information network. Very soon the application switched from ingesting database contents to collecting scientific metadata via the OAI-PMH protocol. The open access movement, represented and supported by institutional repositories run by numerous university units and by community-based services such as PubMed, arXiv or RePEc, has brought up a solid foundation of valuable, openly accessible bibliographic metadata with related full texts. BASE aims to integrate with API-based value-added services, such as the persistent citation linking service Crossref and the research data registration agency DataCite, both based on the concept of the DOI. In the context of the DFG-funded project ORCID-DE, BASE extends its service portfolio to address author identification by connecting ORCID iDs with author and contributor names in repository metadata. Data normalisation and curation, together with linked open data strategies, are used to improve and enrich the metadata. Additional scientific material is covered as well: digital collections with digitised material such as books, images and other sources, research data, current research information, conference proceedings, the contents of publishing services, and open educational resources with a global scope. Thanks to local cooperation activities, innovative features such as an automatic classification tool (developed by the Computational Linguistics Group of Bielefeld University) have been developed, and a search environment based on virtual reality technology (in cooperation with CITEC, the Cluster of Excellence Cognitive Interaction Technology of Bielefeld University) is currently being explored. On the other hand, BASE supports the scholarly communication infrastructure in a bi-directional approach. All the collected and enriched data are available via a search API and a metadata delivery API based on the OAI-PMH protocol, which allows all interested non-commercial stakeholders to re-use the data (see the sketch after this entry). Around 200 partners, mainly with an academic background, from all around the world are using these interfaces. Today BASE has indexed more than 115 million scientific objects from more than 5800 academic sources from all around the world. The service is globally available and is a strong partner in a network of academic information services and heterogeneous partners such as libraries, academies, publishers, scientific organisations, institutes and hosting services. This strategy includes alignment with several national and international projects in the field of information management, such as OpenAIRE, ORCID-DE and Open APC-INTACT. Built on its active role in different scopes (especially COAR, the Confederation of Open Access Repositories, and DINI, the German Initiative for Network Information), many contacts and cooperation activities have evolved with different partners. This position allows BASE to share expertise and analytic views of the global information network infrastructure. The presentation describes the implementation of an application using available resources in the open science network, with a focus on including scientific publications and related objects. It will address especially researchers interested in supporting tools, and librarians and technicians involved in building information retrieval applications and re-using the related interfaces.
        Speaker: Mr Friedrich Summann (Bielefeld University Library)
        Slides
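        As an illustration, the sketch below queries BASE's public HTTP search interface from Python. The endpoint, parameter names and response fields follow the public interface description as we understand it; treat them as assumptions, check the current API documentation, and note that use of the interface may require prior registration:
        ```python
        # Sketch of a BASE search API call; endpoint, parameters and field names
        # are assumptions to verify against the current documentation.
        import requests

        resp = requests.get(
            "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi",
            params={"func": "PerformSearch", "query": "open science",
                    "format": "json", "hits": 5},
        )
        for doc in resp.json()["response"]["docs"]:
            print(doc.get("dctitle"))  # title field name is an assumption
        ```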
    • National initiatives Copper Room

      Copper Room

      The Square Meeting Centre

      Mont des Arts, 1000 Brussels, Belgium

      The session will showcase how national digital service providers supporting research and open science are getting organised nationally to increase cross-coordination, strategy making and sustainability.
      The purpose of the session is to share information and best practices about:
      - the governance of national digital research infrastructures and their funding models,
      - roadmaps and national policy agendas for open science and the infrastructures serving its different components.

      The session will also discuss pathways to increase the availability of digital resources and services for researchers and Open Science in Europe and beyond.

      Conveners: Pedro Principe (University of Minho), Dr Tiziana Ferrari (EGI.eu)
      • 19
        Italian Computing and Data Infrastructure (ICDI): status and next steps
        Speaker: Donatella Castelli (Consiglio Nazionale delle Ricerche (CNR) - ISTI)
        Slides
      • 20
        The Dutch National e-Infrastructure for Research: Current status and challenges
        ABSTRACT: The Dutch National e-Infrastructure has been in flux over the last couple of years. With the addition of SURFsara to SURF, the Dutch Open Science ambitions, and changing responsibilities towards national stakeholders, the role of SURF and its partners has changed. New outreach methodologies have been adopted and a greater focus has been put on collaboration, including collaborations with national funders, research projects and scientific institutions. This talk will highlight some of the trade-offs that have been made and the results they have yielded.

        ABOUT THE SPEAKER: Jan Bot is a community manager at SURFsara. He leads the Support4research project, in which SURF, together with national partners, tries to tailor the national services to the needs of the Dutch research community. He is a member of the EUDAT outreach team, works as the Dutch liaison for EGI and leads the "EOSC Service Portfolio" task (WP5.2) in EOSCpilot. Jan has a background in bioinformatics and has been working on outreach and support for the last 9 years.
        Speaker: Jan Bot (SURFsara BV)
        Slides
      • 21
        Portuguese roadmap for research infrastructures and the Open Science national policy initiative
        Speakers: João Mendes Moreira, Paulo Soares
        Slides
      • 22
        National approach to open science in Slovenia
        Speaker: Mojca Kotar (University of Ljubljana)
        Slides
      • 23
        Discussion
    • Your gateway for European e-infrastructures 201 A/B

      201 A/B

      The Square Meeting Centre

      Mont des Arts, 1000 Brussels, Belgium

      eInfraCentral (EIC) is a coordination and support action funded by the European Union's Horizon 2020 research and innovation programme. Its mission is to ensure that by 2020 a broader and more varied set of users discovers and accesses the existing and developing e-infrastructure capacity. The underlying idea is to create a marketplace. To that end, eInfraCentral has engaged in an open discussion with e-infrastructures to define a common service catalogue for their services. A beta version of the eInfraCentral Portal has been created as the gateway for users to browse the service catalogue, with functionality defined via a survey, desk research on reference marketplaces, and expert advice from European e-Infrastructure flagship initiatives. The next step is to further align the services on offer and to test the portal with potential users. This session at DI4R aims to contribute substantially to this process.

      The uptake of e-infrastructures by a wider set of stakeholders has been slow, primarily due to the fragmentation of service offerings, lack of service discoverability, comprehensibility and clarity, as well as the inconsistent use of performance indicators for assessing added value and impact across different service providers at national and international level. Service harmonisation and the uniform representation of e-Infrastructures have been key driving elements in the development of eInfraCentral. To achieve this, two main challenges need to be addressed. First, a common service catalogue requires alignment of the various e-infrastructure service offerings along a commonly agreed service description (see the sketch below). Such an approach to defining and monitoring e-infrastructure services will increase their uptake and enhance understanding of where improvements can be made in delivering services. Secondly, for future updates of the service catalogue and related performance indicators, automatic data harvesting/exchange needs to be ensured in a manner interoperable with existing service providers' practices and data repositories.
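
      To make the idea of a commonly agreed service description concrete, the sketch below models one catalogue entry as a small Python data structure. The field names are illustrative assumptions, not the actual eInfraCentral schema:
      ```python
      # Hypothetical shape of a harmonised service-catalogue entry.
      from dataclasses import dataclass

      @dataclass
      class ServiceDescription:
          name: str
          provider: str
          description: str
          url: str
          trl: str        # technology readiness level
          category: str

      entry = ServiceDescription(
          name="B2SAFE", provider="EUDAT", category="Data Management",
          description="Replication and safe long-term storage of research data.",
          url="https://eudat.eu/services/b2safe", trl="TRL 9",
      )
      ```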

      The session will include an introductory word from the European Commission DG CNECT Unit C1 overseeing this initiative and explaining their vision for eInfraCentral. It will then proceed with presentations on the process of service alignment and a demo of the portal. The key objective of the session is to have an active discussion/interaction with the audience. eInfraCentral will prepare a list of questions to kick-start the discussion and will encourage the RIs, VREs and users to bring to the table what they want to see “in”.

      This session is targeted at a) e-infrastructure services providers (pan-European, regional and national, monothematic or poly-thematic, uni-disciplinary or multi-disciplinary, etc.), b) virtual research environments (VREs), c) potential users of the e-infrastructure services.

      Convener: Jelena Angelis (European Future Innovation System (EFIS) Centre)
      • 24
        Why is EIC important?
        Speaker: Mr Enrique Gomez (European Commission, DG Connect)
      • 25
        The EIC vision
        Speaker: Jelena Angelis (European Future Innovation System (EFIS) Centre)
        Slides
      • 26
        Creating a Service Catalogue 201 A/B

        201 A/B

        The Square Meeting Centre

        Mont des Arts, 1000 Brussels, Belgium
        Speaker: Jorge-A. Sanchez-P. (European eInfrastructures Observatory)
        Slides
      • 27
        Online eInfraCentral gateway in action
        Speaker: George Papastefanatos (University of Athens)
        Slides
      • 28
        Endorsement from service providers
        - EUDAT – Rob Baxter
        - EGI – Sergio Andreozzi
        - OpenAIRE – Natalia Manola
        Speakers: Natalia Manola (University of Athens, Greece), Rob Baxter (University of Edinburgh), Sergio Andreozzi (EGI.eu)
        Slides
      • 29
        Discussion
        Speaker: Jelena Angelis (European Future Innovation System (EFIS) Centre)
      • 30
        Call for actions 201 A/B

        201 A/B

        The Square Meeting Centre

        - TALK TO US! Benefits and use of eInfraCentral
        - JOIN US! Have a bilateral discussion during DI4R about how to include your service on our Portal
        - VISIT US! Catch up with us at our poster
        - FOLLOW US! Sign up for our newsletter to follow eInfraCentral
        Speaker: Jelena Angelis (European Future Innovation System (EFIS) Centre)
        Slides
    • 12:30
      Lunch break The Square, Brussels Meeting Centre

      The Square, Brussels Meeting Centre

    • AAI for Researchers 213 & 215 (The Square, Brussels Meeting Centre)

      213 & 215

      The Square, Brussels Meeting Centre

      Over the last few years there have been significant developments in the way authentication and authorisation are handled by large research collaborations. Everybody recognises the importance of a secure and reliable infrastructure to manage users and the groups they belong to, to reduce the number of credentials users need and, consequently, to broaden the set of services they can access.

      Now more than ever, however, researchers feel they should be able to log in once and access as many resources and services as needed, regardless of the research infrastructure or e-infrastructure that offers them. This approach poses new challenges for research and e-infrastructure service providers.

      In this interactive session we would like both to present collaborative approaches and to get feedback - from communities beyond those already engaged in AARC and research federations - on the suitability of the approach.

      Inspired by the research and infrastructure needs of global collaboration, best practices and architectural models for collaborations have been developed that allow infrastructures to build their own AAI without reinventing the wheel.

      We will present this blueprint and the "proxy" concept, and also show (asking research infrastructures to report on their experience) how it is applied in production. Operational and sustainability aspects will be addressed jointly with the AEGIS infrastructures implementing this blueprint today. A minimal sketch of the proxy idea appears at the end of this session description.

      We will ask participants to provide real time inputs using online tools and by organising round tables during the session.
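
      As a purely illustrative sketch of the proxy idea, the fragment below maps whatever attributes a home identity provider releases onto one harmonised community schema; the attribute names are common SAML/OIDC examples, not an AARC specification:
      ```python
      # Toy version of the proxy's attribute-harmonisation step.
      def harmonise(idp_attributes: dict) -> dict:
          """Map heterogeneous IdP attributes to a community's internal schema."""
          return {
              "user_id": idp_attributes.get("eduPersonPrincipalName")
                         or idp_attributes.get("sub"),         # SAML- or OIDC-style identifier
              "name": idp_attributes.get("displayName", ""),
              "groups": idp_attributes.get("isMemberOf", []),  # community group memberships
          }

      print(harmonise({"eduPersonPrincipalName": "jdoe@example.edu",
                       "displayName": "J. Doe", "isMemberOf": ["vo.example.eu"]}))
      ```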

      Convener: Licia Florio (GÉANT)
      • 31
        Introduction
        Speaker: Licia Florio (GÉANT)
        Slides
      • 32
        Setting the scene: AAI research requirements - inputs from the audience
        Speaker: Licia Florio (GÉANT)
      • 33
        Intro to the AARC blueprint architecture
        Speaker: Mr Christos Kanellopoulos (GÉANT)
      • 34
        AARC blueprint Guidelines
        Speaker: Mr Nicolas Liampotis (GRNET)
        Slides
      • 35
        AARC Policy frameworks
        Speaker: David Groep (NIKHEF)
        Slides
      • 36
        Working together: mapping the attendees' requirements to the AARC BPA
      • 37
        Pilots in AARC: why and what is happening
        Speaker: Mr Arnout Terpstra (SURFnet)
        Slides
    • Cross e-infrastructure of training/technical support 211 & 212 (The Square, Brussels Meeting Centre)

      211 & 212

      The Square, Brussels Meeting Centre

      The promotion of cutting-edge solutions for networking, advanced computing, management of big data, trust and identity, and open scholarship is paramount to leverage existing investments, avoid duplication and ultimately increase sustainability by supporting a larger number of researchers.

      However, various challenges are being faced, such as reaching out to an increasing number of researchers and innovators and aggregating demand with the offer from multiple providers. This can be a demanding activity, especially in the case of small and highly distributed research teams. Fortunately, the EOSC provides an opportunity for more coordination and integration of outreach activities currently conducted in isolation. This interactive session will discuss the different engagement activities and strategies of organisations involved in supporting the use of national and European e-Infrastructures/Research Infrastructures. The session will feature presentations and discussions highlighting national and European opportunities for coordination and collaboration.

      Participants will have the opportunity to provide input to the Cooperation Agreement that will involve e-Infrastructures and in particular the future H2020 EINFRA-12 projects EOSC-hub and OpenAIRE-Advance involving EGI, EUDAT, INDIGO and OpenAIRE.

      This session addresses Research (e-)Infrastructure managers, research collaboration managers and digital infrastructure providers.

      Convener: Najla Rettberg (University of Goettingen)
      Slides
    • EOSC building block presentations Copper Room (The Square, Brussels Meeting Centre)

      Copper Room

      The Square, Brussels Meeting Centre

      Convener: Giuseppe Fiameni (CINECA - Consorzio Interuniversitario)
      • 43
        e-Infrastructure for the Multi-Scale Complex Genomics Virtual Research Environment
        3D/4D genomics is one of the next great challenges for biology and biomedicine. While major milestones have been achieved in sequencing, imaging and computation, understanding the 3D folding of the chromatin fiber, its role in fundamental cellular processes, and its connection with pathology remains a huge challenge. Genomics projects, together with astrophysics, are among the major generators of Big Data, and are thus in need of the kind of solutions developed by the MuG VRE. The particularity of managing 3D/4D genomics data lies in the diversity of data formats and analysis methods, driven by the continued advent of new experimental techniques, as well as in the multi-resolution problem involved in the integrated navigation of data that range from sequence to 3D/4D chromatin dynamics. The successful implementation and uptake of MuG VRE solutions can serve as an example for other research communities with a strong multidisciplinary component that need to handle very diverse data. The Multiscale Genomics (MuG) Virtual Research Environment (VRE) is developing a cloud-based computational infrastructure to support the deployment of software tools addressing the various levels of analysis of the genome. The integrated tools tackle needs that range from computationally demanding applications (e.g. molecular dynamics simulations) to the analysis of NGS or Hi-C data, where the stress is on data management and high-throughput data analysis. The development of this infrastructure includes building unified data management procedures and distributed execution to minimise data transmission and ensure sustainability. The present MuG infrastructure is based on two main cloud systems (Institute for Research in Biomedicine, IRB, and Barcelona Supercomputing Center, BSC), with a satellite installation at EBI's Embassy Cloud. The infrastructure is built on the OpenNebula and OpenStack cloud management systems and has developed specific interfaces for users and developers. Interoperability of the tools included in the infrastructure is maintained through a rich set of metadata for both tools and data, which allows the system to associate tools and data in a transparent manner. Two alternatives for execution scheduling are provided: a traditional queueing system to handle demand peaks in applications with fixed needs, and an elastic, multi-scale programming model (PyCOMPSs, controlled by the PMES scheduler; see the sketch after this entry) for complex workflows requiring distributed or multi-scale execution schemes. The first release of the infrastructure will be presented in November 2017 to the 3D/4D research community.
        Speaker: Dr Josep Ll. Gelpi (Barcelona Supercomputing Center (BSC), Barcelona, Spain. Dept. of Biochemistry and Molecular Biomedicine, University of Barcelona, Barcelona, Spain)
        Slides
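        As a concrete illustration of the task-based model mentioned above, here is a minimal, hedged PyCOMPSs sketch. It is not taken from the MuG VRE code base: the task names and bodies are placeholder assumptions, and running it requires the COMPSs runtime (e.g. via runcompss).

            from pycompss.api.task import task
            from pycompss.api.api import compss_wait_on

            @task(returns=1)
            def simulate_fragment(fragment):
                # Placeholder for a heavy per-fragment step (e.g. an MD simulation chunk).
                return sum(ord(c) for c in fragment)

            @task(returns=1)
            def merge(a, b):
                # Reduction step; the runtime schedules it once both inputs are ready.
                return a + b

            if __name__ == "__main__":
                partials = [simulate_fragment(f) for f in ("chr1", "chr2", "chr3")]
                total = partials[0]
                for p in partials[1:]:
                    total = merge(total, p)
                total = compss_wait_on(total)  # synchronise and fetch the final value
                print(total)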
      • 44
        Monitoring and exploring virtual digital infrastructures with perfSONAR
        The perfSONAR monitoring infrastructure described in this presentation can be used in virtual network environments to add performance monitoring and troubleshooting capabilities across a multi-domain infrastructure. Such a feature can be used for testing any connectivity realised via a Virtual Private Network (VPN), either within a domain or in a multi-domain environment. The GÉANT Multi-domain Virtual Private Network (MD-VPN) provides an international network service, enabling scientists all over Europe to collaborate via a common private network infrastructure. This service is delivered jointly with NRENs up to the end user. MD-VPN offers an end-to-end service suited for international projects facing the challenge of interconnecting distributed resources. In such a multi-domain environment, provisioning and infrastructure monitoring require a performance measurement tool that helps to identify network issues and troubleshoot them effectively. In addition, such a tool can enable end users to validate the connectivity they experience against measured values of parameters such as bandwidth, jitter or latency, seamlessly over IPv4 or IPv6 networks. The perfSONAR Toolkit supports network monitoring in federated as well as single-domain environments, carrying out tests that determine performance metrics between various networks and supporting problem troubleshooting. The latest extensions to perfSONAR bring support for Linux namespaces, enabling researchers to define a separate measurement specification per individual service instance; Linux namespaces provide separation awareness and address-overlapping capabilities. This presentation shows how the latest perfSONAR functions can be used to add monitoring capabilities in support of VPN-based research infrastructures (see the sketch after this entry). Our presentation addresses the challenge of monitoring VPN-based infrastructures as building blocks of the European Open Science Cloud and the European Data Infrastructure. The MD-VPN service can be used for connectivity between geographically distributed resources, including HPC centres and research communities. With this in mind, exploring how perfSONAR monitoring is applied to such a network service addresses the theme of solutions supporting secure and efficient researcher collaboration amongst EOSC building blocks. Open Science requires a highly available and resilient infrastructure that is usually federated and distributed across several countries and organisations. The greater the complexity of the infrastructure, the stronger the need for a proper, comprehensive, transparent and reliable network monitoring solution. By covering multi-domain, single-domain and federated environments and providing vendor-agnostic, end-to-end performance measurements, perfSONAR also enables and supports the interoperability of individual elements of complex infrastructures, federated services and policies for research in Europe and beyond. The intended audience is researchers who would like to learn how to efficiently monitor research infrastructures, researchers who would like to learn how they could integrate their own network monitoring tools and research developments with perfSONAR, and service operators seeking practical implementations of tools for verifying and monitoring multi-domain services.
        Speaker: Mr Szymon Trocha (Poznan Supercomputing and Networking Center)
        Slides
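        For illustration, the sketch below drives a one-off perfSONAR measurement from Python via the pScheduler command-line tool. The host names are placeholders, and the exact CLI options should be checked against the perfSONAR version actually deployed.

            import subprocess

            # Run a single latency test between two (hypothetical) measurement points.
            result = subprocess.run(
                ["pscheduler", "task", "latency",
                 "--source", "ps-a.example.org",
                 "--dest", "ps-b.example.org"],
                capture_output=True, text=True, check=True)
            print(result.stdout)  # human-readable summary of the measured latency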
      • 45
        A marine virtual research environment providing open data services in support of marine biodiversity and ecosystem research
        The European Open Science Cloud has the ambition to provide open and seamless services to analyse and re-use research data. For the marine biology domain, such services and supporting data systems have been developed in the framework of several projects over the past five to ten years. LifeWatch marine has taken the initiative to bring these data services together into a Marine Virtual Research Environment (http://marine.lifewatch.eu). The thematic scope of the provided tools and data includes biodiversity observation, omics, taxonomy, traits, and geographical and environmental reference information. Contributions are based on communities gathered in the framework of LifeWatch, EMBRC, EMODnet Biology, AssemblePlus, MarBEF, MicroB3, BioVEL, VIBRANT and other related initiatives. The common framework stimulates further development towards more advanced and seamless integration of the services and tools. In addition to the available analytical interfaces, the provided open data services allow users to build their own applications on top. REST, SOAP and OGC-compliant web services make the data accessible in a standardised way, allowing the development of workflows and applications on a range of platforms: PHP web pages, R and Python scripts, workflow tools like Taverna, Galaxy, etc. (see the sketch after this entry). Performance and availability could be further increased by virtual multiplication and load balancing over virtual servers in a cloud environment. Initial steps in that direction have been taken in collaboration with EGI as part of the LifeWatch Competence Centre.
        Speaker: Klaas Deneudt (VLIZ)
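        As an illustration of building on such open data services, the sketch below queries a marine taxonomy REST service from Python. The WoRMS-style endpoint and field names are assumptions; consult http://marine.lifewatch.eu for the services actually exposed there.

            import requests

            # Look up taxonomic records for a species name (endpoint assumed, WoRMS-style).
            url = "https://www.marinespecies.org/rest/AphiaRecordsByName/Solea solea"
            records = requests.get(url, params={"like": "false", "marine_only": "true"}).json()
            for rec in records:
                print(rec.get("scientificname"), rec.get("AphiaID"), rec.get("status"))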
      • 46
        BRUSELAS: A HPC based software architecture for drug discovery on large molecular databases
        **1. Overview** In the context of computer-aided drug discovery (CADD), virtual screening (VS) is a collection of in-silico techniques for filtering large molecular databases in search of bioactive compounds. Such techniques, together with the development of high performance computing (HPC) infrastructures, allow access to a huge number of compounds in a short time and at low cost.1 In recent years, a diversity of VS web servers and HPC platforms has been developed.2 However, there is still a gap to be covered: the need for an architecture capable of scalably integrating a large number of data sources and extracting a drug candidate list that fits the user's needs. Aiming to fill this gap, we have developed BRUSELAS (Balanced Rapid and Unrestricted Server for Extensive Ligand-Aimed Screening), a software architecture for performing 3D similarity searches on large datasets of compounds using HPC techniques (a generic similarity-ranking sketch follows this entry). BRUSELAS exhibits a modular design capable of importing data from several sources with very diverse contents. It is accessible free of cost at http://bio-hpc.ucam.edu/Bruselas, and its development is based on the experience acquired in previously awarded DECI/PRACE projects. **2. BRUSELAS and Big Data** A recurrent question is to what extent molecular databases represent Big Data resources. In the scientific literature they are usually considered Big Data because they satisfy the 5Vs rule in the following terms:3 1. Volume - represented by the number of compounds in the existing databases. 2. Variety - determined by the collection of entities and properties of very different nature stored in such databases. 3. Velocity - given by the time employed to screen datasets. 4. Veracity - validated by comparing the predicted results with experimental ones. 5. Value - given by the success of the process. In cases where similarity algorithms handle flexibility, a set of conformers representing diverse poses is generated for each compound. In such situations, the volume is given by the total number of conformers to screen, which is usually much larger than the initial number of compounds. In addition, the comparison of millions of conformers is a very slow process which can be significantly accelerated by using HPC resources. **3. Intended audience** VS is a topic of growing interest that is closely related to bioinformatics, biochemistry, HPC and Big Data. Furthermore, it has potential applications in many fields, such as theoretical and experimental chemistry as well as more applied areas of biology and medicine. **References** 1. Bajorath J. (2002) Integration of virtual and high-throughput screening. Nat. Rev. Drug. Discov. 1(11):882–894 2. Pérez-Sánchez H., Rezaei V., Mezhuyev V., Man D., Peña-García J., Den-Haan H., Gesing S. (2016) Developing science gateways for drug discovery in a grid environment. Springerplus. 5(1):1300 3. Laney D. (2001) 3D Data management: controlling data volume, velocity and variety. Appl. Deliv. Strateg. Internet(February 2001):1–4
        Speaker: Antonio-Jesus BANEGAS-LUNA (Universidad Católica San Antonio de Murcia)
        Slides
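        The generic sketch below illustrates the kind of similarity-based ranking that VS servers perform. It is not the BRUSELAS implementation (which performs 3D shape similarity searches on HPC resources) but a minimal 2D fingerprint analogue using RDKit, with illustrative SMILES strings.

            from rdkit import Chem, DataStructs
            from rdkit.Chem import AllChem

            query = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin as the query
            library = {"caffeine": "Cn1cnc2c1c(=O)n(C)c(=O)n2C",
                       "ibuprofen": "CC(C)Cc1ccc(cc1)C(C)C(=O)O"}

            # Morgan fingerprints stand in for the 3D descriptors a shape-based server uses.
            qfp = AllChem.GetMorganFingerprintAsBitVect(query, 2, nBits=2048)
            scores = {}
            for name, smi in library.items():
                fp = AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smi), 2, nBits=2048)
                scores[name] = DataStructs.TanimotoSimilarity(qfp, fp)

            # Rank candidates by decreasing Tanimoto similarity to the query.
            for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
                print(f"{name}: {s:.2f}")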
      • 47
        HUBzero platform as a brick for Open Science construction
        HUBzero is an open source software platform for building powerful websites that host analytical and collaboration tools, data and digital infrastructure resources to build communities in a single web-based ecosystem. The platform originated in the 1990s, developed by researchers at Purdue University in conjunction with the Network for Computational Nanotechnology to support nanoHUB.org. HUBzero is now used across a large variety of disciplines, including Earth and environmental sciences, engineering and healthcare. Today the platform is the basis for 40+ hubs worldwide with more than 1.5 million unique visitors in the past 12 months. The HUBzero platform includes a powerful content management system built to support scientific activities and to link through to digital infrastructure services. Users on a hub can write blog entries, participate in discussion groups, work together in projects, publish datasets and computational tools with Digital Object Identifiers (DOIs), and make these publications available for others to use as live, interactive digital resources. Simulation, modelling and data analysis tools published on a hub can be accessed with the click of a button, running transparently on cloud computing resources, campus clusters and other national and international high-performance computing (federated) facilities. The platform also provides support for legacy applications in various environments (Windows, Jupyter, R and Linux), as well as appropriate security and control for large collaborations. HUBzero can be fully customised to the particular needs of a community hub. With all these features, HUBzero can provide an important brick for the construction of community-targeted virtual research environments or science gateways that facilitate the exploitation and combination of (existing) digital infrastructure services. More importantly, the collaboration and publication facilities offered by the HUBzero platform can promote and foster the engagement of researchers and society in the construction of an Open Science culture and ecosystem. Finally, the HUBzero Foundation currently has 16 members, serving as a vehicle to provide support for the development and hosting of hubs through various services. After several years supporting hubs all over the globe with millions of visitors, HUBzero has become a sustainable part of the ecosystem of solutions for modern science. The presentation will highlight relevant HUBzero features for Open Science through a couple of examples of successful hubs. A discussion will be conducted with the audience about the role this platform could take in the construction of the European Open Science Cloud and the European Data Infrastructure. The target audience for this presentation and discussion is professionals interested in the construction or utilisation of virtual research environments and science gateways, including research and higher education digital infrastructure innovators.
        Speaker: Silvia Olabarriaga (University of Amsterdam)
        notes
        Slides
    • Pioneering EOSC thematic domains: the Blue Growth case and beyond 201 A/B

      201 A/B

      The Square Meeting Centre

      Mont des Arts street, no. 1000 Brussel, Belgium

      The EOSC will federate existing and emerging horizontal and thematic data infrastructures, providing over 1.7m EU researchers with an environment with free, open services for data storage, management, analysis and re-use across disciplines. It will also promote coordination and the progressive integration into the EOSC of open data infrastructures and services developed under initiatives focused on specific thematic areas such as Blue Growth, food and health, to accelerate the ongoing transition to a more Open Science and Open Innovation model for research, stimulate intra- and interdisciplinary research, and increase the impact of research investments and infrastructures.

      This session will provide an overview of how a set of pioneering initiatives are already putting into practice the vision of the thematic EOSC in different domains.
      It will address aspects of federation, networking and coordination of RIs for the purpose of improving the services provided to research communities and increasing cooperation, sharing and reusability across them.

      The session starts from the Blue Growth sector, highlighting best practices that can contribute to the EOSC ecosystem. This sector is characterised by the need for a better understanding and prediction of natural phenomena and of the impact of human activities on ocean ecosystems, their resilience and their effect on climate, including how and why the oceans and their resources are changing. As highlighted by the report of the G7 "Future of the Sea and Oceans" Working Group, the improvement of global data sharing infrastructures is instrumental to achieving this objective.
      BlueBRIDGE (Building research environments fostering Innovation, Decision making, Governance and Education for Blue Growth) and SeaDataCloud (the follow-up of SeaDataNet) are both working in this direction. They are building applications, the so-called VREs, that exploit existing e-infrastructures (respectively EGI and EUDAT) and existing data sharing activities (such as EMODnet), and that leverage relevant results of past and ongoing global, national and EU projects. Both initiatives are addressing the increased complexity of data sharing, analysis and reproducibility within Blue Growth, as well as the sharp growth in data volumes.

      During the session, through the presentation of concrete use cases, the two initiatives will showcase their early results, explain how they are benefitting from these horizontal e-infrastructures, and describe the challenges they are facing. The presentations will stimulate dialogue with the audience and set the scene for an interactive debate to which representatives from parallel initiatives operating in other domains, namely the food and environmental sectors, will be invited to contribute.

      The debate will be the opportunity to investigate the following topics:
      1. What should be the principles governing the thematic EOSCs?
      2. What might be the challenges in implementing them?
      3. How can a coherent development among the thematic clouds and between them and more general EOSC related initiatives be assured?

      The session will also be a great opportunity for the projects to identify collaboration possibilities, to understand whether they can share resources to improve their services, and to identify potential overlaps.

      Agenda:

      14:00 - 14:15 The Blue Cloud and the Food Cloud - Agostino Inguscio, European Commission, Marine Resources Unit of the Bioeconomy Directorate of DG Research & Innovation & Wim Haentjens, European Commission, Directorate-General Research & Innovation – Agri-food unit
      14:15 - 14:25 The BlueBRIDGE Project - Pasquale Pagano, CNR-ISTI, BlueBRIDGE Technical Coordinator
      14:25 - 14:35 SeaDataCloud - Chris Ariyo, CSC

      14:35 - 15:30 Panel discussion
      Donatella Castelli, CNR-ISTI, BlueBRIDGE Project Coordinator
      Chris Ariyo, CSC, SeaDataCloud
      Francisco Hernandez, VLIZ, EMODnet & Lifewatch Marine
      Odile Hologne, INRA, eRosa project
      Brian Matthews, STFC, EOSCpilot

    • Shortening the long tail of science 214 & 216 (The Square, Brussels Meeting Centre)

      214 & 216

      The Square, Brussels Meeting Centre

      The session will discuss the needs of the Digital Humanities, Language Studies and Cultural Heritage (DH+L+CH) research communities with regard to the EOSC, and how these needs should be addressed. It will also highlight how the PARTHENOS cluster project is working to provide common solutions to Research Infrastructures in this wide domain.
      Addressing standards, fostering interoperability and findability with a common data model, and supporting research with tools and training will pave the way towards full-fledged participation in the EOSC by the researchers in this area.
      The research community's needs and the solutions achieved so far will be presented in live demonstrations and interactively discussed with attendees, who are welcome to propose their own problems and verify whether the PARTHENOS or other solutions are suitable to address them. Thus the approach will not be a sequence of conference-style theoretical lectures; instead it will consider and interactively discuss practical cases with the audience, checking how the PARTHENOS solutions fit them.

      TARGET AUDIENCE
      All researchers, especially those labelled as belonging to the "Long tail of science"

      EXPECTED IMPACT
      Improve awareness, provide feedback and discuss solutions

      STRUCTURE
      Participants from the audience are invited to register for two-minute statements or for questions. This may be done by contacting the moderator in advance (franco.niccolucci@gmail.com) or during the session, if time allows.

      Presentations will include:
      1. Introduction to the session – Franco Niccolucci, PARTHENOS Project Coordinator
      2. The PARTHENOS Joint Data Model (JDM) and catalogue – George Bruseker (FORTH). How the JDM can support cross-discipline interoperability, discoverability and access
      3. Mapping data models to the JDM – Alessia Bardi (CNR). How to advance towards integration
      4. The PARTHENOS Standards Survival Kit (SSK) – Dorian Seillier (INRIA). Standardization for beginners as the foundation of every interoperability effort
      5. The Data Management Plan made simple – Hella Hollander (KNAW-DANS). No-hassle fulfilment of an obligation towards the research community and funding agencies
      6. First steps towards the EOSC – Achille Felicetti (PIN). A success story of moving tools to a cloud environment to use and reuse data
      7. RIs and e-infrastructures: united we stand, divided we fail – parallel questions to Franco Niccolucci (PARTHENOS) and Daan Broeder (EUDAT). A mandatory but difficult dialogue between the two pillars of the EOSC


      Convener: Franco Niccolucci (PARTHENOS Project Coordinator)

    • 15:30
      Coffee Break The Square, Brussels Meeting Centre

      The Square, Brussels Meeting Centre

    • EDI capabilities, architecture and complementarity with EOSC Copper Room (The Square, Brussels Meeting Centre)

      Copper Room

      The Square, Brussels Meeting Centre

      This session will showcase the capabilities that EDI will need to deliver to enable open science and international research projects, and will provide an opportunity for the audience to discuss how EDI can integrate with complementary e-Infrastructures, Research Infrastructures and the EOSC in general.
      In particular, during the session we will discuss how EDI needs to work together with EOSC in a coordinated way towards the user, and what interoperability will be needed to meet the EDI and EOSC vision.

      Conveners: Ms Annabel Grant (GEANT), Florian Berberich (JUELICH)
      • 48
        Introduction to EDI
        Speaker: Leonardo Flores (European Commission)
        Slides
      • 49
        Moving towards EDI. Challenges and next steps
        Speaker: Serge Bogaerts (PRACE AISBL)
        Slides
      • 50
        Moving towards EDI
        Speaker: Erik Huizer (GEANT)
        Slides
      • 51
        Roundtable: how will EDI support the EOSC?
    • European Data Science Competence Framework 211 & 212 (The Square, Brussels Meeting Centre)

      211 & 212

      The Square, Brussels Meeting Centre

      A series of stakeholder discussions has taken place over the last two years, coordinated as part of the recently completed EDISON project. These events have helped to develop a consensus view on the importance and complexity of data-related skills, competences and roles in supporting data-dependent science and business. We propose to continue this conversation with a focus on capturing the commonalities and differences between the various research infrastructures and their related scientific domains.

      Session aims

      The session will bring together practitioners, educators and RI managers to discuss how data science skills can be addressed in education, professional and workplace training, and what framework models and approaches need to be developed to create sustainable critical competence and skills management for European research and industry.
      The session will provide a brief introduction to the EDISON Data Science Framework (EDSF), developed in the EU-funded EDISON project and currently published as Release 2 under a CC BY open source licence, together with the proposed roadmap and action plan to create sustainable skills management and capacity building for the EOSC and the new Skills Agenda for Europe in general. This will be complemented by short lightning talks from early implementers and adopters of data science education and training, including universities, RIs, industry and governmental organisations. The second part of the session will host a panel of experts, practitioners and policy makers to discuss key actions for addressing Open Science and EOSC priorities in responding to the demand for new skills critical to increasing the efficiency and competitiveness of European research and industry.

      EDISON Data Science Framework

      The EDISON Data Science Framework (EDSF) includes four main components: the Data Science Competence Framework (CF-DS), the Data Science Body of Knowledge (DS-BoK), the Data Science Model Curriculum (MC-DS), and the Data Science Professional Profiles and occupations taxonomy (DSPP). Together they provide a conceptual framework and a model for building a sustainable data science educational environment addressing the needs of different stakeholders, professional groups and industries. The EDSF has been developed with wide participation and contributions from European academia, research and industry, and is open for wide use and future development under a CC BY open source licence.

      Convener: Yuri Demchenko (University of Amsterdam)
      • 52
        Session introduction
        Speaker: Yuri Demchenko (University of Amsterdam)
      • 53
        EDISON Data Science Framework (EDSF) overview and possibilities for customised curriculum design
        Slides
      • 54
        Competency frameworks and training programmes for managers and operators of research infrastructure
        Speaker: Dr Cath Brooksbank (EBI-UK, RITrain/CORBEL/BioExcel)
        Slides
      • 55
        EOSCpilot Skills Framework: linking service capabilities to data stewardship competences for professional skills development
        Speaker: Angus Whyte (UE)
        Slides
      • 56
        Panel: On the way to develop a European Data Science Framework to address critical Data Competence and Skills for European Open Science Cloud and industry
        Panel of experts and representatives from RIs, H2020 projects and industry (experts and practitioners, including academia-industry cooperation). Moderator: Steve Brewer
        Slides
    • Massively distributed scientific corpora

      This session aims to stimulate debate on perspectives for the European Digital Infrastructures to foster knowledge building through massively distributed scientific corpora. Several aspects will be tackled, in particular the advantage of deploying Trusted Digital Repositories (eTDRs) in order to offer high-quality data curation services.

      A panel discussion will aim to raise open issues, such as which standards to adopt in building an eTDR. Different approaches are possible (ISO standards, DSA/WDS, ...).

      Convener: Daan Broeder (Meertens Institute)
      • 57
        Introduction to the session
        The keys to success for the future European Open Science Cloud will depend on many factors, such as efficient service interoperability and the availability of solutions that facilitate scientific collaboration. Through several recent success stories, we can foresee the potential of a collaborative approach between e-infrastructures for accelerating knowledge building. Although their needs are very different, several communities have greatly benefited from e-infrastructures to foster their collaboration. This session is suitable for people interested both in interoperability between services from different infrastructures and initiatives at different levels (area 1) and in topics on the development and provisioning of services and solutions needed to enable researchers to collaborate and share resources in a federated environment (area 5).
        Speaker: Dr Johannes Reetz (Garching Computing Centre of the Max Planck Society / MPI for Plasma Physics)
      • 58
        New perspectives for knowledge-building through massively distributed scientific corpora
        Speaker: Daan Broeder (Meertens Institute)
      • 59
        The European Network for Earth System modelling case study
        The ENES community has found many benefits in replicating its data in data centres close to HPC resources. An experiment has also been conducted with CMIP data stored on the B2SAFE (EUDAT) service and processed on a virtual HPC resource provided by EGI.
        Speaker: Mr Xavier Pivan (CERFACS)
      • 60
        Preservation of natural and cultural heritage: Herbadrop & Europeana case studies
        The Herbadrop and Europeana communities have archived their collections of natural and cultural digital heritage in a Trusted Digital Repository for long-term archival on the EUDAT CDI using B2SAFE. This repository offers a suitable Data Management Plan, including policies for managing the data life cycle. Herbadrop images have been processed using OCR techniques on HPC resources. Metadata have been crawled from the GBIF index in order to be made available through the B2FIND service (see the sketch after this entry).
        Speaker: Dugenie Pascal (CINES)
        Slides
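        For illustration, the sketch below searches a CKAN-style metadata catalogue such as B2FIND from Python. The endpoint path and response layout follow the standard CKAN action API and are assumptions to be checked against the B2FIND documentation.

            import requests

            # Full-text search over harvested metadata (CKAN action API, endpoint assumed).
            resp = requests.get("http://b2find.eudat.eu/api/3/action/package_search",
                                params={"q": "herbarium", "rows": 5})
            for dataset in resp.json()["result"]["results"]:
                print(dataset["title"])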
      • 61
        EOSC Pilot: DPHEP
        The EOSCpilot DPHEP demonstrator should show the feasibility of ingesting the very large archival packages used in high-energy physics into a dual-node infrastructure located at CINES (France) and CINECA (Italy). Transfers will require specific collaboration with national (RENATER, GARR) and pan-European (GÉANT) networks.
        Speaker: Giuseppe Fiameni (CINECA - Consorzio Interuniversitario)
        Slides
      • 62
        Open discussion
        This discussion aims to raise open issues, such as which standards to adopt in building an eTDR. Different approaches are possible.
    • Security, trust and identity management 213 & 215 (The Square, Brussels Meeting Centre)

      213 & 215

      The Square, Brussels Meeting Centre

      Security, trust and identity management is especially required by scientific communities that need to process and manage sensitive data. We hope to bring together experts with complementary skills and viewpoints from scientific, e-Infrastructure and management perspectives in the session, and discuss the solutions and gaps that exist.

      The session sets the present scene by describing scientific requirements and solutions emerging from the life sciences, which have progressed in this field due to the pressing demand to analyse sensitive human data. A range of technologies is already being offered and developed by European e-Infrastructures to meet some of these challenges: for example, operational security for the European Open Science Cloud and federated authentication and authorisation services that enable scientific collaborations at scale.

      Convener: Tommi Nyronen (CSC)
      • 63
        Authentication and Authorisation Service for ELIXIR Research Infrastructure
        ELIXIR is the European research infrastructure for biological data. ELIXIR AAI (authentication and authorisation infrastructure) is the ELIXIR service portfolio for authenticating researchers to ELIXIR services and assisting those services in deciding what users are permitted to do in them. ELIXIR AAI is part of the ELIXIR Compute Platform, established in 2015 to build distributed cloud, compute, storage and access services for the life science research community. ELIXIR AAI enables researchers to use their home organisation logins, enhanced by multi-factor authentication, to access ELIXIR services. It augments user identities with extra roles, group memberships and dataset permissions, which are useful for managing access rights in the relying services. ELIXIR makes it possible for a researcher to have a single login to all services. Service providers can outsource their access management to the ELIXIR AAI, enabling them to focus on provisioning the service itself. Centralising the AAI service in the research infrastructure allows a more advanced common AAI service to be developed for the whole ELIXIR community at lower cost. For researchers, this means, for instance, shorter lead times to access the services and to start the actual research work. ELIXIR AAI is developed and operated by the Czech and Finnish ELIXIR nodes and became operational in November 2016. It is based on open source components. By October 2017 there were 987 ELIXIR users from 359 universities or research institutions, belonging to 101 groups and making use of the 61 production or test services that rely on the ELIXIR AAI. The services range from simple collaborative services (like intranets or ticketing) to scientific workflows (like Metapipe for marine metagenomics) and private (community) or public (commercial) clouds. ELIXIR AAI has been developed in connection with the AARC/AARC2 project and the e-infrastructures, and is an implementation of the AARC blueprint architecture. The ELIXIR AAI, together with other BMS AAIs, has inspired the work towards a common Life Science AAI for the Life Science ESFRI domain, and a related pilot is starting in AARC2. **Overview of the proposed presentation** This presentation describes the requirements and design of the ELIXIR AAI and how it is used in some of the ELIXIR services. Although the ELIXIR AAI was developed to serve the ELIXIR community, many of its components have also been deployed in other life science research infrastructures and are applicable in other ESFRI domains. The presentation will also discuss the need for wider cross-infrastructure cooperation in the deployment and operation of AAI services. **How does your submission address the themes of the conference?** The presentation fits well with topic area 4 on security, especially "Trust and identity use-cases and how they are addressing them". **Who is the intended audience for your presentation/session?** The intended audience is developers of access management for research or e-infrastructures and the decision makers who design future digital infrastructures (like the EOSC) and decide what services they will offer.
        Speaker: Mikael Linden (CSC)
        Slides
      • 64
        User authentication, authorization, and identity management for services in structural biology
        The West-Life Virtual Research Environment provides services for computation and data management to researchers in structural biology. It builds on European e-Infrastructure solutions from EGI and EUDAT and links together web services and repositories for structural biology. The West-Life VRE continues to develop the community and services established in previous projects and activities. So far, some of the services use the authentication and authorisation solution developed by the WeNMR project, which is based on standalone user registration, passwords and, eventually, links to X.509 user certificates and EGI virtual organisation memberships; other services use isolated solutions. Revisions towards both convergence on a single solution and the use of emerging, more user-friendly technology are therefore desirable. In our presentation we describe the revised architecture of the AAI and identity management in the West-Life VRE and introduce the pilot implementation. The new model builds on common principles examined in other similar infrastructures, like ELIXIR, BBMRI and INSTRUCT. The viability of the new solution will be demonstrated on selected West-Life services. The implemented solution utilises well-established technologies and conforms to the AARC guidelines. The infrastructure developed provides an interoperable solution that is compatible with the foreseen Life Science AAI arrangements.
        Speaker: Daniel Kouril (Masaryk University)
        Slides
      • 65
        Evolving Operational Security for the EOSC era
        EOSC-hub proposes a new vision of data-driven science, where researchers from all disciplines have easy, integrated and open access to the advanced digital services, scientific instruments, data, knowledge and expertise they need to collaborate to achieve excellence in science, research and innovation. The process towards the integration of the different security activities will be supported through the development of harmonised policies and procedures, to ensure consistent and coordinated security operations across the services provided in the catalogue. Coordinating operational security in such a broad environment is a challenge. At the same time, it offers many possibilities for closer collaboration between the security teams already active in the distributed infrastructures. The expertise built and the tools developed in response to specific problems in the different infrastructures can be used in cross-infrastructure cooperation. In this presentation we will give examples of possible collaborations in: * Incident Prevention * Incident Handling/Coordination * Security Training and Exercises. We will also share information on some actual cases of vulnerability management and incident coordination within and across infrastructures. A discussion of lessons learned will include a review of how comprehensively critical vulnerabilities and incidents have been identified and how efficiently risks and incidents have been contained. As usual, the results of the debriefings give us pointers as to which tools and procedures need further development to improve our cross-infrastructure operational security capabilities.
        Speakers: Dr Sven Gabriel (NIKHEF), Urpo Kaila (CSC)
        Slides
      • 66
        Challenges for Data Loss Prevention at Research Infrastructures
        Data Loss Prevention (DLP) is a classic but currently often overlooked security domain. DLP aims to identify and restrict access to non-public information, such as Personally Identifiable Information (PII) or confidential information. DLP covers data at rest, data in transit and data in use, and also requires procedures and tools to monitor how DLP is implemented. Researchers, research institutions and research infrastructures handle a rapidly increasing amount of PII and confidential information due to digitalisation. Compared with IT services where the format of data and data transfer is strictly defined, e.g. in the financial sector, DLP at research infrastructures is a formidable challenge due to the wide spectrum of formats and platforms for the data. The complexity of research-related DLP is increased by the autonomous and dynamic nature of research, the sometimes unclear responsibilities and accountability for DLP, and, last but not least, the growing amount of PII in research environments that were originally designed for handling public scientific data. Many commercial DLP tools are not feasible for research infrastructures for the reasons mentioned above. In addition, pricing can put current commercial DLP applications out of reach of the research community. Instead, jointly developed open-source-based software and common services to ensure DLP could provide a feasible roadmap for research. Additional pressure to implement satisfactory management and technical controls for DLP comes from the General Data Protection Regulation (GDPR), which urges compliance with requirements for user consent, the right to erasure, security of processing of PII, and privacy by design. As joint projects with private industry are on the increase, pressure to implement DLP for information protected by non-disclosure agreements is also increasing. In our presentation, we aim to pinpoint the current state of DLP at research infrastructures, to identify the most urgent challenges, and to suggest a sustainable roadmap for how research infrastructures can jointly implement measures for adequate DLP. We will discuss available operational solutions to implement and monitor DLP. We will also highlight the community-driven policy frameworks developed through WISE (Wise Information Security for collaborating E-infrastructures) that guide infrastructures in developing suitable environments to address DLP.
        Speaker: Urpo Kaila (CSC)
        Slides
      • 67
        Check-in towards an integrated authentication and authorisation infrastructure for the EOSC
        The European Open Science Cloud (EOSC) aims to enable trusted access to services, systems and the re-use of shared scientific data across disciplinary, social and geographical borders. EOSC-hub will realise the EOSC infrastructure as an ecosystem of research e-Infrastructures leveraging existing national and European investments in digital research infrastructures. Check-in, the AAI platform for the EGI infrastructure, will be an enabling service of the EOSC-hub AAI platform, aiming to provide researchers from all disciplines with easy, integrated and open access to advanced digital services, scientific instruments and data. Check-in has been implemented based on the blueprint architecture and the policy framework from the AARC project. As such, it has been integrated with Identity Providers from eduGAIN and individual organisations to allow users to access services (web and non-web based) using their own credentials from their home organisations. EGI operational tools and services that are connected to Check-in can become available to over 2000 universities and research institutes from the 46 eduGAIN federations with little or no administrative involvement. Compliance with the REFEDS Research and Scholarship entity category and the Sirtfi framework facilitates sufficient attribute release, as well as operational security, incident response and traceability. Complementary to this, users without an account at a federated institutional Identity Provider are still able to use social media or other external authentication providers to access services that do not require a substantial level of assurance. The adoption of standards and open technologies by Check-in, including SAML 2.0, OpenID Connect, OAuth 2.0 and X.509v3, has facilitated interoperability and integration with the existing AAIs of other e-Infrastructures and research communities, such as ELIXIR (a discovery sketch for an OIDC provider of this kind follows this entry). Research communities can leverage Check-in for managing their users and their respective roles. For communities operating their own group management system, Check-in has a comprehensive list of connectors that allows their systems to be integrated as externally managed Attribute Authorities. Check-in will contribute to the EOSC infrastructure implementation roadmap by enabling seamless access to a system of research data and services provided across nations and disciplines. Specifically, together with EUDAT B2ACCESS, EGI Check-in will serve as the initial basis of an integrated EOSC-hub AAI that will allow the use of federated identities to authenticate and authorise users and expand access to services beyond the traditional user base, opening them to all user groups, including researchers, higher education and business organisations. The integration activities will ensure the harmonisation of user attributes, the alignment of levels of assurance, and the uniform representation of group and other authorisation-related information. The presentation will provide an overview of the Check-in architecture and the various integration workflows supporting today's use cases for federated access, with an eye to the integrated EOSC AAI ecosystem.
        Speaker: Mr Nicolas Liampotis (GRNET)
        Slides
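        As a minimal illustration of the open standards mentioned above, the sketch below fetches an OIDC provider's discovery document, from which clients learn the endpoints they need. The issuer URL is an assumption, not necessarily Check-in's production address.

            import requests

            issuer = "https://aai.egi.eu/oidc"  # assumed issuer; verify against EGI docs
            config = requests.get(issuer + "/.well-known/openid-configuration").json()
            # Standard OIDC discovery fields used by any client integrating with the proxy.
            print(config["authorization_endpoint"])
            print(config["token_endpoint"])
            print(config["userinfo_endpoint"])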
      • 68
        OIDC-Agent - OIDC for the commandline
        OIDC-Agent is a tool to simplify OpenID Connect (OIDC) token management on the command line. It was designed to manage OIDC tokens and make them easily usable. We followed the ssh-agent design, so users can handle OIDC tokens in a similar way to ssh keys. Multiple account configurations may be loaded in oidc-agent concurrently; oidc-add is used to add a configuration to, and remove a loaded configuration from, oidc-agent. OIDC-Agent may be started at the beginning of a login session; through the use of environment variables the agent can be located and OIDC access tokens are available across the whole system (see the sketch after this entry). Full documentation can be found at https://indigo-dc.gitbooks.io/oidc-agent/.
        Speaker: Uros Stevanovic (KIT-G)
        Slides
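        A hedged usage sketch: obtaining an access token from a running oidc-agent via the suite's oidc-token helper and using it as a bearer token. The account short name "demo" and the API URL are placeholders, not part of the tool itself.

            import subprocess
            import requests

            # Ask the agent for a valid access token for the (hypothetical) account "demo".
            token = subprocess.run(["oidc-token", "demo"],
                                   capture_output=True, text=True, check=True).stdout.strip()

            # Use the token as an OAuth2 bearer credential against a placeholder API.
            resp = requests.get("https://api.example.org/resource",
                                headers={"Authorization": "Bearer " + token})
            print(resp.status_code)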
      • 69
        eduTEAMS: An AAI solution for scientific collaborations at scale
        eduTEAMS is a suite of AAI services which enable the integration of users from a wide range of environments, connecting them to specific services (such as instruments) and also to generic services such as storage and compute provided by e-infrastructure providers or even commercial entities. The development of Phase 1 of eduTEAMS was completed in July 2016 with the successful delivery of the first stage of the eduTEAMS Membership Management Service and Identity Hub. The Membership Management Service provides a platform for managing groups, attributes and enrolment for research collaborations' participants. The service provides additional attribute and group information for the participants in the context of their collaborations. The Identity Hub proxies multiple external identity providers to one single, persistent SAML2 IdP. This allows research collaborations to use one endpoint for all guest/external ID scenarios, while at the same time allowing end users to choose the service they prefer. Following the identification of requirements in the market analysis conducted by GÉANT with different types of communities, features were prioritised, key components to deliver those requirements were evaluated and selected, and two classifications of eduTEAMS use cases – basic and advanced – were identified. The requirements identified by the market analysis are in line with the AARC Blueprint Architecture as part of its wider evaluation of research requirements. The basic use case classification focuses on long-tail usage of federated identity and group management, while the target users for the advanced use case scenario are large Virtual Organisations with a defined legal status and more complex requirements for group and attribute management, as well as control over VO-specific data. Approaches to service delivery for the use cases were then defined in terms of software development, platform architecture, service approach and outreach. These include: a development approach that combines existing open source components with glue developed by GÉANT to deliver a platform meeting a range of use cases; a platform architecture composed of flexible interoperable components; and a service operational model which enables this common eduTEAMS software platform to deliver either single-tenant or multi-tenant service instances. Phase 2 development of eduTEAMS takes place within the parameters of these approaches and will cover the following areas: - General platform improvements to UIs, manageability and scalability. - Implementation of additional membership management workflows. - Support for non-SAML attribute authorities. - Integration of a wider range of identity providers. - Migration to enhanced discovery services.
        Speaker: Christos Kanellopoulos (GÉANT)
        Slides
    • Special focus on Earth Observation 214 & 216 (The Square, Brussels Meeting Centre)

      214 & 216

      The Square, Brussels Meeting Centre

      Convener: Hannes Thiemann (DKRZ)
      • 70
        The Geohazards Exploitation Platform: an advanced cloud-based environment for the Earth Science community
        The idea of creating advanced platforms in the field of satellite Earth Observation, where users can find data but also state-of-the-art algorithms, processing tools, computing facilities and instruments for dissemination and sharing, was launched several years ago. The initiatives developed in this context have been supported first by the Framework Programmes of the European Commission and the European Space Agency and, progressively, by the Copernicus programme. The Geohazards Exploitation Platform (GEP) is an ESA-funded R&D activity to exploit the benefits of new techniques for large-scale processing of EO data. It supports the geohazards community by creating an Exploitation Platform with new models of collaboration, where data providers, users and technology partners produce and deliver scientific and commercial information products in the Cloud. The Platform is creating an ecosystem of partnerships for data, applications and ICT resources. Initiated in March 2015 and with a strong and growing user base of early adopters, it defines a new paradigm for EO data exploitation and valorisation, where partners bring in applications and processors are deployed close to the data, in order to create value-added products with a scientific and/or commercial value. It builds on a partnership model where: - Data providers benefit from an integrated workplace to reach users who seek to extract value out of sensor measurements; - Technology providers benefit from the Platform's connectivity to data sources and from the turn-key environment (PaaS) for software integration; - Cloud providers benefit from opportunities to provision commodities and services in support of the ICT challenges created by the growing volume of environmental data from space. The initiative has already secured funding to expand its user base, and will gradually reach a total of 70+ users from more than 50 organisations worldwide by the end of 2017. The GEP supports federated Cloud operations. The Platform's collaborative environment and business processes support users in seamlessly deploying apps and data from a shared marketplace and across multiple cloud environments. In particular, it already supports a set of systematic services, automatically producing value-added products from Copernicus Sentinel-1 and Sentinel-2 acquisitions at global scale, federating resources from e-infrastructures (EGI), public research centres (PSNC) and private providers (IPT.PL). GEP is currently about to enter the pre-operations phase under a consortium led by Terradue, with six pilot projects concerning different challenging applications using SAR and optical satellite data. GEP was selected to participate in the EO pillar of the new EOSC-hub H2020 project and will be offered to a wider public through the EOSC service catalogue. This project will mobilise e-Infrastructures comprising more than 300 data centres worldwide and 18 pan-European infrastructures, representing a ground-breaking milestone for the implementation of the EOSC. This activity will manage access provisioning for EOSC services and provide training on the usage of EO data and services, with outreach activities to widen the exploitation of EO satellite data to non-EO communities.
        Speaker: Pacini Pacini (Terradue srl)
        Slides
      • 71
        Bringing user communities to cloud based Virtual Research Environments – The Co-ReSyF Experience
        In recent years there has been an increasing demand for geospatial data systems that give users working with Earth Observation data the capability to access, visualise and process the large-volume EO datasets currently available (e.g. Copernicus and Sentinel) to develop their research activities or operational services. The Coastal Waters Research Synergy Framework (Co-ReSyF) project tackles these issues by introducing a platform for combined data access, processing and visualisation in one place. Co-ReSyF is a Virtual Research Environment to support the development of research applications using Earth Observation (EO) data for coastal water research. Co-ReSyF provides a cloud platform which simplifies the integration of EO data into multi-disciplinary research activities and fits the needs of inexperienced scientists as well as EO and coastal experts. These components are complemented by a set of user support systems that help guide the researcher through the wide array of datasets, applications and processing chains. The platform is based on cloud computing to maximise processing efficiency and task orchestration. Co-ReSyF addresses issues faced by inexperienced and new EO researchers, and also targets EO experts and downstream users. We reach a wide community of coastal and oceanic researchers, who are offered the opportunity to experience, test and guide the development of the platform whilst using it as a tool for their own research. The platform includes a set of 5 core Research Applications, developed under the project, and also a set of tools that researchers can use to build their own applications in a user-friendly manner. Each of these research applications consists of subcomponent modules which users can apply to different research ventures. Additionally, other potential tools or applications can be added by the research community for sharing with other researchers who may find them useful. The set of core applications to be developed during the project lifetime are: - Bathymetry determination from SAR images; - Determination of bathymetry, benthic classification and water quality from optical sensors; - Vessel and oil spill detection; - Time-series processing for hyper-temporal optical data analysis; - Ocean coastal altimetry. Additionally, a group of 8 Master/PhD students has been selected to use the platform and contribute their own tools and/or applications to be incorporated into it. Co-ReSyF provides flexible and scalable data access, visualisation and processing solutions to a network of user communities that already extends beyond coastal areas to other thematic fields like agriculture and disaster risk management, in the scope of other projects where Co-ReSyF expert partners are involved. These solutions will be fully adapted to their needs during the different stages of the EO product development cycle, from research to application development and operationalisation. This will provide a highly sustainable framework for long-term growth by assuring a continuous and steady increase in users as well as in the number of shared datasets, tools and developed applications, ultimately maximising collaborative research and scientific knowledge.
        Speaker: Mr Nuno Grosso (Deimos Engenharia SA)
        Slides
      • 72
        NextGEOSS: Next generation European GEOSS data hub and cloud platform
        The NextGEOSS project, a European contribution to the Global Earth Observation System of Systems (GEOSS), proposes to develop a centralised hub for Earth Observation (EO) data, where users can connect to access data and deploy EO-based applications. By further developing technologies within the scope of GEOSS, the project will enable increased use of EO data in support of decision making. Moreover, a central component is the strong emphasis on engaging the communities of providers and users, and on bridging the space in between. NextGEOSS hosts an exploitation platform (a virtual workspace) providing the user community with access to large volumes of data (EO and non-space data), an algorithm development and integration environment, processing software, computing resources, and collaboration and general operations capabilities. NextGEOSS focuses on a fundamental change: facilitating connectivity to the European and global data centres with new discovery and processing methods to support innovation and business. It will leverage Web and Cloud technologies, offering seamless and user-friendly access to all the relevant data repositories, as well as providing efficient operations for search, retrieval, processing/re-processing, visualisation, analysis and combination of products from federated sources. As such, the project requires collaboration with ICT research centres to serve our communities. NextGEOSS includes ten pilot activities which will test integration with the data hub and provide GEO-related services of their own, supporting the achievement of the Sustainable Development Goals. Six of the pilot applications are Innovative Research Pilots, in which the focus is on intensive research and development activities. The remaining four pilot activities are dedicated to Business Opportunities and Services, focusing on a commercially oriented approach.
        Speakers: Mr Nuno Almeida (Deimos Engenharia), Mr Nuno Catarino (Deimos Engenharia)
      • 73
        Solutions for Cloud/HPC and big data frameworks interoperability
        Forthcoming EO and scientific space missions create unprecedented opportunities to empower new types of user applications and to develop a new generation of user services, particularly for the European scientific community. The associated data are steadily increasing in volume, delivery rate, variety, complexity and interconnection. The challenges stemming from this increase in volume, velocity and variety drive the urgent need for new processing concepts that ensure not only the necessary power but also the scalability and elasticity to actually exploit those data. Furthermore, new requirements from the scientific communities are emerging, in particular the requirement to easily integrate their own processing, manipulation and analysis tools into harmonised frameworks (platforms), which in turn should provide basic processing features such as efficient data access, massively distributed processing and load balancing. Platform users aim to integrate their own processing tools in a seamless and easy way, avoiding software changes and/or the development of additional interfaces/components for the sole purpose of integration and deployment. These challenges can be met with the help of cloud computing, the many new computing patterns developed under the banner of Big Data technologies, and the variety of processing libraries and toolboxes currently available to users to support and simplify their processing needs. However, there is no single Big Data or HPC framework able to address all computing patterns (Map/Reduce, streaming, directed acyclic graphs...) and all data types (satellite imagery, IoT data, social network streams...). That is why modern scientific computing platforms should be able to combine Big Data and legacy computing patterns efficiently on hybrid on-premises/cloud computing infrastructures. This presentation will describe the solutions proposed by CS to build such a processing platform. These solutions are based on a multi-cloud strategy that makes it possible to always have the right offer, to benefit from maximum flexibility and to remain independent of cloud vendors. For this purpose CS developed CS ViP (Critical System Virtual Platform), a multi-IaaS system that interoperates with most popular cloud providers through a unified API. CS ViP uses cutting-edge DevOps, monitoring and remote desktop technologies. On top of it, the profusion of Big Data frameworks can be used. Unfortunately, they are not interoperable, and choosing one of them cuts users off from the ecosystems of the others. Likewise, it is difficult to access the large and valuable body of code targeting traditional HPC from within the chosen framework. To ensure interoperability between these frameworks, CS designed SCREW, a PaaS system providing on-demand computing platforms that combine major Big Data frameworks (Spark, Hadoop, Ignite...) with traditional HPC frameworks (MPI, and batch scheduling using the DRMAA standard). A precursor to the future Copernicus DIAS platforms, RUS (Research and User Support Service, https://rus-copernicus.eu/) is already running on top of these open source technologies. RUS is a good example of the provisioning of a federated service for research, enabling interoperability between different cloud providers.
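        As an illustration of the scheduler-neutral batch submission that the DRMAA standard enables, here is a minimal sketch using the Python drmaa bindings. The executable and arguments are placeholders, and this is only a generic DRMAA example, not SCREW's actual integration layer, which is not described in the abstract.

        ```python
        # Minimal sketch: submitting a batch job via the DRMAA standard
        # (drmaa-python bindings). Requires a DRMAA-enabled scheduler and
        # its libdrmaa library. Command and arguments are placeholders.
        import drmaa

        with drmaa.Session() as session:
            jt = session.createJobTemplate()
            jt.remoteCommand = "/usr/bin/env"   # placeholder executable
            jt.args = []                        # placeholder arguments
            job_id = session.runJob(jt)
            print("Submitted job:", job_id)
            # Block until the scheduler reports completion
            info = session.wait(job_id, drmaa.Session.TIMEOUT_WAIT_FOREVER)
            print("Exit status:", info.exitStatus)
            session.deleteJobTemplate(jt)
        ```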
        Speaker: Mr Sylvain D'HOINE (CS Communication & Systèmes)
        Slides
      • 74
        Terradue’s Open Cloud Strategy: the case of leveraging the EGI Federated Cloud as a commodity for the EO communities
        Earth observations from satellites produce vast amounts of data. In particular, the new Copernicus Sentinel missions are playing an increasingly important role as a reliable, high-quality and free open data source for scientific, public sector and commercial activities. The latest developments in Information and Communication Technology facilitate the handling of such large volumes of data, and European initiatives (e.g. EOSC, DIAS) are flourishing to deliver on it. In this context, Terradue is advancing an approach that resolutely promotes an Open Cloud model of operations. With solutions to transfer EO processing algorithms to cloud infrastructures, Terradue Cloud Platform is optimising the connectivity of data centres with integrated discovery and processing methods. Implementing a hybrid cloud model, and using cloud APIs based on international standards, the Platform meets its users' growing needs by leveraging the capabilities of several public cloud providers. Operated according to an “Open Cloud” strategy, it involves partnerships complying with a set of best practices and guidelines: - Open APIs. Embrace cloud-bursting APIs that can be easily plugged into the Platform’s codebase, so as to expand the Platform offering with providers that bring complementary strategic advantages for different user communities. - Developer community. Support and nurture cloud communities that collaborate on evolving open source technologies. - Self-service provisioning and management of resources. The Platform’s end users are able to self-provision their required ICT resources and to work autonomously. - Users’ right to move data as needed. By supporting distributed instances of its EO data management layer, the Platform delivers the required level of data locality to ensure high-performance processing with optimised costs, and guarantees that value-added chains can be built on top of intermediate results. - Federated cloud operations. The Platform’s collaborative environment and business processes support users in seamlessly deploying apps and data from a shared marketplace and across multiple cloud environments. As a recent case, thanks to the integration within the Platform of the Open Cloud Computing Interface (OCCI) and the close partnership between EGI and Terradue, our provisioning of ICT resources supports ever more demanding exploitation scenarios. For example, EGI compute and storage resources from ReCaS Bari (Italy) are used to support VITO’s Sentinel-2 Biopar Pilot within the NextGEOSS project, an initiative funded by the European Commission to implement a federated data hub for access to and exploitation of Earth Observation data. Furthermore, EGI compute and storage resources from GOEGRID-GWGD (Germany), ReCaS Bari (Italy) and BELNET-BEGRID (Belgium) are used in the context of the ESA Geohazards Exploitation Platform initiative, where several Platform services automatically produce interferograms from Copernicus Sentinel-1 acquisitions over a subset of the global strain rate model. All the applications enabled by the Terradue Cloud Platform will be integrated into the EOSC service catalogue during the EOSC-hub project to promote them and enlarge the user base. The plan is also to exploit more e-infrastructure services during the project, integrating selected services from the EOSC-hub catalogue (from EGI, EUDAT and INDIGO-DataCloud) into the cloud platform.
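        Since the abstract mentions the OCCI integration, here is a minimal sketch of what an OCCI 1.1 compute-creation request looks like over HTTP, sent with Python's requests library. The endpoint URL and auth token are placeholders, not Terradue's or EGI's actual values; real EGI Federated Cloud sites also require site-specific authentication and resource templates.

        ```python
        # Minimal sketch of an OCCI 1.1 "create compute" request using the
        # text/occi HTTP rendering. Endpoint and token are placeholders.
        import requests

        OCCI_ENDPOINT = "https://cloud.example.org:8787/occi1.1"  # placeholder
        headers = {
            "Content-Type": "text/occi",
            "X-Auth-Token": "PLACEHOLDER",
            "Category": (
                'compute; scheme="http://schemas.ogf.org/occi/infrastructure#"; '
                'class="kind"'
            ),
            "X-OCCI-Attribute": 'occi.core.title="demo-vm", occi.compute.cores=2',
        }

        resp = requests.post(f"{OCCI_ENDPOINT}/compute/", headers=headers)
        resp.raise_for_status()
        # The Location header points at the newly created compute resource
        print("Created:", resp.headers.get("Location"))
        ```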
        Speaker: Mr Cesare Rossi (Terradue)
        Slides
      • 75
        Does a market for Earth Observation Data exist?
        Recently, great attention has been given to the exploitation of Earth Observation data as a means of industrial innovation and a source of potential societal benefits. The EU is at the forefront of Earth Observation technologies: ESA launched the Sentinel constellations, a set of redundant satellites that offer the high availability and resiliency required by industries to run businesses. Several attempts and approaches have been tried, with varying success in terms of self-sustainability, ease of use and scalability: from Thematic Exploitation Platforms and business incubators to a marketplace for EO services and data. All these approaches seem to enable a pipeline business model, i.e. one where business is based on the acquisition of resources (a product and/or service) that are pushed to the consumer through the value chain in a unidirectional way. However, the digital transformation is radically changing the market landscape: ubiquitous connectivity, hand-held technology and user interactions are enabling elements of the platform business model, as successfully exemplified in various markets by AirBnB, Uber, Google, etc. The platform model, instead, focuses on the creation of value by establishing intelligent networking among users: where pipelines create value “on top” of managed resources, platforms (which usually do not even own such resources) create value by linking producers and consumers of resources. Platforms, as analysed in depth by S.P. Choudary (Choudary et al. 2015), act as content aggregators that can simultaneously satisfy different types of interests. Platforms also exploit the phenomenon of network externalities, i.e. a service increases in value as the number of interacting individuals increases. Externalities can be of two types: same-side (e.g. as in telecommunication networks) or cross-side. The platform model exploits cross-side network externalities, which are linked to the diffusion of the product not among members on the same side of the market, but on another network (or side). For example, the Amazon marketplace bridges producers and consumers, Uber helps drivers and riders meet, and AirBnB links hosts and tourists. These features also allow the business model to scale faster, thanks to the nature of the user, who can be both resource provider and consumer (a “prosumer”): this creates new opportunities, but also new challenges to face. Choudary's breakthrough work on platform analysis models highlights the distinctive features of this new approach, as well as the best practices that facilitate its understanding and implementation. This presentation aims to open a debate among stakeholders and to illustrate an initial hypothesis about how to implement the platform model in the EO sector for the benefit of all actors involved.
        Speaker: Andrea Manieri (Engineering Ingegneria Informatica spa)
    • Networking Cocktail
  • Friday, 1 December
    • Opening Plenary Copper Room (The Square, Brussels Meeting Centre)

      Copper Room

      The Square, Brussels Meeting Centre

      Convener: Franciska de Jong (CLARIN ERIC)
      • 76
        The LIGO/VIRGO scientific and data analysis challenges
        Speaker: Dr Sarah Caudill (NIKHEF)
        Slides
      • 77
        Forecasting elements for the future European Open Science Cloud Hub
        Speaker: Augusto Burgueño Arjona (Head of Unit "eInfrastructure", European Commission)
        Slides
    • 10:30
      Coffee Break The Square, Brussels Meeting Centre

      The Square, Brussels Meeting Centre

    • Building ENVRI-as-a-Service to the EOSC 213 & 215 (The Square, Brussels Meeting Centre)

      213 & 215

      The Square, Brussels Meeting Centre

      Scientific communities are important stakeholders for the European Open Science Cloud (EOSC). However, communities have many open questions about the newly emerged concept, e.g. what EOSC means for them, how they can benefit from it, and how to connect to it.

      ENVRI is a community of environmental research infrastructures, projects and networks. Through two EU-funded projects, ENVRI has been building service solutions to a set of common challenges faced by environmental Research Infrastructures (RIs), accumulating experience in using pan-European e-infrastructure services such as EGI and EUDAT. These solutions promote a more coherent, interdisciplinary and interoperable cluster of infrastructures across Europe.

      This session brings the ENVRI community to the DI4R conference and analyses, from a community point of view, the opportunities and benefits of EOSC. We will start by listening to success stories of using service solutions provided by ENVRI, leading to an open discussion between a mini panel and the audience. The objective is to develop an understanding of EOSC for ENVRI, identify gaps and challenging issues, and define a roadmap for connecting ENVRI to EOSC.

      EXPECTED IMPACT
      - The establishment of a forum for environmental scientists, RI service developers, and technology providers to discuss technical challenges and solutions.
      - The promotion of new collaborations between user communities, the development teams and e-infrastructure service providers.
      - The formulation of a conceptual paradigm for ENVRI-as-a-Service for Open Science Cloud.

      TARGET AUDIENCE
      - Environmental Research Infrastructures that want to come together to jointly build thematic services for EOSC.
      - E-infrastructure technology providers who want to help address community requirements.

      STRUCTURE
      - Four invited talks from representative European environmental RIs on scientific use cases and service solutions (4 × 10-minute presentations + Q&A)
      - One mini-panel discussion on “A Roadmap for Building ENVRI-as-a-Service to EOSC” (50min)

      Before going to the session, please complete the following survey:
      https://www.surveymonkey.com/r/ENVRI2EOSC

      Convener: Dr Yin Chen (EGI.eu)
      • 78
        EuroArgo Data subscription service
        The EuroArgo community aggregates marine-domain datasets, publishing and sharing access to them. A large number of research infrastructures need these services. Since the environmental data lifecycle is complex, keeping an eye on accumulating datasets is a repetitive and time-consuming task for end users. A data subscription service has therefore been developed. Euro Argo stations are exposed through a discovery web portal. Using criteria on localisation, data quality, platform and parameters, users can subscribe to a dataset. With this functionality, users receive regular updates of a dataset without having to search for it again. Each update is notified by email, and data can be pushed to a related repository. Using standardised input files, other ENVRI communities could benefit from this reusable solution for their own purposes. The presentation will focus on the service’s architecture and will highlight the interoperability between infrastructures.
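        To make the subscription idea concrete, here is a hypothetical sketch of how a client might register criteria-based subscriptions with such a service. The endpoint, field names and payload structure are invented for illustration and are not the actual EuroArgo API.

        ```python
        # Hypothetical sketch of a criteria-based data subscription request.
        # Endpoint and field names are invented; the real EuroArgo service
        # may expose different parameters.
        import requests

        subscription = {
            "bbox": [-10.0, 40.0, 5.0, 55.0],  # lon/lat box (localisation)
            "quality_flags": ["good", "probably_good"],
            "platform": "argo_float",
            "parameters": ["TEMP", "PSAL"],     # temperature, salinity
            "notify_email": "researcher@example.org",
            "push_repository": "b2safe://example/collection",  # optional
        }

        resp = requests.post(
            "https://subscriptions.example.org/api/datasets/subscribe",
            json=subscription,
            timeout=30,
        )
        resp.raise_for_status()
        print("Subscription id:", resp.json().get("id"))
        ```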
        Speaker: Dr Claudio Cacciari
        Slides
      • 79
        VRE support for EISCAT_3D user-driven data analysis
        For the ENVRIplus RIs, the ultimate goal is to provide quality-checked and calibrated observational data to their user communities. Virtual Research Environments (VREs) have in recent years emerged as an important approach to providing web-based systems that help researchers. A VRE for the ENVRIplus community has been set up using the D4Science platform (https://www.d4science.org/). D4Science supports a flexible and agile application development model based on the notion of Platform as a Service (PaaS), in which components may be bound at the instant they are needed. In this way, it enables user communities to define their own research environments by selecting the constituents (the services, the data collections, the machines) from the pools of resources made available through the D4Science e-infrastructure. Several ENVRIplus use cases are evaluating this service. In this talk we will present EISCAT’s experience of using the VRE service to support individual scientists in processing radar data with their own algorithms. The benefits and limitations of the VRE service will also be discussed.
        Speaker: Dr Ingemar Haggstrom (EISCAT)
        Slides
      • 80
        Towards data and metadata interoperability of ICOS
        ICOS (Integrated Carbon Observation System) is a pan-European research infrastructure with a mission to collect high-quality observational data on greenhouse gases and the environment. ICOS has pledged to make all its data openly available and FAIR (Findable, Accessible, Interoperable and Reusable) for all users. To support this, ICOS is developing a data management system based on a PID-centric approach, in which persistent identifiers are used to link and cross-reference all ICOS items, including data and metadata objects, measurement stations, instruments, people and publications. The presentation will highlight examples of how all ICOS services for data discovery, visualisation and user-initiated analysis are driven by an underlying ontology-based catalogue.
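        As an illustration of what an ontology-based, PID-centric catalogue enables, the sketch below sends a SPARQL query over HTTP with Python. The endpoint URL and the property names in the query are assumptions for illustration, not the actual ICOS vocabulary.

        ```python
        # Hypothetical sketch: querying an ontology-based metadata catalogue
        # via the standard SPARQL-over-HTTP protocol. Endpoint and property
        # names are illustrative placeholders, not the actual ICOS schema.
        import requests

        ENDPOINT = "https://meta.example.org/sparql"  # placeholder endpoint

        query = """
        PREFIX ex: <http://example.org/ontology/>
        SELECT ?dataObject ?station WHERE {
          ?dataObject ex:wasAcquiredAt ?station .   # PID-to-PID link
        } LIMIT 10
        """

        resp = requests.post(
            ENDPOINT,
            data={"query": query},
            headers={"Accept": "application/sparql-results+json"},
            timeout=30,
        )
        resp.raise_for_status()
        for row in resp.json()["results"]["bindings"]:
            print(row["dataObject"]["value"], "->", row["station"]["value"])
        ```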
        Speaker: Dr Maggie Hellstrom (Lund University (Sweden))
      • 81
        From service portfolio to interoperable operational models in environmental Research Infrastructures
        The data-for-science theme in the ENVRIplus project aims to provide reusable solutions to the common challenges faced by research infrastructures in environmental and earth science. In this talk, the theme's key achievements on the reference model, the ontological framework, the knowledge base and the service portfolio will be presented. The presenter will also discuss the new challenges the theme is facing and the considerations towards bringing ENVRI into the EOSC.
        Speaker: Dr Zhiming Zhao (EGI.eu)
        Slides
      • 82
        Panel Discussion: A Roadmap for Building ENVRI-as-a-Service to EOSC
        The discussion will be moderated around the following questions: 1. What does EOSC mean for the ENVRI RIs? How can the community benefit from EOSC? 2. What are the challenging issues in connecting ENVRI to EOSC? 3. What is a roadmap towards ENVRI-as-a-Service for EOSC?
        Speakers: Dr Giovanni Morelli (CINECA), Dr Ingemar Haggstrom (EISCAT), Dr Maggie Hellstrom (Lund University (Sweden)), Dr Markus Stocker (Universität Bremen), Dr Zhiming Zhao (EGI.eu)
    • Lightning Talks 211 & 212 (The Square, Brussels Meeting Centre)

      211 & 212

      The Square, Brussels Meeting Centre

      Convener: Dr Maria Eskevich (CLARIN ERIC)
      • 83
        Requirements for the use of structural biology data from the perspective of users and research infrastructures from neighboring fields
        Structural biology deals with the characterization of the structural (atomic coordinates) and dynamic (fluctuation of atomic coordinates over time) properties of biological macromolecules and adducts thereof. The West-Life H2020 project [1] is an initiative to bring the world of complex data analysis in structural biology to a simple web-browser-based Virtual Research Environment (VRE), available to any research team in the field. In addition, West-Life aims to further the use of structural biology data beyond the current reference community. To address the latter task we convened representatives of various research infrastructures (RIs) not directly operating in the field of structural biology for a round-table discussion on the occasion of a meeting of the users of structural biology facilities [2]. In parallel, we organized an online survey addressed to individual scientists in the general biological community. RI representatives conveyed the need for innovative services that bridge structural information and the biomedical information that each RI is providing to its own reference communities. Individual researchers from the chemical, biological and biomedical communities expressed the need for tools that are more easily discoverable, better documented and that afford a deeper comprehension of the quality of the underlying experimental data than is currently available. In addition, the research community requested improved reuse of structural data in complex scenarios, such as the modelling of biochemical events or the interpretation of biological and functional information. This goes in the same direction as the aforementioned requirement of RI representatives for tools that bridge structural information to other types of biomedically relevant information. [1] https://about.west-life.eu/ [2] https://www.structuralbiology.eu/content/bringing-together-the-bio-medical-scientific-communities-the-role-of-research-infrastructures
        Speaker: Antonio Rosato (CIRMMP)
        Slides
      • 84
        Towards mutually beneficial industrial engagement with the EUDAT collaborative data infrastructure
        Supporting the innovation capacity of European SMEs is not only a key goal of Horizon 2020; offering services to this class of user can also be a significant contributor to sustainable funding for EU infrastructure. However, in practice many obstacles - both technical and practical - prevent the uptake of research infrastructure by industrial partners. In this talk we will focus on the issues that we have uncovered while trying to engage SMEs in the use of the EUDAT collaborative data infrastructure. We have identified the need for a more commercial outlook regarding pricing and service provision, and the need to give SMEs more control over how data is made available, as the key challenges underlying this activity. Meeting these challenges requires a change of mindset in infrastructure provision, which is currently focused on academic usage with its concomitant model of access to resources free at the point of use. During our work with a number of industrial partners looking to use datasets in conjunction with HPC, this issue was frequently compounded by the usage policies adopted by HPC facilities. Furthermore, providing transparent pricing using commercial cloud resources (such as Azure, Amazon and Google) is non-trivial, as they often charge depending on the use of the data rather than simply for holding it. In our efforts to give SME users access to EUDAT on terms that are acceptable to their commercial interests, we have worked on deploying containerised versions of the EUDAT B2SHARE component for use by SMEs, and on customising the service offering to their requirements. In conclusion, we believe that active collaborations with SMEs can not only help sustain EU infrastructure projects into the future, but also aid in hardening their services and furthering the development of new services. Such interactions benefit not only the SMEs and the infrastructure provider, but the wider research community. However, in order to realise these benefits, projects such as EUDAT need to be flexible in their SME engagement, allowing SMEs to adopt services without compromising their commercial interests, and to provide transparent cost models for installation and ongoing usage of storage services.
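        For a flavour of what a containerised deployment of this kind could look like, the sketch below starts a container with the Docker SDK for Python. The image name, port and environment variables are placeholders, since the talk does not specify the actual B2SHARE packaging.

        ```python
        # Hypothetical sketch: launching a containerised service instance
        # with the Docker SDK for Python. Image name, port and environment
        # values are placeholders; the actual B2SHARE packaging may differ.
        import docker

        client = docker.from_env()

        container = client.containers.run(
            image="example/b2share:latest",       # placeholder image
            name="sme-b2share",
            ports={"5000/tcp": 5000},             # placeholder port mapping
            environment={"SECRET_KEY": "change-me"},  # placeholder config
            detach=True,
        )
        print("Started:", container.short_id)
        ```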
        Speaker: Stefan Zasada (UCL)
        Slides
      • 85
        Plans for ELI Computing Infrastructure
        The Extreme Light Infrastructure (ELI) project currently consists of three pillars in three countries and a coordination body, ELI-DC. The project will become a European Research Infrastructure Consortium and will unify the existing pillars. The common ICT infrastructure is planned within a preparatory project, ELITRANS, which also includes the European e-infrastructure organisations EGI.eu, EUDAT and PRACE. We will present our current ideas for user and data management, and the challenges on the way to a unified computational environment.
        Speaker: Jiri Chudoba (CESNET)
        Slides
      • 86
        The OpenAIRE content provider dashboard: monitoring and enriching local collections using OpenAIRE services
        The OpenAIRE content provider dashboard is a one-stop shop supporting content providers in registering their sources (journal platforms, data repositories, institutional repositories, etc.) with OpenAIRE to make their metadata visible via the OpenAIRE portal. Among its functionalities, the dashboard offers validation of compliance with the OpenAIRE guidelines, aggregated usage statistics for articles and repositories, and enrichment of source content via subscription and notification mechanisms. The talk will provide details on the Dashboard's functionalities, the interoperability services offered through it, and the benefits it brings to a number of stakeholders. After a repository is registered and compliant with the OpenAIRE interoperability guidelines, the dashboard applies (through the OpenAIRE information graph) cleaning, transformation and disambiguation processes, and identifies relationships among all research entities available in OpenAIRE, such as publications, data, funding, researchers, organisations and data sources. Using all these data, OpenAIRE populates, enriches and maintains a graph of the aggregated objects. Through infrastructure services, the objects are harmonised to achieve semantic homogeneity, de-duplicated to avoid ambiguities, and enriched with missing properties and/or relationships. OpenAIRE content providers interested in enhancing or incrementing their content benefit from this service in a number of ways, as it provides information that is not otherwise readily available to them. Moreover, the dashboard offers statistics and metrics on the contents, which can be integrated with download counts from the local repositories using a plugin specifically developed to facilitate the integration with OpenAIRE (available for EPrints and DSpace). Through the Dashboard, the OpenAIRE Literature Broker Service is made available, offering content providers subscription and notification functionalities for events happening around their collections. By exploiting the provenance information tracked by the OpenAIRE infrastructure, it will be possible to subscribe to "enrichment" events and be notified whenever OpenAIRE enriches a publication metadata record with new properties (subjects, citation lists, research initiatives) or new relationships to other projects or datasets. By enhancing records with relationships and analysing the information-space graph, the service will also be able to notify repository managers about "addition" events whenever a publication metadata record relevant to their repository is aggregated from another data source.
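        Registration and compliance checks of this kind typically revolve around OAI-PMH harvesting. The sketch below issues a standard ListRecords request with Python, with a placeholder repository URL and set name; the OpenAIRE guidelines define their own sets and validation rules, which this does not reproduce.

        ```python
        # Minimal sketch of an OAI-PMH ListRecords harvest, the protocol
        # used when a repository exposes metadata to an aggregator.
        # Base URL and set name are placeholders.
        import xml.etree.ElementTree as ET
        import requests

        BASE_URL = "https://repository.example.org/oai"  # placeholder

        params = {
            "verb": "ListRecords",
            "metadataPrefix": "oai_dc",  # Dublin Core, mandated by OAI-PMH
            "set": "openaire",           # placeholder set name
        }
        resp = requests.get(BASE_URL, params=params, timeout=30)
        resp.raise_for_status()

        ns = {"oai": "http://www.openarchives.org/OAI/2.0/"}
        root = ET.fromstring(resp.content)
        for record in root.iter("{http://www.openarchives.org/OAI/2.0/}record"):
            ident = record.find("oai:header/oai:identifier", ns)
            print(ident.text if ident is not None else "(no identifier)")
        ```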
        Speaker: Pedro Principe (University of Minho)
        Slides
      • 87
        AEGIS: AARC Engagement Group for InfrastructureS
        The **AARC Engagement Group for InfrastructureS (AEGIS)** brings together research and e-infrastructure providers who implement AAI solutions for research collaborations based on the AARC Blueprint Architecture. AEGIS establishes a bi-directional channel between the AARC2 project and the infrastructure providers to advise each other on the development and production-integration aspects of the AARC results. The group will ensure that: - the results of AARC2 are known to all research infrastructures and e-infrastructures; - the infrastructures and the AARC2 team can discuss AARC2 sustainability models, implementation aspects and approaches to use cases that AARC2 may receive; - all key parties share the same vision and the same information about AARC2 objectives and developments in the trust and identity area, even if they may be in different deployment phases. The current membership of AEGIS includes representatives from AARC, DARIAH, EGI, ELIXIR, EUDAT, GÉANT, PRACE and XSEDE.
        Speaker: Christos Kanellopoulos (GÉANT)
        Slides
      • 88
        FIM4R and DI4R
        Federated Identity Management (FIM) is seen as a vital component for research infrastructures. Excellent technology exists: eduGAIN provides an interfederation service that makes the individual national research federations interoperable, i.e. a researcher from country X can access a service from country Y via her own campus account. In practice, however, a number of requirements of the research infrastructures remain unfulfilled. FIM4R, a group of research infrastructures that has met regularly since 2011, has converged on a common vision for FIM, enumerated a set of requirements and proposed a number of recommendations to ensure a roadmap for the uptake of FIM is achieved. A second version of the paper documenting these is currently being worked on. This lightning talk aims to make other research infrastructures aware of this work and to provoke new input from them.
        Speaker: Mr Peter Gietz (DAASI International / DARIAH)
        Slides
      • 89
        For researchers in need of data management skills: The CESSDA Online expert tour guide on RDM
        Many members of the Consortium of European Social Science Data Archives (CESSDA ERIC) host workshops on how to manage, store, organise, document and publish data for researchers in the social sciences. The forthcoming CESSDA Research Data Management (RDM) expert tour guide is an **online tutorial** that brings together the archives' experiences from engaging with researchers in their workshops. The guide is based on the **research data life cycle** and can be used by individual researchers for self-study, but also as part of an online or face-to-face workshop on research data management, or as part of a university course. When it comes to RDM, many different skills are required of researchers. To acquire these skills, many sources of information exist that scientists can make use of. However, information is often scattered, and researchers need to comply with specific requirements, e.g. related to funders, national legislation and common practices within their particular domain, which can be hard to find. One of the unique features of our tutorial is that it describes the **diversity** that can be found in Europe with respect to the practical implementation of the whole research data life cycle. Another recurring element describes what a researcher needs to do when creating and **adapting a data management plan** in the various stages of their research, taking into account discipline-specific context (e.g. privacy issues, data documentation). The guide also features practical examples and checklists, which encourage the direct application of the provided material. The new guide will be **promoted** within CESSDA ERIC, and we aim to encourage its direct use by universities and research institutes, as well as its use as a basis for more in-depth training. We will support the latter by organising train-the-trainer workshops and providing additional guidance materials for local trainers on various types of workshops. We would like to present this project as a lightning talk and/or a poster, in which we will give a sneak preview of the new online Expert Tour Guide that will be launched at the end of this year (2017).
        Speaker: Mrs Ellen Leenarts (DANS)
        Slides
      • 90
        BEXIS 2 – more than a data management system for the biodiversity domain
        In this presentation, we will demonstrate BEXIS 2, a modular, scalable, interoperable, free and open source system supporting research teams of several hundred researchers in all aspects of data life cycle management. The general idea is to support researchers as early as possible within the active phase of a project (e.g. dataset design, workflow documentation), but also to provide or incorporate services for data preservation and publication (e.g. GFBio.org, Pensoft Biodiversity Data Journal). The software is being developed based on requirements from the biodiversity and ecology domain, which mostly deals with tabular data, but it can easily be configured to serve other domains and data types as well. For tabular data, there are dedicated features to manage, share and re-use data structures, variables, units of measure, and data types. BEXIS 2 is very flexible and can be instantiated with multiple tenants, multiple metadata schemas, and various database systems. Other advanced features are a faceted search (incl. primary data), customisable data download and export (i.e. filter, sort, select, views), direct data access via APIs (e.g. from R), a highly flexible authentication and authorisation system (incl. single sign-on), and dataset versioning. BEXIS 2 will soon be compliant with the FAIR data principles, and we are exploring ways to offer BEXIS 2 as a hosted service.
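        The abstract mentions direct data access via APIs (e.g. from R); a comparable call from Python might look like the sketch below. The host, route, dataset id and auth token are made up, since the exact BEXIS 2 routes are not given here; an instance's API documentation would define the real ones.

        ```python
        # Hypothetical sketch of fetching a tabular dataset over a REST
        # API, as a BEXIS 2 client might. Host, route and auth are
        # placeholders; the response is assumed to be a JSON list of rows.
        import requests

        HOST = "https://bexis.example.org"   # placeholder instance
        DATASET_ID = 42                      # placeholder dataset id

        resp = requests.get(
            f"{HOST}/api/data/{DATASET_ID}",  # placeholder route
            headers={"Authorization": "Bearer PLACEHOLDER"},
            timeout=60,
        )
        resp.raise_for_status()
        rows = resp.json()
        print("Fetched", len(rows), "rows")
        ```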
        Speaker: Markus Baaske
    • Procurement of commercial services for research communities 214 & 216 (The Square, Brussels Meeting Centre)

      214 & 216

      The Square, Brussels Meeting Centre

      Convener: Bob Jones (CERN)
      • 91
        Introduction
        Speaker: Bob Jones (CERN)
      • 92
        GEANT IaaS framework
        Speaker: Andres Steijaert (Surfnet)
        Slides
      • 93
        Helix Nebula Science Cloud Pre Commercial Procurement
        Speaker: Daniele Cesini (INFN)
        Slides
      • 94
        T-Systems
        Speaker: Jurry de la Mar (T-Systems International GmbH)
        Slides
      • 95
        RHEA
        Speaker: Alastair Pidgeon (RHEA System)
        Slides
      • 96
        Procurement activities in the EOSC-Hub project
        Speaker: Sergio Andreozzi (EGI.eu)
        Slides
      • 97
        Open Clouds for Research Environments
        Speaker: Bob Jones (CERN)
        Slides
      • 98
        Panel discussion with all speakers + Dario Vianello (EMBL-EBI) and Marc-Elian Begin (SixSq)
      • 99
        Wrap-up
    • Service and data interoperability Copper Room (The Square, Brussels Meeting Centre)

      Copper Room

      The Square, Brussels Meeting Centre

      The lack of interoperability is a major barrier to open data sharing. The barriers between disciplines and organisations have arisen for many historical, technical and cultural reasons, and they are difficult to overcome. If we are to develop wide support for open science, we need to adopt common approaches to support data and service interoperability. In this session, we shall discuss some current activities which are developing best practice for interoperability within and between major European digital infrastructure initiatives.

      Convener: Brian Matthews (STFC)
      • 100
        EGI-EUDAT joint access to data and computing resources: an executive report
        The EGI-EUDAT interoperability collaboration started in 2016 with the goal of harmonising the two e-infrastructures. In order to create seamless access and pair data and computing resources into one perceived infrastructure offering both EGI and EUDAT services, user communities were identified and selected to bring in their requirements on technical interoperability, authentication, authorisation and identity management, policy, and operations. After the definition of a universal use case, this end-user-driven approach continued: EGI and EUDAT worked closely together with the EPOS, ICOS and later also IS-ENES research infrastructures as use-case pilots to test-drive cross-infrastructure usage of the storage resources managed by EUDAT and the computing resources available through EGI, and finally to validate the results. Throughout the project, priorities were adjusted to match the aspects most important to the user communities: the communities put much more emphasis on automated approaches and the quality of end-user documentation than foreseen at the beginning. Involving the user communities in this way also meant a sometimes steep learning curve in technological understanding and effective communication using the right terms, which represented a major - but necessary - time investment on their side. This investment of time and trust had to be properly respected by not misusing the communities as free beta-testers of eagerly put-forward new features that were not production-ready and largely undocumented. Concrete outcomes of the soon-to-be-finished work include valuable feedback on data-handling support within the EGI DataHub, tests of the EGI Federated Cloud with automatic submission, data transfer tests between the VMs and B2STAGE instances using both OneData and the EGI DataHub to access common storage for several VMs, and an evaluation of the new B2STAGE HTTP API. This presentation will show the final design of the workflows for each of the user communities followed, highlighting the adaptation to their specific needs and also the cross-fertilisation between them.
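        For a flavour of the B2STAGE HTTP API evaluation mentioned above, the sketch below shows how a client might authenticate and then stage a file over HTTP. The host and route paths are placeholders, and the real API's endpoints may differ; this only illustrates the token-then-transfer pattern.

        ```python
        # Hypothetical sketch of interacting with a B2STAGE-style HTTP API:
        # obtain a token, then upload a file into a registered collection.
        # Host, routes and credentials are placeholders.
        import requests

        HOST = "https://b2stage.example.org"  # placeholder deployment

        # 1) Authenticate (placeholder route and credentials)
        auth = requests.post(
            f"{HOST}/auth/login",
            json={"username": "user", "password": "secret"},
            timeout=30,
        )
        auth.raise_for_status()
        token = auth.json()["token"]

        # 2) Upload a local file into a (placeholder) registered collection
        with open("result.nc", "rb") as fh:
            up = requests.put(
                f"{HOST}/api/registered/exampleZone/home/user/result.nc",
                headers={"Authorization": f"Bearer {token}"},
                data=fh,
                timeout=300,
            )
        up.raise_for_status()
        print("Upload status:", up.status_code)
        ```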
        Speaker: Dr Michaela Barth (KTH)
        Slides
      • 101
        Combining HPC and Data services
        The goal of this proposal is to present the collaboration between two major European infrastructures, EUDAT, the European Collaborative Data Infrastructure, and PRACE, the Partnership for Advanced Computing in Europe, to support communities in the management of datasets resulting from scientific simulation. The EUDAT initiative is a consortium of several major European data and compute centres, research community centres and organisations working towards the development and realisation of the Collaborative Data Infrastructure (CDI), which provides an interoperable layer of common data services and a common model for managing data spanning all European research data centres and data repositories, creating a single European data infrastructure. PRACE enables high-impact European scientific discovery and engineering research and development across all disciplines, enhancing European competitiveness for the benefit of society. PRACE seeks to realise this mission through world-class computing and data management resources and services, open to all European public research through a peer review process. The broad participation of European governments through representative organisations allows PRACE to provide a diversity of resources throughout Europe, including expertise for the effective use of these resources. The capability to couple data and compute resources together is considered relevant to accelerating scientific innovation and advancing research frontiers. In ever more scientific and industrial domains, the use of large-scale instruments (synchrotrons, telescopes, satellites, sequencers, networks of sensors, scanners), supercomputers and open data archives is leading towards convergence among HPC, high-throughput computing, networking and data management facilities. The aim of this collaboration is to implement the vision where supercomputing and data resources of any kind and size are accessible without technical barriers, and produced data are managed in a profitable way. It aims at connecting experts and representatives from scientific user communities, exploring ways in which such e-infrastructures can develop synergistically and provide compound services. The joint activity covers different aspects, including the standardisation of service interfaces, the harmonisation of access policies, the lowering of technical barriers, the joint support of users, and the coordination of joint training activities. The presentation will report on the status of the collaboration, the results achieved so far, and the available opportunities for users to participate.
        Speaker: Giuseppe Fiameni (CINECA - Consorzio Interuniversitario)
        Slides
      • 102
        Sensitive data services and their integration with European e-infrastructures
        Nowadays data are collected and stored in unprecedented ways, enabling new research opportunities and novel innovations based on data mining and data aggregation. However, in fields such as the medical, social and environmental sciences, research often includes personal or sensitive data that must be handled with consideration for personal privacy. Many projects have developed legally compliant solutions for using sensitive data in research, and the General Data Protection Regulation (GDPR) is now setting the stage for accessing, sharing and processing sensitive personal data in Europe. Several strategies have been adopted locally to comply with national privacy regulations, but there is still a need to implement policies, and the corresponding technologies, that effectively allow cross-border, interdisciplinary research on personal sensitive data. The ePouta cloud from CSC in Finland and the TSD system from USIT in Norway are operational services that provide secure computing and data environments for sensitive data. These services are being used nationally and regionally by researchers for collecting, storing and processing sensitive data. They are operated by EUDAT partners, and plans for their integration into the European e-infrastructures are being prepared by matching them with community pilot cases. Both services are also included in the upcoming EOSC-hub portfolio. TSD and ePouta represent complementary resources and thus offer wide potential for integration into the European e-infrastructures through the EUDAT and EOSC service portfolios. TSD and ePouta are based on the concepts of Platform-as-a-Service and Infrastructure-as-a-Service, respectively. CSC's ePouta delivers a secure cloud infrastructure with powerful computing and data storage connected to the network at the user community domains. TSD offers storage capability, computing infrastructure, analysis/visualisation platforms and web-based data collection tools suitable for running complex research projects within an efficient and secure IT infrastructure. Work towards connecting TSD and ePouta with a secure connection is underway and will provide an example of cross-border use of such infrastructures.
        Speaker: Mr Antti Pursula (CSC)
        Slides
      • 103
        OpenAIRE services in support of “Open Science as-a-Service”
        The effective implementation of Open Science calls for a scientific communication ecosystem capable of enabling the “Open Science publishing principles” of transparency and reproducibility. Such an ecosystem should provide the tools, policies and trust needed by scientists for sharing and interlinking (for “discovery” and “transparent evaluation”) and re-using (for “reproducibility”) all research products produced during the scientific process, e.g. literature, research data, methods, software, workflows, protocols, etc. OpenAIRE fosters Open Science by advocating its publishing principles across Europe and research communities, and by offering technical services in support of OA monitoring, research impact monitoring, and Open Science publishing. Its aim is to provide Research Infrastructures (RIs) with the services required to bridge the research life cycle they support - where scientists produce research products - with the scholarly communication infrastructure - where scientists publish research products - in such a way that science is reusable, reproducible and transparently assessable. OpenAIRE fosters the establishment of reliable, trusted and long-lasting RIs by compensating for the lack of Open Science publishing solutions and providing the support required by RIs to upgrade existing solutions to meet Open Science publishing needs (e.g. technical guidelines, best practices, OA mandates). To this aim, OpenAIRE is working closely with existing RIs to extend its service portfolio with two services implementing the concept of “Open Science as a Service” (OSaaS). The Research Community Dashboard: scientists of RIs can find tools for publishing all their research products, such as literature, datasets, software and research packages (providing metadata, getting DOIs, and ensuring preservation of files), interlink such products manually or by exploiting advanced mining techniques, and integrate their services to automatically publish metadata and/or the payload of objects into OpenAIRE. As a consequence, scientists populate and access an information space of interlinked objects dedicated to their RI, through which they can share any kind of product within their community, maximise the re-use and reproducibility of science, and reach out to scholarly communication at large. The Catch-All Broker Service: data sources such as institutional repositories, data repositories and software repositories can be notified of metadata records relating to products (datasets, articles, software, research packages) that are “of interest to them”, i.e. metadata records that should be in the data source, or “linked to them”, i.e. where a scholarly link exists between one of the data source's products and the identified product. Notifications are sent only to subscribed data sources, following a subscription and notification pattern, and can be delivered by mail, OAI-PMH end-user interfaces, or, currently under investigation, via push APIs (e.g. the SWORD protocol), FTP and ResourceSync. The idea behind the service is to disseminate and advocate the principle that scholarly communication data sources are not a passive component of the scholarly communication ecosystem, but rather an active and interactive part of it. They should not consider themselves thematic silos of products, but rather hubs of products semantically interlinked with all kinds of research products and, more broadly, kept up to date with the evolving research ecosystem.
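        To make the subscription-and-notification pattern concrete, the sketch below models a broker event and a per-data-source subscription in plain Python. The event fields are invented for illustration and do not mirror the actual OpenAIRE Broker payloads or delivery channels.

        ```python
        # Illustrative sketch of a subscribe/notify broker pattern. The
        # event structure is invented; the real OpenAIRE Broker defines its
        # own payloads and delivery channels (mail, OAI-PMH, push APIs).
        from dataclasses import dataclass
        from typing import Callable, List, Tuple


        @dataclass
        class EnrichmentEvent:
            repository_id: str   # data source the event concerns
            record_id: str       # metadata record being enriched
            kind: str            # e.g. "enrichment/subject", "addition"
            payload: dict        # the new properties or relationships


        Subscriber = Callable[[EnrichmentEvent], None]


        class Broker:
            def __init__(self) -> None:
                self._subs: List[Tuple[str, Subscriber]] = []

            def subscribe(self, repository_id: str, cb: Subscriber) -> None:
                self._subs.append((repository_id, cb))

            def publish(self, event: EnrichmentEvent) -> None:
                # Notify only subscribers registered for this data source
                for repo, cb in self._subs:
                    if repo == event.repository_id:
                        cb(event)


        broker = Broker()
        broker.subscribe("repo-123",
                         lambda e: print("notify:", e.kind, e.record_id))
        broker.publish(EnrichmentEvent("repo-123", "oai:rec:1",
                                       "enrichment/subject",
                                       {"subjects": ["physics"]}))
        ```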
        Speaker: Paolo Manghi (Istituto di Scienza e Tecnologie dell'Informazione - CNR)
        Slides
      • 104
        Integrated service delivery across e-Infrastructures and Service Providers
        GEANT is putting forward a standards-based (ITIL, TM Forum and MEF) architecture and framework for operational integration and service delivery orchestration across e-infrastructures and service providers. The specification addresses minimum requirements for the operations and business support systems (OSS/BSS) of participating service providers, defining orchestrated processes across service providers and inter-service-provider open application programming interfaces (APIs) for the different types of service provider interaction (e.g. business agreement establishment, order management, service delivery). The architecture makes it possible for end users to pull together and interconnect the strands from multiple service providers, through the use of self-service portals and user-centric workflows. It thus facilitates cross-provider service delivery, where an order for a service or resource is placed and managed in one location (a portal) and distributed in the background among the engaged service providers. Offerings to users are presented in the form of orderable products, masking technology-specific services, operations and resources behind back-end functional elements, both internal to a service provider and business-to-business across providers. Challenges include the modelling and advertising of offerings, services and technologies across the e-infrastructures and service providers so that service chaining and composition are possible; incorporating federated AAI functions into standards (where these do not yet exist); accommodating the existing APIs of R&E as well as commercial service providers; enabling dynamic onboarding of service providers and users; managing business agreements and terms of service use programmatically; and exploiting systems orchestration to eliminate manual tasks in light of the expected scale of service requests across the EOSC service area. EOSC users are expected to enjoy a coherent, transparent, comprehensible, consistent and predictable service experience across multiple providers, the same way they can today instantly order and receive cloud and connectivity resources from commercial providers, but with the added value of specialised, science-oriented offerings. The presentation will include a demonstration of the framework in action, where an institutional user at one edge of Europe can request dedicated connectivity to access cloud resources provisioned at a cloud provider's data centre at the other end of the continent, via the orchestration of the intermediate network service providers and the cloud provider API.
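        As an indication of what an inter-provider order interaction might look like, the sketch below posts a product order in the general style of TM Forum's Open APIs. The endpoint, product offering id and characteristic fields are placeholders, not the project's actual specification.

        ```python
        # Hypothetical sketch of placing a cross-provider product order in
        # the style of TM Forum Open APIs. Endpoint, ids and fields are
        # placeholders; the actual specification may differ.
        import requests

        ORDERING_ENDPOINT = (
            "https://oss.example.net/productOrdering/v1/productOrder"
        )

        order = {
            "externalId": "order-0001",
            "orderItem": [
                {
                    "id": "1",
                    "action": "add",
                    "productOffering": {"id": "dedicated-connectivity-10G"},
                    "product": {
                        "characteristic": [
                            {"name": "endpointA",
                             "value": "campus.example.edu"},
                            {"name": "endpointZ",
                             "value": "cloud-dc.example.net"},
                        ]
                    },
                }
            ],
        }

        resp = requests.post(ORDERING_ENDPOINT, json=order, timeout=30)
        resp.raise_for_status()
        print("Order state:", resp.json().get("state"))
        ```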
        Speaker: Ms Afrodite Sevasti (GEANT)
      • 105
        Discussion
    • Impact evaluation and metrics 211&212

      211&212

      The Square Meeting Centre

      Mont des Arts street, no. 1000 Brussel, Belgium
      • 106
        Everything Counts in Large Amounts: Measuring the impact of Usage Activity in Open Access Scholarly Environments
        **Overview of the proposed presentation / session / poster / demo** The evaluation of scholarly impact has a strong influence on the assessment of actors in the scholarly ecosystem, such as authors, publishers and institutions. With the advent of scholarly communication on the web, new types of research output have evolved, and in addition to conventional metrics new web-based indicators have been developed. However, there is a strong need to overcome the limitations in metrics regarding accessibility and the coverage of research output types and disciplines, which prevent such evaluations from being performed in a transparent, robust and reproducible way. The main topic of this presentation is the Usage Statistics Service developed in the context of the OpenAIRE project. The service aims to address the requirements mentioned above and offers an integrated infrastructure for assessing scholarly information. **How does your submission address the conference themes and the topics of the track?** The key challenge for usage statistics as a contribution to impact evaluation is the generation of comparable, consistent, standards-based usage statistics across publishing platforms that take into account different levels of scholarly information: the usage of data sources, the usage of individual items in the context of their resource type, the usage of individual web resources or files, and the usage of resources across different repositories. Towards tackling this challenge, we will discuss the methodology of OpenAIRE's Usage Statistics Service for tracking, collecting, processing and analysing usage activity from the network of OpenAIRE's data providers. We will present the evaluation metrics, such as item downloads and metadata views, and how these metrics are calculated and presented using guidelines for consistent and credible usage data, such as the COUNTER Code of Practice. We will also show how OpenAIRE's distributed network of data providers allows the service to aggregate statistics about usage activity published in several places. We will present the impact of both manifestations of the service, i.e. the complete methodology and the aggregation of usage statistics. We will discuss its significance for different stakeholders, given that for non-traditional output types (e.g. research data, research software) usage statistics are often the only indicator available, while the implementation of data citation standards lags behind. In particular, we will show how repository managers and hosting institutions can use the service as a tool to evaluate the success of their publication infrastructures. Authors and readers can, among other things, see the popularity of an individual item. Finally, funding authorities can draw on it in research evaluation processes, in addition to other traditional (e.g. citation counts) and alternative metrics (e.g. blogs, social activity, etc.). We will discuss how the Usage Statistics Service takes account of the sensitivity of usage data and the legal constraints that should be considered regarding the EU Data Protection Directive and policies at the national level. **Who is the intended audience for your presentation/session/etc?** Publishers, funders, repository managers, research administrators
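        The COUNTER Code of Practice mentioned above includes, among many other rules, a double-click filter that collapses repeated requests by the same user for the same item within a short window. The sketch below implements a simplified version of that idea; the 30-second window follows COUNTER practice, but the log-record format is invented.

        ```python
        # Simplified sketch of COUNTER-style double-click filtering:
        # repeated requests for the same item by the same user within a
        # 30-second window count once. The log format is invented.
        from collections import defaultdict

        DOUBLE_CLICK_WINDOW = 30  # seconds, per COUNTER practice

        def count_downloads(log):
            """log: iterable of (timestamp_sec, user_id, item_id), by time."""
            last_seen = {}             # (user, item) -> time of last request
            counts = defaultdict(int)  # item -> filtered download count
            for ts, user, item in log:
                key = (user, item)
                if key not in last_seen or ts - last_seen[key] > DOUBLE_CLICK_WINDOW:
                    counts[item] += 1
                last_seen[key] = ts
            return dict(counts)

        log = [
            (0, "u1", "doi:10.1234/a"),
            (10, "u1", "doi:10.1234/a"),   # double click: filtered out
            (100, "u1", "doi:10.1234/a"),  # counted again
            (5, "u2", "doi:10.1234/a"),
        ]
        print(count_downloads(sorted(log)))  # {'doi:10.1234/a': 3}
        ```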
        Speaker: Dimitrios Pierrakos (ATHENA Research and Innovation Center)
        Slides
      • 107
        Metrics for Open Access Digital Monographs: the HIRMEOS Project
        Open Access has matured for journals, but its uptake in the book market is still delayed, despite the fact that books continue to be the leading publishing format for the social sciences and humanities. The 30-month EU-funded project HIRMEOS (High Integration of Research Monographs in the European Open Science infrastructure) tackles the main obstacles to the full integration of five important digital platforms supporting open access monographs. The content of the participating platforms will be enriched with tools that enable identification, authentication and interoperability (via DOI, ORCID, Fundref), tools for information enrichment and entity extraction (INRIA (N)ERD), the ability to annotate monographs (Hypothes.is), and the gathering of usage and alternative metrics data. This presentation will cover the specific contribution of HIRMEOS Work Package 6: the development and implementation of metrics services on our platforms. Being able to demonstrate the uptake, usage and reuse of OA books is important for authors, publishers and funders. Collecting this type of data for books presents a set of challenges different from those for articles. Authors and publishers are interested in obtaining overall usage data both for the book and for individual chapters within it. The use of DOIs for books remains limited, so many citations do not include a DOI reference. Books and individual chapters have different DOIs. Many platforms hosting digital editions of OA titles assign their own DOIs or permanent URL references to the content. Numerous different digital formats for ebooks exist and circulate. Ubiquity Press and Open Book Publishers are working together to create and populate a database of title-specific usage data, aggregating usage data across multiple different platforms and formats. Drivers are being developed to query alternative hosting platforms for usage data, and an API and widget have been created for publishers to query the database and present the aggregate data on their own websites. All the code and architecture created for this project will be open and made available for other publishers and platforms to freely download, adopt or adapt as they wish, facilitating broader uptake, collection and presentation of this data for Open Access books. This presentation will: a. identify the main theoretical difficulties to be addressed in collecting and aggregating this data for Open Access books, and the various solutions adopted; b. explain the data collection and aggregation packages we are developing, and how they may be adopted by other publishers and platforms; c. provide technical specifications of the implemented metrics services being developed, the database architecture and the Metrics API standards used for the statistics collection agent and the OA metrics widget. The intended audience for this presentation will be publishers seeking to collect aggregate usage data of this kind, online content platforms creating usage statistics, and libraries and research agencies wishing to access and analyse usage data for the content created by their researchers.
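        To illustrate the aggregation problem described above (one title, many platforms, and both book- and chapter-level DOIs), the sketch below merges per-platform counts keyed by DOI and rolls chapter DOIs up to the book. The driver results and DOIs are invented for illustration.

        ```python
        # Illustrative sketch of aggregating usage counts for a monograph
        # across platforms, rolling chapter-level DOIs up to the book.
        # The "driver" results and DOIs are invented.
        from collections import defaultdict

        # chapter DOI -> book DOI (normally resolved from platform metadata)
        CHAPTER_TO_BOOK = {
            "10.5555/book1.ch1": "10.5555/book1",
            "10.5555/book1.ch2": "10.5555/book1",
        }

        def aggregate(per_platform_counts):
            """per_platform_counts: list of {doi: downloads}, one per driver."""
            totals = defaultdict(int)
            for counts in per_platform_counts:
                for doi, n in counts.items():
                    totals[CHAPTER_TO_BOOK.get(doi, doi)] += n
            return dict(totals)

        platform_a = {"10.5555/book1": 120}          # whole-book downloads
        platform_b = {"10.5555/book1.ch1": 40, "10.5555/book1.ch2": 15}
        print(aggregate([platform_a, platform_b]))   # {'10.5555/book1': 175}
        ```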
        Speaker: Mr Javier Arias (Open Book Publishers)
        Slides
    • 12:30
      Lunch Break The Square, Brussels Meeting Centre

      The Square, Brussels Meeting Centre

    • EOSC engagement with target groups Copper Room (The Square, Brussels Meeting Centre)

      Copper Room

      The Square, Brussels Meeting Centre

      • 108
        Introduction
        Speakers: Enzo Capone (GÉANT), Giuseppe Fiameni (CINECA - Consorzio Interuniversitario)
        Slides
      • 109
        GÉANT
        Speaker: Enzo Capone (GÉANT)
        Slides
      • 110
        EUDAT
        Speaker: Giuseppe Fiameni (CINECA - Consorzio Interuniversitario)
        Slides
      • 111
        OpenAIRE
        Speaker: Natalia Manola (University of Athens, Greece)
      • 112
        RDA
        Speaker: Juan Bicarregui (STFC)
        Slides
      • 113
        PRACE
        Speaker: Ms Marjolein Oorsprong (Partnership for Advanced Computing in Europe (PRACE) aisbl)
        Slides
      • 114
        EGI
        Speaker: Dr Gergely Sipos (EGI.eu)
        Slides
      • 115
        EOSC-hub
        Speakers: Mr Claudio Cacciari (Cineca), Dr Gergely Sipos (EGI.eu), Sy Holsinger (EGI.eu)
        Slides
      • 116
        Panel discussion
        Slides
    • Evaluation of Research Careers fully acknowledging Open Science Practices - what needs to be done next? 211 & 212 (The Square, Brussels Meeting Centre)

      211 & 212

      The Square, Brussels Meeting Centre

      Recently, the European Commission's Working Group on Rewards under Open Science published the report “Evaluation of Research Careers fully acknowledging Open Science Practices”. Noting that “exclusive use of bibliometric parameters as proxies for excellence in assessment (...) does not facilitate Open Science”, the report concludes that “a more comprehensive recognition and reward system incorporating Open Science must become part of the recruitment criteria, career progression and grant assessment procedures...”

      The report includes a useful matrix with evaluation criteria for assessing Open Science activities and recommends that “Open Science activity by researchers should become a cross cutting theme in all of the Work Programmes of Horizon 2020 and, most importantly, in the future Framework Programme, FP9.”

      However, rewards and incentives for researchers practising Open Science are needed now, so that researchers who currently practise Open Science are not discouraged from doing so, and researchers who are hesitant about it feel encouraged to engage. Therefore, to promote and accelerate cultural change within the research community, the suggestions described in the European Commission’s report should be put into practice as soon as possible.

      But what needs to be done for this to happen? What should be the goals and actions of the different stakeholders (e.g. research institutions, governments and funding bodies, publishers, and principal investigators)? What would be the most effective methods to engage them? And should the progress of the different stakeholders towards recognising Open Science practices be evaluated?

      This will be an interactive session, during which the participants will work together to create roadmaps aimed at the different stakeholders. The roadmaps will propose effective methods for engaging these stakeholders, goals for successfully embedding Open Science practices in the evaluation of research careers, and metrics for evaluating the progress of the different stakeholders towards rewarding Open Science activities. The session will start with a short presentation to provide context and set the scene, and it will conclude with a discussion of how to disseminate its results and take the work forward. The outcomes of the participants’ work will be shared publicly. The authors of and contributors to the EC report have been made aware of this workshop and will be informed of its results. The outcomes of this session may thus influence further steps taken by the European Commission on this topic.

      Conveners: Dr Maria Cruz (Delft University of Technology), Marta Teperek (Delft University of Technology)
      Slides
    • How to make EOSC services FAIR? Experience and challenges 214 & 216

      214 & 216

      The Square Meeting Centre

      Mont des Arts street, no. 1000 Brussel, Belgium

      Digital scientific data, tools, workflows and services are becoming available at increased speed and unprecedented scale. Unfortunately, a large share of these digital objects remains unnoticed, un-accessed or un-used beyond the team that produced them, limiting our ability to extract maximum benefit and knowledge from these research investments.
      The F.A.I.R. (Findable, Accessible, Interoperable, Reusable) principles were first introduced at a workshop held in Leiden in 2014, where a group of like-minded academic and private stakeholders met to discuss ways to overcome obstacles in data discovery and reuse. While various initiatives are already active in defining how the FAIR principles can be implemented for research data, more work is needed to understand how services can be made FAIR. Reproducibility of science cannot be achieved without FAIR data processing services.

      This session introduces initiatives that are tackling this problem space and offers a forum to discuss the future work needed to provide a platform for the development and operation of FAIR services. The session is intended for prospective EOSC service providers, technology providers and software providers from research infrastructures, research projects and scientific collaborations.

      The presentations and discussion of the session aim to answer the following questions (a toy descriptor sketch follows the list):
      1. What makes a service FAIR?
      2. How do different initiatives, tools, protocols, policies and processes support FAIR service providers?
      3. How should FAIR-compliant services be identified, certified and shared within EOSC?
      4. How should the EOSC community coherently support the developers and operators of FAIR services?
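      As a concrete starting point for question 1, here is a toy Python sketch of what a machine-checkable service descriptor might look like; every field name is an assumption made for discussion, not an agreed EOSC schema:

        # Toy "FAIR service" descriptor; all field names are assumptions
        # for discussion, not an agreed EOSC schema.
        from dataclasses import dataclass

        @dataclass
        class ServiceDescriptor:
            identifier: str = ""          # Findable: persistent identifier
            catalogue_entry: str = ""     # Findable: registered in a catalogue
            access_protocol: str = ""     # Accessible: open, standardised protocol
            api_specification: str = ""   # Interoperable: documented interface
            licence: str = ""             # Reusable: clear terms of use

        def fair_gaps(service):
            """Return the FAIR aspects the descriptor leaves unaddressed."""
            checks = {
                "Findable": bool(service.identifier and service.catalogue_entry),
                "Accessible": bool(service.access_protocol),
                "Interoperable": bool(service.api_specification),
                "Reusable": bool(service.licence),
            }
            return [aspect for aspect, ok in checks.items() if not ok]

        svc = ServiceDescriptor(identifier="svc-0001",
                                catalogue_entry="https://example.org/catalogue/svc-0001",
                                access_protocol="https")
        print(fair_gaps(svc))  # ['Interoperable', 'Reusable']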

      Convener: Diego Scardaci (EGI.eu)
      • 117
        Introduction to F.A.I.R.
        Speaker: Michel Dumontier (Maastricht University)
        Slides
      • 118
        eInfraCentral Catalogue (covering Findable)
        Speaker: Jorge Sanchez (JNP)
        Slides
      • 119
        EGI Marketplace (covering Findable+Accessible)
        Speaker: Roksana Dobrzańska (CYFRONET)
        Slides
      • 120
        Service management and rules of engagement with service providers in EOSC (covering Accessible+Interoperable)
        Speaker: Simone Sacchi
        Slides
      • 121
        Turning applications into reusable services - The EGI Applications On Demand Service experience (covering Interoperable+Reusable)
        Speaker: Dr Giuseppe La Rocca (EGI.eu)
        Slides
      • 122
        Panel discussion with the speakers and the audience
    • Text and data mining for open science 213 & 215 (The Square, Brussels Meeting Centre)

      213 & 215

      The Square, Brussels Meeting Centre

      This session discusses ways of supporting the EOSC vision by fostering collaboration between infrastructures and bringing Text and Data Mining (TDM) into the spotlight as a valuable research instrument that opens new avenues in multi- and cross-disciplinary research.
      The session is structured in two parts.
      The first part consists of presentations that introduce the topic of TDM and give an overview of the services OpenMinTeD offers to research communities, the technical and legal barriers it aims to overcome, and the solutions it has adopted.
      The second part is an interactive discussion with a panel of experts on Open Science, infrastructures, and text and data mining.

      Convener: Natalia Manola (University of Athens, Greece)
      • 123
        Overview of OpenMinTeD
        Speaker: Natalia Manola (University of Athens, Greece)
      • 124
        Use case I – Agriculture & Biodiversity: Text-mining for agriculture & biodiversity
        Textual data is one of the main sources of knowledge in the agriculture and biodiversity domains, yet it is heavily underexploited compared to other data. We will present applications addressing two major issues, food microbiology and wheat selection, deployed on two infrastructures of IFB (the French Institute for Bioinformatics) and available as web applications. Both reuse the same OpenMinTeD semantic content analysis workflow, easily adapted to each task through machine learning and ontology use, and both integrate various kinds of data (e.g. experiments, observations); a toy sketch of the underlying ontology-driven tagging follows this entry.
        Speaker: Claire Nédellec (INRA)
        Slides
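        A toy Python sketch of the kind of ontology-driven tagging such a workflow performs; the mini ontology and example sentence are invented and far simpler than the actual OpenMinTeD components:

          # Invented mini "ontology" mapping surface forms to classes;
          # real workflows use full ontologies and trained models.
          ontology = {
              "Bacillus cereus": "Microorganism",
              "milk": "Food product",
              "Triticum aestivum": "Plant taxon",
          }

          def tag_entities(text, ontology):
              """Return (surface form, ontology class) pairs found in the text."""
              lowered = text.lower()
              return [(term, cls) for term, cls in ontology.items()
                      if term.lower() in lowered]

          sentence = "Growth of Bacillus cereus was observed in raw milk samples."
          print(tag_entities(sentence, ontology))
          # [('Bacillus cereus', 'Microorganism'), ('milk', 'Food product')]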
      • 125
        Use case II – Life Sciences: Harnessing text-mining for biophysically-detailed brain modeling
        This short presentation illustrates the kind of modelling being done in the context of the Blue Brain Project. It explains the need to populate, from the literature, values for a large number of modelling parameters. Finally, it describes the manual literature curation framework that has been set up to address this need, and how OpenMinTeD can be (and, in the context of LS-B, has started to be) instrumental in harnessing text mining to speed up this curation work; a toy sketch of such parameter extraction follows this entry.
        Speaker: Christian O'Reilly (EPFL)
        Slides
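        A toy Python sketch of the kind of text-mining support such curation can use: pulling candidate numeric values with units out of a sentence for a curator to review. The regular expression and unit list are illustrative assumptions only:

          import re

          # Illustrative unit list; a real curation pipeline would use a far
          # richer grammar and unit inventory.
          UNIT = r"(?:mV|ms|Hz|pA|µm)"
          PATTERN = re.compile(r"(-?\d+(?:\.\d+)?)\s*(" + UNIT + r")\b")

          def candidate_parameters(sentence):
              """Return (value, unit) candidates for a curator to review."""
              return [(float(value), unit) for value, unit in PATTERN.findall(sentence)]

          text = "The resting potential was -65 mV with a time constant of 20 ms."
          print(candidate_parameters(text))  # [(-65.0, 'mV'), (20.0, 'ms')]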
      • 126
        Use case III – Social Sciences: Mining data references from social science publications to enhance information discovery and linking
        Research in the social sciences is usually based on survey data consisting of a number of single research questions (so-called variables). However, due to the lack of standards for data citation, reliably identifying variable references in scientific publications is often difficult. We present a work-in-progress study that seeks to solve this variable detection task with supervised machine learning algorithms; a minimal sketch of such a set-up follows this entry.
        Speaker: Peter Mutschke (GESIS)
        Slides
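        A minimal Python sketch of the supervised set-up described above, using scikit-learn; the toy sentences and labels stand in for an annotated corpus of variable mentions:

          from sklearn.feature_extraction.text import TfidfVectorizer
          from sklearn.linear_model import LogisticRegression
          from sklearn.pipeline import make_pipeline

          # Invented training sentences; 1 = mentions a survey variable, 0 = does not.
          sentences = [
              "Respondents rated trust in parliament on an 11-point scale.",
              "Item v23 measures satisfaction with democracy.",
              "The fieldwork took place between March and June 2012.",
              "We thank the anonymous reviewers for their comments.",
          ]
          labels = [1, 1, 0, 0]

          # Bag-of-ngrams features feed a linear classifier; a production system
          # would train on a much larger corpus with careful evaluation.
          classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                                     LogisticRegression())
          classifier.fit(sentences, labels)

          print(classifier.predict(["Variable v42 captures attitudes towards immigration."]))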
      • 127
        The challenges and opportunities of OpenMinTeD: a legal perspective
        Text and Data Mining represents the future of research in many ways. Yet it is surrounded by legal barriers, from copyright law to licensing agreements, that impede its full development. OpenMinTeD aims to overcome such barriers and help researchers realise the full potential of TDM.
        Speaker: Giulia Dore (CREATe)
        Slides
      • 128
        Panel discussion
        A panel of key persons involved in designing and/or running related infrastructures will exchange ideas and views on the role OpenMinTeD can play in the Open Science paradigm. They are expected to bring different perspectives into the discussion, presenting how they see their collaboration with OpenMinTeD: for instance, whether and how they could deploy OpenMinTeD for the objectives of their infrastructure, further requirements not yet addressed by OpenMinTeD, and areas where their expertise or resources could benefit OpenMinTeD. Topics to be discussed include, but are not limited to: exchanging resources and sharing research results, working on and promoting common interoperability standards, and raising awareness and training activities on TDM, Open Access, legal policies, etc. The audience will be asked to join the discussion and offer their own insights. Panelists: Franciska de Jong (CLARIN), Paolo Manghi (OpenAIRE), Ron Dekker (CESSDA)
    • Coffee Break
    • Closing Plenary Copper Room (The Square, Brussels Meeting Centre)

      Copper Room

      The Square, Brussels Meeting Centre

      Convener: Franciska de Jong (CLARIN ERIC)