EGI Conference 2019

Europe/Amsterdam
WCW Congress Centre

Science Park 123 1098 XG Amsterdam
Tiziana Ferrari (EGI.eu)
Description

The EGI Conference 2019 took place in Amsterdam, 6-8 May 2019, as a forum for the EGI Community to discuss the state of the art of the EGI Federation, future and emerging trends, and requirements and experiences at national and local level and at the level of the data centre.

The event was also an opportunity to celebrate the 15th anniversary of Operations of what is now the EGI Federation. Read the special issue of the EGI Newsletter.

The programme was a mix of topical sessions, workshops and space for debate on topics related to the EGI Federation.

We thank all participants for the success of the event!

    • Welcome coffee and registration
    • Opening Plenary Turing

      Turing

      WCW Congress Centre

      Science Park 123 1098 XG Amsterdam
      Convener: Arjen van Rijn (NIKHEF)
      slides
      • 1
        Welcome and introduction
        Speaker: Arjen van Rijn (NIKHEF)
        Slides
      • 2
        The EGI Federation is 15! The role and challenges of computing in data-driven science
        Speaker: Dr Tiziana Ferrari (EGI.eu)
        Slides
      • 3
        Evolving distributed computing for the LHC
        In this talk I will look back at the successes of the global distributed computing environment for the LHC, and some of the lessons learned. Looking forward to the High Luminosity upgrade of the LHC, where we anticipate data volumes on the order of several exabytes per year, there are a number of ongoing R&D projects investigating how the system will evolve to manage the overall capital and operational cost, whilst retaining the key attributes of a globally federated and collaborative data and computing infrastructure. I will describe some of the most important of these activities, and the potential synergies with other large-scale and international science projects.
        Speaker: Ian Bird (CERN)
        Slides
      • 4
        e-Infrastructures in the European Open Science Cloud Era
        Presentation by the new Head of the DG-CONNECT Infrastructures Unit
        Speaker: Andreas Veiskpak (Head of Unit e-Infrastructures, DG-CONNECT)
        Slides
    • Coffee break
    • Implementations of AAI VK1,2 SURFsara

      VK1,2 SURFsara

      WCW Congress Centre

      Science Park 123 1098 XG Amsterdam

      This session includes contributions focusing on:
      - Use cases and experiences from the integration of community AAIs with the EGI federated AAI infrastructure (from the perspective of VREs, scientific gateways, data analytic frameworks, analytics services, and data exploitation platforms);
      - EGI and (other) e-infrastructure service providers requirements and experience in integrating their own existing AAI infrastructure with the EGI federated AAI;
      - The latest technical advancements of federated AAI solutions, related use cases and technical developments aiming at advancing the EGI federated AAI infrastructure.

      Convener: Mr Nicolas Liampotis (GRNET)
      • 5
        Integration of OPENCoastS in EGI Check-in
        OPENCoastS is a thematic service in the scope of the EOSC-hub project. This service builds on-demand circulation forecast systems for user-selected sections of the coast and keeps them running operationally for the time frame defined by the user. In this presentation we describe the process and experience of integrating OPENCoastS with the EGI Check-in service. User registration and authentication on the OPENCoastS service can be done in two ways: through the EGI Check-in federated identity, based on the OpenID Connect protocol, or by direct registration. In either case a new user is registered in the OPENCoastS backend database. The OPENCoastS frontend is a Python/Django application, thus the django-auth-oidc package was used to support OpenID Connect authentication. The process to enable OPENCoastS as a service provider in EGI Check-in followed the description in [1]. The first step was to register the service [2] in the test instance of EGI Check-in. Afterwards, the federated IdP test instance could be enabled in the OPENCoastS service for testing. At this point, users authenticated through EGI Check-in were able to test the registration and request access to the service. During this phase it was necessary to define a set of policies (profiles, roles and priorities) and implement them at the level of the OPENCoastS database backend. Upon successful testing, the second phase was to request the EGI AAI team to move the service from the testing environment to production: https://aai.egi.eu/oidc/. In this step, the registered ClientID and Secret of the SP were preserved; on the OPENCoastS side, the endpoint URL of the federated identity service was changed to the production one. In summary, OPENCoastS was successfully integrated with the EGI Check-in service in a transparent way. Overall the process was not overly complex, either from an operational or from a programming point of view. Choosing OpenID Connect as the authentication protocol did simplify the process compared with the SAML protocol. [1] AAI guide for SPs: https://wiki.egi.eu/wiki/AAI_guide_for_SPs [2] OPENCoastS service: https://opencoasts.ncg.ingrid.pt
        Speaker: Mario David (LIP Lisbon)
        Slides
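        The integration described above follows the standard OpenID Connect authorization-code flow against EGI Check-in. As a rough, hedged illustration of that flow (not the OPENCoastS/django-auth-oidc code itself), the sketch below uses Python's requests library to read the provider's discovery document and exchange an authorization code for tokens; the client ID, secret and redirect URI are placeholders obtained when registering a service in Check-in.

```python
# Minimal OpenID Connect authorization-code flow sketch (illustrative only).
# Client credentials and redirect URI are placeholders; real values come from
# registering the service provider in EGI Check-in.
import secrets
import urllib.parse

import requests

ISSUER = "https://aai.egi.eu/oidc/"            # production Check-in OP named in the abstract
CLIENT_ID = "<client-id-from-registration>"    # placeholder
CLIENT_SECRET = "<client-secret>"              # placeholder
REDIRECT_URI = "https://example.org/callback"  # placeholder

# 1. Discover the provider's endpoints via the standard OIDC discovery document.
conf = requests.get(ISSUER.rstrip("/") + "/.well-known/openid-configuration").json()

# 2. Send the user's browser to the authorization endpoint.
state = secrets.token_urlsafe(16)
auth_url = conf["authorization_endpoint"] + "?" + urllib.parse.urlencode({
    "response_type": "code",
    "client_id": CLIENT_ID,
    "redirect_uri": REDIRECT_URI,
    "scope": "openid profile email",
    "state": state,
})
print("Redirect the user to:", auth_url)

# 3. After the callback, exchange the returned authorization code for tokens.
def exchange_code(code: str) -> dict:
    resp = requests.post(
        conf["token_endpoint"],
        data={
            "grant_type": "authorization_code",
            "code": code,
            "redirect_uri": REDIRECT_URI,
        },
        auth=(CLIENT_ID, CLIENT_SECRET),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # contains id_token, access_token, ...
```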
      • 6
        AARC2 SA1: Pilots on community-driven use-cases and on infrastructure AAIs integration
        The AARC2 Service Activity 1 Pilots (SA1) demonstrated the feasibility of deploying Authentication and Authorisation Infrastructures (AAI) for research communities and e-infrastructures that fit the overarching AAI model defined by the AARC Blueprint Architecture (BPA). To this end, this activity demonstrated through (pre-)production pilots that:
        - The AARC BPA and the related policy documents can be instantiated to fit research communities' requirements, deployed and operated in production environments.
        - Communities are enabled to design and choose one or more e-infrastructure providers that can deliver AAI services compliant with the AARC BPA, or to operate the AAI by themselves.
        - User/group information can be retrieved from distributed group management and attribute providers. This information, in combination with the affiliation provided by the user's Identity Provider, is used for authorisation purposes.
        To achieve this, several research communities were brought into the project to work closely with the AARC BPA experts in order to design and develop their own AAI. The communities that were part of AARC2 are: CORBEL, CTA, DARIAH, EISCAT_3D, EPOS, LifeWatch, HelixNebula, LIGO Scientific Collaboration (LSC) and WLCG. The e-infrastructure providers that were part of AARC2 are: EGI, EUDAT, GÉANT and PRACE. The AARC2 pilots were driven by three main use cases:
        - Research and/or e-infrastructures that need an AAI (including an IdP/SP proxy) to enable federated access to their (web and non-web) services. The AARC BPA fits these requirements; SA1 supports these communities to deploy their AAI in the most effective and interoperable way.
        - Research communities that require access to services offered by different research or e-infrastructures and wish to use their existing credentials.
        - Validating results from Joint Research Activity 1 (JRA1) and Networking Activity 3 (NA3) in a (pre-)production environment.
        The approach used by the pilot team started with elaborate interviews with the research collaborations to review the use cases and to scope and plan the pilots. This led to the 'implementation' phase, in which either the research communities themselves or representatives of the e-infrastructures, with support from the pilot team, started implementing the proposed architecture according to their use case. With the feedback from members of the community, lessons learned and the creation of manuals, SA1 closes a pilot cycle. For most of the pilots, the sustainability model is already built in, since the communities had an active role in building their own AAI with the support of the AARC2 team. During this session, we will briefly present the approach used by the AARC2 pilot team to design and implement a pilot infrastructure according to the AARC BPA. We will also give an overview of the pilots held in AARC2 and the results that came out of them. With this presentation, we hope to inspire other communities and e-infrastructure providers to ensure their infrastructures are in line with the AARC recommendations, to increase interoperability and contribute to improving research.
        Speaker: Arnout Terpstra (SURFnet)
        Slides
      • 7
        Information Security 3: Who you gonna call?
        Have you ever wondered why the security team makes so much fuss? Why can’t I just get on with my work? Why do I have to urgently patch my services? I am involved in Open Science, I don’t need security! The EGI CSIRT hears such statements all the time. During these five short talks (one per conference track) at the EGI Conference 2019 we will explain all! Our aim is to make the need for “security” clearer and to explain why the EGI CSIRT does what it does. In each of the 5 talks, we will share an amusing but instructive “War story” or two, relevant to the particular track, demonstrating the problems that can occur when security breaks down. Services may cease to be available or data may be lost or corrupted. We will follow this with details of our security controls aimed at protecting against such an event happening again. Cybersecurity attacks are an ever-growing problem and we must act to both reduce the security risk and to handle security incidents when they inevitably happen. There is no one way of preventing security incidents, but a range of security controls can help reduce their likelihood. Commercial cloud infrastructures sell compute and storage. Whether the cycles are used by the paying customer, or if the service the user runs on the infrastructure is available is of secondary interest, and the responsibility is shifted to the customer. In e-Infrastructures like EGI, we have more control over who can access the infrastructure and their allowed actions. The emphasis is to support users in making sure the infrastructure they use or the services they run are indeed used for the intended purpose. The first line of defence is policy, which states what the various parties which interact with the infrastructure can and cannot do. The choice of technology used on the infrastructure is also important, to ensure it does not have obvious security problems, can be configured to comply with security policies and is under security maintenance. It is important that any software vulnerabilities discovered are fixed by the software provider in a timely manner, and that patches are deployed appropriately. Data centres that host the distributed infrastructure should be managed and configured securely. We deploy monitoring to ensure that they, for example, are not running software known to have serious security problems. Another EGI CSIRT activity is to assess incident response capabilities, via security exercises, known as Security Service Challenges. EGI services should provide sufficient traceability of user actions as well as interfaces to their systems that offer methods needed to contain an incident, e.g. the suspension of credentials found in activities violating security policies. Fundamental to our approach is the performance of regular security risk assessments, where threats are identified and the likelihood and impact of these occurring are assessed. The results are used to identify places where additional effort is required to mitigate the important risks.
        Speaker: David Groep (NIKHEF)
        Slides
      • 8
        De-provisioning - necessity even in proxy IdP/SP architecture
        Most current AAI infrastructures are aligned with the AARC Blueprint Architecture model, whose most distinctive component is the authentication proxy. Even though the proxy solves most of the issues around registering services and enabling users to access them, there is still a significant group of services with additional access-control requirements. Through the proxy, services obtain information about a user only when the user signs in. That is not sufficient for services which need to know their users upfront, or which need to know when a user is no longer authorised to use the service so that they can de-provision that user and properly follow GDPR requirements. In this presentation, we will present possible solutions for provisioning and de-provisioning identity information which are aligned with the AARC Blueprint Architecture, and show how to use them to enhance the capabilities provided by the proxy. In addition, we will explain which of these models might be used in the EGI AAI with the EGI Check-in service.
        Speaker: Slavek Licehammer (CESNET)
        Slides
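        The talk above contrasts login-time attribute release with explicit (de-)provisioning. One widely used pattern for the latter, offered here purely as an illustration and not necessarily the solution presented, is a SCIM-style call from the AAI/proxy to a connected service when a user loses authorisation; the endpoint, token and user identifier below are hypothetical.

```python
# Hypothetical SCIM-style de-provisioning call: the AAI notifies a connected
# service that a user is no longer authorised, so the service can deactivate
# the local account and clean up personal data (GDPR). All names are placeholders.
import requests

SERVICE_SCIM_BASE = "https://service.example.org/scim/v2"  # hypothetical endpoint
PROVISIONING_TOKEN = "<bearer-token-issued-to-the-aai>"    # placeholder

def deprovision_user(user_id: str) -> None:
    """Mark the user inactive on the service via a SCIM PATCH request."""
    patch = {
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
        "Operations": [{"op": "replace", "path": "active", "value": False}],
    }
    resp = requests.patch(
        f"{SERVICE_SCIM_BASE}/Users/{user_id}",
        json=patch,
        headers={"Authorization": f"Bearer {PROVISIONING_TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()

deprovision_user("<service-local-user-id>")  # placeholder identifier
```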
      • 9
        The DARIAH AAI
        Introduction: The DARIAH research infrastructure offers the DARIAH AAI as one of the core technical services for researchers in the arts and humanities. It enables researchers to log in to various DARIAH services using either their own campus account or an account registered at the DARIAH homeless IdP. In either case, the DARIAH AAI adds information, such as group memberships specific to the DARIAH community, as well as approval of general and, optionally, service-specific terms of use, which can be used by services for authorisation decisions. Version 1 of the DARIAH AAI has been in production for multiple years and required every service to implement several details by itself, e.g. the connection to eduGAIN, the attribute query to the DARIAH Identity Provider for the additional attributes, the validation of policy attributes, and blocking and redirecting the user to the DARIAH self-service portal if any of the information was missing or out of date.
        Integration of an SP-IdP proxy based on the AARC BPA: In order to address these limitations, while being in line with the Blueprint Architecture (BPA) by AARC and therefore allowing interoperability with other infrastructures, we decided to implement the DARIAH AAI version 2 as part of an AARC2 pilot in late 2017. The scope of this pilot was twofold. Firstly, DARIAH implemented an SP-IdP proxy based on Shibboleth software, integrated the components into the production AAI and adopted all relevant AARC guidelines. Secondly, a pilot to connect the new DARIAH AAI proxy with EGI, in order to allow DARIAH researchers to use EGI services such as the VM dashboard, was agreed upon. The implementation of the proxy was completed in mid-2018 and put into the production service. Since then all DARIAH services have been moved behind the proxy. As the architecture was designed with backwards compatibility in mind, the transition did not create any major issues. Using the proxy to connect to federated AAI is now much simpler for service operators, and thus a number of additional services could be connected to the DARIAH AAI. For the second part of the pilot we successfully connected the DARIAH proxy with the development instance of EGI Check-in. This included attribute and entitlement mapping between DARIAH and EGI, as well as on-the-fly user provisioning within EGI. For this, a number of plugins to the existing EGI Check-in infrastructure were developed. In the presentation we will present both the technical implementation of the DARIAH proxy and an overview of the interoperability work with EGI from the point of view of DARIAH. Furthermore we will present our experience with the migration process and discuss future work.
        Speaker: Peter Gietz (DAASI International / DARIAH)
        Slides
      • 10
        Native OpenID Connect Implementation for OpenStack Clouds
        The EGI Federated Cloud, continuing from the grid AAI, based its initial authentication and authorization mechanisms on the use of X.509 certificates and VOMS proxies. Although these technologies made possible the initial usage and move into production of the Federated Cloud as an Infrastructure-as-a-Service cloud, they have also proven to be an obstacle for the integration of additional components, such as Platform- and Software-as-a-Service components or simply web portals. Moreover, they are perceived as a cumbersome authentication mechanism by external users who would like to adopt the EGI Federated Cloud but are not used to the X.509 and VOMS infrastructures. Nowadays, EGI.eu is transitioning its authentication and authorization infrastructure from X.509 certificates and proxies towards the use of EGI Check-in and the OpenID Connect standard. The most widely used cloud management framework in the EGI Federated Cloud is OpenStack, an open-source cloud software system whose development is community driven. The identity component of the OpenStack cloud distribution (code-named Keystone) is a REST service that leverages the Apache HTTP server and a third-party module named "mod_auth_openid" to provide OpenID Connect authentication to an OpenStack cloud. Due to the current status of these components, the OIDC standard is not fully implemented, and this makes it impossible to configure two different providers at a single resource centre for use from command-line tools. This project is currently implementing a Keystone plugin to enable OpenID Connect configurations in a standards-compliant manner, which will also make it possible to consume OAuth 2.0 tokens and make requests to the corresponding OAuth 2.0 introspection endpoints. Furthermore, it will remove the limitation of configuring only one provider at a single resource centre. The proposed presentation will show the approach taken to design and implement the plugin based on the OpenID Connect standard to work with Keystone.
        Speakers: Ms Aida Palacio (IFCA-UC), Fernando Aguilar (CSIC)
        Slides
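        The plugin described above aims to consume OAuth 2.0 tokens and query the provider's introspection endpoint. Purely as an illustration of that last step (this is not the Keystone plugin itself), the sketch below performs a standard RFC 7662 token introspection call; the endpoint URL and client credentials are assumptions/placeholders and should be taken from the provider's metadata.

```python
# Standard OAuth 2.0 token introspection (RFC 7662) -- illustrative sketch only.
import requests

INTROSPECTION_ENDPOINT = "https://aai.egi.eu/oidc/introspect"  # assumed; check the provider's metadata
CLIENT_ID = "<resource-server-client-id>"                      # placeholder
CLIENT_SECRET = "<resource-server-secret>"                     # placeholder

def introspect(access_token: str) -> dict:
    """Ask the authorization server whether a token is active and for whom."""
    resp = requests.post(
        INTROSPECTION_ENDPOINT,
        data={"token": access_token},
        auth=(CLIENT_ID, CLIENT_SECRET),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

info = introspect("<user-access-token>")
if info.get("active"):
    print("token belongs to", info.get("sub"), "with scopes:", info.get("scope"))
else:
    print("token is expired or revoked")
```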
      • 11
        INAF Remote Authentication Portal
        The new authentication and authorization system in INAF, called RAP, is capable of handling multiple accounts for web application usage. It is a joint venture between the Radio Astronomical Institute (IRA) and the Italian Astronomical Archives (IA2), developed in the scope of the SKA pre-construction phase and in use at IA2. Both working groups shared skills and experience in the field of authentication and authorization to allow users and client applications to access remote resources, data and services. An advancement and prototyping pilot is under development; it is composed of RAP, an authorization module (Grouper) and a service to allow account linking and authorization sharing using Virtual Observatory recommendations. The aim was to implement a multi-protocol authentication mechanism (SAML 2.0, OAuth2, X.509 and self-registration), to permit account linking (the joining of digital identities) and to manage groups of users. This talk will describe the current harmonization activities between existing systems and the recommendations of the IVOA. This activity will also be applied in the AENEAS-ESRC scope to validate requirements and provide an effective test bed.
        Speaker: Sara Bertocco (INAF)
        Slides
    • Jupyterhub Deployment - hands-on training Euler

      Euler

      WCW Congress Centre

      Science Park 123 1098 XG Amsterdam

      Jupyter provides a powerful environment for expressing research ideas as notebooks, where code, text and visualizations are easily combined on an interactive web frontend. JupyterHub makes it possible to deploy a multi-user service where users can store and run their own notebooks without installing anything on their computers. This is the technology behind the EGI Notebooks service and other similar Jupyter-based services for research.

      In this training we will demonstrate how to deploy a JupyterHub instance for your users on top of Kubernetes and explore some of the possible customisations that can improve the service for your users, such as integration with authentication services or with external storage systems. After this training, attendees will be able to deploy their own instance of JupyterHub at their facilities.

      Target audience: Resource Center/e-Infrastructure operators willing to provide Jupyter environment for their users.

      Pre-requisites: basic knowledge of command-line interface on Linux.

      Convener: Dr Enol Fernandez (EGI.eu)
      feedback form
      slides
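      As a taste of the customisation covered in the training, the fragment below is a minimal jupyterhub_config.py sketch that plugs an external OIDC provider (for example EGI Check-in) into JupyterHub via oauthenticator's GenericOAuthenticator. It is not the training material: trait names vary between oauthenticator releases, and the URLs, credentials and storage path are placeholders.

```python
# jupyterhub_config.py -- minimal sketch of hooking JupyterHub up to an external
# OIDC provider with oauthenticator's GenericOAuthenticator. URLs, credentials
# and trait names are illustrative and may need adjusting to your versions.
from oauthenticator.generic import GenericOAuthenticator

c.JupyterHub.authenticator_class = GenericOAuthenticator

c.GenericOAuthenticator.client_id = "<client-id>"          # placeholder
c.GenericOAuthenticator.client_secret = "<client-secret>"  # placeholder
c.GenericOAuthenticator.oauth_callback_url = "https://hub.example.org/hub/oauth_callback"
c.GenericOAuthenticator.authorize_url = "https://aai.egi.eu/oidc/authorize"  # assumed endpoint
c.GenericOAuthenticator.token_url = "https://aai.egi.eu/oidc/token"          # assumed endpoint
c.GenericOAuthenticator.userdata_url = "https://aai.egi.eu/oidc/userinfo"    # assumed endpoint
c.GenericOAuthenticator.scope = ["openid", "profile", "email"]
c.GenericOAuthenticator.username_claim = "preferred_username"  # 'username_key' in older releases

# Another common customisation: point single-user servers at an external or
# shared storage mount (path is a placeholder).
c.Spawner.notebook_dir = "/home/jovyan/work"
```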
    • Opening Plenary: Community presentations Turing

      Turing

      WCW Congress Centre

      Science Park 123 1098 XG Amsterdam
      Convener: Dr Lukasz Dutka (CYFRONET)
      • 12
        Structural biology in the clouds: Past, present and future
        Speaker: Alexandre Bonvin (eNMR/WeNMR (via Dutch NGI))
        Slides
      • 13
        BIOMED, VIP and the long tail of science: achievements and lessons learnt
        Speaker: Sorina POP (CNRS)
        Slides
      • 14
        The Datafication paradigm to face the Global Changes of our Planet
        Speaker: Stefano Nativi (CNR)
      • 15
        Advanced VIRGO offline processing and data analysis
        Speaker: Sarah Caudill (NIKHEF)
      • 16
        Panel discussion
        Speaker: Arjen van Rijn (NIKHEF)
    • Drinks and hapjes Oerknal Cafe (Universum building)

      Oerknal Cafe (Universum building)

      We invite all attendees to join us for drinks and Dutch snacks at the Oerknal Cafe (just across the road in the Universum building)

    • AAI Discussion and Roadmap Turing

      Turing

      WCW Congress Centre

      Science Park 123 1098 XG Amsterdam

      This session is a working meeting for discussing the evolution of the EGI AAI Check-in service and its roadmap based on recent technical developments, as well as related use cases and experiences in the areas of federated identity and access management.

      Convener: Mr Nicolas Liampotis (GRNET)
      slides
    • Accelerated Computing VK1,2 SURFsara

      VK1,2 SURFsara

      WCW Congress Centre

      Science Park 123 1098 XG Amsterdam
      slides
      • 17
        Accelerators in the Cloud: users and resource providers perspective.
        Speaker: Dr Alvaro Lopez Garcia (CSIC)
      • 18
        Accelerated Computing in the Cloud.
        Speaker: Viet Tran (IISAS)
      • 19
        Orchestrating containers and GPU accelerated resources
        Speakers: Alessandro Costantini (INFN), Marica Antonacci (INFN)
        Slides
      • 20
        udocker support for accelerators
        Speaker: Mario David (LIP)
        Slides
      • 21
        Panel discussion
    • Service Security Challenge 2019 - Forensics and debrief Euler

      Euler

      WCW Congress Centre

      Science Park 123 1098 XG Amsterdam

      The Security Workshop at the EGI Conference 2019 will address aspects of the recent Service Security Challenge run against the EGI infrastructure, SSC-19.03. The intended audience for the workshop includes system administrators and security contacts, as well as FedCloud users operating services connected to the internet.

      The participants will get an introduction to the basic forensic techniques needed to successfully respond to the simulated attack mounted during SSC-19.03. The attack was designed to allow the responders to find a set of artifacts by applying a range of forensic techniques of increasing complexity.

      In the hands-on session, the participants will be provided with a VM infected with the 'malware' used in SSC-19.03. They will then be guided through the methods necessary to solve the challenges built into the simulated attack.

      An additional introductory session will give an overview of the EGI CSIRT procedures and the background to the development of the technology used to run SSC-19.03.

      Convener: Dr Sven Gabriel (NIKHEF)
      • 22
        Service Security Challenge 2019: what we did and why
        Speaker: Dr Sven Gabriel (NIKHEF)
      • 23
        Intro to forensics
        Speaker: Daniel Kouril (CESNET)
        Slides
    • Coffee break
    • Jupyter Notebooks Turing

      Turing

      WCW Congress Centre

      Science Park 123 1098 XG Amsterdam

      EGI Notebooks is an environment based on Jupyter and the EGI cloud service that offers a browser-based, scalable tool for interactive data analysis. Jupyter provides users with notebooks where they can combine text, mathematics, computations and rich media output.
      This session aims at building an open community of users and providers around the Jupyter ecosystem. First, we will present the current status of the EGI Notebooks service and similar initiatives from other EGI partners. Second, user communities will have the opportunity to bring their specific requirements and needs for Jupyter-based services. Third, an open discussion will focus on defining a roadmap for the EGI Notebooks and related services to better serve the needs of the community.

      Convener: Dr Enol Fernandez (EGI.eu)
      • 24
        The EGI Notebooks experience
        Slides
      • 25
        Data analysis using Jupyter Notebooks for Open Science
        The Jupyter Notebook and the Jupyter ecosystem provide a computational and data research and exploration environment, with great potential for open science, FAIR data and the European Open Science Cloud (EOSC). Research facilities, such as those using Photons and Neutrons for imaging [1] have started to use these capabilities to support the data analysis for their users. Increasingly, the recorded data sets are so large that they cannot be 'taken home' by scientists after having visited the research facilities to record data. Remote data analysis is becoming more important: the data stays at the computing center of the facility, and analysis can be carried out, for example, through ssh and X forwarding. JupyterHub offers a technically attractive alternative for remote data analysis. As part of the PaNOSC project [1], we propose to use the Jupyter notebook to allow remote exploration and analysis of data sets on the EOSC. We hope to demonstrate this for data from Photon and Neutron facilities, but expect the design to be useful for other types of data sets as well. We assume that researchers or members of the wider public want to learn from a data set. Using the EOSC portal, they have found the data set, and now want to access it. They have the option to select a particular data analysis procedure from a list of available options that can be applied to the type of data set selected. The data analysis procedure is then provided as a Jupyter Notebook which the user can control and execute. In the simplest case, the pre-formulated data analysis procedure is all that the user is interested in and enables extraction of the meaning of the data immediately. However, there is the possibility to modify, extend and execute the notebook template interactively as usual for Jupyter Notebooks. A particular use case are reproducible publications based on a data set: for such a data set and publication, we propose to make available data processing commands (ideally in form of a notebook or commands that can be driven from a notebook) that reproduce figures and other key statements in the paper. The combination of the data with the analysis within the proposed framework makes the data set and published research immediately re-usable. One technical challenge in this proposed design is that the computational environment within which the notebooks execute needs to be preserved and made available on demand. Another technical challenge is that some of the data sets are so large (terabytes and upwards), that it will not be possible to move the data to the notebook server, but rather we need to move the notebook server to the data. In this presentation, we describe some examples of current use of Jupyter Notebooks, and describe our vision for interactive open science data analysis. This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 823852
        Speakers: Hans Fangohr (European XFEL), Robert Rosca (EuXFEL)
        Slides
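        One concrete way to realise the 'pre-formulated data analysis procedure applied to a selected data set' described above, named here as an assumption rather than as the PaNOSC implementation, is to execute a template notebook with injected parameters, for example with papermill:

```python
# Run a pre-formulated analysis notebook against a user-selected data set.
# papermill executes the template with injected parameters and writes an
# executed, shareable copy. File names and parameters are placeholders.
import papermill as pm  # pip install papermill

pm.execute_notebook(
    "analysis_template.ipynb",    # pre-formulated procedure (placeholder)
    "analysis_dataset_42.ipynb",  # executed result notebook (placeholder)
    parameters={
        "dataset_url": "https://data.example.org/datasets/42",  # placeholder
        "plot_title": "Dataset 42 overview",                    # placeholder
    },
)
```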
      • 26
        Onedata and Jupyter notebooks
        Speaker: Michal Orzechowski (CYFRONET)
      • 27
        CVMFS and Jupyter Notebooks
        Speaker: Catalin Condurache (STFC)
        Slides
      • 28
        Open discussion: what's next
    • Operations Management Board VK1,2 SURFsara

      VK1,2 SURFsara

      WCW Congress Centre

      Science Park 123 1098 XG Amsterdam

      The April meeting of the Operations Management Board. This meeting is held every one to two months, is designed to be relevant for NGIs, and covers topical issues relating to operations within the EGI Federation.

      Please join the meeting from your computer, tablet or smartphone:
      https://global.gotomeeting.com/join/992742429

      You can also dial in using your phone.
      Netherlands: +31 202 251 017
      Access Code: 992-742-429

      Convener: Matthew Viljoen (EGI.eu)
      • 29
        Introduction
        Speaker: Matthew Viljoen (EGI.eu)
        Slides
      • 30
        Security update
        Speaker: Vincent Brillault (CERN)
        Slides
      • 31
        GOCDB and its development roadmap
        Speaker: George Ryall (STFC)
        Slides
      • 32
        Cloud badging
        Speaker: Alessandro Paolini (EGI.eu)
        Slides
      • 33
        Discussion and SRM/DPM
        Speaker: Matthew Viljoen (EGI.eu)
    • Service Security Challenge 2019 - Hands-on Training Euler

      Euler

      WCW Congress Centre

      Science Park 123 1098 XG Amsterdam

      The Security Workshop at the EGI Conference 2019 will address aspects of the recent Service Security Challenge run against the EGI infrastructure, SSC-19.03. The intended audience for the workshop includes system administrators and security contacts, as well as FedCloud users operating services connected to the internet.

      The participants will get an introduction to the basic forensic techniques needed to successfully respond to the simulated attack mounted during SSC-19.03. The attack was designed to allow the responders to find a set of artifacts by applying a range of forensic techniques of increasing complexity.

      In the hands-on session, the participants will be provided with a VM infected with the 'malware' used in SSC-19.03. They will then be guided through the methods necessary to solve the challenges built into the simulated attack.

      An additional introductory session will give an overview of the EGI CSIRT procedures and the background to the development of the technology used to run SSC-19.03.

      Convener: Dr Sven Gabriel (NIKHEF)
      • 34
        Hands-On Training, solving the SSC-19
        Slides
      • 35
        Debrief: Divide and Conquer - Distributing delivery of large Security Challenge payloads
        Speaker: Jouke Roorda (NIKHEF)
    • Lunch break
    • Federated Data Management Euler

      Euler

      WCW Congress Centre

      Science Park 123 1098 XG Amsterdam

      A session around data management; this part will focus on providing updates from developers and solution providers.

      Convener: Baptiste Grenier (EGI.eu)
      • 36
        dCache in XDC and EOSC: Event-driven data placement and processing
        The latest technologies for molecular imaging at state-of-the-art photon and neutron facilities produce petabytes of data, challenging established data processing strategies. DESY develops innovative, flexible and scalable storage and compute services for collaborative scientific computing on fast-growing cloud infrastructures like the European Open Science Cloud. Covering the entire data life cycle from experiment control to long-term archival, the particular focus on re-usability of methods and results leads to an integrated approach that bundles data, functions, workflows and publications.
        On the frontend, scientists increasingly build on the popular Jupyter ecosystem to compose, run and share analysis workflows. Usually, they have little control over the underlying resources and consequently discover boundaries when working with big data. On the one hand, provisioning of Jupyter servers with limited resources diminishes the user experience, while on the other hand allocating larger environments for exclusive access quickly becomes inefficient and unfeasible. Furthermore, it remains difficult to provide dedicated setups for all possible use cases, which often require special combinations of software components. We demonstrate that a Function-as-a-Service approach to this problem leverages efficient, auto-scaling provisioning of cloud resources for scientific codes, from lambda functions to highly specialized applications. Scientists develop and deploy containerized micro-services as cloud functions, while at the same time preserving software environments, configurations and algorithm implementations. Codes that are well adopted and successfully delivering services to the scientific community automatically scale up, while less frequently used functions do not allocate idle resources, but still remain operable and accessible. Functions can be called from Jupyter Notebooks and in addition integrate as a backend service for distributed cloud computing applications.
        In the eXtreme-DataCloud (XDC) project, DESY demonstrated that event-driven function execution as a service adds a flexible building block to data life-cycle management and smart data placement strategies. The peta-scale storage system dCache provides storage events which directly feed into automation on production systems. In response to incoming files, services are invoked to immediately create derived data sets, extract metadata, and update data catalogues, monitoring and accounting systems. Enforcing machine-actionable Data Management Plans (DMPs), rule-based data management engines and file transfer systems consume storage events, e.g. to create replicas of data sets with respect to data locality and Quality of Service (QoS) for storage. With a focus on metadata and data interoperability, sequentially executed functions span pipelines from photon science to domain-specific analysis and simulation tools, e.g. in structural biology and material sciences. Well-defined interfaces allow users to combine functions from various frameworks and programming languages. Where data connectors or format converters are needed, scientists can deliver their solutions as additional micro-services and programmable interfaces.
        This presentation addresses the perspectives of both users and providers on a cloud-based, micro-service-oriented architecture and illustrates how to share codes and continuously integrate them into automated data processing pipelines as well as interactive workflows.
        Speaker: Michael Schuh (DESY)
        Slides
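        The contribution above couples storage events to automated processing and data placement. The hedged sketch below shows the generic shape of such a consumer: it assumes "file written" events arrive on a Kafka topic (dCache can publish storage events, but the broker address, topic name and message fields here are assumptions) and triggers a processing step per new file.

```python
# Generic event-driven processing loop: consume "file written" storage events
# and trigger a step per new file (metadata extraction, replication, catalogue
# update, ...). Broker, topic and message schema are assumptions.
import json

from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "storage-events",                               # assumed topic name
    bootstrap_servers=["broker.example.org:9092"],  # assumed broker
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

def process_new_file(path: str) -> None:
    """Placeholder for metadata extraction, replica creation, catalogue update."""
    print("would now process", path)

for event in consumer:
    msg = event.value
    # Only react to completed writes; the field names depend on the producer.
    if msg.get("event") == "file.closed":
        process_new_file(msg.get("path", ""))
```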
      • 37
        openRDM.swiss: a national research data management service for the Swiss scientific community
        Funding agencies, journals, and academic institutions frequently require research data to be published according to the FAIR (Findable, Accessible, Interoperable and Reusable) principles. To achieve this, every step of the research process needs to be accurately documented, and data needs to be securely stored, backed up, and annotated with sufficient metadata to make it re-usable and re-producible. The use of an integrated Electronic Lab Notebook (ELN) and Laboratory Information Management System (LIMS), with data management capabilities, can help researchers towards this goal. ETH Zürich Scientific IT Services (SIS) has developed such a platform, openBIS, for over 10 years in close collaboration with ETH scientists, to whom it is provided as a service on institutional infrastructure. openBIS is open source software that can be used by any academic and not-for-profit organization; however, the implementation of a data management platform requires dedicated IT resources and skills that some research groups and institutes do not have. In order to address this, ETH SIS has recently launched the national openRDM.swiss project. openRDM.swiss offers research data management as a service to the Swiss research community, based on the openBIS platform. The service is available either as a cloud-hosted version on the SWITCHengines infrastructure, or as a self-hosted version using local infrastructure. The cloud-hosted version, with optional JupyterHub integration for data analysis, will be available via the recently launched SWITCHhub, a national marketplace for digital solutions tailored to research. In addition, openRDM.swiss includes training activities so that researchers can successfully adopt the new service in their laboratories. Finally, the project plans to improve interoperability with other data management and publication services, in particular research data repositories including the ETH Research Collection and Zenodo.
        Speaker: Dr Alex Upton (ETH ZURICH)
        Slides
      • 38
        Onedata
        Speaker: Michal Orzechowski (CYFRONET)
        Slides
      • 39
        Evolution of data management and data access in scientific computing
        Speaker: Xavier Espinal (CERN)
        Slides
      • 40
        Update about EGI Services for Data Management
        Speaker: Baptiste Grenier (EGI.eu)
        Slides
    • Future After CREAM-CE VK1,2 SURFsara

      VK1,2 SURFsara

      WCW Congress Centre

      Science Park 123 1098 XG Amsterdam

      Following the recent announcement of the end of support for the Computing Resource Execution And Management (CREAM) software component, the EGI and CERN operations teams are running this session to help service providers migrate to alternative solutions. This session will look at ARC and HTCondorCE as well as the work being done to integrate HTCondorCE into EGI. It will provide the opportunity for anyone to discuss any concerns with the changes resulting from the end of support for CREAM.

      Convener: Julia Andreeva
      Link for remote attendance
      • 41
        Introduction and setting the scene. CREAM-CE decommissioning.
      • 42
        Plans for EGI migration readiness
        Speaker: Alessandro Paolini (EGI.eu)
        Slides
      • 43
        ARC Introduction
        Speaker: Balazs Konya (EMI project)
        Slides
      • 44
        ARC Site Experience
        Speaker: Catalin Condurache (STFC)
        Slides
      • 45
        HTCondor and HTCondor-CE Introduction (focussing on HEP)
        Speaker: Dr Brian Bockelman (University of Nebraska-Lincoln)
        Slides
      • 46
        APEL accounting with HTCondor
        Speaker: Stephen Jones (Liverpool University)
        Slides
      • 47
        HTCondor-CE Site Experience (PIC)
        Speaker: Jose Flix (IFAE)
        Slides
      • 48
        HTCondor-CE Site Experience (INFN Frascati)
        Speaker: Stefano Dal Pra (INFN)
        Slides
    • National initiatives and engagement Turing

      Turing

      WCW Congress Centre

      Science Park 123 1098 XG Amsterdam

      TBA

      Conveners: Geneviève Romier (CNRS), Dr Gergely Sipos (EGI.eu)
      • 49
        France Grilles: 9 years of engagement with user communities and the long tail of science
        The France Grilles Scientific Interest Group was created and labelled a Research Infrastructure in 2010. France Grilles (http://www.france-grilles.fr) is represented as the French NGI by CNRS, which is a founding member of the EGI consortium. France Grilles partners are the main French research institutes and the universities. The French NGI vision is: « France Grilles aims at building and operating a multidisciplinary national Distributed Computing Infrastructure open to all sciences and to developing countries. » In 2011 France Grilles set up a multidisciplinary "national VO" for the research needs of users that do not fall into the scope of a more relevant VO. It is open to users from all disciplines, provided they work in France in academic research or in partnership with a France Grilles member. This VO is accepted by almost all French sites. Primarily dedicated to the long tail of science, it gives access to shared services such as FG-DIRAC, FG-iRODS and FG-Cloud, the French cloud federation. The first part of the presentation will focus on this context, the main services, their evolution and usage, and how newcomers and user communities have been involved in the infrastructure and services. The objectives of our current and future work on the VIP scientific gateway (https://www.creatis.insa-lyon.fr/vip/) will be briefly presented. Over these nine years the France Grilles team has built extensive experience of engaging with user communities, from large communities to the long tail of science. Dissemination, training and the organisation of user community events are an important part of its work. The second part of this presentation will give an overview of the actions conducted by the team, their assessment, and the relationships we have built with other stakeholders such as HPC centres, business networks and projects.
        Speaker: Geneviève Romier (CNRS)
        Slides
      • 50
        Supporting Users of the Australian Nectar Research Cloud
        The Australian Nectar Research Cloud has developed and is operating a successful collaborative user support model across our federation, run on a lean operational budget. We operate a federated user support model that aligns with our broader service provision model of centrally standardised and coordinated services delivered as a national service through a partnership of separate organisations. Nectar Cloud user support encompasses the operation of a distributed helpdesk, development and provision of online and face to face training, continuous maintenance and development of the online user knowledge base (tier 0 support material), and user communication and engagement delivered through a number of different methods. This presentation will outline how this is achieved, the benefits of this model, the lessons we have learnt along the way and how we plan to improve the model further. Our guiding principles are to continuously improve the user’s experience, strive to improve user engagement, provide users with what they need, and ultimately continue to increase the uptake and use of our services.
        Speaker: Dr Paul Coddington (Australian Research Data Commons)
        Slides
      • 51
        Nuclear Medicine research environments: from EGI production to EOSC services
        Inserm is the leading biomedical research institute in Europe in terms of publications. Inserm's contribution to the EOSC architecture for life and medical sciences is therefore essential, first to fully cover the field and second to make sure that the EOSC service offer corresponds to the identified needs of the community. Inserm's contribution to the service catalogue is expressed through several INFRAEOSC call answers (at different levels of participation: WPs, tasks, development, dissemination): **EOSC-Life** for INFRAEOSC-04 (clusters), **EOSC-Pillar** for INFRAEOSC-05b (national services) and our current proposal, **BOSSEE**1, for INFRAEOSC-02 (new innovative services). At Inserm, we coordinate our participation in these cross initiatives by providing well-tooled case studies led by very active communities, such as the one described here. The **OpenDose**2 collaboration was established to generate an open and traceable reference database of dosimetric data for nuclear medicine. Producing data using a variety of Monte Carlo codes and analysing the results represents a massive challenge in terms of resources and services. The amount of data to generate requires running tens of thousands of simulations per dosimetric model, for a total computation time estimated at millions of CPU hours. The **EGI Infrastructure** was quickly identified as the best solution to run those simulations, because of the embarrassingly parallel nature of the computation and the availability of existing tools and workflows to run GATE on the grid. The first part of the presentation will focus on the data production process using GATE on the EGI Infrastructure. We will describe solutions developed by CRCT, Inserm CISI and CREATIS using existing tools such as **VIP** and **GateLab**. We will also present first production results and insights with regard to execution times and submission strategies. The second part of the presentation will cover ongoing work on data analysis aspects. Once in the database, raw data has to be processed into dosimetric data in a robust and reproducible way. To this aim, CRCT will build on the Jupyter ecosystem to create a unified data analysis framework for the OpenDose project. CRCT is leading a task within the BOSSEE proposal, of which EGI.eu is also a partner. The deliverable of this task will be a demonstrator available via the **EOSC hub**, contributing to the development of innovative services for the scientific community. Together with a global overview of the BOSSEE proposal, we will describe technical aspects of our work in this context, and show how this will foster user engagement and facilitate support throughout the community with the development of EOSC services. ---------- 1 BOSSEE: *Building Open Science Services on European E-infrastructure* (call INFRAEOSC-02-2019, topic: Prototyping innovative services) 2 M. Chauvin et al. *"OpenDose: A collaborative effort to produce reference dosimetric data with Monte Carlo simulation software"*. In: Physica Medica 42 (Oct. 2017), pp. 32–33. DOI: 10.1016/j.ejmp.2017.09.081.
        Speakers: Gilles Mathieu (INSERM), Isabelle PERSEIL (INSERM)
        Slides
      • 52
        IBERGRID strategy in EOSC: expanding capacity, services and user base
        IBERGRID was born out of the Iberian Common Plan for distributed infrastructures released in 2007, but its origins can be traced back to the Portuguese and Spanish participation in joint projects since 2002. Since then, IBERGRID has been federating infrastructures from Iberian research and academic organisations, mainly focused on grid, cloud computing and data processing. The IBERGRID infrastructure comprises 12 computing and data centres in Spain and Portugal. A number of replicated services guarantee integrity and resilience. The infrastructure has provided about 1 billion processing hours since 2006 to support the HEP experiments and several user communities, including 22 million hours for biomedical applications and ~6 million hours for computational chemistry. In the framework of EOSC national integration, IBERGRID is coordinating the EOSC-synergy project, which will serve as an envelope to foster the integration of different types of resources and services from the Iberian peninsula, such as data repositories and thematic services, into EOSC. In this presentation we present the Iberian strategy in EOSC and the cooperation framework we envision with other regional initiatives.
        Speakers: Dr Isabel Campos (CSIC), Joao Pina (LIP)
        Slides
      • 53
        Bulgarian National Centre for High Performance and Distributed Computing - history and perspectives
        The Bulgarian National Centre for High-performance and Distributed Computing (NCHDC) has been part of the Bulgarian National Roadmap for Research Infrastructures since 2014. The Centre is the only electronic infrastructure supported by the Roadmap that is not domain specific, and it is open to research groups from all Bulgarian universities and research institutes. Three leading partners (IICT-BAS, SU, TU-Sofia) provide resources, while the other partners participate in the development of services and applications. Historically, NCHDC integrates the consortia for supercomputing applications with the Bulgarian NGI. The NCHDC infrastructure is built using modern technologies and strives to ensure access for Bulgarian researchers to computational and storage resources, services and development tools. The main computational resource of the NCHDC is the supercomputer Avitohol, which entered the Top500 list at 331st place when it was introduced in 2015. The latest update of the National Roadmap for the period 2018-2022 includes substantial funding for maintenance of the hardware and for the development and deployment of services for the infrastructure. Each of the resource providers carries out a medium-term programme for substantial expansion of its computational and storage capacity, focusing on a different mix of applications. The heaviest use of the supercomputer comes from computational chemistry and environmental modelling, while data-intensive applications come from the domains of digital cultural heritage and astronomy. The infrastructure is integrated with European and regional-level initiatives and strives to ensure compatibility and follow technological developments. In this presentation, we discuss the current status of the NCHDC, its structure and its future perspectives based on national policies and provided (and planned) funding. The synergies with European initiatives, various national programmes and other funding sources are also presented.
        Speaker: Emanouil Atanassov (IICT-BAS)
        Slides
    • Coffee break
    • Federated Data Management Euler

      Euler

      WCW Congress Centre

      Science Park 123 1098 XG Amsterdam

      A session around data management; this part will start with presentations of requirements and exploitation by some user communities and will finish with a half-hour discussion to explore collaborations and piloting activities.

      Convener: Baptiste Grenier (EGI.eu)
      • 54
        XDC use cases
        Speakers: Alessandro Costantini (INFN), Fernando Aguilar (CSIC)
        Slides
      • 55
        JRC/Copernicus
        Speaker: Mr Guido Lemoine (EC)
        notes
        Slides
      • 56
        PaNOSC use cases
        Speaker: Jamie Hall (ILL)
        Slides
      • 57
        EISCAT-3D
        Speaker: Ingemar Haggstrom (EISCAT)
        Slides
      • 58
        Exploring collaborations and piloting activities
        Speaker: Bjorn Backeberg (EGI.eu)
    • Future After CREAM-CE VK1,2 SURFsara

      VK1,2 SURFsara

      WCW Congress Centre

      Science Park 123 1098 XG Amsterdam

      Following the recent announcement of the end of support for the Computing Resource Execution And Management (CREAM) software component, the EGI and CERN operations teams are running this session to help service providers migrate to alternative solutions. This session will look at ARC and HTCondorCE as well as the work being done to integrate HTCondorCE into EGI. It will provide the opportunity for anyone to discuss any concerns with the changes resulting from the end of support for CREAM.

      Convener: Julia Andreeva
      Link for remote attendance
      • 59
        No CE solutions. Introduction.
        Speaker: Dr Maarten Litmaath (CERN)
      • 60
        No CE. VAC & VCycle
        Speaker: Andrew McNab (MANCHESTER)
      • 61
        No CE. DODAS.
        Speaker: Daniele Spiga (INFN)
        Slides
      • 62
        Pros and cons of various CE alternatives
        Speaker: Stephen Jones (Liverpool University)
        Slides
      • 63
        SIMPLE framework (Easy deployment)
        Speaker: Mr Mayank Sharma (CERN)
        Slides
      • 64
        Dirac and No CE
        Speaker: Sorina POP (CNRS)
        Slides
      • 65
        Panel discussion
    • National Initiatives and engagement - International Liaisons' meeting

      This session is a working meeting of the NGI International Liaisons, but it is open for conference attendees as well. During the session EGI User Community Support Team and the NGI Liaisons will share and discuss recent updates and experiences in the topic of engaging with and supporting new communities.

      Conveners: Geneviève Romier (CNRS), Dr Gergely Sipos (EGI.eu)
    • Cloud Platforms Euler

      Euler

      WCW Congress Centre

      Science Park 123 1098 XG Amsterdam

      The EGI Cloud offers a multi-cloud IaaS with single sign-on that allows users to easily run their research workloads on different providers with similar capabilities. Dealing with the heterogeneity of the providers can be a daunting task that can be alleviated by using Cloud Platforms that simplify and facilitate access to the EGI Cloud by providing higher-level abstractions. In this session we will feature several of these Cloud Platforms and have an open discussion on how to enable them as fully featured EGI services.

      Convener: Ivana Krenkova (CESNET)
      • 72
        Introduction to the Cloud Workshop
        Speaker: Dr Enol Fernandez (EGI.eu)
        Slides
      • 73
        AppDB
        Speaker: Alexander Nakos (IASA)
        Slides
      • 74
        OSCAR: Serverless Computing for Data-Processing Applications in the EGI Federated Cloud
        Serverless computing, in the shape of an event-driven functions-as-a-service (FaaS) computing model, is being widely adopted for the execution of stateless functions that can be rapidly scaled in response to certain events. However, the main services offered by public cloud providers, such as AWS Lambda, Microsoft Azure Functions and Google Cloud Functions, do not fit the requirements of scientific applications, which typically involve resource-intensive data processing and longer executions than those supported by the aforementioned services. Still, scientific applications could benefit from the ability to be triggered in response to file uploads to a certain storage platform, so that the execution of multiple parallel invocations of the function/application speeds up the simultaneous processing of data while provisioning on demand the computing power required to cope with the increased workload.
        Enter OSCAR, an open-source platform to support serverless computing for data-processing applications. OSCAR supports the FaaS computing model for file-processing applications. It can be automatically deployed on multi-clouds thanks to the EC3 (Elastic Cloud Compute Cluster) and IM (Infrastructure Manager) open-source developments. OSCAR is provisioned on top of a Kubernetes cluster which is configured with a plugin created for the CLUES elasticity manager, in order to automatically provision additional nodes of the Kubernetes cluster and achieve two-level elasticity (elasticity in the number of containers and elasticity in the number of nodes). The following services are deployed inside the Kubernetes cluster: i) Minio, a high-performance distributed object storage server with an API compatible with Amazon S3; ii) OpenFaaS, a FaaS platform to create functions triggered via HTTP requests; iii) OSCAR UI, a web-based GUI aimed at end users to facilitate interaction with the platform.
        The development of OSCAR has reached prototype status (TRL6) and it has also been integrated with the following EGI services: i) EGI DataHub, in order to use Onedata as the source of events; this way, scientists can upload files to their Onedata space and this triggers invocations of the function to perform parallel processing of multiple files, with the output data automatically stored back in the same space; ii) EGI Applications on Demand, for users to self-deploy their elastic OSCAR clusters on the EGI Federated Cloud through the EC3 portal; iii) EGI Cloud Compute, to provision virtual machines, as nodes of the Kubernetes cluster, from the EGI Federated Cloud. The benefits of this platform have been assessed by integrating a use case related to plant classification using deep learning techniques that arose in the context of the DEEP Hybrid-DataCloud European project. This activity is co-funded by the EGI Strategic and Innovation Fund.
        Speakers: Mr Alfonso Pérez (Universitat Politècnica de València), Dr Germán Moltó (Universitat Politècnica de València)
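        The sketch below illustrates the general event-driven file-processing pattern described in the abstract, not OSCAR's actual function interface: the endpoint, credentials, bucket names and event fields are hypothetical placeholders, and the processing step is a stand-in for a real application.

        # Minimal sketch of event-driven file processing against an S3-compatible
        # object store such as Minio. NOT OSCAR's actual API; names are illustrative.
        import os
        import boto3

        # Minio exposes an S3-compatible API, so a standard S3 client can be used.
        s3 = boto3.client(
            "s3",
            endpoint_url=os.environ.get("MINIO_ENDPOINT", "http://minio.example.org:9000"),  # assumption
            aws_access_key_id=os.environ.get("MINIO_ACCESS_KEY", "changeme"),
            aws_secret_access_key=os.environ.get("MINIO_SECRET_KEY", "changeme"),
        )

        def handle_event(event: dict) -> None:
            """Triggered when a file lands in an input bucket; writes results to an output bucket."""
            bucket, key = event["bucket"], event["key"]           # hypothetical event fields
            local_in, local_out = "/tmp/input.dat", "/tmp/output.dat"
            s3.download_file(bucket, key, local_in)
            with open(local_in, "rb") as src, open(local_out, "wb") as dst:
                dst.write(src.read())                             # placeholder for the real data processing
            s3.upload_file(local_out, "output", key)              # parallel invocations each process one file

        Each file upload would trigger one such invocation, which is what lets many files be processed in parallel while the cluster scales the number of containers and nodes underneath.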
      • 75
        EKaaS - Elastic Kubernetes as a Service in EGI Federated Cloud
        The use of Docker containers for packaging applications is becoming extraordinarily popular. With more than 100,000 applications, Docker Hub is becoming the de-facto standard for application delivery. In recent years, Kubernetes has also become a highly convenient solution for deploying Docker containers on a multi-node backend. Kubernetes is, therefore, a suitable platform for many types of application topologies, such as microservices applications, function-as-a-service deployments, high-availability services or even high-throughput computing. Kubernetes applications can be described as services embedded into Docker containers and their configuration dependencies, with convenient tools for scaling up and down, publishing endpoints or isolating traffic. Kubernetes manages the resources where the Docker containers are deployed and run. Kubernetes installation and configuration requires some system administration knowledge, especially concerning the overlay network plugins. EKaaS aims at developing a service to deploy self-managed and customised Kubernetes clusters as a service, with additional capabilities to support specific hardware backends in the EGI Federated Cloud. To this end, the proposal will provide the user with a convenient and user-friendly interface to customise, deploy and manage the Kubernetes cluster, including the integration of software management components such as Helm for managing applications on the cluster. The clusters will be self-managed thanks to CLUES, an elasticity management service that powers physical or virtual resources on and off on demand, according to the workload. The Infrastructure Manager, an orchestration system already integrated into the EGI Federated Cloud infrastructure, will perform the deployment of the cluster. The cluster definition will be coded in TOSCA documents to maximise portability. We will integrate Helm to offer a catalogue of ready-to-deploy applications inside Kubernetes, so that a fully deployed application on a customised Kubernetes cluster can be delivered to the user. Finally, a convenient, user-friendly interface will be developed and integrated into the Applications on Demand service of EGI. A set of application examples will be developed and integrated into the catalogue to address common problems arising in scientific communities. The solution mainly addresses long-tail-of-science researchers who find it difficult to obtain resources for medium-scale processing or want to expose a processing service for a specific research problem. In such cases, there is a need for a moderate amount of computing resources and a simple way to manage them, as system administration resources and expertise are scarce. Finally, it also helps system administrators ease end-user access to the resources and reduce their misuse. This activity is co-funded by the EGI Strategic and Innovation Fund. (A hedged sketch of submitting a TOSCA-described cluster to an Infrastructure Manager-style endpoint follows this entry.)
        Speaker: Dr Ignacio Blanquer (UPVLC)
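        As a rough illustration of the deployment step, the sketch below posts a TOSCA template describing a Kubernetes cluster to an Infrastructure Manager (IM)-style REST endpoint. The service URL, endpoint path, credential line and content type are assumptions made for illustration and should be checked against the IM documentation before use.

        # Hedged sketch: submitting a TOSCA description of a Kubernetes cluster to an
        # IM-style REST endpoint. URL, path and auth format are assumptions.
        import requests

        IM_URL = "https://im.example.org:8800"                      # hypothetical IM endpoint
        AUTH = "id = ost; type = OpenStack; host = https://cloud.example.org:5000"  # placeholder credential line

        with open("k8s_cluster.tosca.yaml") as f:                   # TOSCA template: front-end + worker nodes
            tosca_template = f.read()

        resp = requests.post(
            f"{IM_URL}/infrastructures",
            data=tosca_template,
            headers={"Content-Type": "text/yaml", "Authorization": AUTH},
        )
        resp.raise_for_status()
        print("Infrastructure created:", resp.text)                 # the service returns a reference to the new infrastructure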
      • 76
        MiCADO - A highly customisable multi-cloud orchestration and auto-scaling framework
        Many scientific and commercial applications require access to computation, data or networking resources based on dynamically changing requirements. Users and providers both require these applications or services to dynamically adjust to fluctuations in demand and to serve end-users at a required quality of service (performance, reliability, security, etc.) and at optimized cost. This may require the resources of these applications or services to automatically scale up or down. The European-funded H2020 COLA (Cloud Orchestration at the Level of Application - https://project-cola.eu/) project set out to design and develop a generic framework that supports automated scalability of a large variety of applications. Learning from previous similar efforts and with the aim of reusing existing open source technologies wherever possible, COLA proposed a modular architecture called MiCADO (Microservices-based Cloud Application-level Dynamic Orchestrator - https://www.micado-scale.eu/) to provide optimized deployment and run-time orchestration for cloud applications. MiCADO is open-source (https://github.com/micado-scale) and built from well-defined building blocks implemented as microservices. This modular design supports various implementations where components can be replaced relatively easily with alternative technologies. These building blocks reside on both the MiCADO Master and the MiCADO Worker Nodes. The current implementation uses widely applied technologies, such as Kubernetes as the Container Orchestrator, Occopus as the Cloud Orchestrator, and Prometheus as the Monitoring System. When compared to similar auto-scaling frameworks, MiCADO is distinguished by its multi-cloud support, highly customisable scaling policies, policy-driven security settings, easy Ansible-based deployment, and its intuitive dashboard. The user-facing interface of MiCADO is a TOSCA-based (Topology and Orchestration Specification for Cloud Applications, an OASIS standard) Application Description Template which describes the desired container and virtual machine topology and its associated scalability and security policies. This interface has the potential to be embedded into existing GUIs, custom web interfaces or science gateways. MiCADO has been tested with several large-scale industry and research applications, and on various public (AWS, CloudSigma, MS Azure, CloudBroker) and private (OpenStack, OpenNebula) cloud resources. The two main targeted application types are cloud-based services, where scalability is achieved by scaling the number of containers and virtual machines up or down based on load, performance and cost, and the execution of a large number of jobs, where a particular experiment consisting of hundreds or thousands of jobs needs to be executed by a set deadline. The proposed presentation will give an overview of MiCADO, explain its architecture and characteristics, and demonstrate via application case studies how EGI user communities could utilise this technology. (A toy illustration of a threshold-based scaling decision follows this entry.)
        Speaker: Prof. Tamas Kiss (University of Westminster)
        Slides
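        The toy sketch below illustrates the kind of decision a load-based scaling policy encodes: keep the observed average CPU load close to a target by adjusting the replica count within bounds. It is not MiCADO's implementation; in MiCADO such policies are expressed in the TOSCA-based Application Description Template, and the function name and thresholds here are hypothetical.

        # Toy threshold-based scaling decision, for illustration only.
        def desired_replicas(current_replicas: int, avg_cpu: float,
                             target_cpu: float = 0.6, min_r: int = 1, max_r: int = 10) -> int:
            """Scale the container count proportionally to the observed average CPU load."""
            if avg_cpu <= 0:
                return min_r
            scaled = round(current_replicas * avg_cpu / target_cpu)
            return max(min_r, min(max_r, scaled))

        # Example: 4 replicas running at 90% average CPU against a 60% target -> scale out to 6.
        print(desired_replicas(4, 0.9))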
      • 77
        Open Q&A
    • Future technical challenges of data and compute intensive sciences Turing

      Turing

      WCW Congress Centre

      Science Park 123 1098 XG Amsterdam
      Convener: Volker Guelzow (DESY)
      • 78
        Data intensive agricultural sciences : requirements based on Aginfra+ Project and high throughput phenotyping infrastructure
        The H2020 e-ROSA project has defined a three-layer architecture as a federated e-infrastructure to address societal challenges of Agriculture and Food that require multi-disciplinary approaches. In that direction, the European project AGINFRA+ aims to exploit core e-infrastructures such as EGI.eu, OpenAIRE, EUDAT and D4Science to provide a sustainable channel addressing adjacent but not fully connected user communities around Agriculture and Food. In this context, a Virtual Research Environment (VRE) has been developed for the Plant Phenotyping research community. A VRE is a collaborative Web platform which provides useful components for data analysis. This VRE has been enriched with data exploration and data retrieval services in order to transparently access multiple sources of phenotyping data. These services are based on OpenSILEX-PHIS, an open-source information system designed for plant phenotyping experiments. Several instances of OpenSILEX-PHIS have been deployed for French phenotyping platforms on a national infrastructure. It is planned to deploy other instances on EGI infrastructures for the European partners (Emphasis ESFRI). OpenSILEX-PHIS interoperates with external resources via web services, thereby allowing data integration into other systems. The VRE is also equipped with a JupyterLab provisioned by EGI. JupyterLab is the next-generation web-based user interface for Project Jupyter, allowing users to work with documents and activities such as Jupyter notebooks. EGI also provided a Galaxy server, a scientific workflow system used by the plant science community to build multi-step computational analyses and facilitate data analysis persistence. This significant advance makes OpenSILEX-PHIS a representative component of the future Food cloud and provides a basis for the requirements of an e-infrastructure supporting data-intensive agri-food sciences.
        Speaker: Mr Vincent Negre (INRA)
        Slides
      • 80
        Towards a Data Integration System for European Ocean Observatories – EMSO ERIC’s Perspective
        Speaker: Ivan Rodero (EMSO ERIC)
    • Coffee break
    • Cloud: State of the Federation and Future Opportunities Euler

      Euler

      WCW Congress Centre

      Science Park 123 1098 XG Amsterdam

      This session will showcase the current status of research cloud federations, with a clear focus on the EGI Cloud providers. It will consist of short presentations from cloud sites (newcomers, long-running members and non-EGI sites) highlighting their latest developments and the pros and cons of federation. New models and opportunities will be presented, followed by an open discussion.

      Convener: Jerome Pansanel (CNRS)
      • 81
        Addressing the challenges of federation in the Nectar Research Cloud
        The Nectar Research Cloud provides a self-service OpenStack cloud for Australia’s academic researchers. Since its inception in early 2012 as a pioneering OpenStack research cloud, it has grown to over 35,000 virtual CPUs in use at any given time, with over 7,000 virtual machines being run by around 2,000 researchers and used by thousands more. It is operated as a federation across several organisations, each of which runs cloud infrastructure in one or more data centres and contributes to the operation of a distributed help desk and user support. A Nectar core services team runs the central cloud services. The Nectar cloud program has recently been merged with other Australian eResearch programs focussed on research data management and storage to create the Australian Research Data Commons (ARDC), with additional capital funding secured for the next 5 years. This presentation will give an overview of the experiences, challenges and benefits of running a federated OpenStack cloud across quite disparate organisations, and of recent efforts to incorporate other universities into the federation, including universities in New Zealand. In particular, we will discuss some of the tradeoffs of delivering a standardised cloud federation while supporting the requirements of some institutions to provide more customised services and service delivery. We will also discuss plans for the future of the Nectar Research Cloud under the ARDC, including the exploration of interoperability with international research cloud activities.
        Speaker: Dr Paul Coddington (Australian Research Data Commons)
        Slides
      • 82
        From bare metal to the cloud: the endless journey of a scientific data center
        Nowadays it is accepted that Cloud computing is a disruptive paradigm that has been rapidly adopted by industry and government sectors due to its unique features, such as reduced costs, elastic scalability and self-service provisioning. This cloud advent is not restricted to the IT industry; research and education have embraced it as well. However, the adoption of the Cloud computing model did not sprout suddenly, and it is worth looking at how datacenters have adopted and migrated to the Cloud model [1]. Starting in 2005, when paravirtualization and hardware-assisted virtualization took off, virtualization became a commonly adopted solution within modern data centers due to its widely discussed advantages over the use of traditional machines [2]. Virtualization is one of the key technologies that paved the road for the advent of Cloud computing, as virtual machines (VMs) provided the abstraction needed for Cloud resources to be provisioned. From that initial seed, Cloud computing providers are nowadays able to offer not only virtual machines, but also physical machines, virtual networks, etc. In this context, the natural evolution for a virtualized datacenter is to move towards a private cloud scenario, where the virtualized resources are no longer managed manually or with in-house tools developed by the infrastructure administrators, but are instead delegated to a Cloud Management Framework (CMF) such as OpenNebula, OpenStack or CloudStack. Taking this step forward means that the datacenter has evolved into a private cloud deployment, where system administrators, even if they are aware of the underlying infrastructure, can operate their infrastructure as if they were managing resources at any cloud provider. Moreover, managing a datacenter in a cloud-like mode opens the door to a new world of possibilities, as it makes it possible to offer the infrastructure resources to a wider public. In this session we will present how the scientific datacenter at the Instituto de Física de Cantabria, a research center in Spain, has evolved from a virtualized infrastructure into the Cloud model. We will discuss how our datacenter has evolved and the challenges and opportunities that we have faced.
        Speaker: Dr Alvaro Lopez Garcia (CSIC)
        Slides
      • 83
        Feedback on integration to the EGI Federated Cloud
        The High-Performance Computing Center[1] of the University of Lille has provided access to a Cloud Computing facility based on OpenStack for several years. Like the Cloud Computing service hosted at IN2P3-IRES[2], it is a member of FG-Cloud[3], the French NGI federated Cloud, and it plans to join the French Bioinformatics Institute Cloud[4], Biosphere. In addition, this Cloud infrastructure is currently being integrated into the EGI Federated Cloud[5]. It will be the second French site to join this federation. The first part of this talk will detail the Cloud Computing infrastructure of the University of Lille and cover the steps required to join the France Grilles Cloud. The second part will detail the integration into the EGI Federated Cloud: a genuine sharing of experience, useful for any site admin who wants to join the EGI Cloud Infrastructure. To conclude, some scientific use cases based on this Cloud will be presented. 1. http://hpc.univ-lille.fr/ 2. https://grand-est.fr 3. http://www.france-grilles.fr/catalogue-de-services/fg-cloud/ 4. https://biosphere.france-bioinformatique.fr/ 5. https://www.egi.eu/federation/egi-federated-cloud/
        Speaker: Cyrille TOULET (Université de Lille, France)
        Slides
      • 84
        CESNET status update
        Speaker: Boris Parak (CESNET)
        Slides
      • 85
        Security in a cloud environment
        Speaker: Dr David Crooks (STFC)
        Slides
      • 86
        Opportunities for EGI Service Providers through Pay-for-Use Models
        Over the last few years, there have been various implementations of pay-for-use business models, of which a handful of EGI service providers have been taking advantage. This is thanks to the work originally carried out through a pay-for-use proof-of-concept task force with broad participation from a number of NGIs. The objective of this presentation is to spread awareness of the EGI pay-for-use success stories, summarize the business models that can be used and outline how additional providers can participate. The presentation will also cover open issues and suggestions for potential solutions that we would like to explore and gather interest in, such as an automated price calculator and a better system for eliciting service offers from collected requirements.
        Speaker: Sy Holsinger (EGI.eu)
        Slides
      • 87
        Future of the federation: discussion
    • Grid Deployment Board VK1,2 SURFsara

      VK1,2 SURFsara

      WCW Congress Centre

      Science Park 123 1098 XG Amsterdam

      The Grid Deployment Board (GDB) is part of the Worldwide LHC Computing Grid Collaboration (WLCG) and acts as a forum for technical discussions and planning between the resource centres and VOs. In practice this requires background agreements with other organisations that run various parts of the infrastructure - such as grid operations, certificate authorities, organisations providing VO management, network providers, etc.
      The GDB meetings are open to all interested members of the WLCG collaboration. Other parties may be invited for specific discussions.

      Convener: Ian Collier (STFC)
      • 88
        Introduction
        Speaker: Ian Collier (STFC)
      • 89
        Benchmarking update
        Speaker: Domenico Giordano (CERN)
      • 90
        Middleware Recommendations
        Speakers: Erik Wadenstein (University of Umea), Dr Maarten Litmaath (CERN)
      • 91
        Report on Cream Migration Workshop
        Speaker: Jose Flix (IFAE)
        Slides
    • Scientific and technical updates Turing

      Turing

      WCW Congress Centre

      Science Park 123 1098 XG Amsterdam

      This session will provide technical and scientific updates including changes to infrastructure resources, improved scheduling efficiency of user analysis jobs, new technologies facilitating workflow development and deployment to cloud resources, data cataloguing activities, and scientific use cases leveraging distributed compute and data resources.

      Convener: Bjorn Backeberg (EGI.eu)
      • 92
        NGI_CZ Operational Experience
        NGI_CZ provides several types of computing resources to research communities: the national distributed infrastructure Metacentrum based on PBS servers, cloud resources connected to FedCloud, and two computing clusters registered as two sites in the GOCDB. We will cover recent changes at the prague_cesnet_lcg2 site caused by a change of location and hardware updates. There are also two critical servers hosting VOMS servers for several VOs. We have moved one of the VOMS servers to another virtualization infrastructure and changed its management procedures in order to provide mutual independence and maximum availability of the VOMS servers. We will also discuss the problems caused by the relatively frequent changes in the certificate subjects of the VOMS servers and the measures we took to avoid them.
        Speaker: Alexandr Mikula (CESNET)
        Slides
      • 93
        Boosting the CMS computing efficiency for data taking at higher luminosities
        Thousands of physicists continuously analyze data collected by the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) using the CMS Remote Analysis Builder (CRAB) and the CMS global pool to exploit the resources of the Worldwide LHC Computing Grid. Efficient use of such an extensive and expensive system is crucial, and the previous design needs to be upgraded for the next data taking at unprecedented luminosities. Supporting varied workflows while preserving efficient resource usage poses special challenges: scheduling jobs in a multicore/pilot model where several single-core jobs with an undefined runtime run inside pilot jobs with a fixed lifetime; avoiding situations where too many concurrent reads from the same storage push jobs into I/O wait mode, leaving CPU cycles idle; and monitoring user activity to detect low-efficiency workflows and automatically provide users with advice for smarter usage of the resources, tailored to the use case. In this talk we report on two novel, complementary approaches adopted in CMS to improve the scheduling efficiency of user analysis jobs: 1. automatic job splitting, 2. automatic tuning of the estimated running time. Both aim at finding an appropriate value for the scheduling runtime. With the automatic splitting mechanism, the runtime of the jobs is estimated upfront so that an appropriate scheduling runtime can be chosen. With the automatic time-tuning mechanism, the scheduling runtime is instead modified dynamically by analyzing the real runtime of jobs after they finish. We also report on how we used the flexibility of the global computing pool to tune the amount, kind, and running locations of jobs, exploiting remote access to the input data. We discuss the strategies, concepts, details, and operational experiences, highlighting the pros and cons, and we show how these efforts have helped to improve computing efficiency in CMS. (An illustrative sketch of runtime-based tuning follows this entry.)
        Speaker: Leonardo Cristella (CERN)
        Slides
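        The sketch below illustrates the general idea behind tuning the scheduling runtime from observed job runtimes; it is not CMS's actual algorithm, and the percentile, safety margin and floor values are arbitrary choices made for illustration.

        # Illustrative sketch: derive a scheduling runtime for the next batch of jobs
        # from the observed runtimes of jobs that already finished.
        import statistics

        def tuned_scheduling_runtime(observed_runtimes_min, safety_margin=1.2, floor=60):
            """Return a scheduling runtime (minutes) covering most observed jobs plus a margin."""
            if not observed_runtimes_min:
                return floor
            # quantiles(..., n=20)[18] approximates the 95th percentile of the sample
            p95 = statistics.quantiles(observed_runtimes_min, n=20)[18]
            return max(floor, int(p95 * safety_margin))

        print(tuned_scheduling_runtime([45, 50, 70, 120, 95, 60, 80]))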
      • 94
        DARE: Integrating solutions for Data-Intensive and Reproducible Science
        The DARE (Delivering Agile Research Excellence on European e-Infrastructures) project is implementing solutions to enable user-driven reproducible computations that involve complex and data-intensive methods. The project aims at providing a platform and a working environment for professionals such as Domain Experts, Computational Scientists and Research Developers. These can compose, use and validate methods that are expressed in abstract terms. DARE’s technology translates the scientists’ workflows to concrete applications that are deployed and executed on cloud resources offered by European e-infrastructures, as well as on in-house institutional platforms and commercial providers. The platform’s core services enable researchers to visualise the provenance data collected from runs of their methods for detailed diagnostics and validation. This also helps them manage long-running campaigns by reviewing results from multiple runs, and to exploit heterogeneous data effectively. To shape and evaluate the integrated solutions, the project analysed a variety of demanding use cases. Prototypes and agile co-development ensure the solutions are relevant and reveal the priority issues. Our use cases are presented by two scientific communities, in the framework of EPOS and IS-ENES, conducting research in computational seismology and climate-impact studies respectively. We will present how DARE enables users to develop and validate their methods within generic environments such as Jupyter notebooks associated with conceptual and evolving workspaces, or via the invocation of OGC WPS services. These different access modes will be integrated into the architecture of the systems already developed by the two target communities, interfacing with institutional data archives and incrementally accommodating complex computational scenarios and reusable workflows. We will show how DARE exploits computational facilities by adopting software containerisation and infrastructure orchestration technologies (Kubernetes). These are transparently managed from the DARE API, in combination with registries describing data, data sources and methods. Ultimately, the extensive adoption of workflows (dispel4py, CWL), method abstraction and containerisation allows DARE to dedicate special attention to the portability and reproducibility of the scientists’ progress in different computational contexts. This enables optimised mapping to new target-platform choices without requiring method alteration and while preserving semantics, so that methods continue to do what users expect, but faster or at lower cost. Methods can avoid unnecessary data movement and combine steps that require different computational contexts. Validation and monitoring services are implemented on top of a provenance management system that combines and captures interactive and lineage patterns. These are expressed in W3C PROV-compliant formats and represent research developers’ choices, the effects on the computational environment and the relationships between data, processes and resources when their workflows are executed. We will discuss how the patterns are modelled and implemented using Templates and Lineage services (Provenance Template Registry, S-ProvFlow), in order to integrate and interactively use context-rich provenance information. (A hedged sketch of a W3C PROV-style lineage record follows this entry.)
        Speaker: Alessandro Spinuso (KNMI)
        Slides
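        For readers unfamiliar with W3C PROV, the sketch below records a minimal lineage statement (input used, output generated, user associated) with the community `prov` Python package. The namespace and identifiers are illustrative, and this is not DARE's S-ProvFlow API.

        # Hedged sketch of a W3C PROV-style lineage record; names are illustrative.
        from prov.model import ProvDocument

        doc = ProvDocument()
        doc.add_namespace("ex", "http://example.org/dare/")

        run = doc.activity("ex:workflow-run-42")          # one execution of a user's workflow
        inp = doc.entity("ex:input-seismic-traces")       # input dataset
        out = doc.entity("ex:misfit-results")             # derived result
        user = doc.agent("ex:research-developer")

        doc.used(run, inp)                                # the run consumed the input
        doc.wasGeneratedBy(out, run)                      # ...and produced the output
        doc.wasAssociatedWith(run, user)                  # ...on behalf of the user

        print(doc.get_provn())                            # serialise as PROV-N for inspection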
      • 95
        A metadata repository (MDR) for clinical study objects
        In order to keep pace with evidence generation in medicine, it is necessary not only to have access to the results of clinical studies in publications but also to the individual participant data as well as all related study documents (e.g. study protocol, statistical analysis plan, case report form). Data and document sharing is increasingly promoted; however, the researcher is faced with a bewildering mosaic of possible source locations and access modalities. There is an urgent need to develop a central resource that can catalogue all the diverse data and documents associated with a clinical study and make that information searchable through a central web portal. In the EU H2020-funded project eXtreme DataCloud (XDC; grant agreement 777367) such a portal is currently being developed under the coordination of ECRIN-ERIC (European Clinical Research Infrastructure Network). Methods: The use case description, covering goals, requirements, data sources, targeted metadata schema, data structures, user interaction and design, is provided in the XDC project description in Confluence. The MDR portal will be integrated into the XDC infrastructure and will be based upon the following components: importing metadata from existing registries and repositories; mapping the imported metadata to the ECRIN metadata standard; pushing metadata via a RESTful API to OneData; providing discoverability of studies and data objects by INFN (Elasticsearch); and developing the GUI with OneData. Results: As a first step, the ECRIN metadata schema for clinical research, based upon DataCite, has been updated (1). So far metadata from 7 data sources have been imported (CT.gov, PubMed, WWARN, Edinburgh DataShare, BioLINCC, ZENODO, Data Dryad), covering more than 500,000 records from clinical studies. The imported data sources are stored as JSON objects and in relational DB form on the test bed server at INFN, Bologna. The metadata acquired have been mapped to the ECRIN metadata schema using standard JSON templates. Currently under way are the upload of the mapped metadata into OneData and the provision of the search functionality by INFN. Conclusions: All preparatory work for the MDR has been performed, and integration into the XDC infrastructure has been started. It is planned to have a first demonstrator with full functionality in April 2019, ready for testing. If successful, the MDR will be introduced as a data resource into the European Open Science Cloud (EOSC). (An illustrative sketch of the metadata-mapping step follows this entry.)
        Speaker: Mr Sergei Goryanin (ECRIN)
        Slides
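        The sketch below illustrates the metadata-mapping step in the abstract: a harvested registry record is mapped onto a target JSON template. The field names in both the source record and the target structure are hypothetical; they are not the actual ClinicalTrials.gov export nor the ECRIN metadata standard.

        # Illustrative metadata mapping with hypothetical field names on both sides.
        import json

        def map_to_ecrin_like(source: dict) -> dict:
            """Map a harvested registry record onto a simplified, ECRIN-inspired JSON template."""
            return {
                "study": {
                    "title": source.get("official_title"),
                    "registry_id": source.get("nct_id"),
                    "status": source.get("overall_status"),
                },
                "data_objects": [
                    {"type": obj.get("type"), "url": obj.get("url")}
                    for obj in source.get("documents", [])
                ],
            }

        record = {"official_title": "Example trial", "nct_id": "NCT00000000",
                  "overall_status": "Completed",
                  "documents": [{"type": "study protocol", "url": "https://example.org/protocol.pdf"}]}
        print(json.dumps(map_to_ecrin_like(record), indent=2))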
      • 96
        VIP, Boutiques, CARMIN and Dirac to access distributed compute and storage resources
        The Virtual Imaging Platform (VIP) is a web portal which provides access to computing and data resources. It relies on the French national DIRAC service (https://dirac.in2p3.fr/DIRAC) for job submission. The DIRAC Workload Management Service (WMS) integrates resources provisioned by grid and cloud infrastructures, but also by supercomputers, standalone computing farms or even volunteer computing systems. DIRAC also allows us to access GPU resources, which are becoming increasingly interesting for the biomed community. For the deployment of the CAD Epilepsy application, for example, we deployed Docker images on GPU resources on the EGI Cloud. DIRAC services include both Workload and Data Management tools. The Data Management System (DMS) of DIRAC provides access to different kinds of data storage systems with virtually any access protocol existing in infrastructures supporting scientific research. The File Catalog service keeps track of all the physical copies of existing files and provides means for user-defined metadata, allowing efficient selection of datasets for a specific user analysis task. VIP very recently migrated from the EGI LFC (now deprecated) to the DIRAC DMS. VIP implements the CARMIN (Common API for Research Medical Imaging Network) API (https://github.com/CARMIN-org/CARMIN-API) and shares this interface with several other compute platforms such as CBRAIN. They can be accessed in a common way to explore and launch applications, and this has greatly improved their interoperability. Furthermore, a recent CARMIN improvement makes it possible to interact with any data provider that has a usable API. This allows a CARMIN server to easily use the resources from external databases as execution inputs and to put the results directly back into them. A recent use case consisted of bringing together VIP and the Girder storage system. Girder already provides a web interface, into which we integrated a module allowing users to process data stored on Girder with a VIP application. As the CARMIN API includes a data API, it is straightforward for a CARMIN server to fetch data from another one, and VIP also plans to support iRODS infrastructures as data providers. VIP also supports Boutiques (https://github.com/boutiques/boutiques), a project which defines a format to describe an application in a JSON file and provides several interesting associated tools. Together with CARMIN, it makes it possible to share applications between servers and, as the Boutiques descriptor usually contains a link to a container, to also launch them. The presentation will give a technical overview of these VIP functionalities and present a few use cases and success stories. (A hedged sketch of a minimal Boutiques-style descriptor follows this entry.)
        Speaker: Axel Bonnet (CNRS)
        Slides
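        The sketch below writes a minimal Boutiques-style JSON descriptor for an imaginary command-line tool. The field names follow the Boutiques format as commonly documented, but the tool, its inputs and the schema version shown are illustrative and should be checked against the Boutiques schema before use.

        # Hedged sketch of a minimal Boutiques-style descriptor; tool and fields are illustrative.
        import json

        descriptor = {
            "name": "example-brain-extraction",
            "description": "Toy descriptor illustrating the Boutiques format",
            "tool-version": "1.0.0",
            "schema-version": "0.5",
            "command-line": "extract_brain [INPUT_FILE] [OUTPUT_FILE]",
            "inputs": [
                {"id": "input_file", "name": "Input image", "type": "File",
                 "value-key": "[INPUT_FILE]"},
            ],
            "output-files": [
                {"id": "output_file", "name": "Extracted brain",
                 "path-template": "[OUTPUT_FILE]"},
            ],
        }

        with open("example_descriptor.json", "w") as f:
            json.dump(descriptor, f, indent=2)   # the descriptor can then be validated with the Boutiques tooling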
      • 97
        Using federated cloud computing services and tools to support ocean search and rescue: A use case of the Lagrangian Ocean Search Targets (LOST) application
        Finding a person (or object) lost at sea is like looking for a needle in a haystack. Improving the procedures and the efficiency of search and rescue operations will help us save lives and locate valuable objects that have been lost in the ocean. There is currently a range of techniques for locating objects in the ocean, from people using their experience of the ocean to make rough estimates, to scientists using numerical model outputs to estimate the position of the object. With the idea of optimising search and rescue operations, for both people and objects, we have developed a virtual particle tracking application called LOST (Lagrangian Ocean Search Targets), which is built upon the OceanParcels framework. It has been adapted to provide real-time estimates of the positions of objects based on numerical model outputs and satellite observations. It shows the pathways and locations of virtual objects in the global ocean. The web-based LOST application allows users to enter the coordinates (longitude and latitude), select an object type (ranging from marine organisms and persons wearing life vests to capsized boats) and run a real-time simulation, providing a range of analytics supporting search and rescue operations. LOST aims to target a wide spectrum of users, ranging from local authorities to scientists to the general population. The main goal of LOST is to continuously improve the scientific integrity of the underlying science to produce analytics that are useful to users in real-life applications. A major focus is to develop a user-friendly interface that is available 24/7 and provides an easy-to-use application for non-specialists to support their operations. The datasets required to run LOST are distributed, and slow access delays the production of analytics on the fly. In this presentation we discuss the effort and challenges involved in running LOST on the EGI e-Infrastructure using Docker and Kubernetes. We put forward suggestions on how to improve the user experience in order to lower the technology threshold for future users. (A hedged sketch of the underlying particle-tracking step follows this entry.)
        Speaker: Mr Michael Hart-Davis (Nelson Mandela University, Nansen-Tutu Centre for Marine Environmental Research)
        Slides
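        The sketch below illustrates the virtual-particle-tracking idea behind LOST using the OceanParcels (parcels) Python library. The NetCDF file name, variable and dimension names, release coordinates and integration settings are hypothetical and depend on the ocean model output used; this is not LOST's actual code.

        # Hedged sketch of Lagrangian particle tracking with OceanParcels; names are hypothetical.
        from datetime import timedelta
        from parcels import FieldSet, ParticleSet, JITParticle, AdvectionRK4

        # Surface current fields from a (hypothetical) ocean model NetCDF file.
        fieldset = FieldSet.from_netcdf(
            "ocean_currents.nc",
            variables={"U": "uo", "V": "vo"},
            dimensions={"lon": "longitude", "lat": "latitude", "time": "time"},
        )

        # Release a small cluster of virtual objects around the last known position.
        pset = ParticleSet(fieldset=fieldset, pclass=JITParticle,
                           lon=[18.5, 18.6, 18.7], lat=[-34.1, -34.1, -34.2])

        # Advect them with a 4th-order Runge-Kutta scheme for 48 hours.
        pset.execute(AdvectionRK4, runtime=timedelta(hours=48), dt=timedelta(minutes=10))

        for p in pset:
            print(p.lon, p.lat)   # estimated positions after two days of drift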
    • Lunch break
    • Cloud Technical Roadmap Euler

      Euler

      WCW Congress Centre

      Science Park 123 1098 XG Amsterdam

      This session will be dedicated to defining the technical roadmap of the EGI Cloud Service. Starting with a brief status update from the main teams involved, the session will then open the discussion on how to shift the operation of integration tools from sites to a catch-all team, and will finish with a roadmapping exercise covering AAI, cloud information, Cloudkeeper and extra services.

      Convener: Mr Miroslav Ruda (Cesnet)
      • 98
        Brief status update from teams
      • 99
        Rethinking Federated Cloud Integration Tools
        The difficulty of setting up and maintaining all the required integration tools has been cited as one of the main reasons for resource providers to hesitate over joining the Federated Cloud infrastructure. Until very recently, participating cloud sites were required to set up interfaces for authentication, high-detail accounting, image management, VM management and an information system, at a minimum. This was not only a showstopper for some, but also a source of trouble for those who actually took up the challenge and joined. Recently, responsibility for most integration services (with the exception of authentication) started to shift from resource providers to the virtual organisations that actually require (or sometimes do not require) that level of service. This has several advantages. Firstly, it makes it easier for resource providers to join the federation. Secondly, it makes it possible for VOs to opt out of certain integration services, or to choose their own alternatives. Finally, with integration tools implemented so that they can be operated from outside the target site, it is easier for contracted operators, such as the EGI Ops team, to run the integration services for their customers, should they wish to do so. Some integration tools have already been adjusted to this new paradigm; some are in the process of being transformed. This talk presents two examples: a new version of Cloudkeeper, an image synchronization system, and GOAT (Go Accounting Tool), a freshly developed tool to extract accounting information from cloud sites. Both have been developed with this change of approach and responsibility in mind, both are ready for per-VO rather than per-site operation, and both are examples of the new direction for releasing and deploying VO tools in the federated cloud.
        Speaker: Zdenek Sustr (CESNET)
        Slides
      • 100
        Roadmapping
    • Grid Deployment Board VK1,2 SURFsara

      VK1,2 SURFsara

      WCW Congress Centre

      Science Park 123 1098 XG Amsterdam

      The Grid Deployment Board (GDB) is part of the Worldwide LHC Computing Grid Collaboration (WLCG) and acts as a forum for technical discussions and planning between the resource centres and VOs. In practice this requires background agreements with other organisations that run various parts of the infrastructure - such as grid operations, certificate authorities, organisations providing VO management, network providers, etc.
      The GDB meetings are open to all interested members of the WLCG collaboration. Other parties may be invited for specific discussions.

      Convener: Ian Collier (STFC)
      • 101
        Bringing services in to EOSC
        Speaker: Matthew Viljoen (EGI.eu)
        Slides
      • 102
        Privacy update
        Speaker: David Kelsey (STFC)
        Slides
      • 103
        HEPiX San Diego report
        Speaker: Helge Meinhard (CERN)
      • 104
        Security Challenge debrief
        Speaker: Dr Sven Gabriel (NIKHEF)
    • Coffee break
    • Closing plenary
      Convener: Dr Tiziana Ferrari (EGI.eu)
      • 105
        Roadmap for the EGI AAI
        Speaker: Kostas Koumantaros (GRNET)
        Slides
      • 106
        Roadmap for Engagement with research communities
        Speaker: Dr Gergely Sipos (EGI.eu)
        Slides
      • 107
        Roadmap for cloud development
        Speaker: Dr Enol Fernandez (EGI.eu)
        Slides
      • 108
        EGI contributions to the European Open Science Cloud
        Speaker: Dr Tiziana Ferrari (EGI.eu)
        Slides