EGI-Engage meetings at the Community Forum 2015

Europe/Rome
Villa Romanazzi Carducci

Via G. Capruzzi, 326 – 70124 Bari, Italy
Tiziana Ferrari (EGI.eu)
Description
This website contains information regarding the EGI-Engage face-to-face meetings to be held on 9 November 2015 in Bari, as a prelude to the EGI Community Forum 2015.

For more information on the EGI Community Forum 2015 and its programme, please visit the CF2015 website.
Exhibition Prospectus
Slides
Sponsor Prospectus
    • 09:00 – 18:00
      EGI-Engage face-to-face meetings
      • 09:00
        EGI-Engage Engagement strategies: bringing together Competence Centres, NILs, UCB and Champions 1h 30m
        This is a request for half a session on Monday for a face-to-face Engagement meeting. The content would be based on the different activities included in the action plan part of the Engagement Strategy (the most recent issue is in D2.1). One session is sufficient if NGIs can give engagement-story talks in some other session during Tue-Fri; otherwise two sessions would be needed.
        Speaker: Dr Gergely Sipos (EGI.eu)
      • 11:00
        Operations Managers Board f2f meeting 1h 30m
        The session will provide an opportunity for NGIs to share their current status and plans. The goal of the session is to show current developments within EGI at the NGI level and to identify common interests.
        Speaker: Peter Solagna (EGI Operations Manager)
      • 13:30
        EGI FedCloud F2F meeting 2h
        The EGI Federated Cloud provides access to IaaS cloud resources in a flexible environment for computing and data access, fully integrated with the EGI core services. This F2F will be an opportunity to discuss the status of the cloud infrastructure, the integration of new communities and infrastructures into the federation (WP4.3) and the status of the Engage developments (WP4.2), and to coordinate new activities around the Federated Cloud.
        Speaker: Dr Enol Fernandez (EGI.eu)
      • 16:00
        EGI-Engage e-Infrastructure commons – F2F meeting 2h
        The whole set of EGI tools composes the e-Infrastructure Commons, an ecosystem of services that constitutes the foundation layer of any distributed e-Infrastructure and is one of the three pillars of the Open Science Commons vision. The technical development of the e-Infrastructure Commons services is user-driven to satisfy the needs of scientific communities, EGI-Engage competence centres, research infrastructures, NGIs, resource providers, technology providers and European policy boards. Furthermore, interoperability with other e-Infrastructures and research infrastructures will be ensured. The EGI-Engage WP3 coordinates the development of the e-Infrastructure Commons. This F2F meeting will be an opportunity to present the status of current activities and coordinate them, discuss user requirements and debate common issues.
        Speaker: Diego Scardaci (EGI.eu/INFN)
    • 13:30 – 15:30
      EGI Marketplace

      Europa, Villa Romanazzi Carducci

      Convener: Mr Dean Flanders (SwiNG)
    • 13:30 – 15:30
      Showcasing tools and services from Research Infrastructures

      Scuderia, Villa Romanazzi Carducci

      Convener: Dr Gergely Sipos (EGI.eu)
      • 13:30
        Promoting Grids and Clouds for the Health Science and Medical Research Community in France 20m
        Life sciences in general, and medical research in particular, have increasing needs in terms of computing infrastructures, tools and techniques: research in domains such as genomics, drug design, medical imaging or e-health cannot be undertaken without the adequate computing and data solutions. Yet generalizing the use of large scale and distributed infrastructures requires time and effort, first because of the cultural shift it implies for many researchers and teams, and second because of the heterogeneity of users’ needs and requirements. INSERM, the French National Institute for Health and Medical Research, is facing such a challenge. INSERM is the largest European medical research institution with around 300 research units and more than 1000 teams spread all over the country. As such, it represents a very wide panel of disciplines and domains, but also very different levels of expertise with regards to scientific computing and associated technologies: while a few teams have been using distributed infrastructures for many years, others are only merely aware of their existence and possible benefits. To face this challenge, INSERM has launched in 2014 a Computational Science Coordination Team (CISI) within its IT department. CISI is built as a set of competence centres on major scientific computing themes and technologies (grids, clouds, HPC, big data, parallel computing, simulation...). Building on this expertise, the team aims at addressing and matching the needs of INSERM researchers with the appropriate technical solution. One of the objectives of the team is to build and support communities around these different technical areas. One of those communities will be built around the use of grids and clouds, with the help of France Grilles, the French NGI, and in collaboration with other national institutions. The presentation will first describe CISI's organisation and missions, ranging from infrastructures and usage mapping to projects, training, expertise and support. It will then explain the organisation of knowledge transfer to enlarge grid and cloud users communities at INSERM, especially within the medical imaging, bioinformatics and e-health domains. It will also present 2 practical examples of innovative research on the edge between life sciences and computing: the deployment of a new parallelisation paradigm in a medical imagery use case using EGI infrastructure through DIRAC, and a pharmacovigilance application using iRODS and academic clouds. It will finally present INSERM's vision to empower its researchers through the use of Virtual Research Environments (VREs).
        Speaker: Gilles Mathieu (INSERM)
      • 13:50
        The EPOS e-Infrastructure for solid Earth sciences: architecture and collaborative framework 20m
        Integrating data from Solid Earth Science and providing a platform for access to heterogeneous datasets and services over the whole of Europe is a challenge that the European Plate Observing System (EPOS) is tackling. EPOS will enable innovative multidisciplinary research for a better understanding of the Earth’s physical processes that control earthquakes, volcanic eruptions, ground instability and tsunamis, as well as the processes driving tectonics and Earth surface dynamics. To meet this goal, a long-term plan to facilitate integrated use of data and products, as well as access to facilities from mainly distributed existing and new research infrastructures (RIs), has been designed in the EPOS Preparatory Phase (EPOS PP). In the EPOS Implementation Phase (starting in October 2015) the plan will be implemented along several dimensions: Legal & Governance, with the creation of EPOS-ERIC and the implementation of policies for data and trans-national access; Financial, by adopting a financial plan to guarantee the long-term sustainability of the infrastructure; Technical, with the implementation of Thematic Services (TCS) in the several EPOS domains (e.g. seismology, satellite data, volcanic observatories and others) and the creation of the Integrated Core Services (ICS) platform to integrate data and services. In this presentation we will deal with the technical aspects and the synergies with e-Infrastructure providers such as EGI required to build the EPOS ICS platform. We will focus on the EPOS e-Architecture, based on the ICS integration platform and the European community-specific TCS services, and its main components: a) the metadata catalogue based on the CERIF [1] standard, used to map and manage users, software, datasets, resources, included datasets and access to facilities; b) a compatibility layer to enable interoperation among ICS and the several TCSs, which includes the usage of web services or other APIs over a distributed VRE-like environment; c) the ICS-D distributed component, to provide computational and visualization capabilities to end-users; d) the implementation of the AAI module, to enable a user to have a single sign-on to the EPOS platform and retrieve and use resources from TCSs, and the synergies with the EGI-Engage EPOS Competence Centre pilot; e) a computational Earth Science module, where a contribution by VERCE [2] is expected; f) mechanisms to provide persistent identifiers both at ICS and TCS level, and the synergies with other European projects. The building of such a complex system, which will hide from the end-user the technical and legal complexity of accessing heterogeneous data, is based on four main principles: 1. ICS-TCS co-development, 2. do not reinvent the wheel, 3. a microservices approach, 4. clear long-term technical goals but an iterative short-term approach. We will discuss, in the framework of EGI, which synergies are required with EGI and other e-Infrastructure providers, and which issues need to be tackled in the short-to-mid term in order to optimize resources at the European level and make the collaboration among ESFRIs, EGI and other relevant initiatives real and active.
        Speaker: Daniele Bailo (EGI.eu)
      • 14:10
        The SADE mini-project of the EGI DARIAH Competence Centre 20m
        The DARIAH Competence Centre (CC) aims to widen the usage of e-Infrastructures for Arts and Humanities (A&H) research. The objectives of the DARIAH CC, which will run over two years, are the following: (i) to strengthen the collaboration between DARIAH-EU and EGI using workflow-oriented application gateways and deploying A&H applications in the EGI federated cloud (EGI FedCloud); (ii) to increase the number of accessible e-Science services and applications for A&H researchers and to integrate existing NGI resources into EGI; (iii) to raise awareness among A&H researchers of the possible benefits (excellence research) of using e-Infrastructure and e-Science technologies in their research, creating conditions for a sustained increase of the user community coming from A&H and the social sciences as well; and (iv) to widen the work started within the DC-NET, INDICATE and DCH-RP projects to other A&H communities. One of the mini-projects of the DARIAH CC, led by INFN, is SADE (Storing and Accessing DARIAH contents on EGI), whose overall goal is to create a digital repository of DARIAH contents using gLibrary, a framework developed by INFN Catania to create and manage archives of digital assets (data and metadata) on local, Grid and Cloud storage resources. Datasets for SADE will be provided by the Austrian Academy of Sciences (AAS) and relate to a more than 100-year-old collection on Bavarian dialects within the Austro-Hungarian monarchy, from the beginnings of the German language to the present day. Several data types will be taken into account: text, multimedia (images, audio files, etc.), URIs as well as primary collection data, interpreted data, secondary background data and geo-data with different licensing options. The AAS datasets will be orchestrated by gLibrary and the repositories will be exposed to end-users through two channels: (i) as a (series of) portlet(s) integrated both in one of the already existing Science Gateways implemented with the Catania Science Gateway Framework and in the WS-PGRADE-based Science Gateway that will be developed by the lighthouse project of the CC, and (ii) as native apps for mobile appliances based on the Android and iOS operating systems and downloadable from the official app stores. The mobile apps will be coded using a cross-platform development environment so that other mobile operating systems could be supported, if needed. Furthermore, the apps could exploit the geo-localisation services available on smartphones and tablets to find “near” contents. In order to fulfill the SADE requirements, the gLibrary framework is currently being completely re-engineered to remove its dependence on the AMGA metadata catalogue, and in this contribution to the EGI Community Forum the new version of the platform (i.e., gLibrary 2.0) as well as the status and results of the SADE mini-project will be presented.
        Speaker: Giuseppe La Rocca (INFN Catania)
      • 14:30
        West-Life: Developing a VRE for Structural Biology 20m
        The focus of structural biology is shifting from single macromolecules produced by simpler prokaryotic organisms to the macromolecular machinery of higher organisms, including systems of central relevance for human health. Structural biologists are expert in one or more techniques. They now often need to use complementary techniques in which they are less expert. Instruct supports them in using multiple experimental techniques, and visiting multiple experimental facilities, within a single project. The Protein Data Bank is a public repository for the final structure, and journals require deposition as a precondition of publication; however, metadata is often incomplete. West-Life will pilot an infrastructure for storing and processing data that supports the growing use of combined techniques. There are some technique-specific pipelines for data analysis and structure determination, but little is available in terms of automated pipelines to handle integrated datasets, and integrated management of structural biology data from different techniques is lacking altogether. West-Life will integrate the data management facilities that already exist and enable the provision of new ones. The resulting integration will provide users with an overview of the experiments performed at the different research infrastructures visited, and links to the different data stores. It will extend existing facilities for processing this data. As processing is performed, it will automatically capture metadata reflecting the history of the project. The effort will use existing metadata standards and integrate with them new domain-specific metadata terms. This proposal will develop application-level services specific to use cases in structural biology, enabling structural biologists to benefit from the generic services developed by EUDAT and EGI.
        Speaker: Chris Morris (STFC)
      • 14:50
        BBMRI Competence Center in EGI-ENGAGE 20m
        As has been demonstrated in previous years, providing high-quality samples and data for biomedical research is one of the key challenges science is currently facing. BBMRI-ERIC, a European Research Infrastructure Consortium on Biobanking and Biomolecular Resources, strives to establish, operate, and further develop a pan-European distributed research infrastructure of high-quality biobanks and biomolecular resources. The talk will outline the BBMRI Competence Centre (BBMRI CC) of EGI-Engage, focusing on the processing of human omics data, which is a very frequent task related to biobanking and also a very sensitive one because of the protection of personal data. The goal of the BBMRI CC is to enable processing of such data inside the biobanks using a private cloud concept, which can in practice be implemented using the EGI Federated Cloud framework. Requirements from both the data processing perspective and the data protection perspective will be discussed. We will outline the work that is planned for the competence centre and discuss its relation to other relevant projects and infrastructures (e.g., BiobankCloud, BBMRI-ERIC Common Service IT), whose tools will be used in the BBMRI CC pilot for integration into practical workflows based on the EGI and EUDAT technologies.
        Speaker: Petr Holub (BBMRI-ERIC)
      • 15:10
        ENVRI_plus: Toward Joint Strategy and Harmonized Policies for Access to Research Infrastructures 20m
        Speaker: Ingrid Mann (EISCAT Scientific Association)
    • 13:30 – 15:30
      Tutorial: Introduction to the EGI Federated Cloud

      Federico II, Villa Romanazzi Carducci

      • 13:30
        Introduction to the EGI Federated Cloud – the user perspective 2h
        This tutorial is a three-hour introductory course on the EGI Federated Cloud infrastructure from the user perspective. The course will consist of short talks and hands-on exercises using the training.egi.eu VO of the EGI Federated Cloud. During the tutorial, attendees can learn the basic concepts of cloud computing and cloud federations, and gain experience in interacting with the EGI Federated Cloud infrastructure at the IaaS layer through its rOCCI command line interface. The course primarily targets developers of high-level cloud environments (PaaS and SaaS) and scientific applications who, after this course, would be able to integrate their systems with the EGI IaaS solution. The EGI Federated Cloud is a standards-based, open cloud system, together with its enabling technologies, that federates institutional clouds to offer a scalable computing platform for data- and/or compute-driven applications and services. The EGI Federated Cloud is already deployed at more than 20 academic institutes across Europe, which together offer 6000 CPU cores and 300 TB of storage for researchers in academia and industry. This capacity is available free at the point of access through IaaS, PaaS and SaaS capabilities and interfaces that are tuned towards the needs of users in research and education. The technologies that enable the cloud federation are developed and maintained by the EGI community, and are based on open standards and open source Cloud Management Frameworks. The outline of the course (2 x 90 minutes) is: introduction to clouds, cloud federations and the EGI Federated Cloud; application porting best practices and examples; introduction to the training infrastructure and first exercises; exercise on compute and storage management; introduction to contextualisation; exercise on contextualised compute instances; creating your own Virtual Machine image; next steps; how to become a user. The content will be based on similar tutorials that have been delivered to international audiences in the UK and NL. An illustrative rOCCI command sketch follows this entry.
        Speakers: Diego Scardaci (EGI.eu/INFN), Dr Enol Fernandez (EGI.eu), Dr Gergely Sipos (EGI.eu)
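        A minimal sketch of the kind of rOCCI interaction covered in the hands-on part, written as a small Python wrapper purely for illustration. It assumes the occi client is installed and a VOMS proxy for the training VO already exists; the endpoint URL and template identifiers are placeholders, not real site values.

        import os
        import subprocess

        # Placeholder endpoint: replace with a real EGI Federated Cloud site endpoint.
        ENDPOINT = "https://fedcloud-site.example.org:11443"
        # Conventional location of the VOMS proxy, unless X509_USER_PROXY is set.
        PROXY = os.environ.get("X509_USER_PROXY", "/tmp/x509up_u%d" % os.getuid())

        def occi(*args):
            """Invoke the rOCCI CLI with X.509/VOMS authentication and return its output."""
            cmd = ["occi", "--endpoint", ENDPOINT,
                   "--auth", "x509", "--user-cred", PROXY, "--voms"] + list(args)
            return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

        # Discover the OS and resource (flavour) templates offered by the site.
        print(occi("--action", "list", "--resource", "os_tpl"))
        print(occi("--action", "list", "--resource", "resource_tpl"))

        # Instantiate a VM from chosen templates (mixin names below are placeholders).
        print(occi("--action", "create", "--resource", "compute",
                   "--mixin", "os_tpl#image_id_placeholder",
                   "--mixin", "resource_tpl#small",
                   "--attribute", "occi.core.title=training-vm"))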
    • 15:30 – 16:00
      Coffee break
    • 16:00 – 18:00
      Innovating with SMEs and Industry

      Europa, Villa Romanazzi Carducci

      Convener: Sy Holsinger (EGI.eu)
    • 16:00 – 18:00
      Showcasing tools and services from Research Infrastructures

      Scuderia, Villa Romanazzi Carducci

      Convener: Dr Gergely Sipos (EGI.eu)
      • 16:00
        A FreshWater VRE for LifeWatch 20m
        The different components and a workflow platform to support a FreshWater VRE for the LifeWatch ESFRI will be presented. The VRE is based on cloud resources to support the processing of data from different sources of information. A detailed analysis of the components required to monitor and model a water body (like a lake) will be presented, together with an overview of different related initiatives that can be integrated under this framework and of new challenges that need to be addressed.
        Speakers: Fernando Aguilar (CSIC), Jesus Marco de Lucas (CSIC)
      • 16:20
        DARIAH requirements and roadmap in EGI 20m
        DARIAH, the Digital Research Infrastructure for the Arts and Humanities, is a large user community that gathers scientists across Europe from the research field of the Arts and Humanities (A&H). The aim of DARIAH is to enhance and support digitally-enabled research and teaching across the Arts and Humanities in Europe. The objective of DARIAH is to develop, maintain and operate a research infrastructure for ICT-based research practices. The DARIAH infrastructure aims to become a fully connected and effective network of tools, information, people and methodologies for investigating, exploring and supporting research across the broad spectrum of the Digital Humanities. To achieve this goal, a significant amount of effort has to be devoted to the improvement of the current infrastructure. A part of this effort is the EGI-DARIAH Competence Centre (EGI-DARIAH CC), established within the EGI-Engage Horizon 2020 project. The EGI-DARIAH CC aims at bridging the gap between the DARIAH user community and the European e-Infrastructures, mainly those provided by the EGI community. To achieve this goal, the EGI-DARIAH CC focuses on strengthening the collaboration between the DARIAH user community and EGI by deploying A&H applications in the EGI Federated Cloud and increasing the number of e-Science services and applications, as well as raising awareness among A&H researchers of the advantages and benefits of e-Infrastructure by providing end-user support and organizing training events. Considering that the DARIAH community, as well as the general A&H research public, is very specific in its requirements and needs with respect to e-Infrastructure, one of the first actions of the EGI-DARIAH CC was to collect all relevant information about the DARIAH research requirements. The collection of the required information was conducted via a comprehensive web-based survey. The aim of this survey was to collect feedback from DARIAH end-users, application/service providers and developers on their knowledge and background on e-Infrastructure (e.g. computational and storage resources, user-support services, authentication policies, etc.), on how research data (information) are shared and accessed, on AAI requirements, on what services and applications researchers are using in their research and what their characteristics are, etc. Based on these inputs, a set of specific A&H services and applications will be developed, such as the gUSE/WS-PGRADE workflow-oriented gateway, the gLibrary framework for distributed information repositories and an information retrieval service based on CDSTAR. Concurrently with the application development, a significant effort is put into the education of DARIAH researchers, since many of them have little or no technical knowledge required to efficiently use various e-Infrastructure resources or the new applications and services that will be developed during this project. Therefore, a set of training events will be organized to demonstrate the specific applications and services developed within the EGI-DARIAH CC, as well as to give a general introduction on how to utilize various EGI resources, applications and services.
        Speaker: Davor Davidovic (RBI)
      • 16:40
        Progress of EISCAT_3D Competence Center 20m
        The design of the next-generation incoherent scatter radar system, EISCAT_3D, opens up opportunities for physicists to explore many new research fields. On the other hand, it also introduces significant challenges in handling large-scale experimental data, which will be generated at great speed and volume. This challenge is typically referred to as a big data problem and requires solutions beyond the capabilities of conventional database technologies. The first objective of the project is to build a common e-Infrastructure to meet the requirements of a big scientific data system such as the EISCAT_3D data system. The work on the design specification has looked at a number of aspects such as: priority functional components; data searching and discovery; data access; data visualisation; and data storage. Different technologies are used at the different stages of the portal, such as dCache, iRODS, OpenSearch, Liferay and different forms of identifiers. We will present the ones chosen and why they suit better the operations and data of an environmental facility like EISCAT_3D. The design specification has been presented to the EISCAT community and the feedback has been incorporated into the portal development environment.
        Speaker: Ingemar Haggstrom (EISCAT)
      • 17:00
        Tsunami Wave Propagation Forward and Inverse Simulation and Scientific Gateway Building for Disaster Mitigation 20m
        The Manila trench and the Ryukyu trench are two hazardous subduction zones which might cause a disastrous tsunami in South East Asian countries if a megathrust earthquake occurs at either of the two trenches. The EGI-Engage Disaster Mitigation Competence Centre aims to develop novel approaches to real-time tsunami simulation over the Grid and Cloud, using COMCOT-based fast forward tsunami wave propagation simulation. Integration with rapid and correct rupture process solutions, to make the tsunami simulation as accurate as possible, is the first goal. In collaboration with tsunami scientists, the workflow and computing model have been defined according to the case studies conducted by the user communities, and the iCOMCOT web-based application portal has been implemented. iCOMCOT is an efficient and low-cost fast tsunami calculation system for early warning, built on an optimized and parallelized COMCOT in order to meet the requirements of real-time simulation. Based on the high-performance COMCOT simulation provided by iCOMCOT, a tsunami inverse simulation has also been developed to identify the most likely historical tsunami sources according to the evidence at hand. Cases around Taiwan and the Philippine Sea Plate region were studied, supporting the analysis of potential tsunami sources. Based on the e-Science paradigm and big data analytics capabilities, the goal of answering open questions such as “which fault, in what rupture process, could cause an over 1-meter wave height and 50-meter inland inundation in Taiwan” could also be achieved in the future.
        Speaker: Eric Yen (AS)
      • 17:20
        User needs, tools and common policies from PARTHENOS, an e-research network in the field of linguistic studies, humanities and cultural heritage 20m
        PARTHENOS (Pooling Activities, Resources and Tools for Heritage E-research Networking, Optimization and Synergies) is a European project funded within Horizon 2020, the EU Framework Programme for Research and Innovation. The project started in May 2015 and has a duration of 48 months. PARTHENOS aims at strengthening the cohesion of research in the broad sector of Linguistic Studies, Humanities, Cultural Heritage, History, Archaeology and related fields through a thematic cluster of European Research Infrastructures, integrating initiatives, e-infrastructures and other world-class infrastructures, and building bridges between different, although tightly interrelated, fields. The project will achieve this objective through the definition and support of common standards, the coordination of joint activities, the harmonization of policy definition and implementation, and the development of pooled services and of shared solutions to the same problems. PARTHENOS will address and provide common solutions to the definition and implementation of joint policies and solutions for the humanities and linguistic data lifecycle, taking into account the specific needs of the sector, which require dedicated design, including provisions for cross-discipline data use and re-use; the implementation of common AAA (authentication, authorization, access) and data curation policies, including long-term preservation; quality criteria and data approval/certification; IPR management, also addressing sensitive data and privacy issues; foresight studies about innovative methods for the humanities; standardization and interoperability; common tools for data-oriented services such as resource discovery, search services, quality assessment of metadata, and annotation of sources; communication activities; and joint training activities. Built around the two ERICs of the sector, DARIAH and CLARIN, and involving all the relevant Integrating Activities projects, PARTHENOS will deliver guidelines, standards, methods, services and tools to be used by its partners and by the whole research community.
        Speaker: Sara Di Giorgio (Central Institute for the Union Catalogue of Italian Libraries)
    • 16:00 – 18:00
      Tutorial: DIRAC service

      Sala D, Giulia Centre, Villa Romanazzi Carducci

      Convener: Andrei Tsaregorodtsev (CNRS)
      • 16:00
        DIRAC service tutorial 2h
        Many large and small scientific communities are using increasingly intensive computations to reach their goals. Various computing resources can be exploited by these communities, making it difficult to adapt their applications to different computing infrastructures. Therefore, they need tools for the seamless aggregation of different computing and storage resources in a single coherent system. The DIRAC project develops and promotes software for building distributed computing systems. Both workload and data management tools are provided, as well as support for high-level workflows and massive data operations. Services based on the DIRAC interware are now provided by several national grid infrastructure projects, and the DIRAC4EGI service is operated by the EGI project itself. The latter service is already used by multiple user communities and more communities are evaluating it for their future work. The proposed tutorial focuses on the use of DIRAC services and aims at letting the participants learn how to start using the system and perform the basic tasks of submitting jobs and manipulating data. More advanced examples will be given with the help of the Web Portal of the DIRAC4EGI service. The tutorial will also show examples of how new computing and storage resources can be connected to the system, including Cloud resources from the EGI Federated Cloud. Configuring the system for use by multiple communities with particular usage policies will be explained. As a result, the participants will have a clear idea about the service functionality, interfaces and extensions. An illustrative job-submission sketch follows this entry.
        Speaker: Dr Andrei Tsaregorodtsev (CNRS)
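        As an illustration of the basic job submission covered in the tutorial, the sketch below uses the standard DIRAC Python API; it assumes a configured DIRAC client installation and a valid proxy for a VO supported by the DIRAC4EGI service.

        # Minimal sketch of DIRAC job submission (assumes DIRAC client + valid proxy).
        from DIRAC.Core.Base import Script
        Script.parseCommandLine()  # initialise the DIRAC client environment

        from DIRAC.Interfaces.API.Dirac import Dirac
        from DIRAC.Interfaces.API.Job import Job

        job = Job()
        job.setName("hello-dirac4egi")
        job.setExecutable("/bin/echo", arguments="Hello from the DIRAC4EGI tutorial")
        job.setCPUTime(300)

        result = Dirac().submitJob(job)
        print(result)  # an S_OK/S_ERROR dictionary; on success it carries the job ID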
    • 16:00 – 18:00
      Tutorial: Dos and Don'ts for Virtual Appliance Preparation

      Federico II, Villa Romanazzi Carducci

      Convener: Boris Parak (CESNET)
      • 16:00
        Dos and Don'ts for Virtual Appliance Preparation -- Hands-on Tutorial 2h
        With the introduction of the EGI Federated Cloud, individual users and user communities can now prepare, upload, and launch their own appliances as virtual machines in the EGI environment. This brings new possibilities, but it also places a considerable burden on users preparing such appliances. This tutorial will discuss and demonstrate (hands-on) the basic dos and don'ts of appliance preparation, focusing on the following topics: 1) operating systems (Linux-based); 2) disk image formats; 3) appliance portability; 4) contextualization; 5) security; 6) automation and provisioning; 7) the EGI Application Database. Attendees are encouraged to bring up real-world problems and experiences for discussion. For the hands-on parts, attendees are expected to have their own laptops with pre-installed VirtualBox [1] and Packer [2] ready. An illustrative Packer-based sketch follows this entry. [1] https://www.virtualbox.org/wiki/Downloads [2] https://packer.io/downloads.html
        Speaker: Boris Parak (CESNET)
        Packer Installer
        VirtualBox Installer
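        As a flavour of the automation and provisioning topics, the sketch below generates and validates a minimal Packer template for the VirtualBox builder from Python. The ISO URL, checksum and credentials are placeholders; real appliances for the EGI Application Database will need site- and OS-specific settings.

        import json
        import subprocess

        # Hypothetical minimal template: virtualbox-iso builder + shell provisioner.
        template = {
            "builders": [{
                "type": "virtualbox-iso",
                "iso_url": "https://example.org/images/os-install.iso",  # placeholder
                "iso_checksum": "0" * 64,                                # placeholder
                "iso_checksum_type": "sha256",
                "ssh_username": "builder",                               # placeholder
                "ssh_password": "builder",                               # placeholder
                "shutdown_command": "echo builder | sudo -S shutdown -P now",
                "format": "ova"
            }],
            "provisioners": [{
                "type": "shell",
                "inline": ["sudo yum -y install cloud-init"]  # contextualisation support
            }]
        }

        with open("appliance.json", "w") as fh:
            json.dump(template, fh, indent=2)

        subprocess.run(["packer", "validate", "appliance.json"], check=True)
        # subprocess.run(["packer", "build", "appliance.json"], check=True)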
    • 09:00 – 10:30
      Disaster mitigation Competence Centre

      Sala D, Giulia Centre, Villa Romanazzi Carducci

      Convener: Eric Yen (AS)
    • 09:00 – 10:30
      EGI-EUDAT interoperability use cases

      Europa, Villa Romanazzi Carducci

      Conveners: Dejan Vitlacil (KTH), Giuseppe Fiameni (CINECA - Consorzio Interuniversitario)
    • 09:00 – 10:30
      Research Infrastructures in Horizon 2020

      Scuderia, Villa Romanazzi Carducci

      The aim of the H2020 session at the EGI Community Forum is to introduce the services of the RICH Consortium and the work of the National Contact Points, and to introduce the new calls in the Horizon 2020 Work Programme for European research infrastructures (including e-Infrastructures) for the years 2016-2017. Beneficiaries (users of virtual access and coordinators of the first e-Infrastructure calls in H2020) will provide participants with useful tips (preparing proposals, creating a consortium, the realization phase of the project, etc.).

      Convener: Daniela Mercurio
      • 09:00
        Introduction of RICH 15m
        Research infrastructures (RI) are used by research communities but, where relevant, they also provide services beyond research, such as education and public services. Research infrastructures play a key role in forming research communities and offer them the advancement of knowledge and technology and their exploitation. The European Union is supporting research infrastructures via the Horizon 2020 programme, which puts an emphasis on the long-term sustainability of RIs, their expanding role and impact in the innovation chain, widening participation and integrating activities. National Contact Points (NCPs) provide professional support and complex services to national research teams, research organizations and industrial enterprises to facilitate and support their integration into H2020. By spreading awareness, giving specialist advice, and providing on-the-ground guidance, NCPs ensure that the RI programme becomes known and readily accessible to all potential applicants, irrespective of sector or discipline. To enhance cooperation and networking between these national entities and to achieve a higher quality of services, NCPs for RI are involved in the H2020 project RICH (Research Infrastructures Consortium for Horizon 2020). RICH 2020, the European Network of National Contact Points (NCPs) for Research Infrastructures in Horizon 2020, facilitates transnational cooperation between NCPs, promotes the effective implementation of the RI programme, supports transnational and virtual access to RIs and highlights the opportunities offered by Research Infrastructures at the European and international level. The aim of the H2020 session at the EGI Community Forum is to introduce the services of the RICH Consortium and the work of the National Contact Points, and to introduce the new calls in the Horizon 2020 Work Programme for European research infrastructures (including e-Infrastructures) for the years 2016-2017. Experienced NCPs in the area of RI will lead the session as speakers and project beneficiaries will be invited. The main speakers will introduce the new RI calls from Work Programme 2016-17, their topics, aims and conditions. Beneficiaries (users of virtual access and coordinators of the first e-Infrastructure calls in H2020) will provide participants with useful tips (preparing proposals, creating a consortium, the realization phase of the project, etc.). The timing is ideal, as the new calls from Work Programme 2016-17 will be introduced in the autumn and, at the same time, coordinators of the first e-Infrastructure calls will have enough useful tips and information to share best practices. This session will be helpful for all EGI participants, as the funding opportunities for research infrastructures in Horizon 2020 are of interest to infrastructure users and providers, tool developers, and research and scientific communities. The H2020 session will be of great value to the EGI Community Forum.
        Speaker: Mrs Daniela Mercurio (APRE - Agenzia per la Promozione della Ricerca Europea)
      • 09:15
        RI Work Programme 2016-17 30m
      • 09:45
        Experience of beneficiaries in RI Calls 30m
      • 10:15
        Discussion 15m
    • 09:00 – 10:30
      Tutorial: HAPPI toolkit

      Federico II, Villa Romanazzi Carducci

      Convener: Luigi Briguglio (Engineering Ingegneria Informatica S.p.A.)
      • 09:00
        Tracking Dataset Transformations with HAPPI Toolkit 1h 30m
        Results of the research community are based on three main pillars: models of phenomena, datasets gathered from missions and campaigns, and the validation and refinement of models based on those datasets. From its acquisition and throughout the whole life cycle of the research processes, a dataset undergoes many transformations (e.g. capture, migration, change of custody, aggregation, processing, extraction, ingestion) in order to be suitably processed, analysed, exchanged with different researchers and (re-)used. Consequently, the trustworthiness of the results, and of the research community itself, relies on tracking dataset transformations within the whole life cycle of the research processes. Tracking dataset transformations becomes even more important whenever a dataset has to be handled by research communities from different domains and/or the research processes span a long interval of time. The Open Archival Information System (OAIS - ISO 14721:2012) [1] identifies as “provenance information” the type of metadata in which to store and track the changes undergone by a generic digital object since its creation. Provenance is part of the so-called OAIS Preservation Description Information, the metadata used to preserve a digital object in a long-term digital archive, which includes i) reference information (the persistent identifier assigned to the digital object); ii) provenance information; iii) context information (relationships to other digital objects); iv) fixity information (information used to ensure that the digital object has not been altered in an uncontrolled manner); and v) rights information (the roles permitted to access and transform the digital object). The HAPPI Toolkit [2], part of the Data Preservation e-Infrastructure produced by the SCIDIP-ES project [3], traces and documents dataset transformations by adopting the Open Provenance Model, a simple information model based on three basic entities (i.e. controller agent, transformation, digital object) that improves interoperability and the capability to exchange information among different digital archives and/or research communities. Moreover, the HAPPI Toolkit generates for each transformation a record (called an Evidence Record) that includes reference information and integrity information. The collection of records representing the history of all the dataset transformations is called the Evidence History; this information is managed by the HAPPI Toolkit and provides data managers with evidence used during the assessment of the integrity and authenticity of the dataset. Since July 2014, the HAPPI Toolkit has been running on the EGI FedCloud. The tutorial aims to present how the HAPPI Toolkit works, and specifically: how it is configured, how it creates the evidence of dataset transformations, and how users can access the evidence and dataset information. A hypothetical sketch of such an evidence record follows this entry.
        Speaker: Mr Luigi Briguglio (Engineering Ingegneria Informatica S.p.A.)
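        The abstract does not specify the exact layout of an Evidence Record; the sketch below is a purely hypothetical illustration of a record combining reference and integrity information, following the OAIS terminology used above.

        from dataclasses import dataclass, field
        from datetime import datetime, timezone
        from typing import List

        @dataclass
        class EvidenceRecord:
            """Hypothetical evidence record for one dataset transformation."""
            dataset_pid: str       # reference information (persistent identifier)
            transformation: str    # e.g. "migration", "aggregation", "ingestion"
            controller_agent: str  # who or what performed the transformation
            sha256_after: str      # integrity (fixity) information after the step
            timestamp: str = field(
                default_factory=lambda: datetime.now(timezone.utc).isoformat())

        # The Evidence History is simply the ordered collection of such records.
        evidence_history: List[EvidenceRecord] = [
            EvidenceRecord("doi:10.0000/example", "ingestion", "happi-toolkit", "e3b0c442..."),
        ]
        print(evidence_history[0])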
    • 10:30 – 11:00
      Coffee break
    • 11:00 – 12:30
      EGI LifeWatch Competence Centre workshop

      Sala D, Giulia Centre, Villa Romanazzi Carducci

      Convener: Jesus Marco de Lucas (CSIC)
    • 11:00 – 12:30
      Federated accelerated computing

      Scuderia, Villa Romanazzi Carducci

      Accelerated computing systems deliver energy-efficient and powerful HPC capabilities. Many EGI sites provide accelerated computing technologies, such as GPGPUs or MIC co-processors, to enable high-performance processing. Currently these accelerated capabilities are not directly supported by the EGI platforms. To use the co-processor capabilities available at resource centre level, users must interact directly with the local provider to find out which types of resources and software libraries are available and which submission queues must be used to submit accelerated computing tasks.
      The session follows the one held in Lisbon in May, and will discuss the progress on the roadmap to achieve the federation of GPGPU and MIC co-processor capabilities across the EGI HTC and Cloud platforms.
      Service providers as well as user communities interested in the use of accelerated computing facilities across Europe are invited to participate and bring their requirements.

      Conveners: Dr Marco Verlato (INFN), Dr Viet Tran (UI SAV)
      • 11:00
        Latest progress on the AC activity in EGI-Engage 20m
        This presentation will discuss the progress on the roadmap to achieve the federation of GPGPU and MIC co-processor capabilities across the EGI HTC and Cloud platforms, in the context of the EGI-Engage JRA2.4 task.
        Speakers: Dr Marco Verlato (INFN), Dr Viet Tran (UI SAV)
      • 11:20
        GPGPU-enabled molecular dynamics of proteins on a distributed computing environment 20m
        As part of the activities of the MoBrain Competence Center within the EGI-Engage project, we have implemented services for the use of molecular dynamics (MD) simulations on biological macromolecules (proteins, nucleic acids) based on the AMBER suite and taking advantage of GPGPU architectures within a grid computational infrastructure. The rationale for this development is to improve upon the tools already provided within the WeNMR gateway [1], which allow for MD-based refinement of macromolecular structures derived from NMR spectroscopy data as well as for unrestrained (i.e. without experimental data) MD simulations. These services are available via the AMPS-NMR portal, which has been using the EGI computational infrastructure for almost five years [2]. The portal allows a large range of users that are not computer-savvy to apply successfully state-of-the-art MD methods through completely guided protocols. The current protocols are only designed to work with CPUs. Transitioning to GPGPU services would result in a significant reduction of the wall time needed for calculations, thereby enabling a higher throughput of the portal. Alternatively, one could run simulations for larger, more complex molecular systems or could sample molecular motions more extensively in order to obtain information on various biologically relevant time scales. For the above reasons, we thus decided to extend the capabilities of the AMPS-NMR portal so that it would provide access to both CPU and GPGPU resources, depending on the requirements of the specific calculation requested by the user and taking into account also resource availability. To achieve this it is necessary to modify the submission pipeline that underlies the portal as well as to implement different versions of the AMBER suite. Some changes to the software code were necessary in order to achieve the best treatment of NMR-based experimental restraints during MD simulations. A further hurdle was the lack of an approach generally agreed upon to expose GPGPU resources on the EGI grid environment. To address this middleware limitation, we endeavored to contribute to testing the implementation of different queuing systems within EMI. In this contribution, we show the initial results of the above work. We also demonstrate an example biological application that is only possible on GPGPU systems. [1] Wassenaar TA, et al. WeNMR: Structural biology on the Grid. J. Grid. Computing 10:743-767, 2012 [2] Bertini I, Case DA, Ferella L, Giachetti A, Rosato A. A Grid-enabled web portal for NMR structure refinement with AMBER. Bioinformatics. 27:2384-2390, 2011
        Speakers: Andrea Giachetti (CIRMMP), Antonio Rosato (CIRMMP)
      • 11:40
        Bioinformatics simulations by means of running virtual machines as grid jobs in the MolDynGrid Virtual Laboratory 20m
        A number of software packages for bioinformatics simulations are used in the MolDynGrid virtual laboratory (VL) [1] for in silico calculations of molecular dynamics, including GROMACS, NAMD, Autodock, etc. [2-3]. Computational resources for such simulations are provided by a number of HPC clusters that are mostly part of the Ukrainian National Grid (UNG) infrastructure powered by NorduGrid ARC [4], and a few clusters of the European Grid Infrastructure. In a heterogeneous grid environment, ensuring that every resource provider has the required build of software, a particular version of software with its dependencies, and moreover handling software updates, is a non-trivial task. When the number of software packages and build flavours grows, as in the MolDynGrid case, software management across dozens of clusters becomes almost impossible. The classical approaches to software maintenance include building software on the fly within the grid-job execution cycle and relying on a VO-managed common filesystem like CVMFS with pre-built software. Both approaches work well when resource provider environments are similar, but in the case of completely heterogeneous hardware and software, including different OS distributions, software builds must be handled for every such platform. To efficiently handle software in such environments for MolDynGrid research, another approach has been introduced: running hardware-accelerated virtual machines (VMs) as grid jobs. This approach eliminates the necessity to build software on every resource provider and introduces a single point for software updates: software needs to be built for one virtual platform only. Moreover, this approach also allows the use of software for Windows. Since adding a virtualization layer reduces performance, the first thing analyzed was the size of this reduction. On the UA-IMBG cluster, molecular dynamics in GROMACS for the same biological object was computed on the same hardware with and without virtualization. The software environment was cloned from the host to the guest VM. GROMACS was chosen as the main software used by the MolDynGrid VL in terms of CPU time consumption. To run VMs as grid jobs on grid-site worker nodes, several helpers running with root privileges are needed to set up the virtual hardware and transfer job data to the VM. The framework of components that supports the VM execution cycle as a grid job was originally developed as part of the Ukrainian Medgrid VO project [5] and is called Rainbow (ARC in the Cloud) [6]. Rainbow started by providing interactive access to Windows VMs running on UNG resources for the analysis of medical data stored in the grid for telemedicine [7]. For the MolDynGrid VL, several components have been added to the Rainbow framework that implement data staging to VMs and allow a VM layer to be added to the grid-job processing cycle. Both the CLI and Web MolDynGrid VRE interfaces have been extended to support VM submission with Rainbow. This approach allows more resources to be involved in computations with particular software builds. Further ongoing developments of Rainbow for MolDynGrid include support for Docker containers in addition to KVM VMs, and GPGPU computations by means of GPU device pass-through.
        Speaker: Andrii Salnikov (Taras Shevchenko National University of Kyiv)
    • 11:00 – 12:30
      Tutorial: Running Chipster in the EGI FedCloud

      Federico II, Villa Romanazzi Carducci

      Conveners: Diego Scardaci (EGI.eu/INFN), Kimmo Mattila (CSC)
      • 11:00
        Running Chipster data analysis platform in EGI Federated Cloud 1h 30m
        Chipster is a bioinformatics environment that includes over 350 analysis tools for high-throughput sequencing and microarray data. The tools are complemented with a comprehensive collection of reference datasets, such as genome indexes for the TopHat and BWA aligners. The tools can be used on the command line or via an intuitive GUI, which also offers interactive visualizations and workflow functionality. Chipster is open source and the server environment is available as a virtual machine image free of charge. In this tutorial session you will learn how virtual Chipster servers can be launched in the EGI Federated Cloud. The development and support work done by the EGI Federated Cloud community has made launching a Chipster server easy: the rOCCI client, needed to connect to the EGI Federated Cloud, is first installed on a Linux or OS X machine. In addition, you need to join the chipster.csc.fi virtual organization. After these preliminary steps, you can use a simple utility tool: with the FedCloud_chipster_manager, the Chipster VM image is automatically downloaded from the EGI AppDB, launched in the EGI Federated Cloud environment and linked to the required reference datasets and applications using the CVMFS system. More in-depth demonstrations of actually using Chipster for analyzing biological data will be shown in the NGS data analysis tutorial. Requirements: 1. a Linux machine with the rOCCI client, 2. chipster.csc.fi VO membership. An illustrative prerequisite check follows this entry.
        Speaker: Kimmo Mattila (CSC)
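        The sketch below is a hypothetical pre-flight check for the two requirements listed above (an rOCCI client and chipster.csc.fi VO membership); it is not the actual FedCloud_chipster_manager utility.

        import shutil
        import subprocess
        import sys

        def check_prerequisites():
            """Verify the rOCCI client is installed and create a VOMS proxy for the Chipster VO."""
            if shutil.which("occi") is None:
                sys.exit("rOCCI client not found: install it first (see the EGI FedCloud documentation)")
            # Requires prior membership of the chipster.csc.fi virtual organization.
            subprocess.run(["voms-proxy-init", "--voms", "chipster.csc.fi", "--rfc"], check=True)

        if __name__ == "__main__":
            check_prerequisites()
            print("Prerequisites satisfied: ready to launch a Chipster server in the EGI Federated Cloud.")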
    • 12:30 – 13:30
      Lunch
    • 13:30 – 15:30
      Big data value workshop

      Europa, Villa Romanazzi Carducci

      Conveners: Sergio Andreozzi (EGI.eu), Nadia Nardi (Engineering Ingegneria Informatica S.p.A.)
    • 13:30 – 15:30
      Exploiting the EGI Federated Cloud - PaaS & SaaS workshop

      Scuderia, Villa Romanazzi Carducci

      Convener: Diego Scardaci (EGI.eu/INFN)
      • 13:30
        An integrated IaaS and PaaS architecture for scientific computing 20m
        Scientific applications often require multiple computing resources deployed in a coordinated way. The deployment of multiple resources requires installing and configuring special software applications, which should be updated when changes in the virtual infrastructure take place. When working on hybrid and federated cloud environments, restrictions on the hypervisor or cloud management platform must be minimised to facilitate geographic-wide brokering and cross-site deployments. Moreover, preserving individual operation at the site level in federated clouds is also important for scalability and interoperability. In that sense, the INDIGO-DataCloud project [1] has been designed with the objective of building up a PaaS-level cloud solution for research. One of the key multi-level components is the PaaS computing core. This part constitutes the kernel for the deployment of services and computing virtual infrastructures for the users. It is complemented with virtualized storage, federated AAI and networking. The INDIGO-DataCloud PaaS core will be based on a microservice architecture [2]. Microservices consist of a set of narrowly focused, independently deployable services, typically implemented using container-embedded applications, exposed through RESTful interfaces. Microservices are designed to be highly scalable, highly available and targeted for use in cloud environments. INDIGO’s microservices will be deployed, dynamically scheduled and managed using tools such as Kubernetes [3]. In cases where multi-tenancy is not yet intrinsically supported by a particular microservice, like the container manager, INDIGO-DataCloud may decide to offer multiple instances to bridge that gap. The INDIGO PaaS will offer an upper-layer orchestration service for distributed applications using the TOSCA language standard [4]. It will deal with the requested service instantiation and application execution, managing the needed microservices in order, for example, to select the right end-point for the deployment. Cross-site deployments will also be possible. This PaaS, aimed at providing a more efficient platform for scientific computing, will require additional characteristics from the underlying layers. The INDIGO PaaS will leverage an enhanced IaaS that will provide a richer set of features currently missing. The usage of TOSCA permits IaaS providers to offer infrastructure orchestration, making it possible to manage the deployment and configuration of the resources that are being provided. The life cycle of the resources is therefore managed through the APIs exposed by the IaaS end-points. The TOSCA templates will be translated into their native deployment schemas using IM [5] for OpenNebula and Heat-Translator [6] for OpenStack Heat. Both OpenNebula and OpenStack will incorporate drivers to support the deployment of containers as first-class resources on the IaaS. This will provide high efficiency when building up complex configurations from a repository of container images. The scheduling algorithms for both cloud management frameworks will be improved, in order to provide a better experience for end-users and a more efficient utilization of the computational resources. The usage of a two-level orchestrator (at the level of the PaaS and within each IaaS instance) will enhance the capability of providing a dynamic and on-demand increase in cloud resources. An illustrative TOSCA topology sketch follows this entry.
        Speakers: Dr Germán Moltó Martínez (UPVLC), Dr Giacinto Donvito (INFN), Dr Ignacio Blanquer (UPVLC)
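        To make the orchestration layer more concrete, the sketch below emits the kind of minimal TOSCA simple-profile topology the abstract refers to, built here as a Python dictionary and serialised with PyYAML; the node name and property values are placeholders, not part of the INDIGO-DataCloud specification.

        import yaml  # PyYAML

        topology = {
            "tosca_definitions_version": "tosca_simple_yaml_1_0",
            "topology_template": {
                "node_templates": {
                    "worker_node": {  # placeholder node name
                        "type": "tosca.nodes.Compute",
                        "capabilities": {
                            "host": {"properties": {"num_cpus": 2, "mem_size": "4 GB"}},
                            "os": {"properties": {"type": "linux", "distribution": "ubuntu"}},
                        },
                    }
                }
            },
        }

        # A template like this would be handed to the PaaS orchestrator, which translates
        # it into the native schema of the target IaaS (IM for OpenNebula, Heat for OpenStack).
        print(yaml.safe_dump(topology, default_flow_style=False))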
      • 13:50
        OCCO and its usage to build efficient data processing workflow infrastructures in clouds 20m
        IaaS clouds are very popular since you can easily create simple services (a Linux PC, a web portal, etc.) in the cloud. However, the situation is much more difficult if you want to build, dynamically and on demand, a complex infrastructure tailored to your particular needs. A typical infrastructure contains database services, processing resources and presentation services. These services together provide the infrastructure you actually need to run your possibly complex application (e.g. a workflow) on it. The OCCO (One-Click Cloud Orchestration) framework developed at SZTAKI attempts to solve this problem in a very generic way by avoiding any specialization, i.e. it can work for any IaaS cloud type, on any operating system type, for services with any complex interaction among them, etc. OCCO represents the second level above the IaaS layer within any cloud compute architecture. The talk will introduce the main services, the architecture and the internal structure of OCCO and explain how the required flexibility can be achieved with it. Particular attention will be given in the talk to how the TOSCA standard (Topology and Orchestration Specification for Cloud Applications) can be implemented in OCCO. The OCCO framework is currently under development towards supporting the TOSCA specifications, and the recent progress towards this support will also be introduced in the talk. The talk will demonstrate the flexibility of OCCO through an advanced data processing workflow. Data processing workflows are considered as networks of nodes where each node performs some computation or data processing on the incoming data item and passes the result to the next one. OCCO is an ideal tool for building such networks of nodes performing data processing or streaming. The talk will show how an individual workflow or the network layout can be configured and how it is realised by OCCO.
        Speaker: Peter Kacsuk (MTA SZTAKI)
      • 14:10
        R Computing services as SaaS in the Cloud 20m
        R is a programming language and software environment for statistical computing and graphics that is widely used in different contexts, such as environmental research, thanks to various geodata packages. Within the EGI-LifeWatch Competence Centre, one of the tasks is to provide a final-user-oriented application based on R, so different solutions to achieve this goal are being explored. One of these solutions is a layer-based architecture with three layers: • Bottom layer: R instances that can be installed in the cloud (allowing load balancing) or on HPC (we have a testbed based on a PowerPC cluster). • Middle layer: an R server that interacts with an R client. • Top layer: a web-based solution with certificate authentication. The user interface can be deployed using tools like the IPython notebook or the RStudio web version. This presentation will analyze how well the different proposed solutions fit the final user requirements and how satisfied users are from their point of view. An illustrative client-side sketch of the middle layer follows this entry.
        Speaker: Fernando Aguilar (CSIC)
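        One possible realisation of the middle layer is the Rserve protocol; the sketch below uses the pyRserve client purely as an illustration (the talk does not state which client technology is used), with a placeholder host name.

        import pyRserve

        # Connect to a remote R server exposed via Rserve (placeholder host).
        conn = pyRserve.connect(host="r-server.example.org", port=6311)
        try:
            # Push a small dataset into the R session and evaluate an expression remotely.
            conn.r.values = [1.2, 3.4, 5.6, 7.8]
            print(conn.eval("summary(values)"))
        finally:
            conn.close()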
      • 14:30
        The Ophidia stack: a big data analytics framework for Virtual Research Environments 20m
        The Ophidia project is a research effort on big data analytics addressing scientific data analysis challenges in multiple domains (e.g. climate change). It provides a framework responsible for atomically processing and manipulating datacubes, by providing a common way to run distributed tasks on large sets of fragments (chunks). Even though the most relevant use cases for Ophidia have been implemented in the climate change context, the domain-agnostic design of the internal storage model, operators and primitives makes it easier to exploit the framework as a core big data technology for multiple research communities. Ophidia provides declarative, server-side, and parallel data analysis, jointly with an internal storage model able to efficiently deal with multidimensional data and a hierarchical data organization to manage large data volumes. The project relies on a strong background in high-performance database management and OLAP systems to manage large scientific datasets. The Ophidia analytics platform provides several data operators to manipulate datacubes, and array-based primitives to perform data analysis on large scientific data arrays (e.g. statistical analysis, predicate evaluation, FFT, DWT, subsetting, aggregation, compression). The array-based primitives are built on top of well-known numerical libraries (e.g. GSL). Bit-oriented primitives are also available to manage B-cubes (binary data cubes). Metadata management support (CRUD-like operators) is also provided, jointly with validation-based features relying on community/project-based vocabularies. The framework stack includes an internal workflow management system, which coordinates, orchestrates, and optimises the execution of multiple scientific data analytics and visualization tasks. Real-time workflow execution monitoring is also supported through a graphical user interface. Defining processing chains and workflows with tens or hundreds of data analytics operators can be a real challenge in many practical scientific use cases. The talk will also highlight the main needs, requirements and challenges regarding data analytics workflow management applied to large scientific datasets. Some real use cases implemented at the Euro-Mediterranean Center on Climate Change (CMCC) will also be discussed, and the results of a benchmark performed on the Athena cluster at the CMCC SuperComputing Centre on CMIP5 datasets will also be presented.
        Speaker: Dr Sandro Fiore (CMCC)
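        The toy example below imitates, with plain NumPy, the kind of datacube subsetting and fragment-wise aggregation that Ophidia's operators and array-based primitives provide. It is not Ophidia code; the cube dimensions and values are invented for illustration.

        # Toy illustration of datacube-style subsetting and aggregation, in the spirit
        # of Ophidia's array-based primitives (a NumPy stand-in, not Ophidia itself).
        import numpy as np

        # A small (time, lat, lon) "datacube" of, say, daily temperatures.
        time, lat, lon = 365, 18, 36
        cube = 15 + 10 * np.random.rand(time, lat, lon)

        # "Subsetting": select a season (days 150-240) and a latitude band.
        subset = cube[150:240, 4:9, :]

        # "Aggregation": reduce over the time axis, fragment by fragment.
        fragments = np.array_split(subset, 4, axis=2)          # chunk along longitude
        partial_means = [frag.mean(axis=0) for frag in fragments]
        seasonal_mean = np.concatenate(partial_means, axis=1)   # reassemble the 2D map

        print(seasonal_mean.shape)   # (5, 36): one mean value per (lat, lon) cell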
      • 14:50
        The VESPA Virtual Research Environment: a final report about the project at the beginning of clinical operations 20m
        The VESPA project aimed to provide a Virtual Research Environment (VRE) for the qualitative and quantitative evaluation and rehabilitation of motor and cognitive diseases. It addressed more than thirty operational objectives to enable an extremely innovative platform for the early evaluation and rehabilitation of cognitive diseases, such as Alzheimer's Dementia, Mental Retardation and Linguistic Deficit. VESPA is a pioneering project mixing brand new ICT technologies and Computer Science concepts to fulfil the needs of patients and caregivers in the cognitive diseases field. On top of a fully immersive Virtual Reality system, it combines innovative and specialised 3D software applications, dedicated hands-free devices, a flexible and scalable Cloud Computing infrastructure, a powerful Science Gateway and a tele-supervision system. The project provides a completely new response to the increasing demand for treatment of mental retardation, AD, Parkinson's Disease, etc. Our open and flexible framework, the VESPA Library, extends a common gaming platform enabling the integration of any cognitive application into a highly productive environment. By deriving from it, one can build a brand new evaluation test or rehabilitation task in the form of a 3D videogame by writing close to zero lines of code. The framework is highly customisable and includes safe data management and transfer features. The development team created more than 80 applications designed by psychologists and neuropsychiatrists for three different kinds of patients, ranging from simple to very hard, just by inheriting basic features from the framework. This is crucial for the growth of the community and the success of the VESPA system, which is meant to feed a theoretically unlimited number of installation sites. The system also includes a Science Gateway that allows doctors, administrative staff members and VESPA technicians to configure and manage system planning, operation and results. Patients and caregivers also benefit from it by visualising the schedule and results of daily rehabilitation protocols. Telemetries produced during operation are safely sent to the Cloud and to the database located at the Health Centre, so they are available in near real time to the community. Innovative and self-built devices can be plugged into and used in the VESPA system with very little effort; this was the case for the home-made instrumented glove built by VESPA, and for devices like the MyO Armband, the Leap Motion sensor, etc. The validation process ran on three different groups of patients, and in the final part of the presentation the results of the VESPA system validation through clinical trials will be shown. By entering the market, the VESPA system will allow many children and elderly people to follow their daily rehabilitation sessions at schools and rest homes, so that caregivers no longer need to spend effort on transportation. For months, a community has been growing around the VESPA VRE all over Europe. The VESPA system is now an open gate towards the next generation of telemedicine, offering fully immersive Virtual Reality applications inside a highly interactive platform to Health System actors and promising impressive results in terms of impact and speed for effective cognitive training activities.
        Speaker: Marco Pappalardo (Software Engineering Italia srl)
      • 15:10
        CernVM-FS vs Dataset Sharing 20m
        CernVM-FS is firmly established as a method of software and conditions data distribution for the LHC experiments and many other Virtual Organisations at Grid sites. Use of CernVM-FS is now reaching a new stage, with its advantages starting to be acknowledged by communities active within, and making use of, the EGI Federated Cloud. As the manipulation of research data within cloud infrastructures becomes more important for many communities, they have started to look into CernVM-FS as a technology that could bring the expected benefits. The presentation will explain when CernVM-FS can be used for dataset sharing without losing the main benefits of the technology, and will then give information on how to use it properly. Pros and cons will be discussed and available use cases will be analysed. The presentation is proposed as the start of a round table and discussion, with audience participation, on the topic of dataset distribution within cloud infrastructures, specifically the EGI Federated Cloud.
        Speaker: Catalin Condurache (STFC)
    • 13:30 15:30
      Infrastructure and services for human brain research Sala D, Giulia centre

      Sala D, Giulia centre

      Villa Romanazzi Carducci

      Convener: Dr Yin Chen (EGI.eu)
      • 13:30
        The Characterisation Virtual Laboratory 20m
        In 2014, Monash University, through the Multimodal Australian ScienceS Imaging and Visualisation Environment (MASSIVE), and project partners, completed development of the NeCTAR-funded Characterisation Virtual Laboratory (CVL), a project to develop online environments for researchers using advanced imaging techniques, and demonstrate the impact of connecting national instruments with computing and data storage infrastructure. The CVL is a collaboration between Monash University, Australian Microscopy & Microanalysis Research Facility (AMMRF), Australian Nuclear Science and Technology Organisation (ANSTO), Australian Synchrotron, National Imaging Facility (NIF), Australian National University, the University of Sydney, and the University of Queensland. The partners joined together around the CVL project with three major goals: 1. To integrate Australia’s imaging equipment with specialised HPC capabilities provided by MASSIVE and National Computational Infrastructure (NCI) and with data collections provided by Research Data Storage Infrastructure (RDSI) nodes. More than 450 registered researchers have used and benefited from the technology developed by the CVL project, providing them with an easier mechanism to capture instrument data and process that data on centralised cloud and HPC infrastructure, including MASSIVE and NCI. 2. To provide scientists with a common cloud-based environment for analysis and collaboration. The CVL has been deployed across clouds at the University of Melbourne, Monash University, and QCIF. CVL technology has been used to provide easier access to HPC facilities at MASSIVE, University of Sydney BMRI, the Pawsey Centre, NCI and Central Queensland University. 3. To produce four exemplar platforms, called Workbenches, for multi-modal or large-scale imaging in Neuroimaging, Structural Biology, Energy Materials (X-ray), and Energy Materials (Atom Probe). The CVL environment now contains 103 tools for specialised data analysis and visualisation in Workbenches. Over 20 imaging instruments have been integrated so that data automatically flows into the cloud for management and analysis. In addition, a number of specialised workflows have been developed and integrated, including atom probe data processing using galaxy, and automatic brain MRI and histology registration. The newly developed infrastructure is also having an impact beyond the four workbenches. For example, HPC facilities across Australia, including facilities at MASSIVE, NCI, Central Queenland University, the Brain and Mind Research Institute at University of Sydney and the Pawsey Centre, use software developed by the CVL to help a wider range of researchers access imaging and visualisation services. The technology developed under the CVL provides simple access to HPC resources by newcomers and inexperienced HPC users. The Characterisation Virtual Laboratory is one of a number of initiatives led by Monash University to develop tailored environments for research communities. This presentation will introduce those initiatives, with a specific focus on the CVL.
        Speaker: Wojtek James Goscinski (Monash University)
      • 13:50
        Instances of big data analysis in neuGRID 20m
        neuGRID (www.neugrid4you.eu) is a web portal aimed at helping neuroscientists carry out high-throughput imaging research and at providing clinical neurologists with automated diagnostic imaging markers of neurodegenerative diseases for individual patient diagnosis. neuGRID's user-friendly environment is customised for a range of users, from students to senior neuroscientists, working in the fields of Alzheimer's disease, psychiatric diseases and white matter diseases. neuGRID aims to become a widespread resource for brain imaging analyses. neuGRID was first funded by the European Commission DG INFSO within the 7th Framework Programme from 2008 to 2011; in this phase the hardware and middleware infrastructure were developed. The second wave was funded in 2011 by the European Commission, now DG CONNECT, under the project neuGRID for you (N4U), with the main aim of expanding user services with more intuitive and graphical interfaces. N4U ended in April 2015. Through the single virtual access point of the Science Gateway web portal, users log in and access a "virtual" imaging laboratory. Here users can upload, use and share algorithms for brain imaging analysis, access large neuroimaging datasets and run computationally intensive analyses, all the time with specialised support and training. Thanks to distributed services and grid/cloud computational resources, analyses with neuGRID are much faster than traditional lab-based analyses. neuGRID's proof of concept was carried out when an Alzheimer's disease biomarker (3D cortical thickness with FreeSurfer and CIVET) was extracted from 6,500 MR scans in 2 weeks, versus the 5 years it would have taken in a traditional setting. This presentation will introduce this initiative, with a specific focus on the different big data analyses conducted Europe-wide so far.
        Speaker: Alberto Redolfi (INFN)
      • 14:10
        VIP: a Virtual Imaging Platform for the long tail of science 20m
        Computing and storage have become key to research in a variety of biomedical fields, for example, to compute numerical simulations for research in medical imaging or cancer therapy, or to automate the analysis of digital images in neurosciences or cardiology. The Virtual Imaging Platform (VIP) is a web portal for medical simulation and image data analysis. It leverages resources available in the biomed Virtual Organisation of the European Grid Infrastructure to offer an open service to academic researchers worldwide. VIP aims to mask the infrastructure and enable a user experience as transparent as possible. This means that VIP has to take decisions as automatically, quickly, and reliably as possible regarding infrastructural challenges such as: -(1) the placement of data files on the storage sites, -(2) the splitting and distribution of applications on the computing sites, -(3) the termination of misbehaving runs. We heavily rely on the DIRAC service provided by France Grilles (the NGI of France) to take such decisions in the changing environment of EGI. In addition, we have developed 'non-clairvoyant’ techniques to specifically address the challenges of the applications provided in VIP. With VIP, researchers from all over the world can access important amounts of computing resources and storage with no required technical skills beyond the use of a web browser. EGI is essential to the success of VIP because it provides an open infrastructure relieving researchers from the burden of negotiating resource allocations with computing centres. Such an open policy enables the supply of services to the long tail of science, i.e. to the large number of groups and projects of modest size such an individual masters or PhD project, proof-of-concept studies, and so on.
        Speaker: Ms Sorina POP (CNRS, Creatis)
      • 14:30
        Test activities for HBP/SP5 neuroinformatics: results and next steps 20m
        Speakers: Dr Lukasz Dutka (CYFRONET), Dr Yin Chen (EGI.eu)
      • 14:50
        Discussion 40m
    • 13:30 15:30
      Tutorial: NGS data analysis Federico II

      Federico II

      Villa Romanazzi Carducci

      Convener: Dr Fotis Psomopoulos (Institute of Applied Biosciences, Center for Research and Technology Hellas)
      • 13:30
        NGS Data Analysis Training Workshop 2h
        Summary ======= "Big data" is one of today's hottest concepts, but it can be misleading. The name itself suggests mountains of data; however, big data is characterised by three V's: volume of data, velocity of data processing, and variability of data sources. These are the key features of information that require big-data tools making use of high networking and computing power. Researchers working with genomics in medicine, agriculture and other life sciences are producing big bio-data, mainly by applying Next-Generation Sequencing (NGS) to answer important biological questions. NGS technologies are revolutionising genome research and, in particular, their application to transcriptomics (RNA-seq) is increasingly being used for gene expression profiling as a replacement for microarrays. In order to deal with big bio-data, current approaches in Life Science research favour the use of established workflows which have been proven to facilitate the first steps in data analysis. We propose to set up a Training Workshop focused on the particular needs of researchers who are active in the field of NGS data analysis but so far have limited or no experience with the use of EGI resources for this task. Description =========== The workshop will consist of two parts. Initially there will be presentations of key applications and workflows currently established in the EGI ecosystem that cater to the particular needs of NGS analysis. This first part will give the participants an in-depth idea of the state of the art in this field and prepare them for the second part of the session, i.e. the hands-on exercises. The exercises will be carefully selected both in terms of generality (i.e. applicable to a wide range of NGS data and analyses) and of time constraints (i.e. small enough to conclude within the context of the session). The scope of these exercises will address issues such as input and reference data management, the use of established analysis tools in both Grid and Cloud infrastructures, and tools for retrieving and further analysing the produced output. All exercises will be performed on the Chipster platform (http://chipster.csc.fi/) using EGI FedCloud resources. The process of connecting to the FedCloud and launching Chipster will be very briefly addressed in the context of this tutorial. However, there is a dedicated tutorial on "Running Chipster data analysis platform in EGI Federated Cloud" before this session (https://indico.egi.eu/indico/contributionModification.py?contribId=25&sessionId=26&confId=2544), so interested participants are encouraged to attend both tutorials. Impact ====== The goal of this workshop is to attract life science researchers from different fields (such as agro-biotechnology, health and nutrition, among others) who (a) actively use NGS data analysis workflows in their research and (b) have little experience employing EGI resources. Although researchers active in NGS are currently limited in number compared with the overall number of life scientists, the rise of NGS data across all Life Science domains leads to an increasing demand for both trained personnel and novel tools and approaches.
        Timetable =========
        Part 1: NGS Data Analysis in EGI (60')
        • 10' Intro to EGI resources
        • 15' Data Replication
        • 15' Cloud Applications
        • 15' Grid Applications
        • 5' Discussion / Wrap-up
        Part 2: Hands-on training (60')
        • Exercise #1: Connect to a Cloud VM
        • Exercise #2: Select reference data and replicate
        • Exercise #3: Upload test input NGS data
        • Exercise #4: Execute workflow
        • Exercise #5: Post-workflow analysis
        Speakers: Dr Anastasia Hadzidimitriou (Institute of Applied Biosciences / CERTH), Dr Anna Vardi (Institute of Applied Biosciences / CERTH), Dr Fotis Psomopoulos (Aristotle University of Thessaloniki), Kostas Koumantaros (GRNET)
    • 15:30 16:00
      Coffee break
    • 16:00 18:00
      Exploiting the EGI Federated clouds - PaaS & SaaS workshop Scuderia

      Scuderia

      Villa Romanazzi Carducci

      Convener: Diego Scardaci (EGI.eu/INFN)
      • 16:00
        Accessing Grids and Clouds with DIRAC services 20m
        Multiple scientific communities are using more and more intensive computations to reach their research goals. These communities can exploit various computing resources, which makes it difficult to adapt their applications to different computing infrastructures. There is therefore a need for tools for the seamless aggregation of different computing and storage resources into a single coherent system. With the introduction of clouds as an innovative way of provisioning computing resources, the need for means to use them efficiently grows even more. The DIRAC project develops and promotes software for building distributed computing systems. Both workload and data management tools are provided, as well as support for high-level workflows and massive data operations. Services based on the DIRAC interware are now provided by several national grid infrastructure projects. The DIRAC4EGI service is operated by EGI itself, and other DIRAC services are provided by a number of national computing infrastructures (France, UK, China, etc.). Those services are used by multiple user communities with different requirements and workloads. Experience of running the multi-community DIRAC4EGI and national DIRAC services will be presented in this contribution; a minimal job-submission sketch in the style of the DIRAC Python API follows this entry.
        Speaker: Andrei Tsaregorodtsev (CNRS)
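        The sketch below shows a minimal job submission in the style of the DIRAC Python API. It assumes a configured DIRAC client environment and a valid proxy; exact module paths and method names can vary between DIRAC releases, so it should be read as an orientation rather than a reference.

        # Minimal DIRAC job submission sketch (assumes a configured DIRAC client and
        # a valid proxy; adjust to the DIRAC release in use).
        from DIRAC.Core.Base import Script
        Script.parseCommandLine(ignoreErrors=True)   # initialise the DIRAC environment

        from DIRAC.Interfaces.API.Dirac import Dirac
        from DIRAC.Interfaces.API.Job import Job

        job = Job()
        job.setName("hello-dirac")
        job.setExecutable("/bin/echo", arguments="Hello from DIRAC")
        job.setCPUTime(300)

        result = Dirac().submitJob(job)   # returns an S_OK/S_ERROR style dictionary
        if result["OK"]:
            print("Submitted job with ID:", result["Value"])
        else:
            print("Submission failed:", result["Message"])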
      • 16:20
        Atmosphere: A Platform for Development, Execution and Sharing of Applications in Federated Clouds 20m
        The advent of cloud computing offers new opportunities for developers and users of scientific applications [1]. This contribution presents results of research on the efficient development, federation, execution and sharing of cloud computational services. We have investigated methods for the integration of heterogeneous public cloud infrastructures into a unified computational environment, cost optimisation of application deployment in a heterogeneous cloud environment (choosing optimal resources given the available funds), federated data storage mechanisms with extensions for public storage services, and dealing with resource demand spikes (optimisation of platform middleware and user interfaces). This research was undertaken as part of the EU VPH-Share project [2] and resulted in a platform called Atmosphere [2, 3]; recently, Atmosphere has also become part of the PLGrid e-infrastructure [4]. Atmosphere supports an iterative service development process. Computational software is exposed as a set of so-called Atomic Services (virtual machines (VMs) that can be created on demand from VM templates) which can be used either as standalone tools or as building blocks for larger workflows. Developers may iteratively construct new services which may be published, shared, or subjected to further development. Once published, services can be instantiated (executed) and made available on the Web for the community. In this way, with the help of a Web-based user interface, the platform provides an integrated tool supporting the development and publishing of services for the entire VPH community. Atmosphere provides a full middleware stack, complete with end-user interfaces and APIs, enabling service developers to create, register and share cloud services, and end users to instantiate and invoke their features. Atmosphere federates five cloud IaaS technologies (OpenStack, EC2, MS Azure, RackSpace, Google Compute); seven distinct cloud sites are registered with the VPH-Share infrastructure, over 250 Atomic Services are available, and about 100 service instances are operating on a daily basis. Atmosphere has been used to host complex applications from several projects of the VPH community [5]: VPH-Share, MySpine, ARTreat, VPH-DARE and pMedicine, as well as for medical student training at the University of Sheffield, the Jagiellonian University Medical College and the Karolinska Institutet [2, 3]. The platform is being improved in collaboration with application developers in order to ensure that their software can be easily ported to the cloud and provisioned in the PLGrid e-infrastructure.
        Speaker: Dr Marian Bubak (ACC Cyfronet and Department of Computer Science, AGH University of Science and Technology, Krakow, Poland)
      • 16:40
        Cloud-enabled, scalable Data Avenue service to process very large, heterogeneous data 20m
        Compute-intensive applications, such as simulations used in various research areas and in industry, require computing infrastructures that enable highly parallel, distributed processing. Grids, clusters, supercomputers and clouds are often used for this purpose. There also exist tools that allow easier design and construction of such complex applications, typically in the form of workflows; these tools can utilise various types of distributed computing infrastructures (DCIs) and provide automated scheduling, submission and monitoring of workflow tasks (jobs). Some tools support job-level granularity, that is, each job in a workflow may potentially be executed in a different computing infrastructure. Numerous storage solutions exist; however, the storage resources accessible from within a given DCI are often limited by the protocols supported by the computing elements themselves. Binding jobs to a particular storage resource makes it very difficult to port the workflow to other computing resources or to exchange data between different DCIs. To alleviate this problem a data bridging solution, called Data Avenue, has been proposed, through which all common storage operations (such as listing, folder creation, deletion, renaming) and data access (download/upload) can be performed on a wider set of storage resources (SFTP, GridFTP, SRM, iRODS, S3, etc.) using a uniform web service (HTTP) interface. Jobs, in this way, become capable of accessing diverse storage resources regardless of the DCI where the job is currently being run, resulting in more flexible and portable workflows. Such a mediation service, however, occasionally implies a very high CPU and network load on the server, as data exchanged over a storage-related protocol between the Data Avenue server and the storage has to be converted to HTTP between the Data Avenue server and the client. Under massive, concurrent use, such as when running parameter sweep applications where thousands of jobs may run in parallel, a single Data Avenue server could soon become a bottleneck, and clients may experience a significant decline in transfer rate. On the other hand, such peak loads are often followed by idle periods, during which the Data Avenue host is underexploited. This presentation introduces a solution to scale Data Avenue (DA) services on demand by multiplying the available Data Avenue servers. The solution uses a cloud infrastructure (IaaS) to dynamically grow or shrink the capacity of the service depending on the current load, and is composed of the following architectural components: a load balancer, a cloud orchestrator, a VM pool, and a common database. The load balancer is responsible for dispatching client requests to one of the servers in the VM pool, which contains virtual machines with individual Data Avenue services pre-installed. The cloud orchestrator continuously monitors the load of the VMs in the pool and, based on predefined load thresholds, starts new instances or shuts down existing ones. A common database, to which each DA VM connects, persists client-interaction data across the lifetimes of individual DA VMs. An important advantage of this solution is that clients communicate with a single Data Avenue endpoint (the load balancer), whereas the mechanisms behind the scenes are hidden. Details of the proposed solution and preliminary experimental results will also be reported; a conceptual sketch of the scaling loop follows this entry.
        Speaker: Peter Kacsuk (MTA SZTAKI)
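        The following sketch illustrates the threshold-based scaling loop described in the abstract above. The thresholds, pool limits and monitoring/start/stop calls are invented placeholders; the real orchestrator's logic and interfaces are not reproduced here.

        # Conceptual sketch of a threshold-based scaling loop for a pool of Data
        # Avenue VMs (invented thresholds and stubbed cloud calls).
        import time

        SCALE_UP_LOAD = 0.75     # average VM load above which a new DA VM is started
        SCALE_DOWN_LOAD = 0.25   # average VM load below which one DA VM is stopped
        MIN_VMS, MAX_VMS = 1, 10


        def average_load(vm_pool):
            # In the real system this would query monitoring data for each DA VM.
            return sum(vm["load"] for vm in vm_pool) / len(vm_pool)


        def orchestrate(vm_pool, start_vm, stop_vm, interval_s=60):
            """Grow or shrink the Data Avenue VM pool based on its average load."""
            while True:
                load = average_load(vm_pool)
                if load > SCALE_UP_LOAD and len(vm_pool) < MAX_VMS:
                    vm_pool.append(start_vm())          # register the new instance
                elif load < SCALE_DOWN_LOAD and len(vm_pool) > MIN_VMS:
                    stop_vm(vm_pool.pop())              # shut down one idle instance
                time.sleep(interval_s)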
      • 17:00
        Scalable ABM platforms with friendly interaction to address practical problems. 20m
        Many problems can be addressed in a realistic way with the help of Agent-Based Model (ABM) tools. However, these tools are sometimes not easy for end users to use, or are not able to scale up to the computing resources required by the problem. We propose to develop a general platform supporting different ABM solutions, deployed as a service on HPC Cloud resources. We analyse a first possible pilot through the discussion of a simple but real use case: the anthropogenic impact on water quality in a lake (a toy agent-based sketch of this use case follows this entry).
        Speaker: Luis Cabellos (CSIC)
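        A toy agent-based sketch of the lake use case mentioned above is given below: "polluter" agents discharge into a lake that slowly self-purifies. All parameters are invented and the code only illustrates the agent-based modelling style, not the proposed platform.

        # Toy agent-based sketch of anthropogenic impact on lake water quality.
        import random

        LAKE_RECOVERY_RATE = 0.01     # fraction of pollutant removed per step
        STEPS = 100


        class Polluter:
            def __init__(self, discharge):
                self.discharge = discharge          # pollutant units added per step

            def act(self, lake):
                lake["pollutant"] += self.discharge


        def simulate(n_agents=20, seed=42):
            random.seed(seed)
            lake = {"pollutant": 0.0}
            agents = [Polluter(random.uniform(0.0, 0.5)) for _ in range(n_agents)]
            history = []
            for _ in range(STEPS):
                for agent in agents:
                    agent.act(lake)
                lake["pollutant"] *= (1.0 - LAKE_RECOVERY_RATE)   # natural recovery
                history.append(lake["pollutant"])
            return history


        if __name__ == "__main__":
            levels = simulate()
            print(f"pollutant level after {STEPS} steps: {levels[-1]:.2f}")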
      • 17:20
        Supporting Big Data Processing via Science Gateways 20m
        With the rapid increase of data volumes in scientific computations, utilising parallel and distributed computing paradigms for data processing is becoming more and more important. Hadoop is an open-source implementation of the MapReduce framework supporting the processing of large datasets in parallel, on multiple nodes, in a reliable and fault-tolerant manner. Scientific workflow systems and science gateways are high-level environments that facilitate the development, orchestration and execution of complex experiments from a user-friendly graphical user interface. Integrating MapReduce/Hadoop with such workflow systems and science gateways enables scientists to conduct complex data-intensive experiments utilising the power of the MapReduce paradigm from the convenience provided by science gateway frameworks. This presentation describes an approach to integrate MapReduce/Hadoop with scientific workflows and science gateways. As workflow management systems typically allow a node to execute a job on a compute infrastructure, the task of integration can be translated into the problem of running the MapReduce job in a workflow node (a minimal example of such a job is given after this entry). The input and output files of the MapReduce job have to be mapped onto the inputs and outputs of a workflow node. Besides executing the MapReduce job, the necessary execution environment (the Hadoop cluster) should also be transparently set up before, and destroyed after, execution. These operations should also be carried out from the workflow without further user intervention. Therefore, the concept of infrastructure-aware workflows is utilised, where first the necessary execution environment is created dynamically in the cloud, followed by the execution of the workflow tasks, and finally the breaking down of the infrastructure, releasing resources. As the implementation environment for the above concept, the WS-PGRADE/gUSE science gateway framework and its workflow solution have been utilised. However, the solution is generic and can also be applied to other grid- or cloud-based workflow systems. Two different approaches have been implemented and compared: the Single Node Method, where the above-described process is implemented as a single workflow node, and the Three Node Method, where the steps of creating the Hadoop cluster, executing the MapReduce jobs, and destroying the Hadoop execution environment are separated. While the Single Node Method is efficient when embedding a single MapReduce experiment into a workflow, the Three Node Method allows more flexibility for workflow developers and results in better performance in the case of multiple MapReduce experiments that can share the same Hadoop cluster. Both approaches support multiple storage solutions for input and output data, including local files on the science gateway as well as cloud-based storage systems such as Swift object storage and Amazon S3. These storage types can be freely mixed and matched when defining the input and output data sources/destinations of the workflow. The current implementation supports OpenStack-based clouds, with a more generic solution including OpenNebula and generic EGI Federated Cloud support on its way. The presentation will describe the implementation concept and environment, provide benchmarking experiments regarding the efficiency of the implemented approaches, and demonstrate how the solution can be utilised by scientific user communities.
        Speaker: Tamas Kiss (University of Westminster, London, UK)
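        To make the integration concrete, the MapReduce job wrapped by a workflow node can be as simple as the classic word count. The sketch below shows a Hadoop Streaming style mapper and reducer in Python, combined in one file for brevity; staging the scripts, provisioning the Hadoop cluster and submitting the job are left to the workflow node, as described above.

        # Word-count mapper and reducer runnable via Hadoop Streaming; a workflow
        # node would stage this script, bring up the Hadoop cluster and submit it.
        import sys


        def mapper():
            for line in sys.stdin:
                for word in line.split():
                    print(f"{word}\t1")


        def reducer():
            current, count = None, 0
            for line in sys.stdin:
                word, value = line.rstrip("\n").split("\t")
                if word == current:
                    count += int(value)
                else:
                    if current is not None:
                        print(f"{current}\t{count}")
                    current, count = word, int(value)
            if current is not None:
                print(f"{current}\t{count}")


        if __name__ == "__main__":
            # Local test:  python wordcount.py map < input.txt | sort | python wordcount.py reduce
            mapper() if sys.argv[1] == "map" else reducer()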
      • 17:40
        Virtual Research Environments as-a-Service 20m
        Virtual Research Environments (VREs) are innovative, web-based, community-oriented, comprehensive, flexible and secure working environments conceived to serve the needs of science [3]. They are expected to act as "facilitators" and "enablers" of research activities conducted according to cutting-edge science patterns. They play the role of "facilitators" by providing seamless access to the evolving wealth of resources (datasets, services, computing) - usually spread across many providers, including e-Infrastructures - needed to conduct a research activity. They play the role of "enablers" by providing scientists with state-of-the-art facilities for supporting scientific practices, e.g. sharing and publishing comprehensive research activities giving access to the real research products while scientists are working with them [1], automatically generating provenance, capturing accounting, managing quota, and supporting new forms of transparent peer review and collaboration through social networking. The development of such environments should be effective and sustainable in order to actually embrace and support research community efforts. Ad hoc, from-scratch approaches are not suitable for the development and provision of such working environments because the overall costs (implementation, operation and maintenance) are neither affordable nor sustainable for every scientific community. This presentation discusses the experience gained in a series of initiatives and projects (e.g. D4Science and iMarine) enabling the creation and provisioning of Virtual Research Environments through the as-a-Service paradigm [2]. In particular, it presents the gCube technology, focusing on the mechanisms enabling the automatic creation and operation of VREs by relying on an extended resource space (comprising datasets, functionalities, services) built by aggregating constituents from existing infrastructures and information systems. This mechanism envisages a definition phase and a deployment phase. The definition phase is based on a wizard enabling users to specify the characteristics of the VRE they wish to enact in terms of the datasets to be offered and the services to be made available, by selecting them from a catalogue. In addition, the VRE designer can specify requests for service customisations (e.g. enabling/disabling features) as well as establish the policies that govern the VRE (e.g. whether it is public or by invitation). The overall goal of the definition phase is to be as easy and as short as possible by abstracting from technical details. The deployment phase is completely automatic and results in the delivery of a web-based environment ready to be used. During this phase the VRE specification, after approval by a Manager, is analysed and transformed into a deployment plan consisting of creating the software system and the secure application context needed to operate the VRE. This software system is created by instructing service instances to support the new VRE, by deploying new service instances dedicated to it, by allocating computing power, and by deploying services giving access to the datasets. All of this is done according to resource usage policies and by maximising the overall exploitation of the resources forming the resource space.
        Speaker: Pasquale Pagano (CNR)
    • 16:00 18:00
      Long tail of science: tools and services Sala D, Giulia centre

      Sala D, Giulia centre

      Villa Romanazzi Carducci

      Convener: Peter Solagna (EGI.eu)
      • 16:00
        Developing the computing environment for new research communities in Romania 20m
        An overview is presented of the implementation of the computing environment for new research communities served by the Romanian Grid Infrastructure. These include researchers involved in the Extreme Light Infrastructure – Nuclear Physics (ELI-NP) project, in computational biology, and in the physics of condensed matter and nanomaterials. The new infrastructure provides access to HTC and HPC resources through a single web portal which features tools for the definition of workflows, job submission and monitoring, data analysis and visualization, and access to third-party software. A multi-disciplinary instance of the DIRAC framework is also integrated and used for production and training. The infrastructure will support various research activities, such as the numerical investigation of the new processes generated by the interaction of nuclear matter with extreme electromagnetic fields at ELI-NP, the design of nanostructures relevant for the next generation of high-speed electronic devices, the modelling of various subcellular structures in bacteria, and drug design.
        Speaker: Dr Ionut Traian Vasile (IFIN-HH)
      • 16:20
        Long tail of Science platform 20m
        EGI enables researchers to access distributed resources and has recognised the need for simpler and more harmonised access to the distributed EGI infrastructure. This portal allows individual researchers and small research teams to be productive using EGI without barriers and without unnecessary overhead. The platform presented during the EGI conference will reach production during the summer, adding new capabilities by integrating more portals. In this session we will present the functionalities of the platform in production, the use cases supported, and how resource providers and users can benefit from it.
        Speaker: Peter Solagna (EGI.eu)
      • 16:40
        Opportunistic use of supercomputers: linking with Cloud and Grid platforms 20m
        Supercomputers are good candidates for opportunistic use, as a non-negligible fraction of the computing time may stay unused while waiting for a large number of cores to become available for a new parallel job. In practice this results in a typical occupancy below 90%, leaving an interesting 10% of computing time that could be used by short jobs requiring few cores. Supercomputer users, however, may not have such a need for short jobs, which are more common among users of Cloud or Grid computing platforms. We explore the possibility of automatic back-filling execution in a supercomputer of jobs prepared for a Cloud or a Grid platform (the basic back-filling decision is sketched after this entry). Different options for this integration are presented, as well as its implementation on a Top500 supercomputer for different applications. The experience of using this scheme to support the long tail of science in biodiversity, providing access to more than 100 users from 20 different countries around the world, will also be described.
        Speakers: Fernando Aguilar (CSIC), Luis Cabellos (CSIC)
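        The essence of the back-filling decision described above can be reduced to a small eligibility check: a short, few-core job is accepted only if it fits within the currently idle cores and the time window before the next large reservation. The numbers in the example below are illustrative, not measurements from the system.

        # Sketch of the back-filling decision for short cloud/grid jobs on a
        # supercomputer (job model and thresholds are illustrative).
        from dataclasses import dataclass


        @dataclass
        class ShortJob:
            cores: int
            walltime_h: float


        def can_backfill(job: ShortJob, idle_cores: int, hours_to_next_big_job: float) -> bool:
            return job.cores <= idle_cores and job.walltime_h <= hours_to_next_big_job


        # Example: a 4-core, 1-hour job against a 10% idle slice of a 1024-core
        # machine, with the next large reservation due in 2 hours.
        print(can_backfill(ShortJob(cores=4, walltime_h=1.0),
                           idle_cores=int(1024 * 0.10),
                           hours_to_next_big_job=2.0))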
      • 17:00
        Outreach strategies for the long tail of science in France 20m
        The research landscape in France is very scattered: there are many research units that depend on multiple research organisations or universities. The computing offer is also scattered, and France Grilles is one of the many stakeholders that work at local, regional, disciplinary or national level. « France Grilles aims at building and operating a multidisciplinary national Distributed Computing Infrastructure open to all sciences and to developing countries » is the French NGI vision. The major French scientific organisations have joined forces in France Grilles, which implies that all their scientists may use its services if needed. The question is: how to reach them? About 100,000 academic researchers work in France and 60,000 more people are involved in research organisations and universities. CNRS staff, for example, represents about 30,000 people working in more than 1,000 research units, most of which are shared with one or more other organisations. Looking for researchers who may need France Grilles resources and services therefore requires organisation. As computing is related to IT, our strategy mainly relies on IT people and on related initiatives and working groups: in almost all French research units or entities there is an IT team in charge of the information system, and this team is often in charge of the unit's computing resources and is close to the researchers' needs. The presentation will describe this context and our detailed strategy. We will explain how we got involved in professional networks, how we built fruitful relationships with other entities and how we benefit from platforms, tools and events. We will also present our communication and dissemination actions.
        Speaker: Romier Romier (CNRS)
      • 17:20
        The Catania Science Gateway of the EGI Pilot for the Long Tail of Science 20m
        This EGI Pilot for the Long Tail of Science [1] aims to design and prototype a new e-Infrastructure platform in EGI to simplify access to Grid and Cloud Computing services for the Long-Tail of Science (LToS), ie. those researchers and small research teams who work with large data, but have limited or no expertise in distributed systems. The project will establish a set of services integrated together and suited for the most frequent Grid and Cloud computing use cases of individual researchers and small groups. The INFN is involved in the LToS Pilot since its beginning and its responsibility is twofold: (i) to improve the Catania Science Gateway Framework (CSGF) [2], in order to fulfill the requirements of the Pilot, and (ii) deploy a Science Gateway for LToS users. In the last 6 months new features of the CSGF have indeed been implemented to better support diverse multi/inter-disciplinary Virtual Research Communities (VRCs) and allow scientists across the world to do better (and faster) research with an acceptable level of tracking of user activities and zero-barrier access to European ICT-based infrastructures. The most relevant improvements of the CSGF are: (i) the support for Per-User Sub-Proxies (PUSPs) [3] and (ii) the integration with the new EGI User Management Portal (UMP) for LToS researchers [4] developed by CYFRONET and based on Unity [5]. With the support for PUSPs, which add user-specific information to the CN proxy field, now it is possible to uniquely identify users that access ICT-based infrastructures using proxies issued by a common robot certificate. PUSPs are usually generated by the eTokenServer, a standard-based solution developed by INFN for central management of robot certificates and provisioning of proxies to get seamless and secure access to computing e-Infrastructures, based on local, Grid and Cloud middleware supporting the X.509 standard for authorization. The Authorisation and Authentication Infrastructure of the Catania Science Gateway Framework has been extended to support the OpenID-Connect protocol which is used by the EGI UMP to authenticate users. The approach followed by EGI with its UMP is to centralise the authorisation to access resources so only people holding an e-grant and with the right to perform computation and data access are authenticated and authorised. In this contribution we will present the new features of the CSGF, developed to support the LToS Pilot, and we will show some of the use cases already integrated in the Science Gateway dedicated to the project [5] which are seamlessly executed both on the EGI Grid and on the EGI Federated Cloud. Time permitting, a short demonstration will also be given.
        Speaker: Giuseppe La Rocca (INFN Catania)
      • 17:40
        The vo.france-grilles.fr VO: expertise mutualization and integrated services to serve the France Grilles long tail 20m
        Context: in 2011 France Grilles set up a national VO, named vo.france-grilles.fr, to welcome potential users from all disciplines if they work in France, in academic research or in partnership with a France Grilles member, and if there is no more relevant VO to welcome them. This VO is accepted by almost all grid sites in France and by the production cloud sites. It benefits from two main mutualised services: FG-DIRAC and FG-iRODS. French-language collaborative documentation is available online and user support is organised. This VO is currently in the top ten of VOs in terms of CPU time consumed according to the EGI accounting portal (20 million hours of normalised CPU time, or 0.43% of the total CPU time consumed on the EGI infrastructure during the year preceding 1 July 2015). The first part of the presentation will detail this context: why and how France Grilles engages with its long tail of scientists. We will show how the different mutualised services are integrated to facilitate the users' work and how the France Grilles team builds, step by step, a complete set of services. In the second part we will give figures and statistics showing the achievements, and we will conclude with a presentation of the next steps.
        Speaker: Romier Romier (CNRS)
    • 16:00 18:00
      Open Data workshop Europa

      Europa

      Villa Romanazzi Carducci

      Convener: Dr Lukasz Dutka (CYFRONET)
      • 16:00
        Data Repositories and Science Gateways for Open Science 20m
        The steep decrease in the cost of large-bandwidth Wide Area Networks has fostered in recent years the spread and uptake of the Grid Computing paradigm, and the distributed computing ecosystem has become even more complex with the recent emergence of Cloud Computing. All these developments have triggered the new concept of e-Infrastructures, which have been built over several years both in Europe and in the rest of the world to support diverse multi-/inter-disciplinary Virtual Research Communities (VRCs) and their Virtual Research Environments (VREs). E-Infrastructure components can indeed be key platforms to support the Scientific Method, the "knowledge path" followed every day by scientists since Galileo Galilei. Distributed Computing and Storage Infrastructures (local High Performance/Throughput Computing resources, Grids, Clouds, long-term data preservation services) are ideal both for the creation of new datasets and for the analysis of existing ones, while Data Infrastructures (including Open Access Document Repositories - OADRs - and Data Repositories - DRs) are also essential to evaluate existing data and annotate them with the results of the analysis of new data produced by experiments and/or simulations. Last but not least, Semantic Web based enrichment of data is key to correlating documents and data, allowing scientists to discover new knowledge in an easy way. However, although great efforts have been made in recent years, both at the technological and at the political level, Open Access and Open Education are still far from being pervasive and ubiquitous, which prevents Open Science from being fully established. One of the main drawbacks of this situation is the limiting effect it has on the reproducibility and extensibility of science outputs, which have been, for more than four centuries, two fundamental pillars of the Scientific Method. In this contribution we present the Open Access Repository (OAR), a pilot data preservation repository for the products (publications, software, data, etc.) of INFN and other Italian research organisations, meant to serve both researchers and citizen scientists and to be interoperable with other related initiatives both in Italy and abroad. OAR is powered by the INVENIO software, conforms to the Open Access Initiative and is an official OpenDOAR data provider, able to automatically harvest resources from different sources, including the Sponsoring Consortium for Open Access Publishing in Particle Physics (SCOAP3), using RESTful APIs. It is also one of the official OpenAIRE archives, compliant with version 3.0 of its guidelines. OAR allows SAML-based federated authentication and is one of the Service Providers of the eduGAIN inter-federation; it is also connected to DataCite for the issuance and registration of Digital Object Identifiers (DOIs). What makes OAR really different from other repositories is its capability to connect to Science Gateways and exploit Distributed Computing and Storage Infrastructures worldwide, including those of EGI and EUDAT, to easily reproduce and extend scientific analyses. In this presentation some concrete examples related to the data of the ALEPH and ALICE experiments will be shown. A generic harvesting sketch based on OAI-PMH, the protocol underlying the OpenAIRE guidelines, follows this entry.
        Speaker: Roberto Barbera (University of Catania and INFN)
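        The sketch below shows a generic OAI-PMH ListRecords harvest, the protocol on which the OpenAIRE guidelines mentioned above are based. The endpoint URL is a placeholder and the snippet does not handle resumption tokens; it only shows the shape of such an interaction, not OAR's actual harvesting code.

        # Generic OAI-PMH harvesting sketch (placeholder endpoint, no resumption tokens).
        import xml.etree.ElementTree as ET
        import requests

        OAI_ENDPOINT = "https://example.org/oai2d"          # placeholder endpoint
        OAI = "{http://www.openarchives.org/OAI/2.0/}"
        DC = "{http://purl.org/dc/elements/1.1/}"

        resp = requests.get(OAI_ENDPOINT,
                            params={"verb": "ListRecords", "metadataPrefix": "oai_dc"},
                            timeout=30)
        resp.raise_for_status()

        root = ET.fromstring(resp.content)
        for record in root.iter(f"{OAI}record"):
            titles = [t.text for t in record.iter(f"{DC}title")]
            ids = [i.text for i in record.iter(f"{OAI}identifier")]
            print(ids[0] if ids else "?", "-", titles[0] if titles else "(no title)")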
      • 16:20
        Digital Knowledge Platforms 20m
        The concept of Digital Knowledge Platforms (DKPs) as a framework to support the full data cycle in research is presented. DKPs extend existing ideas in Data Management, first of all by providing a framework to exploit the power of ontologies at different levels. DKPs aim to preserve knowledge explicitly, starting with the description of the Case Studies, and integrating data and software management and preservation on an equal basis. The uninterrupted support chain starts at the data acquisition level and extends to support for reuse and publication in an open framework, providing integrity and provenance controls. A first prototype, developed for a LifeWatch pilot project with different commercial companies using only open source software, will be described and compared to existing solutions from other research areas. The issues involved in implementing these platforms using cloud resources, and in particular FedCloud resources, will be discussed.
        Speaker: Jesus Marco de Lucas (CSIC)
      • 16:40
        Maximising uptake by opening access to research: The BlueBRIDGE endeavour 20m
        Open Science is emerging as a force that by democratizing access to research and its products will produce advantages for the society, economy and the research system, e.g. "more reliable" and efficient science, faster and wider innovation, societal challenges-driven science. BlueBRIDGE is a European funded project realizing the Open Science modus operandi in the context of Blue Growth Societal Challenge. The overall objective of this project, starting from September ’15 and running over a 30 months timeline, is to support capacity building in interdisciplinary research communities. These communities are principally involved in increasing scientific knowledge on marine resource overexploitation, degraded environment and ecosystem. Their aim is to provide advices to competent authorities and to enlarge the spectrum of economic growth opportunities. BlueBRIDGE will implement and operate a set of Virtual Research Environments (VREs) facilitating communities of scientists from different domains (e.g. fisheries, biology, economics, statistics, environment, mathematics, social sciences, natural sciences, computer science) to collaborate in their knowledge production chain, from the initial phases, data collection and aggregation, to the production of indicators. These communities involve EU and International world-renowned leading institutions (e.g. ICES, IRD, FAO, UNEP) that provide informed advice on sustainable use of marine resources to their member countries. Furthermore, the communities also include relevant Commissions of international organizations, national academic institutions and small and medium enterprises (SMEs). VREs are innovative, web-based, community-oriented, comprehensive, flexible, and secure working environments conceived to serve the needs of science [2]. They are expected to act like "facilitators" and "enablers" of research activities conducted according to open science patterns. They play the role of "facilitators" by providing seamless access to the evolving wealth of resources (datasets, services, computing) - usually spread across many providers including e-Infrastructures - needed to conduct a research activity. They play the role of "enablers", by providing scientists with state-of-the-art facilities supporting open science practices [1]: sharing, publishing, and reproducing comprehensive research activities; giving access to research products while scientists are working with them; automatically generating provenance; capturing accounting; managing quota; supporting new forms of transparent peer-reviews and collaborations by social networking. The development of such environments should be effective and sustainable to actually embrace and support research community efforts. In this presentation, we described the set of VREs that will be developed and operated to serve four main scenarios of BlueBRIDGE: (i) Blue assessment; supporting the collaborative production of scientific knowledge required for assessing the status of fish stocks and producing a global record of stocks and fisheries, (ii) Blue economy; supporting the production of scientific knowledge for analysing socio-economic performance in aquaculture, (iii) Blue environment; supporting the production of scientific knowledge for fisheries & habitat degradation monitoring, and (iv) Blue skills; boosting education and knowledge bridging between research and innovation, in the area of protection and management of marine resources. 
BlueBRIDGE builds on the D4Science infrastructure and the gCube technology to operate the VREs by aggregating the needed data, software and services.
        Speaker: Dr Gianpaolo Coro (CNR)
      • 17:00
        Storage Management in INDIGO 20m
        The INDIGO DataCloud project sets out to develop a data/computing platform targeting scientific communities, to enhance the usefulness of existing e-infrastructures [1]. The storage developments focus on two levels. On the IaaS level, QoS in storage will be addressed by implementing a standardised extension to the CDMI standard, which enables the management of storage quality, e.g. access latency, retention policy, migration strategy or data lifecycle. This is closely related to intelligent identity management to harmonise access via different protocols, such as GridFTP, SFTP and CDMI, and will make it possible to use CDMI to manage the QoS of data that is accessible via GridFTP or SFTP. On the PaaS level, INDIGO DataCloud will provide flexible data federation functionality, enabling users to transparently store and access their data across heterogeneous infrastructures. It will provide unified APIs for data management based on state-of-the-art standards, allowing both users and application developers to easily integrate this high-level data management functionality into their use cases. One of the key features of this solution will be the optimisation of data access in various scenarios, including automatic pre-staging, maximum bandwidth usage via parallel transfers, and instant access to remote data through streaming. Furthermore, the layer will provide information to Cloud orchestration services, allowing computations to be placed at the sites where the data is already staged or where it can be delivered efficiently. [1] https://www.indigo-datacloud.eu/
        Speakers: Marcus Hardt (KIT-G), Patrick Fuhrmann (DESY)
      • 17:20
        Wikidata - structured information for Wikipedia and what this means for research workflows 20m
        There have been several attempts at bringing structured information together with Wikipedia. The most recent one is Wikidata, a MediaWiki-based collaborative platform that uses dedicated MediaWiki extensions (Wikibase Repository and Wikibase Client) to handle semantic information. It started out in late 2012 as a centralised way to curate the information as to which Wikipedia languages have articles about which semantic concepts. Since then, it has been continuously expanding in scope, and it now has structured information about some 14 million items, including some that do not have articles in any Wikipedia. It is not only the content that is growing in extent and usefulness; the contributor community is expanding too, making Wikidata one of the most active Wikimedia projects. The use of Wikidata in research contexts has only begun to be explored. For instance, there are Wikidata items about all human genes, and they have been annotated with information about their interaction with other genes, with drugs and diseases, as well as with references to pertinent databases or the relevant literature. Another example is the epigraphy community, which uses Wikibase to collect information about stone inscriptions and papyri. In this talk, I will outline different ways in which Wikidata and/or Wikibase can and do interact with research workflows, as well as opportunities to expand these interactions in the future, especially in the context of open science and citizen science projects. A minimal example of retrieving structured data from Wikidata follows this entry.
        Speaker: Daniel Mietchen (National Institute of Health (US))
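        As a minimal illustration of programmatic access to the structured data discussed above, the snippet below fetches one item (Q42, Douglas Adams) through the public MediaWiki API of Wikidata and prints its English label and the number of properties carrying statements.

        # Fetch structured data for a Wikidata item via the public MediaWiki API.
        import requests

        resp = requests.get(
            "https://www.wikidata.org/w/api.php",
            params={
                "action": "wbgetentities",
                "ids": "Q42",
                "props": "labels|claims",
                "languages": "en",
                "format": "json",
            },
            timeout=30,
        )
        entity = resp.json()["entities"]["Q42"]

        print(entity["labels"]["en"]["value"])          # English label of the item
        print(len(entity["claims"]), "properties with statements on this item")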
    • 16:00 18:00
      Tutorial: Programming Distributed Computing Platforms with COMPSs Federico II

      Federico II

      Villa Romanazzi Carducci

      Convener: Daniele Lezzi (Barcelona Supercomputing Center)
      • 16:00
        Programming Distributed Computing Platforms with COMPSs 2h
        Distributed computing platforms like clusters, grids and clouds pose a challenge to application developers due to issues such as distributed storage systems, complex middleware and geographic distribution. COMPSs [1] is a programming model that is able to exploit the inherent concurrency of sequential applications and execute them on distributed computing platforms in a manner transparent to the application developer. This is achieved by annotating parts of the code as tasks and building, at execution time, a task-dependency graph based on the actual data consumed/produced by the tasks. The COMPSs runtime is able to schedule the tasks on the computing nodes, taking into account factors like data locality and the different nature of the computing nodes in the case of heterogeneous platforms. Additionally, COMPSs has recently been enhanced with the possibility of coordinating Web Services as part of the applications and has been extended on top of big data storage architectures. The course will cover the syntax, the programming methodology and an overview of the runtime internals. The attendees will get a first lesson about programming with COMPSs that will enable them to start programming with this framework. The attendees will analyse several examples of the COMPSs programming model compared with other programming models, such as Apache Spark, and also examples of porting libraries and codes to this framework. Different programming languages will be used, including Java and Python, whose adoption for scientific computing has been gaining momentum in recent years [2]. A hands-on session with simple introductory exercises will also be held. The participants will be able to develop simple COMPSs applications and to run them in the EGI Federated Cloud testbed. COMPSs is available in the EGI Cloud Marketplace as a solution [3] for the integration of applications (use cases from the BioVeL, LOFAR and EUBrazilCC communities) in the federated cloud environment, providing scalability and elasticity features. A minimal PyCOMPSs-style task example follows this entry.
        Speaker: Daniele Lezzi (Barcelona Supercomputing Center)
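        A minimal PyCOMPSs-style example is sketched below: a function is annotated as a task and the runtime resolves the futures when results are needed. It assumes a working COMPSs installation (the script would be launched with runcompss) and is only a first-orientation sketch, not part of the tutorial material.

        # Minimal PyCOMPSs-style example: tasks declared with a decorator, with the
        # dependency graph built by the runtime at execution time.
        from pycompss.api.task import task
        from pycompss.api.api import compss_wait_on


        @task(returns=int)
        def square(x):
            return x * x


        def main():
            futures = [square(i) for i in range(10)]   # tasks may run concurrently
            results = compss_wait_on(futures)          # synchronise on the results
            print(sum(results))


        if __name__ == "__main__":
            main()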
    • 09:00 10:30
      Current status and evolution of the EGI AAI Scuderia

      Scuderia

      Villa Romanazzi Carducci

      Convener: Peter Solagna (EGI.eu)
      • 09:00
        Comparison of Authentication and Authorization e-Infrastructures for Research 20m
        Today there exist several international e-Infrastructures that were built to address the federated identity management needs of research and education in Europe as well as the rest of the world. While some of these e-Infrastructures were specifically built for particular groups of research communities (DARIAH, ELIXIR AAI, CLARIN SPF), others were built with a more general target group in mind. The second group includes eduGAIN, EGI, EUDAT, Moonshot and to some extent also Stork. All of these "general-purpose" e-Infrastructures are international or even global. They differ in characteristics, coverage, governance and technology even though they all share the same goal: Provide an infrastructure to facilitate the secure exchange of trusted identity data for authentication and authorization. As most of these five e-Infrastructures use different and quite complex technologies, it is often difficult to know and understand even the basic concepts they are based on. Even operators of one particular e-Infrastructure often don't know sufficiently about the technical mechanisms, the policies and the needs of the main users of the other e-Infrastructures. Having a good overview about the different e-Infrastructures in these regards is even more difficult for research communities that are about to decide which and how to use one of these e-Infrastructures for their own purposes. It is thus no surprise that it is hard for research communities to learn and understand the differences and commonalities of these e-Infrastructures. Therefore, the presentation aims at shedding light on the uncharted world of e-Infrastructures by providing a comprehensive and objective overview about them. It will cover the differences, coverage, advantages, known limitations, as well as the overlaps and opportunities for interoperability. The presentation will be based on an e-Infrastructure comparison document that is written by the GÉANT project in collaboration with and involvement from the described e-Infrastructures that will be invited to contribute to the document.
        Speaker: Lukas Haemmerle (SWITCH)
    • 09:00 10:30
      Demand Of Data Science Skills & Competences Sala D, Giulia Centre

      Sala D, Giulia Centre

      Villa Romanazzi Carducci

      Convener: Prof. Matthias Hemmje (FernUniversität in Hagen)
    • 09:00 10:30
      EGI Council meeting (closed) Europa

      Europa

      Villa Romanazzi Carducci

      Convener: Yannick Legre (EGI.eu)
    • 09:00 10:30
      Tutorial: Hybrid Data Infrastructures: D4Science as a case study Federico II

      Federico II

      Villa Romanazzi Carducci

      Conveners: Pasquale Pagano (CNR), Gianpaolo Coro (CNR)
      • 09:00
        A Tutorial on Hybrid Data Infrastructures: D4Science as a case study 1h 30m
        An e-Infrastructure is a distributed network of service nodes, residing on multiple sites and managed by one or more organizations, that allows scientists at distant locations to collaborate. E-Infrastructures may offer a multiplicity of facilities as-a-service, supporting data sharing and usage at different levels of abstraction, and can have different implementations (Andronico et al. 2011). A major distinction is between (i) Data e-Infrastructures, i.e. digital infrastructures promoting data sharing and consumption within a community of practice (e.g. MyOcean, Blanc 2008), and (ii) Computational e-Infrastructures, which support the processes required by a community of practice using Grid and Cloud computing facilities (e.g. Candela et al. 2013). A more recent type of e-Infrastructure is the Hybrid Data Infrastructure (HDI) (Candela et al. 2010), i.e. a Data and Computational e-Infrastructure that adopts a delivery model for data management in which computing, storage, data and software are made available as-a-service. HDIs support, for example, data transfer, data harmonization and data processing workflows. Hybrid Data e-Infrastructures have already been used in several European and international projects (e.g. i-Marine 2011; EuBrazil OpenBio 2011) and their exploitation is growing fast, supporting new projects and initiatives, e.g. Parthenos, Ariadne, Descramble. A particular HDI, named D4Science (Candela et al. 2009), has been used by communities of practice in the fields of biodiversity conservation, geothermal energy monitoring, fisheries management and cultural heritage. This e-Infrastructure hosts models and resources by several international organizations involved in these fields. Its capabilities help scientists to access and manage data, reuse data and models, obtain results in a short time and share these results with colleagues. In this tutorial, we will give an overview of the D4Science capabilities; in particular, we will show practices and methods that large international organizations such as FAO and UNESCO apply by means of D4Science. At the same time, we will explain how the D4Science facilities conform to the concepts of e-Infrastructures, Virtual Research Environments (VREs), data sharing and experiment reproducibility. The tutorial will also give insight into how D4Science contributors can add new models and algorithms to the processing platform. D4Science adopts methods to embed software developed by communities of practice involving people with limited expertise in computer science. Community software involves legacy programs (e.g. written in Fortran 90) as well as R scripts developed under different operating systems and versions of the R interpreter. D4Science is able to manage this multi-language scenario in its Cloud computing platform (Coro et al. 2014). Finally, D4Science uses the EGI Federated Cloud (FedCloud) infrastructure for data processing: computations are parallelized by dividing the input into several chunks, and each chunk is sent for processing to D4Science services residing on FedCloud (Generic Workers). Furthermore, another D4Science service executing data mining algorithms (DataMiner) also resides on FedCloud and adopts an interface compliant with the Web Processing Service (WPS, Schut and Whiteside 2015) specification. (A hedged WPS client sketch follows this abstract.)
        Speakers: Pasquale Pagano (CNR), Gianpaolo Coro (CNR)
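        As a hedged illustration of the WPS-compliant interface mentioned in the abstract, the sketch below queries a generic WPS endpoint from Python with the OWSLib library; the endpoint URL, process identifier and inputs are placeholders, not necessarily those exposed by DataMiner, and the token-based authentication required by D4Science is not shown:

        ```python
        # Sketch of a client call against a WPS-compliant service using OWSLib.
        # The endpoint, process identifier and inputs are placeholders; consult the
        # D4Science/DataMiner documentation for real values and authentication.
        from owslib.wps import WebProcessingService, monitorExecution

        WPS_URL = "https://example.org/wps/WebProcessingService"  # placeholder endpoint

        wps = WebProcessingService(WPS_URL)
        wps.getcapabilities()
        for process in wps.processes:
            print(process.identifier, "-", process.title)

        # Execute one process with a simple key/value input list (placeholder names).
        execution = wps.execute("org.example.SomeAlgorithm",
                                inputs=[("InputTable", "http://example.org/data.csv")])
        monitorExecution(execution, sleepSecs=10)

        print("status:", execution.status)
        for output in execution.processOutputs:
            print("output:", output.identifier, output.reference)
        ```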
    • 10:30 11:00
      Coffee break
    • 11:00 12:30
      Academic Supply For Data Science Sala D

      Sala D

      Villa Romanazzi Carducci

      Via G. Capruzzi, 326 70124 Bari Italy
      Convener: Prof. Matthias Hemmje (FernUniversität in Hagen)
    • 11:00 12:30
      Current status and evolution of the EGI AAI Scuderia

      Scuderia

      Villa Romanazzi Carducci

      Convener: Peter Solagna (EGI.eu)
      • 11:00
        Enhancing the user life-cycle management in EGI Federated Cloud 20m
        The basic FedCloud user access scenario is composed of a VOMS server for authentication and authorization, and the site itself. Users with valid VOMS credentials are automatically created on sites. This solution is easy to deploy and manage, but it has several drawbacks if the whole user life-cycle needs to be supported. In this presentation we introduce Perun as an additional component in the described scenario. Perun is the EGI core service for VO and group management, and it also provides functionality for managing access to services. It supports the whole user life-cycle, from user import and enrollment through user expiration and membership renewal to complete account deletion and deprovisioning from services. In addition, it supports linking multiple external identities (federated identities, X.509 certificates, Kerberos, …) to one user account. As part of its service management capabilities, Perun can propagate user accounts to both VOMS and sites. VOMS will still be used as the authentication and authorization service: user data is managed centrally and then distributed to VOMS. For example, users who want to change their certificate can do so in one place, even if they are members of several VOs. Active propagation of user data to sites enables users to change their preferences (e.g. contact e-mail) in one place, and the information is then distributed to all sites without any further action required from the users. More importantly, it enables sites to know about expired or suspended users and to take appropriate action, such as suspending or stopping their virtual machines. This substantially enhances the security of Federated Cloud sites.
        Speaker: Slavek Licehammer (CESNET)
      • 11:20
        The Indigo AAI 20m
        The INDIGO project [1] set out to develop a data and computing platform targeting scientific communities, deployable on multiple hardware platforms and provisioned over hybrid e-infrastructures. This includes the delegation of access tokens to a multitude of (orchestrated) virtual machines or containers, as well as the authentication of REST calls from and to VMs and other parts of the infrastructure. We introduce different tokens for delegation and tokens for accessing services directly. In this contribution we describe: the INDIGO approach to token handling (delegation tokens, access tokens, ...); token translation to support SAML, X.509 and more, on the client and server side; the plan to include support for VO-managed groups; and our approach to providing a more fine-grained limitation of delegated access tokens. (A generic bearer-token illustration follows this abstract.) [1] https://www.indigo-datacloud.eu/
        Speakers: Andrea Ceccanti (INFN), Marcus Hardt (KIT-G)
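        As a generic illustration of how an access token can authenticate a REST call (a hedged sketch only, not the INDIGO AAI implementation; the endpoint and token are placeholders), a client typically attaches the token as an OAuth2 bearer header:

        ```python
        # Generic bearer-token authentication of a REST call, illustrating the
        # access-token usage discussed above. Endpoint and token are placeholders;
        # this is not the INDIGO implementation.
        import requests

        ACCESS_TOKEN = "<access-token-obtained-from-the-AAI>"          # placeholder
        SERVICE_URL = "https://service.example.org/api/v1/resources"   # placeholder

        response = requests.get(
            SERVICE_URL,
            headers={"Authorization": "Bearer " + ACCESS_TOKEN},  # standard OAuth2 bearer header
            timeout=30,
        )
        response.raise_for_status()
        print(response.json())
        ```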
    • 11:00 12:30
      EGI Council meeting (closed) Europa

      Europa

      Villa Romanazzi Carducci

      Convener: Yannick Legre (EGI.eu)
    • 11:00 12:30
      Tutorial: Security training Federico II

      Federico II

      Villa Romanazzi Carducci

      Convener: Dr Sven Gabriel (NIKHEF)
      • 11:00
        Security training 1h 30m
        Cyber attacks have become ubiquitous and attackers are targeting a wide range of services on the Internet. Resources involved in EGI are no exception and are constantly probed by attackers launching massive attacks that strive to find vulnerable machines anywhere. Successful attacks cause additional harm, including damage to the reputation of institutions and of EGI. Therefore, EGI as well as service and machine operators have to be prepared to provide proper incident response and make sure security incidents are handled correctly. The training session will demonstrate how easy it is to perform a cyber attack against a site. The attendees will be walked through a live scenario that shows basic offensive principles and techniques. The session will then focus on how to provide a proper response to an incident. The target audience for the training are cloud providers, owners of virtual machines and maintainers of their images.
        Speaker: Dr Sven Gabriel (NIKHEF)
    • 12:30 13:30
      Lunch
    • 13:30 15:00
      Astronomy and astrophysical large experiments and e-infrastructure - new frontiers Scuderia

      Scuderia

      Villa Romanazzi Carducci

      Convener: Giuliano Taffoni (INAF)
      • 13:30
        Cherenkov Telescope Array data processing: a production system prototype 20m
        The Cherenkov Telescope Array (CTA) - a proposed array of many tens of Imaging Atmospheric Cherenkov Telescopes - will be the next-generation instrument in the field of very high energy gamma-ray astronomy. CTA will operate as an open observatory providing data products and analysis tools to the entire scientific community. An average data stream of about 1 GB/s, for approximately 2000 hours of observation per year, is expected to produce several PB of data per year. A large amount of CPU time will be required for data processing as well as for the massive Monte Carlo simulations used to derive the instrument response functions. The current CTA computing model is based on a distributed infrastructure for the archive and for off-line data processing. In order to manage the off-line data processing in a distributed environment, CTA has evaluated the DIRAC (Distributed Infrastructure with Remote Agent Control) system, a general framework for the management of tasks over distributed heterogeneous computing environments. For this purpose, a production system prototype has been developed, based on the two main DIRAC components, i.e. the Workload Management and Data Management Systems. This production system has been successfully used in three massive Monte Carlo simulation campaigns to characterize the telescope site candidates, different array layouts and the camera electronics configurations. Results of the DIRAC evaluation will be presented as well as the future development plans. In particular, these include further automation of high-level production tasks as well as the proposed implementation of interfaces between the DIRAC Workload Management System and the CTA Archive and Pipeline Systems, currently under development. (A generic DIRAC job-submission sketch follows this abstract.)
        Speaker: Arrabito (LUPM CNRS/IN2P3)
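        For context, the sketch below shows how a simulation task is typically described and submitted through DIRAC's generic Python job API (a hedged, generic example: it is not the CTA production system's code, it requires a DIRAC client installation and a valid proxy, and the executable, sandbox files and parameters are placeholders):

        ```python
        # Generic DIRAC job-submission sketch (not the CTA production system itself).
        from DIRAC.Core.Base import Script
        Script.parseCommandLine()  # initialise the DIRAC client environment

        from DIRAC.Interfaces.API.Dirac import Dirac
        from DIRAC.Interfaces.API.Job import Job

        job = Job()
        job.setName("mc_simulation_example")
        job.setExecutable("run_simulation.sh", arguments="--seed 12345")  # placeholder script
        job.setInputSandbox(["run_simulation.sh"])
        job.setOutputSandbox(["std.out", "std.err"])
        job.setCPUTime(86400)  # requested CPU time in seconds

        dirac = Dirac()
        result = dirac.submitJob(job)
        if result["OK"]:
            print("Submitted job with ID:", result["Value"])
        else:
            print("Submission failed:", result["Message"])
        ```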
    • 13:30 15:00
      EGI Council meeting (closed) Europa

      Europa

      Villa Romanazzi Carducci

      Via G. Capruzzi, 326 70124 Bari Italy
      Convener: Yannick Legre (EGI.eu)
    • 13:30 15:00
      Tutorial: EUDAT infrastructure Federico II

      Federico II

      Villa Romanazzi Carducci

      Convener: René van Horik (DANS - Data Archiving and Networked Services)
      • 13:30
        Training workshop: “The EUDAT infrastructure: Foundation, utilisation and implementation” 1h 30m
        EUDAT is a collaborative pan-European infrastructure providing research data services, training and consultancy for researchers, research communities, research infrastructures and data centers. The aim of this training workshop is to present and discuss the research data services of the EUDAT infrastructure. First, the EUDAT services are positioned along a research data life cycle. Next, an example of a research community using the EUDAT services is presented. The last part of the workshop provides detailed information on a number of the services EUDAT provides. Convenor: René van Horik (DANS - Data Archiving and Networked Services). Programme:
        1. Introduction (30 minutes). Provides information on the EUDAT initiative and the structure of the workshop.
        2. The research data life cycle and the EUDAT offer (60 minutes). The increasing importance of research data is acknowledged by all scientific disciplines. Research data must be managed properly, and this requires services that enable the creation, analysis, storage and reuse of research data. Based on a research data life-cycle model, an overview is given of how EUDAT supports the research data life cycle. This presentation is aimed at researchers (and research communities) as well as research infrastructures that would like to learn what EUDAT has to offer.
        3. Use cases: examples from practice (30 minutes per use case; number of use cases to be decided). How are the EUDAT services used in practice? A number of examples are presented. This presentation is aimed at research communities that are looking for research data management services; they will learn how other researchers are using EUDAT.
        4. Under the hood: details of EUDAT services (30 minutes per service; number of services to be decided). This presentation provides detailed technical information on a number of the EUDAT services. Attention is given to ways of integrating the services in existing infrastructures, e.g. implementation of PIDs, AAI, etc. This presentation is aimed at developers with a sound technical background.
        Speaker: René van Horik (DANS - Data Archiving and Networked Services)
    • 15:00 15:30
      Coffee break
    • 15:30 17:00
      Advances in the computational chemistry and material science field Sala D, Giulia centre

      Sala D, Giulia centre

      Villa Romanazzi Carducci

      Convener: Antonio Lagana (UNIPG)
    • 15:30 17:00
      Community clouds Scuderia

      Scuderia

      Villa Romanazzi Carducci

      Convener: Dr Enol Fernandez (EGI.eu)
      • 15:30
        Linking EUBrazilCloudConnect and EGI Federated Cloud 20m
        EUBrazilCloudConnect (EUBrazilCC) - EU-Brazil Cloud infrastructure Connecting federated resources for Scientific Advancement (2013-2015) - aims to develop a state-of-the-art cloud computing environment that efficiently and cost-effectively exploits the computational, communication and data resources in both the EU and Brazil, with selected interoperable and user-centric interfaces that support complex workflows and access to huge datasets. EUBrazilCC strongly focuses on interoperability. It has adopted mainstream cloud standards and integrates with different EGI services at the level of the infrastructure, the platform components and the use cases. Regarding the infrastructure, UFCG has developed fogbow, a lightweight federation middleware for on-premise cloud providers. Fogbow's API implements an extension of the OCCI standard. To create a VM in a fogbow federation, a client issues a request with the resource specification (e.g. VM flavour, image, requirements, etc.) and receives a handle for this request. Eventually the request is fulfilled and the client can use the handle to retrieve the information needed to access the VM (e.g. its IP address); an illustrative sketch of this request/poll interaction follows this abstract. In this way, fogbow can be used to deploy VMs across multiple EGI Federated Cloud sites. Fogbow can also make use of vmcatcher to prefetch VMIs registered in the EGI AppDB. EUBrazilCC uses VOMS for authorisation and has registered a VO in the EGI databases (eubrazilcc.eu); all EUBrazilCC services use VOMS for authentication. EUBrazilCC incorporates several tools for the brokering of resources and the deployment of customised virtual appliances. Among those tools, two are already used within the EGI Federated Cloud, the Infrastructure Manager (IM) and COMPSs, providing seamless integration of both infrastructures. In this way, IM can be used to deploy and install the same configuration in different infrastructures, using the same configuration specification and based on a common basic instance. COMPSs can also elastically deploy a virtual infrastructure, adapting the number of resources to the actual computational load, and run the same workload in hybrid environments composed of public and private providers, provided that compatible VMIs are available in the target infrastructure. Finally, interoperability is also pursued at the level of the applications: EUBrazilCC will register the VMIs of its applications in the EGI AppDB. Currently, there are VMIs for the Leishmaniasis Virtual Lab and the eScience Central workflow engine that uses it, as well as for COMPSs and for the mc2 platform for developing scientific gateways. All of them can be deployed in the EGI Federated Cloud.
        Speaker: Dr Ignacio Blanquer (UPVLC)
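        The request/handle/poll interaction described in the abstract can be illustrated with the hypothetical client sketch below; the endpoint paths, header names and JSON fields are invented for illustration and are not fogbow's actual OCCI-extension API:

        ```python
        # Hypothetical illustration of the asynchronous request/handle/poll pattern
        # described above. Endpoints, headers and fields are invented placeholders;
        # they are NOT fogbow's actual API.
        import time

        import requests

        BROKER = "https://federation.example.org"        # placeholder federation endpoint
        AUTH = {"X-Auth-Token": "<federation-token>"}    # placeholder credential

        # 1. Submit a request with the resource specification and keep the handle.
        spec = {"flavour": "small", "image": "ubuntu-14.04", "requirements": "location!=siteA"}
        handle = requests.post(BROKER + "/requests", json=spec, headers=AUTH).json()["id"]

        # 2. Poll the handle until the federation fulfils the request.
        while True:
            status = requests.get(BROKER + "/requests/" + handle, headers=AUTH).json()
            if status["state"] == "fulfilled":
                break
            time.sleep(30)

        # 3. Use the handle to retrieve the information needed to reach the VM.
        print("VM address:", status["instance"]["ip"])
        ```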
      • 15:50
        Setting up a new FedCloud site in collaboration with the industry 20m
        Doñana National Park is a natural reserve in the south of Spain whose biodiversity is unique in Europe and which is listed as a UNESCO World Heritage Site. The importance of this place requires an infrastructure capable of providing environmental data at different scales, available online, to support the monitoring of environmental changes in the short, mid and long term. Supported by European FEDER funds and the Spanish Ministry, the Doñana Biological Station, the institute that manages research in Doñana, is carrying out different actions to improve and adapt the internationalization of the e-infrastructure for the LifeWatch ESFRI. Within these actions, different companies are working to deploy a cloud-based computing site integrated with the EGI FedCloud. The deployment of this site is organized into four tasks, each focused on features that add value to the site as a reference for LifeWatch ICT: • Set-up of the needed infrastructure: installation of the servers and packages needed to support a cloud system based on OpenStack and compatible with the EGI FedCloud. • Distributed control: this task adds new features for LifeWatch managers and makes all the resources easily available and manageable: monitoring, accounting, deployment of new services, SLA management… • Collaborative environments: a user-oriented task to make the resources available to the final user through higher abstraction layers: PaaS, SaaS, WaaS (Workflow as a Service), etc. • Data preservation: this set of features makes the resources very data-oriented and allows users to manage the whole data lifecycle. This presentation will show the collaboration with industry in the deployment of the new EGI FedCloud site, as well as the features added and the cloud-based tools used (or tested) for that purpose, such as OpenShift, Cloudify, Mesos and Kubernetes, and which solution has been adopted and why.
        Speaker: Fernando Aguilar (CSIC)
      • 16:10
        Volunteer Clouds for the LHC experiments 20m
        Volunteer computing remains a largely untapped opportunistic resource for the LHC experiments. The use of virtualization in this domain was pioneered by the Test4Theory project and enabled the running of high energy particle physics simulations on home computers. Recently the LHC experiments have been evaluating the use of volunteer computing to provide additional opportunistic resources for simulations. In this contribution we present an overview of this work and show how the model adopted is similar to the approach also used for exploiting resources from the EGI Federated Cloud.
        Speaker: Laurence Field (CERN)
    • 15:30 17:00
      EGI Council meeting (closed) Europa

      Europa

      Villa Romanazzi Carducci

      Via G. Capruzzi, 326 70124 Bari Italy
      Convener: Yannick Legre (EGI.eu)
    • 15:30 17:00
      Virtual Research Environments Federico II

      Federico II

      Villa Romanazzi Carducci

      Convener: Dr Gergely Sipos (EGI.eu)
      • 15:30
        Building Virtual Research Environments: The Lion Grid Experience 20m
        Research & Development (R&D) statistics are one of the key indices and an important component in measuring a country’s National Innovation System (NIS). The R&D landscape has changed considerably in the 21st century. Many countries are categorized as developed or developing based on their ability or inability to rise with the tide of research and technological advancement. Poor research funding, a chronic lack of research infrastructure, a lack of appreciation of research findings, scanty information on who is working on what, and a lack of collaboration remain the recurring issues that affect the development of research in developing countries. Research in the 21st century requires skills in the area of the 4Cs (critical thinking and problem solving, communication, collaboration, and creativity and innovation), all of which are addressed by Virtual Research Environments (VREs). A Virtual Research Environment is an online system that helps researchers to collaborate by providing access to e-infrastructures and tools for simulation, data analysis, visualization, etc. In 2011, we deployed the first-ever grid computing e-infrastructure in Nigeria, the Lion Grid, under the HP-UNESCO Brain Gain Initiative project. This led to the building of the first VRE in Nigeria. Our VRE database has grown, through workshops, demonstrations and training, to close to 500 members from heterogeneous research backgrounds. The project has developed applications for the local research community, deployed existing applications for its VRE, demonstrated the use of Science Gateways and identity providers, and trained hundreds of researchers and technical support staff. In this paper, we present our experiences, the prospects of the VRE and our future plans.
        Speakers: Dr Collins Udanor (Dept. of Computer Science, University of Nigeria Nsukka, Nigeria), Dr Florence Akaneme (Dept. of Plant Science & Biotechnology, University of Nigeria Nsukka, Nigeria.)
      • 15:50
        Energising Scientific Endeavour through Science Gateways and e-Infrastructures in Africa: the Sci-GaIA project 20m
        In African Communities of Practice (CoPs), international collaboration and the pursuit of scientific endeavour have faced a major barrier in the lack of access to the e-Infrastructures and high-performance network infrastructure enjoyed by European counterparts. With the AfricaConnect and the about-to-start AfricaConnect2 projects, and the regional developments carried out by both the Regional Research and Education Networks (RRENs) and the National Research and Education Networks (NRENs), this situation is changing rapidly. The “Teaming-up for exploiting e-Infrastructures' potential to boost RTDI in Africa” (eI4Africa) project clearly demonstrated that it is possible to develop e-Infrastructure services in Africa. It also demonstrated clearly that, as in the rest of the world, easy-to-use web portals, or Science Gateways, are needed to help CoPs access e-Infrastructure facilities easily and, through these, collaborate with CoPs across the world. However, a major problem exists: it is very difficult for non-experts to develop Science Gateways and to deploy and support e-Infrastructures. Elements of guides and supporting materials exist, but these are either written for different audiences or out of date. The EU-funded “Energising Scientific Endeavour through Science Gateways and e-Infrastructures in Africa” (Sci-GaIA) project started on 1 May 2015 for a duration of two years and proposes to bring these materials together into clearly structured guides and educational documents that can be used to train and support representatives of NRENs, CoPs and, importantly, universities to develop Science Gateways and e-Infrastructures in Africa. This will give a sustainable foundation on which African e-Infrastructures can be developed. Importantly, the results of the project will be usable by CoPs in Europe and the rest of the world. To achieve this, we bring together a highly experienced team of beneficiaries that have worked between Africa and Europe to advance African e-Infrastructures. The objectives of Sci-GaIA are: - to promote the uptake of Science Gateways and e-Infrastructures in Africa and beyond; - to support new and already emerging CoPs; - to strengthen and expand e-Infrastructure and Science Gateway related services; - to provide training, dissemination, communication and outreach. In this contribution we will present the consortium running Sci-GaIA, its workplan and the current status of the activities, with special focus on the tools and services deployed to support the development of e-Infrastructures in Africa and to train the CoPs working on that continent and collaborating with their counterparts in Europe. Opportunities for participation and collaboration for EGI-related communities will also be outlined and discussed.
        Speaker: Roberto Barbera (University of Catania and INFN)
      • 16:10
        Next generation Science Gateways in the context of the INDIGO project: a pilot case on large scale climate change data analytics 20m
        The INDIGO project aims at developing a data/computing platform targeted at scientific communities, deployable on multiple hardware platforms and provisioned over hybrid e-Infrastructures. This platform features contributions from leading European distributed resource providers, developers, and users from various Virtual Research Communities (VRCs). INDIGO aims to develop tools and platforms based on open source solutions addressing scientific challenges in Grid, Cloud and HPC/local infrastructures and, in the case of Cloud platforms, providing the PaaS and SaaS solutions that are currently lacking for e-Science. INDIGO will also develop a flexible and modular presentation layer connected to the underlying IaaS and PaaS frameworks, thus allowing innovative user experiences, including web/desktop applications and mobile appliances. INDIGO covers complementary aspects such as VRC support, software lifecycle management and developer support, virtualized resource provisioning (IaaS), implementation of a PaaS layer and, at the top level, provisioning of Science Gateways, mobile appliances and APIs to enable a SaaS layer. INDIGO adopts the Catania Science Gateway Framework (CSGF) as the presentation layer for end users. The CSGF is a standards-based solution that, by exploiting well-consolidated standards such as OCCI, SAGA and SAML, is capable of targeting any distributed computing infrastructure, while also providing a solution for mobile appliances. In the context of INDIGO, the CSGF will be completely re-engineered in order to include additional standards, such as CDMI and TOSCA, and to be exposed as a set of APIs. This paper presents an early use case examined by the project from the final users' perspective, interfacing INDIGO-targeted resources through a preliminary web interface, provided by a Science Gateway, which hides the complexities of the underlying services/systems. This use case relates to the climate change domain and community (European Network for Earth System modelling - ENES) and tackles large-scale data analytics requirements related to the CMIP5 experiment, more specifically anomaly analysis, trend analysis and climate change signal analysis. It demonstrates the INDIGO capabilities in terms of a software framework deployed on heterogeneous infrastructures (e.g., HPC clusters and cloud environments), as well as workflow support to run distributed, parallel data analyses. While general-purpose WfMSs (e.g., Kepler, Taverna) are exploited in this use case to orchestrate multi-site tasks, the Ophidia framework is adopted at the single-site level to run scientific data analytics workflows consisting of tens/hundreds of data processing, analysis and visualization operators. The contribution will highlight: (i) the interoperability with the already existing community-based software ecosystem and infrastructure (IS-ENES/ESGF); (ii) the adoption of workflow management system solutions (both coarse- and fine-grained) for large-scale climate data analysis; (iii) the exploitation of Cloud technologies offering easy-to-deploy, flexible, isolated and dynamic big data analysis solutions; and (iv) the provisioning of interfaces, toolkits and libraries to develop high-level interfaces/applications integrated in a Science Gateway. The presentation will also include a discussion on how INDIGO services have been designed to fulfil the requirements of many diverse VRCs.
        Speaker: Roberto Barbera (University of Catania and INFN)
      • 16:30
        Peak to long-tail: how cloud is the fabric and the workshop 20m
        Modern scientific discovery has followed modern innovation. We have enjoyed over 50 years of Moore’s Law, over which computing capability has grown at nearly 50% compounded, in turn driving model/simulation-driven scientific discovery. More recently, the number of devices on the Internet has grown at a similar rate, and sensing capabilities have grown at half that rate, both driving data-driven discovery. No other modern-day man-made innovations come close. Hence, researchers creating tools and workflows over this space integrate instruments with storage, analysis tools and computing resources, effectively creating the 21st-century equivalent of the humble microscope. Further, and because it is software, successful tools are readily proliferated to others who explore the space of a discipline. The dichotomy of expectations (peak and long-tail, modelling and data) creates a significant tension for e-infrastructure providers. Do we serve peak modellers (~HPC)? Do we serve the data-driven peak? Or do we serve the long tail? Research @ Cloud Monash (R@CMon, pronounced “rack-mon”) is a single scalable heterogeneous fabric that spans the peak and long-tail agendas. It rekindles the computing centre as a workshop for tooling experiments - things are not bought but bespoke, made for the experiment - whilst also driving modern data computing consolidation and scale. R@CMon is a node of the NeCTAR Research Cloud, a major data storage facility, an HPC facility, a virtual desktop facility and the home of the Characterisation Virtual Lab. The goal is to nurture virtual research environments that scale between peak and desktop to long-tail and HPC.
        Speaker: Steve Quenette (Monash University)
    • 17:00 22:00
      Guided tour of Bari old town and social dinner
    • 09:00 10:00
      Closing plenary Europa

      Europa

      Villa Romanazzi Carducci

      • 09:00
        Opportunities and challenges of the e-Infrastructures and the new H2020 Work Programme 16-17 40m
        Speaker: Augusto Burgueño (European Commission)
      • 09:40
        Conclusions and closing ceremony 20m
        Speakers: Dr Tiziana Ferrari (EGI.eu), Yannick Legre (EGI.eu)
    • 10:00 10:30
      Coffee break
    • 10:30 12:30
      EDISON project: Expert Liaison Group meetings

      This closed session is for those who have been invited to join one of the three Expert Liaison Groups (ELGs) convened as part of the recently funded EU EDISON project. EDISON has been established to support the development of the data science career path into a recognised profession. The three ELGs represent employers, universities and data experts, and will meet to contribute to the project’s aim of supporting and accelerating the process of establishing the data scientist as a certified profession.

      EDISON will run for 24 months and has seven core partners from across Europe. The project is coordinated by Yuri Demchenko at the University of Amsterdam in the Netherlands.

      See project website for further details on the aims and objectives of EDISON. http://edison-project.eu

      Convener: Steve Brewer (University of Southampton)
    • 10:30 12:30
      INDIGO DataCloud project meeting

      On Friday, 13 November, two meetings of the INDIGO-DataCloud project will take place immediately after the conclusion of the EGI Community Forum. These meetings, reserved for invited INDIGO-DataCloud participants, are the INDIGO Project Management Board (PMB) meeting in the morning and the INDIGO Technical Board (TB) meeting in the afternoon. These two bodies steer the technical development of the INDIGO-DataCloud project, whose goal is to create an open Cloud platform for Science. INDIGO-DataCloud is an H2020 project, funded from April 2015 to September 2017, involving 26 European partners and based on use cases and support provided by several multidisciplinary scientific communities and e-infrastructures. The project will extend existing PaaS (Platform as a Service) solutions, allowing public and private e-infrastructures, including those provided by EGI, EUDAT, PRACE and Helix Nebula, to integrate their existing services and make them available through AAI services compliant with GÉANT’s inter-federation policies, thus guaranteeing transparency and trust in the provisioning of such services. INDIGO will also provide a flexible and modular presentation layer connected to the PaaS and SaaS frameworks developed within the project, allowing innovative user experiences and dynamic workflows, also from mobile appliances.

    • 10:30 12:30
      Open Science Cloud workshop

      In the conclusions on "open, data-intensive and networked research as a driver for faster and wider innovation" (May 28-29 2015) the Competitiveness Council welcomed "the further development of a European Open Science Cloud that will enable sharing and reuse of research data across disciplines and borders, taking into account relevant legal, security and privacy aspects".

      The Council conclusions offer an opportunity to reflect on the experience gathered by the EGI Cloud Federation and other cloud initiatives worldwide addressing the needs of data-driven science. New challenges and requirements are emerging from the Research Infrastructures that EGI cooperates with in the context of the EC-funded project EGI-Engage.

      The workshop offers the opportunity to e-Infrastructure and Research Infrastructure providers, publicly funded and commercial cloud providers, data providers, international research collaborations and policy managers to gather and discuss how the needs of research are pushing the technology, policy and regulatory boundaries of cloud provisioning in Europe and worldwide.

      The workshop will address topics including:

      • federating public and private cloud infrastructure for research;
      • cross-border data sharing and federation worldwide;
      • scalable access to and analysis of research data for reuse;
      • services for depositing data for resource-bound users;
      • promoting the sharing of open source scientific software and community applications

      The expected outcome of the workshop is a position paper that defines the state of play of cloud services for research and identifies the barriers that an Open Science Cloud initiative could help remove in order to better support international research communities.

      Convener: Dr Tiziana Ferrari (EGI.eu)
    • 12:30 13:30
      Lunch
    • 13:30 15:30
      EDISON project: Expert Liaison Group meetings

      This closed session is for those who have been invited to join one of the three Expert Liaison Groups (ELGs) convened as part of the recently funded EU EDISON project. EDISON has been established to support the development of the data science career path into a recognised profession. The three ELGs represent employers, universities and data experts, and will meet to contribute to the project’s aim of supporting and accelerating the process of establishing the data scientist as a certified profession.

      EDISON will run for 24 months and has seven core partners from across Europe. The project is coordinated by Yuri Demchenko at the University of Amsterdam in the Netherlands.

      See project website for further details on the aims and objectives of EDISON. http://edison-project.eu

      Convener: Steve Brewer (University of Southampton)
    • 13:30 15:30
      INDIGO DataCloud project meeting

      On Friday, 13 November, two meetings of the INDIGO-DataCloud project will take place immediately after the conclusion of the EGI Community Forum. These meetings, reserved for invited INDIGO-DataCloud participants, are the INDIGO Project Management Board (PMB) meeting in the morning and the INDIGO Technical Board (TB) meeting in the afternoon. These two bodies steer the technical development of the INDIGO-DataCloud project, whose goal is to create an open Cloud platform for Science. INDIGO-DataCloud is an H2020 project, funded from April 2015 to September 2017, involving 26 European partners and based on use cases and support provided by several multidisciplinary scientific communities and e-infrastructures. The project will extend existing PaaS (Platform as a Service) solutions, allowing public and private e-infrastructures, including those provided by EGI, EUDAT, PRACE and Helix Nebula, to integrate their existing services and make them available through AAI services compliant with GÉANT’s inter-federation policies, thus guaranteeing transparency and trust in the provisioning of such services. INDIGO will also provide a flexible and modular presentation layer connected to the PaaS and SaaS frameworks developed within the project, allowing innovative user experiences and dynamic workflows, also from mobile appliances.

    • 13:30 15:30
      Open Science Cloud workshop

      In the conclusions on "open, data-intensive and networked research as a driver for faster and wider innovation" (May 28-29 2015) the Competitiveness Council welcomed "the further development of a European Open Science Cloud that will enable sharing and reuse of research data across disciplines and borders, taking into account relevant legal, security and privacy aspects".

      The Council conclusions offer an opportunity to reflect on the experience gathered by the EGI Cloud Federation and other cloud initiatives worldwide addressing the needs of data-driven science. New challenges and requirements are emerging from the Research Infrastructures that EGI cooperates with in the context of the EC-funded project EGI-Engage.

      The workshop offers the opportunity to e-Infrastructure and Research Infrastructure providers, publicly funded and commercial cloud providers, data providers, international research collaborations and policy managers to gather and discuss how the needs of research are pushing the technology, policy and regulatory boundaries of cloud provisioning in Europe and worldwide.

      The workshop will address topics including:

      • federating public and private cloud infrastructure for research;
      • cross-border data sharing and federation worldwide;
      • scalable access to and analysis of research data for reuse;
      • services for depositing data for resource-bound users;
      • promoting the sharing of open source scientific software and community applications

      The expected outcome of the workshop is a position paper that defines the state of play of cloud services for research and identifies the barriers that an Open Science Cloud initiative could help remove in order to better support international research communities.

      Convener: Dr Tiziana Ferrari (EGI.eu)
    • 15:30 16:00
      Coffee break