EGI Community Forum 2015

Name: EGI Community Forum 2015
Start: 2015-11-10T08:00:00+01:00
End: 2015-11-13T18:00:00+01:00
Location: Villa Romanazzi Carducci

10 Nov 2015, 08:00 → 13 Nov 2015, 18:00 Europe/Rome

Villa Romanazzi Carducci

Via G. Capruzzi, 326 70124 Bari Italy

Tiziana Ferrari (EGI.EU)

Description

Building Next Generation e-Infrastructures through Communities

EGI Community Forum – Bari 2015

Nowadays, research practice is increasingly and in many cases exclusively data driven. Knowledge of how to use tools to manipulate research data, and the availability of e-infrastructures to support them, are foundational. Along with this, new types of communities are forming around interests in digital tools, computing facilities and data repositories.

By making infrastructure services, community engagement and training inseparable, existing communities can be empowered by new ways of doing research, and new communities can be created around tools and data. The EGI Community Forum aims at gathering tool developers, infrastructure providers, data providers and research communities to work together towards open science.

More information and timetable of the EGI-Engage face-to-face meetings

Conference4Me app

The Conference4me smartphone app provides you with the most comfortable tool for planning your participation in the conference. Browse the complete programme directly from your phone or tablet and create your very own agenda on the fly. The app is available for Android, iOS, Windows Phone and Kindle Fire devices.

To download the mobile app, please visit the Conference4Me website or type 'conference4me' in Google Play, iTunes App Store, Windows Phone Store or Amazon Appstore.

Tuesday, 10 November
- 09:00 → 10:30
  Opening plenary
  
  Europa
  
  Villa Romanazzi Carducci
  - 09:00
    
    Opening ceremony 10m
    
    Opening of the EGI Community Forum 2015 with Dr. Antonio Zoccoli, member of the executive board of the Italian National Institute for Nuclear Physics and Italian EGI council delegate.
  - 09:10
    
    The role of the Italian e-Infrastructure in supporting the Research Roadmap 40m
    
    Cristina Messa is Chancellor of the Università degli Studi di Milano-Bicocca, Vice President of the Italian National Research Council, and Italian Horizon 2020 Research Infrastructures Delegate & Steering Committee Chair.
    
    Speaker: Cristina Messa (University of Milano-Bicocca)
  - 09:50
    
    EGI: Status, challenges and opportunities in the Digital Single Market era 40m
    
    Nowadays, research is increasingly and in many cases exclusively data driven. Knowledge of how to use tools to manipulate research data, and the availability of e-infrastructures to support them, are foundational. New communities of practice are forming around interests in digital tools, computing facilities and data repositories. By making infrastructure services, community engagement and training inseparable, existing communities can be empowered by new ways of doing research, and can grow around interdisciplinary data that can be easily accessed and manipulated via innovative tools as never before. Key enablers of this process are openness, participation, collaboration, sharing and reuse, and the definition of standards and the enforcement of their adoption within the community. In some disciplinary areas communities have succeeded in defining and adopting a community governance of the production and sustainable sharing of data and applications. The European Commission has identified the completion of the Digital Single Market (DSM) as one of its 10 political priorities, and much effort is being put into aligning strategies and initiatives with the DSM targets. This presentation provides an overview of the services and initiatives that EGI is supporting to promote open science, and of the technical roadmap and challenges that will drive its future evolution.
    
    Speaker: Dr Tiziana Ferrari (EGI.eu)
    
    Slides
- 10:30 → 11:00
  
  Coffee break
- 11:00 → 12:30
  Cross-border service procurement
  
  Scuderia
  
  Villa Romanazzi Carducci
  
  With e-Infrastructures evolving towards service-oriented provision with on-demand allocation and pay-for-use capabilities, there is an opportunity for analysing and revising the procurement process for e-Infrastructure services. Currently publicly funded resource providers and their users lack the knowledge and mechanisms to collectively bid within a public procurement process. The goal of this activity is to analyse opportunities and barriers for cross-border procurement of e-Infrastructure services and to identify best practices that could enable RIs or large research collaborations to acquire services to support their research agenda collectively. A number of RIs and infrastructure providers will contribute to the analysis and documentation of use cases. A final report will be produced identifying opportunities, barriers, use cases and best practices. The report will be disseminated to relevant authorities at national and international level, including those involved in structural funding, and feedback will be collected. The activity will be led by CERN with the contribution of INGV (representing EPOS), CSIC (representing LifeWatch), BBMRI-ERIC, RBI (representing DARIAH) and EGI.eu (representing EGI and liaising with the NGIs).
  
  Convener: David Foster (CERN)
  
  CERN market survey
  
  HNSciCloud summary
  
  Previous meeting agenda and material
  - 11:00
    
    Introduction 10m
    
    Speaker: David Foster (CERN)
    
    Slides
  - 11:10
    
    PICSE procurement case studies and procurement process assessment tool 30m
    
    Speaker: Sara Garavelli (TRUST-IT)
    
    Slides
  - 11:40
    
    Procurement in EGI & CERN Cloud Market Survey Feedback 20m
    
    This presentation provides a brief overview of why procurement is an important opportunity for EGI, outlines ongoing activities in related areas, reports feedback from EGI providers on the CERN Cloud Market Survey and the issues experienced, and offers a few recommendations for moving forward.
    
    Speaker: Sy Holsinger (EGI.eu)
    
    Slides
  - 12:00
    
    Update on the HNSciCloud Pre Commercial Procurement project 20m
    
    Speaker: David Foster (CERN)
    
    Slides
  - 12:20
    
    Discussion, summary and next steps 10m
    
    Speaker: David Foster (CERN)
- 11:00 → 12:30
  Open Grid Forum
  
  Federico II
  
  Villa Romanazzi Carducci
  
  Convener: Alan Sill (CERN)
  - 11:00
    Open Grid Forum 1h 30m
    
    The Open Grid Forum is the standards body for many of the protocols used by grids and clouds, e.g. GridFTP and OCCI. The agenda will be as follows:
    
    1. Introduction to OGF and OGF's support for standards and interoperation
    2. Focus on interoperation for clouds and grids, e.g. networking, security, OCCI, etc., cloud plugfests
    3. Questions for the Interoperable Global Trust Federation (IGTF, www.igtf.net)
    4. AOB
    
    Speaker: Dr Jens Jensen (STFC)
    
    Slides
    
    Introduction to OGF 15m
    
    An introduction to the Open Grid Forum
    
    Slides
    
    Interoperation and Cloud Plugfests 20m
    
    This part of the session will look at the role of interoperation testing in general and cloud plugfests in particular in providing services for communities
    
    Interoperation examples: OCCI and GridFTP 20m
    
    This part of the session will primarily be an update on OCCI, and following on from the interoperation discussion look at the role of interoperation and open standards for providing cloud services.
    
    Questions for IGTF 30m
    
    This part of the session will raise some questions for the Interoperable Global Trust Federation (www.igtf.net)
- 11:00 → 12:30
  Security Groups progress and plans in the changing EGI environment
  
  Sala A+A1, Giulia Centre
  
  Villa Romanazzi Carducci
  
  As most are aware, the EGI infrastructure and the activities related to it are changing quite rapidly, with new communities being engaged with EGI and new technology being used, such as that related to the EGI Federated Cloud. These changes have wide-ranging implications for the various security activities, which are being evolved to meet these challenges as part of the EGI-Engage project. This session will include presentations from the Security Policy Group (SPG), the Computer Security Incident Response Team (CSIRT) and the Software Vulnerability Group (SVG) on progress during the first few months of the EGI-Engage project and plans for the coming year, with emphasis on the impact and responsibilities of user and application communities in the changing environment.
  
  Convener: David Kelsey (STFC)
  - 11:00
    
    Introduction 5m
    
    Speaker: David Kelsey (STFC)
  - 11:05
    
    Security Policy 10m
    
    Speaker: David Kelsey (STFC)
    
    Slides
  - 11:15
    
    Software Vulnerability Group 15m
    
    Speaker: Linda Cornwall (STFC)
    
    Slides
  - 11:30
    
    Developments in the Trust Fabric 20m
    
    Speaker: David Groep (FOM)
    
    Paper
    
    Slides
  - 11:50
    
    Security Operations 30m
    
    Speakers: Daniel Kouril (CESNET), Dr Sven Gabriel (NIKHEF), Vincent Brillault (CERN)
    
    Slides
  - 12:20
    
    Discussion - Security Groups progress and plans in the changing EGI environment. 10m
    
    As most are aware, the EGI infrastructure and the activities related to it are changing quite rapidly, with new communities being engaged with EGI and new technology being used, such as that related to the EGI Federated Cloud. These changes have wide-ranging implications for the various security activities, which are being evolved to meet these challenges as part of the EGI-Engage project. This session will include presentations from the Security Policy Group (SPG), the Computer Security Incident Response Team (CSIRT) and the Software Vulnerability Group (SVG) on progress during the first year of the EGI-Engage project and plans for the future, with emphasis on the impact and responsibilities of user and application communities in the changing environment.
    
    Speakers: David Kelsey (STFC), Linda Cornwall (STFC), Dr Sven Gabriel (NIKHEF)
- 11:00 → 12:30
  The EGI Federated Cloud - The word to the users: showcasing the fedcloud use cases
  
  Europa
  
  Villa Romanazzi Carducci
  
  In this session, the EGI Federated Cloud use cases will be showcased. Relevant production applications and pilots will be invited to demonstrate both the multidisciplinarity of the FedCloud and the different usage models it enables. Researchers will describe their experience with the FedCloud, highlighting how this infrastructure is helping them and what new features are on their wish list.
  This session will help EGI to provide a service that better fits the users' needs. The requirements collected during this session will be further examined by the EGI Federated Cloud task force and included in the EGI technology roadmap when considered of general interest.
  
  Convener: Diego Scardaci (EGI.eu/INFN)
  - 11:00
    
    The EGI FedCloud - User Support 10m
    
    Speaker: Diego Scardaci (EGI.eu/INFN)
    
    Slides
  - 11:10
    
    EPOS-CC Computational Seismology Use Case 15m
    
    The EPOS Competence Center will focus its effort on the analysis and prototype implementation of a number of use cases that are crucial for the realisation of the EPOS Research Infrastructure. These use cases will span different scenarios, such as the adoption of a scalable AAI and the integration of computational and Cloud services, to support existing and future scientific applications in Solid Earth Science. More specifically, one of the use cases aims to improve the back-end services of an existing application in the field of Computational Seismology, developed in the context of the EC-funded project VERCE. The application allows the processing and comparison of data resulting from the simulation of seismic wave propagation following a real earthquake with real measurements recorded by seismographs. While the simulation data is produced directly by the users and stored in a Data Management System, the observations need to be pre-staged from institutional data services, which are maintained by the community itself. Users can interactively select the data of interest, compose and execute processing pipelines and finally conduct the misfit analysis between the synthetic and the observed data streams. The ultimate aim of the tool is to support researchers in the study and improvement of regional and global Earth models. Within the Competence Center we will evaluate improvements to the infrastructure that lies behind this application, assessing the adoption of Cloud technologies as an alternative to the Grid. Moreover, in cooperation with the AAI use case, we may consider the integration or coexistence of new authorisation and delegation mechanisms to deal with secure job submissions and the acquisition of raw data from the institutional data services, if required.
    We will present the scientific tool in the context of the current Virtual Research Environment, which offers different workflow technologies and a comprehensive Provenance and Data Management System. We will show ideas and plans for the future integration of EGI's new classes of services, taking into account a preliminary "benefits versus costs" evaluation. This is crucial for a proper assessment of the reusable components of the VRE within the EPOS RI, pursuing a strategy that accommodates diversity while sustaining long-term compatibility.
    
    Speakers: Alessandro Spinuso (KNMI), Andre Gemuend (FRAUNHOFER)
    
    Slides
  - 11:25
    
    From Case Studies to Application Requirements from Research Communities in the INDIGO-DataCloud project 15m
    
    The INDIGO project solutions will support several Research Communities with demanding and complex applications. A detailed analysis of the requirements to be supported has been prepared on the basis of Case Studies following a template where the Communities can describe their needs through User Stories, and where their specific Data Management needs are also considered. The experience and lessons learned in the process will be presented.
    
    Speakers: Dr Ignacio Blanquer (UPVLC), Peter Solagna (EGI.eu)
    
    Slides
  - 11:40
    
    ITRANS database - sorting structural motifs in cellular transporters 15m
    
    Cells interact with their surroundings via molecular transporters located in the cellular membrane. These transporters are fundamental for the uptake of nutrients and release of toxic compounds, and their malfunction has been correlated with different diseases. Over the last five decades great advances have been made in understanding the mechanism behind ion and solute transport. A key tool that triggered the understanding of this mechanism at the atomic level was the determination of high-resolution structures. Currently, over 2000 high-resolution structures of molecular transporters are deposited in the Protein Data Bank and available to researchers. In this work we present the ITRANS database. ITRANS is a relational database focusing on ion and solute transporters for which high-resolution structures were experimentally obtained. The database contains classification, structural, motif, functional and sequence information on cellular membrane transporters, providing the end user with a novel tool to study the mechanism of these proteins. The database covers information on different species and different families of ion and solute transporters, allowing the user to organize molecular transporters using different structural motifs and biophysical properties. ITRANS is accessible via a web interface that provides the user community with several methods to access and organize the different datasets available. ITRANS is an end-user-driven database, so the flexibility and characteristics of the EGI Federated Cloud make it the best solution for an ITRANS Cloud deployment.
    
    Speaker: Dr Afonso Duarte (ITQB-UNL)
    
    Slides
  - 11:55
    
    Modeling transport phenomena in mesoscopic quantum systems 15m
    
    We give an overview of a series of results on transport and nonlinear phenomena in mesoscopic quantum systems, with a focus on the use of the Gross-Pitaevskii equation for the dynamics of Bose-Einstein condensed gases and the Boltzmann-Vlasov equation for transport phenomena in nuclear matter. Our results on the dynamics of Bose-Einstein condensates (BECs) focus on the emergence of density waves through pattern-forming instabilities and rely on both numerical and analytical results, while the results on the collective modes of nuclear matter focus on the pygmy and giant dipole resonances. The investigations focused on BECs address single-component and binary condensates with both homogeneous and inhomogeneous two-body interactions, in both stationary configurations and their dynamical evolution. The pygmy dipole resonance, in turn, is studied extensively by numerical means; we derive the dipolar response and extract several quantities, such as the position of the energy centroid and the energy-weighted sum rule exhausted below the giant dipole resonance. Finally, the presentation describes the state of the art in computing solutions for both problems under scrutiny.
    
    Speakers: Dr Alexandru Nicolin (IFIN-HH), Dr Mihaela Carina Raportaru (IFIN-HH)
  - 12:10
    
    Lightweight construction of rich scientific applications 15m
    
    This paper presents an approach to rapid and lightweight development of scientific applications running on high performance computing resources, which resulted in a platform facilitating access to grid resources and handling the entire web application deployment pipeline. Usage of available web frameworks enables purely browser-side programming and frees application developers from any server-side dependencies. The platform provides testing and production environments to cover the application development cycle, including seamless source code synchronization. Recently, scientific gateway delivery has shifted towards web platforms. Complete software packages built on top of grid resources, which offered installable binaries such as desktop rich clients [1], gave way to advanced and easily managed web solutions [2]. The grid integration layer has been completely hidden from end users by providing domain-specific graphical user interfaces through just one omnipresent application, the web browser, so grid-ready web toolkits [3] emerged as tools to build gateways tailored for individual scientific communities. The development process can be further improved with the platform by removing the server side completely from the developer's concerns and exploiting grid-based functionality exposed through REST interfaces. Enabling usage of browser-side web frameworks over grid resources to compose rich scientific gateways is the main goal of the platform. The objective is to completely detach application developers from the underlying tools and to provide the required functionality through a set of JavaScript libraries with a complete application deployment pipeline in place. This minimizes boilerplate work, as it is no longer necessary to set up production and test environments (they are provided by the platform), and makes it possible to fully exploit browser debugging capabilities. Creating a new web application comes down to filling in a web form, which includes setting up the application URL schema.
    Authentication and authorization are handled by the platform, and the server-side processing has to be exposed as REST services. The platform comes with all the basic services in place and already wrapped with JavaScript libraries (e.g. job management, file transfers and metadata management). An example of such a service exposing its functionality through a REST interface is DataNet [4]. Building a JavaScript API on top of this service and exposing it to developers is straightforward and may be done by other service providers to expand the API set of the proposed platform. The platform is already available as part of the PLGrid infrastructure. It is integrated with the security mechanism and REST services present in the infrastructure; this integration was done only once for all scientific applications implemented with the platform, which considerably facilitates the audit process of each new application. Summing up, the platform improves the creation of modern science gateways by making the process platform-independent, taking care of the deployment pipeline, and ensuring source code synchronization with popular services such as Dropbox.
    
    Speaker: Daniel Harezlak (CYFRONET)
    
    Slides
  - 12:25
    
    Open Discussion 5m
    
    Speaker: Diego Scardaci (EGI.eu/INFN)
- 12:30 → 13:30
  
  Lunch
- 13:30 → 15:30
  EGI Marketplace
  
  Scuderia
  
  Villa Romanazzi Carducci
  
  The marketplace is being established so that researchers can discover and exchange services relevant to their research, ideally applying the one-stop-shop concept for data and services. The activity will develop the tools to facilitate the discovery and provisioning of services and data for researchers and resource providers. This will be done in collaboration with researchers and resource providers (user-driven scenario development). The marketplace can also serve as a platform for research communities to provision or share data and services with members of the communities.
  
  Convener: Mr Dean Flanders (SwiNG)
  - 13:30
    
    EGI Marketplace 2h
    
    Session dedicated to the EGI Marketplace that will present the current status and gather feedback from the community for its evolution. The marketplace is being established so that researchers can discover and exchange services relevant to their research, ideally applying the one-stop-shop concept for data and services. The activity will develop the tools to facilitate the discovery and provisioning of services and data for researchers and resource providers. This will be done in collaboration with researchers and resource providers (user-driven scenario development). The marketplace can also serve as a platform for research communities to provision or share data and services with members of the communities.
    
    Speaker: Mr Dean Flanders (SwiNG)
- 13:30 → 15:30
  
  EGI-Engage Collaboration Board (closed meeting)
  
  Sala A+A1, Giulia Centre
  
  Villa Romanazzi Carducci
  
  Detailed agenda: https://indico.egi.eu/indico/conferenceDisplay.py?confId=2734
  
  Convener: Yannick Legre (EGI.eu)
- 13:30 → 15:30
  Showcasing tools and services from Research Infrastructures
  
  Europa
  
  Villa Romanazzi Carducci
  
  Convener: Dr Gergely Sipos (EGI.eu)
  - 13:30
    
    BBMRI Competence Center in EGI-ENGAGE 20m
    
    As has been demonstrated in previous years, providing high-quality samples and data for biomedical research is one of the key challenges science is currently facing. BBMRI-ERIC, a European Research Infrastructure Consortium on Biobanking and Biomolecular Resources, strives to establish, operate, and further develop a pan-European distributed research infrastructure of high-quality biobanks and biomolecular resources. The talk will outline the BBMRI Competence Center (BBMRI CC) of EGI ENGAGE, focusing on the processing of human omics data, which is a very frequent task related to biobanking and also a very sensitive one because of the protection of personal data. The goal of the BBMRI CC is to enable processing of such data inside the biobanks using a private cloud concept, which can in practice be implemented using the EGI Federated Cloud framework. Requirements both from the data processing perspective and from the data protection perspective will be discussed. We will outline the work that is planned for the competence center and discuss its relation to other relevant projects and infrastructures (e.g., BiobankCloud, BBMRI-ERIC Common Service IT), tools from which will be used in the BBMRI CC pilot for integration into practical workflows based on the EGI and EUDAT technologies.
    
    Speaker: Petr Holub (BBMRI-ERIC)
    
    Slides
  - 13:50
    
    A FreshWater VRE for LifeWatch 20m
    
    The different components and a workflow platform to support a FreshWater VRE for the LifeWatch ESFRI will be presented. The VRE is based on cloud resources to support processing of data from different sources of information. A detailed analysis of the components required to monitor and model a water body (like a lake) will be presented. An overview of different related initiatives that can be integrated under this framework, and of new challenges that need to be addressed, will also be presented.
    
    Speakers: Fernando Aguilar (CSIC), Jesus Marco de Lucas (CSIC)
    
    Slides
  - 14:10
    
    DARIAH requirements and roadmap in EGI 20m
    
    DARIAH, the Digital Research Infrastructure for the Arts and Humanities, is a large user community that gathers scientists across Europe from the research field of the Arts and Humanities (A&H). The aim of DARIAH is to enhance and support digitally-enabled research and teaching across the Arts and Humanities in Europe. The objective of DARIAH is to develop, maintain and operate a research infrastructure for ICT-based research practices. The DARIAH infrastructure aims to become a fully connected and effective network of tools, information, people and methodologies for investigating, exploring and supporting research across the broad spectrum of the Digital Humanities. To achieve this goal, a significant amount of effort has to be devoted to the improvement of the current infrastructure. A part of this effort is the EGI-DARIAH Competence Centre (EGI-DARIAH CC), established within the EGI-Engage Horizon2020 project. The EGI-DARIAH CC aims at bridging the gap between the DARIAH user community and the European e-Infrastructures, mainly those provided by the EGI community. To achieve this goal, EGI-DARIAH CC focuses on strengthening the collaboration between the DARIAH user community and EGI by deploying A&H applications in the EGI Federated Cloud and increasing the number of e-Science services and applications, as well as raising the awareness of A&H researchers of the advantages and benefits of e-Infrastructure by providing end-user support and organizing training events. Considering that the DARIAH community, as well as the general A&H research public, is very specific in its requirements and needs regarding e-Infrastructure, one of the first actions of the EGI-DARIAH CC was to collect all relevant information about the DARIAH research requirements. The collection of the required information was conducted via a comprehensive web-based survey.
    The aim of this survey was to collect feedback from DARIAH end-users, application/service providers and developers on their knowledge of and background in e-Infrastructure (e.g. computational and storage resources, user-support services, authentication policies, etc.), on how research data (information) are shared and accessed, on AAI requirements, on which services and applications researchers are using in their research and what their characteristics are, etc. Based on the inputs, a set of specific A&H services and applications will be developed, such as a gUSE/WS-PGRADE workflow-oriented gateway, the gLibrary framework for distributed information repositories and an information retrieval service based on CDSTAR. Concurrently with the application development, significant effort is being put into the training of DARIAH researchers, since many of them have little or none of the technical knowledge required to efficiently use the various e-Infrastructure resources or the new applications and services that will be developed during this project. Therefore, a set of training events will be organized to demonstrate the specific applications and services developed within EGI-DARIAH CC, as well as to give a general introduction on how to use various EGI resources, applications and services.
    
    Speaker: Davor Davidovic (RBI)
    
    Slides
  - 14:30
    
    The SADE mini-project of the EGI DARIAH Competence Centre 20m
    
    The DARIAH Competence Centre (CC) aims to widen the usage of e-Infrastructures for Arts and Humanities (A&H) research. The objectives of the DARIAH CC, which will run over two years, are the following: (i) to strengthen the collaboration between DARIAH-EU and EGI using workflow-oriented application gateways and deploying A&H applications in the EGI federated cloud (EGI FedCloud); (ii) to increase the number of accessible e-Science services and applications for A&H researchers and the integration of existing NGI resources into EGI; (iii) to raise the awareness of A&H researchers of the possible benefits (research excellence) of using e-Infrastructure and e-Science technologies in their research, creating conditions for a sustained increase of the user community coming from A&H and the social sciences as well; and (iv) to widen the work started within the DC-NET, INDICATE and DCH-RP projects to other A&H communities. One of the mini-projects of the DARIAH CC, led by INFN, is SADE (Storing and Accessing DARIAH contents on EGI), whose overall goal is to create a digital repository of DARIAH contents using gLibrary, a framework developed by INFN Catania to create and manage archives of digital assets (data and metadata) on local, Grid and Cloud storage resources. Datasets for SADE will be provided by the Austrian Academy of Sciences (AAS) and relate to a collection, more than 100 years old, on Bavarian dialects within the Austro-Hungarian monarchy, from the beginnings of the German language to the present day. Several data types will be taken into account: text, multimedia (images, audio files, etc.), URIs as well as primary collection data, interpreted data, secondary background data and geo-data with different license opportunities.
    The AAS datasets will be orchestrated by gLibrary and the repositories will be exposed to end-users through two channels: (i) as a (series of) portlet(s) integrated both in one of the already existing Science Gateways implemented with the Catania Science Gateway Framework and in the WS-PGRADE-based Science Gateway that will be developed by the lighthouse project of the CC, and (ii) as native apps for mobile appliances based on the Android and iOS operating systems and downloadable from the official App Stores. The mobile apps will be coded using a cross-platform development environment so that other mobile operating systems could be supported, if needed. Furthermore, the apps could exploit geo-localisation services available on smartphones and tablets to find “near” contents. In order to fulfill SADE requirements, the gLibrary framework is currently being completely re-engineered to remove its dependence on the AMGA metadata catalogue. In this contribution to the EGI Community Forum, the new version of the platform (i.e., gLibrary 2.0) as well as the status and results of the SADE mini-project will be presented.
    
    Speaker: Giuseppe La Rocca (INFN Catania)
    
    Slides
  - 14:50
    
    The EPOS e-Infrastructure for solid Earth sciences: architecture and collaborative framework 20m
    
    Integrating data from Solid Earth Science and providing a platform for access to heterogeneous datasets and services across the whole of Europe is a challenge that the European Plate Observing System (EPOS) is tackling. EPOS will enable innovative multidisciplinary research for a better understanding of the Earth’s physical processes that control earthquakes, volcanic eruptions, ground instability and tsunamis, as well as the processes driving tectonics and Earth surface dynamics. To meet this goal, a long-term plan to facilitate integrated use of data and products as well as access to facilities from mainly distributed existing and new research infrastructures (RIs) has been designed in the EPOS Preparatory Phase (EPOS PP). In the EPOS Implementation Phase (starting in October 2015) the plan will be implemented in several dimensions: Legal & Governance, with the creation of EPOS-ERIC and the implementation of policies for data and trans-national access; Financial, by adopting a financial plan to guarantee the long-term sustainability of the infrastructure; Technical, with the implementation of Thematic services (TCS) in the several EPOS domains (e.g. seismology, satellite data, volcanic observatory and others) and the creation of the Integrated Core Services (ICS) platform to integrate data and services. In this presentation we will deal with the technical aspects and the synergies with e-Infrastructure providers such as EGI that are required to build the EPOS ICS platform. We will focus on the EPOS e-Architecture, based on the ICS integration platform and the European community-specific TCS services, and its main components: a) the metadata catalogue based on the CERIF [1] standard, used to map and manage users, software, datasets, resources, included datasets and access to facilities; b) a compatibility layer to enable interoperation among ICS and the several TCSs, which includes the usage of web services or other APIs over a distributed VRE-like environment; c) the ICS-D distributed component, to provide computational and visualization capabilities to end users; d) the implementation of the AAI module, to enable a user to have a single sign-on to the EPOS platform and retrieve and use resources from TCSs, and the synergies with the EGI-Engage EPOS Competence Center pilot; e) a computational Earth Science module, where a contribution by VERCE [2] is expected; f) mechanisms to provide persistent identifiers at both ICS and TCS level, and the synergies with other European projects. The building of such a complex system, which will hide from the end user the technical and legal complexity of accessing heterogeneous data, is based on four main principles: 1. ICS-TCS co-development, 2. do not reinvent the wheel, 3. a microservices approach, 4. clear long-term technical goals but an iterative short-term approach. We will discuss, in the framework of EGI, which synergies are required with EGI and other e-Infrastructure providers, and which issues need to be tackled in the short to mid term in order to optimize resources at European level and make the collaboration among ESFRIs, EGI and other relevant initiatives real and active.
    
    Speaker: Daniele Bailo (EGI.eu)
  - 15:10
    
    Progress of EISCAT_3D Competence Center 20m
    
    The design of the next generation incoherent scatter radar system, EISCAT_3D, opens up opportunities for physicists to explore many new research fields. On the other hand, it also introduces significant challenges in handling large-scale experimental data, which will be generated at great speeds and volumes. This challenge is typically referred to as a big data problem and requires solutions beyond the capabilities of conventional database technologies. The first objective of the project is to build a common e-Infrastructure to meet the requirements of a big scientific data system such as the EISCAT_3D data system. The design specification has been approached from a number of aspects, such as: Priority Functional Components; Data Searching & Discovery; Data Access; Data Visualisation; Data Storage. Different technologies are available at the different stages of the portal, such as dCache, iRODS, OpenSearch, Liferay and different forms of identifiers. We will present the ones chosen and why they are better suited to operations and data from an environmental facility like EISCAT_3D. The design specification has been presented to the EISCAT community and the feedback has been adopted in the portal development environment.
    
    Speaker: Ingemar Haggstrom (EISCAT)
    
    Slides
- 13:30 → 15:30
  Tutorial: Introduction to the EGI Federated Cloud Federico II
  
  Federico II
  
  Villa Romanazzi Carducci
  - 13:30
    
    Introduction to the EGI Federated Cloud – the user perspective 2h
    
    This is a two-hour introductory course about the EGI Federated Cloud infrastructure from the user perspective. The course will consist of short talks and hands-on exercises. In this tutorial session attendees can learn the basic concepts of cloud computing and cloud federations, and gain experience in interacting with the IaaS layer of the EGI Federated Cloud infrastructure through the AppDB Virtual Machine Marketplace and the rOCCI command line client. The course primarily targets developers of high-level cloud environments (PaaS and SaaS) and developers of scientific applications. After this course, they will be able to integrate custom systems with the EGI IaaS cloud federation. The tutorial will be followed by another, related session that shows how to prepare custom VM images for the EGI Federated Cloud. The EGI Federated Cloud is a standards-based, open cloud system, together with its enabling technologies, that federates institutional clouds to offer a scalable computing platform for data- and/or compute-driven applications and services. The EGI Federated Cloud is already deployed at more than 20 academic institutes across Europe, which together offer 6000 CPU cores and 300 TB of storage for researchers in academia and industry. This capacity is available for free at the point of access through IaaS, PaaS and SaaS capabilities and interfaces that are tuned towards the needs of users in research and education. The technologies that enable the cloud federation are developed and maintained by the EGI community, and are based on open standards and open source Cloud Management Frameworks. Course outline (2 x 60 minutes): introduction to clouds, cloud federations and the EGI Federated Cloud; application porting best practices and examples; introduction to the training infrastructure and first exercises; exercise on compute and storage management; introduction to contextualisation; exercise on contextualised compute instances; next steps; how to become a user.
    
    Speakers: Boris Parak (CESNET), Diego Scardaci (EGI.eu/INFN), Dr Enol Fernandez (EGI.eu)
    
    Slides
- 15:30 → 16:00
  
  Coffee break
- 16:00 → 18:00
  Innovating with SMEs and Industry Scuderia
  
  Scuderia
  
  Villa Romanazzi Carducci
  
  There are more than 20 million SMEs in the EU, representing 99% of businesses. SMEs are considered one of the key drivers of economic growth, innovation, and employment. The European Commission has made them a focus of Horizon 2020, with the aim of putting SMEs in the lead for the delivery of innovation to the market. EGI aims to support this policy objective by exploring opportunities for synergies with SMEs.
  
  Presentations will highlight a number of ongoing activities that support these objectives: a dedicated business engagement programme, the addition of pay-for-use capabilities, open calls for driving innovation, and concrete examples of opportunities moving forward.
  
  The session concludes with ample time for discussion in order to define a set of recommendations and actions as key takeaways for session participants.
  
  This session is therefore of interest to private and public organisations looking for collaboration opportunities, and to end users wishing to provide direct feedback.
  
  Convener: Sy Holsinger (EGI.eu)
  - 16:00
    
    EGI Business Development 25m
    
    Formal activities: pay-for-use, business engagement programme, market analysis. Business case examples: how NGIs are working with SMEs on a local level. Innovation topics: bringing the information together to move forward.
    
    Speaker: Sy Holsinger (EGI.eu)
    
    Slides
  - 16:25
    
    EGI Pay-for-Use Demo (e-GRANT) 25m
    
    Speakers: Roksana Rozanska (CYFRONET), Mr T. Szepieniec (CYFRONET)
    
    eGRANT-P4U-Instance
    
    Slides
  - 16:50
    
    The First GEANT Open Call - Innovation & Outreach in an e-Infrastructure 25m
    
    The first GEANT Open Call was conducted between April 2013 and March 2015, resulting in the funding of 21 small, highly focused research and development projects of 18 months in duration. Considered within the project to be a real success story (a view recently vindicated in a positive project review), this exercise was conducted as part of GEANT's community engagement efforts and innovation strategy (the GEANT Innovation Programme). Primarily, it was intended to bring some new blood into a well-established R&E networking ecosystem, in the hope that this would bring plenty of networking innovation, leading in turn to tangible benefits that could be quickly realised by the R&E network user communities in Europe and beyond. This presentation will show how the open call programme has been used to fund innovative projects designed to address three broad objectives: (i) soliciting interesting new use cases for the facilities made available by the GEANT project; (ii) bringing new expertise into the project to undertake specific pieces of R&D work within the GEANT work programme; and (iii) conducting a more open solicitation for interesting new ideas with a bearing on the future evolution of R&E networking (a "wildcard" element to the programme). The first part of the presentation will cover the way in which the open call programme was planned and executed and will place particular emphasis on the innovative way in which the projects were integrated into the larger GEANT project by adopting a holistic approach to the management of the programme. Consequently, many of these open call projects have led to useful results that will benefit the GEANT network, the services running over it, the NRENs connected to it and their users, for example services based on SDN-related technologies. Some of them have wider implications that could potentially be of interest to other e-Infrastructures, especially in the context of developing coordinated service offerings. A number of the significant results will be identified and we will reflect on the overall lessons learned during the running of this first GEANT open call. Innovation and the impact of that innovation will also be examined in the presentation; for example, it is expected that more than 20 papers will be published in peer-reviewed journals as a direct result of the projects' support. Finally, the presentation will describe preparations for the next round of open calls, which it is hoped will take place during the next phase of the GEANT project (GN4-2). Discussions are underway at the time of writing this abstract, but a clear idea of how these next open calls will be conducted will have been established by the time of the EGI Community Forum. A few guiding principles are already known. Overall, there is expected to be a greater focus on service innovation and working with new commercial partners and other e-infrastructures.
    
    Speakers: Ms Annabel Grant (GEANT), Dr Michael Enrico (GEANT)
    
    Slides
  - 17:15
    
    Panel Led Discussion 45m
    
    Speakers: Ms Annabel Grant (GEANT), Giuseppe Fiameni (CINECA - Consorzio Interuniversitario), Roberta Piscitelli (EGI.eu)
    
    Slides
- 16:00 → 18:00
  Showcasing tools and services from Research Infrastructures (II) Europa
  
  Europa
  
  Villa Romanazzi Carducci
  
  Convener: Dr Gergely Sipos (EGI.eu)
  - 16:00
    
    ENVRI_plus: Toward Joint Strategy and Harmonized Policies for Access to Research Infrastructures 20m
    
    Speaker: Ingrid Mann (EISCAT Scientific Association)
    
    Slides
  - 16:20
    
    Tsunami Wave Propagation Forward and Inverse Simulation and Scientific Gateway Building for Disaster Mitigation 20m
    
    The Manila trench and the Ryukyu trench are two hazardous subduction zones that could cause a disastrous tsunami in South East Asian countries if a megathrust earthquake occurred at either of them. The EGI-Engage Disaster Mitigation Competence Centre aims to develop novel approaches to real-time tsunami simulation over the Grid and Cloud using COMCOT-based fast forward tsunami wave propagation simulation. The first goal is integration with rapid and correct rupture process solutions, to make the tsunami simulation as accurate as possible. In collaboration with tsunami scientists, the workflow and computing model have been defined according to the case studies conducted by the user communities, and the iCOMCOT web-based application portal has been implemented. iCOMCOT is an efficient and low-cost fast tsunami calculation system for early warning, built on an optimized and parallelized COMCOT in order to meet the requirements of real-time simulation. Building on the high-performance COMCOT simulation provided by iCOMCOT, a tsunami inverse simulation has also been developed to identify the best candidates for historical tsunami sources according to the evidence at hand. Cases around Taiwan and the Philippine Sea Plate region were studied to support the analysis of potential tsunami sources. Based on the e-Science paradigm and big data analytics capability, answering an open question such as “which fault, with what rupture process, could cause over 1 meter wave height and 50 meter inland inundation in Taiwan” could also be achieved in the future.
    
    Speaker: Eric Yen (AS)
  - 16:40
    
    West-Life: Developing a VRE for Structural Biology 20m
    
    The focus of structural biology is shifting from single macromolecules produced by simpler prokaryotic organisms to the macromolecular machinery of higher organisms, including systems of central relevance for human health. Structural biologists are expert in one or more techniques. They now often need to use complementary techniques in which they are less expert. Instruct supports them in using multiple experimental techniques, and visiting multiple experimental facilities, within a single project. The Protein Data Bank is a public repository for the final structure. Journals require deposition as a precondition of publication. However, metadata is often incomplete. West-Life will pilot an infrastructure for storing and processing data that supports the growing use of combined techniques. There are some technique-specific pipelines for data analysis and structure determination. Little is available in terms of automated pipelines to handle integrated datasets. Integrated management of structural biology data from different techniques is lacking altogether. West-Life will integrate the data management facilities that already exist, and enable the provision of new ones. The resulting integration will provide users with an overview of the experiments performed at the different research infrastructures visited, and links to the different data stores. It will extend existing facilities for processing this data. As processing is performed, it will automatically capture metadata reflecting the history of the project. The effort will use existing metadata standards, and integrate new domain-specific metadata terms with them. This proposal will develop application-level services specific to use cases in structural biology, enabling structural biologists to benefit from the generic services developed by EUDAT and EGI.
    
    Speaker: Chris Morris (STFC)
    
    Slides
  - 17:00
    
    User needs, tools and common policies from PARTHENOS, a E-research networking in the field of linguistic studies, humanities and cultural heritage 20m
    
    PARTHENOS (Pooling Activities, Resources and Tools for Heritage E-research Networking, Optimization and Synergies) is a European project funded within Horizon 2020, the EU Framework Programme for Research and Innovation. The project started in May 2015 and has a duration of 48 months. PARTHENOS aims at strengthening the cohesion of research in the broad sector of Linguistic Studies, Humanities, Cultural Heritage, History, Archaeology and related fields through a thematic cluster of European Research Infrastructures, integrating initiatives, e-infrastructures and other world-class infrastructures, and building bridges between different, although tightly interrelated, fields. The project will achieve this objective through the definition and support of common standards, the coordination of joint activities, the harmonization of policy definition and implementation, and the development of pooled services and of shared solutions to the same problems. PARTHENOS will address and provide common solutions to the definition and implementation of joint policies and solutions for the humanities and linguistic data lifecycle, taking into account the specific needs of the sector that require dedicated design, including: provisions for cross-discipline data use and re-use; the implementation of common AAA (authentication, authorization, access) and data curation policies, including long-term preservation; quality criteria and data approval/certification; IPR management, also addressing sensitive data and privacy issues; foresight studies about innovative methods for the humanities; standardization and interoperability; common tools for data-oriented services such as resource discovery, search services, quality assessment of metadata and annotation of sources; communication activities; and joint training activities. Built around the two ERICs of the sector, DARIAH and CLARIN, and involving all the relevant Integrating Activities projects, PARTHENOS will deliver guidelines, standards, methods, services and tools to be used by its partners and by the whole research community.
    
    Speaker: Sara Di Giorgio (Central Institute for the Union Catalogue of Italian Libraries)
    
    Slides
  - 17:20
    
    Promoting Grids and Clouds for the Health Science and Medical Research Community in France 20m
    
    Life sciences in general, and medical research in particular, have increasing needs in terms of computing infrastructures, tools and techniques: research in domains such as genomics, drug design, medical imaging or e-health cannot be undertaken without adequate computing and data solutions. Yet generalizing the use of large-scale, distributed infrastructures requires time and effort, first because of the cultural shift it implies for many researchers and teams, and second because of the heterogeneity of users’ needs and requirements. INSERM, the French National Institute for Health and Medical Research, is facing such a challenge. INSERM is the largest European medical research institution, with around 300 research units and more than 1000 teams spread all over the country. As such, it represents a very wide range of disciplines and domains, but also very different levels of expertise with regard to scientific computing and associated technologies: while a few teams have been using distributed infrastructures for many years, others are only barely aware of their existence and possible benefits. To face this challenge, INSERM launched a Computational Science Coordination Team (CISI) within its IT department in 2014. CISI is built as a set of competence centres on major scientific computing themes and technologies (grids, clouds, HPC, big data, parallel computing, simulation...). Building on this expertise, the team aims at matching the needs of INSERM researchers with the appropriate technical solution. One of the objectives of the team is to build and support communities around these different technical areas. One of those communities will be built around the use of grids and clouds, with the help of France Grilles, the French NGI, and in collaboration with other national institutions. The presentation will first describe CISI's organisation and missions, ranging from infrastructure and usage mapping to projects, training, expertise and support. It will then explain the organisation of knowledge transfer to enlarge the grid and cloud user communities at INSERM, especially within the medical imaging, bioinformatics and e-health domains. It will also present two practical examples of innovative research at the boundary between life sciences and computing: the deployment of a new parallelisation paradigm in a medical imaging use case using the EGI infrastructure through DIRAC, and a pharmacovigilance application using iRODS and academic clouds. It will finally present INSERM's vision to empower its researchers through the use of Virtual Research Environments (VREs).
    
    Speaker: Gilles Mathieu (INSERM)
    
    Slides
  - 17:40
    
    Research Infrastructures for Integrated Environmental Modeling: the DRIHM(2US) experience 20m
    
    From 1970 to 2012, about 9000 high impact weather events (HIWE) were reported globally: altogether, they caused the loss of 1.94 million lives and economic damage of US$ 2.4 trillion (2014 UNISDR report). Storms and floods accounted for 79% of the total number of disasters due to weather, water, and climate extremes and caused 55% of lives lost and 86% of economic losses. These figures call for focused hydro-meteorological research to: (a) understand, explain and predict the physical processes producing HIWE; (b) understand the possible intensification of such events because of climate change effects; and (c) explore the potential of e-Infrastructures to provide deeper understanding of those events through fine-resolution modelling over large areal extents. The underlying premise of the DRIHM (Distributed Research Infrastructure for Hydro-Meteorology) and DRIHM2US (Distributed Research Infrastructure for Hydro-Meteorology to US) projects (www.drihm.eu and www.drihm2us.eu, DRIHM(2US) hereafter) is that understanding and predicting the environmental and human impact of HIWE requires a holistic approach. The DRIHM(2US) virtual research environment (VRE) enables the production and interpretation of numerous, complex compositions of hydro-meteorological simulations of HIWE from rainfall, either simulated or modelled, down to discharge, water level and flow, and impact. Hydro-meteorological research (HMR) topics that are enabled or facilitated by DRIHM(2US) services include: physical process studies, intercomparison of models and ensembles, sensitivity studies of a particular component of the forecasting chain, and design of flash-flood early-warning systems.
    
    Speaker: Daniele Dagostino
- 16:00 → 18:00
  Tutorial: DIRAC service Sala A+A1, Giulia Centre
  
  Sala A+A1, Giulia Centre
  
  Villa Romanazzi Carducci
  
  Convener: Andrei Tsaregorodtsev (CNRS)
  - 16:00
    
    DIRAC service tutorial 2h
    
    Many large and small scientific communities are using increasingly intensive computations to reach their goals. These communities can exploit a variety of computing resources, which makes it difficult to adapt their applications to different computing infrastructures. Therefore, they need tools for seamless aggregation of different computing and storage resources in a single coherent system. The DIRAC project develops and promotes software for building distributed computing systems. Both workload and data management tools are provided, as well as support for high-level workflows and massive data operations. Services based on the DIRAC interware are now provided by several national grid infrastructure projects. The DIRAC4EGI service is operated by the EGI project itself. The latter service is already used by multiple user communities, and more communities are evaluating it for their future work. The proposed tutorial is focused on the use of DIRAC services and aims at letting the participants learn how to start using the system and perform the basic tasks of submitting jobs and manipulating data. More advanced examples will be given with the help of the Web Portal of the DIRAC4EGI service. The tutorial will also show examples of how new computing and storage resources can be connected to the system, including Cloud resources from the EGI Federated Cloud project. Configuring the system for use by multiple communities with particular usage policies will be explained. As a result, the participants will have a clear idea of the service functionality, interfaces and extensions.
    
    Speaker: Dr Andrei Tsaregorodtsev (CNRS)
    
    Slides
- 16:00 → 18:00
  Tutorial: Dos and Don'ts for Virtual Appliance Preparation Federico II
  
  Federico II
  
  Villa Romanazzi Carducci
  
  Convener: Boris Parak (CESNET)
  - 16:00
    
    Dos and Don'ts for Virtual Appliance Preparation -- Hands-on Tutorial 2h
    
    This tutorial closely follows Introduction to the EGI Federated Cloud – the user perspective [1]. It focuses on virtual appliance preparation (not just) for the EGI Federated Cloud. Individual users and user communities can now prepare, upload, and launch their own appliances as virtual machines in the EGI environment. This brings new possibilities, but it also places a considerable burden on users preparing such appliances. This tutorial will discuss and demonstrate (hands-on) basic dos and don'ts of appliance preparation, focusing on the following topics: 1) Operating Systems (Linux-based); 2) Disk Image Formats; 3) Appliance Portability; 4) Contextualization; 5) Security; 6) Automation & Provisioning; 7) The EGI Application Database. Attendees are encouraged to bring up real-world problems and experiences for discussion. For the hands-on parts, attendees are expected to have their own laptops with VirtualBox [2] and Packer [3] pre-installed. [1] https://indico.egi.eu/indico/contributionDisplay.py?sessionId=42&contribId=71&confId=2544 [2] https://www.virtualbox.org/wiki/Downloads [3] https://packer.io/downloads.html
    
    Speaker: Boris Parak (CESNET)
    
    Packer Installer
    
    Slides
    
    VirtualBox Installer
- 18:00 → 19:00
  
  Participation in H2020 innovation project with SMEs (CLOSED) Sala A+A1
  
  Sala A+A1
  
  Closed meeting (by invitation/confirmation) to discuss the role of NGIs and EIROs and participation mechanisms in an H2020 project to offer open calls to facilitate innovation with SMEs.
  For more information and to express interest in attending, please contact policy@egi.eu.
  
  Conveners: Roberta Piscitelli (EGI.eu), Sergio Andreozzi (EGI.eu), Sy Holsinger (EGI.eu)
Wednesday, 11 November
- 09:00 → 10:30
  EGI EUDAT interoperability use cases Europa
  
  Europa
  
  Villa Romanazzi Carducci
  
  This session will present the first steps taken toward technical interoperability, and the participating EGI-EUDAT pilot communities (ICOS, BBMRI, ELIXIR, EISCAT-3D) will present their cross-infrastructure requirements. The aim is to show how to connect data stored in the EUDAT CDI to the high throughput and cloud computing resources provided by EGI, and the other way around.
  
  Conveners: Dejan Vitlacil (KTH), Giuseppe Fiameni (CINECA - Consorzio Interuniversitario)
  - 09:00
    
    EUDAT services 5m
    
    Speaker: Giuseppe Fiameni (CINECA - Consorzio Interuniversitario)
  - 09:05
    
    EGI services 5m
    
    Speaker: Diego Scardaci (EGI.eu/INFN)
    
    Slides
  - 09:10
    
    EGI EUDAT interoperability use case demo 10m
    
    Speaker: Diego Scardaci (EGI.eu/INFN)
    
    Slides
  - 09:20
    
    ICOS Community Requirements 10m
    
    Speaker: Dejan Vitlacil (KTH)
    
    Slides
  - 09:30
    
    BBMRI Community Requirements 10m
    
    Speaker: Petr Holub (BBMRI-ERIC)
    
    Paper
    
    Slides
  - 09:40
    
    EMBL Community Requirements 10m
    
    Speaker: Tony Wildish (European Bioinformatics Institute)
    
    Slides
  - 09:50
    
    EISCAT-3D Community Requirements 10m
    
    Speaker: Ingemar Haggstrom (EISCAT)
    
    Slides
  - 10:00
    
    EPOS Community Requirements 10m
    
    Speaker: Dr Daniele Bailo (INGV)
    
    Slides
  - 10:10
    
    Discussion 20m
- 09:00 → 10:30
  EGI LifeWatch Competence Centre workshop Sala A+A1, Giulia Centre
  
  Sala A+A1, Giulia Centre
  
  Villa Romanazzi Carducci
  
  This session presents and discusses the progress of the LifeWatch Competence Center:
  
  -Support for LifeWatch in EGI FedCloud
  -Implementation of Data Flow
  -R as a service
  -Support to Workflows (Galaxy and TRUFA)
  -Assisted Pattern Recognition for Citizen Science in Biodiversity
  -Implementation of the Network of Life
  
  Convener: Jesus Marco de Lucas (CSIC)
  - 09:00
    Advances in EGI LifeWatch Competence Center 1h 30m
    
    A session (1h30-2h) is requested to present and discuss the progress of the LifeWatch Competence Center: support for LifeWatch in EGI FedCloud; implementation of Data Flow; R as a service; support to workflows (Galaxy and TRUFA); Assisted Pattern Recognition for Citizen Science in Biodiversity; implementation of the Network of Life. Presentations from VLIZ, CIBIO, INRA, CSIC, BIFI and other partners are expected.
    
    Speakers: Fernando Aguilar (CSIC), Jesus Marco de Lucas (CSIC)
    
    Intro and summary of progress in EGI LW CC 5m
    
    Speaker: Jesus Marco de Lucas (CSIC)
    
    Status of FedCloud setup for LifeWatch 15m
    
    Speakers: Dr Enol Fernandez (EGI.eu), Fernando Aguilar (CSIC)
    
    Slides
    
    Observatories: VREs and Data portals -- Update of progress on the terrestrial and fresh water VRE 10m
    
    Speaker: Dr Julien Radoux (Université catholique de Louvain)
    
    Slides
    
    Observatories: VREs and Data portals -- LifewatchGreece Portal: Semantic web with Data services, Parallelizing R with RvLab, Manipulating specimens with MicroCT 20m
    
    Speaker: Mrs Emmanouela Panteri (Hellenic Centre for Marine Research)
    
    Slides
    
    Observatories: VREs and Data portals -- LIFEWATCH ITALY: Data Portal and Virtual Lab Integration VRE 10m
    
    Speakers: Dr NICOLA FIORE (LifeWatch Italy), Paolo Tagliolato
    
    Slides
    
    Implementation of workflows using Galaxy 15m
    
    Speaker: Dr Ignacio Blanquer (UPVLC)
    
    Slides
    
    Status of the Citizen Science application: training NN for image recognition 15m
    
    Speaker: Mr Eduardo LOSTAL (BIFI)
    
    Slides
    
    NoSQL working group Use case: Network of Life 15m
    
    Speaker: Mario David (LIP)
    
    Slides
- 09:00 → 10:30
  Research Infrastructures in Horizon 2020 Scuderia
  
  Scuderia
  
  Villa Romanazzi Carducci
  
  The aim of the H2020 session at the EGI Community Forum is to introduce the services of the RICH Consortium and the work of National Contact Points, and to introduce new calls in the Horizon 2020 Work Programme for European research infrastructures (including e-Infrastructures) for the years 2016-2017. Beneficiaries (users of virtual access and coordinators of the first e-infrastructure calls in H2020) will provide participants with useful tips (preparing proposals, creating a consortium, the realization phase of the project, etc.).
  
  Convener: Elena Maffia
  - 09:00
    
    Introduction of RICH 15m
    
    Research infrastructures (RIs) are used by research communities but, where relevant, they provide services even beyond research, such as education and public services. Research infrastructures play a key role in forming research communities and offer them the advancement of knowledge and technology and their exploitation. The European Union supports research infrastructures via the Horizon 2020 programme, which puts an emphasis on the long-term sustainability of RIs, their expanding role and impact in the innovation chain, widening participation and integrating activities. National Contact Points (NCPs) provide professional support and comprehensive services to national research teams, research organizations and industrial enterprises to facilitate and support their integration into H2020. By spreading awareness, giving specialist advice, and providing on-the-ground guidance, NCPs ensure that the RI programme becomes known and readily accessible to all potential applicants, irrespective of sector or discipline. To enhance cooperation and networking between these national entities and to achieve a higher quality of services, NCPs for RI are involved in the H2020 project RICH (Research Infrastructures Consortium for Horizon 2020). RICH 2020, the European Network of National Contact Points (NCPs) for Research Infrastructures in Horizon 2020, facilitates transnational cooperation between NCPs, promotes the effective implementation of the RI programme, supports transnational and virtual access to RIs and highlights the opportunities offered by Research Infrastructures at the European and international level. The aim of the H2020 session at the EGI Community Forum is to introduce the services of the RICH Consortium and the work of National Contact Points, and to introduce new calls in the Horizon 2020 Work Programme for European research infrastructures (including e-Infrastructures) for the years 2016-2017. Experienced NCPs in the area of RI will lead the session as speakers, and project beneficiaries will be invited. The main speakers will introduce the new RI calls from Work Programme 2016-17, their topics, aims and conditions. Beneficiaries (users of virtual access and coordinators of the first e-infrastructure calls in H2020) will provide participants with useful tips (preparing proposals, creating a consortium, the realization phase of the project, etc.). The timing is ideal, as the new calls from Work Programme 2016-17 will be introduced in the autumn and, at the same time, coordinators of the first e-infrastructure calls will have enough useful tips and information to share best practice. This session will be helpful for all EGI participants, as the funding opportunities for research infrastructures in Horizon 2020 are of interest to infrastructure users and providers, tool developers, and research and scientific communities. The H2020 session will be of great value to the EGI Community Forum.
    
    Speaker: Ms Elena Maffia (APRE - Coordinator of RICH project)
    
    Slides
  - 09:15
    
    RI Work Programme 2016-17 30m
    
    Speaker: Laurent Ghys (Belgian Science Policy Office)
    
    Slides
  - 09:45
    
    Experience of beneficiaries in RI Calls 30m
    
    Speakers: Luciano Gaido (INFN), Pasquale Pagano (CNR)
    
    Slides
  - 10:15
    
    Discussion 15m
- 09:00 → 10:30
  Tutorial: HAPPI toolkit Federico II
  
  Federico II
  
  Villa Romanazzi Carducci
  
  Prerequisites:
  * Bring your own laptop
  * Internet connection
  * Programming knowledge
  
  Content:
  The HAPPI Toolkit is part of the Data Preservation e-Infrastructure produced by the SCIDIP-ES project [www.scidip-es.eu]. This component, released under an open source license (Apache License v2.0) and available on SourceForge [http://goo.gl/yWPBkV], is an implementation of an authenticity model defined jointly by the APARSEN and SCIDIP-ES projects. This model describes how to trace and document transformations of any digital object during its whole life cycle, and it is based on the Open Provenance Model and PREMIS. These de facto standards improve interoperability among different digital archives and/or research communities.
  The description of transformations of a digital object is part of the “preservation metadata” (a.k.a. Preservation Description Information), which includes provenance, reference and integrity information, according to the Open Archival Information System (OAIS) standard, ISO 14721:2012.
  
  The objective of the tutorial is to provide attendees with:
  * an overview of the digital preservation [15 minutes]
  * an overview of the datamodel implemented by HAPPI Toolkit [15 minutes]
  * an overview of the technologies, implementation details and code chunks of HAPPI Toolkit [30 minutes]
  * practice on HAPPI Toolkit instances on EGI FedCloud [10 minutes]
  * answers to questions [20 minutes]
  
  Target audience: archivists, researchers, managers of research infrastructures and data centers, developers, EGI FedCloud users
  
  Tutorial material: all the material of the tutorial will be released under a Creative Commons license (by-sa). Details will be given before the tutorial. Attendees can send questions and requests to the author in advance.
  
  Convener: Luigi Briguglio (Engineering Ingegneria Informatica S.p.A.)
  - 09:00
    
    Tracking Dataset Transformations with HAPPI Toolkit 1h 30m
    
    The results of the research community are based on three main pillars: models of phenomena, datasets gathered from missions and campaigns, and validation and refinement of models based on those datasets. From its acquisition and throughout the whole life cycle of the research process, a dataset undergoes many transformations (e.g. capture, migration, change of custody, aggregation, processing, extraction, ingestion) so that it can be properly processed, analysed, exchanged among researchers and (re-)used. Consequently, the trustworthiness of results, and of the research community itself, relies on tracking dataset transformations throughout the whole life cycle of the research process. Tracking dataset transformations becomes more important whenever a dataset has to be handled by research communities from different domains and/or the research process spans a long interval of time. The Open Archival Information System (OAIS - ISO 14721:2012) [1] identifies “provenance information” as the type of metadata in which to store and track the changes a generic digital object has undergone since its creation. Provenance is part of the so-called OAIS Preservation Description Information, the metadata used to preserve a digital object in a long-term digital archive, which includes i) reference information (the persistent identifier assigned to the digital object); ii) provenance information; iii) context information (relationships to other digital objects); iv) fixity information (information used to ensure that the digital object has not been altered in an uncontrolled manner); and v) rights information (permitted roles to access and transform the digital object). The HAPPI Toolkit [2], part of the Data Preservation e-Infrastructure produced by the SCIDIP-ES project [3], traces and documents dataset transformations by adopting the Open Provenance Model, a simple information model based on three basic entities (i.e. controller agent, transformation, digital object) that improves interoperability and the capability to exchange information among different digital archives and/or research communities. Moreover, for each transformation the HAPPI Toolkit generates a record (called an Evidence Record) that includes reference information and integrity information. The collection of records, which represents the history of all the dataset transformations, is called the Evidence History; it is managed by the HAPPI Toolkit and provides data managers with evidence to use when assessing the integrity and authenticity of the dataset. Since July 2014, the HAPPI Toolkit has been running on the EGI FedCloud. The tutorial aims to present how the HAPPI Toolkit works, and specifically how it is configured, how it creates the evidence of dataset transformations, and how users can access evidence and dataset information.
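
As a rough illustration of the idea described in this abstract, the sketch below records one entry per transformation, linking a controller agent, a transformation and a digital object, and stores a digest as integrity information. It is not the HAPPI Toolkit API or its schema: the class names, field names, the choice of SHA-256 and the identifier `example:dataset-1` are all assumptions made for this example.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class EvidenceRecord:
    """One tracked transformation (hypothetical layout, not the HAPPI schema)."""
    agent: str            # controller agent that performed the transformation
    transformation: str   # e.g. "capture", "migration", "processing"
    object_id: str        # reference information: identifier of the digital object
    sha256: str           # integrity information: digest of the object afterwards
    timestamp: str        # when the record was created (UTC, ISO 8601)


@dataclass
class EvidenceHistory:
    """Ordered collection of evidence records for one dataset."""
    records: list = field(default_factory=list)

    def track(self, agent: str, transformation: str,
              object_id: str, content: bytes) -> EvidenceRecord:
        """Append a record describing a transformation and its result."""
        record = EvidenceRecord(
            agent=agent,
            transformation=transformation,
            object_id=object_id,
            sha256=hashlib.sha256(content).hexdigest(),
            timestamp=datetime.now(timezone.utc).isoformat(),
        )
        self.records.append(record)
        return record

    def matches_latest(self, content: bytes) -> bool:
        """True if content has the digest stored by the last transformation."""
        if not self.records:
            return False
        return self.records[-1].sha256 == hashlib.sha256(content).hexdigest()


# Placeholder agents, identifier and payloads, for illustration only.
history = EvidenceHistory()
history.track("alice", "capture", "example:dataset-1", b"raw data")
history.track("bob", "processing", "example:dataset-1", b"processed data")
```

A data manager could then call `history.matches_latest(...)` on the current bytes of the dataset to check them against the most recent record, which is one simple way such evidence could support an integrity assessment.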
    
    Speaker: Mr Luigi Briguglio (Engineering Ingegneria Informatica S.p.A.)
    
    Slides
- 10:30 → 11:00
  
  Coffee break
- 11:00 → 12:30
  Data without boundaries: legal and policy aspects Europa
  
  Europa
  
  Villa Romanazzi Carducci
  
  The (re-)use of data, including in fishery and marine sciences, requires data sharing policies and attention to legal aspects in addition to technological interoperability.
  
  FAO has prepared data sharing and legal interoperability policies in Virtual Research Environments, to be joined with a technical feasibility analysis by CNR. This activity will be completed by February 2016.
  
  The workshop objective is to reach out to other communities with similar policy demands:
  1) Overcome legal barriers in sharing fishery & marine sciences data-sets with other institutions and communities;
  2) Deliver a set of legally relevant instructions to data providers and consumers to describe their data, access to data, and the life-cycle of data in an infrastructure;
  3) Devise a context where infrastructure support for processing a mix of public and non-public data-sets results in improved data availability whilst respecting legal dissemination boundaries;
  4) Present a use case with a regional database targeting fisheries’ productivity;
  5) The ensuing discussion will aim to advise on legal interoperability support through an infrastructure (VREs), especially where storage and access arrangements are required (e.g. to support confidentiality needs of data owners).
  
  Conveners: Anton Ellenbroek (FAO), Sergio Andreozzi (EGI.eu)
  - 11:00
    
    Removing legal barriers in the sharing of fisheries & marine sciences data: an approach to relevant legal interoperability issues 25m
    
    Speaker: Enrique Alonso García (Consejo de Estado (España))
    
    Slides
  - 11:25
    
    Data policies and legal aspects with a focus on fishery and marine sciences 30m
    
    The (re-)use of data, including in fishery and marine sciences, requires data sharing policies and attention to legal aspects in addition to technological interoperability. FAO, amongst others, has prepared data sharing and legal interoperability policies in Virtual Research Environments. In EGI-Engage this experience will be used to develop a market analysis. This activity will be completed by M12. The objective of this presentation is to reach out to other communities with similar policy requirements, and to share some practical advice on data policies: 1) What are the policy and legal barriers to data sharing (with a focus on fishery & marine sciences), and why do we care; 2) How to provide legally relevant instructions to data providers to describe their data and access to data over the entire life-cycle of data in an infrastructure; 3) How infrastructure support for processing a mix of public and non-public data-sets can result in improved data availability whilst respecting legal dissemination boundaries; 4) A use case with a regional database targeting fisheries’ productivity.
    
    Speaker: Anton Ellenbroek (FAO)
    
    Slides
  - 11:55
    
    Infrastructure data policies 25m
    
    The use of e-Infrastructures to support cross-domain data access and processing raises important questions on defining and marshaling storage and access to data. This talk will address current and emerging data policy issues to ensure secure and safe data exploitation that respects the data owners' requirements.
    
    Speaker: Pasquale Pagano (CNR)
    
    Slides
  - 12:20
    
    Discussion on Data Policies 10m
    
    Discuss the opportunities to develop fine grained data policies on the EGI Infrastructure serving the legal, institutional and data policy needs of, in particular, fisheries and EO data.
- 11:00 → 12:30
  Disaster mitigation Competence Centre Sala A+A1, Giulia Centre
  
  Sala A+A1, Giulia Centre
  
  Villa Romanazzi Carducci
  
  The Disaster Mitigation Competence Centre (DMCC) is building an open platform for open collaboration on Asia Pacific regional natural disaster mitigation by making use of e-Science methodologies. Scientific gateways for tsunami and weather simulation and analysis are the focus of the first project year. iCOMCOT and gWRF are the primary application portals for tsunami waveform transmission and analysis and for weather analysis respectively. A use-case-driven approach dominates the workflow, components, and computing models of the system design. Potential tsunami hazards originating from the Manila trench, identification of historical tsunami sources in the South China Sea, the Thailand flood in 2011 and the Malaysia flood in late 2014 are the target cases proposed by the members of DMCC in the first year.
  
  Convener: Eric Yen (AS)
  - 11:00
    Disaster Mitigation Competence Centre Face-to-Face Meeting 1h 30m
    
    The Disaster Mitigation Competence Centre (DMCC) is building an open platform for open collaboration on Asia Pacific regional natural disaster mitigation by making use of e-Science methodologies. Scientific gateways for tsunami and weather simulation and analysis are the focus of the first project year. iCOMCOT and gWRF are the primary application portals for tsunami waveform transmission and analysis and for weather analysis respectively. A use-case-driven approach dominates the workflow, components, and computing models of the system design. Potential tsunami hazards originating from the Manila trench, identification of historical tsunami sources in the South China Sea, the Thailand flood in 2011 and the Malaysia flood in late 2014 are the target cases proposed by the members of DMCC in the first year. The purpose of this session is to seek synergy with the participants and related communities. The session is reserved for the DMCC face-to-face meeting, as it is essential to have interactions and collaborations with EGI-Engage partners, other Competence Centres, and other projects and institutes. The project status, including partner progress and the future plan, will be reported. Discussions of specific topics and possible collaborations will be arranged according to the participants or invitations.
    
    Speakers: Eric Yen (AS), Siew Hoon Leong (EGI.eu), Simon Lin (ASGC)
    
    Disaster Mitigation Competence Centre Collaboration and Progress 20m
    
    Speaker: Eric Yen (AS)
    
    e-Science for the Masses 25m
    
    Speaker: Simon Lin (AS)
    
    Advanced Visualisation Case Study 25m
    
    Speaker: Ms Cerlane Leong
- 11:00 → 12:30
  Federated accelerated computing Scuderia
  
  Scuderia
  
  Villa Romanazzi Carducci
  
  Accelerated computing systems deliver energy efficient and powerful HPC capabilities. Many EGI sites are providing accelerated computing technologies, such as GPGPUs or MIC co-processors, to enable high performance processing. Currently these accelerated capabilities are not directly supported by the EGI platforms. To use the co-processor capabilities available at resource centre level, users must interact directly with the local provider to get information about the type of resources and software libraries available and about which submission queues must be used to submit accelerated computing tasks.
  The session follows the one held in Lisbon in May, and will discuss the progress on the roadmap to achieve the federation of GPGPU or MIC co-processors capabilities across EGI HTC and Cloud platforms.
  Service providers as well as user communities interested in the use of accelerated computing facilities across Europe are invited to participate bringing their requirements.
  
  Conveners: Dr Marco Verlato (INFN), Dr Viet Tran (UI SAV)
  - 11:00
    
    Latest progresses on AC activity in EGI-Engage 20m
    
    This presentation will discuss the progress on the roadmap to achieve the federation of GPGPU or MIC co-processors capabilities across EGI HTC and Cloud platforms, in the context of the EGI-Engage JRA2.4 task.
    
    Speakers: Dr Marco Verlato (INFN), Dr Viet Tran (UI SAV)
    
    Slides
  - 11:20
    
    GPGPU-enabled molecular dynamics of proteins on a distributed computing environment 20m
    
    As part of the activities of the MoBrain Competence Center within the EGI-Engage project, we have implemented services for the use of molecular dynamics (MD) simulations on biological macromolecules (proteins, nucleic acids) based on the AMBER suite and taking advantage of GPGPU architectures within a grid computational infrastructure. The rationale for this development is to improve upon the tools already provided within the WeNMR gateway [1], which allow for MD-based refinement of macromolecular structures derived from NMR spectroscopy data as well as for unrestrained (i.e. without experimental data) MD simulations. These services are available via the AMPS-NMR portal, which has been using the EGI computational infrastructure for almost five years [2]. The portal allows a large range of users that are not computer-savvy to apply successfully state-of-the-art MD methods through completely guided protocols. The current protocols are only designed to work with CPUs. Transitioning to GPGPU services would result in a significant reduction of the wall time needed for calculations, thereby enabling a higher throughput of the portal. Alternatively, one could run simulations for larger, more complex molecular systems or could sample molecular motions more extensively in order to obtain information on various biologically relevant time scales. For the above reasons, we thus decided to extend the capabilities of the AMPS-NMR portal so that it would provide access to both CPU and GPGPU resources, depending on the requirements of the specific calculation requested by the user and taking into account also resource availability. To achieve this it is necessary to modify the submission pipeline that underlies the portal as well as to implement different versions of the AMBER suite. Some changes to the software code were necessary in order to achieve the best treatment of NMR-based experimental restraints during MD simulations. 
A further hurdle was the lack of an approach generally agreed upon to expose GPGPU resources on the EGI grid environment. To address this middleware limitation, we endeavored to contribute to testing the implementation of different queuing systems within EMI. In this contribution, we show the initial results of the above work. We also demonstrate an example biological application that is only possible on GPGPU systems. [1] Wassenaar TA, et al. WeNMR: Structural biology on the Grid. J. Grid. Computing 10:743-767, 2012 [2] Bertini I, Case DA, Ferella L, Giachetti A, Rosato A. A Grid-enabled web portal for NMR structure refinement with AMBER. Bioinformatics. 27:2384-2390, 2011
    
    Speakers: Andrea Giachetti (CIRMMP), Antonio Rosato (CIRMMP)
    
    Slides
  - 11:40
    
    Bioinformatics simulations by means of running virtual machines as a grid jobs in the MolDynGrid Virtual Laboratory 20m
    
    A number of software packages for bioinformatics simulations are used in the MolDynGrid virtual laboratory (VL) [1] for in silico calculations of molecular dynamics, including GROMACS, NAMD, Autodock, etc. [2-3]. Computational resources for such simulations are provided by a number of HPC clusters, most of which are part of the Ukrainian National Grid (UNG) infrastructure powered by Nordugrid ARC [4], along with a few clusters of the European Grid Infrastructure. In a heterogeneous grid environment, ensuring that every resource provider has the required build of the software, in a particular version and with its dependencies, and moreover handling software updates, is a non-trivial task. When the number of software packages and build flavors grows, as in the MolDynGrid case, software management across dozens of clusters becomes almost impossible. The classical approaches to software maintenance include building software on the fly within the grid-job execution cycle and relying on a VO-managed common filesystem like CVMFS with pre-built software. Both approaches work well when resource providers' environments are similar, but with completely heterogeneous hardware and software, including different OS distributions, software builds must be handled for every one of these platforms. To handle software efficiently in such environments for MolDynGrid research, another approach has been introduced: running hardware-accelerated virtual machines (VMs) as grid jobs. This approach eliminates the need to build software on every resource provider and introduces a single point for software updates. Software needs to be built for one virtual platform only. Moreover, this approach also makes it possible to use software for Windows. Since adding a virtualization layer reduces performance, the first thing analyzed was the size of this drop. On the UA-IMBG cluster, molecular dynamics in GROMACS for the same biological object was computed on the same hardware with and without virtualization. The software environment was cloned from the host to the guest VM. GROMACS was chosen because it is the main software used by the MolDynGrid VL in terms of CPU time consumption. To run VMs as grid jobs on grid-site worker nodes, several helpers running with root privileges are needed to set up virtual hardware and transfer job data to the VM. The framework of components that supports the VM execution cycle as a grid job was originally developed as part of the Ukrainian Medgrid VO project [5] and is called Rainbow (ARC in the Cloud) [6]. Rainbow started by providing interactive access to Windows VMs running on UNG resources for analyses of medical data stored in the grid for telemedicine [7]. For the MolDynGrid VL, several components were added to the Rainbow framework that implement data staging to the VM and make it possible to add a VM layer to the grid-job processing cycle. Both the CLI and Web MolDynGrid VRE interfaces were extended to support VM submission with Rainbow. This approach makes it possible to involve more resources in computations with particular software builds. Ongoing development of Rainbow for MolDynGrid includes support for Docker containers in addition to KVM VMs, and GPGPU computations by means of GPU device pass-through.
    
    Speaker: Andrii Salnikov (Taras Shevchenko National University of Kyiv)
    
    Slides
  - 12:00
    
    GPGPU support in EGI Federated Cloud 20m
    
    Live demonstration of GPGPU-enabled VMs on the EGI Federated Cloud
    
    Speakers: Dr Andrii SALNIKOV (Taras Shevchenko National University of Kyiv), Dr Viet Tran (UI SAV)
    
    Paper
  - 12:20
    
    Session wrapup 10m
- 11:00 → 12:30
  Tutorial: Running Chipster in the EGI FedCloud Federico II
  
  Federico II
  
  Villa Romanazzi Carducci
  
  Conveners: Diego Scardaci (EGI.eu/INFN), Kimmo Mattila (CSC)
  - 11:00
    
    Running Chipster data analysis platform in EGI Federated Cloud 1h 30m
    
    Chipster is a bioinformatics environment that includes over 350 analysis tools for high-throughput sequencing and microarray data. The tools are complemented with a comprehensive collection of reference datasets, such as genome indexes for the Tophat and BWA aligners. The tools can be used on the command line or via an intuitive GUI, which also offers interactive visualizations and workflow functionality. Chipster is open source and the server environment is available as a virtual machine image free of charge. In this tutorial session you will learn how virtual Chipster servers can be launched in the EGI Federated Cloud. The development and support work done by the EGI Federated Cloud community has made launching a Chipster server easy. The rOCCI client, needed to connect to the EGI Federated Cloud, is first installed on a Linux or OS X machine. In addition, you need to join the chipster.csc.fi virtual organization. After these preliminary steps, you can use a simple utility tool. With the FedCloud_chipster_manager, the Chipster VM image is automatically downloaded from the EGI AppDB, launched in the EGI Federated Cloud environment and linked to the required reference data sets and applications using the CVMFS system. More in-depth demonstrations of actually using Chipster for analyzing biological data will be shown in the NGS data analysis tutorial. Requirements: 1. Linux machine with rOCCI client, 2. chipster.csc.fi VO membership.
    
    Speaker: Kimmo Mattila (CSC)
- 12:30 → 13:30
  
  Lunch
- 13:30 → 15:30
  Data without boundaries: market analysis and technical requirements Europa
  
  Europa
  
  Villa Romanazzi Carducci
  
  Collecting user (SME and academia) requirements is imperative for profiling new and enhanced EGI services, such as big and/or open data services. Investigating the agri-food and the fishery and marine sciences markets serves as a window for collecting user requirements. A market analysis of these sectors helps to gain insights into how best to build and co-design next generation e-infrastructures and drive the evolution of existing infrastructures. Furthermore, the investigation may provide insights into how to participate in the creation of value, perhaps also through new and synergistic industries.
  
  Conveners: Kostas Koumantaros (GRNET), Nadia Nardi (Engineering Ingegneria Informatica spa)
  - 13:30
    
    Overview of Market Analysis 30m
    
    Speaker: Kostas Koumantaros (GRNET)
    
    Minutes
    
    Slides
  - 14:00
    
    Fishery/Marine: initial requirements and market analysis 30m
    
    Initial findings of a top-down and bottom-up analysis of the fishery and marine sciences data analysis sector will be presented. The market analysis will cover structure and size, and stakeholders and their respective interests along the data value chain. Research/industry community requirements will be collected through the recently EU co-funded BlueBRIDGE (bluebridge-vres.eu) project. The community engagement will allow for a deeper look into the fishery and marine sciences data analysis sectors in order to build next generation e-infrastructures.
    
    Speaker: Nadia Nardi (Engineering Ingegneria Informatica spa)
    
    Slides
  - 14:30
    
    Agri-food: initial requirements and market analysis 30m
    
    In the agri-food sector, the targets are SMEs offering services to institutions and individuals doing research on agri-food topics. In the context of the project, the agri-food sector is mainly represented by agINFRA (http://aginfra.eu), the European hub for agri-food research and the domain-specific node for OpenAIRE (http://openaire.eu) and Big Data Europe (http://www.big-data-europe.eu). Taking advantage of the agINFRA community and the FI-PPP accelerator projects (among others), initial findings of a market analysis will be presented: market needs, recommendations for new and enhanced (big) and/or open data services targeting industry and academia, and feedback collected through an online survey distributed to the SMEs of the targeted ecosystem.
    
    Speaker: Kostas Koumantaros (GRNET)
    
    Slides
  - 15:00
    
    Summary and Discussion 30m
- 13:30 → 15:30
  Exploiting the EGI Federated clouds - PaaS & SaaS workshop Scuderia
  
  Scuderia
  
  Villa Romanazzi Carducci
  
  Convener: Diego Scardaci (EGI.eu/INFN)
  - 13:30
    
    An integrated IaaS and PaaS architecture for scientific computing 20m
    
    Scientific applications often require multiple computing resources deployed in a coordinated way. The deployment of multiple resources requires installing and configuring special software applications, which should be updated when changes in the virtual infrastructure take place. When working on hybrid and federated cloud environments, restrictions on the hypervisor or cloud management platform must be minimised to facilitate geographically wide brokering and cross-site deployments. Moreover, preserving the individual operation at the site level in federated clouds is also important for scalability and interoperability. In that sense, the INDIGO-DataCloud project [1] has been designed with the objective of building up a PaaS-level cloud solution for research. One of the key multi-level components is the PaaS computing core. This part constitutes the kernel for the deployment of services and computing virtual infrastructures for the users. It is complemented with the virtualized storage, federated AAI and networking. The INDIGO-DataCloud PaaS core will be based on a microservice architecture [2]. Microservices consist of a set of narrowly focused, independently deployable services, typically implemented using container-embedded applications, exposed through a RESTful interface. Microservices are designed to be highly scalable, highly available and targeted for use in cloud environments. INDIGO’s microservices will be deployed, dynamically scheduled and managed using tools such as Kubernetes [3]. In cases where multi-tenancy is not yet intrinsically supported by a particular microservice, such as the container manager, INDIGO-DataCloud may decide to offer multiple instances to bridge that gap. The INDIGO PaaS will offer an upper-layer orchestration service for distributed applications using the TOSCA language standard [4]. It will deal with the requested service instantiation and application execution, managing the needed microservices in order, for example, to select the right end-point for the deployment. Cross-site deployments will also be possible. This PaaS, aimed at providing a more efficient platform for scientific computing, will require additional characteristics from the underlying layers. The INDIGO PaaS will leverage an enhanced IaaS that will provide a richer set of features that are currently missing. The usage of TOSCA permits IaaS providers to offer infrastructure orchestration, making it possible to manage the deployment and configuration of the resources that are being provided. The life-cycle of the resources is therefore managed through the APIs exposed by the IaaS end-points. The TOSCA templates will be translated into their native deployment schemas using IM [5] for OpenNebula and Heat-Translator [6] for OpenStack HEAT. Both OpenNebula and OpenStack will incorporate drivers to support the deployment of containers as first-class resources on the IaaS. This will provide high efficiency when building up complex configurations from a repository of container images. The scheduling algorithms for both cloud management frameworks will be improved, in order to provide a better experience for the end-users and a more efficient utilization of the computational resources. The usage of a two-level orchestrator (at the level of the PaaS and within each IaaS instance) will enhance the capability of providing a dynamic and on-demand increase in cloud resources.
    
    Speakers: Dr Germán Moltó Martínez (UPVLC), Dr Giacinto Donvito (INFN), Dr Ignacio Blanquer (UPVLC)
    
    Slides
  - 13:50
    
    Virtual Research Environments as-a-Service 20m
    
    Virtual Research Environments (VREs) are innovative, web-based, community-oriented, comprehensive, flexible, and secure working environments conceived to serve the needs of science [3]. They are expected to act as "facilitators" and "enablers" of research activities conducted according to cutting-edge science patterns. They play the role of "facilitators" by providing seamless access to the evolving wealth of resources (datasets, services, computing) - usually spread across many providers including e-Infrastructures - needed to conduct a research activity. They play the role of "enablers" by providing scientists with state-of-the-art facilities for supporting scientific practices, e.g. sharing and publishing comprehensive research activities giving access to the real research products while scientists are working with them [1], automatically generating provenance, capturing accounting, managing quota, and supporting new forms of transparent peer review and collaboration through social networking. The development of such environments should be effective and sustainable to actually embrace and support research community efforts. Ad-hoc and from-scratch approaches are not suitable for the development and provision of such working environments because the overall costs (implementation, operation and maintenance) are neither affordable nor sustainable for every scientific community. This presentation discusses the experience gained in a series of initiatives and projects (e.g. D4Science and iMarine) enabling the creation and provisioning of Virtual Research Environments through the as-a-Service paradigm [2]. In particular, it presents the gCube technology, focusing on the mechanisms enabling the automatic creation and operation of VREs by relying on an extended resource space (comprising datasets, functionalities, services) built by aggregating constituents from existing Infrastructures and Information Systems. This mechanism envisages a definition phase and a deployment phase. The definition phase is based on a wizard enabling a user to specify the characteristics of the VRE he/she wants to create, in terms of datasets to be offered and services to be made available, by selecting them from a catalogue. In addition, the VRE designer can specify requests for service customisations (e.g. enable/disable features) as well as establish the policies that govern the VRE (e.g. whether it is public or by invitation). The overall goal of the definition phase is to be as easy and as short as possible by abstracting away technical details. The deployment phase is completely automatic and results in the delivery of a web-based environment ready to be used. During this phase the VRE specification, after approval by a Manager, is analysed and transformed into a deployment plan consisting of creating the software system and the secure application context needed to operate the VRE. This software system is created by instructing service instances to support the new VRE, by deploying new service instances dedicated to it, by allocating computing power, and by deploying services giving access to the datasets. All of this is done according to resource usage policies and by maximising the overall exploitation of the resources forming the resource space.
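
The two phases described in this abstract can be pictured with a small sketch: a specification object stands in for the output of the definition wizard, and a function turns an approved specification into an ordered deployment plan. This is only an illustration of the idea, not gCube code; `VRESpec`, `deployment_plan`, the action names and the example dataset and service names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VRESpec:
    """Output of the definition phase (hypothetical fields, not gCube's model)."""
    name: str
    datasets: list
    services: list
    public: bool = False     # policy: public or by invitation
    approved: bool = False   # set once a Manager has approved the request


def deployment_plan(spec: VRESpec, running_services: set) -> list:
    """Turn an approved specification into an ordered list of actions."""
    if not spec.approved:
        raise PermissionError("specification must be approved by a Manager")
    # The secure application context is created before anything else.
    plan = [("create_context", spec.name)]
    for service in spec.services:
        # Reuse an instance that already runs; deploy a dedicated one otherwise.
        action = "instruct_existing" if service in running_services else "deploy_new"
        plan.append((action, service))
    for dataset in spec.datasets:
        plan.append(("expose_dataset", dataset))
    return plan


# Placeholder names, for illustration only.
spec = VRESpec(
    name="demo-vre",
    datasets=["species-occurrences"],
    services=["catalogue", "data-analytics"],
    approved=True,
)
plan = deployment_plan(spec, running_services={"catalogue"})
```

Keeping the specification separate from the plan mirrors the split in the abstract: the designer only selects items and policies, while the choice between reusing and deploying service instances is made automatically.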
    
    Speaker: Pasquale Pagano (CNR)
    
    Slides
  - 14:10
    
    Occopus and its usage to build efficient data processing workflow infrastructures in clouds 20m
    
    IaaS clouds are very popular since you can easily create simple services (Linux PC, web portal, etc.) in the cloud. However, the situation is much more difficult if you want to build, dynamically and on demand, a complex infrastructure tailored to your particular needs. A typical infrastructure contains database services, processing resources and presentation services. Together these services provide the infrastructure you actually need to run your possibly complex application (e.g. a workflow). The OCCO (One-Click Cloud Orchestration) framework developed at SZTAKI attempts to solve this problem in a very generic way by avoiding any specialization, i.e. it can work for any IaaS cloud type, on any operating system type, for services with any complex interaction among them, etc. OCCO represents the second level above the IaaS layer within any cloud compute architecture. The talk will introduce the main services, the architecture and the internal structure of OCCO and explain how the required flexibility can be achieved with it. Particular attention will be given to how the TOSCA standard (Topology and Orchestration Specification for Cloud Applications) can be implemented in OCCO. The OCCO framework is currently under development towards supporting TOSCA specifications, and the talk will also introduce the recent progress towards this support. The talk will demonstrate the flexibility of OCCO through an advanced data processing workflow. Data processing workflows are considered as networks of nodes where each node performs some computation or data processing on the incoming data item and passes the result to the next one. OCCO is an ideal tool for building such a network of nodes performing data processing or streaming. The talk will show how an individual workflow or the network layout can be configured and how it is realised by OCCO.
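
The notion of a workflow as a network of nodes, each processing an incoming item and passing the result on, can be sketched in a few lines. The example below runs in a single process with Python generators and does not use OCCO or any cloud API; the helper names `node` and `connect` and the three-stage layout are assumptions chosen purely for illustration.

```python
from typing import Callable, Iterable, Iterator


def node(func: Callable) -> Callable[[Iterable], Iterator]:
    """Wrap a per-item function as a workflow node that streams results onward."""
    def run(items: Iterable) -> Iterator:
        for item in items:
            yield func(item)
    return run


def connect(*nodes: Callable) -> Callable[[Iterable], Iterator]:
    """Chain nodes so that each one consumes the output of the previous one."""
    def workflow(items: Iterable) -> Iterator:
        stream = iter(items)
        for n in nodes:
            stream = n(stream)
        return stream
    return workflow


# Hypothetical linear layout: parse -> scale -> format.
parse = node(float)
scale = node(lambda x: x * 2)
fmt = node(lambda x: f"{x:.1f}")

pipeline = connect(parse, scale, fmt)
results = list(pipeline(["1", "2.5", "4"]))
```

In an orchestrated deployment each node would typically run as its own service on a separate virtual machine, with the layout given as configuration rather than as function composition.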
    
    Speaker: Peter Kacsuk (MTA SZTAKI)
    
    Slides
  - 14:30
    
    R Computing services as SaaS in the Cloud 20m
    
    R is a programming language and software environment for statistical computing and graphics that is widely used in different contexts, such as environmental research, thanks to its various geodata packages. Within the EGI-Lifewatch Competence Centre, one of the tasks is to provide an end-user-oriented application based on R, so different solutions to achieve this goal are being explored. One of these solutions is a layer-based architecture with three layers:
    • Bottom layer: R instances that can be installed in the cloud (allowing load balancing) or on HPC (we have a testbed based on a PowerPC cluster).
    • Middle layer: an R server that interacts with an R client.
    • Top layer: a web-based solution with certificate authentication. The user interface can be deployed using tools like IPython Notebook or the web version of RStudio.
    This presentation will analyse how well the different proposed solutions fit end-user requirements and how satisfied users are with them.
    
    Speaker: Fernando Aguilar (CSIC)
    
    Slides
  - 14:50
    
    Atmosphere: A Platform for Development, Execution and Sharing of Applications in Federated Clouds 20m
    
    The advent of cloud computing offers new opportunities for developers and users of scientific applications [1]. This paper presents the results of research on efficient development, federation, execution and sharing of cloud computational services. We have investigated methods for integrating heterogeneous public cloud infrastructures into a unified computational environment, cost optimization of application deployment in a heterogeneous cloud environment (choosing optimal resources given the available funds), federated data storage mechanisms with extensions for public storage services, and dealing with resource demand spikes (optimization of platform middleware and user interfaces). This research was undertaken as part of the EU VPH-Share project [2] and resulted in a platform called Atmosphere [2, 3]. Recently, Atmosphere has also become part of the PLGrid e-infrastructure [4]. Atmosphere supports an iterative service development process. Computational software is exposed as a set of so-called Atomic Services (virtual machines (VMs) that can be created on demand from VM templates) which can be used either as standalone tools or as building blocks for larger workflows. Developers may iteratively construct new services which may be published, shared, or subjected to further development. Once published, services can be instantiated (executed) and made available on the Web for the community. In this way, with the help of a Web-based user interface, the platform provides an integrated tool supporting development and publishing of services for the entire VPH community. Atmosphere provides a full middleware stack, complete with end-user interfaces and APIs, enabling service developers to create, register and share cloud services, and end users to instantiate and invoke their features.
Atmosphere federates 5 cloud IaaS technologies (OpenStack, EC2, MS Azure, RackSpace, Google Compute); 7 distinct cloud sites are registered with the VPH-Share infrastructure; over 250 Atomic Services are available; and about 100 service instances are operating on a daily basis. Atmosphere has been used to host complex applications from several projects of the VPH community [5]: VPH-Share, MySpine, ARTreat, VPH-DARE and pMedicine, as well as for training medical students at the University of Sheffield, the Jagiellonian University Medical College, and Karolinska Institutet [2, 3]. The platform is being improved in collaboration with application developers in order to ensure that their software can be easily ported to the cloud and provisioned in the PLGrid e-infrastructure.
    
    Speaker: Dr Marian Bubak (ACC Cyfronet and Department of Computer Science, AGH University of Science and Technology, Krakow, Poland)
    
    Slides
  - 15:10
    
    The VESPA Virtual Research Environment: a final report about the project at the beginning of clinical operations 20m
    
    The VESPA project aimed to provide a Virtual Research Environment (VRE) for qualitative and quantitative evaluation and rehabilitation of motor and cognitive diseases. It addressed more than thirty operational objectives to enable an extremely innovative platform for early evaluation and rehabilitation of cognitive diseases such as Alzheimer’s Dementia, Mental Retardation and Linguistic Deficit. VESPA is a pioneering project that combines brand new ICT technologies and Computer Science concepts to meet the needs of patients and caregivers in the field of cognitive diseases. On top of a fully immersive Virtual Reality system, it combines innovative and specialized 3D software applications, dedicated hands-free devices, a flexible and scalable Cloud Computing infrastructure, a powerful Science Gateway and a tele-supervision system. The project provided a completely new response to the increasing demand for treatment of mental retardation, AD, Parkinson Disease, etc. Our open and flexible framework, the VESPA Library, extends a common gaming platform, enabling integration of any cognitive application into a highly productive environment. By deriving from it, one can build a brand new evaluation test or rehabilitation task in the form of a 3D videogame while writing close to zero lines of code. The framework is highly customizable and includes safe data management and transfer features. By just inheriting basic features from the framework, the development team created more than 80 applications, designed by psychologists and neuropsychiatrists for three different kinds of patients and ranging from simple to very hard. This is crucial for the growth of the community and the success of the VESPA system, which aims to serve a theoretically unlimited number of installation sites. The system also includes a Science Gateway that allows doctors, administrative staff members, and VESPA technicians to configure and manage system planning, operation, and results.
Patients and caregivers also benefit from it by viewing the schedule and results of daily rehabilitation protocols. Telemetry produced during operation is safely sent to the Cloud and to the database located at the Health Center, so it is available to the community in near real time. Innovative and self-built devices can be plugged into the VESPA system and used with very little effort. This was the case for the home-made instrumented glove built by VESPA, and for devices such as the MyO Armband, the Leap Motion sensor, etc. The validation process ran on three different groups of patients. In the final part of the presentation, the results of VESPA system validation through Clinical Trials will be shown. By entering the market, the VESPA system will allow many children and elderly people to attend their daily rehabilitation sessions at schools and rest homes, so that caregivers need not spend any effort on transportation. For some months now, a community has been growing around the VESPA VRE all over Europe. The VESPA system is now an open gateway to the next generation of telemedicine, offering Health System actors fully immersive Virtual Reality applications inside a highly interactive platform and promising impressive results in terms of impact and speed for effective cognitive training activities.
    
    Speaker: Marco Pappalardo (Software Engineering Italia srl)
    
    Slides
- 13:30 → 15:30
  Infrastructure and services for human brain research Sala A+A1, Giulia centre
  
  Sala A+A1, Giulia centre
  
  Villa Romanazzi Carducci
  
  The aim of the Human Brain Project (HBP) is to accelerate our understanding of the human brain by integrating global neuroscience knowledge and data into supercomputer-based models and simulations. This will be achieved, in part, by engaging the European and global research communities using six collaborative ICT platforms: Neuroinformatics, Brain Simulation, High Performance Computing, Medical Informatics, Neuromorphic Computing and Neurorobotics.
  
  This session is intended for
  * NGI cloud providers interested in how HBP can benefit from cloud provisioning for its big data integration needs and willing to support HBP
  * technology providers interested in offering solutions and participating in tests
  * members of the Federated Data Virtual Team
  
  Convener: Dr Yin Chen (EGI.eu)
  - 13:30
    
    VIP: a Virtual Imaging Platform for the long tail of science 20m
    
    Computing and storage have become key to research in a variety of biomedical fields, for example to compute numerical simulations for research in medical imaging or cancer therapy, or to automate the analysis of digital images in neuroscience or cardiology. The Virtual Imaging Platform (VIP) is a web portal for medical simulation and image data analysis. It leverages resources available in the biomed Virtual Organisation of the European Grid Infrastructure to offer an open service to academic researchers worldwide. VIP aims to mask the infrastructure and make the user experience as transparent as possible. This means that VIP has to take decisions as automatically, quickly, and reliably as possible on infrastructural challenges such as (1) the placement of data files on the storage sites, (2) the splitting and distribution of applications on the computing sites, and (3) the termination of misbehaving runs. We rely heavily on the DIRAC service provided by France Grilles (the NGI of France) to take such decisions in the changing environment of EGI. In addition, we have developed 'non-clairvoyant' techniques to specifically address the challenges of the applications provided in VIP. With VIP, researchers from all over the world can access substantial computing and storage resources with no technical skills required beyond the use of a web browser. EGI is essential to the success of VIP because it provides an open infrastructure that relieves researchers of the burden of negotiating resource allocations with computing centres. Such an open policy enables the supply of services to the long tail of science, i.e. to the large number of groups and projects of modest size, such as individual master's or PhD projects, proof-of-concept studies, and so on.
    
    Speaker: Ms Sorina POP (CNRS, Creatis)
    
    Slides
  - 13:50
    
    The Characterisation Virtual Laboratory 20m
    
    In 2014, Monash University, through the Multimodal Australian ScienceS Imaging and Visualisation Environment (MASSIVE), and project partners, completed development of the NeCTAR-funded Characterisation Virtual Laboratory (CVL), a project to develop online environments for researchers using advanced imaging techniques, and demonstrate the impact of connecting national instruments with computing and data storage infrastructure. The CVL is a collaboration between Monash University, Australian Microscopy & Microanalysis Research Facility (AMMRF), Australian Nuclear Science and Technology Organisation (ANSTO), Australian Synchrotron, National Imaging Facility (NIF), Australian National University, the University of Sydney, and the University of Queensland. The partners joined together around the CVL project with three major goals: 1. To integrate Australia’s imaging equipment with specialised HPC capabilities provided by MASSIVE and National Computational Infrastructure (NCI) and with data collections provided by Research Data Storage Infrastructure (RDSI) nodes. More than 450 registered researchers have used and benefited from the technology developed by the CVL project, providing them with an easier mechanism to capture instrument data and process that data on centralised cloud and HPC infrastructure, including MASSIVE and NCI. 2. To provide scientists with a common cloud-based environment for analysis and collaboration. The CVL has been deployed across clouds at the University of Melbourne, Monash University, and QCIF. CVL technology has been used to provide easier access to HPC facilities at MASSIVE, University of Sydney BMRI, the Pawsey Centre, NCI and Central Queensland University. 3. To produce four exemplar platforms, called Workbenches, for multi-modal or large-scale imaging in Neuroimaging, Structural Biology, Energy Materials (X-ray), and Energy Materials (Atom Probe). 
The CVL environment now contains 103 tools for specialised data analysis and visualisation in Workbenches. Over 20 imaging instruments have been integrated so that data automatically flows into the cloud for management and analysis. In addition, a number of specialised workflows have been developed and integrated, including atom probe data processing using Galaxy, and automatic brain MRI and histology registration. The newly developed infrastructure is also having an impact beyond the four Workbenches. For example, HPC facilities across Australia, including facilities at MASSIVE, NCI, Central Queensland University, the Brain and Mind Research Institute at the University of Sydney and the Pawsey Centre, use software developed by the CVL to help a wider range of researchers access imaging and visualisation services. The technology developed under the CVL gives newcomers and inexperienced HPC users simple access to HPC resources. The Characterisation Virtual Laboratory is one of a number of initiatives led by Monash University to develop tailored environments for research communities. This presentation will introduce those initiatives, with a specific focus on the CVL.
    
    Speaker: Wojtek James Goscinski (Monash University)
  - 14:10
    
    Big data analysis in neuGRID towards a modern e-Health science 20m
    
    neuGRID (www.neugrid4you.eu) is a web portal that aims to help neuroscientists do high-throughput imaging research and to provide clinical neurologists with automated diagnostic imaging markers of neurodegenerative diseases for individual patient diagnosis. neuGRID's user-friendly environment is customised to a range of users, from students to senior neuroscientists, working in the fields of Alzheimer's disease, psychiatric diseases, and white matter diseases. neuGRID aims to become a widespread resource for brain imaging analyses. neuGRID was first funded by the European Commission DG INFSO within the 7th Framework Programme from 2008 to 2011, when the hardware and middleware infrastructure was developed. The second wave was funded in 2011 by the European Commission, now DG CONNECT, under the project neuGRID for you (N4U), with the main aim of expanding user services with more intuitive and graphical interfaces. N4U ended in April 2015. Through the Science Gateway web portal, a single virtual access point, users log in and access a virtual imaging laboratory. Here users can upload, use, and share algorithms for brain imaging analysis, access large neuroimaging datasets, and run computationally intensive analyses, all with specialized support and training. Moreover, through the MoBrain initiative (https://wiki.egi.eu/wiki/EGI-Engage:Competence_centre_MoBrain), neuroscientists will be able to find help and take part in modern e-Health science spanning from the macro level to the micro level, lowering the barriers to defeating a major social plague such as Alzheimer’s disease. Thanks to distributed services and grid/cloud computational resources, analyses with neuGRID are much faster than traditional lab-based analyses. neuGRID's proof of concept was carried out when an Alzheimer's disease biomarker (3D cortical thickness with Freesurfer and CIVET) was extracted from 6,500 MR scans in 2 weeks, versus the 5 years that it would have taken in a traditional setting.
This presentation will introduce this initiative, with a specific focus on the different big data analyses conducted Europe-wide so far.
    
    Speaker: Alberto Redolfi (INFN)
  - 14:30
    
    Human Brain Project 20m
    
    Speaker: Dr Sean Hill
  - 14:50
    
    Test activities for HBP/SP5 neuroinformatics: results and next steps 20m
    
    Speakers: Dr Lukasz Dutka (CYFRONET), Dr Yin Chen (EGI.eu)
  - 15:10
    
    Discussion 20m
- 13:30 → 15:30
  Tutorial: NGS data analysis Federico II
  
  Federico II
  
  Villa Romanazzi Carducci
  
  Convener: Dr Fotis Psomopoulos (Institute of Applied Biosciences, Center for Research and Technology Hellas)
  - 13:30
    
    NGS Data Analysis Training Workshop 2h
    
    Summary
    =======
    "Big data" is one of today's hottest concepts, but it can be misleading. The name itself suggests mountains of data, but that’s just the start. Overall, data can be ‘big’ for three reasons, often referred to as the three V's: volume of data, velocity of processing the data, and variability of data sources. If any of these key features are present, then big-data tools are necessary, often combined with high network bandwidth and massive compute systems. Researchers working with genomics in the life sciences (biomedicine, agri-food science, etc.) are producing big bio-data, mainly by applying Next-Generation Sequencing (NGS) to answer important biological questions. NGS technologies are revolutionizing genome research. In order to deal with big bio-data, current approaches in Life Science research favor the use of established workflows that facilitate the first steps in data analysis. This Training Workshop will focus on the particular needs of researchers who are active in the field of NGS data analysis but so far have limited or no experience with the use of big data tools and large compute systems.

    Description
    ===========
    The workshop will use compute resources from EGI, a publicly funded infrastructure that offers compute and storage resources and services for researchers in academia and industry. The workshop will consist of two parts. The first part will consist of presentations of key applications and workflows currently established in the EGI ecosystem that cater to the particular needs of NGS analysis. It will give the participants an in-depth idea of the state of the art in this field, and prepare them for the second part, the hands-on exercises. The exercises will be carefully selected both in terms of generality (i.e. applicable to a wide range of NGS data and analyses such as quality control, filtering and trimming of reads, assembly, annotation and differential expression) and in terms of time constraints (i.e. small enough to conclude within the context of the session). The exercises will address issues such as input and reference data management, use of established analysis tools in a Cloud infrastructure, and tools for retrieving and further analyzing the produced output. All exercises will be performed on the Chipster platform (http://chipster.csc.fi/) using cloud resources from the EGI Federated Cloud infrastructure. The process of accessing cloud resources and launching Chipster will be briefly addressed in the context of this tutorial. However, there is a dedicated tutorial on "Running Chipster data analysis platform in EGI Federated Cloud" (https://indico.egi.eu/indico/contributionDisplay.py?sessionId=26&contribId=25&confId=2544) before this session, so interested participants are encouraged to attend both tutorials.

    Impact
    ======
    Researchers active in NGS are currently few in number compared with the number of life scientists. However, the rise of NGS data across all Life Science domains leads to an increasing demand for both trained personnel and novel tools and approaches. With this in mind, the goal of this workshop is to attract researchers from different fields in the life sciences who (a) are actively using NGS data analysis workflows in their research, and (b) have limited experience in employing large-scale computer systems or specifically EGI resources.

    Timetable
    =========
    Part 1: NGS Data Analysis in EGI (40')
    10' Quick Introduction to EGI resources, NGS Analysis workflows
    10' Data Replication
    10' Cloud Applications
    10' Discussion / Wrap-up
    Part 2: Hands-on training (80')
    • Exercise #1: Connect to a Cloud VM
    • Exercise #2: Select ref data and replicate
    • Exercise #3: Upload test input NGS data
    • Exercise #4: Execute workflow
    • Exercise #5: Post-workflow analysis

    Additional Information - Requirements
    ===========================
    The exercises of the NGS Analysis Training Workshop will be solved using the Chipster platform. The computational resources required will be provided by the EGI Federated Cloud Infrastructure. The participants will be required to have a laptop (any OS with Java installed), which will be used to access the EGI training resources through SSH (for the launch of the VM) and the Java interface (for the exercises).
    
    Speakers: Dr Anastasia Hadzidimitriou (Institute of Applied Biosciences / CERTH), Dr Anna Vardi (Institute of Applied Biosciences / CERTH), Dr Fotis Psomopoulos (Aristotle University of Thessaloniki), Kostas Koumantaros (GRNET)
    
    Paper
    
    Sample Data for exercises
    
    Slides
- 15:30 → 16:00
  
  Coffee break
- 16:00 → 18:00
  Data without boundaries: metadata interoperability Europa
  
  Europa
  
  Villa Romanazzi Carducci
  
  Convener: Dr Lukasz Dutka (CYFRONET)
  - 16:00
    
    Data Repositories and Science Gateways for Open Science 20m
    
    The steep decrease in the cost of high-bandwidth Wide Area Networks has fostered in recent years the spread and uptake of the Grid Computing paradigm, and the distributed computing ecosystem has become even more complex with the recent emergence of Cloud Computing. All these developments have triggered the new concept of e-Infrastructures, which have been built for several years now both in Europe and in the rest of the world to support diverse multi-/inter-disciplinary Virtual Research Communities (VRCs) and their Virtual Research Environments (VREs). E-Infrastructure components can indeed be key platforms to support the Scientific Method, the “knowledge path” followed every day by scientists since Galileo Galilei. Distributed Computing and Storage Infrastructures (local High Performance/Throughput Computing resources, Grids, Clouds, long term data preservation services) are ideal both for the creation of new datasets and the analysis of existing ones, while Data Infrastructures (including Open Access Document Repositories – OADRs – and Data Repositories – DRs) are also essential to evaluate existing data and annotate them with the results of the analysis of new data produced by experiments and/or simulations. Last but not least, Semantic Web based enrichment of data is key to correlating documents and data, allowing scientists to discover new knowledge in an easy way. However, although great efforts have been made in recent years at both the technological and political level, Open Access and Open Education are still far from being pervasive and ubiquitous, which prevents Open Science from being fully established. One of the main drawbacks of this situation is the limiting effect it has on the reproducibility and extensibility of science outputs, which have been two fundamental pillars of the Scientific Method for more than four centuries.
In this contribution we present the Open Access Repository (OAR), a pilot data preservation repository for the products (publications, software, data, etc.) of INFN and other Italian Research Organisations, meant to serve both researchers and citizen scientists and to be interoperable with other related initiatives both in Italy and abroad. OAR is powered by the INVENIO software and is both Open Access Initiative compliant and an official OpenDOAR data provider, able to automatically harvest resources from different sources, including the Sponsoring Consortium for Open Access Publishing in Particle Physics (SCOAP3), using RESTful APIs. It is also one of the official OpenAIRE archives, compliant with version 3.0 of its guidelines. OAR allows SAML-based federated authentication and is one of the Service Providers of the eduGAIN inter-federation; it is also connected to DataCite for the issuance and registration of Digital Object Identifiers (DOIs). But what makes OAR really different from other repositories is its capability to connect to Science Gateways and exploit Distributed Computing and Storage Infrastructures worldwide, including those of EGI and EUDAT, to easily reproduce and extend scientific analyses. In this presentation some concrete examples related to the data of the ALEPH and ALICE Experiments will be shown.
    
    Speaker: Roberto Barbera (University of Catania and INFN)
    
    Slides
  - 16:20
    
    Digital Knowledge Platforms 20m
    
    The concept of Digital Knowledge Platforms (DKPs) as a framework to support the full data cycle in research is presented. DKPs extend existing ideas in Data Management, first of all by providing a framework to exploit the power of ontologies at different levels. DKPs aim to preserve knowledge explicitly, starting with the description of the Case Studies, and integrating data and software management and preservation on an equal basis. The uninterrupted support chain starts at the data acquisition level and extends to support for reuse and publication in an open framework, providing integrity and provenance controls. A first prototype, developed for a LifeWatch pilot project with different commercial companies using only open source software, will be described and compared to existing solutions from other research areas. The issues involved in implementing these platforms using cloud resources, and in particular FedCloud resources, will be discussed.
    
    Speaker: Jesus Marco de Lucas (CSIC)
    
    Slides
  - 16:40
    
    Maximising uptake by opening access to research: The BlueBRIDGE endeavour 20m
    
    Open Science is emerging as a force that, by democratizing access to research and its products, will produce advantages for society, the economy and the research system, e.g. "more reliable" and efficient science, faster and wider innovation, and science driven by societal challenges. BlueBRIDGE is a European-funded project realizing the Open Science modus operandi in the context of the Blue Growth Societal Challenge. The overall objective of this project, which started in September 2015 and runs for 30 months, is to support capacity building in interdisciplinary research communities. These communities are principally involved in increasing scientific knowledge on marine resource overexploitation, degraded environments and ecosystems. Their aim is to provide advice to competent authorities and to enlarge the spectrum of economic growth opportunities. BlueBRIDGE will implement and operate a set of Virtual Research Environments (VREs) enabling communities of scientists from different domains (e.g. fisheries, biology, economics, statistics, environment, mathematics, social sciences, natural sciences, computer science) to collaborate in their knowledge production chain, from the initial phases of data collection and aggregation to the production of indicators. These communities involve world-renowned EU and international institutions (e.g. ICES, IRD, FAO, UNEP) that provide informed advice on the sustainable use of marine resources to their member countries. Furthermore, the communities also include relevant Commissions of international organizations, national academic institutions and small and medium enterprises (SMEs). VREs are innovative, web-based, community-oriented, comprehensive, flexible, and secure working environments conceived to serve the needs of science [2]. They are expected to act as "facilitators" and "enablers" of research activities conducted according to open science patterns.
They play the role of "facilitators" by providing seamless access to the evolving wealth of resources (datasets, services, computing), usually spread across many providers including e-Infrastructures, that are needed to conduct a research activity. They play the role of "enablers" by providing scientists with state-of-the-art facilities supporting open science practices [1]: sharing, publishing, and reproducing comprehensive research activities; giving access to research products while scientists are working with them; automatically generating provenance; capturing accounting; managing quota; and supporting new forms of transparent peer review and collaboration through social networking. The development of such environments should be effective and sustainable in order to actually embrace and support research community efforts. In this presentation, we describe the set of VREs that will be developed and operated to serve four main scenarios of BlueBRIDGE: (i) Blue assessment: supporting the collaborative production of scientific knowledge required for assessing the status of fish stocks and producing a global record of stocks and fisheries; (ii) Blue economy: supporting the production of scientific knowledge for analysing socio-economic performance in aquaculture; (iii) Blue environment: supporting the production of scientific knowledge for fisheries and habitat degradation monitoring; and (iv) Blue skills: boosting education and knowledge bridging between research and innovation in the area of protection and management of marine resources. BlueBRIDGE builds on the D4Science infrastructure and the gCube technology to operate the VREs by aggregating the needed data, software and services.
    
    Speaker: Dr Gianpaolo Coro (CNR)
    
    Slides
  - 17:00
    
    Storage Management in INDIGO 20m
    
    The INDIGO DataCloud project sets out to develop a data/computing platform targeting scientific communities, to enhance the usefulness of existing e-infrastructures [1]. The storage developments focus on two levels. On the IaaS level, QoS in storage will be addressed by implementing a standardised extension to the CDMI standard, which enables management of storage quality, e.g. access latency, retention policy, migration strategy or data lifecycle. This is closely related to intelligent identity management, which harmonises access via different protocols such as gridFTP, sftp and CDMI. This will make it possible to use CDMI to manage the QoS of data that is accessible via gridFTP or sftp. On the PaaS level, INDIGO DataCloud will provide flexible data federation functionality, enabling users to transparently store and access their data across heterogeneous infrastructures. DataCloud will provide unified APIs for data management based on state-of-the-art standards, allowing both users and application developers to easily integrate DataCloud's high-level data management functionality into their use cases. One of the key features of this solution will be the optimization of data access in various scenarios, which will include automatic pre-staging, maximum bandwidth usage via parallel transfers and instant access to remote data through streaming. Furthermore, the layer will provide information to Cloud orchestration services, allowing computations to be placed at the sites where the data is already staged, or where it can be delivered efficiently. [1] https://www.indigo-datacloud.eu/
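The placement idea at the end of this abstract, running a computation where the data is already staged or where it can be delivered efficiently, can be illustrated with a minimal sketch. It assumes a simple cost model based only on estimated transfer time; the `Site` class, its fields and the function names are hypothetical and are not part of any INDIGO DataCloud or CDMI interface.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Site:
    # Hypothetical description of a compute site (not an INDIGO data model).
    name: str
    bandwidth_mbps: float                          # assumed inbound rate, megabits/s
    staged: Set[str] = field(default_factory=set)  # datasets already at the site


def transfer_seconds(site: Site, dataset: str, size_mb: float) -> float:
    """Estimated time to make the dataset available at the site."""
    if dataset in site.staged:
        return 0.0  # already staged: nothing to move
    return size_mb * 8 / site.bandwidth_mbps  # megabytes -> megabits


def choose_site(sites: List[Site], dataset: str, size_mb: float) -> Site:
    """Pick the site where the data can be used soonest."""
    return min(sites, key=lambda s: transfer_seconds(s, dataset, size_mb))


sites = [
    Site("site-a", bandwidth_mbps=100, staged={"scan-42"}),
    Site("site-b", bandwidth_mbps=1000),
    Site("site-c", bandwidth_mbps=10000),
]

print(choose_site(sites, "scan-42", 5000).name)  # site-a (data already staged)
print(choose_site(sites, "scan-7", 5000).name)   # site-c (fastest delivery)
```

A real orchestrator would also weigh compute availability, storage quality and cost; the sketch covers only the data-locality part of the decision.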
    
    Speakers: Marcus Hardt (KIT-G), Patrick Fuhrmann (DESY), Paul Millar (DESY)
    
    Slides
  - 17:20
    
    Wikidata - structured information for Wikipedia and what this means for research workflows 20m
    
    There have been several attempts at bringing structured information together with Wikipedia. The most recent one is Wikidata, a MediaWiki-based collaborative platform that uses dedicated MediaWiki extensions (Wikibase Repository and Wikibase Client) to handle semantic information. It started out in late 2012 as a centralized way to curate the information as to which Wikipedia languages have articles about which semantic concepts. Since then, it has been continuously expanding in scope, and it now has structured information about 14 million items, including some that do not have articles in any Wikipedia. Not only is the content growing in extent and usefulness, but the contributor community is expanding too, making Wikidata one of the most active Wikimedia projects. The use of Wikidata in research contexts has only begun to be explored. For instance, there are Wikidata items about all human genes, and they have been annotated with information about their interaction with other genes, with drugs and diseases, as well as with references to pertinent databases or the relevant literature. Another example is the epigraphy community, which uses Wikibase to collect information about stone inscriptions and papyruses. In this talk, I will outline different ways in which Wikidata and/or Wikibase can and do interact with research workflows, as well as opportunities to expand these interactions in the future, especially in the context of open science and citizen science projects.
    
    Speaker: Daniel Mietchen (National Institute of Health (US))
  - 17:40
    
    EGI-Engage Open Data Platform Prototype 20m
    
    Speaker: Dr Bartosz Kryza (ACK Cyfronet AGH)
    
    Slides
- 16:00 → 18:00
  Exploiting the EGI Federated clouds - PaaS & SaaS workshop Scuderia
  
  Scuderia
  
  Villa Romanazzi Carducci
  
  Convener: Diego Scardaci (EGI.eu/INFN)
  - 16:00
    
    Accessing Grids and Clouds with DIRAC services 20m
    
    Scientific communities increasingly rely on intensive computations to reach their research goals. These communities can exploit a variety of computing resources, which makes it difficult to adapt their applications to different computing infrastructures. Therefore, there is a need for tools that seamlessly aggregate different computing and storage resources into a single coherent system. With the introduction of Clouds as a new way of provisioning computing resources, the need for means to use them efficiently grows even more. The DIRAC project develops and promotes software for building distributed computing systems. Both workload and data management tools are provided, as well as support for high-level workflows and massive data operations. Services based on the DIRAC interware are now provided by several national grid infrastructure projects. The DIRAC4EGI service is operated by the EGI project itself. Other DIRAC services are provided by a number of national computing infrastructures (France, UK, China, etc.). Those services are used by multiple user communities with different requirements and workloads. Experience of running the multi-community DIRAC4EGI and national DIRAC services will be presented in this contribution.
    
    Speaker: Andrei Tsaregorodtsev (CNRS)
    
    Slides
  - 16:20
    
    The Ophidia stack: a big data analytics framework for Virtual Research Environments 20m
    
    The Ophidia project is a research effort on big data analytics facing scientific data analysis challenges in multiple domains (e.g. climate change). It provides a framework for atomically processing and manipulating datacubes, offering a common way to run distributive tasks on large sets of fragments (chunks). Even though the most relevant use cases for Ophidia have been implemented in the climate change context, the domain-agnostic design of the internal storage model, operators and primitives makes it easier to exploit the framework as a core big data technology for multiple Research Communities. Ophidia provides declarative, server-side, and parallel data analysis, jointly with an internal storage model able to efficiently deal with multidimensional data and a hierarchical data organization to manage large data volumes. The project relies on a strong background in high performance database management and OLAP systems to manage large scientific datasets. The Ophidia analytics platform provides several data operators to manipulate data cubes, and array-based primitives to perform data analysis on large scientific data arrays (e.g. statistical analysis, predicate evaluation, FFT, DWT, subsetting, aggregation, compression). The array-based primitives are built on top of well-known numerical libraries (e.g. GSL). Bit-oriented primitives are also available to manage B-cubes (binary data cubes). Metadata management support (CRUD-like operators) is also provided jointly with validation-based features relying on community/project-based vocabularies. The framework stack includes an internal workflow management system, which coordinates, orchestrates, and optimises the execution of multiple scientific data analytics and visualization tasks. Real-time monitoring of workflow execution is also supported through a graphical user interface.
Defining processing chains and workflows with tens or hundreds of data analytics operators can be a real challenge in many practical scientific use cases. The talk will also highlight the main needs, requirements and challenges regarding data analytics workflow management applied to large scientific datasets. Some real use cases implemented at the Euro Mediterranean Center on Climate Change (CMCC) will also be discussed. The results of a benchmark performed on the Athena Cluster at the CMCC SuperComputing Centre on CMIP5 datasets will also be presented.
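As a rough illustration of running distributive tasks over fragments, the following minimal Python sketch splits an array into chunks, applies an operator to each chunk in parallel and combines the partial results. It is not Ophidia code and does not use the Ophidia API; the function names, the thread pool and the fragment size are assumptions made for this example only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def split_into_fragments(values: Sequence[float], fragment_size: int) -> List[Sequence[float]]:
    """Cut a flat array into consecutive fragments (chunks) of at most fragment_size items."""
    return [values[i:i + fragment_size] for i in range(0, len(values), fragment_size)]


def distributed_max(values: Sequence[float], fragment_size: int, workers: int = 4) -> float:
    """Maximum is distributive: the max of the per-fragment maxima is the global max."""
    fragments = split_into_fragments(values, fragment_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(max, fragments))
    return max(partials)


def _sum_and_count(fragment: Sequence[float]) -> Tuple[float, int]:
    """Partial result for the mean: two distributive quantities per fragment."""
    return sum(fragment), len(fragment)


def distributed_mean(values: Sequence[float], fragment_size: int, workers: int = 4) -> float:
    """The mean is obtained by combining per-fragment sums and counts."""
    fragments = split_into_fragments(values, fragment_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(_sum_and_count, fragments))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count
```

The design point is that each fragment is processed independently, so only small partial results need to be gathered at the end.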
    
    Speaker: Dr Sandro Fiore (CMCC)
    
    Slides
  - 16:40
    
    CernVM-FS vs Dataset Sharing 20m
    
    CernVM-FS is firmly established as a method of software and conditions data distribution for the LHC experiments and many other Virtual Organizations at Grid sites. The use of CernVM-FS is now reaching a new stage, as its advantages start to be acknowledged by communities that are active within, and make use of, the EGI Federated Cloud. As the manipulation of research data within cloud infrastructures becomes more important for many communities, they have started to look into CernVM-FS as a technology that could bring the expected benefits. The presentation will explain when CernVM-FS can be used for dataset sharing without losing the main benefits of the technology, and will then give information on how to use it properly. Pros and cons will be discussed and available use cases will be analysed. The presentation is proposed as the start of a round table and discussions with audience participation on the topic of dataset distribution within cloud infrastructures, specifically the EGI Federated Cloud.
    
    Speaker: Catalin Condurache (STFC)
    
    Slides
  - 17:00
    
    Cloud-enabled, scalable Data Avenue service to process very large, heterogeneous data 20m
    
    Compute-intensive applications such as simulations applied in various research areas and industry require computing infrastructures enabling highly parallel, distributed processing. Grids, clusters, supercomputers and clouds are often used for this purpose. Tools also exist that ease the design and construction of such complex applications, typically in the form of workflows; these tools can utilize various types of distributed computing infrastructures (DCIs) and provide automated scheduling, submission and monitoring of workflow tasks (jobs). Some tools support job-level granularity, that is, each job in a workflow may potentially be executed in a different computing infrastructure. Numerous storage solutions exist; however, storage resources accessible from within a given DCI are often limited by the protocols supported by the computing elements themselves. Binding jobs to a particular storage resource makes it very difficult to port the workflow to other computing resources, or to exchange data between different DCIs. To alleviate this problem, a data bridging solution called Data Avenue has been proposed, through which all common storage operations (such as listing, folder creation, deletion, renaming) and data access (download/upload) can be done on a wider set of storage resources (SFTP, GridFTP, SRM, iRODS, S3, etc.) using a uniform web service (HTTP) interface. Jobs, in this way, become capable of accessing diverse storage resources regardless of the DCI where the job is currently being run, resulting in more flexible and portable workflows. Such a mediation service, however, occasionally implies very high CPU and network load on the server, as data exchanged over a storage-related protocol between the Data Avenue server and the storage has to be converted to the HTTP used between the Data Avenue server and the client.
Under massive, concurrent use, such as running parameter sweep applications where thousands of jobs may run in parallel, a single Data Avenue server could soon become a bottleneck, and clients may experience a significant decline in transfer rate. On the other hand, such peak loads are often followed by idle periods, when the Data Avenue host will be underexploited. This presentation introduces a solution to scale Data Avenue (DA) services on demand by multiplying the available Data Avenue servers. The solution uses cloud infrastructure (IaaS) to dynamically grow or shrink the capacity of the service depending on the current load. It is composed of four architectural components: a load balancer, a cloud orchestrator, a VM pool, and a common database. The load balancer is responsible for dispatching client requests to one of the servers in the VM pool, which contains virtual machines with individual Data Avenue services pre-installed. The cloud orchestrator continuously monitors the load of the VMs in the pool and, based on predefined load thresholds, starts new instances or shuts down existing ones. A common database, to which each DA VM connects, persists the data of client interactions beyond the lifetimes of individual DA VMs. An important advantage of this solution is that clients communicate with a single Data Avenue endpoint (the load balancer), whereas the mechanisms behind the scenes are hidden. Details of the proposed solution and preliminary experimental results are also reported.
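The threshold-based scaling and request dispatching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Data Avenue implementation: the threshold values, the use of the average load, the least-loaded dispatch rule and all names (`ScalingPolicy`, `scaling_decision`, `pick_server`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScalingPolicy:
    # Hypothetical thresholds; the abstract only says they are "predefined".
    scale_up_load: float = 0.8
    scale_down_load: float = 0.3
    min_servers: int = 1
    max_servers: int = 10


def scaling_decision(loads: List[float], policy: ScalingPolicy) -> int:
    """Orchestrator step: +1 to start an instance, -1 to shut one down, 0 to do nothing.

    `loads` holds one load value in [0, 1] per running server in the VM pool.
    """
    if not loads:
        return 1  # nothing is running, so start a first server
    average = sum(loads) / len(loads)
    if average > policy.scale_up_load and len(loads) < policy.max_servers:
        return 1
    if average < policy.scale_down_load and len(loads) > policy.min_servers:
        return -1
    return 0


def pick_server(loads: List[float]) -> int:
    """Load balancer step: dispatch the next request to the least-loaded server."""
    return min(range(len(loads)), key=loads.__getitem__)
```

For example, `scaling_decision([0.9, 0.95], ScalingPolicy())` asks for one more server, while a pool that is idle but already at `min_servers` is left unchanged.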
    
    Speaker: Peter Kacsuk (MTA SZTAKI)
    
    Slides
  - 17:20
    
    Supporting Big Data Processing via Science Gateways 20m
    
    With the rapid increase of data volumes in scientific computations, utilising parallel and distributed computing paradigms in data processing is becoming more and more important. Hadoop is an open source implementation of the MapReduce framework that supports processing large datasets in parallel and on multiple nodes in a reliable and fault-tolerant manner. Scientific workflow systems and science gateways are high-level environments that facilitate the development, orchestration and execution of complex experiments from a user-friendly graphical user interface. Integrating MapReduce/Hadoop with such workflow systems and science gateways enables scientists to conduct complex data-intensive experiments utilising the power of the MapReduce paradigm from the convenience provided by science gateway frameworks. This presentation describes an approach to integrate MapReduce/Hadoop with scientific workflows and science gateways. As workflow management systems typically allow a node to execute a job on a compute infrastructure, the task of integration can be translated into the problem of running the MapReduce job in a workflow node. The input and output files of the MapReduce job have to be mapped to the inputs and outputs of a workflow node. Besides executing the MapReduce job, the necessary execution environment (the Hadoop cluster) should also be transparently set up before and destroyed after execution. These operations should also be carried out from the workflow without further user intervention. Therefore, the concept of the infrastructure-aware workflow is utilised, where first the necessary execution environment is created dynamically in the cloud, followed by the execution of workflow tasks, and finally the infrastructure is torn down and its resources released. As the implementation environment for the above concept, the WS-PGRADE/gUSE science gateway framework and its workflow solution have been utilised.
However, the solution is generic and can also be applied to other grid or cloud based workflow systems. Two different approaches have been implemented and compared: the Single Node Method, where the process described above is implemented as a single workflow node, and the Three Node Method, where the steps of creating the Hadoop cluster, executing the MapReduce jobs, and destroying the Hadoop execution environment are separated. While the Single Node Method is efficient when embedding a single MapReduce experiment into a workflow, the Three Node Method allows more flexibility for workflow developers and results in better performance in the case of multiple MapReduce experiments that can share the same Hadoop cluster. Both approaches support multiple storage solutions for input and output data, including local files on the science gateway, and also cloud-based storage systems such as Swift object storage and Amazon S3. These storage types can be freely mixed and matched when defining input and output data sources/destinations of the workflow. The current implementation supports OpenStack based clouds, with a more generic solution including OpenNebula and generic EGI Federated Cloud support on its way. The presentation will describe the implementation concept and environment, will provide benchmarking experiments regarding the efficiency of the implemented approaches, and demonstrate how the solution can be utilised by scientific user communities.
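A back-of-the-envelope model can illustrate why sharing one Hadoop cluster may pay off when a workflow runs several MapReduce experiments. The Python sketch below is a simplification made for this example only: it assumes the experiments run one after another, that cluster creation and destruction times are constant, and that in the single-node approach every experiment creates and destroys its own cluster. The function names are hypothetical and the numbers are not benchmark results.

```python
from typing import Sequence


def single_node_method_time(job_times: Sequence[float], create_time: float, destroy_time: float) -> float:
    """Each experiment is one workflow node that creates a cluster, runs its job and destroys the cluster."""
    return sum(create_time + job_time + destroy_time for job_time in job_times)


def three_node_method_time(job_times: Sequence[float], create_time: float, destroy_time: float) -> float:
    """One node creates the cluster, one runs all jobs on the shared cluster, one destroys it."""
    if not job_times:
        return 0.0
    return create_time + sum(job_times) + destroy_time
```

With a single experiment both functions return the same value; with three experiments of 10, 20 and 30 time units, a creation time of 5 and a destruction time of 2, the model gives 81 against 67.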
    
    Speaker: Tamas Kiss (University of Westminster, London, UK)
    
    Slides
  - 17:40
    
    Scalable ABM platforms with friendly interaction to address practical problems. 20m
    
    Many problems can be addressed in a realistic way with the help of Agent Based Model tools. However, these tools are sometimes not easy for an end user to use, or are not able to scale up to the computing resources required by the problem. We propose to develop a general platform supporting different ABM solutions, deployed as a service on HPC Cloud resources. We analyze a first possible pilot through the discussion of a simple but real use case: the anthropogenic impact on the water quality of a lake.
    
    Speaker: Luis Cabellos (CSIC)
    
    Slides
- 16:00 → 18:00
  Long tail of science: tools and services Sala A+A1, Giulia centre
  
  Sala A+A1, Giulia centre
  
  Villa Romanazzi Carducci
  
  Convener: Dr Gergely Sipos (EGI.eu)
  - 16:00
    
    A new platform from EGI for the long tail of science 20m
    
    Processes to allocate resources for user communities have been well established in EGI for several years. However, individual researchers and small research teams sometimes struggle to access grid and cloud compute and storage resources from the network of NGIs to implement ‘big data applications’. Recognising the need for simpler and more harmonised access for individual researchers and small research groups, also known as members of the ‘long-tail of science’, the EGI community started to design and prototype a new platform in October 2014. The platform development is coming to an end, and this talk will present the new EGI platform and open it to the EGI community and their partners/users. The platform consists of a centrally operated 'user registration portal' and an expandable set of science gateways. Through the registration portal users can request resources from the platform; through the science gateways they can consume the allocated resources. The platform will be relevant for any scientist who works with large data. Moreover, it will include communication channels and mechanisms to connect these people to their respective National Grid Initiatives, where they can receive more customised services.
    
    Speaker: Dr Gergely Sipos (EGI.eu)
    
    Slides
  - 16:20
    
    A science gateway example in the EGI long-tail of science platform: The Catania Science Gateway 20m
    
    This EGI Pilot for the Long Tail of Science [1] aims to design and prototype a new e-Infrastructure platform in EGI to simplify access to Grid and Cloud Computing services for the Long-Tail of Science (LToS), i.e. those researchers and small research teams who work with large data, but have limited or no expertise in distributed systems. The project will establish a set of services integrated together and suited for the most frequent Grid and Cloud computing use cases of individual researchers and small groups. The INFN has been involved in the LToS Pilot since its beginning and its responsibility is twofold: (i) to improve the Catania Science Gateway Framework (CSGF) [2], in order to fulfill the requirements of the Pilot, and (ii) to deploy a Science Gateway for LToS users. In the last 6 months new features of the CSGF have been implemented to better support diverse multi/inter-disciplinary Virtual Research Communities (VRCs) and allow scientists across the world to do better (and faster) research with an acceptable level of tracking of user activities and zero-barrier access to European ICT-based infrastructures. The most relevant improvements of the CSGF are: (i) the support for Per-User Sub-Proxies (PUSPs) [3] and (ii) the integration with the new EGI User Management Portal (UMP) for LToS researchers [4] developed by CYFRONET and based on Unity [5]. With the support for PUSPs, which add user-specific information to the CN proxy field, it is now possible to uniquely identify users that access ICT-based infrastructures using proxies issued by a common robot certificate. PUSPs are usually generated by the eTokenServer, a standard-based solution developed by INFN for central management of robot certificates and provisioning of proxies to get seamless and secure access to computing e-Infrastructures, based on local, Grid and Cloud middleware supporting the X.509 standard for authorization.
The Authorisation and Authentication Infrastructure of the Catania Science Gateway Framework has been extended to support the OpenID-Connect protocol which is used by the EGI UMP to authenticate users. The approach followed by EGI with its UMP is to centralise the authorisation to access resources so only people holding an e-grant and with the right to perform computation and data access are authenticated and authorised. In this contribution we will present the new features of the CSGF, developed to support the LToS Pilot, and we will show some of the use cases already integrated in the Science Gateway dedicated to the project [5] which are seamlessly executed both on the EGI Grid and on the EGI Federated Cloud. Time permitting, a short demonstration will also be given.
    
    Speaker: Giuseppe La Rocca (INFN Catania)
    
    Slides
  - 16:40
    
    Opportunistic use of supercomputers: linking with Cloud and Grid platforms 20m
    
    Supercomputers are good candidates for opportunistic use, as a non-negligible fraction of the computing time may stay unused while waiting for a large number of cores to become available for a new parallel job. In practice this results in a typical occupancy below 90%, leaving an interesting 10% of computing time that could be used by short jobs using few cores. Users of supercomputers, however, may not have such a need for short jobs, which may be more frequent among users of Cloud or Grid computing platforms. We explore the possibility of automatic back-filling execution on a supercomputer of jobs prepared for a Cloud or a Grid platform. Different options for this integration are presented, as well as its implementation on a top500 supercomputer for different applications. The experience of using this scheme to support the long-tail of science in biodiversity, providing access to more than 100 users from 20 different countries, will also be described.
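The back-filling idea can be illustrated with a small greedy selection: given the cores that sit idle and the time left before a large parallel job is expected to start, pick short jobs that fit into that gap. The Python sketch below is an assumption-laden illustration, not the scheduler integration presented in the talk; the `ShortJob` type, the greedy shortest-first rule and the idea that all selected jobs start at once are hypothetical choices.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ShortJob:
    name: str
    cores: int
    runtime_minutes: int


def backfill(jobs: List[ShortJob], idle_cores: int, window_minutes: int) -> List[ShortJob]:
    """Greedily select jobs that fit in the idle cores and finish before the window closes.

    All selected jobs are assumed to start immediately and run side by side,
    so the only constraints are the core budget and each job's own runtime.
    """
    selected: List[ShortJob] = []
    free_cores = idle_cores
    for job in sorted(jobs, key=lambda j: j.runtime_minutes):
        if job.runtime_minutes <= window_minutes and job.cores <= free_cores:
            selected.append(job)
            free_cores -= job.cores
    return selected
```

A job is skipped either because it needs more cores than remain free or because it would still be running when the large job is due to start.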
    
    Speakers: Fernando Aguilar (CSIC), Luis Cabellos (CSIC)
    
    Slides
  - 17:00
    
    Outreach strategies for the long tail of science in France 20m
    
    The research landscape in France is very scattered. There are a lot of units that depend on multiple research organisations or universities. The computing offer is also scattered, and France Grilles is one of the many stakeholders that work at local, regional, disciplinary or national levels. « France Grilles aims at building and operating a multidisciplinary national Distributed Computing Infrastructure open to all sciences and to developing countries. » is the French NGI vision. The major French scientific organizations have joined forces in France Grilles, which means that all their scientists may use its services if needed. A question is « how to reach them ? » About 100 000 academic researchers work in France, and 60 000 more people are involved in research organisations and universities. CNRS staff, for example, represent about 30 000 people working in more than 1000 research units. Most of these units are shared with one or more other organisations. Looking for researchers who may need France Grilles resources and services requires organisation. As computing is related to IT, our strategy mainly relies on IT people and on related initiatives and work-groups. In fact, in almost all French research units or entities there is an IT team in charge of the information system. This team is often in charge of the unit's computing resources and is close to the researchers' needs. In the presentation we will present this context and our detailed strategy. We will explain how we got involved in business networks, how we built fruitful relationships with other entities and how we benefit from platforms, tools or events. We will also present our communication and dissemination actions.
    
    Speaker: Romier Romier (CNRS)
    
    Slides
  - 17:20
    
    Developing the computing environment for new research communities in Romania 20m
    
    An overview is presented of the implementation of the computing environment for new research communities served by the Romanian Grid Infrastructure. These include the researchers involved in the Extreme Light Infrastructure – Nuclear Physics (ELI-NP) project, in computational biology and in the physics of condensed matter and nanomaterials. The new infrastructure provides access to HTC and HPC resources through a single web portal which features tools for the definition of workflows, job submission and monitoring, data analysis and visualization, and access to third-party software. A multi-disciplinary instance of the DIRAC framework is also integrated and used for production and training. The infrastructure will support various research activities, such as the numerical investigation of the new processes generated by the interaction of nuclear matter with extreme electromagnetic fields at ELI-NP, the design of nanostructures relevant for the next generation of high-speed electronic devices, the modeling of various subcellular structures in bacteria, and drug design.
    
    Speaker: Dr Ionut Traian Vasile (IFIN-HH)
    
    Slides
  - 17:40
    
    Engaging with the long tail of science in the Netherlands 20m
    
    Scientists from all disciplines turn to ICT to speed up their research. Helping these users, who are often novices, through a centralised support model does not scale very well. It is therefore important to engage local research support offices at the scientific institutes to help scientists reach and use the local, national and international e-infrastructure resources. Through the Support4research program, SURF is reaching out to these supporters by offering a comprehensive catalogue of the available services, providing training in using those resources and discussing the specific needs of researchers at the institute level. Preliminary results show that there is a need for knowledge exchange on applying e-infrastructure resources within and between institutes at both the national and the international level. Furthermore, processes and mutually agreed workflows need to be put in place to streamline research support when different institutes or resource providers are involved. A lot of work still remains in the area of training, as many of the support offices focus on common (non-research-oriented) ICT support.
    
    Speaker: Jan Bot (SARA)
    
    Slides
- 16:00 → 18:00
  Tutorial: Programming Distributed Computing Platforms with COMPSs Federico II
  
  Federico II
  
  Villa Romanazzi Carducci
  
  In the tutorial, the syntax, programming methodology and an overview of the COMPSs runtime internals will be given. The attendees will get a first lesson about programming with COMPSs that will enable them to start programming with this framework. The attendees will analyze several examples of COMPSs programming model and will be able to develop simple COMPSs applications and to run them in the EGI Federated Cloud testbed.
  
  Prerequisites:
  - Bring your own laptop
  - (optional) VirtualBox with COMPSs image installed to run local examples
  Content:
  * Introduction to COMPSs and the integration with the EGI Fed Cloud (15’)
  * Presentation of COMPSs applications in Java and Python (30’)
    - Examples of code from real science use cases (bioinformatics, astrophysics, etc.)
    - Examples of benchmarks in Python (KMeans, Word count)
  * Hands-on using the virtual machine (75’)
    - Access to the VM
    - Configuration of the COMPSs runtime
    - Upload of demo data to the storage
    - Execution of the applications
    - Monitoring, debugging and tracing
  * Final notes
  
  Convener: Daniele Lezzi (Barcelona Supercomputing Center)
  - 16:00
    
    Programming Distributed Computing Platforms with COMPSs 2h
    
    Distributed computing platforms like clusters, grids and clouds pose a challenge to application developers due to issues such as distributed storage systems, complex middleware and geographic distribution. COMPSs [1] is a programming model that exploits the inherent concurrency of sequential applications and executes them on distributed computing platforms in a manner transparent to the application developer. This is achieved by annotating parts of the code as tasks, and building at execution time a task-dependence graph based on the actual data consumed/produced by the tasks. The COMPSs runtime is able to schedule the tasks on the computing nodes and take into account factors like data locality and the different nature of the computing nodes in the case of heterogeneous platforms. Additionally, COMPSs has recently been enhanced with the possibility of coordinating Web Services as part of the applications and extended to run on top of big data storage architectures. In the course, the syntax, programming methodology and an overview of the runtime internals will be given. The attendees will get a first lesson about programming with COMPSs that will enable them to start programming with this framework. The attendees will analyze several examples of the COMPSs programming model compared with other programming models, such as Apache Spark, and also examples of porting libraries and codes to this framework. Different programming languages will be used, including Java and Python, whose adoption for scientific computing has been gaining momentum in recent years [2]. A hands-on session with simple introductory exercises will also be held. The participants will be able to develop simple COMPSs applications and to run them in the EGI Federated Cloud testbed.
COMPSs is available in the EGI Cloud Marketplace as a solution [3] for the integration of applications (use cases from the BioVeL, LOFAR and EUBrazilCC communities) in the federated cloud environment, providing scalability and elasticity features.
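The idea of deriving a task-dependence graph from the data that tasks consume and produce can be sketched in plain Python. This is not the COMPSs runtime and does not use the COMPSs or PyCOMPSs API; the task tuples, the function names and the restriction to read-after-write dependencies are assumptions made for illustration, and task names are assumed to be unique.

```python
from typing import Dict, List, Set, Tuple

# A task is described as (name, data it reads, data it writes), in program order.
Task = Tuple[str, List[str], List[str]]


def build_dependency_graph(tasks: List[Task]) -> Dict[str, Set[str]]:
    """Map each task to the tasks that produced the data it consumes."""
    last_writer: Dict[str, str] = {}
    depends_on: Dict[str, Set[str]] = {}
    for name, inputs, outputs in tasks:
        depends_on[name] = {last_writer[d] for d in inputs if d in last_writer}
        for datum in outputs:
            last_writer[datum] = name
    return depends_on


def parallel_levels(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into levels; tasks within one level have no mutual dependencies."""
    remaining = dict(depends_on)
    done: Set[str] = set()
    levels: List[List[str]] = []
    while remaining:
        ready = sorted(t for t, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("cyclic dependency between tasks")
        levels.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return levels
```

In a sequential program that loads two inputs, merges them and writes a report, the two load tasks end up in the same level and could therefore run concurrently, which is the kind of concurrency a task-based runtime can exploit.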
    
    Speaker: Daniele Lezzi (Barcelona Supercomputing Center)
- 18:00 → 19:00
  
  Shaping the EGI flagship project in H2020 WP16-17 (CLOSED) Sala A+A1
  
  Sala A+A1
  
  Villa Romanazzi Carducci
  
  Via G. Capruzzi, 326 70124 Bari Italy
  
  Convener: Dr Tiziana Ferrari (EGI.eu)
Thursday, 12 November
- 09:00 → 10:30
  Current status and evolution of the EGI AAI Scuderia
  
  Scuderia
  
  Villa Romanazzi Carducci
  
  EGI is serving many user communities, distributed collaborations, international virtual organizations, providing them a portfolio of federated services. Federated authentication and authorization is a critical capability that is needed to be productive in such a diverse landscape of use cases and service providers.
  This session is a follow up to the AAI session that took place at the EGI Conference in Lisbon in May 2015 and will include presentations on the current state of the art of the e-infrastructures, in terms of AAI solutions, and their evolution.
  By attending this session, EGI and the participating scientific communities will be able to discuss the evolution of the AAI landscape in Europe and how EGI enables user communities to overcome barriers and collaborate securely on top of the EGI e-Infrastructure. As part of this session, EGI will present the initial outcomes from the adoption of federated access by EGI services and tools.
  
  Convener: Peter Solagna (EGI.eu)
  - 09:00
    
    EGI-Engage: the AAI strategy for the EGI infrastructure 25m
    
    Speaker: Christos Kanellopoulos (GRNET)
    
    Slides
  - 09:25
    
    Services for an easy access to X509 personal certificates 15m
    
    Speaker: Mischa Salle (FOM)
    
    Slides
  - 09:40
    
    Comparison of Authentication and Authorization e-Infrastructures for Research 20m
    
    Today there exist several international e-Infrastructures that were built to address the federated identity management needs of research and education in Europe as well as the rest of the world. While some of these e-Infrastructures were specifically built for particular groups of research communities (DARIAH, ELIXIR AAI, CLARIN SPF), others were built with a more general target group in mind. The second group includes eduGAIN, EGI, EUDAT, Moonshot and to some extent also Stork. All of these "general-purpose" e-Infrastructures are international or even global. They differ in characteristics, coverage, governance and technology even though they all share the same goal: to provide an infrastructure that facilitates the secure exchange of trusted identity data for authentication and authorization. As most of these five e-Infrastructures use different and quite complex technologies, it is often difficult to know and understand even the basic concepts they are based on. Even operators of one particular e-Infrastructure often don't know enough about the technical mechanisms, the policies and the needs of the main users of the other e-Infrastructures. Having a good overview of the different e-Infrastructures in these regards is even more difficult for research communities that are about to decide which of these e-Infrastructures to use, and how, for their own purposes. It is thus no surprise that it is hard for research communities to learn and understand the differences and commonalities of these e-Infrastructures. Therefore, the presentation aims at shedding light on the uncharted world of e-Infrastructures by providing a comprehensive and objective overview of them. It will cover the differences, coverage, advantages, known limitations, as well as the overlaps and opportunities for interoperability.
The presentation will be based on an e-Infrastructure comparison document that is being written by the GÉANT project in collaboration with the described e-Infrastructures, which will be invited to contribute to the document.
    
    Speaker: Lukas Haemmerle (SWITCH)
    
    Slides
  - 10:00
    
    Enhancing the user life-cycle management in EGI Federated Cloud 15m
    
    The basic FedCloud user access scenario is composed of a VOMS server for authentication and authorization, and the site itself. Users with valid VOMS credentials are automatically created on sites. This solution is easy to deploy and manage but has several drawbacks if you need to support the whole user life-cycle. In this presentation we will introduce Perun as an additional component in the described scenario. Perun is the EGI Core Service for VO and group management; it also provides functionality for managing access to services. It supports the whole user life-cycle from user import and enrollment through user expiration and membership renewal to complete account deletion and deprovisioning from services. In addition, it supports linking of multiple external identities (federated identities, X.509 certificates, Kerberos, …) to one user account. As a part of its service management capabilities, Perun can propagate user accounts to both VOMS and sites. VOMS will still be used as the authentication and authorization service. User data is managed centrally and then distributed to VOMS. For example, users who want to change their certificate are able to do it in one place even though they are members of several VOs. Active propagation of user data to sites enables users to change their preferences (e.g. contact e-mail) in one place; the information is then distributed to all sites without any further action required from the users. More importantly, it enables sites to know about expired or suspended users and take appropriate action, such as suspending or stopping their virtual machines. That substantially enhances the security of Federated Cloud sites.
    
    Speaker: Slavek Licehammer (CESNET)
    
    Slides
  - 10:15
    
    The Indigo AAI 15m
    
    The Indigo Project [1] set out to develop a data and computing platform targeting scientific communities, deployable on multiple hardware and provisioned over hybrid e-infrastructures. This includes delegation of access tokens to a multitude of (orchestrated) virtual machines or containers as well as authentication of REST calls from and to VMs and other parts of the infrastructure. We introduce different tokens for delegation and tokens for accessing services directly. In this contribution we describe:
    - the Indigo approach to address token handling (delegation tokens, access tokens, ...)
    - token translation to support SAML, X.509 and more, on client and server side
    - the plan to include support for VO-managed groups
    - our approach to providing a more fine-grained limitation of delegated access tokens
    
    [1] https://www.indigo-datacloud.eu/
    
    Speakers: Andrea Ceccanti (INFN), Marcus Hardt (KIT-G)
    
    Slides
- 09:00 → 10:30
  Demand Of Data Science Skills & Competences Sala A+A1, Giulia Centre
  
  Sala A+A1, Giulia Centre
  
  Villa Romanazzi Carducci
  
  On behalf of the EDISON consortium, we kindly invite you to participate in its first workshop on the Demand for Data Science Skills & Competences.
  
  The emergence of Data Science technologies is having an impact on nearly every aspect of how research is conducted. Data Science is considered a main enabler and facilitator of the Open Science Initiative for the European Research Area (ERA).
  
  The effective use of Data Science technologies requires new skills and creates demand for new professions, usually referred to as the Data Scientist: an expert who is capable both of extracting meaningful value from the data collected and of managing the whole lifecycle of data, including supporting Scientific Data e-Infrastructures. Future Data Scientists must possess knowledge (and obtain competencies and skills) in data mining and analytics, information visualisation and communication, as well as in statistics, engineering and computer science, and acquire experience in the specific research or industry domain of their future work and specialisation.
  
  CONTEXT
  The Horizon 2020 EDISON project (September 1, 2015 – August 30, 2017) will develop a sustainable business model that will ensure a significant increase in the number and quality of data scientists graduating from universities and being trained by other professional education and training institutions in Europe. This will be accomplished through the development of a number of inter-connected activities including the definition of required Data Science competences and skills.
  
  WORKSHOP OBJECTIVES
  The EDISON Competence Framework will formalise the Data Scientist profile using the European eCompetence Framework (eCF), extending it to cover all identified profiles corresponding to the major stakeholders/employers of the future specialists. This workshop will present interim project results and establish an open dialogue across communities to characterize the existing education and training resources in order to further the development and validation of the Competence Framework.
  Participants are invited to contribute to
  - The analysis of organizational and employer requirements for competences and skills of Data Scientists
  - Identifying and formalizing profiles corresponding to the major stakeholders/employers of the future specialists
  
  TARGET AUDIENCES
  European Research e-Infrastructure and European research institutions, data intensive and data driven industries, innovation companies and SMEs as well as related community initiatives interested in supporting the development of data science experts.
  
  Participants will benefit through
  - Active involvement in structuring profiles and identification of commons and specifics (sub-profiles) for their respective sector
  - Reviewing, discussing and, where appropriate, endorsing recommendations related to the formalization of the Data Science profession
  - Being part of an ongoing open dialogue across communities
  
  Conveners: Holger Brocks (InConTec GmbH), Jana Becker (FTK - Forschungsinstitut fuer Telekommunikation und Kooperation e.V.), Prof. Matthias Hemmje (FernUniversität in Hagen)
  - 09:00
    
    Welcome and introduction 10m
    
    Speaker: Prof. Matthias Hemmje (FernUniversität in Hagen)
    
    Slides
  - 09:10
    
    Current State Of Demand Of Data Science Skills & Competences 20m
    
    The emergence of Data Science technologies (also referred to as Data Intensive Science or Big Data technologies) is having an impact, at a fundamental level, on nearly every aspect of how research is conducted and how research data are used and shared. Data Science is considered a main enabler and facilitator of the Open Science initiative for the European Research Area (ERA), recently launched by the EC. The effective use of Data Science technologies requires new skills and creates demand for new professions, usually referred to as the Data Scientist: an expert who is capable both of extracting meaningful value from the data collected and of managing the whole lifecycle of data, including supporting Scientific Data e-Infrastructures. Future Data Scientists must possess knowledge (and obtain competencies and skills) in data mining and analytics, information visualisation and communication, as well as in statistics, engineering and computer science, and acquire experience in the specific research or industry domain of their future work and specialisation. The Horizon 2020 EDISON project (1 September 2015 – 30 August 2017) aims to develop a sustainable business model that will ensure a significant increase in the number and quality of data scientists graduating from universities and being trained by other professional education and training institutions in Europe. This will be accomplished through the development of a number of inter-connected activities including the definition of required skills and competences, defining a Data Science Competence Framework (CF-DS) and also a Model Curriculum (MC-DS). The project will work in close co-operation with experts and practitioners involved and interested in the development of Data Science academic educational and professional training programmes. The target is to define basic competences and skills for the new profession of data scientist, with a focus on European e-Infrastructure and industry needs.
    This cooperation will take place in consultation and validation activities and roundtables with relevant stakeholders and institutions. The proposed workshop Demand Of Data Science Skills & Competences will contribute by presenting recent EDISON developments and initiating an open dialogue across communities to characterize the existing needs and practical requirements, topical trends and preferences in order to create the Competence Framework for Data Science. Participants are invited to contribute to the following objectives:
    - Analysis of organizational and employer requirements for competences and skills of Data Scientists
    - Elicitation of education and training needs in various contexts, focusing on different industry sectors, in particular the training and continuous education needs of practitioners (or self-made data scientists) to support advanced European Research e-Infrastructure development
    - Discussing perspectives of the educational path for universities, with the potential of starting careers in data-driven industries, as well as for life-long learning programmes, to identify suitable paths to career development
    - Identification of profiles corresponding to the major stakeholders/employers of the future specialists
    
    Target audience: European Research e-Infrastructure and European research institutions, data intensive and data driven industries, innovation companies and SMEs as well as related community initiatives interested in supporting the development of data science experts.
    
    Speaker: Yuri Demchenko (University of Amsterdam)
    
    Slides
  - 09:30
    
    Demand from e-Infrastructures 20m
    
    Speaker: Sy Holsinger (EGI.eu)
    
    Slides
  - 09:50
    
    Big Data Infrastructure And Skills From A Consulting Point Of View 15m
    
    Speaker: Kevin Berwind (University of Hagen)
    
    Slides
  - 10:05
    
    Data Science Competences To Understand Big Data Analysis From A Management Perspective 15m
    
    Speaker: Marco Xaver Bornschlegl (University of Hagen)
    
    Slides
  - 10:20
    
    Q&A 10m
- 09:00 → 10:30
  
  EGI Council meeting (closed) Europa
  
  Europa
  
  Villa Romanazzi Carducci
  
  Convener: Yannick Legre (EGI.eu)
- 09:00 → 10:30
  Tutorial: Data and Processes without Boundaries: D4Science as a case study Federico II
  
  Federico II
  
  Villa Romanazzi Carducci
  
  Prerequisites:
  Participants need to bring their own laptop, connected to the Internet
  No advanced mathematical skills are required
  No desktop application needs to be installed
  
  Content:
  The D4Science e-Infrastructure is a distributed network of service nodes designed to exploit the EGI Federated Cloud, residing on multiple sites, and managed by one or more organizations. It allows scientists to collaborate and offers a multiplicity of facilities as-a-service: data sharing, transfer, harmonization, Cloud processing and storage. D4Science has been used to support communities in several domains and it hosts models and data contributed by several international organizations. This tutorial will give attendees an overview of how to access and share data, how to execute either simple scripts or complex methods implemented in different languages, e.g. Fortran, R, Java, etc, on FedCloud overcoming technical boundaries not so often hidden in the exploited technologies.
  
  Outline
  - The D4Science e-Infrastructure and Virtual Research Environments
  - Practice with the D4Science e-Infrastructure through web interfaces: sharing, social networking, interaction with applications
  - Geospatial data visualization and representation
  - Playing with specific domain models to familiarize with D4Science
  --- Biological Science: (1) Accessing and representing large heterogeneous biological data, (2) Production of biodiversity trends, (3) Cloud computing and modelling with biological data
  --- Environmental Science: Federation and visualisation of environmental data
  
  Conveners: Pasquale Pagano (CNR), Gianpaolo Coro (CNR)
  - 09:00
    
    A Tutorial on Hybrid Data Infrastructures: D4Science as a case study 1h 30m
    
    An e-Infrastructure is a distributed network of service nodes, residing on multiple sites and managed by one or more organizations, allowing scientists residing at distant places to collaborate. They may offer a multiplicity of facilities as-a-service, supporting data sharing and usage at different levels of abstraction. E-Infrastructures can have different implementations (Andronico et al. 2011). A major distinction is between (i) Data e-Infrastructures, i.e. digital infrastructures promoting data sharing and consumption to a community of practice (e.g. MyOcean, Blanc 2008) and (ii) Computational e-Infrastructures, which support the processes required by a community of practice using GRID and Cloud computing facilities (e.g. Candela et al. 2013). A more recent type of e-Infrastructure is the Hybrid Data Infrastructure (HDI) (Candela et al. 2010), i.e. a Data and Computational e-Infrastructure that adopts a delivery model for data management, in which computing, storage, data and software are made available as-a-Service. HDIs support, for example, data transfer, data harmonization and data processing workflows. Hybrid Data e-Infrastructures have already been used in several European and international projects (e.g. i-Marine 2011; EuBrazil OpenBio 2011) and their exploitation is growing fast, supporting new projects and initiatives, e.g. Parthenos, Ariadne, Descramble. A particular HDI, named D4Science (Candela et al. 2009), has been used by communities of practice in the fields of biodiversity conservation, geothermal energy monitoring, fisheries management, and cultural heritage. This e-Infrastructure hosts models and resources by several international organizations involved in these fields. Its capabilities help scientists to access and manage data, reuse data and models, obtain results in a short time and share these results with other colleagues.
In this tutorial, we will give an overview of the D4Science capabilities; in particular, we will show practices and methods that large international organizations like FAO and UNESCO apply by means of D4Science. At the same time, we will explain how the D4Science facilities conform to the concepts of e-Infrastructures, Virtual Research Environments (VREs), data sharing and experiment reproducibility. In our tutorial, we will give insight into how D4Science contributors can add new models and algorithms to the processing platform. D4Science adopts methods to embed software developed by communities of practice involving people with limited expertise in Computer Science. Community software involves legacy programs (e.g. written in Fortran 90) as well as R scripts developed under different Operating Systems and versions of the R interpreters. D4Science is able to manage this multi-language scenario in its Cloud computing platform (Coro et al. 2014). Finally, D4Science uses the EGI Federated Cloud (FedCloud) infrastructure for data processing: computations are parallelized by dividing the input into several chunks, and each chunk is sent to D4Science services residing on FedCloud (Generic Workers) to be processed. Furthermore, another D4Science service executing data mining algorithms (DataMiner) also resides on FedCloud and adopts an interface that is compliant with the Web Processing Service (WPS, Schut and Whiteside 2015) specifications.
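
The chunk-based parallelization mentioned at the end of this abstract can be illustrated with a minimal sketch. It assumes local threads stand in for the remote Generic Workers and a toy sum of squares stands in for the real computation; `split_into_chunks`, `process_chunk` and `run_parallel` are hypothetical names for this example and are not D4Science APIs.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous chunks of near-equal size."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    size, remainder = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `remainder` chunks receive one extra element.
        end = start + size + (1 if i < remainder else 0)
        if end > start:
            chunks.append(items[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Stand-in for the work a remote worker would do on one chunk."""
    return sum(x * x for x in chunk)


def run_parallel(items, n_workers):
    """Process every chunk concurrently, then combine the partial results."""
    chunks = split_into_chunks(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    return sum(partial_results)


if __name__ == "__main__":
    print(split_into_chunks(list(range(10)), 3))
    print(run_parallel(list(range(10)), 3))
```

The scheme only works this simply when the per-chunk results can be combined independently of chunk order, as with the sum used here.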
    
    Speakers: Pasquale Pagano (CNR), Gianpaolo Coro (CERN)
    
    Slides
- 10:30 → 11:00
  
  Coffee break
- 11:00 → 12:30
  AARC project workshop Scuderia
  
  Scuderia
  
  Villa Romanazzi Carducci
  
  The AARC project, which started in May 2015, is a collaboration among
  different parties, such as NRENs and e-infrastructure service partners,
  including various user communities and libraries.
  
  The project aims to build on eduGAIN and on federated access to deliver
  an integrated architecture that connects all existing AAIs deployed in
  the R&E community. AARC also has a strong focus on training on both the
  technical and policy aspects of federated access, as well as on promoting
  AARC results. AARC results are expected to be validated via selected
  pilots and through buy-in from the user communities.
  
  A number of preliminary results will be available in the fall, such as
  the initial draft for the integrated architecture, two deliverables on
  the requirements for both the technical work as well as for the training
  content and the initial preparation for the first training material.
  
  In this session AARC results will be validated with the relevant communities. During the session AARC will also present the intermediate results on the architecture and the initial content of the training.
  
  Convener: Christos Kanellopoulos (GRNET)
  - 11:00
    
    Introduction: first 8 months of AARC project 15m
    
    Speaker: Christos Kanellopoulos (GRNET)
    
    Slides
  - 11:15
    
    Dissemination and training programs 20m
    
    Speaker: Marialaura Mantovani (GARR)
    
    Slides
  - 11:35
    
    Community requirements 15m
    
    Speaker: Peter Solagna (EGI.eu)
    
    Slides
  - 11:50
    
    Architecture overview 20m
    
    Speaker: Marcus Hardt (KIT-G)
    
    Slides
  - 12:10
    
    Level of Assurance management 15m
    
    Speaker: David Groep (FOM)
    
    Slides
  - 12:25
    
    Discussion 5m
- 11:00 → 12:30
  Academic Supply For Data Science Sala A+A1
  
  Sala A+A1
  
  Villa Romanazzi Carducci
  
  On behalf of the EDISON consortium, we kindly invite you to participate in its first workshop on the Academic Supply to Data Science.
  The emergence of Data Science technologies is having an impact on nearly every aspect of how research is conducted. Data Science is considered a main enabler and facilitator of the Open Science Initiative for the European Research Area (ERA).
  
  The effective use of Data Science technologies requires new skills and creates demand for new professions, usually referred to as the Data Scientist: an expert who is capable both of extracting meaningful value from the data collected and of managing the whole lifecycle of data, including supporting Scientific Data e-Infrastructures. Future Data Scientists must possess knowledge (and obtain competencies and skills) in data mining and analytics, information visualisation and communication, as well as in statistics, engineering and computer science, and acquire experience in the specific research or industry domain of their future work and specialisation.
  
  CONTEXT
  The Horizon 2020 EDISON project (September 1, 2015 – August 30, 2017) will develop a sustainable business model that will ensure a significant increase in the number and quality of data scientists graduating from universities and being trained by other professional education and training institutions in Europe. This will be accomplished through the development of a number of inter-connected activities including the definition of required Data Science competences and skills and the Data Science Body of Knowledge as a foundation for the following definition of the Data Science model curriculum.
  
  WORKSHOP OBJECTIVES
  EDISON will work in close co-operation with experts and practitioners involved and interested in the development of Data Science academic educational and professional training programmes. The target is to discuss and to describe new outlines of professions in the field of data science for academic and industrial purposes. This cooperation will take place in consultation and validation activities and roundtables with relevant stakeholders and institutions.
  The EDISON Body of Knowledge will be structured following the major knowledge areas that represent the data lifecycle and the organizational workflow by which organizations use data to achieve their main operational goals. This workshop will present interim project results and establish an open dialogue across communities to characterize the existing education and training resources in order to further the development and validation of the Body of Knowledge.
  Participants are invited to contribute to:
  - The EDISON inventory and taxonomy by providing an overview of existing curricula, training programmes and related educational resources
  - Determining the Body of Knowledge for Data Science, identifying and discussing common conceptual elements and gaps among the existing offerings
  - The development of the Data Science Model Curriculum
  - The formalization of the Data Scientist profession
  
  TARGET AUDIENCES
  Universities and scholars involved in Data Science academic programmes who are willing to engage in an open dialogue about the determination of the EDISON Body of Knowledge and the development of the Data Science Model Curriculum.
  Participants will benefit through:
  - Reviewing, discussing and, where appropriate, endorsing recommendations related to the formalization of the Data Science profession and required educational and training programmes.
  - Being part of an ongoing open dialogue across communities
  
  Convener: Prof. Matthias Hemmje (FernUniversität in Hagen)
  - 11:00
    
    Academic Supply For Data Science – Consultation And Validation Of Body Of Knowledge 20m
    
    The emergence of Data Science technologies (also referred to as Data Intensive Science or Big Data technologies) is having an impact, at a fundamental level, on nearly every aspect of how research is conducted and how research data are used and shared. Data Science is considered a main enabler and facilitator of the Open Science initiative for the European Research Area (ERA), recently launched by the EC. The effective use of Data Science technologies requires new skills and creates demand for new professions, usually referred to as the Data Scientist: an expert who is capable both of extracting meaningful value from the data collected and of managing the whole lifecycle of data, including supporting Scientific Data e-Infrastructures. Future Data Scientists must possess knowledge (and obtain competencies and skills) in data mining and analytics, information visualisation and communication, as well as in statistics, engineering and computer science, and acquire experience in the specific research or industry domain of their future work and specialisation. The Horizon 2020 EDISON project (1 September 2015 – 30 August 2017) aims to develop a sustainable business model that will ensure a significant increase in the number and quality of data scientists graduating from universities and being trained by other professional education and training institutions in Europe. This will be accomplished through the development of a number of inter-connected activities including the definition of required skills and competences. For that purpose a Data Science Body of Knowledge (DS-BoK) will be created for defining a Data Science Competence Framework (CF-DS) and also a Model Curriculum (MC-DS). The project will work in close co-operation with experts and practitioners involved and interested in the development of Data Science academic educational and professional training programmes.
    The target is to discuss and to describe new outlines of professions in the field of data science for academic and industrial purposes. This cooperation will take place in consultation and validation activities and roundtables with relevant stakeholders and institutions. The proposed workshop Academic Supply For Data Science will contribute by presenting recent EDISON developments and initiating an open dialogue across communities to characterize the existing education and training resources in order to create a Body of Knowledge. Participants are invited to contribute to the following objectives:
    - Inventory and taxonomy of existing curricula, training programmes and related educational resources
    - Determining the Body of Knowledge for Data Scientists, identifying common conceptual elements and gaps among the present offerings
    - Overview of existing curricula, training programmes and related educational resources as a contribution to the Data Science Model Curriculum (MC-DS) definition
    - Formalising the definition of the Data Scientist profession
    
    Target audience: universities and scholars who wish to contribute to the development of the EDISON Model Curriculum for Data Science as well as their corresponding educational offerings, i.e. study programmes and courses.
    
    Speaker: Andrea Manieri (Engineering)
    
    Slides
  - 11:20
    
    Industry-driven Master Certificate in Data Science 20m
    
    Speaker: Gianluca Reali (University of Perugia)
    
    Slides
  - 11:40
    
    The online big data course at the University of Liverpool 20m
    
    Speaker: Yuri Demchenko (University of Amsterdam)
    
    Slides
  - 12:00
    
    Education Interests of Summer School Students at GridKaSchool 20m
    
    Speaker: Christopher Jung (KIT-G)
    
    Slides
  - 12:20
    
    Discussion 10m
- 11:00 → 12:30
  
  EGI Council meeting (closed) Europa
  
  Europa
  
  Villa Romanazzi Carducci
  
  Convener: Yannick Legre (EGI.eu)
- 11:00 → 12:30
  Tutorial: Security training Federico II
  
  Federico II
  
  Villa Romanazzi Carducci
  
  Convener: Dr Sven Gabriel (NIKHEF)
  - 11:00
    
    Security training 1h 30m
    
    Cyber attacks have become ubiquitous and attackers are targeting a wide range of services on the Internet. Resources involved in EGI are no exception and are constantly probed by attackers launching massive attacks that strive to find vulnerable machines anywhere. Successful attacks cause additional harm, including damage to the reputation of institutions and EGI. Therefore, EGI as well as service and machine operators have to be prepared to provide proper incident response to make sure security incidents are dealt with in a proper manner. The training session will demonstrate how easy it is to perform a cyber attack against a site. The attendees will be walked through a live scenario that shows basic offensive principles and techniques. Then, the session will focus on how to provide a proper response to incidents. The target audience for the training is cloud providers, owners of virtual machines and maintainers of their images.
    
    Speakers: Daniel Kouril (CESNET), Dr Sven Gabriel (NIKHEF)
- 12:30 → 13:30
  
  Lunch
- 13:30 → 15:00
  Astronomy and astrophysical large experiments and e-infrastructure - new frontiers Scuderia
  
  Scuderia
  
  Villa Romanazzi Carducci
  
  This workshop aims to strengthen the relationship between the Astronomical and Astrophysical (A&A) community, mainly focused on the large experiments that will need very powerful e-Infrastructures, and the ICT researchers who are proposing innovative technologies. This meeting will help researchers and developers from the different fields to meet and discuss the answers that e-infrastructures can provide in the near future.
  
  Convener: Giuliano Taffoni (INAF)
  - 13:30
    
    Euclid Satellite mission: the ground segment distributed computing infrastructure 20m
    
    The EUCLID project is a medium-class mission of the ESA Cosmic Vision program. Its main objective is to map the geometry of the dark universe. Euclid will generate 26 PB of data for each full data release (including external data from ground observations) but reprocessing needs and simulations will increase it to almost 200 PB. To handle this volume of data, the Euclid Science Ground Segment (SGS) federates 9 Science Data Centres (SDCs) and a Science Operations Centre, providing redundant and distributed data storage and processing. To manage the heterogeneous computing and storage infrastructures of the SDCs, the SGS reference architecture is based on loosely coupled systems and services: 1) the Euclid Archive System (EAS), a central metadata repository which inventories, indexes and localizes the huge amount of distributed data; 2) a Distributed Storage System (DDS), providing a unified view of the SDCs' storage systems and supporting several transfer protocols; 3) a COmmon ORchestration System (COORS), performing a balanced distribution of data and processing among the SDCs and executing processing plans based on user-defined triggering criteria and Processing Functions (PFs); 4) an Infrastructure Abstraction Layer (IAL), isolating the processing software from the underlying IT infrastructure and providing a common, lightweight workflow management system; 5) a Monitoring & Control Service for monitoring the status of the SGS computing infrastructure as a whole or at SDC level; 6) a Common Data Model (CDM), a central repository where all SGS component interfaces and data structures are formalized in the XSD language. Virtualization is another key element of the SGS infrastructure. The EuclidVM is a lightweight virtual machine, deployed in any SDC processing node, with a reference OS, selected stable software libraries and "dynamic" installation of the Euclid PFs. 
These architecture concepts have been prototyped and are incrementally developed and validated through Euclid "SGS Challenges".
    
    Speaker: Dr Frailis Marco (INAF - OATs)
  - 13:50
    
    Gaia: A Stereoscopic Census of our Galaxy 20m
    
    Gaia will provide positional and radial velocity measurements with the accuracies needed to produce a stereoscopic and kinematic census of about one billion stars in our Galaxy and throughout the Local Group. The biggest challenge is clearly the data volume and the steady incoming stream, which will not stop for the next five years. The satellite sends us 40-100 GB of compressed raw data every day. Gaia data will revolutionize astronomy.
    
    Speaker: Ugo Becciani (INAF)
  - 14:10
    
    Cherenkov Telescope Array data processing: a production system prototype 20m
    
    The Cherenkov Telescope Array (CTA) - a proposed array of many tens of Imaging Atmospheric Cherenkov Telescopes - will be the next-generation instrument in the field of very high energy gamma-ray astronomy. CTA will operate as an open observatory providing data products and analysis tools to the entire scientific community. An average data stream of about 1 GB/s for approximately 2000 hours of observation per year is expected to produce several PB/year. A large amount of CPU time will be required for data processing as well as for massive Monte Carlo simulations used to derive the instrument response functions. The current CTA computing model is based on a distributed infrastructure for the archive and the off-line data processing. In order to manage the off-line data processing in a distributed environment, CTA has evaluated the DIRAC (Distributed Infrastructure with Remote Agent Control) system, which is a general framework for the management of tasks over distributed heterogeneous computing environments. For this purpose, a production system prototype has been developed, based on the two main DIRAC components, i.e. the Workload Management and Data Management Systems. This production system has been successfully used in three massive Monte Carlo simulation campaigns to characterize the telescope site candidates, different array layouts and the camera electronic configurations. Results of the DIRAC evaluation will be presented as well as the future development plans. In particular, these include further automation of high-level production tasks as well as the proposed implementation of interfaces between the DIRAC Workload Management System and the CTA Archive and Pipeline Systems, currently under development.
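
The "several PB/year" figure quoted in this abstract follows directly from the stated data rate and observation time. As a quick check, assuming decimal units (1 PB = 1,000,000 GB) and a hypothetical helper name `yearly_volume_pb`:

```python
SECONDS_PER_HOUR = 3600
GB_PER_PB = 1_000_000  # decimal units, an assumption of this sketch


def yearly_volume_pb(rate_gb_per_s, hours_per_year):
    """Raw data volume in PB for a given average rate and observing time."""
    return rate_gb_per_s * hours_per_year * SECONDS_PER_HOUR / GB_PER_PB


if __name__ == "__main__":
    # About 1 GB/s over approximately 2000 hours of observation per year.
    print(yearly_volume_pb(1, 2000))  # 7.2
```

This covers only the raw stream named in the abstract; Monte Carlo simulation output and derived data products would come on top of it.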
    
    Speaker: arrabito arrabito (LUPM CNRS/IN2P3)
    
    Slides
  - 14:30
    
    CANFAR: The Canadian Advanced Network for Astronomy Research 20m
    
    CANFAR is an integrated cloud ecosystem that supports the entire data life-cycle from ingestion of observatory data to publication of final data products. It supports curation and long-term preservation of data. It provides user-managed storage resources and access to batch cloud processing and interactive and persistent virtual machines to support science use and service support. Authentication and authorization services glue together these components into an integrated whole. CANFAR manages 2.2 Petabytes of data, serves over five thousand astronomers worldwide, and moves over a Petabyte of data per year across the network. The International Virtual Observatory Alliance (IVOA) has worked for over a decade to create standards to enable global interoperability of astronomy services. But this work focused primarily on data services. The world has moved on. Big Data resources need to be integrated with processing and other capabilities, and shared cyberinfrastructure seems to be the only way to achieve the scalability that we need. IVOA is poised to begin to broaden its approach to interoperability to include the integration of data with other capabilities. The major success stories of data-intensive research infrastructure in astronomy (and other domains) have been driven by development by scientific-technical teams with a high level of domain expertise. How will this development and delivery model translate into a future of shared research infrastructure?
    
    Speaker: Dr David Schade (Canada - CADC)
  - 14:50
    
    Discussion 10m
    
    Speakers: Giuliano Taffoni (INAF), Ugo Becciani (INAF)
- 13:30 → 15:00
  
  EGI Council meeting (closed) Europa
  
  Europa
  
  Villa Romanazzi Carducci
  
  Convener: Yannick Legre (EGI.eu)
- 13:30 → 15:00
  Tutorial: EUDAT infrastructure Federico II
  
  Federico II
  
  Villa Romanazzi Carducci
  
  Convener: Rene van Horik (DANS - Data Archiving and Networked Services)
  - 13:30
    
    EUDAT and the research life cycle 15m
    
    Speaker: Rene van Horik (DANS - Data Archiving and Networked Services)
    
    Slides
  - 13:45
    
    Interoperability use case between EUDAT and EGI services. Demo. 10m
    
    Speakers: Diego Scardaci (EGI.eu/INFN), Giuseppe Fiameni (CINECA - Consorzio Interuniversitario)
    
    Slides
  - 13:55
    
    Store and share data with the B2Share API 30m
    
    Speakers: Carl Johan Hakansson (SNIC), Sarah Berenjiardestani (SNIC)
    
    Slides
  - 14:25
    
    Persistent identifiers and the B2Handle service 30m
    
    Speakers: Dejan Vitlacil (KTH), Peter Gille (KTH)
    
    Slides
- 15:00 → 15:15
  
  Coffee break
- 15:15 → 16:45
  Advances in the computational chemistry and material science field Sala A+A1, Giulia centre
  
  Sala A+A1, Giulia centre
  
  Villa Romanazzi Carducci
  
  Convener: Antonio Lagana (UNIPG)
  - 15:15
    
    Virtual Research Environment and Computational Chemistry Community 20m
    
    Speaker: Gabor Terstyanszky (University of Westminster)
  - 15:35
    
    Collaborative service on ab initio evaluation of gas phase processes efficiency 20m
    
    Speaker: Antonio Lagana (UNIPG)
  - 15:55
    
    From ab-initio calculations to full technological systems simulations: application to aerospace and nuclear fusion 20m
    
    Speaker: Fabrizio Esposito
  - 16:15
    
    General discussion on: Building multiscale simulations and data repositories on the efficiency of Molecular processes as a service 30m
    
    Speaker: Antonio Lagana (UNIPG)
- 15:15 → 16:45
  Community clouds Scuderia
  
  Scuderia
  
  Villa Romanazzi Carducci
  
  Convener: Dr Enol Fernandez (EGI.eu)
  - 15:15
    
    Linking EUBrazilCloudConnect and EGI Federated Cloud 20m
    
    EUBrazilCloudConnect (EUBrazilCC) - EU-Brazil Cloud infrastructure Connecting federated resources for Scientific Advancement (2013-2015) aims to develop a state-of-the-art Cloud Computing environment that efficiently and cost-effectively exploits the computational, communication and data resources in both the EU & Brazil with selected interoperable and user-centric interfaces, including support for complex workflows and access to huge datasets. EUBrazilCC strongly focuses on interoperability. It has adopted mainstream standards in clouds and integrates with different services in EGI at the level of the infrastructure, the platform components and the use cases. Regarding the infrastructure, UFCG has developed fogbow, a lightweight federation middleware for on-premise cloud providers. Fogbow’s API implements an extension of the OCCI standard. To create a VM in a fogbow federation, a client issues a request with the resource specification (e.g. VM flavour, image, requirements, etc.) and receives a handle for this request. Eventually the request is fulfilled and the client can use the request handle to obtain the information needed to access the VM (e.g. IP address). In this way, fogbow can be used to deploy VMs across multiple EGI Federated Cloud sites. Fogbow can also make use of vmcatcher to prefetch VMIs registered in the EGI appDB. EUBrazilCC uses VOMS for authorisation and has registered a VO in the EGI databases (eubrazilcc.eu). All the services in EUBrazilCC use VOMS for authentication. EUBrazilCC incorporates several tools for the brokering of resources and the deployment of customised Virtual Appliances. Two of those tools, Infrastructure Manager (IM) and COMPSs, are already used within the EGI Federated Cloud, providing a seamless integration of both infrastructures. In this way, IM can be used to deploy and install the same configuration in different infrastructures, using the same configuration specification and based on a common basic instance. COMPSs can also elastically deploy a virtual infrastructure, adapting the number of resources to the actual computational load, and run the same workload in hybrid environments composed of public and private providers, provided that compatible VMIs are available in the target infrastructure. Finally, interoperability is also pursued at the level of the applications. EUBrazilCC will register the VMIs of the applications in the EGI appDB. Currently, there are VMIs for the Leishmaniasis Virtual Lab and the eScience Central workflow engine that uses it, as well as for COMPSs and for the mc2 platform for developing scientific gateways. All of them can be deployed in the EGI Federated Cloud.
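The request/handle interaction described in the abstract (submit a resource specification, receive a handle, later use the handle to look up access details) can be sketched with a small in-memory mock. This is not fogbow's actual API or its OCCI extension: the class `MockFederation`, its methods, the specification keys and the IP address below are all hypothetical and serve only to illustrate the asynchronous pattern.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Request:
    """State kept for one submitted request (hypothetical structure)."""
    spec: Dict[str, str]
    state: str = "open"               # stays "open" until a provider serves it
    ip_address: Optional[str] = None  # filled in once the VM exists


class MockFederation:
    """Hypothetical in-memory stand-in for a federation endpoint."""

    def __init__(self) -> None:
        self._requests: Dict[str, Request] = {}
        self._ids = itertools.count(1)

    def submit(self, spec: Dict[str, str]) -> str:
        """Register a request and return its handle immediately."""
        handle = f"req-{next(self._ids)}"
        self._requests[handle] = Request(spec=dict(spec))
        return handle

    def fulfil(self, handle: str, ip_address: str) -> None:
        """Simulate a provider eventually serving the request."""
        request = self._requests[handle]
        request.state = "fulfilled"
        request.ip_address = ip_address

    def status(self, handle: str) -> Request:
        """Look up the current state of a request by its handle."""
        return self._requests[handle]


if __name__ == "__main__":
    federation = MockFederation()
    handle = federation.submit({"flavour": "small", "image": "example-image"})
    print(handle, federation.status(handle).state)
    federation.fulfil(handle, "192.0.2.10")  # documentation-range address
    print(handle, federation.status(handle).state, federation.status(handle).ip_address)
```

The point of the handle is that the client is not blocked while the federation looks for a site able to serve the request; it can come back later and ask for the access details.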
    
    Speaker: Dr Ignacio Blanquer (UPVLC)
    
    Slides
  - 15:35
    
    Setting up a new FedCloud site in collaboration with the industry 20m
    
    Doñana National Park is a natural reserve in the south of Spain whose biodiversity is unique in Europe and which is designated a UNESCO World Heritage Site. The importance of this place requires an infrastructure capable of providing environmental data at different scales, available on-line, to support the monitoring of environmental changes in the short, mid and long term. Supported by European FEDER funds and the Spanish Ministry, Doñana Biological Station, the institute that manages research in Doñana, is developing different actions to improve and adapt the internationalization of the e-infrastructure for the Lifewatch ESFRI. Within these actions, different companies are working to deploy a cloud-based computing site integrated with EGI FedCloud. The deployment of this site is divided into four tasks, each focused on features that give the site added value and help it become a reference for Lifewatch ICT: (1) Set-up of the infrastructure needed: installation of the servers and packages needed to support a cloud system based on OpenStack and compatible with EGI FedCloud. (2) Distributed control: this task adds new features for Lifewatch managers and makes all the resources easily available and manageable: monitoring, accounting, deployment of new services, SLA management, etc. (3) Collaborative environments: a user-oriented task to make the resources available to the final user through higher abstraction layers: PaaS, SaaS, WaaS (Workflow as a Service), etc. (4) Data preservation: this set of features makes the resources very data-oriented and allows users to manage the whole data lifecycle. This presentation will show the collaboration with industry in the deployment of the new EGI FedCloud site, as well as all the features added and the cloud-based tools used (or tested) for that purpose, such as OpenShift, Cloudify, Mesos and Kubernetes, and which solution has been adopted and why.
    
    Speaker: Fernando Aguilar (CSIC)
    
    Slides
  - 15:55
    
    Volunteer Clouds for the LHC experiments 20m
    
    Volunteer computing remains a largely untapped opportunistic resource for the LHC experiments. The use of virtualization in this domain was pioneered by the Test4Theory project and enabled the running of high energy particle physics simulations on home computers. Recently the LHC experiments have been evaluating the use of volunteer computing to provide additional opportunistic resources for simulations. In this contribution we present an overview of this work and show how the model adopted is similar to the approach also used for exploiting resources from the EGI Federated Cloud.
    
    Speaker: Hassen Riahi (CERN)
    
    Slides
- 15:15 → 16:45
  
  EGI Council meeting (closed) Europa
  
  Europa
  
  Villa Romanazzi Carducci
  
  Via G. Capruzzi, 326 70124 Bari Italy
  
  Convener: Yannick Legre (EGI.eu)
- 15:15 → 16:45
  Virtual Research Environments Federico II
  
  Federico II
  
  Villa Romanazzi Carducci
  
  Convener: Dr Gergely Sipos (EGI.eu)
  - 15:15
    
    Building Virtual Research Environments: The Lion grid Experience. 20m
    
    Research & Development (R&D) statistics are among the key indices and an important component in measuring a country’s National Innovation System (NIS). The R&D landscape has changed considerably in the 21st century. Many countries are categorized as developed or developing based on their ability or inability to rise with the tide of research and technological advancement. Poor research funding, a chronic lack of research infrastructure, lack of appreciation of research findings, a scanty information base on who is working on what, and lack of collaboration remain recurring problems that affect the development of research in developing countries. Research in the 21st century requires skills in the 4Cs (critical thinking and problem solving, communication, collaboration, and creativity and innovation), all of which are addressed by Virtual Research Environments (VREs). A Virtual Research Environment is an online system that helps researchers to collaborate by providing access to e-infrastructures and tools for simulation, data analysis, visualization, etc. In 2011, we deployed the first-ever Grid Computing e-infrastructure in Nigeria, the Lion Grid, under the HP-UNESCO Brain Gain Initiative project. This led to the building of the first VRE in Nigeria. Our VRE database has grown, through workshops, demonstrations and training, to close to 500 members from heterogeneous research backgrounds. The project has developed applications for the local research community, deployed existing apps for its VRE, demonstrated the use of Science Gateways and Identity providers, and trained hundreds of researchers and technical support staff. In this paper, we present our experiences, the prospects of the VRE and future plans.
    
    Speakers: Dr Collins Udanor (Dept. of Computer Science, University of Nigeria Nsukka, Nigeria), Dr Florence Akaneme (Dept. of Plant Science & Biotechnology, University of Nigeria Nsukka, Nigeria)
    
    Slides
  - 15:35
    
    Energising Scientific Endeavour through Science Gateways and e-Infrastructures in Africa: the Sci-GaIA project 20m
    
    In African Communities of Practice (CoPs), international collaboration and the pursuit of scientific endeavour have faced a major barrier in the lack of access to the e-Infrastructures and high performance network infrastructure enjoyed by European counterparts. With the AfricaConnect and the just-about-to-start AfricaConnect2 projects and the regional developments carried out by both the Regional Research and Education Networks (RRENs) and the National Research and Education Networks (NRENs), this situation is changing rapidly. In the “Teaming-up for exploiting e-Infrastructures' potential to boost RTDI in Africa” (eI4Africa) project it has been demonstrated clearly that it is possible to develop e-Infrastructure services in Africa. It has also been demonstrated clearly that, as with the rest of the world, easy-to-use web portals, or Science Gateways, are needed to help CoPs to easily access e-Infrastructure facilities and through these collaborate with CoPs across the world. However, a major problem exists: it is very difficult for non-experts to develop Science Gateways and to deploy and support e-Infrastructures. Elements of guides and supporting materials exist but these are either written for different audiences or out of date. The EU-funded “Energising Scientific Endeavour through Science Gateways and e-Infrastructures in Africa” (Sci-GaIA) project started on the 1st of May 2015 for a duration of two years and proposes to bring together these materials into clearly structured guides and educational documents that can be used to train and support representatives of NRENs, CoPs and, importantly, Universities to develop Science Gateways and e-Infrastructures in Africa. This will give a sustainable foundation on which African e-Infrastructures can be developed. Importantly, the results of our project will be usable by CoPs in Europe and the rest of the world. To achieve this we bring together a highly experienced team of beneficiaries that have worked between Africa and Europe to advance African e-Infrastructures. The objectives of Sci-GaIA are: to promote the uptake of Science Gateways and e-Infrastructures in Africa and beyond; to support new and already emerging CoPs; to strengthen and expand e-Infrastructure and Science Gateway related services; and to train, disseminate, communicate and outreach. In the contribution we will present the consortium running Sci-GaIA, its workplan and the current status of the activities, with special focus on the tools and services deployed to support the development of e-Infrastructures in Africa and to train the CoPs working in that continent and collaborating with their counterparts in Europe. Opportunities for participation and collaboration for EGI-related communities will also be outlined and discussed.
    
    Speaker: Roberto Barbera (University of Catania and INFN)
    
    Slides
  - 15:55
    
    Peak to long-tail: how cloud is the fabric and the workshop 20m
    
    Modern scientific discovery has followed modern innovation. We’ve enjoyed over 50 years of Moore’s Law, over which computing capability has grown at nearly 50% compounded, in turn driving model/simulation-driven scientific discovery. More recently, the number of devices on the Internet has grown at a similar rate, and sensing capabilities have grown at half that rate, both driving data-driven discovery. No other modern-day man-made innovations come close. Hence, researchers creating tools and workflows over this space integrate instruments with storage, analysis tools and computing resources, to effectively create the 21st century equivalent of the humble microscope. Further, and because it is software, successful tools are then readily proliferated to others who explore the space of a discipline. The dichotomy of expectations (peak and long-tail, modelling and data) creates a significant tension for e-infrastructure providers. Do we serve peak-modellers (~HPC)? Do we serve the data-driven peak? Or do we serve the long-tail? Research @ Cloud Monash (R@CMon, pronounced “rack-mon”) is a single scalable heterogeneous fabric that spans the peak and long-tail agendas. It rekindles the computing centre as a workshop for tooling experiments, where things are not bought but bespoke and made for the experiment, whilst also driving modern data computing consolidation and scale. R@CMon is a node of the NeCTAR Research Cloud, a major data storage facility, an HPC facility, a virtual desktop facility and the home of the Characterisation Virtual Lab. The goal is to nurture virtual research environments that scale between peak and desktop to long-tail and HPC.
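To give a feel for what compounded growth at the rates mentioned in the abstract implies, the sketch below computes the cumulative growth factor and the doubling time. It assumes, beyond what the abstract states, that the rates are annual, and it reads "half that rate" as 25%; the function names are illustrative.

```python
import math


def compound_factor(rate: float, years: float) -> float:
    """Cumulative growth factor after `years` at a compounded `rate` per year."""
    return (1.0 + rate) ** years


def doubling_time(rate: float) -> float:
    """Years needed to double at a compounded `rate` per year."""
    return math.log(2.0) / math.log(1.0 + rate)


if __name__ == "__main__":
    # Assumed annual rates: 50% for computing, 25% for sensing ("half that rate").
    for label, rate in (("computing", 0.50), ("sensing", 0.25)):
        factor = compound_factor(rate, 50)
        print(f"{label}: x{factor:.3g} over 50 years, "
              f"doubling every {doubling_time(rate):.2f} years")
```

Under these assumptions, 50 years at 50% compounds to a factor of several hundred million, while halving the rate reduces the 50-year factor by about four orders of magnitude.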
    
    Speaker: Steve Quenette (Monash University)
    
    Slides
  - 16:15
    
    Next generation Science Gateways in the context of the INDIGO project: a pilot case on large scale climate change data analytics 20m
    
    The INDIGO project aims at developing a data/computing platform targeted at scientific communities, deployable on multiple hardware, and provisioned over hybrid e-Infrastructures. This platform features contributions from leading European distributed resource providers, developers, and users from various Virtual Research Communities (VRCs). INDIGO aims to develop tools and platforms based on open source solutions addressing scientific challenges in the Grid, Cloud and HPC/local infrastructures and, in the case of Cloud platforms, providing PaaS and SaaS solutions that are currently lacking for e-Science. INDIGO will also develop a flexible and modular presentation layer connected to the underlying IaaS and PaaS frameworks, thus allowing innovative user experiences including web/desktop applications and mobile appliances. INDIGO covers complementary aspects such as VRC support, software lifecycle management and developer support, virtualized resource provisioning (IaaS), implementation of a PaaS layer and, on a top level, provisioning of Science Gateways, mobile appliances, and APIs to enable a SaaS layer. INDIGO adopts the Catania Science Gateway framework (CSGF) as the presentation layer for the end users. The CSGF is a standard-based solution that, by exploiting well-established standards like OCCI, SAGA, SAML, etc., is capable of targeting any distributed computing infrastructure, while providing a solution for mobile appliances as well. In the context of INDIGO, the CSGF will be completely re-engineered in order to include additional standards, such as CDMI and TOSCA, and to be exposed as a set of APIs. This paper presents an early use case examined by the project from the end user's perspective, therefore interfacing INDIGO targeted resources through a preliminary web interface, provided by a Science Gateway, which hides the complexities of the underlying services/systems. This use case relates to the climate change domain and community (European Network for Earth System modelling - ENES) and tackles large scale data analytics requirements related to the CMIP5 experiment, and more specifically to anomaly analysis, trend analysis and climate change signal analysis. It demonstrates the INDIGO capabilities in terms of a software framework deployed on heterogeneous infrastructures (e.g., HPC clusters and cloud environments), as well as workflow support to run distributed, parallel data analyses. While general-purpose WfMSs (e.g., Kepler, Taverna) are exploited in this use case to orchestrate multi-site tasks, the Ophidia framework is adopted at the single-site level to run scientific data analytics workflows consisting of tens/hundreds of data processing, analysis, and visualization operators. The contribution will highlight: (i) the interoperability with the already existing community-based software eco-system and infrastructure (IS-ENES/ESGF); (ii) the adoption of workflow management system solutions (both coarse and fine grained) for large-scale climate data analysis; (iii) the exploitation of Cloud technologies offering easy-to-deploy, flexible, isolated and dynamic big data analysis solutions; and (iv) the provisioning of interfaces, toolkits and libraries to develop high-level interfaces/applications integrated in a Science Gateway. The presentation will also include a discussion on how INDIGO services have been designed to fulfil the requirements of many diverse VRCs.
    
    Speaker: Emidio Giorgio (INFN Catania)
    
    Slides
- 16:45 → 21:45
  
  Guided tour of Bari old town and social dinner
Friday, 13 November
- 09:00 → 10:00
  Closing plenary Europa
  
  Europa
  
  Villa Romanazzi Carducci
  - 09:00
    
    Opportunities and challenges of the e-Infrastructures and the new H2020 Work Programme 16-17 40m
    
    Speaker: Augusto Burgeño (European Commission)
    
    Slides
  - 09:40
    
    Conclusions and closing ceremony 20m
    
    Speakers: Dr Tiziana Ferrari (EGI.eu), Yannick Legre (EGI.eu)
- 10:00 → 10:30
  
  Coffee break
- 10:30 → 12:00
  Community Workshop on the Open Science Cloud: Shaping the Open Science Cloud of the Future Europa
  
  Europa
  
  Villa Romanazzi Carducci
  CO-ORGANIZED BY EGI, EUDAT, GEANT and OpenAIRE
  REGISTRATION: http://go.egi.eu/cf2015registration (free of charge)
  
  In the conclusions on "open, data-intensive and networked research as a driver for faster and wider innovation" (May 28-29 2015) the Competitiveness Council welcomed "the further development of a European Open Science Cloud that will enable sharing and reuse of research data across disciplines and borders, taking into account relevant legal, security and privacy aspects".
  
  Open Science is an umbrella term referring to "the practice of science in such a way that others can collaborate and contribute, where research data, lab notes and other research processes are freely available, under terms that enable redistribution, reuse and reproduction of the research and its underlying data and methods."
  Open Science is not limited to open access to the outputs of the scientific process, but requires openness of each step of the research lifecycle (from ideas, to experimentation, data gathering, modelling, peer review, publishing, and finally education and training). In essence, Open Science aims at "rigorous, reproducible and transparent research".
  
  The workshop offers an opportunity to focus on the requirements and challenges of the infrastructure needed for:
  - making the entire primary record of a research project publicly available online as it is recorded
  - opening research data, i.e. managing research data to optimize access, discoverability and sharing for use and reuse
  - documenting, opening and sharing research code, and making it freely available for collaboration
  - publishing the output of the research process and making it freely accessible for maximum use, reuse and impact
  - bridging the gap between research and society with citizen science
  The workshop, co-organized by EGI, GEANT, and EUDAT2020, will devote ample time to discussion and will offer the opportunity to users, e-Infrastructure and Research Infrastructure providers, publicly funded and commercial cloud providers, data providers, international research collaborations and policy managers to gather and discuss three key points:
  1. the mission and vision: what are the needs that the Open Science infrastructure addresses and the services it should offer?
  2. the development: what are the services and processes still missing that the infrastructure must deliver?
  3. the governance: who are the service providers and the users, who is responsible for funding and procuring such infrastructure, what are the policies that need to be changed?
  The "reading material" section is provided for you to learn about the community's contributions to the Open Science Cloud idea.
  The workshop conclusions will identify the barriers to remove and the actions needed.
  
  NOTE. Participation is free of charge, but registration is mandatory through the EGI Community Forum registration system (http://go.egi.eu/cf2015registration).
  
  STEERING COMMITTEE
  - Tiziana Ferrari, EGI.eu
  - Wouter Los, Independent Expert
  - Natalia Manola, University of Athens and OpenAIRE
  - Per Oster, CSC and EUDAT2020
  - Roberto Sabatino, GEANT
  Conveners: Natalia Manola (University of Athens, Greece), Dr Per Oster (CSC), Mr Roberto Sabatino (DANTE), Dr Tiziana Ferrari (EGI.eu), Wouter Los (University of Amsterdam)
  Invited participants
  
  Reading material
  
  Citizen science in Cultural Heritage: roadmap
  
  European Open Science Cloud - EIROForum draft paper
  
  Position paper: European Open Science Cloud for Research
  
  White Paper on Citizen Science for Europe
  
  The heralds of resource sharing - edited video
  - 10:30
    
    Welcome and introduction 15m
    
    Speaker: Dr Tiziana Ferrari (EGI.eu)
    
    Slides
  - 10:45
    
    The Open Science Cloud: Eight Elements for Success 15m
    
    The joint EGI, EUDAT, GEANT, OpenAIRE and LIBER position paper will be presented. According to the paper, the European Open Science Cloud for Research must be: 1. Open in design, participation and use 2. Publicly funded & governed with the 'commons approach' 3. Research-centric with an agile co-design with researchers and research communities 4. Comprehensive in terms of universality and inclusiveness of all disciplines 5. Diverse & distributed empowering network effects 6. Interoperable with common standards for resources and services 7. Service-oriented as well as protocol-centric 8. Social connecting diverse communities
    
    Speaker: Natalia Manola (University of Athens, Greece)
    
    Position Paper
    
    Slides
  - 11:00
    Mission and vision: how to meet the researchers' needs for Open Science 1h
    
    Single researchers, citizen scientists, research teams, international research collaborations and research infrastructures will be invited to present, by example, how open science will affect the life of a researcher. What are the components of Open Science and their priorities as these are applied to a research community? What is missing? What should be improved?
    
    The purpose of this session is to:
    - define the most important components of open science, intended as "the Opening of the creation and dissemination of scholarly knowledge towards a multitude of stakeholders, from professional researchers to citizens" [http://www.openingscience.org/about/]
    - reach a common understanding of the Open Science Cloud
    - discuss how openness applies to different types of resources and services
    
    The audience and a group of panelists will be invited to address the following questions:
    (1) Open Science Cloud offers researchers from all disciplines seamless, open access to the advanced digital capabilities, resources and expertise they need to collaborate and to carry out data- and computing-intensive science. Secure and trustworthy, the Open Science Cloud engages researchers in governing, managing and preserving resources for everyone's benefit. Do you share the concept of Open Science Cloud?
    (2) What services/resources are part of the Open Science Cloud and what is out of scope?
    (3) What is open in the Open Science Cloud? How does open access apply to different types of resources, such as data, publications and software (nonrival resources), computing and network (rival resources), and other types?
    
    PANELISTS
    (1) Fotis E. Psomopoulos, Academic Fellow at the Aristotle University of Thessaloniki and a Team Leader on Bioinformatics at the Information Processing Lab (IPL) in the Department of Electrical and Computer Engineering of AUTH, Greece
    (2) Sebastian Waszak, EMBL, Senior Postdoc, Pan-cancer international project, Germany
    (3) Claire Devereux, Research Funding Unit, Dept for Business, Innovation and Skills (BIS), Digital ERA Forum UK representative
    (4) Paolo Favali, Coordinator of the European Multidisciplinary Seafloor and Water Column Observatory, Italy
    (5) Vassilis Protonotarios, Agricultural Biotechnology, Agricultural University of Athens and Business Development Team, Agro-Know
    RAPPORTEUR: Roberto Sabatino, GEANT
    
    Speaker: Antony Ross Hellauer (Scientific Manager at Göttingen State and University Library)
    
    Slides
    
    01_fpsom-OpenScienceCloud.pptx
    
    02_OSC_Bari_Sebastian_Waszak_121115.pdf
    
    03_Favali_EMSO_EGI-Bari_13.11.15.pptx
    
    04_Agro-Know_AgResearch_Protonotarios.pptx
- 10:30 → 12:00
  
  EDISON project: Expert Liaison Group meetings Scuderia
  
  Scuderia
  
  Villa Romanazzi Carducci
  
  This closed session is for those who have been invited to join one of the three Expert Liaison Groups (ELG) convened as part of the recently funded EU EDISON project. EDISON has been established to support the development of the data science career path into a recognised profession. The three ELGs represent employers, universities and data experts, and will meet to contribute to the project's aim of supporting and accelerating the process of establishing data scientist as a certified profession.
  
  EDISON will run for 24 months and has seven core partners from across Europe. The project is coordinated by Yuri Demchenko at the University of Amsterdam in the Netherlands.
  
  See project website for further details on the aims and objectives of EDISON. http://edison-project.eu
  
  Conveners: Steve Brewer (University of Southampton), Yuri Demchenko (University of Amsterdam)
- 10:30 → 12:00
  
  INDIGO DataCloud project meeting Federico II
  
  Federico II
  
  Villa Romanazzi Carducci
  
  On Friday, Nov 13th, two meetings of the INDIGO-DataCloud project will take place immediately after the conclusion of the EGI Community Forum. These meetings, reserved for invited INDIGO-DataCloud participants, are the INDIGO Project Management Board (PMB) in the morning and the INDIGO Technical Board (TB) in the afternoon. These two bodies steer the technical development of the INDIGO-DataCloud project, whose goal is to create an open Cloud platform for Science. INDIGO-DataCloud is an H2020 project, funded from April 2015 to September 2017, involving 26 European partners and based on use cases and support provided by several multidisciplinary scientific communities and e-infrastructures. The project will extend existing PaaS (Platform as a Service) solutions, allowing public and private e-infrastructures, including those provided by EGI, EUDAT, PRACE and Helix Nebula, to integrate their existing services and make them available through AAI services compliant with GEANT's inter-federation policies, thus guaranteeing transparency and trust in the provisioning of such services. INDIGO will also provide a flexible and modular presentation layer connected to the PaaS and SaaS frameworks developed within the project, allowing innovative user experiences and dynamic workflows, also from mobile appliances.
  
  Conveners: Davide Salomoni (INFN), Dr Giacinto Donvito (INFN)
- 12:00 → 13:00
  
  Lunch
- 13:00 → 15:30
  Community Workshop on the Open Science Cloud: Shaping the Open Science Cloud of the Future Europa
  
  Europa
  
  Villa Romanazzi Carducci
  Conveners: Natalia Manola (University of Athens, Greece), Dr Per Oster (CSC), Mr Roberto Sabatino (DANTE), Dr Tiziana Ferrari (EGI.eu), Wouter Los (University of Amsterdam)
  Invited participants
  
  Reading material
  
  Citizen science in Cultural Heritage: roadmap
  
  European Open Science Cloud - EIROForum draft paper
  
  Position paper: European Open Science Cloud for Research
  
  White Paper on Citizen Science for Europe
  
  The heralds of resource sharing - edited video
  - 13:00
    Developing the Open Science Cloud 1h 15m
    
    The purpose of this session is to:
    - identify the current list of gaps, barriers and related priorities
    - define a roadmap to address these
    - discuss how the European Open Science Cloud needs to support international collaborations and relate to other infrastructures worldwide
    
    Initiatives will be invited to present their view on the main gaps and barriers faced today in realizing the Open Science Cloud. A group of panelists and the audience will be asked to answer the following questions about the Open Science Cloud:
    1. What are today's gaps and priorities?
    2. What are today's barriers and priorities?
    3. What are the possible ways to move forward? How do we bridge the gaps?
    
    PRESENTERS
    - Antonio Lagana, Chair of the Computational Chemistry Division of EUCHEMS, Italy
    - David Schade, Canadian Advanced Network for Astronomy Research, Canada
    - Daniele Bailo, EPOS management office - Istituto Nazionale di Geofisica e Vulcanologia, Italy
    
    PANELISTS
    - Marc Taconet, Fisheries Global Information System (FIGIS), FAO
    - Pierre-Philippe Mathieu, European Space Agency - ESRIN Earth Observation Science & Applications, Italy (Earth observation)
    - Sean Hill, Human Brain Project, Co-Director, neuroinformatics, Switzerland
    - Graziano Pesole, CNR, Director of Istituto di Biomembrane e Bioenergetica, ELIXIR Italian Node, Italy
    - Simon Leinen, Cloud Technologist, SWITCH, Switzerland
    
    RAPPORTEUR: Sy Holsinger, EGI.eu
    
    Speaker: Dr Tiziana Ferrari (EGI.eu)
    
    Slides
    
    01_Lagana_-_OS-PRESENTATION.pptx
    
    02_Schade_Bari_Nov13_2015_3slides.pdf
    
    03_Bailo_-_2015.11.13-_OPEN_SCIENCE_CLOUD.pptx
    
    04_Taconet_-_What_is_missing_to_realise_Open_Science_cloud_MarcTaconet.pptx
    
    The Heralds Of Resource Sharing (Video)
  - 14:15
    
    Governing the Open Science Cloud 1h
    
    The joint position paper presented in the first session defines the Open Science Cloud as publicly funded and governed: "A publicly funded and publicly governed Open Science Cloud will guarantee persistence and sustainability, and ensure that outcomes are driven by scientific excellence and societal needs rather than profit. This commons approach, welcoming partnership with private-sector actors while driven by the public good, will encourage the development of innovative services that are conducive to the future of Open Science, while guaranteeing the long-term, persistent care of resources."
    
    A group of panelists and the audience will be asked to address the following questions:
    1. What needs to be governed in the Open Science Cloud? Policies, processes, funding...?
    2. Who has to take care of the Open Science Cloud, and who should feel responsible? Who are the actors? Scientific community, institutions, funding bodies, governments...?
    3. How do you foresee options for structuring the governance of the Open Science Cloud? What can we learn from other federated infrastructures such as the Internet?
    4. How, and in what form, can we ensure the involvement and participation of researchers and other stakeholders in steering the evolution of the Open Science Cloud?
    5. What are possible business models for the sustainability of the Open Science Cloud?
    
    The session will help define points of consensus and/or disagreement on:
    - who should act on what through the Open Science Cloud governance
    - options for and principles of the governance
    - ideas and areas of development for the Open Science Cloud governance
    
    PRESENTER: Sergio Andreozzi, Strategy and Policy Manager, EGI.eu
    
    PANELISTS
    - Matthew Dovey, Jisc and UK Government Cabinet Office Open Standards Board
    - David Foster, Deputy Head of IT, CERN
    - Michael Symonds, ATOS, Principal Solutions Architect, and Helix Nebula Supply Coordinator
    - Dean Flanders, Digital ERA Forum WG on the Open Science Cloud
    - Yannis Ioannidis, ESFRI representative to the e-Infrastructures Reflection Group (e-IRG), and an expert in the Programme Committee on Research Infrastructures within the European Commission's Horizon2020
    
    RAPPORTEUR: Sergio Andreozzi
    
    Speaker: Dr Per Oster (CSC)
    
    Governance - Intro
    
    Slides
  - 15:15
    
    Reports from rapporteurs and conclusions 15m
    
    Speaker: Dr Tiziana Ferrari (EGI.eu)
    
    Slides
- 13:00 → 15:30
  
  EDISON project: Expert Liaison Group meetings Scuderia
  
  Scuderia
  
  Villa Romanazzi Carducci
  
  This closed session is for those who have been invited to join one of the three Expert Liaison Groups (ELG) convened as part of the recently funded EU EDISON project. EDISON has been established to support the development of the data science career path into a recognised profession. The three ELGs represent employers, universities and data experts, and will meet to contribute to the project's aim of supporting and accelerating the process of establishing data scientist as a certified profession.
  
  EDISON will run for 24 months and has seven core partners from across Europe. The project is coordinated by Yuri Demchenko at the University of Amsterdam in the Netherlands.
  
  See project website for further details on the aims and objectives of EDISON. http://edison-project.eu
  
  Conveners: Steve Brewer (University of Southampton), Yuri Demchenko (University of Amsterdam)
- 13:00 → 15:30
  
  INDIGO DataCloud project meeting Federico II
  
  Federico II
  
  Villa Romanazzi Carducci
  
  On Friday, Nov 13th, two meetings of the INDIGO-DataCloud project will take place immediately after the conclusion of the EGI Community Forum. These meetings, reserved for invited INDIGO-DataCloud participants, are the INDIGO Project Management Board (PMB) in the morning and the INDIGO Technical Board (TB) in the afternoon. These two bodies steer the technical development of the INDIGO-DataCloud project, whose goal is to create an open Cloud platform for Science.
  
  INDIGO-DataCloud is an H2020 project, funded from April 2015 to September 2017, involving 26 European partners and based on use cases and support provided by several multi-disciplinary scientific communities and e-infrastructures. The project will extend existing PaaS (Platform as a Service) solutions, allowing public and private e-infrastructures, including those provided by EGI, EUDAT, PRACE and Helix Nebula, to integrate their existing services and make them available through AAI services compliant with GEANT's inter-federation policies, thus guaranteeing transparency and trust in the provisioning of such services. INDIGO will also provide a flexible and modular presentation layer connected to the PaaS and SaaS frameworks developed within the project, allowing innovative user experiences and dynamic workflows, including from mobile appliances.
  
  Conveners: Davide Salomoni (INFN), Dr Giacinto Donvito (INFN)
- 15:30 → 16:00
  
  Coffee break
