Workshops on e-Science Workflows

Budapest


Victor Hugo Street 18-22.
Gergely Sipos (EGI.EU), Peter Kacsuk (Sztaki/SHIWA)
Description
This event is a collaboration between EGI and SHIWA to run two linked workshops on workflows in e-Science. The two workshops are expected to share enough of their audience to be run 'back to back', so that each benefits from higher attendance and greater impact.

The first workshop (on Thursday) is organised by the SHIWA project, with the main goal of introducing, and collecting feedback on, the SHIWA solutions that enable cross-workflow and inter-workflow exploitation of Distributed Computing Infrastructures.

The second workshop (Friday) is organised by EGI.eu and the Hungarian National Grid Infrastructure. This workshop seeks to bring together representatives of e-Science communities and the providers of services, workflow technology and grid infrastructure. The objective is to discuss and further clarify the requirements of ‘Scientific workflow systems’ so that such systems can be incorporated into the European Grid Infrastructure.

The SHIWA Workshop (Thursday):

For the SHIWA Workshop, the SHIWA consortium partners will deliver presentations explaining the key points listed below.
  • What is the purpose of the SHIWA project?
  • What workflow services are delivered during the project? How can these services be useful to a workflow community?
  • How can workflow communities use the SHIWA services (store and submit workflows, etc.)?

The EGI Workshop (Friday):

The EGI Workshop includes a series of "position paper" presentations followed by a structured discussion to answer the following set of questions:
  • What kinds of e-Science workflow management systems and best practices are the most widely used and/or are emerging within scientific communities? How many and which projects, scientific collaborations, Virtual Research Communities or National Grid Infrastructures are supporting these?
  • What requirements do scientific user and operation communities have concerning the usage, operation, integration and further development of Scientific workflow services?
  • What kind of new services should EGI provide for scientific communities and for technology developers to support the integration of workflow systems with National Grid Infrastructures?
  • How could a wider and harmonised adoption of these technologies be facilitated within EGI? What role should EGI.eu, the NGIs and projects play in this process?


Important Notes:

1. Venue: Both workshops will be held in Budapest, in the Victor Hugo building of MTA SZTAKI (Hungarian Academy of Sciences). Further information and the address of the venue are available here.

2.  Registration: Registration information.

3.  Travel and general information:  Travel information can be found here. General information relevant to Budapest can be found here.

4.  Accommodation in Budapest: Information about recommended hotels near the workshop venue can be found here.

5.  Future Workshops: This is the first of a series of such workshop sessions on scientific workflows; a follow-on workshop is expected to be held at the Community Forum in Munich, March 2012.

Media partner:
e-Science Talk
 

Participants
  • Akos Balasko
  • Alessandro Costantini
  • Andrew Jones
  • Béla Hullár
  • Daniele Cesini
  • David Rogers
  • Eva Takacs
  • Ewa Deelman
  • Gabor Terstyanszky
  • Gergely Sipos
  • Giacinto Donvito
  • Ian Harvey
  • Kassian Plankensteiner
  • Kitti Varga
  • Kostas Karasavvas
  • Kristian Ovaska
  • Marco Bencivenni
  • Miklos Kozlovszky
  • Peter Kacsuk
  • Richard McLennan
  • Seçil Kona
  • Stian Soiland-Reyes
  • Tamas Kukla
  • Tram TRUONG HUU
  • Vasileios Gkamas
  • Wolfgang Kuchinke
  • Zoltán Farkas
  • Thursday, 9 February
    • 14:00 14:20
      Welcome to the SHIWA workflow workshop
      Conveners: Peter Kacsuk (MTA SZTAKI), Richard McLennan (EGI.EU)
    • 14:20 15:30
      SHIWA - Coarse-grained interoperations of workflows

      Presentations will include the following topics:
      - Overview, introduction to coarse-grained interoperability (CGI) concepts (10)
      - SHIWA Simulation Platform presentation (20)
      - SHIWA Repository demo (20)
      - User scenarios, LINGA presentation (20)

      Convener: Peter Kacsuk (MTA SZTAKI)
      • 14:20
        Overview, introduction to CGI concepts 10m
        Speaker: Tamas Kukla
        Slides
      • 14:30
        The SHIWA Simulation Platform 20m
        Speaker: Gabor Terstyanszky
        Slides
      • 14:50
        SHIWA Repository demo 20m
        Speaker: Tamas Kukla
      • 15:10
        User scenarios and LINGA presentation 20m
        Speaker: Tram Truong Huu
        Slides
    • 15:30 16:00
      Coffee
    • 16:00 18:40
      User experience and further developments (Talks, demos and tutorial)

      The session will cover the following topics:
      - User experience and overview of the pilot applications (20)
      - SHIWA Desktop presentation and demonstration (20)
      - Fine-grained interoperability presentation and IWIR demo (20)
      - Full scenario tutorial with the LINGA application (45)
      - Discussion (30)
      Topics:
      - Validation of the applications in the SHIWA Repository
      - Access policy to the SHIWA Simulation Platform
      - Features and services
      - Sustainability
      - Relationship between SHIWA and other workflow related projects in Europe

      Convener: Peter Kacsuk (MTA SZTAKI)
      • 16:00
        Workflows in Astronomy: General Requirements and new implementations in SCI-BUS project 10m
        Speaker: Ugo Becciani (INAF)
        Slides
      • 16:10
        User experience and overview of the pilot applications 20m
        Speaker: Tram Truong Huu
        Slides
      • 16:30
        Desktop presentation and demo 20m
        Speakers: Andrew Jones, Dave Rogers, Ian Harvey
      • 16:50
        Fine-grained interoperability presentation and IWIR demo 20m
        Speaker: Kassian Plankensteiner
      • 17:10
        Pegasus workflow system 15m
        Speaker: Ewa Deelman
      • 17:25
        Full scenario tutorial with LINGA application 45m
        Speakers: Tamas Kukla, Tram Truong Huu
        Tutorial website
      • 18:10
        Discussion 20m
        Topics:
        - Validation of the applications in the Repository
        - Access policy to the SSP
        - Features and services
        - Sustainability
        - Relationship between SHIWA and other workflow related projects in Europe
        Speaker: Peter Kacsuk
    • 18:40 22:30
      Free time / Informal group dinner (starting 20:00; included in registration)

      Universe

      Free time followed by informal non-hosted dinner for participants at a local restaurant (20:00 to 22:30).

      Restaurant address: Budapest, 1132, Visegrádi Street 50/a.
      Map and further information: http://www.trofeagrill.net

  • Friday, 10 February
    • 09:00 09:05
      Welcome to the EGI workflow workshop
      Convener: Richard McLennan (EGI.EU)
    • 09:05 10:50
      Workflow Systems and Requirements - Presentations

      Position papers from workflow user communities and workflow provider/developer communities are delivered in this session, each lasting approximately 15 minutes.

      Conveners: Gergely Sipos (EGI.EU), Richard McLennan (EGI.EU)
      • 09:05
        Biodiversity Virtual e-Laboratory (BioVeL): Robust biodiversity workflows running on the GRID 15m
        BioVeL addresses the needs of biologists and environmental scientists by offering a series of robust and reliable workflows based on web services that can be managed with the myGrid suite of tools: Taverna, BioCatalogue and myExperiment. The project proposes best practice and efficient workflows for commonly executed analyses in ecology, taxonomy, phylogenetics and metagenomics, with underlying deployment of hardened and robust Web Services. The first round of workflows produced and services deployed by the project includes phylogenetic inference workflows. These workflows will give end users the ability to execute inference applications that scale easily across several kinds of resources, such as the EGI grid infrastructure, local batch farms or dedicated servers, and cloud resources.
        Speaker: Dr Giacinto Donvito (INFN)
        Slides
      • 09:20
        Globus-based Technologies for Supporting Advanced e-Science Workflows – An IGE Perspective 15m
        The IGE project supports the adoption of the Globus Toolkit in Europe in many different ways, such as by providing Globus-based technologies to EGI, PRACE, and other infrastructure providers in Europe. This presentation will outline the capabilities (technologies) specific to or based on the Globus Toolkit 5 that are of potential interest to developers of scientific workflows and their users (GRAM5, GridFTP, OGSA-DAI, GridWay etc.). We will also point out contributions to the implementation of standards such as BES, which are crucial for the interoperability of grids. Moreover, the GlobusOnline service will be addressed in the presentation, emphasizing its potential for grid user communities and workflow application developers. Requirements related to e-Science workflows received by the IGE project from the European Globus Community Forum or from other sources could be briefly summarized for this workshop.
        Speaker: Dr Ioan Lucian Muntean (UTC)
        Slides
      • 09:35
        Agile Analysis Framework for Complex Iterations and Data Integration 15m
        Anduril (anduril.org) is an open source Java-based workflow engine that has been designed especially for scientific data analysis and method development. Special emphasis has been placed on simplifying the testing, programming, reporting and refactoring of the framework itself, its components and workflows. Anduril provides means for the language-independent integration of programs and enables the remote and parallel execution of individual analysis steps. The workflows are constructed using an extensible custom language that supports inheritable data types, conditions, loops, subworkflows, arrays, etc. The compile-time validation of the workflow and the re-execution of modified or out-dated results enable the maintenance of workflows with hundreds or thousands of steps. Anduril has no graphical user interface, which would be impractical for complex workflows; instead it relies on a console that may be used remotely and left open for days, depending on the total execution time. The efficacy of the framework is supported by more than 300 community-provided components and the ready-made libraries for Bash, Lua, MATLAB, GNU Octave, Perl, Python, R, and Java.
        Speaker: Kristian Ovaska
        Slides
      • 09:50
        myExperiment 2.0 - preserving digital research objects using the Wf4Ever Architecture 15m
        Increasingly, published research is based on digital artefacts such as scientific workflows, web services and public data sets. myExperiment is a social website for sharing workflows and data, used by scientists of different domains such as bioinformatics, chemistry and astrophysics. myExperiment lets scientists build a structured aggregation or pack of the artefacts supporting a scientific work, which we call a Research Object. Packs may include a workflow, input data, reference data and the workflow results, thereby giving other researchers the ability to independently reproduce the virtual experiment and verify the results or reuse the work to analyse new data. Research Objects can be collectively annotated, shared, published, modified, combined and derived, but are also subject to external changes, evolution and decay. For instance, rerunning a Research Object becomes difficult or impossible if its workflow depends on tools and services which are no longer executable. Similarly, researchers analysing the work might struggle to reuse a Research Object if it is not clear how its artefacts can be combined. To address these preservation concerns we are developing “myExperiment 2.0” as both a software architecture and a reference implementation, called Wf4Ever. The Wf4Ever Architecture uses a Research Object Model to specify aggregations of digital artefacts with rich annotations, recording their provenance and evolution, and allowing sharing and collaboration through a set of decoupled RESTful Linked Data services. The Wf4Ever Toolkit combines techniques such as analysing and comparing workflow structures, provenance traces from automated workflow reruns and utilising workflow integrity and authenticity checking in order to detect Research Object decay, replay past workflow executions, attempt automated repair and recommend replacement workflow fragments.
        Speaker: Stian Soiland-Reyes (myExperiment)
        Slides
      • 10:05
        SHIWA technology to enable the integration and collaboration of various workflow systems used in Europe 15m
        User communities from all around Europe use many different workflow languages. Each community develops its workflows using one of the workflow engines. Workflow development, testing and validation are time-consuming and require specific expertise. This limits the number of available workflows, so it is important to reuse them. Workflows developed for one workflow system are normally not compatible with those of other workflow systems. In the past, if two user communities using different workflow systems wanted to collaborate, they had to create the workflows from scratch to transform them into the desired workflow languages. This situation can be resolved by emerging workflow interoperability technologies. The goal of SHIWA is to develop such technologies. With the new SHIWA technologies, publicly available workflows can be used by different research communities working on different workflow systems and can run on multiple distributed computing infrastructures. As a result, workflow communities are no longer locked in to their selected workflow system and its supported distributed computing infrastructure. SHIWA develops, deploys and operates the SHIWA Simulation Platform to offer users production-level services supporting workflow interoperability. As part of the SHIWA Simulation Platform, the SHIWA Repository facilitates publishing and sharing workflows, and the SHIWA Portal enables their actual enactment and execution in all the DCIs available in Europe. In the talk we briefly introduce the SHIWA technology and explain the benefits users can get from it. Since SHIWA can integrate practically any kind of workflows and workflow systems used in Europe, this talk is very relevant for the users and developers of every workflow system.
        Speaker: Peter Kacsuk (MTA SZTAKI)
        Slides
      • 10:20
        A flexible zone model for data privacy and confidentiality in medical research 15m
        Increasing amounts of data are exchanged, processed and stored in medical research, linking healthcare registers, genetic and cancer databases, and clinical study databases with biobank repositories, imaging repositories and other data sources. But the legal and ethical implications of such far-reaching data access and data merging are often insufficiently taken into consideration. Privacy protection has become a fundamental requirement for research with medical and also with genetic and genomic data, which have the inherent potential to be identifying. Any project involving access to patient data and the merging of phenotypic data with genomic and physiognomic data requires frameworks that can guarantee data privacy and confidentiality. Confronted with this situation, researchers tend to opt for the most stringent and confining solutions to protect data privacy, often suffocating free research. Thus, for the TRANSFoRm project (http://www.transformproject.eu), which will implement a user-centered platform for the integration of Primary Care clinical and research activities, we developed a privacy framework that was designed to be flexible enough to satisfy the different privacy needs of heterogeneous data flows in pan-European projects involving access and exchange of clinical, care and research data. This privacy framework is generic enough to be used for all kinds of research, but especially for large projects with data flows that include the potential for identification of patients. Data privacy profiles of different stringencies were created and transcribed into a zone model consisting of a data source zone (care zone), a non-care zone and a research zone, which describe areas with different degrees of privacy protection needs.
        Because it turned out that three zones are insufficient, especially in pan-European projects, subzones within the main zones were defined, reflecting the fact that databases in different countries often operate under different rules and regulations. Privacy filters and data linkers operate between zones and subzones, modifying the flow of data from the data source zone to the research zone. Only when this flow is permitted by the applicable policies and regulations can data be transferred from a zone with high or medium privacy risk to a zone with low risk, enabling the intended research. The major functions of these filters are anonymisation, pseudonymisation, coding and data aggregation. In addition, data linkers allow the linkage of databases within or between zones and subzones. The zone model makes it possible to visualise and prepare privacy-protected workflows for complex research projects, with the aim of enabling research with anonymised data.
        Speaker: Dr Wolfgang Kuchinke (Heinrich-Heine University Duesseldorf)
        Slides
    • 10:50 11:15
      Coffee
    • 11:15 13:00
      Discussions

      Facilitated discussions with the objective of refining the role of workflow systems and workflows in the wider usage of EGI, and the role of EGI stakeholders in the more harmonised development, provisioning and adoption of workflow solutions.

      Conveners: Gergely Sipos (EGI.EU), Richard McLennan (EGI.EU)
      • 11:15
        E-Science workflows: The EGI perspective 15m
        Speaker: Dr Gergely Sipos (EGI.EU)
        Slides
    • 13:00 14:00
      Lunch (included in registration)

      Opportunity for networking and further discussions.

      Sandwiches will be served at the workshop's location.