Meeting with EISCAT-3D - possible technical solutions

Europe/Amsterdam
Description
Sixth meeting of EGI ENVRI Study Case with EISCAT_3D.

A study case was set up to identify existing EGI services and solutions that could address the data pre-processing, post-processing and publishing needs of two ESFRI projects, EISCAT_3D and EURO-ARGO. The outcome of the pilot is expected to be directly applicable to EISCAT_3D and EURO-ARGO, and indirectly to other ESFRIs in ENVRI. In cooperation with the EISCAT_3D and EURO-ARGO representatives in ENVRI, EGI.eu will try to find the most suitable solutions for pre-processing primary data and for post-processing it towards publishing.

More: https://wiki.egi.eu/wiki/EGI_ENVRI

Thanks to Salvatore and EGI colleagues, who have made significant progress in the pilot study, using open search technology to provide a metadata catalogue service and possible client interfaces that help users discover data. Two main issues were raised: 1) the open-search, metadata-based approach seems insufficient to support the EISCAT_3D requirements; 2) performance: the question remains whether the metadata catalogue scales well enough for EISCAT_3D data. The group agreed to investigate these in an H2020 project. [Link to the presentation: https://indico.egi.eu/indico/materialDisplay.py?contribId=3&materialId=slides&confId=2018]
 
Thanks to Ari, who presented EUDAT solutions and explained the possibility of storing data in CSC storage and making use of EUDAT metadata, PID and other services. The questions focused on how to better integrate the two e-infrastructures and their services. Peter indicated the possibility of integrating iRODS into the EGI platform, and the group considers this an exciting direction to pursue. [Link to the presentation: https://indico.egi.eu/indico/materialDisplay.py?contribId=2&materialId=slides&confId=2018]
 
Andrei Tsaregorodtsev from CNRS also proposed possible support from CNRS, which is very welcome. Since the audio was not very clear, Andrei sent the following summary to explain his ideas:
 
"The DIRAC Project was developed for a large LHC experiment to provide a coherent system for managing large amounts of data ( tens of Petabytes ),  automated data distribution to a dozen of storage facilities in Europe and for a transparent access to these data by users doing the analysis work. The computing resources needed to analyse the data are also managed by DIRAC, therefore it is a self-consistent system for both computational and data management tasks. 
 
  The Data Management System of DIRAC consists of several parts:
- low level components to access various kinds of storages ( standard grid storage ), more
  storage types can be easily added to this list;
 
- catalog service to keep track of physical copies of data files ( Replica Catalog ) as well
  as to provide arbitrary metadata associated with the files with an efficient search engine
  based on this metadata. The catalog service was demonstrated to work with many tens
  of millions of files and we are working further in order to increase its scalability properties.
  We are also working on a RESTful ( programming language neutral ) interface to facilitate
  development of application specific tools using the service;
 
- high level services to automate data distribution triggered by the new data availability
  according to predefined scenarios, for example, predefined shares at various storage
  facilities. 
 
  I think these are similar tasks to those that were discussed at the meeting today. 
In addition, DIRAC has a powerful Workload Management System which controls
user workloads ( jobs ) submitted to various computing resources available to the
community through a single interface to make this activity transparent and user-friendly.
The computing resources can be of many different kinds - computing grids, clouds, 
standalone computing clusters, volunteer grids or single PCs.       
 
  Of course, having a single system for both Data and Workload management makes it possible
to increase efficiency and to lower the maintenance efforts. 
 
  As was mentioned in the meeting, we are preparing a pilot DIRAC service for the EGI
communities which will have most of the above features. Therefore, it will be relatively
easy to try it out without a need to install and maintain complex software. 
 
  Finally, we are open to further discussions about a possible use of the DIRAC system 
by the EISCAT Collaboration."
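
To make the catalogue and data distribution ideas in Andrei's summary more concrete, here is a minimal Python sketch. It is not DIRAC code and does not use any DIRAC API: `ToyCatalog`, `pick_site`, `distribute`, the site names and the file names are all hypothetical and were invented for illustration. It only assumes that a replica catalogue maps a logical file name to the storage sites holding copies, that metadata is a set of key/value pairs per file, and that "predefined shares" are target fractions per storage facility.

```python
from collections import defaultdict


class ToyCatalog:
    """In-memory stand-in for a replica and metadata catalogue (hypothetical)."""

    def __init__(self):
        self.replicas = defaultdict(set)  # logical file name -> storage sites
        self.metadata = {}                # logical file name -> metadata dict

    def register(self, lfn, site, **meta):
        """Record one physical copy of a file and merge in any metadata."""
        self.replicas[lfn].add(site)
        self.metadata.setdefault(lfn, {}).update(meta)

    def find(self, **query):
        """Return the files whose metadata matches every key/value in the query."""
        return sorted(
            lfn for lfn, meta in self.metadata.items()
            if all(meta.get(key) == value for key, value in query.items())
        )


def pick_site(shares, stored):
    """Choose the site that is furthest below its predefined share."""
    total = sum(stored.values()) + 1  # file count once the new file is placed
    return max(shares, key=lambda site: shares[site] * total - stored.get(site, 0))


def distribute(catalog, lfns, shares):
    """Place each new file according to the shares; return files per site."""
    stored = {site: 0 for site in shares}
    for lfn in lfns:
        site = pick_site(shares, stored)
        catalog.register(lfn, site)
        stored[site] += 1
    return stored
```

For example, calling `distribute` with eight new files and shares of 0.5, 0.25 and 0.25 places four, two and two files at the three sites, and `find(kind="raw")` returns only the files registered with that metadata value. A production catalogue would of course need persistent storage and indexing to reach the tens of millions of files mentioned above.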
There are minutes attached to this event.
    • 11:00 - 11:05
      Introduction (Małgorzata Krakowian) 5m
    • 11:05 - 11:25
      EGI solution proposal (Salvatore Pinto) 20m
      Slides
    • 11:25 - 11:45
      EUDAT solution proposal (Ari Lukkarinen) 20m
      Slides
    • 11:45 - 12:20
      Discussion 35m