Thanks to Salvatore and the EGI colleagues, who have made significant progress in the pilot study, using open-search technology to provide a metadata catalogue service and possible client interfaces that support users in discovering data. Two main issues remain: 1) the open-search, metadata-based approach seems insufficient to support the EISCAT 3D requirements; 2) performance: it is not yet clear whether the metadata catalogue will scale to the EISCAT 3D data volumes. The group agreed to investigate these in a H2020 project. [Link to the presentation: https://indico.egi.eu/indico/materialDisplay.py?contribId=3&materialId=slides&confId=2018]
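For illustration, a minimal sketch of how a client could query such a metadata catalogue, assuming the pilot exposes an OpenSearch-style HTTP interface; the endpoint URL and the query parameters below are hypothetical placeholders that only follow common OpenSearch conventions (searchTerms, startIndex, count), not the actual pilot interface:

    import requests

    # Hypothetical OpenSearch-style endpoint of the pilot metadata catalogue.
    SEARCH_ENDPOINT = "https://catalogue.example.org/opensearch/search"

    # Typical OpenSearch query parameters; values here are placeholders.
    params = {
        "searchTerms": "EISCAT 3D",
        "startIndex": 1,
        "count": 20,
    }

    response = requests.get(SEARCH_ENDPOINT, params=params, timeout=30)
    response.raise_for_status()

    # The catalogue would return a feed (e.g. Atom/XML or JSON) of matching metadata records.
    print(response.text[:500])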
Thanks to Ari, who presented the EUDAT solutions and explained the possibility of storing data in CSC storage and making use of EUDAT metadata, PID and other services. The questions focused on how to better integrate the two e-infrastructures and their services. Peter indicated the possibility of integrating iRODS into the EGI platform, and the group considers this an exciting direction to pursue. [Link to the presentation: https://indico.egi.eu/indico/materialDisplay.py?contribId=2&materialId=slides&confId=2018]
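As a rough sketch of the iRODS direction, the following shows how a data file might be uploaded to an iRODS zone and annotated with metadata using the python-irodsclient library; the host, zone, paths and metadata keys are hypothetical placeholders, and this is only an illustration of the general approach, not the agreed integration:

    from irods.session import iRODSSession

    # All connection details below are hypothetical placeholders.
    session = iRODSSession(host="irods.example.org", port=1247,
                           user="eiscat_user", password="********",
                           zone="eiscatZone")
    try:
        # Upload a raw data file into an iRODS collection.
        session.data_objects.put("voltage_20150101.bin",
                                 "/eiscatZone/home/eiscat_user/voltage_20150101.bin")

        # Attach simple descriptive metadata (key, value) to the stored object.
        obj = session.data_objects.get("/eiscatZone/home/eiscat_user/voltage_20150101.bin")
        obj.metadata.add("experiment", "manda")
        obj.metadata.add("start_time", "2015-01-01T00:00:00Z")
    finally:
        session.cleanup()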
Andrei Tsaregorodtsev from CNRS also proposed possible support from CNRS, which is very welcome. Since the audio was not very clear, Andrei sent a summary explaining his ideas, as below:
"The DIRAC Project was developed for a large LHC experiment to provide a coherent system for managing large amounts of data (tens of petabytes), automated data distribution to a dozen storage facilities in Europe, and transparent access to these data by users doing the analysis work. The computing resources needed to analyse the data are also managed by DIRAC; therefore it is a self-consistent system for both computational and data management tasks.
The Data Management System of DIRAC consists of several parts:
- low-level components to access various kinds of storage (standard grid storage); more storage types can easily be added to this list;
- a catalog service to keep track of physical copies of data files (Replica Catalog), as well as to provide arbitrary metadata associated with the files, with an efficient search engine based on this metadata. The catalog service was demonstrated to work with many tens of millions of files, and we are working further to increase its scalability. We are also working on a RESTful (programming-language-neutral) interface to facilitate the development of application-specific tools using the service;
- high-level services to automate data distribution, triggered by new data availability, according to predefined scenarios, for example predefined shares at various storage facilities.
I think these are similar tasks to those that were discussed at the meeting today.
In addition, DIRAC has a powerful Workload Management System which controls user workloads (jobs) submitted to the various computing resources available to the community, through a single interface that makes this activity transparent and user-friendly. The computing resources can be of many different kinds: computing grids, clouds, standalone computing clusters, volunteer grids or single PCs.
Of course, having a single system for both Data and Workload management increases efficiency and lowers the maintenance effort.
As was mentioned in the meeting, we are preparing a pilot DIRAC service for the EGI communities which will have most of the above features. Therefore, it will be relatively easy to try it out without the need to install and maintain complex software.
Finally, we are open to further discussions about a possible use of the DIRAC system by the EISCAT Collaboration."
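To make the catalog part of Andrei's description more concrete, here is a minimal sketch of a metadata-based file lookup with the DIRAC Python client API (FileCatalogClient.findFilesByMetadata); the catalog path and the metadata keys and values are hypothetical placeholders that would need to be defined for EISCAT 3D:

    from DIRAC.Core.Base import Script
    Script.parseCommandLine()  # initialise the DIRAC client environment

    from DIRAC.Resources.Catalog.FileCatalogClient import FileCatalogClient

    fc = FileCatalogClient()

    # Hypothetical EISCAT-style metadata query; keys and values are placeholders.
    query = {"site": "Tromso", "year": 2015, "experiment": "manda"}

    result = fc.findFilesByMetadata(query, path="/eiscat.se")
    if result["OK"]:
        for lfn in result["Value"]:
            print(lfn)  # logical file names matching the metadata query
    else:
        print("Query failed:", result["Message"])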
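Similarly, a minimal sketch of submitting a job through the DIRAC Workload Management System with the Dirac/Job Python API; the executable, input data and CPU-time values are hypothetical placeholders, and exact call names may differ between DIRAC versions:

    from DIRAC.Core.Base import Script
    Script.parseCommandLine()  # initialise the DIRAC client environment

    from DIRAC.Interfaces.API.Dirac import Dirac
    from DIRAC.Interfaces.API.Job import Job

    job = Job()
    job.setName("eiscat-analysis-demo")
    job.setExecutable("analyse.sh", arguments="voltage_20150101.bin")  # hypothetical script
    job.setInputData(["/eiscat.se/data/2015/voltage_20150101.bin"])    # hypothetical LFN
    job.setCPUTime(3600)

    dirac = Dirac()
    result = dirac.submitJob(job)
    if result["OK"]:
        print("Submitted job with ID:", result["Value"])
    else:
        print("Submission failed:", result["Message"])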