Grids promote the publication, sharing and integration of scientific data distributed across Virtual Organizations (VOs). The complexity of data management in a grid environment stems from the distribution, heterogeneity and number of data sources. Over the last ten years there has been strong interest in grid-database access and management. Moreover, tools and services that provide grid access to relational databases are strongly required by the ES, LF and A&A user communities. This talk will present in detail the Grid Relational Catalog (GRelC) Project, an integrated environment for grid-database management, highlighting its vision and approach, architecture, components, services and technological issues. The most relevant use cases, in particular in the Earth Science and Environmental domains, will be described in detail.
Description of the work
The key topic of this talk is the GRelC Service. The GRelC Service is a GSI/VOMS-enabled web service designed with performance, interoperability and security as its main goals. It manages databases on the grid across VOs efficiently, securely and transparently, in line with emerging and consolidated grid standards and specifications as well as production grid middleware. It provides a uniform grid interface to access and integrate both relational data sources (MySQL, PostgreSQL, SQLite) and non-relational ones (XML DB engines such as eXist and Xindice, and libxml2-based documents).
The GRelC service provides:
- basic functionalities (query submission, grid-db management, user/VO/ACL management, etc.) to access and manage grid-databases;
- efficient delivery mechanisms based on streaming, chunking, prefetching, etc., to retrieve data from grid databases with a high level of performance (in terms of query response time, number of concurrent accesses, etc.);
- additional functionalities such as asynchronous queries;
- a data Grid Portal (GRelC Portal) to ease the access, management and integration of grid databases, as well as user/VO/ACL management, etc.
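The abstract does not detail the GRelC delivery protocol, so what follows is only a generic sketch of the chunking and prefetching idea listed above, assuming a SQLite back end accessed through Python's standard `sqlite3` module. The function `chunked_query` and its parameters (`conn_factory`, `chunk_size`, `prefetch`) are hypothetical and are not part of the GRelC API. A background thread retrieves fixed-size chunks with `cursor.fetchmany()` and keeps up to `prefetch` of them queued, so the client can process one chunk while the next one is already being fetched.

```python
import queue
import sqlite3
import threading
from typing import Callable, Iterator, List, Tuple

# Hypothetical helper illustrating chunked delivery with prefetching.
# It is NOT the GRelC API; it only sketches the general technique.
def chunked_query(conn_factory: Callable[[], sqlite3.Connection], sql: str,
                  chunk_size: int = 100, prefetch: int = 2) -> Iterator[List[Tuple]]:
    """Yield query results in chunks of at most chunk_size rows.

    A producer thread fetches chunks ahead of the consumer; the bounded
    queue (maxsize=prefetch, expected >= 1) limits how many chunks are
    buffered on the client side at any time.
    """
    chunks: "queue.Queue" = queue.Queue(maxsize=prefetch)
    done = object()  # sentinel marking the end of the result set

    def producer() -> None:
        try:
            # sqlite3 connections are bound to the thread that created them,
            # so the connection is opened inside the producer thread.
            conn = conn_factory()
            try:
                cursor = conn.execute(sql)
                while True:
                    rows = cursor.fetchmany(chunk_size)
                    if not rows:
                        break
                    chunks.put(rows)  # blocks when `prefetch` chunks are already waiting
            finally:
                conn.close()
        except Exception as exc:  # forward errors to the consumer
            chunks.put(exc)
        chunks.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = chunks.get()
        if item is done:
            return
        if isinstance(item, BaseException):
            raise item
        yield item
```

For example, iterating over `chunked_query(lambda: sqlite3.connect("data.db"), "SELECT * FROM t")` returns lists of up to 100 rows each. A larger `prefetch` hides more retrieval latency at the cost of buffering more rows in client memory.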
The GRelC middleware has been included in the EGEE RESPECT Program (Recommended External Software Packages for EGEE CommuniTies), since it works well in concert with the EGEE gLite software and extends the functionality of the grid infrastructure with respect to database management. The GRelC Service is currently adopted as the grid metadata management service in the Climate-G testbed to enable geographical data sharing, search and discovery. Moreover, it is currently used at the Euro-Mediterranean Centre for Climate Change (CMCC) to manage climate metadata across the Italian CMCC data grid infrastructure through the CMCC Data Distribution Centre portal.
A grid-database service is fundamental for distributed data-oriented infrastructures, production grids and grid testbeds, since it enables the management of crucial information. Examples include biological sequences in the Bioinformatics domain, spatial metadata in the Earth Science and Environmental domains, patient-related information in a Health Information System (HIS), and Astrophysics databases. The talk presents the GRelC Service, also highlighting how it provides cross-VO capabilities. Owing to the nature, architecture and functionalities of the service, common use cases can be defined across different disciplines. This helps to identify common requirements and to formalize exploitation patterns, paving the way towards sustainability.
The GRelC Service is a grid-database management service that can be exploited at both VO and site level. It has been successfully deployed and positively evaluated by end users in the Earth Science and Environmental contexts (e.g., CMCC and Climate-G). It is also available for tutorial purposes on the GILDA t-Infrastructure.