Speaker
Overview (For the conference guide)
An information system is the backbone of any distributed grid system. NorduGrid ARC was one of the first middlewares to come with a reliable and comprehensive information system architecture. This solution, initially inspired by the LDAP-based approach of Globus, has been used for many years as a distributed dynamic database for grid resource discovery and monitoring in research infrastructures. This paper gives an overview of the system architecture, the underlying technology and its features. Although the original ARC information system came with a custom information schema, the standard GLUE2 model is now being broadly endorsed by middleware developers. ARC contributed to the development of GLUE2 and is among the first middlewares to implement it, as described in this paper. Ongoing developments aimed at further convergence and harmonization of otherwise different grid information systems are also discussed.
Impact
The GLUE2 standard specification is itself a result of cooperation between the major middlewares, and thus incorporates the experience accumulated in production environments. As a commonly agreed standard, it is a first step towards transparently interoperable grid services. Implementing it in ARC services is one of the major milestones on the road towards a truly unified and interoperable grid landscape, where various resources and services can be easily discovered by any tool. One important benefit of a standard information system is a significant simplification of client tool development, since such tools can rely on a standard API and libraries.
Conclusions
The ARC information system has been deployed in research infrastructures for years, and its structure reflects the nature of the ARC grid itself. The experience acquired was used when developing the GLUE2 information standard together with experts from other middleware stacks. By adding GLUE2 support to ARC services, a major step is being made towards a common grid information space and thus towards better grid usability.
Description of the Work
The general architecture of the ARC information system and its components is the result of matching the requirements implied by typical application use cases to grid computing and storage resources. A key user requirement is freshness of information, such that a grid information query produces results comparable with those of a local system query. On the resource provider side, a key requirement is support for the dynamic nature of the system: a site should be discoverable in the information system as soon as it is powered up, and should disappear from it as soon as it goes down, even for brief maintenance. Such requirements influence the choice of information model, schema and technology.
ARC came with its own information schema and a model coupled to it. The schema covers not only standard objects such as a computing cluster or a storage element, but is fine-grained down to the individual job. One can therefore use the ARC information system to monitor jobs in almost real time. Later, ARC added basic support for the GLUE 1.2 information schema, and very recently support for GLUE2 was added as well.
Once the schema and the model are defined, a suitable technology must be chosen to collect and publish the information. ARC features an extensible framework of information collectors, which allows information to be harvested from very different underlying batch systems via independent plugins.
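The plugin idea can be illustrated with a minimal sketch. It is not ARC code: the class names, the `register` decorator, the `harvest` function and the returned numbers are all hypothetical, chosen only to show how independent per-batch-system plugins can sit behind one common collection interface.

```python
from abc import ABC, abstractmethod

# Hypothetical registry mapping a batch system name to its collector plugin.
_REGISTRY = {}


class BatchSystemCollector(ABC):
    """Hypothetical plugin interface: one subclass per batch system."""

    name = ""

    @abstractmethod
    def collect(self) -> dict:
        """Return raw resource information gathered from the batch system."""


def register(cls):
    """Class decorator that makes a collector plugin discoverable by name."""
    _REGISTRY[cls.name] = cls
    return cls


@register
class FakePBSCollector(BatchSystemCollector):
    name = "pbs"

    def collect(self) -> dict:
        # A real plugin would query the batch system; values here are made up.
        return {"total_cpus": 64, "running_jobs": 12, "queued_jobs": 3}


@register
class FakeSLURMCollector(BatchSystemCollector):
    name = "slurm"

    def collect(self) -> dict:
        return {"total_cpus": 128, "running_jobs": 40, "queued_jobs": 0}


def harvest(batch_system: str) -> dict:
    """Run the plugin for the given batch system and tag the resulting record."""
    try:
        plugin = _REGISTRY[batch_system]()
    except KeyError:
        raise ValueError(f"no collector plugin for {batch_system!r}") from None
    record = plugin.collect()
    record["batch_system"] = batch_system
    return record
```

In this sketch the framework never needs to know how a given batch system is queried; supporting a new one amounts to adding another registered subclass.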
Each ARC service publishes its own information either with the help of a local LDAP instance, ARIS, or through a Web Service interface, following any (or all) of the supported schemas. In order to be discovered by schedulers and monitors, each service also registers to one or more indexing services, EGIIS. EGIIS does not aggregate information and serves only as an index.
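The division of labour between an index and the local information services can be sketched as follows. This is a simplified model, not the EGIIS or ARIS implementation: the class names, the example endpoints and the expiry of registrations after a fixed time-to-live are assumptions made for illustration. The point it shows is that the index holds only contact endpoints, while the actual information is always fetched from the service itself.

```python
import time


class IndexService:
    """Hypothetical index: stores only contact endpoints, never resource data."""

    def __init__(self, ttl_seconds: float):
        # Assumption: a registration expires unless renewed within ttl_seconds.
        self.ttl = ttl_seconds
        self._registrations = {}  # endpoint -> time of last registration

    def register(self, endpoint: str, now: float = None) -> None:
        self._registrations[endpoint] = time.monotonic() if now is None else now

    def list_endpoints(self, now: float = None) -> list:
        now = time.monotonic() if now is None else now
        return sorted(
            endpoint
            for endpoint, registered_at in self._registrations.items()
            if now - registered_at <= self.ttl
        )


class LocalInfoService:
    """Hypothetical stand-in for a service publishing its own information."""

    def __init__(self, endpoint: str, info: dict):
        self.endpoint = endpoint
        self._info = info

    def query(self) -> dict:
        return dict(self._info)


def discover(index: IndexService, services: dict, now: float = None) -> dict:
    """Ask the index for endpoints, then query each service directly."""
    result = {}
    for endpoint in index.list_endpoints(now):
        service = services.get(endpoint)
        if service is not None:
            result[endpoint] = service.query()
    return result
```

A short usage example with made-up endpoints: a service that stops renewing its registration drops out of discovery once its registration expires, while the index itself never held any of its resource data.

```python
index = IndexService(ttl_seconds=60)
site_a = LocalInfoService("ldap://a.example.org", {"free_cpus": 4})
site_b = LocalInfoService("ldap://b.example.org", {"free_cpus": 16})
services = {site_a.endpoint: site_a, site_b.endpoint: site_b}

index.register(site_a.endpoint, now=0)
index.register(site_b.endpoint, now=50)

both = discover(index, services, now=55)
only_b = discover(index, services, now=100)
```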
This architecture is, however, unique to ARC and cannot easily be mapped to other middlewares. The EMI project is addressing this by developing EMIR, a common registration service that makes use of GLUE2.