17–21 Sept 2012
Clarion Conference Centre
Europe/Prague timezone

Using Adaptation Strategies to Improve Grid Operations

Not scheduled

Clarion Congress Hotel, Prague, Czech Republic
Poster Resource Infrastructure services (Peter Solagna: track leader)

Speakers

Mr Filip Křikava (I3S - CNRS UMR 7271)
Dr Javier Rojas Balderrama (I3S - CNRS UMR 7271)

Description of the work

An adaptation strategy in ACTRESS is based on a feedback control loop, represented in a model that follows an intuitive sense/compute/control decomposition. The context of a target system (WMS host, L&B service, etc.) is modelled by sensors, which represent observable context information (system load, job state, queue size, etc.), and by effectors, which represent the available actions (service restart, job resubmission, etc.). These touch points are usually defined using interfaces provided by the operating system and middleware services. The adaptation strategy itself is expressed in controllers: based on the sensor inputs, a user describes which effectors should be triggered to realize the envisioned adaptation.
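The sense/compute/control decomposition described above can be illustrated with a minimal Python sketch. This is not the ACTRESS language or its API: the class names (`Sensor`, `Effector`, `Controller`), the `queue_size` sensor, the `purge_queue` effector and the threshold of 100 are all hypothetical, and the target system is simulated by an in-memory list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# All names below are hypothetical illustrations, not the ACTRESS API.


@dataclass
class Sensor:
    name: str
    read: Callable[[], float]  # returns the current observation


@dataclass
class Effector:
    name: str
    act: Callable[[], None]  # performs an action on the target system


@dataclass
class Controller:
    sensors: List[Sensor]
    effectors: Dict[str, Effector]
    # Maps observations to the names of the effectors to trigger.
    decide: Callable[[Dict[str, float]], List[str]]

    def step(self) -> List[str]:
        # Sense: collect one observation per sensor.
        observations = {s.name: s.read() for s in self.sensors}
        # Compute: let the strategy choose which effectors to trigger.
        chosen = self.decide(observations)
        # Control: apply the chosen actions to the target system.
        for name in chosen:
            self.effectors[name].act()
        return chosen


# Simulated target system: a job queue that is purged when it grows too long.
queue = list(range(120))

controller = Controller(
    sensors=[Sensor("queue_size", lambda: float(len(queue)))],
    effectors={"purge_queue": Effector("purge_queue", queue.clear)},
    decide=lambda obs: ["purge_queue"] if obs["queue_size"] > 100 else [],
)

first = controller.step()   # 120 queued jobs exceed the threshold
second = controller.step()  # the queue is empty, nothing is triggered
```

In a real deployment the `read` and `act` callables would presumably wrap operating system or middleware interfaces, and `step` would be driven by a timer or by events rather than called by hand.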

There is a wide range of possible strategies, including job resubmission upon failure with automatic blacklisting of the computing element where the execution failed, detection and suppression of a malfunctioning resource, purging of saturated job queues, and replication of data across several storage elements by choosing the closest or the highest-capacity server.
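As an illustration of the first strategy listed above, the following Python sketch resubmits a job and blacklists each computing element on which it fails. It is only a sketch under stated assumptions: the function `resubmit_with_blacklist`, the `submit` callback, the element names `ce-a`, `ce-b`, `ce-c` and the limit of three attempts are hypothetical, and submission is simulated rather than sent to any middleware.

```python
from typing import Callable, List, Optional, Set


def resubmit_with_blacklist(
    submit: Callable[[str], bool],
    computing_elements: List[str],
    max_attempts: int = 3,
) -> Optional[str]:
    """Submit a job, blacklisting every computing element where it fails.

    Returns the element on which the job succeeded, or None if the
    attempts or the candidate elements are exhausted.
    """
    blacklist: Set[str] = set()
    for _ in range(max_attempts):
        candidates = [ce for ce in computing_elements if ce not in blacklist]
        if not candidates:
            return None
        target = candidates[0]
        if submit(target):
            return target
        # The execution failed here, so avoid this element from now on.
        blacklist.add(target)
    return None


# Simulated submission (hypothetical): two of the three elements always fail.
failing = {"ce-a", "ce-b"}
attempts: List[str] = []


def fake_submit(ce: str) -> bool:
    attempts.append(ce)
    return ce not in failing


result = resubmit_with_blacklist(fake_submit, ["ce-a", "ce-b", "ce-c"])
```

Here the job fails on `ce-a` and `ce-b`, both are blacklisted, and the third attempt succeeds on `ce-c`.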

To aid the development of these adaptation strategies, the toolkit provides both design-time and runtime support. The design-time part currently contains (1) a language that expresses both the structure and the behavior of the different elements and is compiled into the adaptation model, (2) a validator that checks the well-formedness of the adaptation model, and (3) a verifier that performs explicit verification of assumptions about the model, expressed in linear temporal logic, using the SPIN model checker. The runtime support consists of a code generator that translates the adaptation model concepts into Scala source code together with deployment scripts.
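The kind of check a model validator might perform can be sketched as follows. The abstract does not say which well-formedness rules ACTRESS enforces, so the single rule shown here (every controller input must name a declared sensor and every output a declared effector) is an assumption, as are the names `AdaptationModel`, `ControllerSpec` and `validate`.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical model representation; not the ACTRESS adaptation model.


@dataclass
class ControllerSpec:
    name: str
    inputs: List[str]   # names of sensors the controller reads
    outputs: List[str]  # names of effectors the controller may trigger


@dataclass
class AdaptationModel:
    sensors: List[str]
    effectors: List[str]
    controllers: List[ControllerSpec] = field(default_factory=list)


def validate(model: AdaptationModel) -> List[str]:
    """Return error messages; an empty list means the model passed
    this (assumed) well-formedness rule."""
    errors: List[str] = []
    sensors, effectors = set(model.sensors), set(model.effectors)
    for controller in model.controllers:
        for name in controller.inputs:
            if name not in sensors:
                errors.append(f"{controller.name}: unknown sensor '{name}'")
        for name in controller.outputs:
            if name not in effectors:
                errors.append(f"{controller.name}: unknown effector '{name}'")
    return errors


model = AdaptationModel(
    sensors=["job_state"],
    effectors=["resubmit_job"],
    controllers=[
        ControllerSpec("retry", ["job_state", "queue_size"], ["resubmit_job"]),
    ],
)
errors = validate(model)  # "queue_size" is not a declared sensor
```

Catching dangling references like this before code generation keeps such mistakes out of the generated runtime, which is presumably one reason to validate at design time.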

Link for further information

https://salty.unice.fr

Printable Summary

Due to the complexity of grid infrastructures, the human administration costs of grid operations are high, and end users are not completely shielded from system heterogeneity, which leaves them to deal explicitly with reliability issues.

Instead of trying to achieve complete reliability within the middleware itself, we propose new operation modes in which grid administrators capture their goals and their knowledge about system configuration, optimization and failure recovery as executable adaptation strategies. These strategies then run autonomously within the infrastructure, freeing administrators from some common repetitive operational issues and allowing them to focus on higher-level supervision of the system.

This poster presents the ACTRESS toolkit, which grid administrators and users can use to rapidly prototype such adaptation strategies while being shielded from the painful low-level implementation details.

Wider impact of this work

The presented toolkit enables grid administrators and users to develop solutions for some of the recurring problems they face during grid operation that would otherwise require their immediate attention. While the actual development of adaptation strategies, especially the more advanced ones, is still a challenging problem, the ACTRESS toolkit aims to make the overall process easier by providing tools and well-defined building blocks for rapid prototyping.

The work is being evaluated in the context of the NeuGRID project, which operates the gLite infrastructure; however, it is a general framework for assembling external adaptive behavior on top of existing systems. Our ongoing long-term goal is also to create a library of reusable elements and adaptation patterns that encapsulate the common services of contemporary distributed computing infrastructures.

This work has been partly funded by the ANR SALTY project (ANR-09-SEGI-012).

Primary author

Mr Filip Křikava (I3S - CNRS UMR 7271)

Co-authors

Dr Javier Rojas Balderrama (I3S - CNRS UMR 7271)
Dr Johan Montagnat (CNRS)
Dr Philippe Collet (I3S - CNRS UMR 7271)
