26–30 Mar 2012
Leibniz Supercomputing Centre (LRZ)
CET timezone

Improving resilience of T0 grid services

28 Mar 2012, 14:00
20m
FMI Seminar 1 (50) (Universe)

Operational services and infrastructure
Service Management and Monitoring

Speaker

Steve Traylen (CERN)

Conclusions

With relatively little investment, virtualization can considerably increase the availability of grid services while also reducing operational costs.

Description of the Work

The IT/PES/PS section at CERN provides several T0 grid production services, including GridLFC, GridCE (LCG), GridCE (Cream), GridCE info, GridWMS, GridLB, GridFTS, GridBDII, GridMyProxy, GridMonBox, GridCAProxy, GliteVOMS, GliteVOMRS, the CERN-PROD Nagios instance and SCAS. This presentation covers the operational aspects of running such services, with particular focus on configuration management, service monitoring and ways of increasing service resilience. Some of our grid services are currently provided by virtual machines instantiated in the Service Consolidation Service. This, together with intensive use of DNS load balancing, greatly increases their availability. The presentation also describes how grid services would need to evolve in order to run in a computing cloud.
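The description credits DNS load balancing, together with virtual machines, for much of the availability gain. As a minimal sketch of why this helps, assuming a generic DNS load-balancing arbiter (not necessarily the mechanism used at CERN), the Python below publishes only healthy, least-loaded nodes behind a service alias and estimates the alias availability under the assumption that node failures are independent. The `Node` fields, hostnames and load metric are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    hostname: str
    healthy: bool  # hypothetical result of a service health probe
    load: float    # hypothetical normalised load metric; lower is better


def select_alias_members(nodes, k):
    """Return the hostnames of up to k healthy, least-loaded nodes.

    An arbiter could publish these names behind a single DNS alias;
    unhealthy nodes simply drop out of the alias until they recover.
    """
    healthy = [n for n in nodes if n.healthy]
    # Sort by load, breaking ties by hostname for a deterministic result.
    healthy.sort(key=lambda n: (n.load, n.hostname))
    return [n.hostname for n in healthy[:k]]


def alias_availability(node_availability, n):
    """Probability that at least one of n independent nodes is up."""
    return 1.0 - (1.0 - node_availability) ** n


if __name__ == "__main__":
    nodes = [
        Node("wms101", True, 0.7),
        Node("wms102", False, 0.1),  # failed probe: excluded from the alias
        Node("wms103", True, 0.2),
    ]
    print(select_alias_members(nodes, 2))
    print(alias_availability(0.99, 2))
```

With a per-node availability of 0.99, two independent nodes behind the alias give roughly 0.9999. In practice, correlated failures (a shared hypervisor, network or power) make the real figure lower, which is one reason to spread alias members across independent hosts.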

Overview (For the conference guide)

This presentation will cover the operational aspects of running grid services at the CERN T0 grid site, with particular focus on configuration management, service monitoring and ways of increasing service resilience. Some of our grid services are currently provided by virtual machines instantiated in the Service Consolidation Service. This, together with intensive use of DNS load balancing, greatly increases their availability. The presentation also describes how grid services would need to evolve in order to run in a computing cloud.

Impact

This experience may be of interest to any medium-to-large grid site wishing to provide high-availability services to its users.

Primary author

Manuel Guijarro (CERN)
