26–30 Mar 2012
Leibniz Supercomputing Centre (LRZ)
CET timezone

Site Status Board: a flexible monitoring system developed in close collaboration with user communities.

27 Mar 2012, 16:20
20m
LRZ 2 (100) (Leibniz Supercomputing Centre (LRZ))

Speaker

Ivan Dzhunov (CERN)

Conclusions

Close collaboration between SSB developers and VO administrators has led to a successful, flexible, highly customizable and easily configured solution for monitoring computing activities at distributed sites. The SSB framework was designed generically, so it can be easily adapted to the needs of different Virtual Organizations.

Description of the Work

Collaborative development proved to be key to the success of the SSB, which is heavily used by the LHC VOs for computing shifts and site commissioning activities. The selection, significance and combination of monitoring metrics fall clearly within the domain of the VO administrators. VO administrators and computing teams therefore define the monitoring metrics and custom views of the monitoring data, and also develop the sensors and data publishers. The SSB team is responsible for developing and supporting the SSB framework and the SSB services that store, aggregate and visualize monitoring data. The collaboration extends beyond the customization of metrics and views to the development of new functionality and visualizations. SSB developers and VO administrators cooperate closely to ensure that requirements are met and, wherever possible, new functionality is pushed upstream to benefit all users and VOs.

The contribution covers the evolution of SSB over recent years to satisfy diverse use cases through this collaborative development process.

Impact

The Dashboard SSB is used intensively by ATLAS and CMS for distributed computing shifts, for estimating data processing and data transfer efficiencies at a particular site, and for automatically excluding sites from computing activities in the event of problems. ATLAS administrators have defined 30 views in which they monitor ~200 metrics, and CMS administrators have defined 8 views with ~100 metrics. In recent years 93 million records were collected for CMS and 56 million for ATLAS. The ATLAS and CMS SSB web services each have 100-250 unique users per week.
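The automatic exclusion of sites mentioned above can be pictured with a minimal sketch. It is not the SSB implementation: the `MetricRecord` type, the status values, the metric names and the rule "exclude a site if any critical metric is in error" are all assumptions made for illustration, since the abstract does not describe the actual data model or exclusion policy.

```python
from dataclasses import dataclass

# Hypothetical status levels; the real SSB metric states are not given in this abstract.
OK, WARNING, ERROR = "ok", "warning", "error"


@dataclass(frozen=True)
class MetricRecord:
    """One published measurement: which site, which metric, what status (all assumed fields)."""
    site: str
    metric: str
    status: str


def sites_to_exclude(records, critical_metrics):
    """Return the sorted names of sites where any critical metric is in ERROR.

    This rule is an assumed example of a VO-defined policy, not the SSB's own logic.
    """
    excluded = set()
    for record in records:
        if record.metric in critical_metrics and record.status == ERROR:
            excluded.add(record.site)
    return sorted(excluded)


# Illustrative records with made-up site and metric names.
records = [
    MetricRecord("SITE_A", "transfer_efficiency", OK),
    MetricRecord("SITE_A", "job_efficiency", WARNING),
    MetricRecord("SITE_B", "transfer_efficiency", ERROR),
    MetricRecord("SITE_C", "disk_usage", ERROR),  # in error, but not a critical metric here
]

excluded = sites_to_exclude(
    records, critical_metrics={"transfer_efficiency", "job_efficiency"}
)
print(excluded)
```

Keeping the set of critical metrics as a parameter mirrors the division of responsibility described in this contribution: the VO decides which metrics matter, while the framework only evaluates them.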

Overview (For the conference guide)

Developing highly customizable and flexible solutions requires close collaboration between developers and the user community. Gathering user requirements and understanding user needs helps developers to provide common but highly customizable solutions that fit the needs of different groups of users. One example of such successful collaborative development is the Dashboard Site Status Board (SSB) framework, which allows Virtual Organizations (VOs) to monitor their computing activities at distributed sites and to evaluate site performance from the VO perspective.

Primary authors

Alessandro Di Girolamo (CERN)
Andrea Sciaba (CERN)
David Tuckett (CERN)
Edward Karavakis (CERN)
Ivan Dzhunov (CERN)
Jaroslava Schovancova (Acad. of Sciences of the Czech Rep. (CZ))
Josep Flix (PIC)
Julia Andreeva (CERN)
Lukasz Kokoszkiewicz (CERN)
Michal Nowotka (CERN)
Pablo Saiz (CERN)
Peter Kreuzer (Rheinisch-Westfaelische Tech. Hoch. (DE))
