26–30 Mar 2012
Leibniz Supercomputing Centre (LRZ)
CET timezone

The role of GGUS in delivering highly reliable WLCG operations

28 Mar 2012, 11:00
25m
LRZ 1 (50) (Universe)

Operational services and infrastructure EGI Helpdesk

Speaker

Mrs Maria Dimou (CERN)

URL

https://twiki.cern.ch/twiki/bin/view/LCG/GGUSusedbyWLCGEGICF2012Abstract

Description of the Work

The ticketing system of choice for WLCG is GGUS. Accessible via the email address helpdesk@ggus.org or the web entry point https://ggus.org (requires login or certificate-based authentication), GGUS provides rich functionality while keeping its web pages and instructions light and simple. GGUS's scope is wide: National Grid Initiatives (NGIs) and Support Units (SUs) related to VOs, deployment, operations and/or middleware projects accompany tickets towards their solution, so the number of supporters is large and variable. In addition, GGUS maintains interfaces with many external systems, such as the "CMS Computing" Savannah tracker, GOCDB, the Operations Portal, the LCG VOMS service, the US-based OSG Information Management (OIM) database and the local ticketing systems at the Tier0, several Tier1s, the OSG and multiple NGIs. For these reasons, the underlying workflows are complex, but this complexity is hidden from the end user.

GGUS is flexible, allowing direct site notification and automatic assignment to the relevant SU. This functionality bypasses the dispatching SU, the Ticket Process Manager (TPM). In this way, the relevant experts are notified of the incident immediately and can start working on the ticket from the moment of its creation. This is the most common use of GGUS by WLCG. Of even more specific importance to the VOs is the notion of TEAM tickets, developed by GGUS specifically for WLCG and now also adopted by the BIOMED VO. TEAM tickets allow a number of knowledgeable VO members to co-own a ticket and hence to remain up to date at all stages of the solution process. ALARM tickets, a GGUS option for experts, allow a small number of Grid experts within the VO to raise ALARMs at the Tier0 or Tier1 sites for the appropriate problem areas according to the MoU.
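
The routing rules described above can be illustrated with a minimal Python sketch. It is not GGUS code: the class, function and table names (Ticket, route, SITE_TO_SU, TEAM_MEMBERS, ALARMERS, TIER0_TIER1_SITES) and all site, VO and user names are hypothetical, and the exact checks GGUS applies are assumed here for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class TicketType(Enum):
    USER = "user"
    TEAM = "team"
    ALARM = "alarm"


@dataclass
class Ticket:
    submitter: str
    description: str
    ticket_type: TicketType = TicketType.USER
    notified_site: Optional[str] = None  # site chosen by the submitter, if any
    co_owners: List[str] = field(default_factory=list)


# Hypothetical lookup tables, invented for this sketch.
SITE_TO_SU = {"EXAMPLE-T1": "NGI_EXAMPLE"}       # site -> responsible support unit
TEAM_MEMBERS = {"examplevo": ["alice", "bob"]}   # VO -> members who co-own TEAM tickets
ALARMERS = {"examplevo": ["alice"]}              # VO -> experts allowed to raise ALARMs
TIER0_TIER1_SITES = {"EXAMPLE-T1"}               # sites that may receive ALARMs


def route(ticket: Ticket, vo: str) -> str:
    """Return the support unit that receives the ticket first."""
    if ticket.ticket_type is TicketType.ALARM:
        # Assumed check: only a small set of VO experts may raise ALARMs,
        # and only against Tier0 or Tier1 sites.
        if ticket.submitter not in ALARMERS.get(vo, []):
            raise PermissionError("submitter may not raise ALARM tickets")
        if ticket.notified_site not in TIER0_TIER1_SITES:
            raise ValueError("ALARM tickets must target a Tier0 or Tier1 site")
    if ticket.ticket_type is TicketType.TEAM:
        # TEAM tickets are co-owned by the VO's team members.
        ticket.co_owners = list(TEAM_MEMBERS.get(vo, []))
    if ticket.notified_site in SITE_TO_SU:
        # Direct site notification: assign to the relevant SU, bypassing the TPM.
        return SITE_TO_SU[ticket.notified_site]
    # No site given: the dispatching SU (TPM) triages the ticket.
    return "TPM"
```

In this sketch, a ticket submitted without a site falls back to the TPM, while one that names a known site goes straight to that site's support unit.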

This talk will describe the workflows, reports and methods used to maintain quality in the incident resolution process, in terms of data in any given ticket, submission and

Conclusions

GGUS is the tool that has survived every reincarnation of the Grid projects, because it provides an important service. It brings together all support structures across sites, VOs and projects and provides a working platform designed with the end user in mind. WLCG is the most demanding user community. Its experience and requirements can suggest effective ways of working that may be useful for other communities.

Impact

Since 2007, GGUS developers have made available, at WLCG's request, a number of features that are useful for and available to all communities but may not be fully known. By becoming aware of the functionality that already exists and from which everyone can profit, the community will understand why GGUS should be properly equipped with funding and resources to continue and extend this good work. In addition, reinventing the wheel will be avoided and resources will be spared.

Overview (For the conference guide)

The Worldwide LHC Computing Grid (WLCG) strategy is to use the Global Grid User Support (GGUS) system for tracking grid-related incidents. Tickets are created every day by all four LHC experiments, and their progress is discussed at 3pm CET every working day. For the WLCG Tier1s, i.e. the Grid sites that provide the largest fraction of computing capacity to the Virtual Organisations (VOs) of the LHC and are connected by high-bandwidth links to CERN (referred to here as the WLCG Tier0), an incident that is not recorded in a GGUS ticket is not considered for further investigation. This talk will describe how we extended and promoted the rich GGUS functionality in order to get the very demanding WLCG user community on board and to deal with incidents that need immediate attention and rapid resolution.

Primary author

Mrs Maria Dimou (CERN)
