Merged TPM and DMSU kickoff, 22 June 2012

Participation: Ales Krenek, Ron Trompert, Alessandro Paolini, Guenter Grein, Will Rogers, Torsten Antoni, Mathilde Romberg, Tiziana Ferrari, Zdenek Sustr, Peter Solagna

-----------------------------
10.12 Introduction

Ales: Welcome, Purpose of the meeting
Tiziana: Support structure changes (TPM/DMSU Merge) not yet approved, technically, but let's assume they will be

-----------------------------
10.16 TPM Overview

Torsten: https://indico.egi.eu/indico/getFile.py/access?contribId=1&resId=0&materialId=slides&confId=1091
Ales: Give quantitative assessment (people, tickets, workload)
Guenter: Load varies, just under 10 tickets per day on average.
Ales: Automated support for escalation procedures?
Guenter: GGUS search engine, database queries, new report generator to be commissioned on Monday
Guenter: We need to decide what to do if SU unresponsive.
Ales: Is there classification of Support Units wrt. their SLAs etc.?
Mathilde & Guenter: There is a spreadsheet somewhere.
Torsten: Rough Classification in GGUS drop-down box
 - The rest is left for the afternoon

-----------------------------
10.46 DMSU Overview

Ales: https://indico.egi.eu/indico/getFile.py/access?contribId=2&resId=0&materialId=slides&confId=1091
More in-depth explanation of current ticket follow-up procedures
Discussion on closing aged low-priority tickets
 - reasons pro and contra raised again
 - it's not the solution that would make everyone happy but there's apparently no better choice;
 - we went through this discussion at TCB, the described approach was agreed, and we will not change it in the short term

-----------------------------
10.18 Partner Introduction

CESNET -- Zdenek: https://indico.egi.eu/indico/getFile.py/access?contribId=3&resId=2&materialId=slides&confId=1091
INFN -- Alessandro: https://indico.egi.eu/indico/getFile.py/access?contribId=3&resId=0&materialId=slides&confId=1091
 - LRMS supporters
   - Ales & Tiziana offering their respective experts
   - Will be decided
TPM DE -- Guenter: https://indico.egi.eu/indico/getFile.py/access?contribId=3&resId=0&materialId=0&confId=1091
APEL -- Will:
 - Tiziana explains why APEL was invited (no 2nd level was available officially). Workload not expected to change, funding will be provided to improve responsiveness.
JUELICH -- Mathilde: https://indico.egi.eu/indico/getFile.py/access?contribId=3&resId=1&materialId=slides&confId=1091
Ales: Nordic team not here due to midsummer celebrations.
 - they take care of ARC and dCache

-----------------------------
11.50 Motivation for the Merge

Tiziana: https://documents.egi.eu/document/1104

-----------------------------
13.14 Discussion

Ales: https://indico.egi.eu/indico/getFile.py/access?contribId=5&resId=0&materialId=slides&confId=1091
 - Taking over TPM work
   - Suggesting INFN takes up triage
   - Ales: Afraid if CESNET is not involved regularly, it will be difficult to provide backup. (1 week in 4?)
   - Guenter: General knowledge and SU FAQs should be enough
   - Ales: Can "triage and assessment" be left wholly to INFN (at 0.5 FTE)?
   - Alessandro: There are six or seven people, they can cover it in rotation if we can ask CESNET to stand in for us on national holidays.
   - For higher-priority software tickets, the person on shift must make sure resolution starts immediately (either start themselves or get an expert).
   - Lower priorities may be left for the jabber meetings, provided they are held twice a week (suggestion accepted). Ales will run a Doodle poll
     - It is desirable to allow more people to have a look and to provide a solution eventually, without the need to bother 3rd line
     - APEL does not necessarily need to be present at all the meetings, just check your e-mail and come if you're needed (Ales)
       - Sounds reasonable (Will)
 - Ticket Followup
   - Guenter: Switching back from 'Waiting for Reply' is automatic
   - Ales: Is it possible to notify on tickets untouched for a certain time by priority, unless they are 'Waiting for Reply'?
   Action: Ales and Guenter: Come up with a proposal for a strategy for more targeted reminders to be sent to supporters to achieve faster reaction to higher-priority tickets.
 - Support units
   - Guenter: For technical reasons, let's not change SU names
     - Ales: No objection
     - Tiziana: At least in presentation it must be made clear that this is the new structure
     - Zdenek: Let's find a new meaning for the 'DM'
   - Reassignment: Only allow assigning from DMSU to relevant SUs (Ales)? I'm in favour of leaving the full list there (Ales)
   - Ales: Who can decide which current 2nd level SUs are still needed?
     - Tiziana: I think it is quite easy.
       - ARC Deploy - no need
       - gLite Release Pages and Repository - no need
       - ELCG Operations - External
       - Dashboard-Siteview - External
       - GStat - 3rd level
       - SAMNagios - integrate into DMSU
       - RGMA - Decommissioned
       - Networking - External
       - VOMRS - External
       - LCG CE - External
       - StratusLab - External
   Action: Guenter: enable reassignment from DMSU to all 2nd and 3rd line SUs; it will become a bit more messy but we should know what we are doing
 - Interaction with Ops
   - Tiziana: How does DMSU KB complement Known Issues/Troubleshooting Guides: https://wiki.egi.eu/wiki/Operations_Manuals
     - Ales: Some integration is in order, probably belongs to the Troubleshooting category. Format: https://wiki.egi.eu/wiki/UMD-1:UMD-1.7.0
     - It makes sense to record issues found by DMSU there (i.e. decommission the current Middleware issues and solutions page)
       - Ales: I will have to discuss it with Kostas
     - Who will make the entry? The ticket resolver, or decide ad-hoc on jabber.
     - The entries are associated with a particular version only - we assume the users will find it anyway - we don't have to take care of expiration
   - Tiziana: Another thing - communicating internally prioritized issues back to Operations. I.e. when to make an emergency release of UMD.
     - just do it, no need to establish formal channels
   - Ales: Do we need other channels aside from GGUS? I don't think so.
     - Tiziana: Yes
   - Ales: Are we required to attend Monday meetings?
     - Alessandro: I'm there anyway

Ticket oversight and followup - to be done by KIT completely
 - High priority tickets
   - Mathilde: Suggest hard-setting the 45-day ETA for top priority tickets. It could be changed on negotiation
     - Ales: Top priority tickets are rare, perhaps it's not worth setting up an automated process for setting it
   - Ales: Can we make an automated check that the ETA is 1) set and 2) met?
     - Guenter: Yes. Notifications will be sent to Technology Providers.
   - Mathilde: Top priority tickets should be assigned to the TP's generic SU regardless of the actual product affected! (agreed)
     - So that the appropriate people know about the issue
 - Low priority tickets
   - Poke TPs through reminders (Ales)
   - Mathilde: Compile reports easily understood by the managers.

Software Ticket Process
 - Ales: I still feel there are issues reassigned to 3rd level too quickly. Request to Alessandro: explain that to the triage team.
 - Informal involvement of developers
   - Ales: EMI (A. Cecchanti) complains they don't get credit for the effort.
   - Try to resolve ourselves first, use common sense

Documentation
 - Ales: How to make sure that we don't miss issues that deserve documenting.
   - Don't be too eager closing tickets, we will use the jabber meetings to decide.

AOB
 - Statistics reports
   - Ales: weekly elementary statistics (numbers of issues)
   - Tiziana: As soon as the change is approved, I will need them for my reports in SA1.
 - Rules for escalation procedure:
   - Requests for Changes - even lower priority than low priority
     - Adopt an even more relaxed priority
     - Perhaps close only if there was a major release in between
 - Tiziana: for post-EMI: What to do with PTs who do not want to use GGUS anymore?
   - Mathilde: For UNICORE, we will watch the GGUS channel. We will still look at the tickets.
   - Zdenek: Do the same thing we do now for products by non-SLA'd partners.
   - Ales: No simple solution, keeping all 'externalized' tickets in DMSU is too work-intensive.

16.11 Thank you