2nd OLA Task Force meeting

Europe/Amsterdam
Description
2nd OLA Task Force meeting: recap from the previous meeting, discuss updates from the ACE meeting, and continue the unfinished agenda from the first meeting.

1. Recap of actions from the previous meeting, for which mails have been sent:
- Helene: feedback on the VO resource assessment CIC feature
- Helene: feedback on alarm/team ticket status
- Dimitris: feedback regarding the impact of increasing the suspension limit to 60/60
- Dimitris: Kostas' comment regarding the increase of the suspension limit

2. Recap of previous meeting decisions:
* Minimum storage/cores: freeze them for now and revisit after the meeting with ACE. Ideally they could be removed altogether.
* GGUS tickets: increase the limit to acknowledge a ticket to 8 hours; look into alarm tickets.
* Grace period for new sites: not needed. A site will not get suspended in the first 3 months anyway, and the certification procedure should deal with major issues.
* NGI meetings metrics: let NGIs specify their own metrics.
* Discontinuous availability of resources: no need to treat this differently from new site introduction/site closure.
* Increase availability/reliability thresholds: do not do that now. Instead, investigate the impact of increasing the suspension thresholds to 70%/75%.
* Security and other contact requirements: they are already there.
* Middleware updates timeframe: rephrase the OLA appropriately. Suggestion: middleware that is in line with the UMD roadmap.
* Sites supporting only the ops VO: rephrase the OLA. Suggestion: the site supports at least one "non-monitoring" VO.

3. Go through the ACE meeting notes and discuss their impact.

4. Points from the previous meeting agenda that were not discussed due to lack of time (some could be discussed in 2.):
- How to treat sites that participate in staged rollout/are early adopters
- Core services specific provisions
- Requirements from operational tools
EVO information:
Title: 2nd OLA Task Force meeting
Description:
Community: Universe
Password: ola_tf

Meeting Access Information:
- Meeting URL: http://evo.caltech.edu/evoNext/koala.jnlp?meeting=MsMiMI2i2DDiDD9s9aDt9s
- Password: ola_tf
- Phone Bridge ID: 244 2178 Password: 1016

Eastern European Summer Time (+0300): Start 2010-10-19 11:30, End 2010-10-19 17:30
Japan Standard Time (+0900): Start 2010-10-19 17:30, End 2010-10-19 23:30
Central European Summer Time (+0200): Start 2010-10-19 10:30, End 2010-10-19 16:30
Attendance
- Dimitris Zilaskos
- Christos Kannelopoulos
- Marcin Radecki (replacing Tomasz Szepieniec)
- Alessandro Paolini
- Helene Caudier
- Dusan Vugradovic
- Mats Nylen

1. Recap of actions from the previous meeting, for which mails have been sent:
- Helene: feedback on the VO resource assessment CIC feature
- Helene: feedback on alarm/team ticket status
- Dimitris: feedback regarding the impact of increasing the suspension limit to 60/60
- Dimitris: Kostas' comment regarding the increase of the suspension limit

Helene: The correct URL for the new tool will be sent. Another feature of the VO resource assessment tool is the ability to show a VO the sites on which it is supported, not only which VOs a site supports.
Helene: No timeframes are defined in Biomed for response to tickets; the WLCG MoU specifies 2-4 hours for alarm tickets for LHC VOs.
Dimitris: Regarding the increase of the thresholds for suspension, we can propose Kostas' ideas for introduction in the 2nd year of the project.
Marcin: Do we need to start measuring this 3 months before the 2nd year, or start measuring right now, enforce it in the 2nd year, and have the first suspension in March or January?
Tiziana: The first suspension will occur in May.
Helene: Consider the case of low availability for a site that is about to be decommissioned.
Tiziana: Decommissioning goes together with the certification procedure; provide Vera with feedback.
Timeframe for a VO to pull its data: 3 months should be enough.
Helene: The documentation should describe best practices for certification and decommissioning, but should not actually be enforced.
Tiziana: A common ground is needed for everyone; for example, accounting is not checked in some NGIs. Follow up with Vera.
Helene: If such a procedure is adopted as official, it should be an EGI requirement on NGIs, not on sites.

2. Recap of previous meeting decisions

* Remove minimum storage/core requirements
Tiziana: During the meeting with the ACE team, James proposed to adapt the VO feed mechanism, which tells the ATP, using XML, which virtual sites are important for a VO.
These could be services that are hosted at different physical sites. This could be expanded into a customizable list of the services that are needed. James also proposed to implement an EGI site for central operations/middleware services. ACE needs no change to calculate on this. Suggestion: web pages visualizing the virtual site on demand, as used by WLCG, are not a clean solution for operations; it is better to have a virtual site configured in the GOCDB. This was proposed to Gilles, who would think about it. The VO feed can be adapted very quickly: a virtual site is created in the GOCDB, and that is then fed to ACE.
Tiziana: Attach timelines for features that do not need operations work and, for those that need development, the relevant timeline.
Helene: Who is responsible for creating the virtual site?
Tiziana: The site manager/NGI, then the VO. There is no point in asking WLCG to switch to that or move to the GOCDB. Other non-LHC VOs might find it useful. This does not need to be added to the OLA; it probably belongs more to a VO-site or VO-NGI OLA.
Dimitris: How do we add the concept of virtual services to the OLA?
Helene: EGI should be responsible. Add a sentence to the EGI-NGI OLA about fulfillment of the requirements of core services for a VO.
Dimitris: Perhaps define that the site should specify at least one service as a minimum requirement, instead of cores and storage as it is currently.
Dusan: At least one service plus the site BDII, which is mandatory for a site to publish information.
Tiziana: It is important not to compare things that are not the same; this is a radical change. NGIs may not like that. EGEE had a common case for all to make comparison easier.
Marcin: There are implications for COD work; procedures are based on the results of the critical tests.
Mats: Do not make it completely flexible at the site admin's discretion; there must be a sensible list of services. Otherwise agrees.
Alessandro: Agrees; a site should provide computing and/or storage resources.
In theory a site could provide just a WMS or an LGC; the site BDII is mandatory.
Tiziana: There is also the extreme case of an LHCb site taking care of only LHCb services.
Marcin: How is a service categorized as critical? This will affect ROD and COD work, for example for alarms.
Tiziana: Either leave it to site managers, or decide in an agreement with VOs which services are critical, or let the NGI customize the OLA by adding its own services. For comparison, keep the same list as the current one, or just add the site BDII and one additional service as extra requirements. It requires more thought: for now mention both options.

* GGUS tickets: increase to 8 hours
Regarding alarm tickets:
Helene: They may be misused; better to start with team tickets, for well-trained VOs.
Tiziana: It is not clear how they are handled, even by WLCG Tier-1s. Maybe make them available to all communities, and set the limit to acknowledge a ticket to 4 hours.
Tiziana: Perhaps it is better to start with something simple: the ticket is to be acknowledged within the timeframe defined for each GGUS priority.
Dimitris: Should we add these time limits to the OLA, or somewhere else?

3. Go through the ACE meeting notes and discuss their impact
This was not discussed, as the notes have been sent to the list and some parts have already been discussed.

4. Points from the previous meeting agenda that were not discussed due to lack of time (some could be discussed in 2.):

* Early adopters:
Tiziana: This is related to what is critical and what is not. One option is to make it not critical by extending the GOCDB with the ability to flag a site as an early adopter and not include it in the statistics. Advantage: there can be an EGI virtual site hosting all early adopters.
Helene: Sites feel they do not have a clear say when a test is failing because of something that is not their fault.
A "bottom-up" procedure.
Action for Dimitris: how to make this better.
Helene: Nagios comments/downtimes; site admins need to know what is taken into account.

* Core services and operational tools:
Tiziana: ACE needs to handle them as a virtual service at EGI/NGI level. Minimum thresholds: not less than 95%.
Dusan: Agrees.
Helene: Not sure. The threshold should be a little higher than last year's availability; she does not think 95% is too high, but it has to be checked whether it is reasonable. A gradual increase is the better approach.
Tiziana: Some history would be needed to judge whether the thresholds are reasonable.
Mats: If a service is defined as core, 95% is not very high; 99% is better. Such services are: accounting portal/repository, dashboard, regional ticketing, Nagios. But are they all core services? Accounting is not that important.
Tiziana: A core service is central, either to EGI or to an NGI.
Tiziana: Learn from experience what is reasonable.
Helene: Start with a short list first.

Actions for Dimitris:
- Provide a short summary of the TF and the work so far for the next meeting
- Draft an amended OLA proposal to present to the OMB that will take place in November