OMB meeting, Jun 29, 2010 - MINUTES v1
================================================================================================

PARTICIPANTS
(List possibly incomplete; additional attendees connected through phone bridges)
Tiziana Ferrari (chair), John Gordon, Malgorzata Krakovian, Vera Hansper, Kostas Koumantaros, Christos Kanellopulos, Tadeus Szvmocha, Dusan Vudragovic, Mario David, David Durvaux, Helene Cordier, Jinny Chien, Jan Astalos, Edgar Znots, Daniele Cesini, Luciano Gaido, Miroslav Ruda, Cyril L'Orphelin, Emir Imamagic, Agnes Szeberenyi, Claire Devereux, Szabolcs Hernath, Andoena Balla, Javier Lopez, Mingchao Ma, Luuk Uljee, Gabor Roczei, Christoph Witzig, Edgars Znots, Nugzar Gamtsemlidze, Ron Trompert

INTRODUCTION (T.Ferrari)
- The NOC managers ML will be updated ASAP to include non-EU regions and extended to include all representatives (not all partners are necessarily involved in the InSPIRE project)
- OMB wiki pages are available
- Meetings: usually held on the last Tuesday of the month; F2F meeting at the EGI Tech Forum
- Next dates: July 27 and August 17

Update on Milestones and Deliverables: see the dedicated page on the EGI operations wiki
- MS401 (Luciano): ready for review
- MS402 (Mario): almost ready for review
- MS403 (Torsten): TOC ready
- MS404 (Dimitris): see the dedicated discussion point later on today's agenda
- MS405 (Mingchao): this milestone will include two parts: (1) software vulnerability, by Linda Cornwall, distributed for comments to the NOC managers mailing list; (2) incident response procedures, for which Dorine (French NGI) will be the editor

NGI VALIDATION PROCEDURE (Malgorzata Krakovian)
The existing procedure has been updated and refined to include information on how to formally approve and notify the integration of a new NGI.
Other changes: a Nagios box validation procedure has been added; COD is the entity responsible for NGI validation if a parent ROC is missing or no longer operational.
Tiziana: request for comments during the next 7 days
- Helene:
  Q: how will updated documents be propagated?
  A: through the NOC managers ML
  Q: where are the obligations for NGIs (SLA) described?
  A: an OLA defining the quality of service of a Nagios service is not available at the moment. Milestone MS404 will propose that all NGI core services (middleware and operational) be defined and described by an OLA. The MS404 draft will be distributed to the NOC managers ML.
- Tiziana: we need a procedure now, which will then be refined to include OLA aspects for all tools as soon as an OLA has been agreed upon (the monitoring infrastructure is just one aspect).
- Kostas: it is important to validate the Nagios boxes, but more important to check conformance to the OLA (availability/reliability). In any case we should not wait for OLAs to be defined, because some NGIs will join the infrastructure in the next weeks.
- J.Gordon: how is certification of new sites going to be performed?
  Tiziana: SAM AP will be the reference tool over the summer, as the SAM framework will continue to exist for some months (only the submission engine has been stopped). In the meantime there is an open action on Emir to come up with a technical proposal.
  Emir: RT ticket 88 has been updated; a wiki page is available and will be published soon. Emir is waiting for a Nagios update to be released soon.

DISTRIBUTION of AVAILABILITY/RELIABILITY STATISTICS (Dimitris)
CERN will not contribute to the validation and distribution of availability/reliability statistics. A new procedure and new responsibilities are being defined.
A wiki page is available to propose a new workflow, which involves GRNET as the partner responsible for preliminary statistics validation (Task TSA1.8), and COD as the entity responsible for the follow-up to sites eligible for suspension and in need of justification.
In the future comments will be solicited by COD through tickets.
Tickets that are not properly handled will be escalated according to the usual COD procedures, involving final suspension if no justification or feedback is provided by the respective NGI.
In case of poor performance for 3 months, suspension will be applied by default unless the respective NGI sends COD a specific, well-motivated request to hold the suspension.
- Helene: Q: how do we amend results, and who is the partner to notify in case of such a need?
  Tiziana: any request has to be handled through COD, and requests for amendment need to be filed through a ticket to COD. The mechanism to hack results has to be understood. D.Collados said that now that a distributed Nagios framework is in place, a new mechanism for amendment of results has to be defined.
  Open action on JRA1 to come up with a technical proposal for this.

OPERATIONS TOOLS: FUTURE UPGRADES (Daniele)
- The detailed development roadmap for all tools is not completed yet (there is a milestone on this)
- GOCDB4 is going to replace GOCDB3 (change in programmatic interface): GGUS tickets about compatibility between GOCDB4, BDII and Gridview are now closed. Replacement of the programmatic interface is foreseen by July 31, followed by the input system during the summer. A final report will be provided at the Tech Forum.
- Operations portal under validation.
- A new version of the GGUS helpdesk has just been released.
- A new big release of Nagios/MyEGEE/ATP is foreseen for next week.
J.Gordon: new releases can't be too frequent.

OPERATIONAL LEVEL AGREEMENTS in EGI and Milestone MS404 (Christos)
Change in terminology: Operational Level Agreement will replace "SLA"/"SLD".
MS404 will provide an overview of existing OLAs from EGEE, and ideas for enhancements according to the requests received from NGIs during the kickoff and to EGI needs.
J.Gordon: we have to reconsider the minimum hw resources provided by a site, different types of resources (e.g.
digital repositories); in the future not all sites may be interested in providing a classical set of EGEE services.
Tiziana: the mandatory set of services was defined in order to establish a basic set of critical tests. The concept of critical test has to be redefined if we allow for more flexibility in the minimal set of services provided by a site.

OPERATIONS INTEROPERABILITY (Michaela)
Call for NGIs willing to participate in the INTEROPERATIONS session at the EGI Technical Forum
- John: accounting is not an easy item; it should involve JRA1 and EMI
- Tiziana: we should concentrate on requirement gathering (for JRA1 and EMI); middleware interoperability and harmonization is a topic on the EMI agenda. Germany and Poland (and maybe the Netherlands) should contribute on this; SEE NGIs also presented interesting cases for HPC integration
- J.Gordon: there is an accounting workshop which will cover all interoperability aspects that specifically concern accounting
John also suggests compiling a list of the grids (and mw stacks) we want to interoperate with. How many are there? Do we need to cover interoperability with non-European Grids? Should we define priorities?
Tiziana: S.Newhouse already provided guidelines about this. SA1 interoperability activities should start by focussing on interoperability issues within Europe, with Unicore resource integration being the top priority. We should focus on operational aspects, not on middleware interoperability, which is an EMI issue.

EGI TECHNICAL FORUM: draft programme (Tiziana)
The agenda is still a draft; changes are still possible to avoid clashes.
ACTION: Mingchao to verify if the security training session can be scheduled on Friday according to the availability of presenters

AOB
- RT task force: discussion will follow by email to finalize the setup of this group (T.Antoni not present in the meeting)
- Procedure for decommissioning of ROC and NGI: a candidate to draft a document is still to be found; discussion with the French NGI
- Mingchao: security contact mailing lists as published in GOCDB should be checked by the respective NGI; submission should at least be open to non-members. Broken mailing lists are impairing the CSIRT communication channels.
  Tiziana: please file a GGUS ticket against each NGI asking to verify the respective sites.
- Vera: how to find people to contribute to operational documentation?
  Tiziana: suggests extracting information from the PPT (declared contribution to the task), but NGIs still have to come forward. If not, revision and creation of procedures will be tasks shared among all NGIs according to some rota.
- Emir: when did we decide to use the NOC term?
  Tiziana: it was her decision, taken due to time pressure. NOC stands for NGI Operations Centre, and in any case the missions of an NREN NOC and an NGI NOC are very similar in terms of mandate and principles. If a more appropriate name is requested, we can update the list. Discussion should continue on the mailing list.

[end]