EGI Operations Meetings
12th July 2010
Minute Taker: Emir Imamagic

MD - Mario David
EI - Emir Imamagic
TF - Tiziana Ferrari
HC - Helene Cordier
ML - Michaela Lechner

-- "Update on middleware and staged rollout process", Mario David (MD)
MD: Overview of the present status of staged rollout (SR) and the components under testing. gLite 3.1 DPM has no early adopters (EAs); a mail was sent to early-adopters, inspire-sa1 and noc-managers asking for more volunteers. There is a general lack of early adopters for gLite 3.1 components, and more people should volunteer. Explained why it would be good to have more EAs: the production infrastructure is more heterogeneous and covers more use cases. The last version of DPM did not go through SR, and a bug (a race condition in a VOMS library) was discovered that affected DPM under high load.
TF: NGIs should send feedback about their SRM/DPM implementations.
HC: France will forward Mario's call.
MD: To update the table of components with EAs, or missing EAs. FTS, BDII and dCache are nearly ready for production for gLite 3.2.

-- Milestone 402
https://documents.egi.eu/secure/ShowDocument?docid=53
MD: The formal review starts this week, after feedback and comments from SA1. Stressed the importance of this MS, which contains information on staged rollout and its workflow and will be used to update the wiki. EAs should be aware of this document.
TF: Questioned the 5-day grace period between SR and production, which is too long.
MD: The idea is to keep components in production for a while, since some problems only show up after some time. The 5-day time frame can be discussed.
TF: MD should clarify/rephrase this issue in the document.
MD: The staged rollout should be done as fast as possible, but with more than 1 or 2 EAs: the more EAs per component, the more likely it is that an EA is available to do the SR fast.
ML: We are too gLite-centric; we should allow ARC and UNICORE here.
MD: Agreed, it might be different for other middlewares. Comments should be sent to MD. ML will send the document to ARC and UNICORE people so they can give feedback.
ML: ARC will do a review; ARC and UNICORE people should be and feel involved from the beginning. Josva Klein (ARC) was contacted and will review the document.
TF: Uploaded a new version of the MS document with comments.

-- CA release in EGI era
MD: Information on the status: David Groep has already made https://wiki.egi.eu/wiki/EGI_IGTF_Release_Process. Start by reviewing this wiki page, then the EGI repository and the RT to track the release. MD will raise this issue in the next SA2 meeting, to start this process in the EGI era.
HC: CA release: when would you circulate conclusions?
MD: It will be discussed at the SA2 meeting tomorrow, and I will report on it in these grid operations meetings. New releases are always announced through the broadcast tool of the operations portal.

-- Operations tools EI
EI: Uploaded some slides. EI has been appointed task leader of TSA1.4: coordination of deployment of operations tools in the NGIs. There was a discussion between EM, TF and Daniele Cesini about the relations between the several tasks: JRA1 is development, while SA1.4 coordinates the deployment into production of both the regionalized and centralized tools. Only regionalized tools will go through staged rollout, not the centralized tools.
Short term goals:
- Establishing a mailing list with all people deploying the operations tools. There are two options, and EM has sent a mail to vote on which one:
  1- SSO based groups (do not easily permit mail aliases)
  2- a separate mailing list where mail aliases are permitted and which does not need an EGI SSO account
- Monitoring: monitor all operational tools. In EGEE this was done with downcollector; EGI should go for more functionality monitoring (Nagios probes, etc.).
- Domain migration: in the next few months, migrate all tools to the EGI domain.
Agreed with D. Cesini that TSA1.4 meetings will be co-located with JRA1 meetings.
TF: Commented on the mailing list to be deployed for this task, where NGIs are also involved; it is also for discussion of the roadmap and other matters related to the operations tools.

-- Actions
1. Security of middleware
   Giuseppe Misureli: discussion started, Linda involved EMI people; discussion still ongoing, still an open action.
2. Update the wiki information
   MD: this will use information from the milestone (402); will be done in the next few weeks.
3. Clarification of gLite 3.2 components
   MD: not checked yet with the EMI consortium; to clarify the support of components in gLite 3.1, and in general the versions supported in any gLite major release.
4. Daniele: to check adding sites with non-gLite MW stack to GOCDB
   TF: will ask Gilles.
5. From all NGIs: to send feedback on integration of other middleware
   MD: feedback came from Helene (France); others should send their comments to M. David and M. Lechner.

-- HEPSPEC
EI: Could we change the gstat probes to raise a warning rather than critical for sites not publishing HEPSPEC?
TF: HEPSPEC is the result of a long discussion, so the probe will be kept critical; will talk to the gstat people.
MD: There is no OLA which forces sites to provide these figures; the right way would be to not raise critical.
TF: The EC requires accounting information, and HEPSPEC is for the moment the only one.
HC: Is there a place where all the tests that are critical are recorded?
MD: Don't know.
EI: Right now we use the old EGEE SAM tests.
TF: There is a wiki page on EGI.eu, but no comprehensive picture of what is critical. Mario should open an action on Tiziana to check these things.

-- COD procedures
Malgorzata Krakowian: There are NGIs with sites with below-threshold availability. Some NGIs didn't respond to the GGUS ticket to provide reasons.
MD: An action will be added for following meetings.

Open new action
TF: to check about a wiki/doc containing a comprehensive list of all critical tests sites should pass.

Next meeting: 26 July 2010 14h00 CET