Operations Management Board

Europe/Amsterdam

Attendees:

  Alessandro Paolini (EGI.eu)
  Daniel Kouřil (CSIRT)
  Dave Kelsey (Security)
  Di Qing (ROC_Canada)
  Dimitri (NGI_DE)
  Emir Imamagic (SRCE)
  Eric Yen (ASGC)
  Gianfranco Sciacca (NGI_CH)
  Ian Neilson (STFC)
  Ievgen Sliusar (NGI_UA)
  Jan Astalos (NGI_SK)
  Jeremy Coles (NGI_UK)
  Joao Pina (NGI_IBERGRID)
  Kostas Koumantaros (GRNET)
  Linda Cornwall (Security)
  Luis Alves (CSC/NGI_FI)
  Miroslav Ruda (CESNET)
  Ionut Vasile (NGI_RO)
  Peter Solagna (EGI.eu)
  Vincent Brillault (EGI CSIRT)
  Vincenzo Spinoso (EGI.eu)
  Andrzej Zemła (NGI_PL)
  Sven Gabriel (Security)

Agenda bashing

Introduction (Peter Solagna EGI.eu)

Topics discussed in the contribution:

  • Cloud probes added to the A/R calculation profiles
  • Proposal to cancel the August OMB meeting (previously planned for August 25th) and to bring the September OMB forward to September 15th (previously planned for September 29th, which clashes with DI4R)
  • Proposal to hold one or more OTAG meetings in September/October focused on the operational tools that were heavily developed during 2015-2016:
    • Accounting portal new version
    • ARGO central monitoring
  • [Action] for all NGIs: submit requirements about these (and other) tools so that they can be discussed in an OTAG meeting. It is important to capture requirements now, while there is still effort in Engage to support developments.

Security updates

  • Incident on the cloud reported by Daniel K.
  • Sven G. reported on the security impact of contextualization and of post-deployment configuration of the VMs by the users
  • The VMI endorsement process can be rendered ineffective by post-deployment activities

Comments and actions discussed during the meeting:

  • User expectations must be met. If the use case requires deploying new software and updating the system configuration, this should be allowed.
  • Sites must enforce the monitoring of VM activities in order to detect incidents and actively react to them. The Federated cloud task force should encourage and facilitate the sharing of security best practices, in order to improve the incident reaction and prevention capabilities in the federation.
    • One possibility to explore is to reduce to a minimum the default network capabilities available on a VM.
  • Users should also be educated and made more aware of the security of their VMs, especially if they need full inbound/outbound connectivity.

Report from Vincent B. about ARGUS monitoring

  • A change in the central ARGUS configuration requires the ARGUS admins at the NGIs to implement simple configuration changes for monitoring to be effective. Please find instructions in the presentation.
    • [Action] NGI representatives to pass the information to the Argus admins

UMD updates

Vincenzo S. reported on the updates for UMD3 and UMD4 being prepared

RP and RC OLA

Peter S. presented the new versions of the RP and RC OLAs, which are currently in DRAFT. The main change in these OLAs is a more precise specification of the services in scope for the agreement:

Services registered in GOCDB and in the "EGI" scope.

In addition, a new OLA has been added to the framework: the Corporate OLA. It applies to all production services, including those not covered by the RP and RC OLAs. The important point is that the Corporate OLA does not add requirements beyond those in the RP/RC OLAs; all its requirements are superseded by the RC/RP OLAs. If the Corporate OLA changes, the OMB and the affected service providers will be notified in good time.

The goal is to discuss the RP/RC OLAs for approval at the next OMB, unless feedback is received that requires a longer discussion.

[Action] All NGIs: Provide feedback by July 20th

Central monitoring: action plan for the NGI nagioses decommissioning

All the testing of central monitoring performed in the last three months has been successful. The plan is to switch to central monitoring on Friday July 1st.

One problem presented during the meeting concerns PROC08. CREAM-CE will be tested with new probes, which were not integrated following PROC08. Following the procedure would have been impossible without deploying a new SAM update at NGI level, which would have been an enormous overhead for very little gain. Emir's proposal is to make an exception to PROC08, based on the good results obtained in the tests so far, and to use the new CREAM monitoring from July 1st. Unfortunately, this information was not provided to the OMB beforehand.

  • OMB agreed to move to the new CREAM monitoring from July 1st, with the agreement that, at least during July, failures caused by the new probe or by the central monitoring framework will not trigger site suspensions or follow-up actions for site underperformance.
    • Service unavailability not directly attributable to CREAM probe or central monitoring issues will be handled as usual.
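
The exemption agreed above can be read as a simple decision rule. The following is only an illustrative sketch, not an ARGO or ROD tool: the cause labels, the `Failure` record and the `triggers_follow_up` function are hypothetical names, the year 2016 is an assumption (the minutes give only "July"), and the grace period is a parameter because the agreement says "at least during July".

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical cause labels; the minutes do not define a taxonomy.
EXEMPT_CAUSES = frozenset({"new_cream_probe", "central_monitoring"})

# Assumed grace period: July of the (assumed) year 2016.
GRACE_START = date(2016, 7, 1)
GRACE_END = date(2016, 7, 31)


@dataclass(frozen=True)
class Failure:
    site: str
    day: date
    cause: str  # assumed to be assigned after the failure is investigated


def triggers_follow_up(failure: Failure,
                       grace_start: date = GRACE_START,
                       grace_end: date = GRACE_END) -> bool:
    """Return True if the failure is handled as usual.

    A failure is exempt only when it falls inside the grace period
    AND is attributed to the new probe or to central monitoring.
    """
    in_grace = grace_start <= failure.day <= grace_end
    exempt = in_grace and failure.cause in EXEMPT_CAUSES
    return not exempt
```

With these assumptions, a failure attributed to the new probe on a day in July would not count towards suspension, while a storage outage on the same day, or a probe-related failure after the grace period, would be followed up as usual.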

Note: The new probes will test the CA distribution on the CE, no longer on the WN. Make sure that CEs are also upgraded with the new CAs (as they should be anyway).

Note: With central monitoring, cloud monitoring is added to the ROC_CRITICAL profile for the A/R calculation, as agreed at the May OMB.

[Action]: All ROD teams should pay particular attention to ARGO monitoring in the first days/weeks of July to spot possible problems.

ARGO team is committing to be available to quickly solve problems and support NGIs.

Information system discussion with WLCG

EGI provided an extensive description of the problems that services not publishing in the information system would raise for EGI operations. This information will be presented at the WLCG GDB meeting in July.

The OMB decision is that NGIs should encourage their sites not to remove their SEs, or any other service currently published, from the information system, even if they support only LHC VOs.

WMS Options: DIRAC and glideIN-WMS

Vincenzo S. presented information collected through an initial discussion with glidein-WMS and DIRAC. The main limitation of both systems is that jobs are submitted to pilots, so the infrastructure services have no information about the real users.

AOB

Linda C. asked about the EGI plans for RFC proxies.

  • The technology providers are required to produce new versions of the clients that work with X.509 proxies, so that they generate RFC proxies by default. Currently users must pass an extra option to get RFC proxies; the idea is to reverse this, so that users pass an extra option to request legacy proxies.
  • Central monitoring is currently using RFC proxies, and this has caused no visible consequences.
  • All the middleware is compatible with RFC proxies.
  • Once the clients have been produced, a new UI will be released in UMD with RFC as the default proxy type.
  • The timeline will be provided through the Operations meetings.

Kostas K. reported to the OMB that the services for the certification of new sites are planned to be decommissioned. NGIs should check with their teams whether these services are still needed. The ops tools should be flexible enough to allow certification without custom BDIIs or WMS.
[Action]: Provide feedback on this topic to EGI Operations or to the catch-all services provider.

 

    • 10:00 - 10:15
      Introduction 15m
      Speaker: Peter Solagna (EGI.eu)
    • 10:15 - 10:35
      Security update 20m
      Speakers: Daniel Kouřil (CESNET), Dr Sven Gabriel (NIKHEF), Vincent Brillault (CERN)
    • 10:35 - 10:50
      UMD updates 15m
      Speaker: Vincenzo Spinoso (INFN)
    • 10:50 - 11:05
      RP and RC OLA 15m
      Speaker: Peter Solagna (EGI.eu)
    • 11:05 - 11:20
      Central monitoring: action plan for NGI nagioses decommissioning 15m
      Speakers: Christos Kanellopoulos (GRNET), Mr Emir Imamagic (SRCE)
    • 11:20 - 11:30
      Information system discussion with WLCG. Updates 10m
      Speaker: Alessandro Paolini (EGI.eu)
    • 11:30 - 11:45
      WMS Options: DIRAC and GlideIN-WMS 15m
      Speakers: Alessandro Paolini (EGI.eu), Vincenzo Spinoso (INFN)
    • 11:45 - 11:55
      AOB 10m
    • 11:55 - 11:56
      Ops Portal: Information system browsing capabilities - POSTPONED 1m
      Speaker: Cyril Lorphelin (CNRS)