ARC probe review

Europe/Amsterdam
Adobe Connect

Marcin Radecki (CYFRONET)
Description
The meeting will be dedicated to a review of the ARC EMI probes, in particular: completeness of the ARC probes to replace the current SAM set, readiness to be integrated into SAM, and other issues. Connection details: http://connect.ct.infn.it/egi-inspire-sa1/
                       sam-probe-minutes-27.02
                       =======================

Author: radecki <radecki@radecki-ThinkPad-X201>
Date: 2013-03-04 10:22:49 CET


Table of Contents
=================
1 Minutes of SAM probe WG telco 27.02.2013 10:00-11:30
    1.1 Participants
    1.2 Probe completeness
        1.2.1 org.arc.AUTH
        1.2.2 org.arc.GRIDFTP
        1.2.3 org.arc.RLS
    1.3 Additions - new probes
    1.4 Other issues
        1.4.1 Run probes from command line
        1.4.2 Probe dependency on NG schema
        1.4.3 Better certificate life-time check for infosys.
        1.4.4 Automatic selection of good SE.
    1.5 Presence of ARC probes in POEM profiles
    1.6 Open RT tickets:
    1.7 Discussion on collaboration of the WG and SAM Team


1 Minutes of SAM probe WG telco 27.02.2013 10:00-11:30
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1.1 Participants
=================
Petter Urkedal (EMI), Pablo Fernandez (NGI_CH), Ulf Tigerstedt
(NGI_FI), Paloma Fuente (SAM), Marian Babik (SAM), Marcin Radecki
(COD/chair)

1.2 Probe completeness
=======================
Equivalents exist for the probes currently in use, but experts from
NGI_UA, NGI_CH and NGI_FI agreed to drop the following ARC tests:

1.2.1 org.arc.AUTH
-------------------
Present in the ROC_CRITICAL, ROC_OPERATORS and WLCG profiles. Reason
for drop: the functionality is checked at every job submission.
However, the probe is covered by EMI-3. Petter will check whether the
probe can be ported to EMI-2.

1.2.2 org.arc.GRIDFTP
----------------------
Present in the WLCG profile. Staging can be tested by the ARC SRM and
LFC tests. Reason for drop: the functionality is checked at every job
submission and a replacement exists.

1.2.3 org.arc.RLS
------------------
Not present in any profile. Reason for drop: no RLS service is
deployed in production.

A CA test is missing for the time being, but this is not considered a
showstopper for the SAM Team integration work. The CA check probe is
covered by EMI-3. *Petter* will check and inform Marian whether it
will be possible to include the CA test in EMI-2.

1.3 Additions - new probes
===========================
The additional ARC tests proposed by EMI are accepted for addition to
the ARC framework. NGI_UA and NGI_FI confirmed that they have been
running these probes for quite a while and are fine with them.

1.4 Other issues
=================

1.4.1 Run probes from command line
-----------------------------------
NGI_CH and NGI_UA expressed a need to run probes from the command
line. Petter confirmed that this is possible, but it needs to be
better documented. *Petter* will update the documentation.
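
As an illustration of what running a probe by hand involves, the
sketch below relies only on the general Nagios plugin convention (exit
code 0-3, first line of output as the summary). It is not taken from
the ARC probe documentation: the helper name run_probe is
hypothetical, and a child Python process stands in for a real probe
command.

```python
import subprocess
import sys

# Exit codes defined by the Nagios plugin convention.
STATUS = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_probe(argv, timeout=60):
    """Run a probe command and return (status name, first output line).

    `run_probe` is a hypothetical helper, not part of the ARC probes.
    """
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # The probe could not be started or did not finish in time.
        return "UNKNOWN", str(exc)
    lines = proc.stdout.splitlines()
    summary = lines[0] if lines else ""
    # Any exit code outside 0-3 is treated as UNKNOWN.
    return STATUS.get(proc.returncode, "UNKNOWN"), summary


# Stand-in for a real probe: a child process that reports a problem
# and exits with code 2.
demo = [sys.executable, "-c",
        "print('certificate expired'); raise SystemExit(2)"]
print(run_probe(demo))
```

To try it against a real probe, the demo command would be replaced by
the probe executable and its arguments as given in the probe
documentation.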

Some minor issues have been identified by NGI_UA and confirmed by
NGI_FI, but they are not considered a showstopper for EMI-2
integration. The issues will be followed up by the WG:

1.4.2 Probe dependency on NG schema
------------------------------------
Some infosys tests, e.g. certificate life-time, are bound to the NG
schema. This does not seem to be a durable solution because the NG
schema (or the infoproviders that generate its content) changes
slightly with every ARC release and is also planned to be phased out.
The NG schema is thought to be internal to ARC and is not widely used
outside ARC clients, although in Ukraine some grid portals parse the
LDAP information directly to speed up job status retrieval. The Nagios
probes are an external tool that depends on the NG schema, so they
have to be kept in sync with it. Some tests could perhaps be modified
to use the GLUE2 schema, which is a standard, or to use ARC clients to
retrieve the information, delegating all NG-schema handling to the
original code.

1.4.3 Better certificate life-time check for infosys.
------------------------------------------------------
A certificate life-time check through the infosys is not as direct
as, e.g., retrieving the certificate from GridFTP or the A-REX
WS-interface. Generally, the tests should interact with the service
the same way as clients usually do, and clients never check
certificate validity through the infosys. NGI_FI: maybe the test
should first try A-REX (https, so we get the certificate anyway), then
fall back to ldap+NG schema. *Petter* will look at that.
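
The fallback order suggested by NGI_FI could be sketched as follows.
This is only an illustration under stated assumptions, not the probe's
actual implementation: the function names, the 14 and 3 day
thresholds, and the idea that the endpoint completes a TLS handshake
without a client certificate are all assumptions. The LDAP lookup is
left as a caller-supplied function because its details depend on the
NG schema.

```python
import socket
import ssl
from datetime import datetime, timedelta, timezone

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def not_after_from_https(host, port, capath=None, timeout=10.0):
    """Return the expiry time of the certificate presented on host:port.

    Assumptions: the handshake succeeds without a client certificate
    and the issuing CA is trusted (via `capath` or system defaults).
    """
    context = ssl.create_default_context(capath=capath)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    seconds = ssl.cert_time_to_seconds(cert["notAfter"])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def first_available(sources):
    """Try each (name, fetch) pair in order; return the first success."""
    errors = []
    for name, fetch in sources:
        try:
            return name, fetch()
        except Exception as exc:  # any failure triggers the fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("; ".join(errors))


def classify(not_after, now, warn_days=14, crit_days=3):
    """Map the remaining life-time to a Nagios status and message.

    The thresholds are illustrative defaults, not agreed values.
    """
    days = (not_after - now) / timedelta(days=1)
    if days < crit_days:
        return CRITICAL, f"certificate expires in {days:.1f} days"
    if days < warn_days:
        return WARNING, f"certificate expires in {days:.1f} days"
    return OK, f"certificate valid for {days:.1f} more days"
```

A probe built this way would pass the A-REX lookup first and the
ldap+NG schema lookup second to first_available, so the infosys is
consulted only when the service itself cannot be reached.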

1.4.4 Automatic selection of good SE.
--------------------------------------
It would be useful to integrate the data staging tests with the
GoodSEs metric of Nagios so that the SRM endpoints for the tests are
chosen automatically.

1.5 Presence of ARC probes in POEM profiles
============================================
The experts decided that we should first integrate all probes in SAM
and let them run for some time. Only then, once it is clear that
turning the tests to OPERATIONS will be safe, can this be decided.

1.6 Open RT tickets:
=====================
[https://rt.egi.eu/rt/Ticket/Display.html?id=5003] - MPI
- requires a runtime environment for MPI on ARC machines
- plenty of MPI users in FI, using 10 versions of different MPIs
- may be easily doable - Ulf will check

[https://rt.egi.eu/rt/Ticket/Display.html?id=5004] - APEL and ARC CE
- when you flag an ARC-CE as APEL you get red in Nagios since the
  WMS is not able to submit some APEL test due to a known bug

Ulf: when you add the APEL flag, the machine suddenly gets flagged as
a glite CE - there are probably wrong implicit assumptions somewhere
in SAM.
Emir (offline): true, will be fixed in SAM Update-22.

Ulf: But you have to flag this machine as APEL, otherwise the machine
DN is not accepted by the APEL repo.

Emir (offline): The APEL repo uses another service type, gLite-APEL,
for getting the list of DNs that are allowed to publish to APEL
([https://wiki.egi.eu/wiki/APEL/UsingAuth]). This service type doesn't
have any SAM tests associated with it, so there is no harm in adding
it to an ARC site. Btw, be careful when adding the DN: APEL doesn't
strip trailing whitespace :)
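
Since trailing whitespace in a DN is easy to miss by eye, a small
check before entering it can help. The sketch below is only an
illustration; the helper names and the example DNs are hypothetical.

```python
def normalize_dn(dn):
    """Remove leading and trailing whitespace from a certificate DN."""
    return dn.strip()


def untrimmed(dns):
    """Return the DNs that would change when trimmed."""
    return [dn for dn in dns if dn != dn.strip()]


# Hypothetical host DNs; the second one ends with a stray space.
candidates = [
    "/DC=org/DC=example/CN=host/arc-ce1.example.org",
    "/DC=org/DC=example/CN=host/arc-ce2.example.org ",
]
for dn in untrimmed(candidates):
    print(f"fix before adding: {dn!r} -> {normalize_dn(dn)!r}")
```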

Ulf: The problem hasn't shown up before because sites were using
SGAS. Marian: we have a ticket for that in SAM. *Emir* is going to
check this. It seems to be a stale dependency, but if it turns out to
be a topology issue it will mean more work. Conclusion: we can wait
for a fix since these alarms do not propagate to the Ops Dashboards.

1.7 Discussion on collaboration of the WG and SAM Team
=======================================================
MB: the WG can assess the usefulness of the metrics measured by the
probes for EGI operations. If they are found useful, the WG can then
have a look at the probe logic. The WG helps the SAM Team to follow up
issues with EMI by organizing meetings. The output of the WG could be
a list of metrics approved by the WG that can be integrated into SAM.
For the EMI integration case it is important to identify missing parts
and feed them back to the OMB, which should decide on actions: remove
the probe or develop an addition. The OMB should be aware that this
may be a blocking situation (no SAM release). Cases to be addressed:
org.sam.WN-Rep, central LFC, FTS, MPI, local LFC, BDII, VOMS, VOBOX

*Marian* will provide a full mapping of current probes and EMI-2 probes.

The SAM Team suggested that the SR process should be improved to cater
for the case where issues are found with EMI probes at the SR stage.