Dear all,
Since the APEL tests were migrated from SAM to Nagios, there seems to be some confusion between the test result “CRITICAL” and the notion of a critical test.
In the case of APEL, we provide two different tests, APEL-Pub and APEL-Sync. APEL-Pub is a critical test; APEL-Sync is not. However, sites tend to assume that a “CRITICAL” status on the APEL-Sync Nagios test affects their availability/reliability.
The problem in our case (and the reason why more sites are failing the APEL-Sync test) is that this test checks all of a site’s historical data (currently since January 2008) in both the site’s local database and the central database, and raises an error if there are any synchronisation issues. This means that if a site has correctly published all its accounting data except for some records in January 2008, its APEL-Sync Nagios test today will still return “CRITICAL”.
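To illustrate the behaviour described above, here is a minimal sketch. It is not the actual APEL-Sync implementation: the function name, the per-month record counts and the "YYYY-MM" keys are all assumptions made for the example.

```python
from typing import Dict

# Hypothetical data model: number of records per month, keyed by "YYYY-MM".
# This is not the real APEL schema.
MonthlyCounts = Dict[str, int]


def apel_sync_status(local: MonthlyCounts, central: MonthlyCounts) -> str:
    """Return "CRITICAL" if any month differs between the two databases, else "OK"."""
    # Every month seen in either database is checked, however old it is.
    for month in sorted(set(local) | set(central)):
        if local.get(month, 0) != central.get(month, 0):
            return "CRITICAL"
    return "OK"


# Only January 2008 is out of sync; all later months match.
local = {"2008-01": 1200, "2008-02": 1100, "2010-05": 1500}
central = {"2008-01": 1150, "2008-02": 1100, "2010-05": 1500}
print(apel_sync_status(local, central))
```

Because the loop covers the whole history, a single old mismatch is enough to produce “CRITICAL”, even when all recent months agree.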
We have two different proposals:
* Reduce the window used to calculate the APEL-Sync results to the last 13 months only. Any unsynchronised data older than that won’t result in an error.
* Remove the status “CRITICAL” for the APEL-Sync test and only return a “WARNING” if there are discrepancies.
(Cristina Del Cano)