19–23 Sept 2011
Lyon Conference Centre
Europe/Amsterdam timezone

An integrated monitoring tool for grid site administrators

Not scheduled

Lyon Conference Centre, Lyon, France
Poster

Speaker

Dr Giacinto Donvito (INFN)

Description

This work presents the development of a monitoring tool that gives an aggregate view of all user activity on a given grid site. The tool shows the jobs submitted by each user together with information about the files accessed on the storage system. On a farm with a POSIX-like parallel file system, the tool can also track both standard SRM operations and local POSIX file access. The monitoring system works in mixed environments, such as a farm used both via the grid and through local job submission. Because it is highly modular and customizable, it can easily be adapted to different types of computing elements and batch systems. It gives the system administrator a complete and detailed view of what is happening in the computing center.

A central database system takes care of storing, aggregating and presenting the information gathered from each monitored node. Each Computing Element has its own agent that sends information about the jobs to the central database: user DN, FQAN, grid-jobid, local-jobid, queue, local user and VO. The StoRM and GridFTP servers likewise provide information about the files accessed, whether from the farm itself or from remote sites; in this case the monitoring agents provide the DN, FQAN, file name and path, and VO. For each file accessed locally through the Lustre file system, the sensors record the local user who accessed the file, the node from which it was accessed, and the PID and name of the accessing process. Thanks to sensors installed on all nodes of the farm, the site administrator can see every file accessed over a Lustre/GPFS parallel file system. All monitoring agents are as lightweight as possible, so that they can be run every minute or every few minutes.
We have already developed sensors for several services: LCG-CE, CREAM-CE, StoRM, GridFTP servers, Xrootd servers, Torque/Maui and Lustre.
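
The agent-to-database flow described above can be illustrated with a minimal sketch. It is not the tool's actual implementation: the table layout, the function names (`open_central_db`, `report_job`, `report_file_access`, `activity_by_local_user`) and the use of SQLite are assumptions made for illustration only. The record fields are the ones listed in the description.

```python
import sqlite3

# Hypothetical schema: one table per record type named in the description.
SCHEMA = """
CREATE TABLE jobs (
    dn TEXT, fqan TEXT, grid_jobid TEXT, local_jobid TEXT,
    queue TEXT, local_user TEXT, vo TEXT
);
CREATE TABLE file_access (
    source TEXT,  -- which sensor reported it, e.g. 'storm', 'gridftp', 'lustre'
    path TEXT, dn TEXT, fqan TEXT, vo TEXT,
    local_user TEXT, node TEXT, pid INTEGER, process TEXT
);
"""

JOB_FIELDS = ("dn", "fqan", "grid_jobid", "local_jobid",
              "queue", "local_user", "vo")
FILE_FIELDS = ("source", "path", "dn", "fqan", "vo",
               "local_user", "node", "pid", "process")


def open_central_db():
    """Create an in-memory stand-in for the central database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def _insert(conn, table, fields, record):
    # Fields the agent did not supply are stored as NULL.
    cols = ", ".join(fields)
    marks = ", ".join(":" + f for f in fields)
    row = {f: record.get(f) for f in fields}
    conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})", row)


def report_job(conn, record):
    """What a Computing Element agent might send for one job."""
    _insert(conn, "jobs", JOB_FIELDS, record)


def report_file_access(conn, record):
    """What a storage or file-system sensor might send for one access."""
    _insert(conn, "file_access", FILE_FIELDS, record)


def activity_by_local_user(conn):
    """Return (local_user, job count, distinct files accessed) per user."""
    query = """
    SELECT u.local_user,
           (SELECT COUNT(*) FROM jobs j
             WHERE j.local_user = u.local_user),
           (SELECT COUNT(DISTINCT f.path) FROM file_access f
             WHERE f.local_user = u.local_user)
    FROM (SELECT local_user FROM jobs WHERE local_user IS NOT NULL
          UNION
          SELECT local_user FROM file_access WHERE local_user IS NOT NULL) u
    ORDER BY u.local_user
    """
    return conn.execute(query).fetchall()
```

In this sketch a Lustre sensor would call `report_file_access` with `local_user`, `node`, `pid` and `process` filled in, while a StoRM or GridFTP agent would fill in `dn`, `fqan` and `vo` instead. Keeping each agent down to a single insert per event is one way to keep it light enough to run every few minutes.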

Primary author

Co-authors

Prof. Giorgio Pietro Maggi (Politecnico di Bari e INFN)
Mr Guido Cuscela (INFN-Bari)
Mr Vincenzo Spinoso (INFN-Bari)
