8–12 Apr 2013
The University of Manchester
GB timezone

VO auger experience with large scale simulations on the grid

12 Apr 2013, 12:00
20m
3.204 (The University of Manchester)

Presentations Virtual Research Environments (Track Lead: G Sipos and N Ferreira) VREs

Speaker

Jiri Chudoba (CESNET)

Impact

We show how even a very small team can use a large amount of EGI resources. We also list
problems caused by the lack of high-level tools for bulk data processing, together with
examples of how we approach them. We hope to join efforts with other VOs with limited
manpower and arrive at common solutions to shared issues.

Summary

The auger VO is one of the main users of EGI grid resources. Computing
jobs are managed by the central production team from Granada University; ordinary
users mostly use the results of official productions. We describe our strategy
for job submission and data distribution, our experience with the reliability and
usability of various resources, and the biggest hurdles in everyday use of grid
resources. Results of small-scale tests of using the DIRAC catalogue instead of LFC
are also reported.

Description

Production for the members of the Pierre Auger collaboration is controlled by a team of
3 people. In principle, one could use just one robot certificate to submit production jobs,
but in our experience we can run more jobs if several different certificates
are used. All production jobs store their status in one common MySQL database and are
automatically resubmitted in case of failure. This approach maximises the usage of shared
computing resources at several sites. The efficiency of resource usage is difficult to evaluate
because aborted jobs do not report their walltime consumption. Output data files are stored
on the close Storage Element (SE) or, if the close SE is not available, on an SE randomly
chosen from a list of reliable SEs. This complicates data consolidation when an SE is
temporarily unavailable. We faced several cases in which data were lost. In such a case we
can either create a new library or users may confirm that they can use a reduced dataset.
Scripts were created to do bulk transfers either with or without FTS. These transfers were
mostly needed when an SE was declared to be decommissioned. Such cases create great problems
for SEs holding a significant amount of data, because it is difficult to find new resources
to which the affected data files can be moved.
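
The bookkeeping described above (a shared status database, automatic resubmission, and fallback to a random reliable SE) can be illustrated with a minimal sketch. It is not the production code: sqlite3 stands in for the MySQL database, and the table layout, the function names, the SE host names and the max_attempts cap are all assumptions made for the example.

```python
import random
import sqlite3

# Hypothetical list of reliable SEs; the real list is not given in the abstract.
RELIABLE_SES = ["se1.example.org", "se2.example.org", "se3.example.org"]


def choose_se(close_se, available, reliable=RELIABLE_SES, rng=random):
    """Return the close SE if it is available, else a random available reliable SE."""
    if close_se in available:
        return close_se
    candidates = [se for se in reliable if se in available]
    if not candidates:
        raise RuntimeError("no reliable SE is currently available")
    return rng.choice(candidates)


def init_db(conn):
    """Create the (assumed) job status table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "job_id TEXT PRIMARY KEY, status TEXT NOT NULL, attempts INTEGER NOT NULL)"
    )


def record_status(conn, job_id, status):
    """Store the latest status of a job, creating its row on first report."""
    conn.execute(
        "INSERT OR IGNORE INTO jobs (job_id, status, attempts) VALUES (?, ?, 0)",
        (job_id, status),
    )
    conn.execute("UPDATE jobs SET status = ? WHERE job_id = ?", (status, job_id))


def resubmit_failed(conn, submit, max_attempts=3):
    """Resubmit failed jobs that have attempts left; return the resubmitted ids.

    `submit` is a placeholder callable for the real grid submission step.
    The max_attempts cap is an assumption, not something the abstract states.
    """
    rows = conn.execute(
        "SELECT job_id FROM jobs WHERE status = 'failed' AND attempts < ? "
        "ORDER BY job_id",
        (max_attempts,),
    ).fetchall()
    resubmitted = []
    for (job_id,) in rows:
        submit(job_id)
        conn.execute(
            "UPDATE jobs SET status = 'submitted', attempts = attempts + 1 "
            "WHERE job_id = ?",
            (job_id,),
        )
        resubmitted.append(job_id)
    return resubmitted
```

Keeping the attempt count in the same table as the status makes it possible to stop retrying jobs that fail repeatedly, which a plain "resubmit on failure" loop would not do.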

Primary author

Jiri Chudoba (CESNET)
