Conclusions
The new accounting types implemented in the LHCbDIRAC framework give a complete view of the historical usage of storage space at grid sites. Accounting plots can be displayed for all types of data: user data, real reconstructed data, simulated data, and raw data. Plots can be created by selecting data samples on the basis of several metadata attributes, which provides a powerful and flexible tool for the management of storage resources.
Impact
The new accounting types for space usage were developed at the request of the LHCb user community, in particular the members of the collaboration in charge of planning data processing and data management. This accounting service shows the historical usage of storage resources at all grid sites and is an extremely useful input for physicists who must make decisions about the management of storage resources at sites. The typical use case is to report on the space used by a given reprocessing and on how the data are distributed over the grid sites. This information is necessary to decide whether old datasets should be removed to free up space for a new reprocessing, or to monitor how a reprocessing is progressing and how its data are being distributed over the grid sites. Other interesting use cases will be described in the paper, to give a complete view of the impact of this new development.
Description of the Work
In this paper we describe the new functionality implemented in LHCbDIRAC for the accounting of space usage at grid sites for LHCb data: how it was implemented, what it provides, and its impact on the LHCb community. Three different types of accounting plots for space usage have been implemented. The first is a general view of the data based on the namespace of the LHCb central file catalogue; the second is dedicated to user data; the third, and most complex, covers all LHCb production files, including simulated data, raw data, and reconstructed data. The first type of accounting gives a high-level historical view of how much space is used by the main categories of data: real data, Monte Carlo simulation, test and validation activities, and user data. The second type is dedicated to user data and displays how much space is used by each user and the grid sites where the data are stored. Finally, the most complex type is the accounting of all LHCb production data. It shows the historical usage of space for all LHCb data, including raw data, real reconstructed data, and simulated data, over the storage elements of all grid sites supporting the VO. Space usage can be displayed as a function of several parameters, such as the data-taking conditions of the LHC and the configuration of the LHCb detector, the version of the software used to process the data, the event type, the file type, and other relevant parameters.
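The kind of selection and grouping described above can be illustrated with a minimal Python sketch. It does not use the LHCbDIRAC API: the `StorageRecord` class, its fields, the `usage_history` function, and the sample values are all hypothetical, chosen only to show how periodic space-usage records might be filtered by metadata and summed per group to produce the series behind a historical plot.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class StorageRecord:
    """One hypothetical accounting record: space used at a site on a given day."""
    snapshot: date    # day on which the usage was sampled
    site: str         # grid site hosting the data
    category: str     # e.g. "raw", "simulation", "user"
    file_type: str    # e.g. "RAW", "DST"
    size_tb: float    # space used, in terabytes


def usage_history(records, group_by, **selection):
    """Sum space usage per snapshot and per value of `group_by`.

    `selection` maps field names to required values; records that do not
    match every condition are skipped.
    """
    history = defaultdict(lambda: defaultdict(float))
    for rec in records:
        if any(getattr(rec, field) != value for field, value in selection.items()):
            continue
        history[rec.snapshot][getattr(rec, group_by)] += rec.size_tb
    # Return plain dicts, ordered by snapshot date.
    return {day: dict(groups) for day, groups in sorted(history.items())}


# Illustrative sample data only; not real LHCb accounting numbers.
RECORDS = [
    StorageRecord(date(2012, 1, 1), "CERN", "raw", "RAW", 100.0),
    StorageRecord(date(2012, 1, 1), "CNAF", "simulation", "DST", 40.0),
    StorageRecord(date(2012, 1, 1), "CERN", "simulation", "DST", 20.0),
    StorageRecord(date(2012, 2, 1), "CERN", "raw", "RAW", 150.0),
    StorageRecord(date(2012, 2, 1), "CNAF", "simulation", "DST", 60.0),
]

# Space used per site over time, for all data.
by_site = usage_history(RECORDS, "site")

# Space used per site over time, restricted to simulated data.
sim_by_site = usage_history(RECORDS, "site", category="simulation")
```

Each returned dictionary maps a snapshot date to the space used per group, which is the shape of data a stacked historical plot would need. Further selection keys (event type, software version, data-taking conditions) would follow the same pattern if the records carried those fields.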
Overview (For the conference guide)
The DIRAC framework was developed to provide a complete solution for using the distributed computing resources of the LHCb experiment. DIRAC has been developed in a very generic way and with a modular architecture, which has made it suitable for serving other VOs as well, such as the Belle II experiment at KEK and the ILC project. The LHCbDIRAC system is the DIRAC extension specific to the LHCb experiment, where the particular features requested by the LHCb community are implemented. The LHCbDIRAC framework is split into several systems, each inheriting from the corresponding DIRAC system, which together provide all the functionality needed to use a distributed computing infrastructure, including a workload management system, a data management system, and monitoring and accounting systems. In this paper we describe new functionality of the LHCbDIRAC accounting system which provides a historical view of space usage at grid sites.