Description
Description of work
Total CDF collision and simulation data amount to about 10 PB. These samples are currently stored on T10K tapes at Fermilab, and we plan to copy all raw and ntuple data (about 4 PB) to CNAF before the start of LHC data taking in 2015. To meet this tight deadline we set up a dedicated system that transfers data at 5 Gb/s and
copies them to the CNAF tape system, automatically updating the FNAL database. The copy is driven by CNAF through the CDF SAM data handling system: upon a request from CNAF, data are retrieved from the FNAL tape system and copied via gridftp to CNAF, where they are automatically written to tape. The data storage layout consists of a pool of disks managed
by GPFS, a tape library as the archive back-end, and an integration system that moves data from disk to tape and vice versa.
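The 5 Gb/s rate sets the time budget for the copy. A back-of-the-envelope check, assuming decimal units (1 PB = 10^15 bytes) and a link saturated around the clock with no protocol or tape overhead, gives roughly 74 days of continuous transfer for about 4 PB, which is why a dedicated system was needed. The Python sketch below also lists the per-file steps of the CNAF-driven copy in the order the text describes them; the stage names and the `copy_file` and `transfer_days` helpers are hypothetical stand-ins, not the SAM or gridftp interfaces.

```python
# Hypothetical stage names for the CNAF-driven copy described in the text.
# Real transfers go through CDF SAM and gridftp; these are placeholders only.
STAGES = (
    "request_from_cnaf",        # CNAF asks SAM for a file
    "retrieve_from_fnal_tape",  # the file is staged from the FNAL tape system
    "gridftp_to_cnaf",          # the file is copied over the WAN to CNAF disk
    "write_to_cnaf_tape",       # the file is written to the CNAF tape library
    "update_fnal_database",     # the FNAL database is updated
)


def copy_file(name: str, log: list) -> None:
    """Append (file, stage) pairs in order; a stub for the real service calls."""
    for stage in STAGES:
        log.append((name, stage))


def transfer_days(volume_pb: float, rate_gbps: float) -> float:
    """Days needed to move volume_pb petabytes at a sustained rate_gbps Gb/s.

    Decimal units: 1 PB = 1e15 bytes, 1 Gb/s = 1e9 bit/s. Assumes the link
    is saturated continuously, with no protocol or tape overhead.
    """
    seconds = volume_pb * 1e15 * 8 / (rate_gbps * 1e9)
    return seconds / 86_400


if __name__ == "__main__":
    log: list = []
    copy_file("example_raw_file", log)        # hypothetical file name
    print(len(log))                           # one entry per stage: 5
    print(f"{transfer_days(4, 5):.1f} days")  # 4 PB at 5 Gb/s: 74.1 days
```

In practice the achieved rate is below the nominal one, so the real budget is tighter than this idealized estimate.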
The CNAF storage solution is GEMSS, an integration of GPFS, TSM and StoRM, which is completely transparent to the CDF data handling system. As
far as data analysis is concerned, CNAF already offers a set of services for analysing CDF data. Data can be accessed via SAM and stored in a dedicated cache. Users can submit their analysis jobs to LCG via a dedicated portal. CDF analysis code is accessible via AFS. All these services are replicas of CDF services at Fermilab, installed as
virtual machines running SL5 and SL6. In the longer term, running CDF legacy code requires addressing several issues, such as the availability of suitable hardware resources, software maintenance, and computer and network security. The services used to access
CDF data will eventually be migrated to a dynamic virtual infrastructure. We are implementing this infrastructure so that CDF services can be
instantiated on demand from pre-packaged virtual machines (VMs) in a controlled environment, where inbound and outbound access to these services and connections to the stored data are administratively
controlled.
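The administrative control of on-demand VMs can be pictured as a default-deny policy attached to each instance. The Python sketch below is a minimal illustration under stated assumptions: the `AccessPolicy` class, the image name `cdf-sl6-analysis`, the `/storage/cdf` path and the network ranges (RFC 5737 documentation addresses) are all hypothetical, and the actual infrastructure may enforce these rules with firewalls and storage export settings instead.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical administrator-defined policy for one on-demand CDF VM.

    Anything not listed is denied: empty tuples block all traffic and mounts.
    """
    inbound: tuple = ()   # networks allowed to reach the VM's services
    outbound: tuple = ()  # networks the VM may connect to
    storage: tuple = ()   # storage paths the VM may mount

    def allows_inbound(self, src: str) -> bool:
        return any(ip_address(src) in ip_network(n) for n in self.inbound)

    def allows_outbound(self, dst: str) -> bool:
        return any(ip_address(dst) in ip_network(n) for n in self.outbound)

    def allows_mount(self, path: str) -> bool:
        # Accept the path itself or anything below it, not sibling prefixes.
        return any(path == p or path.startswith(p.rstrip("/") + "/")
                   for p in self.storage)


def instantiate(image: str, policy: AccessPolicy) -> dict:
    """Describe a VM built from a pre-packaged image (hypothetical stub)."""
    return {"image": image, "policy": policy}


if __name__ == "__main__":
    vm = instantiate(
        "cdf-sl6-analysis",  # hypothetical image name
        AccessPolicy(inbound=("192.0.2.0/24",),
                     outbound=("198.51.100.0/24",),
                     storage=("/storage/cdf",)),
    )
    p = vm["policy"]
    print(p.allows_inbound("192.0.2.10"))    # True: inside the allowed range
    print(p.allows_outbound("203.0.113.5"))  # False: not listed, so denied
    print(p.allows_mount("/storage/cdf/raw"))  # True: below an allowed path
```

A default-deny design keeps legacy software, which may no longer receive security updates, isolated unless an administrator explicitly opens a path to it.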
Wider impact and conclusions
During the implementation of this project we are facing several issues typical of data preservation, and we are sharing experiences with other
experiments and laboratories such as BaBar at SLAC, DESY and, of course, Fermilab. The project is being developed within the DPHEP collaboration. Data maintenance, validation systems and virtualization
techniques for running CDF legacy software are key areas of our project, where we aim for solutions that are sustainable in the long term, as flexible as possible and easy to adapt to other experiments. The project described in this contribution is the first INFN-supported
project on data preservation. It will serve as a prototype for other experiments (inside and outside HEP) that currently store
their data at the CNAF computing center.