The adoption of the HTTP/XRootD protocols for a new data caching architecture in WLCG experiments
The Worldwide LHC Computing Grid (WLCG) infrastructure has been designed to handle the massive amount of data produced by the Large Hadron Collider (LHC) experiments at CERN. The main LHC experiments are ATLAS, CMS, ALICE and LHCb. They study the fundamental building blocks of matter, and their computing models drove the design of the WLCG. The WLCG is a highly distributed infrastructure of regional data centers organized in Tiers, classified according to the level of service provided and the CPU, disk and tape storage resources pledged. In particular, the WLCG infrastructure comprises one Tier 0 (CERN), 12 Tier 1s, and many Tier 2 and Tier 3 sites. The WLCG resources amount to around 800K cores, 600 PB of disk storage and 400 PB of tape storage, the latter provided by Tier 0 and Tier 1 sites only. Data collected from the experiment detectors are transferred to the Tier 0 data center. They are then distributed among the data centers of the other Tiers for custodial copies (Tier 1s) and for processing and user analysis. LHC operations are scheduled in data-taking periods, generally interleaved with shutdown periods during which maintenance and upgrades of the collider or the experiment detectors are performed. The current data-taking period is "Run 2", while "Run 3" and "Run 4" are scheduled for 2021 and 2026 respectively. During the shutdown between Run 2 and Run 3 the experiment detectors will be upgraded, while between Run 3 and Run 4 a major upgrade of the collider facility will be performed as well. This upgrade will enable a higher data production rate, which the current WLCG design cannot handle even when technology improvements and yearly budgets are taken into account. Within the H2020 eXtreme-DataCloud (XDC) project, several approaches are under investigation to optimize resource usage for storage management.
In particular, the concept of storage federation is gaining growing interest because it can contribute to an overall storage consolidation, reducing operating costs as well as replication needs ("data lakes"). In this context caching technologies play a central role. Two caching approaches for data management, based on the XRootD and HTTP protocols, are presented. The HTTP protocol is currently considered one of the best choices for data management in geographically distributed environments, because of its wide adoption outside research as well (in CDNs, Content Delivery Networks) and its easy implementation with off-the-shelf software. A caching layer on top of the storage management middleware has been investigated using nginx, with an additional module to handle X.509 certificates with VOMS extensions (VOMS proxies). A caching system based on XRootD technology has the advantage of being seamlessly integrated with the computing models of the WLCG experiments. In the context of this activity, XRootD proxy services (i.e. XCache) will be used to implement the system, as well as to benchmark the developed solutions.
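
The read-through behaviour common to both caching approaches can be illustrated with a minimal Python sketch. It is not the nginx or XCache implementation: the class `ReadThroughCache`, the function `fake_origin` and the file path are hypothetical names introduced only for illustration. Authentication, eviction and partial-file reads are deliberately left out.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Toy read-through cache: serve a local copy, fetch from origin on a miss."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        # Hypothetical accessor standing in for a remote storage endpoint.
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def read(self, path: str) -> bytes:
        if path in self._store:
            # Cache hit: no traffic to the origin storage.
            self.hits += 1
            return self._store[path]
        # Cache miss: fetch once from the origin and keep a local copy.
        self.misses += 1
        data = self._fetch(path)
        self._store[path] = data
        return data


def fake_origin(path: str) -> bytes:
    # Assumption: a stand-in for a remote read; returns dummy content.
    return f"payload of {path}".encode()


cache = ReadThroughCache(fake_origin)
first = cache.read("/store/data/file.root")   # miss, goes to the origin
second = cache.read("/store/data/file.root")  # hit, served from the cache
```

In this sketch only the first access reaches the origin. Serving repeated reads locally is the property that lets a cache placed close to the computing resources reduce both wide-area traffic and the need for pre-placed replicas.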