Overview
CMS computing needs reliable, stable and fast connections among its multi-tiered computing infrastructures. PhEDEx provides a data management layer composed of a series of collaborating agents, which manage data replication at each distributed site. It uses the File Transfer Service (FTS), a low-level data movement service responsible for moving sets of files from one site to another while allowing participating sites to control network resource usage. FTS servers are provided by Tier-0 and Tier-1 centers. They need to be set up according to the Grid site's policies, taking into account all the virtual organizations that use the Grid resources at the site, and properly dimensioned to satisfy all their requirements. Managing the service efficiently requires good knowledge of the CMS needs for all the kinds of transfer workflows we expect to handle, and of the sharing and interference with other Virtual Organizations using the same FTS transfer managers.
Impact
A global study of this kind, collecting statistics on each individual transfer across the whole CMS distributed fabric, has never been done before. In fact, we do not know of any other VO that has conducted such a study.
The impact is the improvement of CMS data transfers, either by solving FTS channel bottlenecks or by identifying configuration problems at the sites or even on individual transfer links.
As an example, the study has already been used to propose a policy for setting up the FTS at the PIC Tier-1 for PIC->Tier-2 transfers. Based on the results collected by the tool developed for this study, dedicated FTS cloud channels were implemented for low- and high-throughput transfer connections. The change increased the overall PIC->Tier-2 transfers by a factor of 2 and clearly improved the data transfer quality.
By the time of the conference, these results will be published daily on a web page containing all the relevant information and plots, available to the expert operation teams as well as to the site administrators.
Description of the work
This contribution deals with a complete review of all FTS servers used by CMS, customizing the topologies and improving their setup in order to keep CMS transferring data at the desired levels in a reliable and robust way.
We use the FTS Monitor, a web-based monitoring system developed at the CC-IN2P3 Tier-1, which provides a graphical view of the FTS activity. This service retrieves data directly from the FTS backend database to generate summary statistics and to provide detailed reports about transfer activities. FTS Monitor web pages display channel configuration, statistics about transfers in the last 14 days on each channel, and detailed information on all jobs submitted in the last 24 hours, including the status and throughput of each individual transfer.
Each transfer detail is published in machine-readable XML format. We parse the FTS Monitors at each CMS Tier-1 daily to keep the history of all transfers and of important values such as transfer rates per file and per stream, SRM response times and FTS channel congestion. This wealth of information is extremely useful for spotting issues and debugging problems.
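The daily parsing step could look roughly like the sketch below. It is only an illustration: the XML layout (a `transfer` element with `channel`, `bytes`, `duration`, `streams` and `state` attributes), the channel name and the sample values are all hypothetical and do not reproduce the actual FTS Monitor schema. The rate per stream is assumed here to be the per-file rate divided by the number of streams.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from statistics import mean

# Hypothetical sample: element names, attribute names and values are
# assumptions made for this sketch, not the real FTS Monitor output.
SAMPLE_XML = """\
<transfers>
  <transfer channel="SITE_A-SITE_B" bytes="2000000000" duration="100" streams="5" state="Done"/>
  <transfer channel="SITE_A-SITE_B" bytes="1000000000" duration="200" streams="5" state="Done"/>
  <transfer channel="SITE_A-SITE_C" bytes="500000000" duration="50" streams="1" state="Failed"/>
</transfers>
"""


def summarize(xml_text):
    """Return per-channel mean rates (MB/s) for successfully finished transfers."""
    root = ET.fromstring(xml_text)
    rates = defaultdict(list)  # channel -> list of (rate per file, rate per stream)
    for transfer in root.iter("transfer"):
        if transfer.get("state") != "Done":
            continue  # only completed transfers contribute to the rates
        size = int(transfer.get("bytes"))
        duration = float(transfer.get("duration"))  # seconds
        streams = int(transfer.get("streams"))
        if duration <= 0 or streams <= 0:
            continue  # skip records that cannot yield a meaningful rate
        rate_per_file = size / duration / 1e6  # MB/s
        rates[transfer.get("channel")].append(
            (rate_per_file, rate_per_file / streams)
        )
    return {
        channel: {
            "files": len(values),
            "mean_rate_per_file": mean(v[0] for v in values),
            "mean_rate_per_stream": mean(v[1] for v in values),
        }
        for channel, values in rates.items()
    }


if __name__ == "__main__":
    for channel, stats in summarize(SAMPLE_XML).items():
        print(channel, stats)
```

Storing one such summary per day and per channel would be one simple way to build the transfer history; other quantities mentioned above, such as SRM response times, would need additional fields that this sketch does not model.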
Conclusions
With the help of these statistics, the central CMS distributed data transfer team is running a campaign to spot issues and report them to the sites. Regular usage in operations will, of course, give more feedback on which statistics are the most relevant to gather. Some new statistics are already planned for inclusion.
At the moment the FTS Monitor Parser gathers data only for the CMS VO, but in the future it can be opened to other VOs. In fact, we would like other VOs to join us in a global study of WLCG data transfers, and we are collaborating with the FTS Monitor developers so that more information is published.