Diego, Enol, Tiziana
Presentation on the LOFAR Calibration Pipeline (Susana). Problems to address: 1. the pipeline has to be executed in a distributed environment; the data cube can be processed in parallel by different threads, 2. a VM image allows complete control over what needs to be installed; managing libraries and sw installation is difficult (LOFAR software needs many libraries, and the use of prepackaged tools and libraries simplified this).
Several tests were conducted. Parallelization was ok and sw distribution was also well managed. The problem faced is storage: a better solution for the many datasets is needed. Input data needs to be mounted in every VM processing the data. Currently NFS is used, and it is not the best approach.
Q. Is a cluster file system a solution? S. The volume needs to be shared across different cloud sites and accessible from all of them. Problem: accessing the data volume from remote sites.
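A minimal sketch of the parallelization idea from the presentation, not the actual LOFAR pipeline code. It assumes the cube can be cut into independent slices along its first axis; `split_cube`, `process_chunk` and `process_cube` are hypothetical names, and the per-plane sum only stands in for a real calibration step.

```python
from concurrent.futures import ThreadPoolExecutor


def split_cube(cube, n_chunks):
    """Split a cube (a list of 2-D planes) into at most n_chunks contiguous chunks."""
    if not cube:
        return []
    size = -(-len(cube) // n_chunks)  # ceiling division
    return [cube[i:i + size] for i in range(0, len(cube), size)]


def process_chunk(chunk):
    """Hypothetical stand-in for a calibration step: sum every plane of the chunk."""
    return [sum(sum(row) for row in plane) for plane in chunk]


def process_cube(cube, n_workers=4):
    """Process the chunks in parallel threads and return the results in input order."""
    chunks = split_cube(cube, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(process_chunk, chunks)  # map preserves chunk order
        return [value for chunk_result in results for value in chunk_result]
```

In the distributed setup discussed here each chunk would go to a separate VM rather than a thread, which is why every VM needs access to the input data.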
MW: 1. PROBLEM. Join up the computing infrastructure and the LTA data ==> open up the archive (active archive) to lower the barriers to accessing LTA data. Make it easier for the average user.
Imaging pipelines, calibration pipelines
2. PROBLEM. Use the distributed computing facilities to support external usage (Leiden, more ASTRON facilities to be procured)
3. PROBLEM. Support easier software updates
CS: LTA is based on dCache. Downloading from a web browser is not possible, but the rules could be changed
Some data is public and some data is not public
Data becomes public after 12 months ==> Coen: this is not implemented in LTA
For public data, users do not need to be linked to any project, but they have no right to browse the directories, so the data is not browsable
(A) review of the AAI infrastructure needed for the AAI data (LTA/SURFSara)
(A) help in making data recall from tape easier and more scalable
A download server based on http is possible. Users can download from that machine with a username/passwd provided by LOFAR. The limitation is that it has only a 1 Gbps link, so the bandwidth is too limited
Data is downloaded to UK and DE; the bottleneck is the local network, e.g. the firewall (downloading multiple TBs is not feasible)
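A rough back-of-the-envelope estimate of why a 1 Gbps link is limiting for multi-TB downloads. The function name and the `efficiency` parameter are illustrative assumptions; decimal units are used (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits/s), and a real transfer will be slower because of protocol overhead and shared use of the link.

```python
def transfer_time_hours(size_tb, link_gbps, efficiency=1.0):
    """Ideal time in hours to move size_tb terabytes over a link_gbps link.

    efficiency is the assumed fraction of the nominal bandwidth actually
    achieved (1.0 means the link is fully and exclusively used).
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


# 10 TB over a dedicated 1 Gbps link takes roughly 22 hours at best.
print(round(transfer_time_hours(10, 1), 1))
```

At an assumed 50% efficiency the same 10 TB takes about 44 hours, and several users sharing the one server scale that up further.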
(A) evaluate an organized T2 infrastructure for LOFAR
CS: 11 PB of tape and disk, next to it 5,000-6,000 cores
LOFAR sw is installed in the cluster
10 TB of local scratch space
Lack of documentation => organization problem
MW: the scripts are not generic, they are Python scripts specific to some use cases
(A) ASTRON. Provide a generic pipeline/capability for reprocessing of LTA data for average user
Need to use the distributed computing facility and the software updates
Is Docker supported? LOFAR sw packaged in containers, possible in the SURFSara cloud which provides an isolated system. Security problems in the grid.
(A) Help from SURFSara/EGI in studying how to port software to Docker containers
S: How are the local clusters connected to the "T1 centres"? How is the LTA data moved from LTA to the local clusters? Data has to be copied locally through scp or other mechanisms, which is difficult
Use Case 1 => we bring users to the archives (T1 sites)
Use Case 2 => use of other computing facilities (T2 sites, currently totally not organized)
LOFAR to define priorities of activities