6-8 May 2019
WCW Congress Centre
Europe/Amsterdam timezone

APRICOT: Advanced Platform for Reproducible Infrastructures in the Cloud via Open Tools

Not scheduled
WCW Congress Centre

Science Park 123 1098 XG Amsterdam


Prof. German Molto Martinez (UPVLC)


Open and reproducible science has recently become a topic of growing interest. It aims mainly at addressing two long-standing problems in scientific research: non-reproducible research and fraud. To that end, Open Science promotes a new approach to the scientific process based on cooperative work and on new ways of disseminating knowledge through digital technologies and collaborative tools. Many platforms have recently been created to promote and facilitate Open Science for researchers [11] [4] [1] [9] [7] [8] [2] [5]. These platforms offer services such as cloud infrastructure for running code in a range of programming languages, article and data repositories, and predefined algorithms for processing researchers' data. However, they notably lack support for dynamic infrastructure deployment. Some experiments require significant computational effort on specific distributed infrastructures or accelerated hardware, such as an MPI cluster or GPGPUs, respectively. Current Open Science platforms cannot be used to reproduce this kind of experiment without significant effort and advanced technical knowledge on the researcher's part.
Enter APRICOT (Advanced Platform for Reproducible Infrastructures in the Cloud via Open Tools), an extension for Jupyter [11] notebooks that automates the deployment, usage and life-cycle management of infrastructure. We selected Jupyter among the available Open Science platforms because of its flexibility, ease of use, capacity for integrating modules and open-source philosophy. APRICOT supports automatic multi-cloud deployment via a Jupyter JavaScript plugin that uses the open-source EC3 (Elastic Cloud Compute Cluster) [3] and IM (Infrastructure Manager) [10] tools. It can therefore be used in any Jupyter notebook, provided that the EC3 client is installed and the selected kernel supports “magic” commands.
APRICOT also provides its own Jupyter “magics” to manage the deployed infrastructure, upload and retrieve data, execute tasks, etc. These can be used in any kernel that supports “magic” commands, making them compatible with many programming languages in the Jupyter environment.
To reduce the technical effort required of researchers, the deployment plugin offers a set of predefined cluster topologies such as “Batch-Cluster” or “MPI-Cluster”. These are configured automatically with common utilities such as a queue system, a shared home directory via NFS and a set of compilers, as well as topology-specific ones such as MPI libraries. Researchers therefore only need to specify their credentials (which depend on the infrastructure provider), the infrastructure topology, and the number of nodes and their characteristics (number of CPUs, memory, etc.). Data management can be done via SSH, and integration with the OneData client [6] as an external storage provider is in progress.
To sum up, the benefit of this extension is that it integrates the deployment, management and usage of specific infrastructures into Open Science, making experiments that depend on specific computational infrastructures reproducible. All the steps and details of an experiment can be documented in a single Jupyter notebook, which includes the infrastructure specification, data storage, experiment execution, retrieval of results and infrastructure termination. Thus, distributing the experiment notebook and the required data should be enough to reproduce the experiment.
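As an illustration only, the inputs a researcher supplies for a deployment (credentials, topology, number of nodes and node characteristics) could be pictured as a small data structure with basic checks. The sketch below is not APRICOT's actual interface: the class names, field names, the `validate` helper and the "example-cloud" provider are all hypothetical, and only the two topology names come from the abstract.

```python
from dataclasses import dataclass, field

# Topology names quoted in the abstract; the real plugin may offer others.
KNOWN_TOPOLOGIES = {"Batch-Cluster", "MPI-Cluster"}


@dataclass
class NodeSpec:
    """Hypothetical description of one worker node."""
    cpus: int
    memory_gb: float


@dataclass
class ClusterRequest:
    """Hypothetical bundle of the inputs a researcher would provide."""
    provider: str
    credentials: dict = field(repr=False)  # kept out of repr so secrets are not echoed
    topology: str = "Batch-Cluster"
    num_nodes: int = 1
    node: NodeSpec = field(default_factory=lambda: NodeSpec(cpus=1, memory_gb=1.0))


def validate(request):
    """Return a list of human-readable problems; an empty list means the request looks sane."""
    problems = []
    if not request.credentials:
        problems.append("credentials are required")
    if request.topology not in KNOWN_TOPOLOGIES:
        problems.append(f"unknown topology: {request.topology}")
    if request.num_nodes < 1:
        problems.append("num_nodes must be at least 1")
    if request.node.cpus < 1:
        problems.append("each node needs at least 1 CPU")
    if request.node.memory_gb <= 0:
        problems.append("node memory must be positive")
    return problems


# Example: a four-node MPI cluster on a made-up provider.
request = ClusterRequest(
    provider="example-cloud",
    credentials={"token": "placeholder"},
    topology="MPI-Cluster",
    num_nodes=4,
    node=NodeSpec(cpus=2, memory_gb=4.0),
)
print(validate(request))  # prints []
```

Recording such a specification in the notebook itself is what would let a reader redeploy an equivalent infrastructure before re-running the experiment.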
Type of abstract Poster
References
[1] codeocean. https://codeocean.com/
[2] DAE. http://dae.cse.lehigh.edu/DAE/
[3] EC3. http://servproject.i3m.upv.es/ec3/
[4] galaxy. https://galaxyproject.org
[5] IPOL. https://www.ipol.im/
[6] OneData. https://onedata.org/#/home
[7] runmycode.online. https://runmycode.online/
[8] runmycode.org. http://www.runmycode.org/
[9] Daniel Nüst, et al. Opening the publication process with executable research compendia. D-Lib Magazine.
[10] Miguel Caballer, et al. Dynamic management of virtual infrastructures. Journal of Grid Computing.
[11] Thomas Kluyver, et al. Jupyter notebooks - a publishing format for reproducible computational workflows. Concurrency and Computation.

Primary authors

Prof. German Molto Martinez (UPVLC)
Prof. J. Damian Segrelles Quilis (UPVLC)
Vicent Gimenez Alventosa (UPVLC)
