Description of work
We use the Galaxy instance kindly provided by the Genomic Transcriptomic facility of Bordeaux. We have already written a Galaxy tool that launches a pipeline producing a taxonomic inventory from NGS sequences on the Avakas cluster (Mesocentre Aquitaine). A second tool is being developed to distribute repeated calculations (Structure software, PhyML phylogeny) to the grid using the DIRAC interware.
A Galaxy tool that launches jobs on the cloud could be developed in the following steps:
- set up a Galaxy server for this hackathon (or use a prebuilt VM)
- configure the authentication process
- write a generic script using the DIRAC interware to:
. connect to a cloud resource
. instantiate one or more virtual machines (VMs)
. send the files and parameters specified in the Galaxy interface
. run specific software on the VM with the specified parameters
. return the result files to Galaxy
- generate the XML file that declares the tool inside Galaxy
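The last step, generating the XML file that declares a tool, could be sketched as follows. This is a minimal illustration and not the project's actual generator: the function name `build_tool_xml`, the wrapper script name `dirac_cloud_launcher.py`, the tool id, the parameter names and the chosen datatypes are all assumptions made for the example. Only the general layout of a Galaxy tool file (`tool`, `description`, `command`, `inputs`/`param`, `outputs`/`data`) is taken from Galaxy itself.

```python
import xml.etree.ElementTree as ET


def build_tool_xml(tool_id, name, script, inputs, outputs):
    """Return a Galaxy tool definition as an XML string.

    inputs:  list of dicts holding the attributes of each <param>
    outputs: list of dicts holding the attributes of each <data>
    All names passed in are chosen by the caller (hypothetical here).
    """
    tool = ET.Element("tool", id=tool_id, name=name, version="0.1.0")
    ET.SubElement(tool, "description").text = "run on a cloud VM through DIRAC"

    # Galaxy replaces each $name placeholder with the value chosen in the form.
    names = [p["name"] for p in inputs] + [o["name"] for o in outputs]
    args = " ".join(f"--{n} '${n}'" for n in names)
    command = f"python '$__tool_directory__/{script}' {args}"
    ET.SubElement(tool, "command").text = command

    inputs_el = ET.SubElement(tool, "inputs")
    for attrs in inputs:
        ET.SubElement(inputs_el, "param", attrs)

    outputs_el = ET.SubElement(tool, "outputs")
    for attrs in outputs:
        ET.SubElement(outputs_el, "data", attrs)

    ET.SubElement(tool, "help").text = "Hypothetical wrapper generated for illustration."
    ET.indent(tool)  # pretty-print; requires Python 3.9 or later
    return ET.tostring(tool, encoding="unicode")


if __name__ == "__main__":
    print(build_tool_xml(
        tool_id="phyml_cloud",                 # hypothetical tool id
        name="PhyML on the cloud",
        script="dirac_cloud_launcher.py",      # hypothetical generic script
        inputs=[
            {"name": "alignment", "type": "data", "format": "phylip",
             "label": "Sequence alignment"},
            {"name": "bootstraps", "type": "integer", "value": "100",
             "label": "Number of bootstrap replicates"},
        ],
        outputs=[{"name": "tree", "format": "newick"}],
    ))
```

Generating the file from a small description, rather than writing it by hand, would let the three test tools share the same generic script and differ only in their parameter lists.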
We propose to test the implementation by producing three Galaxy tools that launch, respectively:
- a PhyML phylogeny, with bootstraps
- a Structure bootstrapping calculation
- the Readsyst pipeline for taxonomic inventory
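For the two bootstrap-based tests, the repetitions have to be divided among the jobs or VMs that run them. The sketch below shows one simple way to do this, under the assumption that replicates are independent and that each job receives its own random seed. The function name `split_replicates` and the structure of the returned plan are hypothetical, and the way each job passes these values to PhyML or Structure is deliberately left out.

```python
def split_replicates(total, n_jobs, base_seed=1):
    """Divide `total` replicates as evenly as possible over at most `n_jobs` jobs.

    Returns a list of dicts, one per job that has work to do, each giving
    the number of replicates to run and a distinct seed.
    """
    if total < 0 or n_jobs < 1:
        raise ValueError("total must be >= 0 and n_jobs must be >= 1")

    base, extra = divmod(total, n_jobs)
    plan = []
    for job in range(n_jobs):
        # The first `extra` jobs take one additional replicate each.
        count = base + (1 if job < extra else 0)
        if count == 0:
            continue  # more jobs than replicates: skip the idle ones
        plan.append({"job": job, "replicates": count, "seed": base_seed + job})
    return plan


if __name__ == "__main__":
    for entry in split_replicates(total=100, n_jobs=8):
        print(entry)
```

Giving each job a distinct seed avoids producing identical replicates on different machines; the partial results would then be merged once they are returned to Galaxy.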
Wider impact and conclusions
Researchers (biologists, but not only them) rely on a wide diversity of tools, yet running these tools on efficient HPC infrastructure is currently impeded by the technicalities of distributing and launching them and by the related dependency problems. These obstacles can be circumvented by using dedicated virtual machines instantiated on the cloud. One can also take advantage of the scaling capabilities of a cloud infrastructure (cloud bursting). Our current aim of connecting the Galaxy platform to the grid and the cloud brings the substantial computing capabilities of EGI directly to the final user: the researcher.
The e-VirtualBiodiversityLab will be able to access its own computing elements, the cluster of the Mesocentre Aquitaine, and the EGI grid and cloud resources. All the elements will then be in place to offer a scalable model for the software and tools used in biodiversity data analysis. This will foster and enable the development of new tools and pipelines for a better exploration of unknown biodiversity.
URL(s) for further info
https://galaxy-pgtp.pierroton.inra.fr
http://diracgrid.org