24–26 Sept 2014
CWI Conference Centre
Europe/Amsterdam timezone

Hadoop analytics provisioning based on a virtual infrastructure

25 Sept 2014, 11:00
25m
Turingzaal (CWI Conference Centre)

Turingzaal

CWI Conference Centre

Speaker

Ivan Alvarez (FCTSG)

Description

More than a year ago CESGA started to provide a Big Data service based on Hadoop. The initial request for this type of service came from the LIA2 research group working in Gaia, an ambitious mission from the European Space Agency. The initial implementation of the service was based on physical resources that were allocated through advanced reservations in one of our supercomputer clusters. Unfortunately this lead to delays in the provisioning of the resources and the need for reconfigurations each time the Hadoop cluster was created. At that time, a devoted cluster was not a viable alternative because the demand for such service did not warranty a continuous usage resources. After evaluating different alternatives, the service was moved to our private cloud infrastructure were the provisioning could be done in a much more flexible way. The performance loss was very small for Gaia's jobs so their executions where moved to this platform where they used up to 100 nodes to run their jobs. The new platform allowed to offer a Hadoop on demand service to all our users where they could get familiar with the Hadoop ecosystem and develop their our algorithms. Different alternatives to extend the virtual infrastructure have been evaluated, including an extensive study of the suitability of FedCloud to run federated Hadoop clusters and a comparison with Amazon EC2. The results of such study are promising, showing a small degradation of performance for small to medium jobs, making the FedCloud platform suitable for development and testing purposes.

Primary author

Dr Javier Lopez Cacheiro (CESGA)

Co-authors

Carlos Fernandez (FCTSG) Ivan Alvarez (FCTSG)

Presentation materials