Sep 16 – 19, 2013
Meliá Castilla Convention Centre, Madrid
Europe/Madrid timezone

Integrating and testing SLURM in EMI-3 cream-CE

Sep 16, 2013, 9:00 AM
8h 30m
Meliá Castilla Convention Centre, Madrid

Speakers

Alvaro Simon (FCTSG), Mr Bruno Rodriguez (Port d'Informació Científica), Dr Carles Acosta (Port d'Informació Científica), Enol Fernandez (CSIC), Esteban Freire Garcia (FCTSG), Goncalo Borges (LIP), Joao Pina (LIP), Dr Josep Flix (Port d'Informació Científica (PIC))

Description of Work

Integration and testing of SLURM and CREAM-CE.

Printable Summary

SLURM (Simple Linux Utility for Resource Management) is a highly scalable batch system and scheduler for large and small Linux clusters. It is used in many of the TOP500 supercomputers, such as Tianhe-2, Tera 100, IBM Sequoia and MareNostrum, to mention a few. SLURM is open-source software that integrates its own database for accounting and fair-share management. It also allows the integration and use of other schedulers, such as Maui or Moab. Many Grid sites are currently growing in computing capacity, and other widely used, simple batch systems (such as Torque) can become seriously unstable under large numbers of jobs. The community considers SLURM a good replacement. The new EMI-3 middleware for the cream-CE and Worker Nodes (WNs) supports SLURM integration. For that reason, IBERGRID members have recently started a collaboration to verify and test the correct behavior of an EMI-3 cream-CE acting as SLURM server. The initial testing infrastructure consists of three virtual machines provided by CESGA: one as cream-CE and the other two as WNs. To simplify and check the SLURM installation and configuration in Grid systems, the YAIM tool is used to configure the machines. The aim of the test is to certify the integration, validate the functionality, and check the scalability, stability and performance of the SLURM scheduler, as well as to become familiar with the configuration and operation of the whole system and to provide as much feedback as possible to the community. In this poster, we present the first functionality test results obtained and the first conclusions derived from the testbed.

Primary author

Dr Carles Acosta (Port d'Informació Científica)

Co-authors

Alvaro Simon (FCTSG), Mr Bruno Rodriguez (Port d'Informació Científica), Enol Fernandez (CSIC), Esteban Freire Garcia (FCTSG), Goncalo Borges (LIP), Joao Pina (LIP), Dr Josep Flix (Port d'Informació Científica (PIC))