9–11 Oct 2018
Lisbon
Europe/Lisbon timezone

Addressing the Energy Wall for Exascale Computing: Whole System Design Implementation at CINES for Energy-Efficient HPC

10 Oct 2018, 17:15
15m
Lisbon

ISCTE, University of Lisbon
Presentation
Area 3. Computing and Virtual Research Environments
Computing Services Part II

Speaker

Mr Eric BOYER (CINES (Centre Informatique National de l'Enseignement Supérieur), FRANCE)

Description

CINES has begun deploying the “Whole System Design for Energy Efficient HPC” solution on OCCIGEN, its 3.5 Pflops Tier1 production system. This solution was developed within the PRACE-3IP PCP, a joint Pre-Commercial Procurement involving CINECA, CSC, EPCC, GENCI and JUELICH. It is the result of R&D services aimed at improving the energy efficiency of HPC systems in order to address the energy wall of Exascale computing. The PRACE PCP thus combines elements of conventional hardware procurement with funding for research and product development. It was set up to procure and develop highly energy-efficient HPC systems that are available for general use, i.e. able to run real applications. These systems can also be operated within a conventional HPC computing centre while still achieving very high total-system energy efficiency. In addition to these technical goals, the PCP aimed to develop the HPC vendor ecosystem within the European Economic Area (EEA) and is therefore expected to result in commercially viable products.

As a result, ATOS integrated into its roadmap an energy-optimization suite developed during the PCP (BEO, BDPO, HDEEVIZ and the SLURM energy-saving plugins). This suite is part of the Atos-Bull Supercomputer Suite (SCS5 R2), available since Q1 2018. While hosting one of the PRACE-3IP PCP prototypes, CINES collaborated with EoCoE (Energy Oriented Center of Excellence) and PRACE-4IP WP7 (application enabling and optimisation) to assess the PCP R&D developments from ATOS and provide guidance on them. CINES has set up a monitoring architecture and tools that complement fine-grained monitoring with coarse-grained datacenter data collection and analysis. Implementing a “Whole System Design for Energy Efficient HPC” in a production environment is a key step towards a new paradigm for application and HPC efficiency, taken in collaboration with GENCI. That paradigm shifts the optimisation target from time-to-solution to energy-to-solution.
The global collection of energy and resource-consumption data forms a key repository of application behaviour and profiles for data analysis. It will guide upcoming procurements, such as PPI4HPC (2019/2020) and the next CINES Tier1 system (2020), and provide input for the EuroHPC platforms (2022/2023).

Summary

Energy efficiency is one of the most critical factors in further boosting the performance of future supercomputers while staying within an acceptable power envelope of 20 to 30 MW. That envelope applies to the upcoming Exascale systems targeted by the EuroHPC initiative for 2022/2023.
Within PRACE-3IP, an association of five partners (CINECA, CSC, EPCC, JSC and GENCI) used a new approach to enable R&D leading to innovative solutions that improve energy efficiency at the whole-system level. CINES hosted a prototype, collaborated with EoCoE and PRACE, and decided to implement these solutions on its OCCIGEN Tier1 production system.

Type of abstract: Presentation

Primary author

Mr Eric BOYER (CINES (Centre Informatique National de l'Enseignement Supérieur), FRANCE)