Speaker
Description
In recent years, in particular with the rise of AI, the diversity of workloads that research infrastructures must support has exploded. Many of these workloads take advantage of new technologies, such as Kubernetes, that need to run alongside the traditional workhorse: the large batch cluster. Some require access to specialist hardware, such as GPUs or network accelerators. Others, such as Trusted Research Environments, have to be executed in a secure sandbox.
Here, we show how a flexible and dynamic research computing cloud infrastructure can be achieved, without sacrificing performance, using OpenStack. By having OpenStack manage the hardware, we get access to APIs for reconfiguring that hardware, allowing the deployment of platforms to be automated with full control over the levels of isolation. Optimisations like CPU-pinning, PCI passthrough and SR-IOV allow us to take advantage of the efficiency gains from virtualisation without sacrificing performance where it matters.
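As a concrete illustration of what "APIs for reconfiguring the hardware" looks like in practice, the following is a hedged sketch of how such optimisations are typically exposed through OpenStack flavor extra specs. The flavor name `hpc.pinned` and the PCI alias `gpu` are illustrative assumptions (a PCI alias must be defined in the cloud's Nova configuration); the property keys themselves are standard Nova extra specs.

```shell
# Sketch only: create a flavor tuned for HPC-style guests.
# "hpc.pinned" is a hypothetical name chosen for this example.
openstack flavor create --vcpus 8 --ram 32768 --disk 100 hpc.pinned

# Pin guest vCPUs to dedicated host cores to avoid noisy-neighbour jitter.
openstack flavor set hpc.pinned --property hw:cpu_policy=dedicated

# Back guest memory with huge pages for more predictable memory performance.
openstack flavor set hpc.pinned --property hw:mem_page_size=large

# Pass through one GPU per instance; the "gpu" alias is assumed to be
# defined by the operator in nova.conf ([pci] alias = ...).
openstack flavor set hpc.pinned --property pci_passthrough:alias=gpu:1
```

Instances booted with such a flavor get near-bare-metal CPU and device performance while still being scheduled, rebuilt, and torn down through the normal OpenStack APIs.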
The HPC+AI Cloud becomes even more powerful when combined with Azimuth, an open-source self-service portal for HPC and AI workloads. Using the Azimuth interface, users can deploy, on a self-service basis, a curated set of optimised platforms ranging from web desktops to Kubernetes apps such as Jupyter notebooks. Those applications are accessed securely, with SSO, via the open-source Zenith application proxy. Self-service platforms provisioned via Azimuth can co-exist with large bare-metal batch clusters on the same OpenStack cloud, allowing users to pick the environments and tools that best suit their workflow.
Topic
Needs and solutions in scientific computing: Platforms and gateway