Summary
We report on our experience of recent infrastructure upheaval and the corrective work that was carried out. We discuss the results of our investigation into building a custom grid facility from the ground up, with particular focus on power, heat and loading issues. We also discuss the difficulties of retrofitting an existing facility to allow these areas to be managed efficiently in operation.
Impact
This talk aims to support the construction of operationally resilient grid clusters within an academic context. Although data centres around the world are growing in number (and size), we felt that it would be useful and appropriate to consider the problem from the perspective of an individual cluster. For various reasons, grid clusters are often built up gradually over time rather than in a single major initial phase. In this context, expanding the infrastructure becomes a critical issue that requires careful examination if key failures are to be avoided.
The material will be of interest to colleagues looking at similar issues and to those who are considering a new cluster. It will also be of interest to users, giving them a greater understanding of the large-scale infrastructure behind a grid cluster and of how it might affect their work and local facility.
Description
The Scotgrid Glasgow Grid Cluster is split over two machine rooms in the Kelvin Building at Glasgow University, a building that is between 50 and 100 years old. As is common in buildings of this age, the machine rooms have been repurposed over time from other uses (in particular, our upper room was originally a mainframe room and the lower room was a laser research lab).
As the cluster has grown, there have been a number of A/C, loading and power upgrades. Over the past year we have suffered a number of problems with our power and A/C load, which have revealed some areas for improvement. In the process of upgrading these facilities we have spent some time investigating how best to build a grid cluster from the ground up, specifically from the perspective of infrastructure, including power and air conditioning.
We also consider how this approach differs from one that might be used for general-purpose or shared facilities.