Speaker
Overview
After the first LHC phase, in which experiments and sites focused their efforts on delivering the demanding WLCG metrics, sites now face a new challenge: consolidating and automating their computing services to reach steady-state operation. Starting from the ground up, installation and post-installation mechanics are among the most important points for computing centers seeking to avoid initial installation and configuration problems. At Port d'Informació Científica (PIC), the adopted solution is to steer all post-install and dynamic post-configuration using Puppet.
Puppet provides a master service in which profiles are easily defined and then propagated across the cluster. This fulfills the need for post-install configuration after the raw OS installation, and ensures that the profile and its defined services persist once a node has been completely installed.
Description of the work
Managing hundreds of nodes is one of the challenges of any computing center, and PIC is no exception. Many tools can cope with the problem, ranging from the very simple to the very complicated. We studied some of the available tools (Quattor, CFEngine, Puppet). Our requirements included the ability to do incremental configuration (no need to bootstrap a service to make it manageable by the tool), simplicity both in the configuration description language and in the system itself, ease of extending the properties and capabilities of the system, a rich community for assistance and development, and open-source software.
We found that Puppet offers a good trade-off between simplicity and flexibility, and it was the best fit for our requirements. Puppet's approach to system management is simple, non-intrusive and incremental: it does not try to control every aspect of the configuration, only the ones you are interested in. Thanks to the gentle learning curve, our sysadmins were able to build complex configurations in a short time.
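The incremental, non-intrusive approach described above can be illustrated with a minimal Python sketch. It is not Puppet code and does not use Puppet's API; the function `converge` and the settings `ntp_server`, `ssh_port` and `motd` are hypothetical names chosen only for illustration. It assumes a node's configuration can be modeled as a dictionary, and shows that only the declared settings are enforced while everything else is left alone.

```python
def converge(current, desired):
    """Apply only the declared settings; return (new_state, changes)."""
    new_state = dict(current)  # copy, so the caller's state is not mutated
    changes = {}
    for key, wanted in desired.items():
        if new_state.get(key) != wanted:
            # Record (old value, new value) for every setting that drifted.
            changes[key] = (new_state.get(key), wanted)
            new_state[key] = wanted
    return new_state, changes


# Hypothetical node state: three settings, one of them hand-edited locally.
node = {"ntp_server": "old.example.org", "ssh_port": 22, "motd": "hand-edited"}

# The profile declares only the aspect we care about.
profile = {"ntp_server": "ntp.example.org"}

node, first_run = converge(node, profile)
_, second_run = converge(node, profile)
```

In this sketch the first run changes only `ntp_server`, the undeclared `motd` and `ssh_port` entries are untouched, and the second run reports no changes, which is the persistence property one would expect from repeated enforcement of the same profile.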
Impact
Puppet allows a whole site to be administered from a central service, which greatly eases reconfiguration and speeds up disaster recovery procedures.
In addition, centralized management of grid service profiles provides a very easy way to scale by adding steered resources on the fly: push the install and let Puppet do the rest of the work until the service is ready for production.
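As a rough illustration of why central profiles make scaling easy, the following Python sketch maps hostnames to profiles by pattern. It is an assumption-laden toy, not Puppet's node classification mechanism: the names `PROFILES`, `NODE_RULES` and `classes_for`, the class names, and the `example.org` hostnames are all hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical central definitions: each profile is a list of configuration classes.
PROFILES = {
    "worker": ["base", "batch_client"],
    "storage": ["base", "disk_pool"],
}

# Hypothetical hostname patterns deciding which profile a node receives.
NODE_RULES = [
    ("wn*.example.org", "worker"),
    ("dc*.example.org", "storage"),
]


def classes_for(hostname):
    """Return the configuration classes for a node; the first matching rule wins."""
    for pattern, profile in NODE_RULES:
        if fnmatch(hostname, pattern):
            return PROFILES[profile]
    return ["base"]  # fallback for nodes that match no rule


new_node = classes_for("wn042.example.org")
```

Under these assumptions, a freshly installed node whose name matches an existing pattern picks up its full profile with no per-node definition, which is the "push the install and let the tool do the rest" pattern described above.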
Puppet need not be a site-specific tool: it could potentially act as a master service doing the same work across distributed resources or sites. This could be used to centrally manage several sites that share a common middleware infrastructure, which is in line with the current strategy of middleware implementation.
Conclusions
Having a steered configuration system like Puppet is a huge advantage from the administration point of view, easing the daily work of sysadmins.
Having the whole service configuration of a site under Puppet provides invaluable confidence and ensures a streamlined path for service deployment.
Unifying the work model is also an important feature: all services are defined using the same pattern in a common location. This removes the constraint of having a single person responsible for each service, since all sysadmins share a common administration language.