Description of the work
The Morph framework implements a client-server model for serving parameter values to the parametric jobs running on the Grid Worker Nodes. Morph monitors and manages the allocated Grid resources by implementing an abstraction layer on top of the gLite WMS. It divides the work among three distinct parts: a) the “morph-wrapper”, running on the target worker nodes, b) the “morph-tools”, which provide the user interface, and c) the “Morph Server”.
The “morph-wrapper” is the core component of the Morph framework. It wraps the executable script inside a loop that requests a parameter value from the Morph Server. Once the executable has processed the parameter, the wrapper reports the exit status of the task to the server and requests the next parameter. When all of the parameters have been processed, or when the server does not respond, the “morph-wrapper” exits.
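The loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual Morph code: the class `InMemoryServer`, its methods `next_parameter` and `report`, and the functions `run_wrapper` and `run_executable` are hypothetical names, and a `ConnectionError` is assumed to signal an unresponsive server.

```python
import subprocess


class InMemoryServer:
    """Hypothetical stand-in for the Morph Server (not its real interface)."""

    def __init__(self, parameters):
        self.pending = list(parameters)
        self.results = {}

    def next_parameter(self):
        # Return the next pending parameter, or None when none are left.
        return self.pending.pop(0) if self.pending else None

    def report(self, parameter, exit_status):
        # Store the exit status that the wrapper sends back.
        self.results[parameter] = exit_status


def run_executable(command, parameter):
    """Run the wrapped executable with one parameter; return its exit status."""
    return subprocess.call(list(command) + [str(parameter)])


def run_wrapper(server, run_task):
    """Fetch a parameter, run the task, report the status, repeat.

    Stops when the server has no parameters left or does not respond
    (assumed here to surface as ConnectionError).
    Returns the number of parameters processed.
    """
    processed = 0
    while True:
        try:
            parameter = server.next_parameter()
        except ConnectionError:
            break  # server does not respond: exit
        if parameter is None:
            break  # all parameters are finished: exit
        status = run_task(parameter)
        try:
            server.report(parameter, status)
        except ConnectionError:
            break
        processed += 1
    return processed
```

On a worker node, `run_task` would typically be something like `lambda p: run_executable(["./job.sh"], p)`, where `./job.sh` is a placeholder for the user's script.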
The “morph-tools” is a collection of scripts built on top of the gLite WMS, coupling it with the Morph Server functionality. The only input is the standard parametric “JDL” file, which is parsed, assigned a unique id, wrapped in the “morph-wrapper” and submitted for execution. Users can also use the “morph-tools” to monitor, cancel and resubmit their parametric jobs.
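The "assign a unique id and wrap" step might look like the sketch below. It assumes the JDL file has already been parsed into a dictionary; the function name `wrap_job` and the layout of the wrapper's arguments (job id first, then the original command) are assumptions made for illustration, not the format Morph actually uses.

```python
import uuid


def wrap_job(jdl, wrapper="morph-wrapper"):
    """Return (job_id, wrapped_jdl) for an already-parsed job description.

    The original executable and its arguments become arguments of the
    wrapper, prefixed by a freshly generated job id (hypothetical layout).
    """
    job_id = uuid.uuid4().hex  # unique id for this parametric job
    original = [jdl["Executable"]]
    if jdl.get("Arguments"):
        original.append(jdl["Arguments"])

    wrapped = dict(jdl)  # leave the caller's dictionary untouched
    wrapped["Executable"] = wrapper
    wrapped["Arguments"] = " ".join([job_id] + original)
    return job_id, wrapped
```

Copying the dictionary keeps the user's original description available, which is convenient if the job later has to be resubmitted.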
The Morph Server is a “simple” intermediate data “holder” that does not perform any operation on the jobs or on the Grid Infrastructure. The server only stores and serves the values, relationships and logs of the parametric job, so that the morph-tools can allocate resources better and identify “problematic” jobs. All of the stored data are sent by the clients (UI and WNs) over an authenticated connection; the server never initiates communication.
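A passive store of this kind could be sketched as below. The class `ParameterStore`, its method names and the four state labels are hypothetical, and authentication is deliberately left out. The point of the sketch is that every state change is triggered by a client call, and that the store itself never acts on a job.

```python
from collections import defaultdict


class ParameterStore:
    """Hypothetical passive holder for parameter states and logs."""

    def __init__(self):
        # (job_id, parameter) -> "pending" | "running" | "done" | "failed"
        self.state = {}
        # (job_id, parameter) -> list of (exit_status, message) entries
        self.logs = defaultdict(list)

    def register(self, job_id, parameters):
        """Called from the UI: record the parameters of a new job."""
        for parameter in parameters:
            self.state[(job_id, parameter)] = "pending"

    def next_parameter(self, job_id):
        """Called from a worker node: hand out one pending parameter."""
        for (jid, parameter), status in self.state.items():
            if jid == job_id and status == "pending":
                self.state[(jid, parameter)] = "running"
                return parameter
        return None

    def report(self, job_id, parameter, exit_status, message=""):
        """Called from a worker node: store the outcome and a log entry."""
        outcome = "done" if exit_status == 0 else "failed"
        self.state[(job_id, parameter)] = outcome
        self.logs[(job_id, parameter)].append((exit_status, message))

    def problematic(self, job_id):
        """Called from the UI: list parameters that ended with an error."""
        return [p for (jid, p), s in self.state.items()
                if jid == job_id and s == "failed"]
```

With such a store, the user-side tools can ask for `problematic(job_id)` and resubmit only those parameters.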
Several types of problems can arise during the “lifetime” of a job, due to the diversity of the software and hardware available in the Grid Infrastructure. Problems can arise from user misconfiguration, system errors and unscheduled power or network outages. When dealing with thousands of jobs on several Grid Clusters, the probability of encountering several errors approaches certainty. To correct the course of the stray jobs, one must first identify the parameters that malfunctioned and then manually redefine, submit and monitor a new parametric job, which makes the Grid “user experience” more complicated. Managing the parameters centrally, as Morph does, removes these manual steps. A collateral benefit of this central management is the ability to keep a constant number of jobs running on the Grid Infrastructure, instead of constantly sending new jobs that occupy several job slots. This takes a large part of the workload off the WMS and leads to better resource allocation. From the user’s perspective, a framework that can distribute, manage and monitor a large number of parametric jobs with minimal manual intervention and little input is an important asset in the Grid toolkit.
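Keeping a constant number of jobs running amounts to a simple top-up calculation, sketched here. The function name and its arguments are hypothetical; the sketch assumes that each running job keeps pulling parameters from the server, so that no more jobs are needed than there are parameters still pending.

```python
def jobs_to_submit(target_running, currently_running, pending_parameters):
    """Number of new jobs needed to get back to the target level.

    Never negative, and never more than the number of pending parameters.
    """
    free_slots = max(0, target_running - currently_running)
    return min(free_slots, pending_parameters)
```

For example, with a target of 50 running jobs, 42 currently running and 1000 parameters pending, the result is 8.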
At this point the Morph Framework is in alpha status and provides a framework for serving parameters to parametric jobs. Our immediate plans include integrating Morph with the VOMS authorization framework for the UI and WN modules, adding support for multi-parametric jobs and supporting different kinds of parameters, such as files.
Our preliminary results and the feedback from the users are encouraging. Every user who makes substantial use of the Grid with parametric jobs has dealt with the problems described and could benefit from a better implementation. Parametric jobs are of great importance for the Grid Infrastructure and should be given proper attention, by extending the support and features of the current implementation.
One of our main objectives at the Center for Scientific Computing Services at AUTH is to enable users to take advantage of the existing HPC and Grid Infrastructures.
Every day we interact with users from various scientific disciplines who present their needs and problems to us from a user-oriented perspective. During this process we try to identify the common patterns that emerge among the different user groups, and we use this information to develop common frameworks that satisfy current and possibly future user needs.
One of the frameworks born out of this process is Morph. Morph is a framework that manages parametric jobs, filling some gaps in the implementation of parametric jobs provided by the gLite Workload Management System. Its main objectives are to take over the effort of identifying and resubmitting aborted or unsuccessful jobs, and to provide better management of the list of parameters.