Grid Standards for Cluster Computing
In late July 2004, I gave a presentation at a cluster symposium (On the Use of Commodity Clusters for Large-Scale Scientific Applications 2004) that I called Grid Standards and Implementations: Implications for Cluster Computing. There is no shortage of inferences about Grid Computing drawn from the perspective of Cluster Computing, a good few of them my own, so I proposed to invert this perspective in my presentation – i.e., to make inferences for Cluster Computing from the perspective of Grid Computing. As my presentation progressed, it became increasingly clear that my content wasn’t exactly resonating with the pragmatic HPC (High Performance Computing) audience.
Post-lunch haze and an absolutely brilliant presentation on the genesis of IBM BlueGene aside, I should’ve known better. For HPC stalwarts, “GRID” remains a four-letter word that raises their marketing-hype hackles. And although I perceived that this faction cautiously avoided me for the remainder of my time at the event, a handful of the grid digerati enthusiastically complimented me on an interesting and insightful presentation.
So for purposes beyond some form of post-presentation vindication, and in fact motivated by recent developments in Cluster Computing, it is timely to revisit the inferences I made. To recap, these inferences were:
- An increasing affinity for Web Services – Based on the ongoing convergence between Web Services and Grid Computing under the umbrella of the Open Grid Services Architecture (OGSA), but contingent upon a degree of downward percolation, this inference seemed reasonable if only a little vague. In retrospect, I should’ve qualified that this would impact at the infrastructural before the application level – e.g., existing models and simulations written in Fortran aren’t likely to be refactored for Web Services overnight!
- Agreements will replace policies and queues – The management of HPC workloads is replete with verbiage around policies and queues. Whereas queues are abstractions for ‘containerizing’ workload, policies formalize the rules for divvying up compute resources (supply) opposite application requirements (demand). In the closed-system context of Cluster Computing, all of this works very well, as the dedicated compute resources are well known to the software managing the workload. Because Grid Computing can require virtual organizations, there is an implicit, highly dynamic need to negotiate access agreements between Grid applications (consumers) and compute and other resources (providers). Agreements are a well-established focal point in the world of Grid standards (e.g., WS-Agreement, open https://forge.gridforum.org/projects/graap-wg, then “Documents” and “Current Drafts”) and implementations (e.g., The Community Scheduler Framework, CSF), and the notion has already gravitated to cluster-level workload management – e.g., goal-oriented Service Level Agreements (SLAs) are a capability provided in Platform LSF (open http://www.platform.com, then “Products” and “Platform LSF Family”).
- Continued tension between standardization and differentiation – Standards are important for a number of reasons – a handful of which I’ve itemized elsewhere. But because standards establish a Lowest Common Denominator (LCD), there is an implicit challenge issued to those that seek to provide differentiating value in the products and services they deliver opposite those same standards. And to take this a step further, innovating beyond the specifications embodied in a standard can be risky – e.g., such innovations may not be perceived to be of value, or extending standards can be viewed as openly sharing IP (intellectual property) with one’s competitors. With the notable exception of the Distributed Resource Management Application API (DRMAA), which has seen implementation in a number of cluster-level workload managers (open http://www.drmaa.org and search for “Available DRMAA Implementations”), Cluster Computing itself is effectively devoid of standards. Of course, I consider the Linux Standard Base (LSB) an important ‘reference implementation platform’, but LSB really doesn’t embody any cluster specificity – it’s distro and system focused. Interestingly in the present context, and to foreshadow recent developments in this standards context, I explicitly anticipated the importance of the Common Information Model (CIM) – on which I elaborate below.
- Increasing implementation of autonomic capabilities – Folklore has it that IBM’s entry into Grid Computing was greatly accelerated by their own research into Autonomic Computing. Autonomic Computing presages a future in which computer systems regulate themselves in much the same way our autonomic nervous system regulates and protects our bodies. Although some may be tempted to dismiss this as a ‘griddified’ revisitation of high availability, there really is a lot more involved, and I still anticipate transference from the Grid to Cluster – at both infrastructural and application levels.
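To make the second inference above a little more concrete, here is a minimal Python sketch of the difference between queueing a job on known resources and first negotiating an agreement with a provider. It is only an illustration under stated assumptions: the names `Request`, `Offer` and `negotiate`, and all of their fields, are hypothetical and do not correspond to WS-Agreement, CSF or Platform LSF.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    """What a Grid application (consumer) asks for. Hypothetical fields."""
    consumer: str
    cpus: int
    deadline_hours: float


@dataclass
class Offer:
    """What a resource provider is willing to commit to. Hypothetical fields."""
    provider: str
    cpus: int
    finish_within_hours: float


def negotiate(request: Request, offers: List[Offer]) -> Optional[Offer]:
    """Return the first offer whose terms satisfy the request, or None.

    In a closed cluster the workload manager already knows its resources and
    simply places the job in a queue. Here the consumer must first find a
    provider whose terms meet its goal; the accepted offer stands in for the
    agreement.
    """
    for offer in offers:
        enough_cpus = offer.cpus >= request.cpus
        meets_deadline = offer.finish_within_hours <= request.deadline_hours
        if enough_cpus and meets_deadline:
            return offer
    return None


offers = [Offer("site-a", 16, 12.0), Offer("site-b", 64, 4.0)]
agreement = negotiate(Request("climate-model", 32, 6.0), offers)
print(agreement.provider if agreement else "no agreement")
```

With these made-up numbers the first provider is rejected for having too few CPUs, and the second is accepted. A real negotiation would involve counter-offers, expiry and monitoring of the agreed terms, all of which this sketch leaves out.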
With respect to these four inferences, but with deference to the notable exceptions of agreements and DRMAA, there’s been relatively little evidence for the emergence of Grid standards and implementations in Cluster Computing. That is, until now.
This sea change is emerging in the area of cluster management. Cluster management is a nebulous term – even Wikipedia avoids it! For present purposes, cluster management involves the management aspects of deploying (configuring and installing), monitoring and maintaining a Cluster Computing environment. Because this inherently embraces a broad and deep range of requirements, specific as well as general-purpose solutions exist. Of the available solutions, it is Scali Manage (open http://www.scali.com, then “Products” and “Scali Manage”) that is enacting the sea change through its uptake of standards and implementations in the context of cluster management.
Scali Manage is an industry-leading, commercially developed and supported solution for cluster management. In keeping pace with the meteoric adoption of Linux clustering in a wide range of markets, this mature solution has needed to address the increasingly demanding needs of evaluation/proof-of-concept, distributed/expansive as well as centralized/consolidated deployments. Because numerous organizations are already making their way to utilized/virtualized/grid deployments, Scali Manage has also needed to evolve to address these needs in the marketplace. As a direct consequence, the next-generation version of Scali Manage embraces:
- Web-Based Enterprise Management (WBEM) – WBEM “… is a set of management and Internet standard technologies developed to unify the management of distributed computing environments …”. Under the standards auspices of the Distributed Management Task Force (DMTF), WBEM encapsulates the Common Information Model (CIM): “CIM provides a common definition of management information for systems, networks, applications and services, and allows for vendor extensions. CIM’s common definitions enable vendors to exchange semantically rich management information between systems throughout the network.” And although it isn’t explicitly a Grid standard under the auspices of the Global Grid Forum (GGF), CIM is clearly a point of leverage. Not only is Scali Manage completely aligned with WBEM and implicitly CIM-based (i.e., the configuration database), Scali has extended CIM and intends to share these extensions with the DMTF community. Industry heavyweights Cisco and Microsoft are already making use of WBEM.
- Eclipse platform and frameworks – “Eclipse is an open source community whose projects are focused on providing an extensible development platform and application frameworks for building software …”. Eclipse resonates strongly with the Service Oriented Architecture (SOA) approach and leverages Web and Web Services standards. Scali has made extensive use of Eclipse to develop a rich-client application in the form of a GUI for Scali Manage.
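The idea quoted above for CIM, a common definition of management information that still allows for vendor extensions, can be pictured with a small Python sketch. Everything here is an assumption made for illustration: `ManagedSystem`, `ClusterNode`, `common_view` and their properties are hypothetical stand-ins, not classes from the actual CIM schema and not the extensions Scali has made.

```python
from dataclasses import dataclass


@dataclass
class ManagedSystem:
    """Stand-in for a vendor-neutral common definition (not a real CIM class)."""
    name: str
    status: str = "OK"


@dataclass
class ClusterNode(ManagedSystem):
    """Hypothetical vendor extension that adds cluster-specific properties."""
    cluster: str = ""
    interconnect: str = ""


def common_view(system: ManagedSystem) -> dict:
    """Return only the properties that any consumer of the common model knows."""
    return {"name": system.name, "status": system.status}


node = ClusterNode(name="n001", status="OK", cluster="hpc1", interconnect="InfiniBand")
print(common_view(node))   # a generic tool sees the common properties only
print(node.interconnect)   # an extension-aware tool can read the extras
```

The point of the sketch is that a management tool written against the common definition keeps working when it is handed an extended object, while tools that know about the extension can use the additional properties.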
Use of WBEM and Eclipse in Scali Manage comprises solid evidence of the appearance of Grid standards/implementations in a Cluster Computing context. More specifically, they offer support opposite my inferences of Web Services affinities and implementations opposite standards – with differentiation elements that include extending the standard. More important than validating my inferences, by embracing WBEM and Eclipse as key enablers, Scali Manage is rapidly addressing requirements emerging from utilized/virtualized/grid deployments of clusters.
This is just the tip of the iceberg.