June 7, 2002

Researchers Achieve Production Grid Breakthrough

Physics researchers have carried out the first production-quality simulated data generation on a data Grid, comprising sites at Caltech, Fermilab, the University of California, San Diego, the University of Florida, and the University of Wisconsin-Madison.

“This achievement represents an extremely challenging and important milestone in the integration of Grid middleware components within the current ‘real world’ LHC computing environment,” the researchers announced.

Doug Olson of Lawrence Berkeley National Laboratory and the Particle Physics Data Grid said it has been “decided that a worldwide Grid environment is required and will be used for the computing work of the physics experiments at the LHC,” the Large Hadron Collider at CERN in Switzerland. Technical details of the worldwide Grid are still being worked out, he said.

Globus Project co-leader Ian Foster called the work “a major achievement in terms of production Grid computing.”

The work was done by members of the U.S. Compact Muon Solenoid Collaboration (CMS) in concert with the Particle Physics Data Grid, the Grid Physics Network, and the International Virtual Data Grid Laboratory, and was funded by the U.S. Department of Energy, the National Science Foundation and the EU-DataGrid project, among others.

The deployed data Grid serves as an integration framework, with Grid middleware components brought together to form the basis for distributed CMS Monte Carlo Production (CMS-MOP) and used to produce data for the global CMS physics program, the researchers said. The middleware components include Condor-G, DAGMan, GDMP, and the Globus Toolkit, packaged together in the first release of the Virtual Data Toolkit.
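DAGMan manages production jobs as a directed acyclic graph: a job runs only after all of its parent jobs have finished. That ordering idea can be sketched in a few lines of Python (the job names and dependency chain here are invented for illustration; real workflows are expressed in DAGMan's own submit-file syntax):

```python
from graphlib import TopologicalSorter  # Python 3.9+ standard library

# Hypothetical production workflow: each key is a job, each value the set
# of jobs that must finish first (DAGMan's PARENT/CHILD relation expressed
# as a predecessor map).
workflow = {
    "generate":  set(),            # generate simulated events
    "simulate":  {"generate"},     # run the detector simulation
    "publish":   {"simulate"},     # publish data back to the Tier-1 center
    "replicate": {"publish"},      # replicate to selected Tier-2 sites
}

order = list(TopologicalSorter(workflow).static_order())
print(order)  # a valid execution order respecting every dependency
```

For this linear chain the only valid order is generate, simulate, publish, replicate; with a wider graph, independent jobs could be dispatched to different sites in parallel.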

The CMS-MOP distributed production system employs a tiered hierarchy in which a production manager at a Tier-1 center distributes production jobs to several remote Tier-2 sites, they said. Once generated at the Tier-2 sites, the simulated data is automatically published back to the Tier-1 center as well as replicated to selected Tier-2 sites.
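The distribution-and-publish-back pattern can be sketched as follows (a toy model with invented function names; the real CMS-MOP system does this through Condor-G job submission and GDMP replication, not this code):

```python
import itertools

# The five Tier-2 sites named in the article.
tier2_sites = ["Caltech", "Fermilab", "UCSD", "Florida", "Wisconsin"]

def distribute(jobs, sites):
    """Round-robin production jobs from the Tier-1 manager to Tier-2 sites."""
    assignment = {site: [] for site in sites}
    for job, site in zip(jobs, itertools.cycle(sites)):
        assignment[site].append(job)
    return assignment

def publish_back(assignment):
    """Each Tier-2 site 'publishes' its finished data back to a Tier-1 catalog."""
    tier1_catalog = []
    for site, jobs in assignment.items():
        tier1_catalog.extend((site, job) for job in jobs)
    return tier1_catalog

jobs = [f"job-{i}" for i in range(10)]
assignment = distribute(jobs, tier2_sites)
catalog = publish_back(assignment)
print(len(catalog))  # all 10 jobs accounted for at Tier-1
```

The sketch captures only the bookkeeping; in production, each "job" is a Monte Carlo simulation run and the publish-back step moves real datasets across the wide-area network.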

“This integration exercise showed that the Grid still presents significant challenges in harnessing distributed resources,” the researchers said. Data-handling and security issues had to be overcome: how to distribute software and data to many remote systems, verify that they arrived intact, and retrieve the results.

Issues of heterogeneity and error recovery also had to be addressed, they said. “To use other sites’ resources, you need to interface with many batch systems; the Grid means more errors, more crashes, more mysterious failures,” they wrote. The team had to handle unanticipated failures: key machines crashing mid-run, Grid credentials expiring mid-run, jobs completing successfully but losing their results before they could be returned, middleware behaving unexpectedly, and network outages.
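A production system facing failures like these needs to resubmit jobs and verify their output rather than trust a single run. A minimal sketch of that retry discipline, with hypothetical function names (the article does not describe CMS-MOP's actual recovery code):

```python
def run_with_retries(job, submit, verify, max_attempts=3):
    """Resubmit a job until it completes AND its output verifies, or give up."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = submit(job)   # may raise: crashed node, expired credentials...
            if verify(result):     # results can be lost before they are returned
                return result
        except RuntimeError:
            pass                   # swallow the failure and resubmit
    raise RuntimeError(f"{job} failed after {max_attempts} attempts")

# Deterministic stand-in for a flaky Grid submission: fails twice, then succeeds.
attempts = {"n": 0}
def flaky_submit(job):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("node crashed mid-run")
    return f"{job}-output"

result = run_with_retries("job-42", flaky_submit, verify=lambda r: r is not None)
print(result)  # succeeds on the third attempt
```

Separating submission from verification matters here: a job that "successfully completes" but whose results never arrive counts as a failure and is retried.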

“Despite these challenges, over 50,000 proton-proton collision events inside the CMS detector have been simulated using CMS-MOP and validated for use by CMS physicists,” the researchers said. Production of another 150,000 simulated events is underway.

Copyright 1993-2002 HPCwire. Redistribution of this article is forbidden by law without the expressed written consent of the publisher. For a free trial subscription to HPCwire, send e-mail to: trial@hpcwire.com