November 13, 1998
ORLANDO, FL - Supercomputing '98 -- Using the Transatlantic Metacomputing
Testbed, in operation since June 1997 between the High Performance Computing
Center at Stuttgart University (HLRS) and the Pittsburgh Supercomputing
Center (PSC), researchers have run an application on two coupled
512-processor CRAY T3Es with performance equivalent to a single
1,024-processor machine. This is a significant advance in the emerging technology
of metacomputing, say scientists at the two centers.
This testbed, the first successful prototype for transatlantic
metacomputing -- linking two separate supercomputers to work together on the
same job -- uses high-performance research networks in the United States,
Canada and Germany to couple a 512-processor T3E at PSC with another at HLRS.
The researchers achieved a peak data transfer rate of 10 megabits per
second (Mbps) for a molecular dynamics application that simulates granular
particles. This is five times faster than the same application achieved a
year ago at Supercomputing '97, when it used the Pittsburgh-Stuttgart testbed
to simulate 1.75 million particles of granular material, the largest such simulation
ever done. This year, even the sustained transfer rate (4 Mbps) is double
the peak rate a year ago, say the researchers. "Seven years ago, systems
couldn't transfer data internally among processors this fast," said Alfred
Geiger, head of HLRS. "Now we're doing it across the ocean."
While improved bandwidth is part of the milestone speedup, the biggest
part, say the researchers, is better communications processing and especially
"latency hiding." Latency is time used for processors to "shake hands" --
open communications with each other -- and by careful algorithm engineering,
to overlap communications with computation, the researchers made it
unnoticeable for certain problems. For the granular particles simulation,
communications overhead was only 5 percent, says Matthias Mueller, a
scientist at the University of Stuttgart's Institute for Computer
Applications: "This is essentially the same overhead we'd have with a
512-processor T3E."
These methods have broad applications, says Sergiu Sanielevici, PSC manager
of parallel applications. "There's a hierarchy between fast-access memory,
such as a processor cache, that has limited storage, and other slower,
higher-storage memory. This kind of optimization will benefit many other
projects, such as clusters, that must deal with the situation that not all
memory is created equal."
The metacomputing application relied on PACX-MPI (PArallel Computer
eXtension), a library of communications routines developed by an HLRS team
led by Michael Resch that distinguishes between internal and external
communication. To reduce the latency of contacting processors on another
machine, two processors on each 512-processor T3E handle the external
communication. "This can be extended," said Resch, "from two T3Es to any
number heterogeneous systems."
A similar success was achieved by the HLRS-PSC testbed with another
application, a Navier-Stokes solver called URANUS (Upwind Relaxation
Algorithm for Nonequilibrium Flows of Stuttgart University), which simulates
reentry of a space vehicle over a wide altitude-velocity range. During
SC '98, a simulation of the European space vehicle HERMES ran with
communications overhead under 10 percent.
Granular materials are ubiquitous in nature, industrial processing and
everyday life, and the simulation technology tested at SC '98 has many
applications. Examples include crack propagation in concrete and storage and
shipment of foodstuffs such as grains, flour and sugar. Since early in this
century, researchers have studied these materials to improve industrial
processes, but fundamental questions remain. Scientists still lack a detailed
theoretical understanding, for instance, of how pipes carrying granular
materials become clogged or why grains of different size separate and gather
in band-like patterns.
"Large-scale computation is the only way to deepen our insight," said
Mueller. "One key problem is to understand the intermittent nature of the
complex force network that keeps granulate packings stable. Breakdowns under
load can give rise to silo failure, coffee powder spilling on the kitchen
floor and earthquakes."
The Pittsburgh Supercomputing Center is a joint effort of Carnegie Mellon
University and the University of Pittsburgh together with Westinghouse
Electric Company. It was established in 1986 and is supported by several
federal agencies, the Commonwealth of Pennsylvania and private industry.
The High-Performance Computing Center Stuttgart offers supercomputing
services to academic users in Germany. Together with debis Systemhaus,
Porsche and the University of Karlsruhe, it operates supercomputers within
the framework of the joint company hww (High Performance Computing for
Science and Industry).