September 21, 2001
The NSF recently
announced a plan to link computers in four major research
centers with a comprehensive infrastructure called the TeraGrid. The
project will create the world's first multi-site computer facility, the
Distributed Terascale Facility (DTF). SDSC director Fran Berman agreed to
answer some questions for HPCwire concerning the significance of the DTF
project both for SDSC and for the United States' computing infrastructure
as a whole.
HPCwire: How long has the DTF project been in development? How and by whom
was the plan developed?
BERMAN: The DTF project is based on an emerging direction that has come
from the scientific community and which is nicely represented by NSF's
cyberinfrastructure. The principals at SDSC, NCSA, Caltech and Argonne
have all worked together for many years putting forth this vision in
different venues, and TeraGrid gave us the opportunity to join the PACI
programs for this extraordinary collaboration. When we began to put
together the proposal, I had just started as Director of SDSC and NPACI,
and my counterpart from NCSA and long-time colleague, Dan Reed, was
instrumental in making the project a real partnership.
HPCwire: Are the TeraGrid capabilities dedicated solely toward large-scale
research initiatives? What will the TeraGrid mean to the "average"
scientist who uses high-performance computing?
HPCwire: Please elaborate on the announcement that "SDSC will lead the
TeraGrid data and knowledge management effort." What projects will
constitute its prime knowledge management focus? Which industrial partners will be
cooperating? What will be the most concrete long-term benefits?
BERMAN: The next decade in computation is
emerging as the "Data Decade."
One of the critical trends from the last decade has been the immense amount
of data from sensors, instruments, experiments, etc. that is now available
for large-scale applications and is fundamental for the next generation of
scientific discoveries. TeraGrid will provide an aggregate of more than
0.5 petabytes of on-line disk, and SDSC's node will be configured to
be the most effective data-handling platform anywhere. We will also focus
on developing fundamental and advanced data services for the TeraGrid that
will enable application developers and users to leverage the full
potential of TeraGrid's resources.
In particular, SDSC's node of the TeraGrid will include a 4-teraflops
cluster with 2 terabytes of memory and 225 terabytes of disk storage to
support the most data-intensive applications and allow researchers to use
NPACI-developed data grid middleware to manage and analyze the
largest-scale data sets, ranging from astronomy sky surveys and brain imaging
data to collections of biological structure and function data.
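For readers unfamiliar with the NPACI data grid middleware mentioned above, the
following sketch illustrates the general pattern of staging data through the SDSC
Storage Resource Broker command-line clients (the "Scommands"). The collection
paths, file names, and helper functions are hypothetical, and Scommand options
vary by SRB release; this is an illustrative sketch, not TeraGrid production code.

    # Illustrative sketch: staging data through SRB-style command-line clients
    # ("Scommands" such as Sget/Sput). Paths and names below are hypothetical,
    # and exact option syntax differs across SRB releases.
    import subprocess

    def stage_in(srb_object: str, local_path: str) -> None:
        """Copy a data object from an SRB collection to local scratch disk."""
        subprocess.run(["Sget", srb_object, local_path], check=True)

    def stage_out(local_path: str, srb_collection: str) -> None:
        """Register a locally produced result back into an SRB collection."""
        subprocess.run(["Sput", local_path, srb_collection], check=True)

    if __name__ == "__main__":
        # Hypothetical sky-survey image pulled down for analysis, results pushed back.
        stage_in("/home/survey.collection/field_0042.fits", "/scratch/field_0042.fits")
        # ... analysis of the image would run here ...
        stage_out("/scratch/field_0042_catalog.dat", "/home/survey.collection/catalogs")

The appeal of middleware like this is that applications refer to data by logical
collection names rather than by the physical location of the bits, which is what
makes cross-site data management on a grid practical.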
In addition to IBM, Intel, and Qwest, SDSC will be working with Sun
Microsystems to deploy a next-generation Sun server in a data-handling
environment that will support a thousand transactions per second where
each transaction may require moving gigabytes of data.
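To put those figures in rough perspective: a thousand transactions per second,
each moving on the order of a gigabyte at the upper end, would imply aggregate
data movement approaching a terabyte per second (1,000 transactions/s x ~1 GB per
transaction), which helps explain the emphasis on a purpose-built data-handling
environment at SDSC's node.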
HPCwire: Is the DTF itself significantly scalable? To what extent? Are
there currently plans to add centers to the DTF?
BERMAN: The TeraGrid will initially be deployed at four sites: SDSC,
NCSA, Caltech and Argonne National Lab. The plan is to ensure that the
software works at a production level and then to build TeraGrid out. The DTF was
proposed as the cornerstone of a National Grid effort, so it will be important
to be able to add nodes and capabilities in a smooth and effective way.
The scale and success of a National Grid are critical for the science
community, and we will need to ensure that TeraGrid is usable. Members of the
PACI partnerships and many other sites have expressed interest in becoming
nodes on the TeraGrid, which indicates how pervasive the need for a
National Grid effort is.
HPCwire: In terms of both the computing systems being integrated and the
optical network itself, how much existing hardware and technology is being
used and how much is being built from the ground up?
BERMAN: Aside from existing clusters at NCSA, which will be integrated with
the hardware funded by the NSF award, the TeraGrid will consist primarily of new
hardware. We intend to make every effort to use hardware and software that is
or will be industry-standard, off-the-shelf, and open-source. The TeraGrid
compute clusters will use Intel processors and run the open-source Linux
operating system. The Storage-Area Network at SDSC will be built from
off-the-shelf components. The middleware will include Globus and the SDSC
Storage Resource Broker, both of which are deployed at many sites
worldwide and have become the de facto standards in grid computing. And of
course, the many other software components developed by TeraGrid partners
-- for example, schedulers and accounting systems -- will be made
available to the community.
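As one concrete illustration of how that middleware is typically used, the sketch
below shows a GridFTP transfer with globus-url-copy followed by a job submission
with globusrun and an RSL description. The host name, resource contact string,
paths, and RSL attributes are assumptions made for illustration; they are not the
TeraGrid's actual resource names or policies.

    # Illustrative sketch of a Globus-style workflow, assuming the Globus Toolkit
    # clients are installed and a valid proxy credential already exists.
    # Host names, paths, and RSL attributes are hypothetical examples.
    import subprocess

    # Move an input file to a (hypothetical) TeraGrid node via GridFTP.
    subprocess.run([
        "globus-url-copy",
        "file:///home/user/input.dat",
        "gsiftp://tg-node.example.edu/scratch/user/input.dat",
    ], check=True)

    # Submit a simple job through GRAM; the RSL string names the executable,
    # its argument, and a processor count.
    rsl = "&(executable=/usr/local/bin/simulate)(arguments=/scratch/user/input.dat)(count=32)"
    subprocess.run(
        ["globusrun", "-o", "-r", "tg-node.example.edu/jobmanager-pbs", rsl],
        check=True,
    )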
As for networking, the initial 40-gigabit-per-second backbone represents
leading-edge technology, but it's only a matter of time before the
technology becomes widely deployed. The TeraGrid network will connect to
the global research community through Abilene and STAR TAP and to the
California and Illinois research communities via CalRen-2 and I-WIRE.
HPCwire: The TeraGrid will use Linux clusters joined by a cross-country
network backbone. What principal measures will be implemented to manage
and monitor the Linux configuration at various sites?
BERMAN: Both NPACI and the Alliance have considerable experience in
configuration management. The NPACI Rocks clustering toolkit, developed by
SDSC's Phil Papadopoulos and UCB's David Culler, provides a facility to
monitor, manage, update and deploy scalable high-performance Linux clusters in
minutes. Rocks is open source, already available, and currently used by
GriPhyN, Compaq, universities, and more than two dozen other sites for Linux cluster
configuration management. NCSA's recently-announced "in-a-box" software
packaging initiatives are further examples of configuration tools that can
form components of an overall deployment strategy. In addition, SDSC and
NCSA are active participants in the proposed NSF GRIDS Center (led by PACI
partners and PIs Kesselman and Foster), which will be leading the community in
overall grid software stack integration and packaging. The TeraGrid team
understands how important it will be to make the software core available to the larger
scientific community through these and other technology transfer vehicles.
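To give a flavor of what cluster configuration management "in minutes" looks like
in practice, the sketch below uses Rocks-era administration commands such as
cluster-fork (run a command on every compute node) and shoot-node (reinstall a
node from the frontend's description). The node name and package check are
hypothetical, and command names and options differ across Rocks releases; this is
an illustration, not the TeraGrid deployment procedure.

    # Illustrative sketch of Rocks-style cluster administration from the frontend.
    # Command names follow early NPACI Rocks conventions; the node name and
    # package below are hypothetical, and options vary across releases.
    import subprocess

    # Check that every compute node reports the same kernel version.
    subprocess.run(["cluster-fork", "uname -r"], check=True)

    # Verify that a particular RPM is installed cluster-wide.
    subprocess.run(["cluster-fork", "rpm -q openssh-server"], check=True)

    # If a node has drifted from the described configuration, reinstall it from
    # the frontend's kickstart profile rather than repairing it by hand.
    subprocess.run(["shoot-node", "compute-0-12"], check=True)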
HPCwire: Judging by the news releases, strategic administration of the
TeraGrid is as distributed as its resources. How will critical operational
policy directions be determined, and what is your role in that process?
BERMAN: TeraGrid will be administered by a distributed TeraGrid Operations
Center (TOC) composed of staff at SDSC, NCSA, Caltech and
Argonne. The TOC staff will maintain and administer the TeraGrid,
establish operational policies, and provide 24x7 support. All of the
TeraGrid principals will collaborate to coordinate the project. I will
serve as Chair of the TeraGrid Executive Committee, consisting of all the
TeraGrid PIs and co-PIs. This group will work as an ensemble to help
ensure the implementation of the vision we set out for TeraGrid.
HPCwire: Ruzena Bajcsy, NSF assistant director for Computer and
Information Science and Engineering, has stated that "the DTF can lead the
way toward a
ubiquitous 'Cyber-Infrastructure'..." Do you agree that this project is
the first step toward the development of such an infrastructure? What is
the next step? What is your vision of a "ubiquitous Cyber-Infrastructure"?
BERMAN: I do agree. NSF's cyberinfrastructure recognizes the critical
need for a sustainable national infrastructure that combines computing,
communication and storage technologies into an extensible software
substrate fundamental to advances in science and technology. The TeraGrid
forms a critical foundation for this infrastructure.
The next step involves growing the TeraGrid out into a true National Grid.
This will require a serious commitment to a sustainable and persistent human
infrastructure to ensure smooth operation, and to the development of the
services and policies needed to make a national grid infrastructure
operational and truly usable by the science and engineering community.
I believe that the ultimate target for the next decade is not a TeraGrid
but a "PetaGrid" -- which adds to the TeraGrid additional grids (e.g. the
IPG, the
DOE Science Grid, the EU Grid, etc.) as well as the emerging
infrastructure of low-level devices (sensornets, PDAs, wireless networks,
etc.). The PetaGrid will enable us to go from sensor to supercomputer and will
bring forward a new generation of applications, including individualized
medicine and real-time disaster response. We are building such a "PetaGrid"
prototype between SDSC and Cal-IT2 at UCSD. Our TeraGrid efforts will be a
critical part of this vision.
HPCwire: Does the DTF, in fact, constitute a de facto push by the NSF
toward virtual unification of SDSC and NCSA?
BERMAN: Both PACI partnerships pursue a common goal -- deploying an
infrastructure to meet the ever-increasing demands of the national
academic community for high-end information technology resources. To reach
this goal, each partnership focuses on unique development issues and
application areas, which maximizes the impact of the PACI program as a
whole. Simultaneously, however, NPACI and the Alliance collaborate to
ensure that, in the end, the nation will have a unified, robust, and
scalable infrastructure. TeraGrid provides an opportunity for a partnering
of partnerships where each PACI partnership can play a critical role and
lead in complementary areas.
HPCwire: How would you characterize your leadership of SDSC? How does it
differ from that of your predecessor, Sid Karin? What are your greatest
challenges at this time, and how are you dealing with them?
BERMAN: I've been at SDSC/NPACI a little over 6 months and it has been an
exciting time for all of us. My leadership style is team-oriented and the
whole center has been involved in a visioning and strategic planning
process that is almost complete now. Sid has been a terrific "Director
Emeritus" and has been very helpful to me. My backround is more
Grid-oriented than Sid's and I think I bring that focus to the center. In
addition, I have worked for many years with multi-disciplinary application
teams and am interested in reinforcing SDSC's user-oriented focus.
It's tempting to approach this job as a "super-PI" rather than a Director,
and one of my biggest challenges has been to approach things as a Director
and to work effectively with a large-scale management infrastructure. Time
management is also an immense challenge as we are involved in a huge
number of exciting projects. I'm having a great time though and have
incredible admiration and respect for the outstanding staff and
researchers at SDSC and NPACI.