November 28, 2001
CHICAGO, IL -- The wordplay in Robert Grossman's Terra Wide Data Mining Testbed project
title is the first hint at the scale and scope of the datasets he manages.
Tera is the mathematical prefix meaning one trillion; terra is Latin for
the earth. Applied to data, Grossman's Terra Wide project, launched this
month at the SC conference in Denver, Colorado, aims to let researchers
remotely explore globally distributed terabyte datasets in real time.
Grossman and his University of Illinois at Chicago (UIC) colleague Jason
Leigh accessed, correlated and then visualized data generated from a
variety of datasets, including earth science data from the National Center
for Atmospheric Research (NCAR), El Niño data from the National Oceanic and
Atmospheric Administration (NOAA) and cholera data from the World Health
Organization (WHO).
The underlying aim of the technology behind the testbed is to give
scientists a means to mine and correlate datasets from different
organizations and make new discoveries. "Researchers may be able to find a
correlation between global weather patterns and the spread of diseases by
correlating data from NCAR and the WHO," said Grossman.
The demonstration also showcased PC-based clusters called TeraNodes, now
gradually being deployed throughout the world, which will be dedicated to
massive computation, data mining or visualization over national and
international high-performance networks. In coming years, as optical
technology transforms networking capabilities, TeraNodes will become the
building blocks for an optically connected web of data.
The SC demonstration correlated and visualized WHO and NCAR data replicated
onto the testbed. There are TeraNodes in Chicago (at UIC), Amsterdam (at SARA,
Holland's supercomputer center), Halifax (Dalhousie University), Denver
(the SC show floor), London (Imperial College of Science, Technology and
Medicine), Virginia (Virginia Tech and ACCESS DC), Michigan (Internet2),
California (UC Davis) and Pennsylvania (University of Pennsylvania).
Given the large and growing scientific and engineering data resources
available on the web, there is an increasing need for an easy-to-use data web
infrastructure. DataSpace, an open-standards-based system for working with
data over the web, is Grossman's attempt to provide such an infrastructure.
"DataSpace provides a new way for scientists and engineers to work with
each others' data," said Grossman. "If organizations publish their data in
the Dataspace format, many others could potentially make use of it."
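To illustrate the idea, the following Python sketch shows how a researcher
might fetch two independently published datasets over the web and test them
for a correlation, echoing the NCAR/WHO example above. The URLs, column
names, and CSV layout are hypothetical illustrations only; they are not the
actual DataSpace formats or the NCAR and WHO feeds.

    # Sketch: correlate two datasets published at stable web URLs.
    # All URLs and column names below are hypothetical examples.
    import csv
    import io
    import urllib.request


    def fetch_series(url: str, key_col: str, value_col: str) -> dict:
        """Download a published CSV dataset and return {key: value}."""
        with urllib.request.urlopen(url) as resp:
            text = resp.read().decode("utf-8")
        reader = csv.DictReader(io.StringIO(text))
        return {row[key_col]: float(row[value_col]) for row in reader}


    def pearson(xs, ys):
        """Pearson correlation coefficient of two equal-length series."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)


    if __name__ == "__main__":
        # Hypothetical datasets keyed by month, e.g. "2001-11".
        climate = fetch_series("http://example.org/climate.csv",
                               "month", "sst_anomaly")
        disease = fetch_series("http://example.org/cholera.csv",
                               "month", "cases")
        months = sorted(set(climate) & set(disease))
        r = pearson([climate[m] for m in months],
                    [disease[m] for m in months])
        print(f"correlation over {len(months)} months: r = {r:.3f}")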
The Terra Wide Data Mining Testbed is an infrastructure built on top of
DataSpace for remote analysis, distributed data mining, and real-time
exploration of scientific, engineering, defense, business, and other
complex data. Tera mining applications are designed to exploit the
capabilities provided by emerging domestic and international optical
networks so that gigabyte and terabyte datasets can be remotely explored in
real time.
Leigh, a scientific visualization expert from UIC's Electronic
Visualization Laboratory, and Grossman, head of UIC's National Center for
Data Mining, are collaborating to develop such tera mining applications.
Their partnership is a natural extension of their research interests. Both
work with data-intensive, very-high-bandwidth applications that test even
the most advanced networks. Both need to cull specific data from massive
datasets stored in widely distributed facilities. Both are seeking a means
for researchers to accelerate scientific discovery.
The optical Terra Wide Testbed is now being built in parallel with another
UIC-managed project, StarLight℠. StarLight is an advanced optical
infrastructure and proving ground for network services optimized for
high-performance applications, with major funding provided by the National
Science Foundation. It is being developed by UIC's Electronic Visualization
Laboratory, the International Center for Advanced Internet Research (iCAIR)
at Northwestern University, and the Mathematics and Computer Science
Division at Argonne National Laboratory, in partnership with Canada's
CANARIE and Holland's SURFnet.
About EVL
The Electronic Visualization Laboratory at the University of Illinois at
Chicago is the nation's oldest interdisciplinary art and computer science
graduate laboratory offering degrees in electronic visualization. Since
inventing the CAVE® Virtual Reality Theater in 1991, EVL has focused on
the development and deployment of software, hardware, networking and
communications tools in support of collaborative tele-immersive
virtual-reality applications. EVL receives significant funding from the
National Science Foundation to manage projects in support of long-term
interconnection and interoperability of advanced international networking.
About NCDM
The National Center for Data Mining at the University of Illinois at
Chicago was established in 1998 to serve as a national resource for high
performance and distributed data mining. NCDM is a co-founding member of
the Data Mining Group (DMG), which develops the Predictive Model Markup
Language (PMML) and related standards. NCDM runs two data mining testbeds
(the Terabyte Challenge and the Terra Wide Data Mining Testbed) and has an
active outreach program. NCDM is supported by the National Science
Foundation, U.S. Department of Energy, University of Illinois at Chicago,
and its industrial partners.
Contact:
Laura Wolf
Electronic Visualization Laboratory
University of Illinois at Chicago
laura@evl.uic.edu