Teraflow Testbed: High Performance Flows for Large Distributed Data Archives

The Teraflow Project is developing data mining middleware to transport, explore, and mine high-volume data flows. It supports the development of several tools and applications, including UDT for high-volume data transport, SOAP* for high-performance web services, and applications built over UDT, SOAP*, and related tools in domains such as astronomy, bioinformatics, and sensor networks.
The project also operates the Teraflow Testbed, an international application testbed for exploring, analyzing, integrating, and detecting changes in massive, distributed data over wide-area high-performance networks. The Teraflow Testbed has nodes in Chicago, Kingston, Amsterdam, Geneva, Daejeon, and Tokyo, connected by 1Gbps and 10Gbps wide-area networks. It is currently used to distribute the Sloan Digital Sky Survey data to researchers worldwide, and in experiments to detect changes in high-volume data flows.
The Teraflow Testbed 2, introduced at SC 2006, will soon be extended from Chicago to UCSD/Calit2 over CAVEwave. Teraflow Testbed 2 will support persistent data services for moving large scientific data sets (Sector); persistent data services for real-time analysis of distributed streaming data (Angle); and next-generation distributed storage, data, and integration services. It will have several dedicated paths using fiber from NLR, as well as several shared optical paths to other sites.
Using a subset of this Testbed (illustrated here), NCDM won the SC06 Bandwidth Challenge. Its entry, “Transporting Sloan Digital Sky Survey Data using SECTOR,” sustained a disk-to-disk data-transfer rate of 8Gbps over a shared 10Gbps routed link connecting SC06 (Tampa), UIC, and StarLight, with a peak rate of 9.18Gbps. StarLight network engineers provided substantial assistance.
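To put the Bandwidth Challenge figures in perspective, the following minimal sketch works through the arithmetic. The three rates are the ones quoted above; the 1 TB data set size is a hypothetical value chosen only for illustration and is not the size of the data actually transferred. The function names are likewise illustrative.

```python
# Rates quoted in the text (Gbps = 10**9 bits per second).
LINK_CAPACITY_GBPS = 10.0
SUSTAINED_GBPS = 8.0
PEAK_GBPS = 9.18

# HYPOTHETICAL data set size (decimal terabytes), for illustration only.
EXAMPLE_SIZE_TB = 1.0


def utilization(rate_gbps, capacity_gbps=LINK_CAPACITY_GBPS):
    """Return the fraction of link capacity used at the given rate."""
    return rate_gbps / capacity_gbps


def transfer_seconds(size_tb, rate_gbps):
    """Return the seconds needed to move size_tb terabytes at rate_gbps."""
    bits = size_tb * 1e12 * 8          # terabytes -> bits
    return bits / (rate_gbps * 1e9)    # bits / (bits per second)


if __name__ == "__main__":
    print(f"sustained utilization: {utilization(SUSTAINED_GBPS):.1%}")
    print(f"peak utilization:      {utilization(PEAK_GBPS):.1%}")
    secs = transfer_seconds(EXAMPLE_SIZE_TB, SUSTAINED_GBPS)
    print(f"{EXAMPLE_SIZE_TB} TB at sustained rate: {secs:.0f} s")
```

At the sustained rate, the link runs at 80% of its nominal capacity, and the hypothetical 1 TB data set would take about 1000 seconds to move disk to disk.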
* Supported by NSF OCI-0430781 for the period October 1, 2004 – September 30, 2007, principal investigators: Robert Grossman (University of Illinois at Chicago) and Alex Szalay (Johns Hopkins University).

URLs:

www.ncdm.uic.edu
www.teraflowtestbed.net

Collaborators:

USA:
University of Illinois at Chicago, National Center for Data Mining
Johns Hopkins University
University of California, San Diego
NASA Goddard Space Flight Center

Australia:
University of Melbourne

China:
Chinese Academy of Sciences (CAS), Computer Network Information Center
National Astronomical Observatories

Germany:
Max-Planck-Institut für Plasmaphysik, Garching Computing Centre

Japan:
University of Tokyo, Institute for Cosmic Ray Research

Republic of Korea:
Korea Astronomy and Space Science Institute
Korea Institute of Science and Technology Information

Netherlands:
SARA Computing and Networking Services
University of Amsterdam

With support from StarLight (US); TransPAC2 (US); JGN2 (Japan); KREONet2 (Korea)