April 19, 1999
The name of the game is connectivity. And for high-performance networks that
increase available bandwidth without breaking the bank, ATM has long been
recognized as a solid choice. The technology is built on the promise of QoS
(Quality of Service) and support for voice, video and data--and, when used
in the right environment, ATM has delivered. Not only have large corporations
bought into ATM for enterprise solutions, but in 1995, the National Science
Foundation sponsored the very-high-speed Backbone Network Service (vBNS) to
interconnect the U.S. research and education communities. Implemented and
operated by MCI, vBNS has successfully provided IPv4 transport services over
an OC-12 ATM backbone connecting 11 POPs (points of presence) and serving 71
site connections.
Recently, the NSF raised the stakes. In 1997, it issued a request for
solutions that could extend the interconnectivity of vBNS over the Internet
to international research and education networks, such as APAN (Asian Pacific
Advanced Network), whose members include Australia, Hong Kong, Indonesia,
Japan, Korea, Malaysia, Singapore and Thailand (www.apan.net). In October
1998, the NSF formally approved a proposal from Indiana University
(see "Network Operations Organizational Structure",
below). Work on that trans-Pacific link began many months in advance of that
approval, however, and is now largely completed.
As members of an Indiana University team that helped design and build that network,
we joined forces with groups from Ameritech Advanced Data Services, Argonne National
Labs, APAN, AT&T, Australian National University, Japan Science and Technology Corp.,
Kokusai Denshin Denwa (KDD), Korea Telecom, Korea Advanced Institute for Science and
Technology, and the National University of Singapore to construct the ATM WAN that
extended from Chicago to Tokyo. And while it might seem that building a global network
should be as simple as connecting switches and routers over the Internet, building the
actual ATM WAN connection was fraught with obstacles, not the least of which was
overcoming differences of time, distance and language.
For example, teams from AT&T and KDD completed the first round of circuit
testing in July and, after proclaiming the link clean, handed the process
over to us. Much to our dismay, we experienced extreme packet loss during
ping tests between the routers at each end of the link. We reasoned that
the problem must lie in our equipment and/or its configuration.
We checked whether the routers were shaping traffic more loosely than the
telco service expected, forcing the telco traffic-policing mechanism to
drop cells. This was not the culprit. And we ruled out the possibility
that IP-over-ATM encapsulation was set differently at the ends of the link
(aal5snap versus vcmux), but each ruled-out cause narrowed the search. We
eventually discovered that payload scrambling was enabled on one side of the
local DS-3 link and disabled on the other.
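A quick sanity check for this class of problem is to compare the critical PVC
parameters at each end before running traffic. The Python sketch below (with
made-up parameter names and values, not any vendor's CLI output) illustrates
the idea:

# Hypothetical end-to-end sanity check for an IP-over-ATM PVC.
# Parameter names and values are illustrative, not vendor CLI output.
chicago = {"encapsulation": "aal5snap", "ds3_scrambling": True,  "shaping_pcr_mbps": 35}
tokyo   = {"encapsulation": "aal5snap", "ds3_scrambling": False, "shaping_pcr_mbps": 35}

def compare_endpoints(a, b):
    """Report any parameter that must match on both ends of the link."""
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            print(f"MISMATCH {key}: {a.get(key)} vs {b.get(key)}")

compare_endpoints(chicago, tokyo)   # flags the ds3_scrambling mismatch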
However, despite resolving this problem, we still encountered a 1 percent to
2 percent packet loss. Using router-to-router pings, we narrowed the packet
loss to the vBNS router that interconnects the trans-Pacific network
infrastructure and Indiana University. The university was connected via
a VC (Virtual Circuit) on one interface while the trans-Pacific network
was similarly connected but to a different interface. When MCI moved the
connections to the same interface, the problem evaporated, but as of this
writing we were still trying to figure out why the old configuration didn't work.
Since many of the challenges we faced can occur during construction of
any ATM WAN, we outline them here and discuss the solutions we developed
while building the connection, which we call the Trans-Pacific Advanced Connection
(see "Bonus Points: The ABCs of TransPAC", below).
The Object of the Game
We based our TransPAC network on a 35-Mbps VBR-nrt ATM service provisioned
as a single PVC (Permanent Virtual Circuit), extending from the ATM-based
exchange point in Chicago, called the Science Technology and Research Transit
Access Point, or STAR TAP (see "Bonus Points: STAR TAP", below), to the Tokyo
APAN XP (exchange point). Additional bandwidth should become
available once carriers upgrade the trans-Pacific infrastructure. Routers
in Chicago and Tokyo provide Layer 3 IP services.
Several factors played into our decision to use ATM as opposed to,
say, DS-3 service. First, we were working with a limited budget and the
ATM VBR (variable bit rate) service was much less expensive than equivalent
dedicated bandwidth. Second, we needed to be able to carve out several
discrete pipes. For example, researchers on both sides of the Pacific
are interested in collaborating on wide-area native IPv6 trials, but
tunneling IPv6 in IPv4--the primary transport service offered over TransPAC
--was not an option, since this would require end systems to run IPv4 and
IPv6 stacks and would result in encapsulation overhead. Using ATM allowed
us to provision a PVC to connect IPv6 routers on either side of the link.
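The overhead half of that argument is easy to quantify: tunneling IPv6 in IPv4
adds a 20-byte outer IPv4 header to every packet. A rough back-of-the-envelope
sketch:

# Per-packet overhead of IPv6-in-IPv4 tunneling (20-byte outer IPv4 header),
# relative to the original packet size.
IPV4_HEADER = 20  # bytes

for packet_size in (64, 576, 1500):      # typical packet sizes in bytes
    overhead = IPV4_HEADER / packet_size * 100
    print(f"{packet_size:5d}-byte packet: ~{overhead:.1f}% encapsulation overhead")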
Setting Up the Board
Most applications running over the TransPAC link will run over IP. Although
IP provides universal connectivity among
heterogeneous systems and interconnecting networks, using IP over a
long-distance link, such as ours, and ensuring that packet forwarding
conforms to the vBNS acceptable-use policy present a number of interesting challenges.
For example, the speed of light tends to slow IP connections. The
distance between Chicago and Tokyo is about 10,000 kilometers, or 10
million meters. The speed of light is about 300 million meters per second,
so the delay between Chicago and Tokyo (commonly known as propagation delay)
is about 10,000,000/300,000,000, or 33 milliseconds. Other factors compound
the delay: Light travels through fiber at only about two-thirds of its velocity
in a vacuum, and router queuing delay and various other delays in the
carrier's cloud must be considered. Taken together, the delay becomes significant
--in our case, it measures 100 ms one way. If a network node in Chicago starts
sending packets to Tokyo at the rate of 35 Mbps (the maximum rate supported by
TransPAC), the Chicago node would transmit roughly 3.5 megabits--about 440,000
bytes--before the first packet arrived in Tokyo. The number of bytes in transit
on a link is referred to as the bandwidth-delay product.
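The arithmetic is simple enough to capture in a few lines; this sketch just
restates the numbers given above:

# Back-of-the-envelope bandwidth-delay product for the TransPAC link.
bandwidth_bps = 35e6      # 35 Mbps, the maximum rate supported by TransPAC
one_way_delay_s = 0.100   # ~100 ms one-way delay, as measured

bdp_bits = bandwidth_bps * one_way_delay_s
print(f"Bytes in flight one way: {bdp_bits / 8:,.0f}")   # ~437,500 bytes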
TCP, the mainstay for providing error-free reliable communications over the
Internet, is adversely affected by high bandwidth-delay product links.
Through the efforts of Van Jacobson and others, TCP has gained a sophisticated
congestion-control mechanism that regulates the rate at which data is transmitted
over a given application connection. TCP attempts to adapt its transmission rate to
available capacity, both as the connection starts its initial transmission and
in the presence of network congestion as detected by packet loss. Communications
links with a high bandwidth-delay product, like TransPAC, hamper TCP's feedback
mechanism and typically impinge on an application's ability to ramp up transmission
rates on startup or recover from congestion (seen as packet loss). Although most
vendors have implemented modern TCP options, such as the RFC 1323 window-scaling
extension, that improve performance over high bandwidth-delay product paths, we
could not assume that all end systems using this network had been upgraded, and
thus discarded this as a possible workaround.
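To see why the defaults fall short, compare the window a TCP sender needs to
keep this link full against the classic 64-KB window limit of unextended TCP
stacks. The sketch below assumes a 200-ms round-trip time (twice the 100-ms
one-way delay):

# Window needed to fill the pipe: bandwidth x round-trip time.
bandwidth_bps = 35e6          # TransPAC maximum rate
rtt_s = 0.200                 # assumed RTT: twice the ~100 ms one-way delay

needed_window_bytes = bandwidth_bps * rtt_s / 8
print(f"Window needed to fill the link: ~{needed_window_bytes / 1024:,.0f} KB")
print(f"A classic 64-KB TCP window fills only "
      f"{64 * 1024 / needed_window_bytes:.0%} of the pipe")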
To combat the side effects of high bandwidth-delay product and minimize
packet loss, TransPAC includes a Layer 3 buffering device (a router)
between the TransPAC network and the STAR TAP ATM switch. TransPAC also
provides support for end users who need to understand these issues and
tune their applications and workstations to perform better over similar
long-distance links.
Rules Are Rules
There are two paths between APAN and the United States--one is through
the commercial Internet, the other is over TransPAC. Some APAN institutions
do not meet TransPAC's AUP (acceptable-use policy), which governs the
traffic that may transit the link, and any U.S.-bound traffic from these
sites should take the commercial Internet path. AUP-compliant APAN sites
should use the TransPAC link. Sounds easy, but there's a rub: While router
vendors seem to have perfected the art of routing packets fast, they still
have work to do in the area of efficiently routing packets in a manner
conforming to desired administrative policies.
Because routers typically forward packets solely according to destination
address, they cannot distinguish AUP-compliant traffic from other traffic.
Under destination-based routing, all sites at one end of the link would
take the shortest path to the other--which, in our case, would inevitably
be over TransPAC, since its metric is shorter than that for the commercial
Internet. When there are multiple paths from point "A" to point "B,"
destination-based IP routing has a hard time doing the "right thing," from
a policy standpoint. This type of routing dilemma is commonly called the
fish-routing problem (see "Bonus Points: A Fish Called WANda", below).
Fortunately, Cisco has a flexible set of knobs (user-controllable parameters),
which it dubs route-maps, that allow the router to be configured to route
packets based on both destination and source address, a process known as
explicit routing. To implement this function, a table is constructed that
identifies all the source-destination pairs that require special handling.
Armed with the table, the router can make the appropriate routing decision.
But though route-maps provide a workable remedy, they're rather slow, and
the longer the list of source-destination pairs, the slower the route-map.
Suppose you have 50 Asian sites connecting with 100 vBNS sites. You'd end
up with a route-map list of 5,000 lines. That's too long.
We pondered several possible solutions, ranging from dynamically updating
the list of route-maps so that only currently in-use applications would be
represented, to monitoring the link to determine the most-used route-map
source-destination pairs and moving them to the top of the list (testing
revealed that the router searched the list sequentially). We finally
settled on a different approach: If we could keep the route-map list short,
the problem would be manageable. Rather than define source-destination pairs,
we used the route-map to identify packets coming from institutions permitted
to use TransPAC, regardless of their destination. This trimmed the list to just
one line per TransPAC authorized institution. Packets from these authorized
institutions were routed to a second router. The second router had a different
forwarding view of the network, one that used TransPAC for all packets destined
for STAR TAP-connected sites in the United States.
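To make the mechanics concrete, here is a minimal sketch of the classification
step in Python. The prefixes are invented for illustration; the point is that
matching on source alone keeps the policy list to one entry per authorized
institution:

import ipaddress

# Hypothetical list of AUP-compliant (TransPAC-authorized) source prefixes --
# one entry per institution, which keeps the policy list short.
AUTHORIZED_SOURCES = [ipaddress.ip_network(p)
                      for p in ("192.0.2.0/24", "198.51.100.0/24")]

def next_hop(src_ip):
    """Source-based classification: authorized sources go to the second
    router (whose forwarding view prefers TransPAC); everything else
    follows the normal destination-based path."""
    src = ipaddress.ip_address(src_ip)
    if any(src in net for net in AUTHORIZED_SOURCES):
        return "second router (TransPAC view)"
    return "default path (commercial Internet)"

print(next_hop("192.0.2.17"))     # -> second router (TransPAC view)
print(next_hop("203.0.113.5"))    # -> default path (commercial Internet)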
Equipment and Game Pieces
The high-performance networks interconnected by an ATM WAN link frequently
operate at bandwidths that exceed the bandwidth of the WAN link itself. Hence,
there is a real risk that the link will become saturated. To avoid this, during
threatening periods, certain applications should receive priority over others.
Given the lack of maturity (and, more important, the lack of user-friendliness)
of client-side RSVP implementations, as well as a general sense that transit
links such as TransPAC are inappropriate places to deploy stateful reservation
mechanisms, we plan to implement a differentiated services model for TransPAC soon.
In accordance with statements of direction from several vendors, TransPAC plans
to implement four service classes. We are testing weighted fair queuing (WFQ) to
implement this priority scheme. WFQ examines the IP precedence bits in the IP
header to sort packets into several queues. These queues are then serviced in a
round-robin fashion, with higher-priority queues receiving a longer service
interval. Depending on the implementation, fair-queuing may be in effect within
each of these queues. So, for example, if three 10-Mbps streams traverse the
high-priority queue, they would each receive 33 percent of the bandwidth that's
available to that queue.
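As a rough illustration of that service discipline, the following sketch (our
own simplification, not any vendor's scheduler) drains four queues in
proportion to their weights:

from collections import deque

# Simplified weighted round-robin: each pass, a queue may send up to
# 'weight' packets. This is an illustration, not a vendor implementation.
queues = {1: deque(), 2: deque(), 4: deque(), 8: deque()}   # weight -> queue

def service_one_round(queues):
    sent = []
    for weight, q in sorted(queues.items()):
        for _ in range(weight):
            if q:
                sent.append(q.popleft())
    return sent

# Load each queue with ten packets tagged by its weight, then run one round.
for weight, q in queues.items():
    q.extend(f"w{weight}-pkt{i}" for i in range(10))
print(service_one_round(queues))   # 1 + 2 + 4 + 8 = 15 packets per round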
All this is well and good, and sounds like it's exactly what the doctor ordered.
But while several vendors offer WFQ on campus routers with 100BASE-X interfaces,
no vendors that we have talked with as of this writing offer it on routers that
support both BGP and ATM interfaces. This is something of a paradox, since a
popular model within the IETF places differentiated services in the core of
wide-area networks (where BGP and ATM are currently mainstays) and RSVP at
the end points. Regardless, all the vendors we have spoken with assured us
that this feature is on their product road maps. Our sense is that this
functionality should be available sometime this quarter.
While not immediately applicable to TransPAC, we tested one 100BASE-X WFQ
implementation (Cisco's Catalyst 8510 Layer 3 switch) to determine if there
are some generalizations that we could make to characterize WFQ's behavior.
Using a Netcom Systems Smartbits SMB-2000 traffic generator/analyzer, we
determined empirically that the worst-case bandwidth available to a given
stream can be determined by the following generalized equality:
BWS=(WS*L)/(WT*MS)--where BWS equals worst-case stream bandwidth;
WS equals the weighted round-robin weight for the stream; L equals line bandwidth;
WT equals the total weighted round-robin weight; and MS equals the stream
multiplier (the number of streams sharing the queue).
So, for example, the 8510 uses four queues per interface, which, by default,
receive weights of 1, 2, 4 and 8, respectively. Queue weights are adjustable,
but they must add up to 15. Packets are mapped to queues based on their IP
precedence bits according to the following table:
Precedence    Queue Weight
0-1           1
2-3           2
4-5           4
6-7           8
For two packet streams with an IP precedence of 5: WS=4, L=100 Mbps (10^8 bps),
WT=15 and MS=2. This gives a worst-case bandwidth for each stream of
(4*10^8)/(15*2), or about 13.3 Mbps.
Researchers who use the TransPAC link need to be aware of the resources
available to their applications, so a generalized equality like this one
will come in handy in the future.
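The equality is easy to apply in practice; the sketch below reproduces the
Catalyst 8510 example given above:

def worst_case_stream_bandwidth(ws, line_bps, wt, ms):
    """BWS = (WS * L) / (WT * MS): worst-case bandwidth for one stream,
    where ws is the stream's queue weight, line_bps the line rate,
    wt the sum of all queue weights and ms the number of streams
    sharing the queue."""
    return (ws * line_bps) / (wt * ms)

# Two streams at IP precedence 5 on a 100-Mbps line (queue weight 4 of 15):
bws = worst_case_stream_bandwidth(ws=4, line_bps=100e6, wt=15, ms=2)
print(f"~{bws / 1e6:.1f} Mbps per stream")   # ~13.3 Mbps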
In lieu of per-VC WFQ, we are left with another mechanism that was really
designed as a tool to help avoid congestion. WRED (Weighted Random Early
Detection) provides a means through which traffic can be selectively dropped.
This is helpful if your environment carries traffic, such as TCP, that
responds to dropped packets by slowing down.
The "weighted random" part of WRED indicates that packets are discarded at
random, with drop probabilities weighted by their IP precedence bits. "Early
detection" means
that this selection is done well in advance of buffer exhaustion. The
idea is that rather than allowing buffers to fill completely and then
be forced to drop all traffic non-selectively, you begin to drop
selectively before buffers are full. The advantage is that many traffic
flows are not simultaneously affected and, in the case of TCP, do not
all enter their slow start phases and begin to ramp up at the same time
--an effect called "global synchronization." In environments where RED
is not implemented, it is common to see waves of congestion as end systems
slow down and speed up their transmissions in unison. WRED is not a
substitute for WFQ, however; indeed, the two should complement each other.
But given our current options, we feel WRED is better than nothing.
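Conceptually, the drop decision looks something like the sketch below, a
simplification of the RED algorithm with a per-precedence minimum threshold.
The thresholds and maximum drop probability are invented for illustration:

import random

# Simplified WRED drop decision. Higher IP precedence gets a higher
# minimum threshold, so lower-priority traffic starts being dropped
# first as the average queue depth grows. Values are illustrative only.
MAX_THRESHOLD = 100           # packets of average queue depth
MAX_DROP_PROBABILITY = 0.10

def min_threshold(precedence):
    return 20 + 10 * precedence   # hypothetical per-precedence minimum

def should_drop(avg_queue_depth, precedence):
    lo, hi = min_threshold(precedence), MAX_THRESHOLD
    if avg_queue_depth < lo:
        return False              # below the minimum: never drop early
    if avg_queue_depth >= hi:
        return True               # queue effectively full: always drop
    # Drop probability rises linearly between the two thresholds.
    p = MAX_DROP_PROBABILITY * (avg_queue_depth - lo) / (hi - lo)
    return random.random() < p

print(should_drop(avg_queue_depth=60, precedence=0))  # ~5 percent chance of a drop
print(should_drop(avg_queue_depth=60, precedence=7))  # False: below its minimum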
Bonus Points: STAR TAP
One piece of the High-Performance International Internet Service
(HPIIS) architecture as envisioned by the National Science Foundation
is a Layer 2 "meet me" point at which research and education (R&E)
connections, such as Indiana University's Trans-Pacific Advanced Connection
(TransPAC), enter the United States. The Science Technology and Research
Transit Access Point (STAR TAP) is an ATM-based exchange point in Chicago
that serves this purpose (www.startap.net).
Although the primary purpose of TransPAC is to interconnect the Asian-Pacific
Advanced Network (APAN) and the very-high-speed Backbone Network Service (vBNS),
this connection also gives APAN access to the many other high-performance R&E
networks that peer at the STAR TAP. Some of these networks are CANARIE (Canada),
NREN (NASA) and ESnet (U.S. Department of Energy), as well as other HPIIS networks
soon to arrive, such as MirNET (Russia).
Bonus Points: A Fish Called WANda
The fish-routing problem (named after the diagram typically used to represent it)
results when two or more networks with different routing policies aggregate
at a single router. This typically occurs when networks operating under
contracts with different ISPs meet at a GigaPOP (a high-speed aggregation point).
In the example shown here, Site A has contracted with ISP 1 and Site B
has contracted with ISP 2 for commodity Internet services. Because the
GigaPOP's router will have a single best route for Destination C,
traffic from both Site A and Site B will follow this best route (from ISP 1).
Without the ability to perform explicit routing--in other words, routing based
on source and destination, rather than just destination--the GigaPOP cannot
ensure that traffic intended for Destination C is following the path contracted
by the traffic source.
Bonus Points: The ABCs of TransPAC
More than just network infrastructure, the Trans-Pacific Advanced Connection
(TransPAC) is a collection of network and human resources whose goal is to
facilitate international collaboration in research and education.
Closely coordinated operational and user support activities are necessary
for networks to succeed on a global scale, and they encompass a long chain
of service providers (see diagram below).
The TransPAC NOC and User Services Groups comprise engineers and
technicians from Indiana University and from APAN (Asian Pacific
Advanced Network). Although composed of participants from multiple
organizations across many countries, these support groups can act
in a well-coordinated fashion by using state-of-the-art communications
and collaboration tools, such as videoconferencing and shared
environment conferencing.
The groups rely heavily on common systems and procedures for
scheduling, maintenance, monitoring, problem management, notification,
documentation, reporting and support.
© 2000 CMP Media Inc.