Graduate Networks, UCSD

CSE222 – Spring 2009

TCP Vegas: End to End Congestion Avoidance on a Global Internet May 15, 2009

This paper describes the modifications the authors made to the Reno version of TCP, as implemented in BSD Unix, to improve the protocol's behavior in the face of network congestion.

They describe three major improvements to the TCP protocol, all of which affect the sender only, which makes this modification desirable from a deployment point of view:

  1. The first mechanism modifies the retransmission behavior. Reno retransmits only when a coarse-grain timer event occurs or when it receives a certain number of duplicate ACKs. The problem with this is that these events do not occur very quickly and thus harm throughput as the sender waits to retransmit data. Vegas improves on this by timestamping segments and using ACKs as a trigger to check the timestamps on the next relevant segment against the RTT. This enables it to retransmit much sooner than waiting for a number of duplicate ACKs, in some cases even before it has received one duplicate ACK. This had the effect of reducing the number of coarse-grain timeouts by half, which they showed could have a considerable effect on throughput (19% if all coarse-grain timeouts could be removed).
  2. The second mechanism involves how Vegas deals with congestion. Reno tends to be very reactive to congestion, and in fact it relies on generating some amount of congestion in order to determine the bandwidth of the network. Clearly, this is less than desirable, so Vegas takes a more proactive approach. It measures the achieved throughput versus the theoretical maximum throughput (measured in terms of RTT and window size), and then changes the congestion window to keep the achieved throughput between two thresholds which are relative to the theoretical maximum. The version that the authors settle on tries to keep between one and three buffers in use through the bottleneck router. They try to use at least one so that they can adapt to the bottleneck bandwidth increasing, and restrict it to three to keep from overrunning the bottleneck. This is probably the most important contribution of this paper, since it allows an increase in throughput while reducing the load on the bottleneck router relative to Reno.
  3. The third mechanism involves the slow-start behavior of TCP. Their mechanism in many ways mimics the congestion avoidance mechanism described above in that it measures achieved throughput versus expected throughput and uses that threshold to switch from exponential to linear segment growth in order to mitigate overrunning the bandwidth of the network. This provides the same general slow-start behavior without generating nearly as much packet loss as Reno does during its slow-start period. Their performance numbers show that Reno’s packet loss numbers flatten out over time because of the relatively large amount of packet loss during the initial slow-start. In contrast, Vegas’s packet loss numbers scale linearly with transfer size, which indicates that there is no additional penalty incurred by its slow-start mechanism.
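The congestion avoidance rule in item 2 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' BSD code: the function name `vegas_adjust_cwnd`, the choice to measure the window in segments, and the one-segment step size are hypothetical, while the thresholds of one and three buffers come from the description above.

```python
def vegas_adjust_cwnd(cwnd, base_rtt, rtt, alpha=1.0, beta=3.0):
    """Return the congestion window (in segments) to use for the next RTT.

    cwnd     -- current congestion window, in segments (assumed unit)
    base_rtt -- smallest RTT seen so far, in seconds (no queueing delay)
    rtt      -- RTT measured over the last round trip, in seconds
    alpha    -- lower threshold: keep at least this many segments queued
    beta     -- upper threshold: keep at most this many segments queued
    """
    expected = cwnd / base_rtt              # throughput with empty queues
    actual = cwnd / rtt                     # throughput actually achieved
    diff = (expected - actual) * base_rtt   # estimated segments sitting in queues

    if diff < alpha:
        return cwnd + 1            # too little in flight: grow linearly
    if diff > beta:
        return max(cwnd - 1, 1)    # too much queued: shrink linearly
    return cwnd                    # between the thresholds: hold steady
```

For example, with a 20-segment window and a 100 ms base RTT, a measured RTT of 125 ms gives an estimate of four queued segments, so the window shrinks by one; a measured RTT equal to the base RTT gives an estimate of zero, so the window grows by one.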

In addition to the performance validations performed for the three major enhancements, the authors also performed fairness, stability, and queue behavior experiments to ensure that Vegas would not harm existing Internet users as a result of small- or large-scale deployment. All of these experiments show that Vegas either outperforms Reno or at least does not significantly underperform it on any of the metrics tested.

The main weakness of this paper is that the experiments were all relatively small-scale. This is probably an artifact of the age of the paper, but the Internet is so much larger than it was then, with so many different classes of traffic, that their performance numbers may not be as meaningful in a modern context. I would like to see their experiments rerun with modern bandwidth-delay products on the Internet today.

Future research in this field should examine the remaining performance left “on the table” in terms of TCP transmissions to determine the best parts of the protocol to improve. This paper focused exclusively on the sender’s side of the transmission, which makes their performance numbers all the more impressive. Even though the receiver is supposed to be passive in TCP, it might be interesting to see what kinds of further performance gains could be achieved by collusion between sender and receiver, with each taking an active role in maximizing throughput.

 

TCP Vegas: End to End Congestion Avoidance on a Global Internet May 15, 2009

(i) the three most important things the paper says

The most important point the paper makes is that TCP Reno must congest the network in order to detect congestion.  This is a significant observation: Reno effectively has to make the congestion worse before it can make it better.  The authors show that TCP Vegas handles this situation quite differently.  TCP Vegas attempts to predict congestion before it significantly degrades performance on the network as a whole (by filling the buffers at each intermediate router).  In effect, the authors argue that TCP’s congestion control should be based on preventative measures rather than failure-based ones.

The next observation concerns the conservativeness of the Reno retransmit algorithm.  The authors claim that the algorithm does not detect a lost packet quickly enough (that is, before the next global timeout).  Instead of waiting for a third consecutive duplicate ACK, TCP Vegas decides whether to retransmit based on the timeout value of each individual outstanding packet (rather than the global timeout).
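A minimal sketch of that per-packet check, assuming the sender keeps a send timestamp for every outstanding segment. The names `on_duplicate_ack`, `outstanding`, and `fine_timeout` are hypothetical and are not taken from the paper or from any real TCP stack.

```python
def on_duplicate_ack(outstanding, ack_seq, now, fine_timeout):
    """Decide whether a duplicate ACK should trigger an early retransmit.

    outstanding  -- dict mapping sequence number -> time the segment was sent
    ack_seq      -- sequence number the duplicate ACK is still asking for
    now          -- current time, in the same units as the send timestamps
    fine_timeout -- per-segment timeout derived from fine-grained RTT samples

    Returns the sequence number to retransmit, or None to keep waiting.
    """
    sent_at = outstanding.get(ack_seq)
    if sent_at is None:
        return None              # nothing outstanding for this ACK
    if now - sent_at > fine_timeout:
        return ack_seq           # segment is overdue: retransmit right away
    return None                  # not overdue yet: wait for more evidence


# Segment 1000 was sent at t=0.00 s and the fine-grained timeout is 0.25 s.
# A duplicate ACK arriving at t=0.30 s triggers a retransmit at once,
# without waiting for the second and third duplicates.
early = on_duplicate_ack({1000: 0.00}, ack_seq=1000, now=0.30, fine_timeout=0.25)
```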

A third important observation is that the slow-start mechanism used by Reno is too aggressive.  Exponential growth can create a congestion problem on its own simply because of how quickly it ramps up (without testing the current network conditions).  TCP Vegas handles this by allowing more time between exponential growth steps, so the TCP connection can gauge its progress (a slower ramp-up) without congesting the connection on a wrong guess.

(ii) the most glaring problem with the paper

One of the biggest problems with this paper is that it introduces quite a bit of complexity to (what was) a relatively simple protocol.  This means that each TCP connection will require more processing power.  While this may not be a great concern for modern PCs with ample computing power, it could become a problem for low-power embedded systems that use TCP as a mode of communication.  With low-power devices on the rise (Internet phones, etc.), increasing the complexity of TCP might not be the greatest idea.

(iii) the future research directions of the work

I feel that a larger-scale study could be done with TCP Vegas on a TCP Reno based Internet.  The study in the paper used a small set of Vegas connections over the Internet; it would be more interesting to have a large subset of Internet hosts use TCP Vegas alongside the existing TCP Reno traffic.  I feel that this would reveal more about the usefulness of TCP Vegas in a TCP Reno world (and whether or not the conservativeness of Vegas will fall prey to the aggressive behavior of TCP Reno).

 

TCP Vegas: End to End Congestion Avoidance on a Global Internet May 15, 2009

The paper by Brakmo and Peterson of the University of Arizona introduces a new TCP implementation that improves on the congestion avoidance mechanisms in TCP. The authors present the new techniques in contrast to the older TCP implementation called Reno, and evaluate simulation and measurement results for the two implementations. The three key techniques in Vegas are:

1) Retransmission mechanism: In both implementations, the goal is to avoid relying on coarse-grained timers and to allow faster retransmission of lost packets in between coarse timer ticks. In Reno, this simply happens after three duplicate ACKs for an old packet are received. In Vegas, a timestamp is recorded for each sent packet, and when the corresponding ACK is received, the round-trip time (RTT) is updated. This RTT is used to check whether the first duplicate ACK already indicates a packet loss (because it arrives later than the estimated arrival time of the ACK for the lost packet), or whether the first two ACKs after a retransmission are also late, in which case more packets were lost shortly before the retransmission.

2) Congestion Avoidance: The authors describe Reno as a reactive implementation, in the sense that Reno always pushes the sending rate past the available bandwidth, filling up router buffers along the path until packet loss occurs. Over long-lived connections this leads to oscillations in the sending rate and fills buffers unnecessarily. Their Vegas implementation, in contrast, tries to be proactive when increasing or decreasing the send window size. It does this by calculating the Expected and Actual throughput for the current send window to find out whether an increase or decrease is necessary. The difference between the two rates is compared against thresholds, so unnecessary oscillations are avoided.

3) Slow-Start: To optimize the slow-start period in TCP, the authors propose using the congestion avoidance mechanism during the slow-start phase as well, and changing the point in time at which the send window size is doubled. In Vegas this can happen only once every two RTTs, whereas Reno doubles the window every RTT. This slows down the start a little, but it avoids overshooting the available bandwidth and reduces the need for retransmission.
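The modified slow-start in item 3 might look roughly like the sketch below. It is an illustration only: the function name `vegas_slow_start`, the threshold name `gamma` and its default of one buffer, the two-segment initial window, and the choice of which RTT in each pair performs the doubling are all assumptions made here, not details confirmed by the summary above.

```python
def vegas_slow_start(base_rtt, rtt_samples, gamma=1.0, cwnd=2):
    """Simulate Vegas-style slow-start over a sequence of RTT measurements.

    base_rtt    -- smallest RTT seen on the path, in seconds
    rtt_samples -- one measured RTT (seconds) per round trip, in order
    gamma       -- leave slow-start once this many segments appear queued
    cwnd        -- initial congestion window, in segments

    Returns (cwnd, mode), where mode is "slow_start" if the samples ran out
    first, or "congestion_avoidance" if the queueing threshold was crossed.
    """
    for i, rtt in enumerate(rtt_samples):
        # Same Expected-versus-Actual comparison used in congestion avoidance.
        diff = (cwnd / base_rtt - cwnd / rtt) * base_rtt
        if diff > gamma:
            return cwnd, "congestion_avoidance"
        # Double only every other RTT; the window is held fixed in between
        # so the next rate comparison is made against a stable window.
        if i % 2 == 1:
            cwnd *= 2
    return cwnd, "slow_start"
```

With no queueing (every sample equal to the base RTT), four round trips take the window from 2 to 8 segments, where Reno's per-RTT doubling would already have reached 32.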

The most glaring problem of the paper is that the simulations do not take different networks into account: no variations in delay or available bandwidth are simulated. Showing the behavior in a multi-hop simulation with changing network conditions would have made a stronger case.

Future research on these TCP mechanisms could look into congestion avoidance schemes that rely on minimal support from the routers along the path, which could signal end hosts and warn them about upcoming congestion or tell them their optimal sending rate in a more direct way. These mechanisms could be simulated and evaluated.

 

TCP Vegas: End to End Congestion Avoidance on a Global Internet May 15, 2009

This paper talks about TCP Vegas, a TCP implementation based on TCP Reno but with better throughput and fewer packet losses. The basic idea is to try to predict congestion and avoid it, as opposed to Reno’s approach which creates congestion and then adjusts afterwards. The main contributions of the paper are:

  1. Retransmission Mechanism: Reno uses a coarse timeout for retransmissions, and thus typically takes too long before retransmitting. It also retransmits when 3 duplicate ACKs are received. Vegas, however, recognizes that eliminating the dependency on this coarse-grain timer would reduce timeouts further and increase throughput. Vegas recalculates the RTT each time an ACK arrives, thus giving better estimates of RTT. It then takes advantage of this accurate RTT and uses certain ACKs as a hint to check whether a timeout should occur: (1) Rather than waiting for N duplicate ACKs, each time a duplicate ACK is received, check whether any of the relevant segments have timed out, and if so, retransmit them. (2) Check the timeout for the segments in question when the first or second non-duplicate ACK is received after a retransmission. By using this policy, Vegas can quickly detect whether a retransmission is necessary, rather than having to wait for the coarse-grain timeout to expire.
  2. Congestion Avoidance and modified Slow-start: Reno needs to create losses before it can detect congestion. It has to keep increasing its window size to probe for more bandwidth until it congests the network. Vegas, on the other hand, tries to predict congestion by comparing the current throughput to an expected throughput calculation. Vegas tries to keep the difference between the two within a pair of predetermined threshold values (indicating the number of extra buffers to use in the network). As for slow-start, Reno always sets the threshold window to half the congestion window when a retransmit timeout occurs.  The congestion window then increases exponentially until it reaches the slow-start threshold, after which it increases linearly. Vegas, on the other hand, uses exponential growth only every other RTT, and in between it keeps the window fixed so that it can calculate an accurate throughput rate.
  3. Implementation can play a significant role in performance of a protocol: TCP Vegas is an implementation of the TCP protocol, in the same manner that TCP Reno is. This shows how implementation details that are typically left out of the protocol specification can play a significant role in the performance of the end system, and thus the implementation deserves careful consideration.

Problems: Vegas tries to always keep a few extra buffers filled up in the network. That is, it sends enough traffic to ensure that some segments are buffered up on the routers. While this might result in good performance for a few nodes, when there are thousands or millions of Vegas nodes all doing the same thing, the routers’ buffers could easily be overwhelmed and start dropping packets. It seems like a greedy idea to aim to buffer packets on the routers all the time.
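The arithmetic behind this concern can be made concrete with a small sketch. It assumes each Vegas flow holds between `alpha` and `beta` segments in the bottleneck queue; the defaults of one and three, the function names, and the buffer size used in the example are illustrative assumptions rather than figures from the paper's experiments.

```python
def aggregate_queue_range(num_flows, alpha=1, beta=3):
    """Return (min, max) segments that num_flows Vegas flows aim to keep queued."""
    return num_flows * alpha, num_flows * beta


def exceeds_buffer(num_flows, buffer_segments, alpha=1):
    """True if even the minimum target occupancy no longer fits in the buffer."""
    return num_flows * alpha > buffer_segments


# Ten flows sharing a (hypothetical) 50-segment buffer fit comfortably,
# while a thousand flows target at least 1000 queued segments.
low, high = aggregate_queue_range(1000)
```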

Future Research: Vegas was the basis for NewReno and therefore provided the groundwork for the current popular TCP implementation. I don’t know anything about NewReno, but I would assume some research was done to change the buffering mechanism of Vegas, as it would most likely have overwhelmed routers’ buffers.

 

TCP Vegas: End to End Congestion Avoidance on a Global Internet May 15, 2009

(i) Three most important things

1. TCP Reno uses coarse-grain timeouts in addition to the Fast Retransmit and Fast Recovery mechanisms to detect and retransmit lost segments, but eliminating the dependency on coarse-grain timeouts would result in a 19% increase in throughput. TCP Vegas proposes a new retransmission mechanism that treats the receipt of certain ACKs as a hint to check whether a timeout should occur, which reduces the time needed to detect lost packets.

2.  TCP Reno’s congestion detection and control mechanism uses the loss of segments as a signal that there is congestion in the network. TCP Vegas instead looks at changes in the sending rate to detect congestion. It compares the measured throughput rate with an expected throughput rate to determine the rate at which to send packets.

3. TCP Reno sets the threshold window for slow-start to one half of the congestion window. The slow-start period ends when the exponentially increasing congestion window reaches the threshold window, after which the window increases linearly. If the initial threshold window value is too small, throughput suffers; if it is too large, packets will be lost. TCP Vegas allows exponential growth only every other RTT so that it is able to detect and avoid congestion during slow-start. When the actual rate falls below the expected rate by the equivalent of one router buffer, Vegas changes from slow-start mode to linear increase/decrease mode.

(ii) Most glaring problem

The most glaring problem is that the performance of TCP Vegas degrades when it is intermixed with other versions of TCP such as Reno. Because Vegas detects congestion earlier, it reduces its sending rate before Reno does, and thus gives up bandwidth to the TCP Reno connections.

(iii) Future Research Directions

Future research directions for this work would be to study how the network congestion avoidance algorithm affects fairness when TCP Vegas connections need to compete with more aggressive TCP implementations like TCP Reno.