This paper describes the modifications the authors made to the Reno version of the TCP protocol, as implemented in BSD Unix, to improve the protocol's behavior in the face of network congestion.
They describe three major improvements to the TCP protocol, all of which affect the sender only, which makes these modifications desirable from a deployment point of view:
- The first mechanism modifies the retransmission behavior. Reno retransmits only when a coarse-grain timer expires or when it receives a certain number of duplicate ACKs. Neither event occurs quickly, so throughput suffers while the sender waits to retransmit data. Vegas improves on this by timestamping segments and using ACKs as a trigger to check the timestamp of the next relevant segment against the RTT. This lets it retransmit much sooner than it would by waiting for a number of duplicate ACKs, in some cases even before it has received one duplicate ACK. This reduced the number of coarse-grain timeouts by half, which the authors showed could have a considerable effect on throughput (19% if all coarse-grain timeouts could be removed).
- The second mechanism involves how Vegas deals with congestion. Reno is reactive to congestion, and in fact it relies on generating some amount of congestion in order to determine the bandwidth of the network. This is clearly undesirable, so Vegas takes a more proactive approach. It compares the achieved throughput with the theoretical maximum throughput (computed from the RTT and the window size), and then adjusts the congestion window to keep the achieved throughput between two thresholds defined relative to the theoretical maximum. The version that the authors settle on tries to keep between one and three buffers in use at the bottleneck router. It uses at least one so that it can adapt when the bottleneck bandwidth increases, and at most three to keep from overrunning the bottleneck. This is probably the most important contribution of the paper, since it increases throughput while reducing the load on the bottleneck router relative to Reno.
- The third mechanism involves the slow-start behavior of TCP. It mimics the congestion avoidance mechanism described above in many ways: it compares achieved throughput with expected throughput and uses a threshold on that comparison to switch from exponential to linear window growth, which reduces the risk of overrunning the bandwidth of the network. This provides the same general slow-start behavior without generating nearly as much packet loss as Reno does during its slow-start period. Their performance numbers show that Reno’s packet loss numbers flatten out over time because of the relatively large amount of packet loss during the initial slow-start. In contrast, Vegas’s packet loss numbers scale linearly with transfer size, which indicates that there is no additional penalty incurred by its slow-start mechanism.
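
The congestion avoidance rule in the second mechanism can be sketched roughly as follows. This is a minimal illustration, not the authors' BSD implementation: the function name `vegas_adjust`, the parameter names, and the choice to measure the window in whole segments are assumptions made for the example. The thresholds of one and three buffers come from the description above; taking the smallest observed RTT (`base_rtt`) as the basis for the theoretical maximum is also an assumption of this sketch.

```python
def vegas_adjust(cwnd, base_rtt, rtt, alpha=1.0, beta=3.0):
    """Return the next congestion window (in segments) for one RTT sample.

    cwnd     -- current congestion window, in segments (hypothetical unit)
    base_rtt -- smallest RTT observed so far, in seconds (assumed baseline)
    rtt      -- most recent RTT measurement, in seconds
    alpha    -- lower threshold: minimum buffers to keep occupied
    beta     -- upper threshold: maximum buffers to keep occupied
    """
    # Theoretical maximum throughput: the window drains once per base RTT.
    expected = cwnd / base_rtt
    # Achieved throughput: the window actually drains once per measured RTT.
    actual = cwnd / rtt
    # Convert the throughput gap into an estimate of segments queued
    # at the bottleneck router.
    queued = (expected - actual) * base_rtt

    if queued < alpha:
        # Too little data in flight to notice spare bandwidth: grow linearly.
        return cwnd + 1
    if queued > beta:
        # Too much data queued at the bottleneck: shrink linearly.
        return max(cwnd - 1, 1)
    # Between the thresholds: leave the window alone.
    return cwnd


if __name__ == "__main__":
    print(vegas_adjust(10, base_rtt=0.1, rtt=0.1))    # no queueing observed
    print(vegas_adjust(10, base_rtt=0.1, rtt=0.125))  # moderate queueing
    print(vegas_adjust(10, base_rtt=0.1, rtt=0.2))    # heavy queueing
```

The point of the sketch is that the window responds to rising RTT rather than to packet loss, which is why Vegas can back off before the bottleneck queue overflows.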
In addition to the performance validations performed for the three major enhancements, the authors also performed fairness, stability, and queue behavior experiments to ensure that Vegas would not harm existing Internet users as a result of small- or large-scale deployment. All of these experiments show that Vegas either outperforms Reno or at least does not significantly underperform it on any of the metrics tested.
The main weakness of this paper is that the experiments were all relatively small-scale. This is probably an artifact of the age of the paper, but the Internet is so much larger than it was then, with so many different classes of traffic, that their performance numbers may not be as meaningful in a modern context. I would like to see their experiments rerun with modern bandwidth-delay products on the Internet today.
Future research in this field should examine the remaining performance left “on the table” in terms of TCP transmissions to determine the best parts of the protocol to improve. This paper focused exclusively on the sender’s side of the transmission, which makes their performance numbers all the more impressive. Even though the receiver is supposed to be passive in TCP, it might be interesting to see what kinds of further performance gains could be achieved by collusion between sender and receiver, with each taking an active role in maximizing throughput.