Inter-domain routing takes longer than expected to converge, and route changes cause correspondingly more packet loss and delay.
The paper carries out experiments and analyses the collected data to determine how long it takes inter-domain routing information to reach a steady state after a change in the topology of an Autonomous System (AS). It classifies routing events as Tup (a previously unavailable route is announced), Tdown (an available route is withdrawn), Tshort (a shorter path replaces the current one) and Tlong (a longer path replaces it). Data was collected by injecting routing faults into five major ISPs over a period of two years. In one of its experiments, the paper observes that 20% of nodes require more than 3 minutes to converge after a Tdown or Tlong event.
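As a sketch of how such measurements might be post-processed, the snippet below computes, for each observing peer, the time from the injected fault (taken as t = 0) to the last update seen, and then the fraction of peers that took longer than a threshold. The `Update` record, the rule that convergence ends at the last observed update, and the sample numbers are illustrative assumptions, not the paper's data or tooling.

```python
from dataclasses import dataclass


@dataclass
class Update:
    """One BGP update seen at a vantage point (hypothetical record format)."""
    peer: str      # identifier of the observing BGP peer
    time_s: float  # seconds since the fault was injected


def convergence_times(updates: list[Update]) -> dict[str, float]:
    """Per peer: delay from the injected fault (t = 0) to the last update observed."""
    last: dict[str, float] = {}
    for u in updates:
        last[u.peer] = max(last.get(u.peer, 0.0), u.time_s)
    return last


def fraction_slower_than(times: dict[str, float], threshold_s: float) -> float:
    """Fraction of peers whose convergence time exceeds threshold_s."""
    if not times:
        return 0.0
    return sum(t > threshold_s for t in times.values()) / len(times)


# Made-up log, for illustration only.
log = [Update("A", 30.0), Update("A", 95.0), Update("B", 40.0),
       Update("C", 200.0), Update("D", 10.0), Update("D", 60.0)]
times = convergence_times(log)
print(fraction_slower_than(times, 180.0))  # 0.25
```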
Factors that convergence time depends on.
The paper identifies several interesting factors on which convergence time does (and does not) depend, and supports its claims with experimental evidence and reasoning. Some of them are:
i) Order in which route-change announcements are processed at a node
ii) Number of interconnections between ASes (number of BGP peers). A fully connected (complete) graph has a larger number of computational states to be explored during convergence.
iii) Transit properties (policies) of the upstream provider.
iv) Whether loop detection is done only at the receiver, or at both the sender and the receiver.
v) Largely independent of network load and congestion.
vi) Independent of geographical location of the node.
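Factor (iv) can be illustrated with a minimal sketch. Receiver-side loop detection is standard BGP behaviour: a router ignores any route whose AS_PATH already contains its own AS number. With sender-side loop detection, the sender performs the same check on the peer's behalf and sends an explicit withdrawal instead of a path the peer would reject. The function names and the tuple-based message format are hypothetical.

```python
from typing import Optional, Tuple

AsPath = Tuple[int, ...]  # AS_PATH as AS numbers, advertising neighbour first


def receiver_accepts(receiver_as: int, as_path: AsPath) -> bool:
    """Receiver-side check: reject a route whose AS_PATH already contains the receiver."""
    return receiver_as not in as_path


def sender_update(sender_as: int, peer_as: int,
                  best_path: Optional[AsPath]) -> Tuple[str, Optional[AsPath]]:
    """Update the sender transmits to peer_as under sender-side loop detection.

    A path that already traverses peer_as would be rejected by the peer anyway,
    so the sender replaces it with an explicit withdrawal.
    """
    if best_path is None or peer_as in best_path:
        return ("withdraw", None)
    return ("announce", (sender_as,) + best_path)


# AS 2 currently reaches origin AS 0 through AS 1.
print(sender_update(2, 1, (1, 0)))  # ('withdraw', None)
print(sender_update(2, 3, (1, 0)))  # ('announce', (2, 1, 0))
```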
Why Tlong and Tdown have longer convergence times.
The experimental data gathered by the paper show that convergence times and latencies are larger when the topology change involves a route going down, which may leave a destination unreachable or force traffic onto a longer route. The paper analyses the sequence of events that follows the disappearance of a route and shows how this causes Tdown and Tlong events to converge more slowly than Tshort and Tup events. In the Tdown case, a node repeatedly fails over to progressively longer backup paths, and only after exhausting all of them does it conclude that the destination is unreachable.
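The path exploration described above can be reproduced in a toy simulation. The sketch assumes a complete graph of n ASes in which AS 0 originates a prefix and then withdraws it; synchronous rounds in which every update sent in one round is delivered in the next; shortest-AS_PATH selection with the lexicographically smallest path as tie-break; receiver-side loop detection only; and no MRAI or other timers. These are simplifying assumptions for illustration, not the paper's exact model.

```python
from typing import Optional, Tuple

AsPath = Tuple[int, ...]  # AS_PATH, advertising neighbour first, origin AS 0 last


def best_path(me: int, rib_in: dict[int, Optional[AsPath]]) -> Optional[AsPath]:
    """Shortest loop-free path learned from any neighbour (None if there is none)."""
    candidates = [p for p in rib_in.values() if p is not None and me not in p]
    return min(candidates, key=lambda p: (len(p), p), default=None)


def simulate_tdown(n: int, max_rounds: int = 10_000) -> Tuple[int, int]:
    """Return (rounds until quiet, updates sent by ASes 1..n-1) after AS 0 withdraws."""
    nodes = range(1, n)
    # rib_in[i][j]: AS_PATH most recently advertised by neighbour j to AS i (None = withdrawn).
    rib_in = {i: {j: (j, 0) for j in nodes if j != i} for i in nodes}
    for i in nodes:
        rib_in[i][0] = (0,)  # direct route learned from the origin
    best = {i: best_path(i, rib_in[i]) for i in nodes}  # converged: every AS uses (0,)

    inbox = [(i, 0, None) for i in nodes]  # the fault: AS 0 withdraws from everyone
    rounds = sent = 0
    while inbox and rounds < max_rounds:
        rounds += 1
        for receiver, sender, path in inbox:  # deliver last round's updates
            rib_in[receiver][sender] = path
        outbox = []
        for i in nodes:
            new = best_path(i, rib_in[i])
            if new != best[i]:  # best route changed: tell every other AS
                best[i] = new
                advert = None if new is None else (i,) + new
                outbox.extend((k, i, advert) for k in nodes if k != i)
        sent += len(outbox)
        inbox = outbox
    return rounds, sent


print(simulate_tdown(3))  # (3, 4)
print(simulate_tdown(4))  # (4, 16)
```

Even at this tiny scale the update count grows quickly with n, because each AS briefly adopts and advertises longer backup paths that are themselves about to disappear.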
Problems/Oversights.
A couple of assumptions the paper makes in its experimental methods might not accurately model the Internet. Some of them are:
- It monitors the routing tables of only 25 ISPs. This may not accurately reflect packet forwarding and update propagation times across the Internet; more ISPs would mean more locations from which updates might reach a node.
- It models BGP message processing as a single linear queue. This captures only the case of high congestion and long delays in the Internet.
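The concern about the single linear queue can be made concrete with a minimal single-processor FIFO model. This is our own simplification: `fifo_finish_times` and the fixed per-message processing cost `service_s` are hypothetical. In such a model, extra delay appears only when updates arrive faster than they can be processed, which is why it mainly describes bursty, congested conditions.

```python
def fifo_finish_times(arrivals: list[float], service_s: float) -> list[float]:
    """Finish time of each message at a single FIFO processor with a fixed per-message cost."""
    finish = []
    clock = 0.0
    for arrival in sorted(arrivals):
        # A message starts once it has arrived and the processor is free.
        clock = max(clock, arrival) + service_s
        finish.append(clock)
    return finish


# A burst of four updates queues up, each waiting for the ones before it ...
print(fifo_finish_times([0.0, 0.0, 0.0, 0.0], 0.5))  # [0.5, 1.0, 1.5, 2.0]
# ... while widely spaced updates see no queueing delay at all.
print(fifo_finish_times([0.0, 10.0], 1.0))  # [1.0, 11.0]
```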
Future Work
A new system could be modeled that incorporates the factors the paper identifies as reducing convergence time, such as loop detection at both the sender and the receiver, and re-ordering of received announcements. This system could then be analyzed to quantify how much convergence time improves in exchange for the added complexity in the routing protocol.
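As one hypothetical policy in the spirit of re-ordering received announcements, a router could scan its pending queue and drop any message that a later message from the same peer for the same prefix already supersedes, so it never installs and re-advertises a path that is already stale. The `Msg` type and `coalesce` function are illustrative assumptions, not taken from the paper.

```python
from typing import NamedTuple, Optional, Tuple


class Msg(NamedTuple):
    peer: int                           # neighbour AS that sent the update
    prefix: str                         # destination prefix
    as_path: Optional[Tuple[int, ...]]  # None means a withdrawal


def coalesce(pending: list[Msg]) -> list[Msg]:
    """Keep only the newest queued message per (peer, prefix), in arrival order."""
    newest = {}
    for index, msg in enumerate(pending):
        newest[(msg.peer, msg.prefix)] = index
    return [pending[i] for i in sorted(newest.values())]


queue = [
    Msg(1, "10.0.0.0/8", (1, 0)),
    Msg(2, "10.0.0.0/8", (2, 0)),
    Msg(1, "10.0.0.0/8", (1, 2, 0)),
    Msg(1, "10.0.0.0/8", None),
]
print(coalesce(queue))
# [Msg(peer=2, prefix='10.0.0.0/8', as_path=(2, 0)), Msg(peer=1, prefix='10.0.0.0/8', as_path=None)]
```

Only the withdrawal from AS 1 survives, so the two intermediate paths it had advertised are never processed.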