Graduate Networks, UCSD

CSE222 – Spring 2009

MIRO: Multi-path Interdomain ROuting May 29, 2009

Filed under: Extra Papers — ameenakel @ 8:19 pm
Tags: , , ,

(i) the three most important things the paper says

  1. One of the most important topics that this paper addresses is that BGP, in its current form, does not allow for any given AS to (in any reasonably automated way) request a certain route for its packets to traverse, regardless of the motive.  In fact, beyond sending that packet to its next AS-hop, the originating AS has no control whatsoever within the BGP implementation on the Internet today.  Some great examples mentioned in the paper about why this level of control (over which individual AS hops a stream of packets will travel) could be useful are security, performance/bandwidth, or price (among other things, I’m sure).  For example, a government agency may not want to route a certain set of transmissions through a country that is considered hostile and thus would like to request that a certain AS not be used, regardless of the fact that it may be on the shortest and most economical path.  This lack of control is the problem that the authors of this paper have set out to solve with their new scheme: MIRO.
  2. The authors also address the standing issues with the current (most popular) solutions for this problem: source routing and overlay networks, demonstrating why these solutions lack the proper scalability, lack the proper backwards compatibility (which would require a batch change of all Internet hardware), and require far too much overhead to be used as Internet-wide protocols.  Quite notably, on this same note, the authors made it a point to demonstrate why MIRO satisfied these two requirements (at the protocol level).  The authors brought up the point that MIRO continues to keep relatively the same amount of BGP-control messages between ASes as vanilla BGP (because of its pull-based alternate route advertisements), helping scalability.
  3. The authors also describe a method in which it would be possible for MIRO to operate on existing hardware on the intermediate routers of a given AS (to reiterate, not including edge BGP routers).  This is a great advantage of the MIRO solution, as most/all changes to the Internet backbone that require increasingly larger change rarely are implemented.  The solution involves treating an alternate route as if it were another IP address in the given AS’s inner network (essentially adding a level of indirection).  This would essentially require changes solely to the routing tables of each router instead of to the underlying software of hardware.

(ii) the most glaring problem with the paper

One of the largest problems with this paper (and it was quite difficult to find a topic that wasn’t addressed well by the authors) was that it did not address any performance metrics below the level of the protocol level.  Specifically, the authors did not investigate the impact of these new changes on the performance of edge BGP routers.  Since these routers are meant to move packets from their ingress ports to their egress ports (or vice-a-versa) as quickly and accurately as possible, adding to the latency of moving a packet from one end to the other is very undesirable.  If it is the case that this scheme adds significant overhead to these routers, this scheme may find greater resistance in its adoption (as it would also require a hardware or significant software change to each of the edge BGP routers–another big hurdle to overcome).

(iii) the future research directions of the work

Although mentioned briefly in the conclusion, security would be on the mind of the average grad-student reader of this paper from the very beginning.  The level of control can is given (insecurely) to any router that can send a packet is practically limitless, as long as the negotiating AS approves of its requests.  Some sort of secure handshaking protocol or digital signature technique would greatly help to strengthen this scheme.  Another great future direction of this work would be to investigate the true overhead created by this scheme versus the BGP baseline, along with he overheads of other popular replacements for or augmentations to BGP.  It would be useful to quantify the different overheads created by each scheme.

 

BGP Routing Policies in ISP Networks April 28, 2009

Filed under: R07. BGP Routing Policies in ISP Networks — mdjacobsen @ 2:14 pm
Tags: ,

The paper describes common techniques used to implement BGP decision policy in boarder gateway routers. Because the BGP does not specify the decision process, a complex preference and filtering scheme as sprung up in the relatively simple path vector protocol. The use of route attributes allows ISP to implement custom routing policies as they see fit.

Routing decisions are therefore not always simply shortest path. The main contributions from this paper are descriptions of the ways in which ISPs (and enterprises) make use of these attributes, filtering, and tagging to achieve preferred decisions.

The authors describe that it is common for ISP to affect routing policy for business relationship purposes. A common technique is to set non-overlapping LocalPref values for each business entity. This allows specific routing to say customers over peers. Additionally, tagging is used to disseminate preferences internally while filtering the tags prevents external entities from receiving such information.

It is also common and perhaps necessary to adjust routing for inbound and outbound traffic according to quality of service and load balancing purposes. The routing attributes can be used here to affect import policy so that LocalPref values can be updated according to capacity.

Filtering community attributes is also used to limit amount of routing updates received and thus the routing table size. This is an important concern as it has direct consequences in terms of scalability.

Lastly, the authors describe how filtering on import can prevent invalid (malicious) routes from changing the routing table. Similarly, filtering on export avoids spreading infrastructure information to the rest of the network.

The main problem with this paper is that it assumes that any change to the BGP protocol is out of the question. While it may not be the focus of this paper, it seems that some discussion of how the protocol could be improved to better accommodate custom routing would be in order. Instead the authors focus on how the existing policy can be manipulated to provide routing policy flexibility.

I’d expect to see further work on the use of RPSL or other routing policy languages in Internet level experiments. It would be nice to see how much better a routing policy language can be as compared to the current methods of policy implementation.

 

BGP Routing policies in ISP networks April 28, 2009

1. The paper provides a comprehensive overview of the BGP protocol used to achieve routing between ASs. It makes note of the major factors that influence inter-AS routing that aren’t an issue in intra-AS routing such as conflicting policies between different ASs, commercial aspects that define traffic between ASs, routing through intermediate ASs, etc. It talks of how BGP functions within an AS to allow its nodes to communicate with nodes in other ASs and also how packets that need to travel through multiple ASs before the reach their destination AS need to be handled. It makes note of the policies and the priorities of the policies that are used to determine routes in inter-AS communication.

2. It looks into the problem of a given AS wishing to influence the routing decisions of adjacent ASs when it has no direct control over the policies of its adjacent ASs. It talks of how an AS can make one route more preferable that another by suitably setting the MED attribute of the routers in the respective routes. It also talks of scenarios when it might be possible for an AS to remotely control the routing policies of a router in an adjacent AS.

3. It talks of how the overhead traffic generated due to the book-keeping involved in inter-AS routing can be reduced. Route-aggregation as a method to reduce the amount of information that needs to be communicated with a neighbouring AS when new nodes are added within an AS. An AS can configure its routers to advertise a single, less specific prefix to it’s neighbouring AS instead of advertizing two adjacent prefixes. Ignoring routes that change too frequently is another method of reducing the excessive traffic generated due to unstable routes.

Oversights:

The paper provides a concise overview of various aspects of BGP, ranging from the policies used in deciding routes to factors influencing these policies and how these policies can be used to control traffic between ASs. The topic is vast and there is no end to additional insights that can be given. It paper could have gone on and given a more in depth analysis of the BGP protocol and how it functions when the communicating ASs are not adjacent to each other or it could have looked into the short comings of BGP which include oscillations when routes change and how these shortcomings can be handled etc. However, there is no end to the detail that can be provided.

Future Work:

Some of the solutions provided by BGP are not the most efficient. For example the methods used by BGP to handel routes that go down or need to be taken down for maintence take up a lot of time and traffic before they reach a steady states. The various short commings of BGP can be looked into and methods can be suggested that circumvent the problem or provide more efficient solutions.

 

BGP Routing Policies in ISP Networks April 28, 2009

Filed under: R07. BGP Routing Policies in ISP Networks — liyunjiu @ 2:10 pm
Tags: , , ,

This paper starts with an introduction to BGP and AS routing and then focus on the decision process BGP uses to choose routes. There are several factors that influence ISPs to implement certain BGP routing decisions.

1. Business Relationships influence the decision process which cause ISPs to assign LocalPrefs to determine which peering relationship is more desired to be used to route traffic. Customers may be most preferred, while backup links may be least preferred. ISPs may also want to control route exports to other ASs by prepending a community attribute to advertisements from a peer, and filter for that attribute when advertising routes to other peers.

2. Traffic Engineering is also a factor in an ISP’s BGP policy when there may be several routes that are equally preferred. For outbound traffic control, one common goal is hot-potato or early-exit routing where the ISP forwards traffic to the closest border router to avoid internal congestion at the cost of inflating end-to-end path lengths on the internet. Another common goal is to reduce congestion on outbound links via load balancing by changing LocalPref for a set of prefixes that matches a regular expression with the help of tools. ISPs may also have inbound traffic control to limit internal congestion or control how much traffic it will receive from peers. One way of controlling inbound traffic is by AS pre-pending to artificially inflate the AS-path length. Another way ois to use the MED attribute. Traffic can also be remotely controlled by having an agreement between two AS’s to have community values map routes to a LocalPerf on a remote router. This is done because MED applies next AS hop  and LocalPref applies to routes across all neighbours.

3. Scalability is of concern to ISPs as well. There is a common goal of limiting the routing table size by utilising the community attribute and filters. Another goal is to limit the number of routing changes by flap damping to limit propagation of unstable routes.

4. Security is a big issue since an AS is highly vulnerable to false BGP updates. ISPs can defend themselves by filtering invalid routes, protect routing policies by overwriting attributes to expected values, securing the internal network infrastructure by export filtering, and blocking DoS attacks via damping and filters to direct spammer address blocks to blackhole routes.

I didn’t find any problems with the paper as it is just describing common design patterns of ISPs deploying BGP policies. The paper cites research topics done to improve BGP and further research can by done in serveral areas:

- Configuration checking and simulators to predict how policies will interact.

- Language design for a vendor neutral language to express routing policies.

- Research in an overhaul of BGP. BGP is not a clean protocol since there are many ambiguities associated with decision making.

 

An Experimental Study of Delayed Internet Routing Convergence April 23, 2009

  1. The transition from distance vector to path vector routing algorithms increases the number of routing path oscillations, which can last up to fifteen minutes.
  2. Much of the path restoration delay is a result of non-standardized implementation decisions on the part of BGP router vendors, including timer parameters. The lower bound on convergence could be reduced from linear time to constant time with minor changes to said implementations.

Using a full mesh may greatly increase the number of paths (up to an exponential

Possible future work includes adding some form of synchronization

 

Delayed Internet Routing Convergence April 23, 2009

Three Important Things:

  • There is common acceptance that the internet in its current form is robust and efficient in adjusting routes in the face of link failures. Such occurrences are fairly common in such a huge infrastructure, and the authors recognized that it would be valuable to evaluate this accepted belief. By evaluating BGP convergence they came to the conclusion that inter-domain failover has significant delay, and that the path vector approach is incorrectly believed to provide improved convergence properties over distance vector routing. The claim is that the delay stems from ambiguity in the BGP protocol, and varying implementations across different router vendors.
  • The paper was focused on experimental trails conducted over a long period of time, lending strength to the obtained results. The authors emphasize several points about designing and executing their experiments in order to produce quality, meaningful, relevant results.  Faults were injected into the actual internet backbone as opposed to a simulation environment. Latency effects were measured at the router level by routing ISP routing tables, this allowed the examination of several events: route repair, route failure, repair and failover, failurea and failover. End-to-end measurements were performed to determine the application level impact, the results proved that even moderate routing table flux caused packet loss, increased packet latency, and out-order-delivery.
  • In addition to the hard data, the authors provide theoretical lower and upper bounds. The exploration of these bounds resulted in several insights about delayed convergence. For example, a lack of minimum route advertisement interval timer in some implementations influences the order in which link updates are processed, which impacts the rate of convergence. There are numerous implementation variations across vendors that contribute to the performance bounds.

Glaring Problem:
A major concern that I had with this work is that it only focues on BGP, and that it didn’t really discuss the parallels between inter-domain and intra-domain results. It would have been useful to see how the obtained results pertained to other routing protocols, if at all.

Future Work:
Following this thorough treatment of BGP convergence analysis, it would be useful to follow up on the root causes of delayed convergence and try to design new protocols, or at least new features in future routers that could help alieviate the strain on the internet backbone.  These topics were mentioned in the paper, but not at length. Such research would be crucial to the continued viability of the internet from a QoS perspective.

 

An Experimental Study of Delayed Internet Routing Convergence April 23, 2009

This paper presents the results of a very comprehensive study of routing convergence behavior on the internet. By monitoring the internet in many different places while injecting routing faults at various points in the network, they were able to measure in great detail the effects of such faults on both the routing tables themselves and their effect on end users.

They present a few important conclusions. The first is that the amount of time it takes for new routes to stabilize after a fault is actually much longer than previously thought. In addition to this insight, they have discovered that this delay will increase linearly in the best case and exponentially in the worst case with the number of additional autonomous systems on the internet. This presents a significant challenge to the designers of internet routers, especially if the growth rate increases more exponentially than linearly.

Their second conclusion is that much of this large failover time is due to differences in how vendors implement the BGP specification. If more loop detection was done by the sender instead of just at the receiver, the number of total updates could be reduced dramatically. However, this adds complexity to routers which are likely already quite expensive and complicated to design.

A third conclusion they arrive at is that different classes of update traffic have different behavior with regards to the amount of time and number of messages necessary to reach routing convergence. The loss of a route causes messages which are monotonically increasing while the addition of a route causes messages which are strictly increasing. While this may not seem like a significant difference, it actually has a profound effect on the convergence time, since it can take a long time for a faulted route to disappear from the system as routers bounce messages back and forth slowly increasing their distance to the faulted route until eventually they all realize that it is no longer present. This is in contrast to an added route, which either creates a shorter path which is propagated, or doesn’t, and doesn’t generate any new routes.

A weakness of this paper is that their bounds on the convergence rate of routes are fairly loose. While it is valuable to know that it is at least linear, the upper bound is quite high and does not necessarily describe the behavior accurately. However, since the growth rate is at least linear, future research needs to focus on ways to either improve BGP or supplant it with something more intelligent, which is less susceptible to implementation dependencies and superfluous messages during routing convergence.

 

An Experimental Study of Delayed Internet Routing Convergence April 23, 2009

Major Contributions

1. This paper presents the results of a two year study on the convergence of the BGP routing protocol and provides recommendations to reduce convergence time and overhead mainly caused by BGP routers with different implementations due to the ambiguity of BGP specs. RouteView data collection probe was used to log and timestamp all BGP updates from BGP routers of vaious ISPs. Results show that failover delay is independent of network load, congestion, locality and more dependent on topology.

2. A fault injection server was used to inject BGP faults to upstream ISPs and measure the latency to random websites. The failure injection measurement results were taken for BGP events Tup, Tdown, Tshort, and Tlong and measured how long it takes for all the route tables to converge following an event. Tup is a previously unavailable route is now available. Tdown is the opposite of Tup. Tshort is a long ASPath is now replaced by a shorter ASPath. Tlong is the opposite of Tshort. From the distribution of convergence latencies following Tlong, Tdown, Tshort, and Tup faults,  the authors find that Tlong and Tdown form one equivalence class, and Tshort and Tup formed another equivalence class. Tlong and Tdown took almost three times the amount of time to converge compared to Tshort and Tup events. Analysis showed that Tlong and Tdown took longer because it is monotonically incrasing compared to strictly increasing for Tshort and Tup events. The results also show that RTT latency to random websites increased by a factor of four following a failover event and packet loss increased by a factor of 30.

3. The paper provides vendor specific recommendations to reduce convergence lower bound to 30 seconds by analyzing changes to MinRouteAdver using simulation data. The recommendation is to implement both sender and receive side ASPath loop detection, and do not apply MinRouteAdver to Tdown withdrawals.

The most glaring problem is not answering the question of how often these rare long convergence delays occur and whether BGP can still scale without much modification given some constraints on topology.

With routers getting faster, further research considerations with implementing additions or modifications to BGP to improve convergence times using synchronization, diffusing updates, and having additional state information may be feasible. Further research can also consider a complete overhaul or replacement of BGP.

 

An Experimental Study of Delayed Internet Routing Convergence April 23, 2009

The paper takes a critical and look at BGP route convergence in the face of failover/updates. It provides empirical measurements of real Internet latencies and packet losses for a period of over 2 years as well as theoretical analysis for worst and best case complexity of the route convergence process.

The paper brings to light the fact that most of the failovers measured during the 2 year period took an average of 3 minutes to converge. This observation provides hard statistical evidence that there are considerable BGP inefficiencies. Moreover, this duration length is considerably higher as compared to telephone network failovers. The underlying point here being that the network infrastructure supporting the Internet should be as efficient at resolving failures as the telephone network, especially given the increasing reliance on the Internet for business purposes.

The paper also analyzes the BGP protocol from a theoretical perspective. Given a completely connected graph model for the Internet and certain message ordering and processing constraints, the authors show that unbounded delay BGP messages can result in Ο((n-1)!) states. This is of course the upper bound, but it provides insight into how the network topology and BGP router processing details can affect route convergence. Additionally, the authors point out that a lower bound also exists for router states (or rounds as termed in the paper). If the BGP routers adhere to the minimum route advertisement delay and process messages of route length in increasing order (among other constraints), the lower bound is Ω(n).

Lastly, as a result of the analysis on the lower bound for rounds/states, the authors provide implementation suggestions and optimizations to reduce the lower bound to Ω(1). The minor changes to BGP implementations include performing sender and receiver loop detection and using peerwise MinRouteAdvert times. These suggestions allow routers to converge in one round. The paper provides simulation results that explore the unbounded, MinRouteAdvert, and modified (as per suggestions) MinRouyteAdvert implementations of BGP. These results validate the theoretical implications made by the authors.

The paper is a solid presentation of simulation, theoretical analysis, and real world observation/experimentation results. The only criticism is that the authors make the comparison between the public switched telephone network and the Internet. This isn’t necessarily a fair comparison as the telephone networks were designed with a dedicated single purpose, presumably using the same underlying technology throughout. The Internet serves considerably larger topologies, provides numerous varying uses, and runs on a myriad of technologies. Providing faster failure convergence is a great endeavor, but comparing against the telephone network may be too ambitious.

I think the path for further research would be to implement some of the suggestions and perform a long term test of the efficacy of the modified MinRouteAdvert protocol.