Graduate Networks, UCSD

CSE222 – Spring 2009

Building a Robust Software-Based Router Using Network Processors April 30, 2009

Main Points:

  1. This paper is another important step in the emergence of software-based routers. The authors describe their experiences using network processors such as the Intel IXP1200 to build a router.
  2. They propose a processor hierarchy in the router: the data plane (data packets) is handled by the microengines at line speed, whereas control plane traffic (control packets such as LDP, route calculation, etc.) is handled by a more sophisticated Pentium at the top of the hierarchy. This ensures separation of functionalities and responsibilities across layers.
  3. The paper gives a network designer considerable insight into the general approach to designing a router. The authors describe the hardware used, the data structures involved at the classifier, forwarder and scheduler, and the queuing discipline they employed.

Problems:

While they dwell on the possibility of having multiple forwarders between the classifier and the scheduler, they do not adequately consider such scenarios in their implementation. They mention that, to simplify their analysis, they have considered only straightforward input-port/output-port forwarding.

Future Work:

I see scope for future work in using different hardware. They use a 700 MHz processor at the lowest level. If today’s technology permits a much more sophisticated processor there, could some of the control traffic handling be shifted to the lower layer, thereby avoiding some of the performance loss due to layering?

 

Building a Robust Software-Based Router Using Network Processors April 30, 2009

One contribution of the paper is that the authors describe a design that achieves fixed routing functionality using the three-level processor hierarchy and the parallelism offered by the microengines. The authors describe how the IXP microengines can be used to get the fastest data path forwarding while allowing exception packets to be sent to the StrongARM processor or to the even higher-level Pentium processor. At each higher layer, more memory and processor cycles can be spent on each packet, but the latency to forward a packet increases and the throughput, in packets processed per second, decreases.

Another contribution of the paper is that the authors allow easily programmable and extensible protocols or functionality to be inserted into the three levels of the processor hierarchy, with specific constraints on the processing at each level so that functionality inserted into one level does not adversely affect forwarding in another.

An important concept that the paper brings out is the separation of policy from mechanism (although the authors do not describe it in exactly those terms). Each lower layer in the hierarchy provides mechanisms to implement the policies that the higher layer would like to achieve. Certain mechanisms supported by the lower layers allow some functionality to be achieved entirely in the lower layer, in which case the lower layer can handle that functionality for all remaining packets of the flow. This is analogous to the fast path / slow path differentiation seen in modern switches. The policy or routing decision is typically taken by the slow path, but the fast path then carries out that policy by forwarding at line rate.
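The fast path / slow path split described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: `FastPath`, `example_policy`, the integer flow ids and the port names are all hypothetical. The slow path (policy) is consulted once per flow, and the fast path (mechanism) serves every later packet of that flow from its own table.

```python
from typing import Callable, Dict


class FastPath:
    """Mechanism layer: a per-flow table consulted for every packet (hypothetical)."""

    def __init__(self, slow_path: Callable[[int], str]) -> None:
        self._flow_table: Dict[int, str] = {}
        self._slow_path = slow_path
        self.slow_path_calls = 0  # counts how often the policy layer was involved

    def forward(self, flow_id: int) -> str:
        port = self._flow_table.get(flow_id)
        if port is None:
            # First packet of the flow: ask the slow path for a decision,
            # then install it so later packets never leave the fast path.
            self.slow_path_calls += 1
            port = self._slow_path(flow_id)
            self._flow_table[flow_id] = port
        return port


def example_policy(flow_id: int) -> str:
    """Hypothetical policy: odd flow ids go to port1, even ones to port0."""
    return "port1" if flow_id % 2 else "port0"


fast_path = FastPath(example_policy)
ports = [fast_path.forward(7) for _ in range(3)]
```

After the three calls, all packets of flow 7 have been sent to the same port and the policy function has run only once.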

One of the weaknesses of their approach, however, is that they only experimented with a very restricted model of routing. The forwarders that they implemented at the lower layers were very simple, and most routers require more functionality than that. For example, they mention in the paper that IP prefix lookups had to be done only in the Pentium (the slow path). This raises questions over whether their approach can scale to higher-speed routers. The other challenge is that writing more complex forwarders entirely in assembly is going to be hard, and with different network processors having different instruction sets, this can be a huge hurdle.

One possible direction of research is to explore what is the right set of primitives that the lowest layer (implementing the fast path) must provide to the higher layers, while still being able to achieve acceptable levels of extensibility. If the switch

 

Building a Robust Software-Based Router Using Network Processors April 30, 2009

The paper proposes building a software router using a network processor (Intel’s IXP1200) and a PC.

The paper uses a hierarchical approach to provide the flexibility of extra processing while supporting high-speed switching. Software-based router performance has been increased with the help of network processors. The IXP1200 has six microengines, each supporting four contexts; one context executes while the others wait on memory operations. The paper structures the implementation by dividing the work into a data plane and a control plane. The data plane forwards packets, while the control plane runs protocols such as RSVP and OSPF. The data plane must process packets at line speed, while the control plane is expected to receive fewer packets. The network processor runs the data plane and the PC processor runs the control plane. The data plane requires less computation per packet, while the control plane requires more. Many applications lie in between, with packets having both characteristics. The paper treats the router as a processor hierarchy in which packets flow through different levels. These levels are the microengines, the StrongARM, and the Pentium. One important aspect of this hierarchy is that a packet going to a higher level uses the resources of all the levels below it.

The paper addresses the resource allocation and scheduling problems for an extensible router on this three-level hierarchy. The three goals of the paper are performance, extensibility, and robustness. The software architecture of the router consists of three main components:

  1. Classifier: reads fields in the packet and, based on their values, decides which forwarder to send it to.
  2. Forwarder: takes a packet, applies the appropriate function to it, and forwards it to the output queue.
  3. Output scheduler: selects a packet from a non-empty queue and transmits the associated packet to the output port.

This architecture provides the flexibility to add any number of forwarders, and forwarders are not tied to any specific processor (microengine, StrongARM, Pentium). The architecture does not distinguish between data plane and control plane forwarders.
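The classifier / forwarder / output scheduler split above can be sketched as a tiny pipeline. This is a minimal sketch under stated assumptions: the dictionary packet layout, the `kind` and `out_port` fields, and the two example forwarders are hypothetical and are not taken from the paper.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional

Packet = Dict[str, object]
Forwarder = Callable[[Packet], Packet]


def decrement_ttl(packet: Packet) -> Packet:
    """Hypothetical data-plane forwarder: copy the packet and decrement its TTL."""
    out = dict(packet)
    out["ttl"] = int(out["ttl"]) - 1
    return out


def mark_control(packet: Packet) -> Packet:
    """Hypothetical control-plane forwarder: tag the packet as handled."""
    out = dict(packet)
    out["handled_by"] = "control"
    return out


# Forwarders are looked up by packet kind; more can be added without
# touching the classifier or the scheduler.
FORWARDERS: Dict[str, Forwarder] = {"data": decrement_ttl, "control": mark_control}


def classify(packet: Packet) -> Forwarder:
    """Classifier: read a header field and pick a forwarder."""
    return FORWARDERS[str(packet["kind"])]


def run_pipeline(packets: List[Packet], num_ports: int) -> List[Deque[Packet]]:
    """Classify each packet, apply its forwarder, and enqueue it on an output queue."""
    queues: List[Deque[Packet]] = [deque() for _ in range(num_ports)]
    for packet in packets:
        out = classify(packet)(packet)
        queues[int(out["out_port"])].append(out)
    return queues


def schedule(queues: List[Deque[Packet]]) -> Optional[Packet]:
    """Output scheduler: take the next packet from the first non-empty queue."""
    for queue in queues:
        if queue:
            return queue.popleft()
    return None


queues = run_pipeline([{"kind": "data", "ttl": 64, "out_port": 1}], num_ports=2)
sent = schedule(queues)
```

The scheduler here simply scans the queues in order; the paper's actual queuing discipline may differ.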

The paper discusses the design issues involved in treating all the processors as a hierarchy. More complicated forwarders require more cycles per packet, which might reduce the maximum forwarding rate.

The paper suggests, based on the authors' experience, that static allocation of resources (task-to-context and context-to-port assignment), coupled with scheduling through a token-passing mechanism, yields the most effective router. Register allocation must be done carefully to achieve good performance. The paper further suggests that the key innovation is to statically partition the processing capacity of the microengines into a fixed routing infrastructure and virtual processors.
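The token-passing idea can be illustrated with a small simulation. This is a sketch, not the authors' microcode: `TokenRing` and `simulate` are hypothetical names, and real microengine contexts signal one another in hardware rather than polling a Python object. Only the context holding the token may touch the shared resource, and it hands the token to a statically chosen successor when it is done.

```python
from typing import List


class TokenRing:
    """Hypothetical token ring over a fixed, statically assigned set of contexts."""

    def __init__(self, num_contexts: int) -> None:
        self.num_contexts = num_contexts
        self.holder = 0  # context 0 starts with the token

    def may_enter(self, ctx: int) -> bool:
        """A context may use the shared resource only while it holds the token."""
        return ctx == self.holder

    def pass_token(self, ctx: int) -> None:
        """Hand the token to the next context in the fixed order."""
        if ctx != self.holder:
            raise RuntimeError("only the token holder can pass the token")
        self.holder = (ctx + 1) % self.num_contexts


def simulate(num_contexts: int, rounds: int) -> List[int]:
    """Return the order in which contexts enter the critical section."""
    ring = TokenRing(num_contexts)
    order: List[int] = []
    for _ in range(rounds):
        for ctx in range(num_contexts):
            if ring.may_enter(ctx):
                order.append(ctx)  # critical section: use the shared port or queue
                ring.pass_token(ctx)
    return order


entry_order = simulate(num_contexts=4, rounds=2)
```

Because the successor is fixed in advance, access is both mutually exclusive and served in a predictable round-robin order, which fits the static allocation described above.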

 

Building a Robust Software-Based Router Using Network Processors April 30, 2009

This paper describes the implementation of a software router using the Intel IXP1200 network processor and an Intel P3 PC. The authors emphasize the significance of software-based routers given the increasing array of services, such as firewalls, intrusion detection, and proxies, demanded from routers. The emergence of network processors has been a boon to this line of thinking, since they are processors developed specifically for network applications. The router described in the paper implements both the data plane, which forwards packets, and the control plane, where signaling protocols run. The speed requirements on the data plane are demanding, as it must support line speed. It must therefore perform minimal processing on packets and hand compute-intensive tasks over to the control plane, which receives fewer packets (for example, when new connections come up or routes need to be decided).

The approach taken is to treat the router as a processor hierarchy where packets follow switching paths that traverse different levels of the hierarchy. This hierarchy consists of the microengines of the IXP1200 at the lowest level, the StrongARM processor in the middle and the Pentium at the highest level. The number of cycles available decreases from higher level to lower level since the lowest levels must work at line speed. The authors implement their extensible router on a three-level processor hierarchy with the goals of performance, extensibility and robustness.

The software architecture of the router consists of a classifier, a forwarder and an output scheduler. The classifier reads packets from an input port and, based on the packet header, selects a forwarder for each packet. The forwarder transforms the packet and sends the modified packet to an output queue, from which the output scheduler retrieves it for transmission on the output port. The architecture has two main attributes: explicit support for adding new services to the router, and no fixed mapping of forwarders to levels in the processor hierarchy. The paper discusses the performance bound at each level of the hierarchy, how contexts are scheduled on each microengine, and how mutual exclusion between operations is achieved. The authors also offer insights into their packet switching, the management of queue contention, and the various optimizations performed. They perform experiments to measure the maximum forwarding rates and the spare per-packet processor cycles. They find that an aggregate of 3.47 million packets can be forwarded per second, with the upper limit imposed by the serial input of the DMA. The StrongARM processes at most 526 Kpps, and the limit imposed by the Pentium is 534 Kpps. The authors also describe a virtual router processor that runs additional code on behalf of each packet. Essentially, the virtual router processor runs the protocol-processing step and supports a fixed number of cycles for each MAC packet. The authors also performed robustness experiments in line with their goals, running a synthetic suite of forwarders on configured MicroEngines.

 

Building a Robust Software-Based Router Using Network Processors April 30, 2009

The authors describe a three-tiered processor structure for a software-based router using a Pentium computer and a network processor (also from Intel). The aim is to show that, using the tiered approach, data and control packets can be routed at Gbps line rates while allowing for software-based protocol extensibility.

The Intel network processor board comes with DRAM and SRAM caches as well as several MicroEngines and a StrongARM processor. The MicroEngines are programmed using microcode and can perform rudimentary instructions (dequeue, enqueue, copy, etc.) very fast. The MicroEngines can operate in parallel, servicing input and output ports. The StrongARM processor is used to service exceptional packets: those that incur a route lookup miss or are control packets. The StrongARM runs a single process and is capable of forwarding even more “exceptional” packets across a PCI bus to the Pentium processor. The Pentium processor is used to perform compute-intensive operations such as calculating shortest paths.

The main contributions are the design of the three-tiered architecture and the design of the protocol extension interface.

The architecture defines a fast path and a mechanism for routing through a longer path (more compute intensive). Packets arrive at an input port where they are queued in a FIFO. The packets are then processed in parallel by the MicroEngines that perform classification on the type of packet. If the packet can be forwarded on by the MicroEngines, they perform whatever transformation is necessary and enqueue the packet in an output FIFO. If the packet requires further processing, it is put into RAM where the StrongARM processor will check for packets to process. The StrongARM processor may copy some or all of the packets onto the PCI bus for the Pentium processor to service. The Pentium can copy back transformed packets via the PCI bus. When the StrongARM has completed the processing (or received the completion from the Pentium processor), it is responsible for forwarding the packet to the appropriate output queue. MicroEngines also work in parallel to service copying from output FIFOs to the output ports. Contention for output ports is solved by using mutual exclusion (implemented using a passing token).

The second contribution involves the interface through which extensions can be installed. The Pentium processor can call functions on the StrongARM processor that register and unregister custom filters. These filters are used by the StrongARM to identify and forward specialized packets (conforming to some new or extended protocol) up to the Pentium for processing.
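A register/unregister filter interface of the kind described above might look like the following sketch. The `FilterTable` class, its method names, and the dictionary packet layout are hypothetical; the review does not give the paper's actual interface, so none of these names should be read as the authors' API.

```python
from typing import Callable, Dict

Packet = Dict[str, object]
Predicate = Callable[[Packet], bool]


class FilterTable:
    """Hypothetical filter table held on the middle tier of the hierarchy."""

    def __init__(self) -> None:
        self._filters: Dict[int, Predicate] = {}
        self._next_id = 0

    def register(self, predicate: Predicate) -> int:
        """Install a filter and return a handle that can be used to remove it."""
        filter_id = self._next_id
        self._next_id += 1
        self._filters[filter_id] = predicate
        return filter_id

    def unregister(self, filter_id: int) -> None:
        """Remove a previously installed filter; unknown ids are ignored."""
        self._filters.pop(filter_id, None)

    def should_escalate(self, packet: Packet) -> bool:
        """True if any installed filter claims the packet for the upper tier."""
        return any(matches(packet) for matches in self._filters.values())


table = FilterTable()
# Hypothetical extension: claim every packet carrying protocol number 253.
handle = table.register(lambda p: p.get("proto") == 253)
claimed = table.should_escalate({"proto": 253})
ignored = table.should_escalate({"proto": 6})
table.unregister(handle)
after_removal = table.should_escalate({"proto": 253})
```

Returning a handle from `register` is one simple way to let the upper tier remove exactly the filter it installed.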

The paper provides a very detailed description for building a network processor + general purpose processor based software router. The authors’ results are impressive as they report a processing line rate of 3.47 Mpps.  What seems a bit unaddressed is what would actually happen if a new/extension protocol were implemented using this architecture. If that protocol required Pentium level processing for each packet, it seems as though the line rate would drop considerably. So from a practical perspective, one would not actually want to run their new protocol using this hardware (despite the fact that it could be run).

I’d very much like to see performance results of line rates for traffic using one of the extensions proposed in the paper. This paper does a great job of explaining how traditional traffic can be supported at high line rates but does not provide results from more complex protocols implemented using their design.

 

Building a Robust Software-Based Router Using Network Processors April 30, 2009

(i) The three most important things the paper says:

1. It is possible to build cheap but effective routers. The authors combine an IXP1200 network board and a PC to build a router capable of forwarding an order of magnitude faster than pure PC-based routers. This gives them three processors, which are exploited to gain much higher throughput than the average PC-based routers of the day.

2. Specifically, the authors demonstrate how to obtain performance, extensibility and robustness with a three-level processor hierarchy. The paper explains in detail how to manage scheduling and resource allocation to gain great performance. On their board, they dedicate the MicroEngines to reading and transferring packets from input to output ports, while exceptional conditions trap up to the StrongARM processor.

3. The authors also point out another major gain of using this network board rather than a pure PC. The board has extra resources it can devote to extensions to the router. Examples of the services that can be added are performance monitoring, intrusion detection, and denial of service detection. These services were being put off because a PC would have to sacrifice performance, since its processor would have to perform them on top of the routing tasks.

(ii) The most glaring problem with the paper:

The authors do not address how this architecture avoids the usual problems of separation and information hiding in a hierarchy, that is, what performance hits result from this separation. Certain pieces may no longer know about some information, which means extra transfers of data, especially when later attempting more complex performance optimizations.

(iii) The future research directions of the work:

One idea mentioned by the authors that could bring substantial gains is finding background work the processors could attend to during idle time. Finding useful activities would make for better CPU utilization. Another necessary research topic: even with a well-defined hierarchy of processors, what happens when we want multiple processors sharing one level of the hierarchy?

 

Building a Robust Software-based Router using Network Processors April 30, 2009

Most important contributions

While a number of related approaches tried to implement software routers using a single general purpose processor, this paper introduced a new hierarchical processor architecture to be used in the forwarding engine of routers. The architecture consists of three different processor levels with different functional and performance requirements. The authors assume that the majority of packets processed by routers can be processed by very simple, dedicated and parallel engines. The main rationale behind the architecture was robustness, performance and extensibility. The goal was to preserve the performance and robustness of dedicated hardware-based routers while opening new extension points for a smaller number of packets.

Each packet is moved to a different processing layer depending on the processing it needs and the complexity of the forwarding engine involved. With each layer a packet has to be transported up, the performance penalty increases and the forwarding rate therefore goes down. But since higher-level programming languages are available on the higher processing levels, those levels can support a broader range of router extensions.

The authors treat even the normal IP-Forwarding engine as an extension to their architecture, therefore proving their point about extensibility.

Most glaring problem

It is unclear whether the authors adequately address the scalability issues of these software-based routers. Whether the same design (with its memory access rates and processor speeds) still works for routers at Gigabit or 10-Gigabit speeds is not obvious.

Possible future work

This work might have triggered a standard architecture, or at the very least a standard operation set for forwarding engines accessing and manipulating routed packets. The most urgent work would therefore be to standardize and harmonize these APIs and architectures in the routing industry.

 

Building a Robust Software-Based Router Using Network Processors April 30, 2009

Important Things:
1) Demonstrates how network processors can be used to design a flexible and extensible router. The system is reprogrammable because it is software-based. Simple extensibility is achieved by separating high-level functionality (implemented as extensions) from low-level mechanisms which are used by all extensions.
2) Demonstrates how the performance capabilities of the network processor can be harnessed. Presents a hierarchical design which leverages the throughput of MicroEngines for high-volume traffic with simple processing requirements while allowing for more complex control protocols on a general-purpose CPU higher in the hierarchy.
3) Evaluates the performance and limitations of the design. Discusses implementation challenges and their solutions, particularly with regard to resource allocation and scheduling.

Problem:
Full IP, prefix matching, and other forwarders could not be done on the MicroEngines, so if this functionality is desired, performance is severely limited.

Future Work:
Given today’s performance demands, what are network processors capable of, how much flexibility do they allow, and how do they compare to other flexible alternatives?

 

Building a Robust Software-Based Router Using Network Processors April 30, 2009

This paper describes the architecture and implementation of a packet router which takes advantage of both a dedicated network processor (an Intel IXP1200) and a commodity Pentium III in order to provide a combination of high forwarding speed for small packets and rich services for control packets. They start by describing their basic line-speed packet forwarding mechanism, where a two-stage pipeline running on the MicroEngines moves packets from the input ports to queues, and from the queues to the output ports. This architecture is helpful because it allows easy extensibility when they want to add more functionality, provided by the StrongARM and Pentium processors.

Based on some performance numbers they obtained, it was decided to avoid implementing much of the robust functions on the StrongARM processor because they required most of its bandwidth and processing time to move packets between the MicroEngines and the Pentium.

Their final design used the concept of “virtual routing processor” (VRP) to provide rich services for control packets while allowing line-speed routing of simple packets. To do this, they gave their VRP a fixed number of cycles to perform its task in order to avoid penalizing simple packets. This requires validating VRP routines before installing them to ensure that they do not consume too many resources. Once this is done, the processing stage of the MicroEngine can either run one of the simpler VRP routines directly or hand the packet off to the higher levels of the processor hierarchy (StrongARM or Pentium) to perform more complex computations.
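The validation step described above, checking a VRP routine against a fixed cycle budget before installing it, can be sketched as a simple admission test. Everything numeric here is an assumption made for illustration: the operation names and per-operation cycle costs are hypothetical and are not taken from the paper or from IXP1200 documentation, and the sketch assumes straight-line routines (no loops or branches) so that the worst case is a plain sum.

```python
from typing import Dict, List

# Hypothetical per-operation cycle costs, for illustration only.
CYCLE_COST: Dict[str, int] = {"alu": 1, "hash": 10, "sram_read": 20}


def worst_case_cycles(routine: List[str]) -> int:
    """Worst-case cost of a straight-line routine: the sum of its operation costs."""
    return sum(CYCLE_COST[op] for op in routine)


def admit(routine: List[str], budget_cycles: int) -> bool:
    """Install a routine only if it can never exceed the per-packet budget."""
    return worst_case_cycles(routine) <= budget_cycles


cheap = ["alu", "sram_read"]          # 21 cycles under the assumed costs
costly = ["sram_read", "sram_read"]   # 40 cycles under the assumed costs
cheap_ok = admit(cheap, budget_cycles=25)
costly_ok = admit(costly, budget_cycles=25)
```

Rejecting a routine at install time, rather than interrupting it at run time, is what keeps simple packets from being penalized by an expensive extension.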

A weakness of their approach is that most of the code must be hand-crafted in assembly, since there is no compiler for the MicroEngines. This restricts the flexibility of the architecture, since the queue layout and therefore the implementation of the VRP is dependent on the architecture of that particular network processor. However, the lack of flexibility is balanced by the high performance attained by hand-coding the assembly. Since forwarding millions of packets a second is a very real requirement of commercial products, perhaps the greater amount of work required to implement the architecture on different platforms is worth it.

Future research in this area could focus on ways to find the optimum performance/flexibility point (as mentioned in the previous paragraph). Perhaps by adding a level of abstraction in a key location, performance can be maintained while allowing the code to run on more variegated platforms.

 

Building a Robust Software-Based Router Using Network Processors April 30, 2009

1. This paper describes the implementation of a software router built from a 733MHz Pentium III connected to an IXP1200 development board running a 200MHz StrongARM CPU. The router cost roughly $1500 USD and performs an order of magnitude faster than pure PC software routers.

2. The paper contributes a packet processing architecture with three hierarchical switching paths. At the lowest level of the hierarchy, packets are processed by the hardware microengines. At the intermediate layer, a packet has access to StrongARM cycles. At the top layer, packet processing has access to the cycles of the Pentium processor. Packets are switched to the layers by a classifier, forwarder, and scheduler architecture.

3. The paper then contributes the experience of designing a fixed routing structure that fully and robustly exploits the IXP1200’s parallel hardware microengines. The paper talks about the bugs, optimizations, and synchronization issues encountered.

Further research can be done in this area by using newer hardware than was available in 2001. Different levels of hierarchy and different architectures can be explored with the variety of faster hardware available today.

 

Building a Robust Software-Based Router Using Network Processors April 30, 2009

The paper proposes the idea of a scalable software router that can handle and forward packets at a rate comparable to conventional routers. The authors describe their experience of using an emerging network processor, in particular the Intel IXP1200, to implement a router. They show that it is possible to combine an IXP1200 development board and a PC to build an inexpensive router that forwards minimum-sized packets at a rate of 3.47 Mpps, which is nearly an order of magnitude faster than PC-based routers and sufficient to support 1.77 Gbps of aggregate link bandwidth. The router implements both the data plane and the control plane: the former is responsible for forwarding packets at line speed, while the latter does complex jobs like finding shortest paths to nodes and building routing tables. They take advantage of the underlying architecture, which supports multiple cores, each with many hardware contexts. Hence the software for processing packets has to be managed across parallel hardware contexts in a way that fully utilizes the available memory bandwidth. The architecture of the router is divided into input queues and output queues, with several forwarding units between them that read packets from the input in parallel, process them and finally write them to the output queue. The architecture has the flexibility of adding new functionality to the router. It also gives freedom in choosing where in the processor hierarchy each forwarder is implemented.
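The relationship between the 3.47 Mpps forwarding rate and the 1.77 Gbps aggregate bandwidth quoted above can be checked with a short calculation. This sketch assumes a minimum-sized packet of 64 bytes (the minimum Ethernet frame size); that size is an assumption, since this review does not state it.

```python
MIN_PACKET_BYTES = 64  # assumption: minimum Ethernet frame size


def aggregate_gbps(packets_per_second: float, packet_bytes: int = MIN_PACKET_BYTES) -> float:
    """Aggregate bandwidth in Gbps for a given packet rate and packet size."""
    bits_per_second = packets_per_second * packet_bytes * 8
    return bits_per_second / 1e9


bandwidth = aggregate_gbps(3.47e6)
```

Under that assumption the result is about 1.777 Gbps, which matches the quoted 1.77 Gbps when truncated to two decimal places.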

Conventional memory-based processing can become a bottleneck in such an architecture. In particular, since the input and output queues are shared between multiple hardware contexts, they have to be protected by mutexes. This can add overhead from lock contention when all the threads try to acquire the lock simultaneously. Finally, programming a network processor platform is complex since all code is written in assembly language, owing to the lack of a compiler for such a specialized application. This can make it difficult to upgrade policies and add services to the router. Also, being written in assembly, critical portions of the code can become very error-prone.

Future work could evaluate the flexibility of the network-processor-based router and its programmability. Research may also include the study of how efficiently emerging general-purpose chip multiprocessors and state-of-the-art I/O can be used to match the data forwarding rates of conventional hardware-based routers. Finally, for specialized applications like network packet processing and forwarding, emerging GPUs can play a huge role in providing high bandwidth and scalability. Thus the distinction between hardware-based and software-based routers will hopefully become less meaningful.

 

Building a Robust Software-Based Router Using Network Processors April 30, 2009

(i) the three most important things the paper says

This paper addresses the feasibility of a software-based router while taking into consideration the specific interactions within the hardware design.  The paper demonstrates an architecture which the authors believe will provide sufficient worst-case performance while still allowing a significant amount of reprogrammability.  The first most important idea that the paper articulates is that this system is still not necessarily easy to program for because of the low-level programming required to do so.  I feel that the authors show, in this idea, that this system isn’t quite ready for any sort of implementation, but is more of a work-in-progress.  The second most important idea from the paper is the hierarchy that is used within its operation.  The authors dictate that packets that require the lowest level of external processing shouldn’t have to realize the latency of using the “higher-level” processors, but can instead be forwarded along at the maximum speed.  I believe that this bodes well for this design, so that the possibility of adding additional functionality later won’t greatly interfere with someone who may want to extract solely speed from this router design.  The third most important idea demonstrated is that this system can function well in the “worst case” conditions (minimum-sized packets).  This means that, ignoring any task that requires heavy processing, the design should function well for larger-sized packets.

(ii) the most glaring problem with the paper

The largest problem that I found with this paper was that the authors did not really address the performance of the system when a more complex algorithm is used for processing packets. They spent most of the paper on worst-case operation. It would have been much more enlightening to learn about typical-case operation and how it scales with added functionality and algorithmic complexity.

(iii) the future research directions of the work

I feel that the next immediate direction of this work needs to profile a more typical usage case for this hardware, so that the research can show how this type of device will scale with added complexity.  The next direction for this work could be adding on to the programmability of the system, since many of the important aspects of the system must currently be programmed in assembly.

 

Building a Robust Software-Based Router Using Network Processors April 30, 2009

(i) The three most important things the paper says:

  1. The paper describes the authors’ experience with building a software router using the Intel IXP1200 network processor with a commodity PC. They identify three switching paths through the processor hierarchy — on the Pentium, StrongARM, and/or MicroEngines (from the highest to lowest level) in order to achieve the amount of required processing. The paths are not completely independent; for example, the path which sends packets to be processed by the Pentium requires the packets to pass through the MicroEngines and StrongARM. The MicroEngines work in parallel and use a fixed routing infrastructure.
  2. One of the key benefits to using a software router is the ability to build extensions to perform such tasks as performance monitoring, firewalling, and packet dropping and tagging, among others. The described architecture allows these functionalities to be implemented by allowing programming in the data plane. This is done with a data forwarder on the network processor and a control forwarder on the Pentium to control the actions of the data forwarder. The forwarders are “budgeted” within a virtual router processor.
  3. By using protected public queues with no contention for input processing, and a single queue with batching for output processing, the system is able to forward minimum-sized packets at 3.47 Mpps. Furthermore, even in the face of many exceptional (control) packets exceeding the processing capacity of the StrongARM, the forwarding rate was sustained at 3.47 Mpps.

(ii) The most glaring problem with the paper:

The experiments were performed with only minimum-size packets. I would be curious as to what kind of bandwidth is achieved with larger packet sizes (such as 1024 byte packets, as used in the “Can Software Routers Scale?” paper). Furthermore, the paper does not really address the scalability of their solution except for a passing mention in the conclusions. I think it would have been worthwhile to discuss the advantages and potential problems in scalability inherent to their architecture.

(iii) The future research directions of the work:

One potential research direction is to add the admission control mechanism described in section 4.6 to decide which forwarders to install. In addition to the functionality described in that section, it may also be worthwhile to add functionality to the admission control mechanism which detects and handles misbehaving forwarders (either unintentional or malicious). The authors also mention the lack of a compiler for MicroEngines and voice their doubts that such a compiler is feasible. Although this may be the case, it might be interesting to see if there was some set of abstractions that could be obtained and accessed through an interface such that managing parallel resources could be simplified a bit, and not need to be completely done in assembly language.

 

Building a Robust Software-Based Router Using Network Processors April 30, 2009

The authors note the flexibility of software-based routing systems and their increasing importance in networks as a result. Their approach is to use specialized network processors on top of commodity PCs to achieve performance that is orders of magnitude better than that of typical software-based routers.

1) The authors show that software-based routing on a PC can be improved by orders of magnitude by utilizing a network processor. The network processor takes over the job of packet forwarding (the data plane), allowing packets to be processed quickly at line speed, while the PC processor handles the signaling protocols. The network processor has multiple contexts that can work in parallel, and efficient MicroEngines that can be programmed to examine the incoming data quickly. This approach combines the fast but less flexible network processor with the slower but more flexible PC processor, and so gains both speed and flexibility.
2) The paper takes a processor-hierarchy approach, splitting the router’s tasks across three levels: data is processed by the PC processor, by the network processor’s MicroEngines, or by its StrongARM processor. By doing so, the bottleneck task, packet forwarding, can be handed to the more efficient network processor, and the PC processor can handle the rest of the data. A set of statically assigned MicroEngine contexts examines incoming MAC packets and, if necessary, hands them to the StrongARM processor.
3) The authors show that both resource allocation and scheduling problems are approachable in a PC-based router solution. Resource allocation is alleviated by the network processor: the MicroEngine decides whether tasks are to be allocated to the router infrastructure for forwarding at line speed, or to a virtual router processor for additional processing.
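The division of labor across the three levels can be pictured as a simple dispatch rule. This is a minimal sketch under stated assumptions: the `Packet` fields and the `handling_level` function are hypothetical names, and the rule (control packets to the Pentium, route-cache misses to the StrongARM, everything else on the MicroEngines) is a simplified reading of the reviews in this collection, not the paper's actual code.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dst: str                  # destination address (hypothetical field)
    is_control: bool = False  # True for control-plane traffic


def handling_level(packet, route_cache):
    """Return the lowest level of the hierarchy able to handle the packet."""
    if packet.is_control:
        return "Pentium"       # control plane: most cycles, lowest packet rate
    if packet.dst in route_cache:
        return "MicroEngine"   # fast path: forwarded at line speed
    return "StrongARM"         # exceptional packet, e.g. a route-cache miss
```

With `route_cache = {"10.0.0.1"}`, a data packet for `10.0.0.1` stays on the MicroEngines, a data packet for an unknown destination goes up one level, and a control packet goes to the top.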

Glaring problems: The solution is based largely on the assumption that the majority of the traffic consists of packets that only need forwarding. While this is true under realistic conditions, the software router’s performance would suffer under conditions where ‘control plane’ traffic is high, since these packets are processed by the PC’s processor.

Future work: It would be interesting to see their single-PC approach combined with the other assigned paper, which describes using a cluster of PCs. This would provide the highest level of parallelism while still offering the advantages of using a network processor. It would also seem beneficial if more user-friendly MicroEngines became available (they mention that there is no compiler and the code has to be written in assembly).

 

Building a Robust Software-Based Router Using Network Processors April 30, 2009

This paper presents a software based router that exploits a three-layer processor hierarchy architecture.

The main features of the proposed solution are the following:

  1. Separation of data plane and control plane. The former plane forwards packets from the input port to the appropriate output port. The latter plane instead defines the policies for such forwarding. The main difference between the two planes is that the data plane must process packets at line speed, hence it has only a few cycles available to process each packet. The control plane, by contrast, has a theoretically much smaller workload, but it can run compute-intensive programs, such as a shortest-path algorithm. This separation between data plane and control plane allows the system to be much more flexible and extensible than existing solutions where the two planes are combined.
  2. Flexibility and extensibility. As mentioned in the previous point, the fact that the proposed system implements the control plane in a different place than the data plane allows the system to be flexible and extensible. In particular, it is possible to inject new routing policies or new features at run time, such as firewalls, TCP proxies, virtual LANs, etc., without having to tear down the router or rewrite the code from scratch. In this system, injecting a new functionality basically amounts to adding a new forwarder F to the router.
  3. Full exploitation of the hardware parallelism. With their ad hoc programming, the authors achieved great performance out of the three level processor hierarchy by exploiting the parallelism of the IXP1200 MicroEngine. The authors also presented interesting guidelines on how such programming should be implemented.

I can see two main critiques to this paper.

  1. The tests shown in the paper and the implementation presented are very basic. In particular, the authors have not stressed the system with workloads that use the heaviest types of forwarding policy. Most of the authors’ benchmarking is based on trivial IP forwarding. As they explained, the system must be able to handle packet forwarding at line speed, and they claimed their system does so. However, it is also clear from their analysis that their architecture presents a big bottleneck at the StrongARM. Even if the system is handling only basic IP forwarding, the StrongARM is already at full capacity and not able to provide any more cycles per packet. More cycles would be required, though, if a more complex functionality were inserted into the system.
  2. The paper is 8 years old, and at the time line speed was considerably slower. The system is structured to handle eight 100 Mb/s Ethernet connections. As said in the previous point, the system is already at full capacity. Nowadays, we have 10 Gigabit connections, hence 2 orders of magnitude faster. Processors have not had such a speed-up in the same time; they are only a few times faster than 8 years ago. This is at the same time a potential bottleneck and an interesting avenue for further research. Indeed, it would be interesting to see how the proposed architecture scales up with the increased connection speed.
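The scaling concern in the second critique can be made concrete with a back-of-the-envelope calculation of the cycles available per packet. This is a sketch under loud assumptions: the clock rates in the usage note are hypothetical placeholders, not figures from the paper, the calculation treats a single processor as handling every packet, and the 20-byte wire overhead is standard Ethernet framing.

```python
def cycles_per_packet(clock_hz, aggregate_link_bps,
                      frame_bytes=64, wire_overhead_bytes=20):
    """Clock cycles available per packet when minimum-size frames arrive
    back to back on links totalling aggregate_link_bps."""
    wire_bits = (frame_bytes + wire_overhead_bytes) * 8
    seconds_per_packet = wire_bits / aggregate_link_bps
    return clock_hz * seconds_per_packet


if __name__ == "__main__":
    # Hypothetical clock rates, chosen only to illustrate the trend.
    print(cycles_per_packet(200e6, 8 * 100e6))  # eight 100 Mb/s ports
    print(cycles_per_packet(1e9, 10e9))         # one 10 Gb/s port
```

With these placeholder numbers the budget drops from about 168 cycles per packet to about 67, even though the assumed clock is five times faster, because the link is 12.5 times faster. Spreading packets over more parallel processors raises the budget proportionally, which is presumably where any scaled-up version of the architecture would have to find its headroom.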
 

Building a Robust Software-Based Router Using Network Processors April 30, 2009

The paper discusses the design of a software-based router using the Intel IXP1200 network processor platform and an Intel Pentium III PC. The claim is that a cheap, extensible router configuration can be built without having to use ASIC technology and still get similar gigabit throughput performance. The authors implement the data plane (forwarding plane) of the router, which requires line-speed processing without priority inversion, on the IXP platform, and they use the Pentium to implement the control plane of the router, which needs intensive processing capabilities. The IXP internally contains 6 micro-engines and a StrongARM processor, thus forming a three-tiered processor hierarchy along with the Pentium, and the paper presents a detailed description of which functionality is or could be implemented at each level of this hierarchy. The paper highlights three main topics, viz. the system architecture, the fixed infrastructure required for minimal forwarding, and the extensibility argument.

The system software as described in the paper primarily consists of a classifier, which reads packets from the input queue and redirects them to the appropriate forwarder, which then processes the packets and passes the modified packets to the output scheduler. The scheduler ultimately schedules the packets on the output queues connected to the output ports of the router. This architecture allows for software extensibility and processor-independent forwarder implementation. The hardware mainly consists of the processor hierarchy and the associated DRAMs and SRAMs that implement the queues mentioned in the software architecture and are capable of supporting several gigabits of aggregate forwarding bandwidth.
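The classifier, forwarder and scheduler stages described above can be sketched in a few lines. This is a toy model, not the paper's implementation: `SoftwareRouter`, `install`, `make_ip_forwarder`, the dictionary packet format and the round-robin scheduler are all assumptions made for illustration.

```python
from collections import deque


class SoftwareRouter:
    """Toy classifier -> forwarder -> scheduler pipeline."""

    def __init__(self, num_ports):
        self.forwarders = []  # (matches, forwarder) pairs, tried in order
        self.output_queues = [deque() for _ in range(num_ports)]

    def install(self, matches, forwarder):
        """Register a forwarder together with the predicate selecting its packets."""
        self.forwarders.append((matches, forwarder))

    def receive(self, packet):
        """Classify the packet, run its forwarder, and queue the result."""
        for matches, forwarder in self.forwarders:
            if matches(packet):
                out_port, out_packet = forwarder(packet)
                self.output_queues[out_port].append(out_packet)
                return True
        return False  # no forwarder claimed the packet, so it is dropped

    def schedule(self):
        """One round-robin pass: send at most one packet per output port."""
        sent = []
        for port, queue in enumerate(self.output_queues):
            if queue:
                sent.append((port, queue.popleft()))
        return sent


def make_ip_forwarder(routes):
    """Minimal IP forwarder: decrement the TTL and look up the output port."""
    def forward(packet):
        out_packet = dict(packet, ttl=packet["ttl"] - 1)
        return routes[packet["dst"]], out_packet
    return forward
```

Adding a new service in this model means calling `install` with a new predicate and forwarder, which is the extensibility property the review attributes to the architecture. The sketch has no notion of per-forwarder resource limits or of which processor a forwarder runs on.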

The paper then establishes a performance bound that can be achieved by each level in the processor hierarchy. The authors specifically highlight the mutual exclusion and parallelism considerations when programming the micro-engines of the IXP, since they present multiple contexts. Micro-engine contexts forward 64-byte MAC packets (MPs) in a two-stage pipelined fashion with DRAM acting as the pipeline register. A pipelined approach is used to avoid contention on the output ports and to efficiently utilize all the processing contexts. An incoming packet is DMAed into on-chip memory and subjected to software protocol processing. To enable multiple micro-engines to process the input packet’s MPs in parallel, they are statically scheduled onto processing contexts, and mutex synchronization is used to mutually exclude them. Although processing happens on multiple contexts simultaneously, output contexts need to be serialized to preserve packet ordering, and this is done by passing a token between contexts. Just as at the input, processed packets are moved from contexts to output queues by static scheduling. The authors evaluate the processing capabilities of the micro-engines and demonstrate an aggregate throughput of 3.47 million packets per second without losses; the upper limit arises only because the input DMA engine is serial. When the micro-engines detect a routing cache miss or a need for further protocol processing, they signal the StrongARM to accept the packets from DRAM for processing. The StrongARM is shown to successfully process 526 kpps, and the only overhead is the handshaking between micro-engine contexts and the StrongARM. All control plane processing is handled inside the Pentium, which supports close to 534 kpps of aggregate throughput.
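The token-passing idea, where contexts work in parallel but enqueue their results in a fixed rotation, can be illustrated with ordinary threads. This is only an analogy under stated assumptions: Python threads stand in for hardware contexts, `forward_in_order` is a hypothetical name, and the static assignment of packet i to context i mod n is chosen for the example rather than taken from the paper.

```python
import threading


def forward_in_order(packets, num_contexts=4):
    """Process packets on several contexts but enqueue them in arrival order.

    Context i is statically assigned packets i, i+n, i+2n, ... and may append
    to the output queue only while it holds the token.
    """
    output = []
    token = 0  # id of the context currently allowed to enqueue
    cond = threading.Condition()

    def context(ctx_id):
        nonlocal token
        for packet in packets[ctx_id::num_contexts]:
            processed = packet  # per-packet work would run here, in parallel
            with cond:
                cond.wait_for(lambda: token == ctx_id)
                output.append(processed)
                token = (token + 1) % num_contexts  # pass the token on
                cond.notify_all()

    threads = [threading.Thread(target=context, args=(i,))
               for i in range(num_contexts)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return output
```

Because the token visits contexts in the same rotation used to assign packets, the output order matches the input order regardless of how the threads are interleaved. The cost is that a context that finishes early still has to wait for its turn.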

The third major discussion in the paper is on the extensibility of the proposed architecture. Adding complex forwarders and integrating the control plane into the overall architecture are seen as the possible extensions. The authors expose the limitations of the system: the StrongARM has to share signaling with the Pentium and share DRAM with the processor contexts. To circumvent this, the authors propose to add additional code in the form of a virtual router processor (VRP) over the micro-engine contexts to augment the packet processing capabilities. The VRP budget and line-speed processing are shown to be conflicting goals, and the authors propose to make extensions by fixing either one. The authors propose performance monitoring, packet splicing and smart packet dropping as examples of the extensions that can be made to their architecture by executing the data plane on the IXP processors and the control plane on the Pentium. They further propose an interface and a set of operations required to seamlessly integrate the IXP and the Pentium. The authors finally test different workloads on the system to validate its robustness when processing has to be transferred up the hierarchy.

One major problem with this paper was that it lacked substantial performance evaluation data and the main focus was centered on the design implementation. The feasibility and deployment of the system is questionable since a crude PC-PCI-IXP interface is used to demonstrate it at a high level. Even with these limitations the paper opens up an arena of possibilities in the software router space and establishes a base for developing routers entirely on network processor platforms and PC hardware.

 

Building a Robust Software-Based Router Using Network Processors April 30, 2009

1. The paper presents a router design incorporating in itself both hardware and software features. The hardware components provide the speed and computation power required of a network processor, while the part of it implemented in software provides it with the flexibility to handle any protocol and to extend the current model to support new additions and changes.

2. The hardware portion of the router is implemented as a hierarchical structure. Though a hierarchical implementation has its shortcomings, it provides a simple structure which allows packets with different requirements to be processed at different levels. This way, only those packets that require more complex computations are forwarded to the upper layers. In addition, the hardware implementation also provides for “efficient” parallel processing. Efficiency here refers to realizing parallel processing while keeping the overhead of sharing resources to a minimum. The paper achieves this by implementing token passing: only the unit in possession of the token can use the shared resource.

3. The paper makes some interesting trade-offs. By allocating fixed buffers and queues to each context (unit of hardware), it chooses speed over the amount of resources provided. It also fixes the lifetime of a buffer so as to eliminate bookkeeping overhead. It chooses to make the structure as general as possible at the cost of the efficiency that could be achieved by making the design protocol-specific.

Shortcomings:

The hierarchical structure, while having its benefits, has its shortcomings. A three-tier hierarchy means that every packet that needs to get to the highest tier must pass through all the lower ones before it can get there. This results in unnecessary consumption of time and resources. There are a few other choices they make that come at a price. Their use of a token to coordinate multiple contexts sharing a resource, while reducing the overhead of implementing mutual exclusion, means time is spent giving contexts that do not require the resource a chance to use it. It also means priority is neglected while servicing ports. This is a conscious choice on their part.

Future Work:

The design they propose has several avenues for additions and extensions. The model is generic, with maximal flexibility. A similar hardware/software system could be designed for real-time networks. Features such as priority would be an important consideration in such a system. If flexibility isn’t a major criterion, a non-hierarchical structure could also be explored.

 

Building a Robust Software-Based Router Using Network Processors April 30, 2009

(i) The three most important things the paper says:

1) Advances in network processors warrant a revival in the discussion of software-based routers on conventional PCs. These network processors are relatively inexpensive and have features such as parallel processing capabilities to improve pps (packets per second).

2) Consideration of the three-processor hierarchy as an integrated whole. This was a key point because if they were to introduce the network processor as a viable solution, there must be a plan to make it all work together. So in their paper they outlined all the potential issues along with their design solutions (e.g. running a minimal OS on the StrongARM).

3) They demonstrated how they can inject new features into all three levels of the processor hierarchy without jeopardizing robustness for different workloads. This is an important point, because this is a feature that sets them apart from standalone HW solutions. If this feature wasn’t robust enough, there wouldn’t be a compelling reason to switch to this solution.

(ii) The most glaring problem with the paper:

The biggest problem with the paper is that it focuses too much on the technical details and not enough on the impact of this work in a commercial setting. For example, how does this solution compare with standalone hardware-based routers in terms of power consumption? Also, how does this setup compare in terms of time to failure (i.e. stability of the system)?

(iii) The future research directions of the work:

Future research would be to take the lessons learned and the system described in this paper and extend them to a clustered solution of conventional PCs. This is because, to make this solution viable in a commercial setting, there must be support for growing a network infrastructure.

 

Building a Robust Software-Based Router Using Network Processors April 30, 2009

i)

1. Separating the data plane from the control plane and having a processor hierarchy allows normal packets to be forwarded at line speed while having a more powerful processor deal with exceptional packets. This allows for a software-based router that can operate at line speed for normal traffic.

2. By statically allocating memory and processors, guaranteeing line-speed forwarding is made easier, but at the expense of making portability to platforms with a different number of ports and different port speeds more difficult.

3. Allowing the network processors to run any code as long as it fits into the resource budget allows the router to be extensible, while also ensuring guarantees on line speed routing for non-exceptional packets.

ii) I was unsure why the StrongARM was necessary and why it was necessary to have it on the path between the MicroEngines and the Pentium. The flaw in this paper is that it does not adequately explain the reasoning for having the StrongARM between the MicroEngines and the Pentium, or why the Pentium can’t be eliminated altogether by simply getting a faster ARM processor.

iii) I think the future research areas include researching what capabilities can be implemented at line speed forwarding and what requires higher level processing.

 

Building a Robust Software-Based Router Using Network Processors April 29, 2009

This paper is primarily concerned with detailing advancements to software-based routers, as it was written at a time when there were cheap routers that were relatively dumb by today’s standards, and expensive PC-based routers that handled resource-intensive tasks such as firewalling, port forwarding and the like. The primary points of this paper were:

1. The development of a hierarchical scheme in which some operations were performed at the lowest level in order to maximize throughput, some were passed up to the specialized network board that could handle moderate sized operations with good speed, and finally, the most intensive tasks were passed up to the general purpose CPU in order to perform those operations as quickly as possible. This took advantage of the strengths of each stage and ensured that the router was operating as quickly as possible.

2. The authors of this paper made extensibility one of their primary design considerations, a wise choice in retrospect. They recognized that the proliferation of services that inspired them to build this router would likely continue to increase and that their design would need to be able to implement them cheaply and easily in order to stay competitive. Their design is somewhat modular in that the forwarding and scheduling mechanism of the different layers is somewhat removed from the modular services that they can incorporate. This would make it easier to add a service and then change the forwarding algorithm slightly so that it would recognize where to send the packets.

3. The forwarding mechanism mentioned above is also static, allowing for an extremely robust system, even under an extremely heavy load. By decoupling the services and the forwarding mechanisms, the authors were able to optimize each component to its specific strength. The three design considerations: performance, extensibility, and robustness were well thought out and addressed for each component.

The primary weakness of this paper that I saw was that they did not try to implement any of the services that were much less common at the time, but rather stuck with the basic ones common to many routers. While they did show that they could do it better, being able to demonstrate increased functionality would have improved this already excellent paper and added something that was missing. The other thing that I thought would have improved the paper would have been a simple graph detailing performance compared to other common router configurations.

Future research in this area has been ongoing as routers have been forced to do more and more over the years. The authors mentioned moving a lot of this functionality to FPGAs, which I see as something that might merit a considerable amount of research. Also, improving upon the system, and even adding more layers or duplicates of each layer, might significantly improve performance. These were all mentioned by the authors, but I also believe that these are the most likely paths for future research.