The authors describe a three-tiered processor structure for a software-based router built from a Pentium computer and a network processor (also from Intel). The aim is to show that, with the tiered approach, data and control packets can be routed at gigabit-per-second line rates while still allowing software-based protocol extensibility.
The Intel network processor board comes with DRAM and SRAM caches as well as several MicroEngines and a StrongARM processor. The MicroEngines are programmed in microcode and can perform simple operations (dequeue, enqueue, copy, etc.) very quickly. They operate in parallel, servicing the input and output ports. The StrongARM processor services exceptional packets: those that incur a route lookup miss and those that are control packets. The StrongARM runs a single process and can forward even more “exceptional” packets across a PCI bus to the Pentium processor. The Pentium processor performs compute-intensive operations such as calculating shortest paths.
The main contributions are the design of the three-tiered architecture and the design of the protocol extension interface.
The architecture defines a fast path and a mechanism for routing packets through a longer, more compute-intensive path. Packets arrive at an input port, where they are queued in a FIFO. The MicroEngines then process the packets in parallel, classifying each one by type. If a packet can be forwarded by the MicroEngines, they perform whatever transformation is necessary and enqueue the packet in an output FIFO. If the packet requires further processing, it is placed in RAM, which the StrongARM processor checks for packets to process. The StrongARM may copy some or all of these packets onto the PCI bus for the Pentium processor to service, and the Pentium can copy transformed packets back over the same bus. Once the StrongARM has completed the processing (or received the completed packet from the Pentium), it is responsible for forwarding the packet to the appropriate output queue. MicroEngines also work in parallel to copy packets from the output FIFOs to the output ports. Contention for output ports is resolved through mutual exclusion, implemented by passing a token.
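To make the packet flow concrete, here is a minimal sketch of the tier selection in Python. It is my own illustration, not the authors' code: the names `Packet`, `Tier`, `classify`, `forward`, and the `default_port` used for route misses are all hypothetical, and it collapses the MicroEngine and StrongARM decisions into one function for brevity.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    MICROENGINE = 1  # fast path
    STRONGARM = 2    # exceptional packets
    PENTIUM = 3      # compute-intensive work, reached over PCI


@dataclass
class Packet:
    dst: str
    is_control: bool = False
    needs_heavy_compute: bool = False  # hypothetical flag for this sketch


def classify(pkt, route_table):
    """Return the highest tier a packet has to reach before it is forwarded."""
    if pkt.needs_heavy_compute:
        return Tier.PENTIUM
    if pkt.is_control or pkt.dst not in route_table:
        return Tier.STRONGARM
    return Tier.MICROENGINE


def forward(input_fifo, route_table, default_port=0):
    """Drain the input FIFO into per-port output FIFOs, recording each tier."""
    output_fifos = {}
    path_taken = []
    while input_fifo:
        pkt = input_fifo.popleft()
        path_taken.append(classify(pkt, route_table))
        # Assumption: a route miss is resolved to default_port by a higher tier.
        port = route_table.get(pkt.dst, default_port)
        output_fifos.setdefault(port, deque()).append(pkt)
    return output_fifos, path_taken
```

Whichever tier finishes the work, every packet ends up in an output FIFO, which mirrors the description above: the higher tiers hand completed packets back rather than transmitting them directly.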
The second contribution is the interface through which extensions can be installed. The Pentium processor can invoke functions on the StrongARM processor that register or unregister custom filters. The StrongARM uses these filters to identify specialized packets (those conforming to some new or extended protocol) and forward them up to the Pentium for processing.
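A rough sketch of what such a register/unregister interface could look like follows. The class `FilterRegistry` and its method names are hypothetical, chosen for illustration only; the paper's actual interface may differ, and packets are modelled here as plain dictionaries.

```python
class FilterRegistry:
    """Hypothetical stand-in for the filter table kept on the StrongARM."""

    def __init__(self):
        self._filters = {}
        self._next_id = 0

    def register(self, predicate):
        """Install a filter and return a handle for later removal."""
        filter_id = self._next_id
        self._next_id += 1
        self._filters[filter_id] = predicate
        return filter_id

    def unregister(self, filter_id):
        """Remove a filter; return False if the handle was unknown."""
        return self._filters.pop(filter_id, None) is not None

    def matches(self, packet):
        """True if any installed filter claims the packet for the Pentium."""
        return any(pred(packet) for pred in self._filters.values())


# Usage: claim packets of an experimental protocol (number chosen arbitrarily).
registry = FilterRegistry()
handle = registry.register(lambda pkt: pkt.get("proto") == 253)
send_to_pentium = registry.matches({"proto": 253})
```

Returning a handle from `register` is one simple way to let the Pentium remove exactly the filter it installed.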
The paper provides a very detailed description of how to build a software router from a network processor and a general-purpose processor. The authors’ results are impressive: they report a packet-processing rate of 3.47 Mpps. What seems unaddressed is what would actually happen if a new or extended protocol were implemented on this architecture. If that protocol required Pentium-level processing for every packet, the achievable rate would presumably drop considerably, because every packet would have to cross the PCI bus and wait for the Pentium. So from a practical perspective, one would not actually want to run such a protocol on this hardware, even though it could be run.
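The concern can be illustrated with a simple bottleneck model. This is my own back-of-the-envelope sketch, not an analysis from the paper: it assumes every packet passes through the fast path for classification and that a fraction of packets also needs the slow path. The function name `max_input_rate` is hypothetical, and the slow-path rate used below is a placeholder, not a measured figure.

```python
def max_input_rate(fast_pps, slow_pps, slow_fraction):
    """Highest sustainable input rate under a two-stage bottleneck model.

    fast_pps:      packets/s the fast path can classify and forward
    slow_pps:      packets/s the slow path (Pentium) can process
    slow_fraction: share of packets that must visit the slow path
    """
    if not 0.0 <= slow_fraction <= 1.0:
        raise ValueError("slow_fraction must be between 0 and 1")
    if slow_fraction == 0.0:
        return fast_pps
    # The slow path saturates once input_rate * slow_fraction reaches slow_pps.
    return min(fast_pps, slow_pps / slow_fraction)


FAST_PPS = 3.47e6  # rate reported in the paper
SLOW_PPS = 0.5e6   # hypothetical placeholder, NOT a number from the paper

all_fast = max_input_rate(FAST_PPS, SLOW_PPS, 0.0)
all_slow = max_input_rate(FAST_PPS, SLOW_PPS, 1.0)
```

Under this model, a protocol that sends every packet to the Pentium is limited to the Pentium's own rate, whatever that turns out to be, which is why measured numbers would be valuable.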
I’d very much like to see measured line rates for traffic that uses one of the extensions proposed in the paper. The paper does a great job of explaining how traditional traffic can be supported at high line rates, but it does not provide results for more complex protocols implemented with this design.