Graduate Networks, UCSD

CSE222 – Spring 2009

Can Software Routers Scale? April 30, 2009

Filed under: R09. Can Software Routers Scale? — siva @ 4:28 pm

The primary contribution of the paper is the design of an entirely PC-based architecture for software routing at line rate. The authors attempt to quantify the maximum theoretical performance achievable on a particular PC architecture and also provide experimental results that show the performance actually achieved.

An important aspect of the paper is that it focuses on extensibility, or ease of upgrading, in routers. The difficulty of upgrading has been a crucial bottleneck in evolving the Internet router infrastructure.

The authors propose a mesh-based topology with Valiant load balancing so that the router can scale without being limited by a single PC’s processing power. They also propose using L2 switches in the Valiant mesh to avoid the fan-out bottleneck at a single node.

One missing aspect is that the discussion focuses on very simple routing and forwarding (raw speed). That may well be the case for core routers, but other routers perform more tasks than that. Extending the analysis to more complicated routing functions would have given the paper more credibility.

Future research could explore whether different topologies (such as fat trees) offer better performance. Comparing the price points of PC-based architectures against traditional architectures is another direction.

 

Can Software Routers Scale? April 30, 2009

Filed under: R09. Can Software Routers Scale? — filipposeracini @ 4:27 pm

In this paper, written by several authors from Intel and academia, an introductory analysis of the feasibility of software routers is presented.

The first impression I got reading this paper is that it is only an early introductory paper. It basically presents only conjectures on whether a software router can keep up with the high-volume workload that a hardware router can handle.

Still, I can see a couple of interesting points in the paper:

  1. Analysis of software router bottlenecks. The authors did quite extensive work in understanding where the bottlenecks of a software-based router implementation are. In particular, they identified the FSB and the memory system as the most problematic points. As a solution, they suggest a multicore architecture with point-to-point interconnections between cores and memory in order to avoid the shared bus.
  2. Scaling switching analysis. The authors identified that another issue with software routers is the problem of scaling the switching of packets from input to output ports. The workload that a standard PC, where a software-based router is usually implemented, can handle is far lower than that of a specialized router. The authors present an interesting analysis of the reasons why a PC-based architecture can’t scale very far and suggest a solution based on the Valiant load-balanced routing algorithm.

The main flaw of this paper is that it does not present much content. All the details of the hardware analysis come from a previous work by the authors, and only the results are presented here. The analysis they present of both the single-core and multicore architectures is very approximate and weak. They claim a fourfold increase in performance for the latter, but an evaluation section covering a real, full implementation is completely missing from the paper. It is hence hard to fully buy the authors’ conclusions. Moreover, all the technical problems are left open for further work and research.

As a research direction, I would start by implementing at least a complete prototype of their idea and evaluating more seriously whether this approach is worth further research. Indeed, from the paper it is not clear at all whether a software-based router has any chance of performing at least as well as a hardware-based one. Even after all the simplifications that the authors made, their proposed solution can barely reach the same performance level as the hardware one. Moreover, achieving that performance would require big changes in both the hardware and the operating system. That sounds like a lot of work. The main question is still the same and is still open: are software routers worth it?

 

Can Software Routers Scale? April 30, 2009

Filed under: R09. Can Software Routers Scale? — nekumar @ 4:26 pm

The paper proposes building software routers from general-purpose server platforms, using a cluster-based architecture for scalability and programmability. This approach has two main advantages: (i) upgrading a software router is cheap relative to upgrading a hardware router; (ii) the use of commodity software further reduces the cost. One problem is that today’s software routers operate in the range of 1-3Gbps, while carrier-grade routers are in the range of 40-90Gbps. The paper revisits the question of increasing the switching speed of software routers by changing the “single server as router” approach to a clustered software-router architecture that uses an interconnect of servers to achieve greater scalability. The paper describes two types of cost: per-packet processing cost and switching cost. The per-packet cost depends directly on the line rate, while the switching cost depends on the line rate as well as the number of ports. Current carrier-grade routers use network processors and switch fabrics for these tasks.

The paper proposes to build an N-port router using N servers, each supporting one line of rate R. These servers are connected to each other to switch packets from input to output. In this setup:

Each server supports a line rate of R, hence the per-packet processing capability of each server must scale with R. The aggregate switching capability of the server cluster must scale with NR.
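
To make the two requirements concrete, here is a minimal sketch that just evaluates them for a given port count and line rate. The function name and the example numbers (32 ports at 10 Gbps) are my own assumptions for illustration, not values taken from the paper.

```python
def cluster_requirements(num_ports: int, line_rate_gbps: float) -> dict:
    """Scaling targets for an N-port router built from N servers.

    Hypothetical helper: it only restates the two requirements above
    (per-server processing ~ R, aggregate switching ~ N * R).
    """
    if num_ports < 1 or line_rate_gbps <= 0:
        raise ValueError("need at least one port and a positive line rate")
    return {
        # Each server terminates one external line of rate R.
        "per_server_processing_gbps": line_rate_gbps,
        # The cluster as a whole must move N * R between inputs and outputs.
        "aggregate_switching_gbps": num_ports * line_rate_gbps,
    }


# Example values are assumptions chosen for illustration.
print(cluster_requirements(num_ports=32, line_rate_gbps=10.0))
```

The point of the sketch is only that the per-server target is independent of N, while the interconnect target grows linearly with N.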

The paper proposes a high-level model to assess the per-packet processing capability of a traditional shared-bus architecture and of a point-to-point architecture. It shows that the bottleneck was the combination of memory chip organization and packet layout, which was remedied by modifying the Linux memory allocator. The paper further suggests changes in packet descriptor handling, direct I/O, and a multi-processor mesh architecture. Based on other work and these validations, it extrapolates that rates of up to 10Gbps for shared-bus servers and 40Gbps for mesh-based servers can be achieved.

Another aspect of the problem is switching scalability. Contention occurs when input ports receive packets for the same output port at a rate that exceeds its capacity. High-speed routers handle these situations with a switch fabric configured by a centralized scheduler. Because a PC cannot match the speed of such schedulers, the paper proposes an interconnected switching network in which scheduling is implicit in the routing through the network. Because the OSes are general purpose, the servers will not work in unison, and per-node connectivity is limited by the number of ports available. Link speed is also limited. The paper proposes to use a load-balanced routing algorithm to achieve switching scalability.

This paper does not address power usage, form factor, and reliability, which are important in real deployments.

 

Can Software Routers Scale? April 30, 2009

Filed under: R09. Can Software Routers Scale? — mohit1982 @ 4:26 pm

The paper talks about the challenges involved in building scalable software routers and proposes a cluster-based router architecture that uses an interconnect of commodity server platforms to build software routers that are both incrementally scalable and programmable. The authors support the cause of software routers because of their flexibility in deploying new changes, the low cost of commodity PCs due to large-volume manufacturing, familiar programming environments and operating systems, and widespread supply and support chains. The basic limitation of current software routers is their “single server as router” approach, which can never scale to match hardware routers at high carrier speeds; hence the need for a clustered software-router architecture. The authors aim to build an N-port router using a cluster of server-class PCs, where the challenges are the per-packet processing capability of each server and the aggregate switching capability of the server cluster.

The authors predict the packet processing rates using a highly simplified model over a high-level view of PC architectures and then experimentally measure the packet-processing rate achievable on a current PC platform. These analyses provide the upper and lower bounds, respectively, on line-rate scaling and on whether their proposal is feasible. They chose a shared-bus architecture and a multi-processor architecture as the two architectures for making predictions and performing experiments. They also compute lower bounds on the bandwidths of server-scale processors and argue that small 64-byte packets limit bandwidth to 3.4Gbps. They suggest a few ways to improve performance, such as improved packet descriptor handling, meshed processor technology, and direct I/O. They also propose a Valiant load-balanced architecture that uses intermediate nodes for traffic rerouting.

 

Can Software Routers Scale? April 30, 2009

Filed under: R09. Can Software Routers Scale? — vikrams3 @ 4:25 pm

This is an interesting paper that focusses on a topic that few others have ventured into. Main points are:

  1. Primary reason for relatively little prior work in this area is the underlying demotivating assumption that it is impossible for software to ever match the line speeds that current routers and traffic patterns demand. While this premise is true, the paper proposes interesting ways to get around it by trying to place best- and worst-case performance bounds when a PC is deployed as a router.
  2. The authors make a fine case that the specialized network processors of today have in fact turned out to be more of a stumbling block for new innovation in this space. The reason is that even adding a new module to a router requires changes in the hardware, which proves expensive. In contrast, a network world with software routers would enable deployment of new protocols and modules, just as today’s desktops and servers do.
  3. Shedding light into the problem of scalability of switching. The paper systematically characterizes the bounds in terms of what a software based router is expected to achieve: 1) the per-packet processing capability of each server must scale with O(line rate) 2) the aggregate switching capability of the server cluster must scale with O(#(ports) * line rate)

Problems:

This paper marks the first step in considering the feasibility of using software-based routers as a substitute for the network processors of today. As an obvious consequence, there are many ifs and buts at every step of reading through the paper. We know that Cisco and Juniper use specialized hardware for their routers, different from the regular PC. The first question that comes to mind is: if software-based routers were indeed possible, why is it that not one networking company in the industry has been pushing for them? The most likely reason concerns the scalability issue.

Future:

The paper mentions the wide array of issues that need to be confronted in future work before this is deemed viable. For example, critical functionalities like multicast, quality of service, and buffer management need to be researched extensively. Power usage, reliability, tolerance to software failures, etc. take on added importance in the context of a software-based router.

 

Can Software Routers Scale? April 30, 2009

Filed under: R09. Can Software Routers Scale? — pefaymon @ 4:25 pm

Most important contributions

This paper is an overview paper on the scalability problem of software-based routers using commodity hardware. The authors point out that modern PC architectures are ready to act as general-purpose, software-based routers even at current and future line rates of 10 Gbit/s and 40 Gbit/s.

This scalability problem is explained to be twofold: first, the per-packet processing speed, and second, the switching speed. The first problem is presented as requiring changes to the core of shared-bus architectures, since the FSB appears to limit the per-packet forwarding speed. For the second problem, the paper only sketches a space of possible solutions without going into details.

Most glaring problem

Most of the numbers presented in this paper are estimates and back-of-the-envelope calculations. I’d have liked to see a more general overview of different hardware architectures to consider for building future routers; the focus on the PC architecture alone seems really narrow. Furthermore, this paper lacks any proof of concept for the switching solution.

Future work

Keeping in mind all of the Green-IT initiatives and the fact that the most expensive part of running a data center is power consumption, I’d like to see how the power consumption of dedicated switches fares against these general-purpose machines.

 

Can Software Routers Scale? April 30, 2009

Filed under: R09. Can Software Routers Scale? — stufflebean @ 4:24 pm

Yes (with caveats).

This paper provides two models of differing accuracy in order to explore the possibility of implementing a carrier-grade router using only commodity computers and software routing. First, they use a back-of-the-envelope calculation based on theoretical bandwidths of processor front-side busses (FSBs), PCI-Express, and memory. This highly-optimistic model shows that using processors which should be available in the near future (mesh-network instead of FSB), a 40Gb/s line-speed router should be possible, which is at the low end of carrier-grade.
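
As I read it, the back-of-the-envelope model boils down to dividing each component’s bandwidth by the number of times a forwarded packet has to cross it and taking the minimum. A rough sketch of that idea follows; the function name, the bandwidth figures, and the crossing counts are placeholders I made up, not the paper’s numbers.

```python
def max_line_rate_gbps(bandwidth_gbps: dict, crossings_per_packet: dict) -> tuple:
    """Return (bottleneck component, upper bound on forwarding rate in Gbps).

    Assumption: a component with bandwidth B that every packet crosses k
    times can sustain at most B / k of line traffic.
    """
    limits = {
        name: bandwidth_gbps[name] / crossings_per_packet[name]
        for name in bandwidth_gbps
    }
    bottleneck = min(limits, key=limits.get)
    return bottleneck, limits[bottleneck]


# Placeholder inputs, NOT taken from the paper.
bandwidth = {"fsb": 80.0, "memory": 100.0, "pcie": 64.0}
crossings = {"fsb": 2, "memory": 4, "pcie": 2}
print(max_line_rate_gbps(bandwidth, crossings))
```

With real spec-sheet numbers plugged in, the same calculation would give the kind of optimistic upper bound the authors describe, which is why the measured rate is the more interesting figure.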

They then proceed to refine this model by performing calculations using an actual software router running Click on a commodity server. After correcting for a memory bug which hurt performance, they found that performance was only limited by FSB, which should be mitigated by future processor architectures.

After addressing a couple of possible enhancements to their naive software router approach (amortizing packet descriptors and allowing the processor to snoop DMA data directly into cache), they go on to discuss the switching problem. One area in which a commodity server will never be able to compete with purpose-built products is in the speed of the switching fabric. However, to solve this, they propose using a cluster of machines, which will increase aggregate bandwidth while providing load balancing.

A weakness of this paper is that it tends to rely a bit too heavily on future technology without providing much in the way of convincing benchmarks. Their Click benchmark seemed to jibe with their theoretical result, but that does not mean that it will scale with the advancement of commodity server technology. They mention in their closing remarks that they are working on creating a complete cluster using a switching mesh, and it would be nice to see the results from that study before drawing conclusions from this paper.

Future research needs to examine whether the mesh-based processors can really provide the bandwidth promised by their model or whether there will be another bottleneck which they did not account for. Also, if the packet descriptors cannot be modified at the NIC level, they need to find a way to account for that in order for their claims of 40+Gb/s to hold. Further, they need to decide whether the power consumption, reliability, and packaging constraints involved in a clustered system are worth the price gap between a cluster and a purpose-built part.

 

Can Software Routers Scale? April 30, 2009

Filed under: R09. Can Software Routers Scale? — Mike @ 4:24 pm

(i) The three most important things the paper says:

1. This paper used simplifications of routing to be able to characterize and analyze it from a new perspective. In particular, they found that the first bottleneck they hit was indeed the memory system, not because of access times as everyone assumes, but because of a combination of packet layout and memory chip organization. After patching this, they discovered that the next bottleneck was FSB saturation.

2. When analyzing whether real systems could reach the performance promised by their simplified analysis, they uncovered that the main difference between real results and their theoretical analysis was the added overhead of packet descriptors. The author gives a number of suggestions for fighting this overhead, such as combining descriptors or getting hardware support for dealing with them.

3. The author then advocates for a cluster-based mesh architecture. These meshes are no longer limited by the FSB since they use direct CPU links. The author claims, backed up by his other paper, that simply moving networking to these architectures will offer 4x performance improvements without much trouble.

(ii) The most glaring problem with the paper:

This paper is very broad, without focus or emphasis on what its key discoveries are. The author tries to address and theorize on all aspects and limitations of router performance in an already short paper. The one saving factor is that the author states that a more in-depth analysis is done in a longer paper; viewed only as a summary of that work, this paper is much more viable. In short, the author uses phrases like “we believe” a lot, which means you must decide to trust the author instead of being compelled to by fact. The author also doesn’t address the case where expansion makes a mesh of CPUs no longer a viable option.

(iii) The future research directions of the work:

This paper screams for more research to be done. The author repeatedly states that people need to test the theories put forth in this paper. In particular, tests using the mesh-based architecture and the other design improvements brought forth are a must. The paper ends with the authors stating their current goal of building a 40 Gbps router with the method they prescribe, but the results of that effort could not be included in this paper. Lastly, future research should be done on ways to interconnect the computers using neither an FSB nor a mesh, since a mesh has limited scalability, and on what effects those architectures will have on performance.

 

Can Software Routers Scale? April 30, 2009

Filed under: R09. Can Software Routers Scale? — giledelman @ 4:23 pm

Three Important Things:

  • Software routing was not taken seriously as an enterprise solution at the time when this paper was written. This was mainly due to the accepted notion that they do not scale well and that commodity hardware could not compete with specialized hardware routing solutions. A high level sanity check calculation challenges this notion, showing that a shared-bus architecture can scale up to 10Gbps with the bottleneck in the memory bus. A multi-processor mesh architecture could scale up to 40Gbps. The reason for this improvement is that the mesh approach replaces the FSB with multiple point-to-point links.
  • In order to ground these calculations the authors implemented a shared bus architecture proof-of-concept system and ran several experiments. These experiments quantified the FSB bottleneck by showing that the address bus was 70% utilized, close to a known saturation point. Based on the results, the authors suggest several ideas for improvement that would allow scaling up to and beyond 10Gbps: batch handling of packet descriptors, Direct Cache Access implementation, mesh architectures.
  • Another concern in software routing solutions is making sure that the internal switching scales appropriately as the load and number of ports increase. Deterministic internal switching is ruled out because it would require that the interconnect run at a higher speed than the incoming line. Instead, a dynamic load-balanced approach is proposed, referencing work done by Valiant et al. Randomly choosing packet routes introduces potential packet reordering problems; however, the authors claim that these can be mitigated by enforcing deterministic ordering on short bursts of packets, as these are at the highest risk of reordering.
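
To illustrate the load-balanced idea in the last point, here is a minimal sketch of two-phase (Valiant-style) routing over a full mesh: each packet is first sent to a uniformly random intermediate node and then to its real output. The function name, the node count, and the all-to-one traffic pattern are assumptions for illustration; this is not the authors’ implementation.

```python
import random
from collections import Counter


def vlb_path(src: int, dst: int, num_nodes: int, rng: random.Random) -> list:
    """Two-phase route: src -> random intermediate -> dst (hops deduplicated)."""
    if not (0 <= src < num_nodes and 0 <= dst < num_nodes):
        raise ValueError("src and dst must be valid node indices")
    intermediate = rng.randrange(num_nodes)
    path = [src]
    if intermediate != src:
        path.append(intermediate)
    if dst != path[-1]:
        path.append(dst)
    return path


# Worst-case style traffic: every packet is destined to node 0.
rng = random.Random(0)
num_nodes = 8
relay_load = Counter()
for _ in range(8000):
    src = rng.randrange(num_nodes)
    path = vlb_path(src, 0, num_nodes, rng)
    if len(path) == 3:  # a distinct intermediate node relayed the packet
        relay_load[path[1]] += 1

# Relay work is spread roughly evenly across the nodes that can act as relays.
print(sorted(relay_load.items()))
```

Because the intermediate is chosen per packet here, consecutive packets of one flow can take different paths, which is exactly where the reordering concern comes from.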

Glaring Problem:
By the authors’ own admission, the paper is intended as a light overview of software routing solutions. As such, it ignores many issues related to large-scale deployment. Routing companies and their customers have a vested interest in hardware solutions that by their nature cannot be upgraded. Despite its numerous benefits, software routing would need to blow existing solutions out of the water in order to be adopted, and it is not clear that this is the case.

Future Work:
The suggestion of applying software solutions to routing architecture is largely unexplored, and the authors mention several avenues for future research. These would entail a rigorous analysis of  cluster configurations, power, cooling, as well as large scale prototyping.  In addition to these, I would add that a cost/throughput analysis is very important in order to justify the use of commodity machines. Also, software routing raises a bigger issue regarding the overall routing architecture of the internet. Another avenue for exploration could be how to scale our routers with hybrid hardware/software solutions.

 

Can Software Routers Scale? April 30, 2009

Filed under: R09. Can Software Routers Scale? — ameenakel @ 4:23 pm

(i) the three most important things the paper says

One of the most important things that this paper says, although the observation is not conclusive, is that it is feasible to generate multi-Gb routing speeds by using solely commodity PC parts.  This leads to the possibility of having software routing for higher speed links in the future.  I should note that the calculations and benchmarks used were not conclusive and were solely for “basic feasibility” (as the authors mentioned in the paper).  The next most important thing that the paper articulates is the need for non-typical switching topologies, which would be a whole new field of research within itself.  The authors noted that the number of interface ports on a typical server motherboard is not likely to scale (without producing specialized hardware, which was against the ideas set forth in the paper), so they must adapt their ideas to fit the hardware available.  A third important observation of the paper includes the conclusion that simplified high-level models of router operation on commodity PC hardware may actually reflect, to some accuracy, the actual operation of those routing ideas on the hardware.

(ii) the most glaring problem with the paper

The most glaring problem with this paper is the fact that there is little to no actual data presented, and many of the observations presented have to be taken at face value.  Also, even though the models presented matched the basic tests that were performed for this paper, it might not be the case that this carries over to the actual implementation of the routing system itself.  I feel that actual simulation might not reflect the results presented in this simple model as well as the authors would lead us to believe.

(iii) the future research directions of the work

A good future research direction of this work would be to establish a community around the topic by building tools that would help simulate the ideas presented here (and would thus allow others to more easily innovate upon those ideas).  I’m referring specifically to simulators, workloads, and the like.  This way, if the ideas presented here ended up yielding successful results in benchmarking, others could help to investigate other processor architectures or memory types easily.

 

Can Software Routers Scale? April 30, 2009

Filed under: R09. Can Software Routers Scale? — erubow @ 4:23 pm

Important Things:
1) Uses a simple model of commodity PC architecture to place an upper bound on its packet processing performance according to its specs, and uses experimentation to place a lower bound on packet processing performance for current server-class PC’s.
2) Begins exploring options for interconnecting a cluster of PC’s in a way that provides high routing throughput and is also scalable.
3) Introduces the idea of a mini-flow (a flow within some time window), suggesting that it may be a good balance between maintaining packet order and load balancing.
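
To make the mini-flow idea a bit more concrete, here is a rough sketch of how a node might pin a flow to one intermediate server for the duration of a burst and re-randomize afterwards. Defining a burst by the gap between consecutive packets is my own assumption, as are the class name, the parameters, and the time values; the paper does not specify this mechanism.

```python
import random


class MiniFlowBalancer:
    """Hypothetical sketch: keep a flow on one intermediate node while its
    packets arrive within `window_s` of each other, otherwise pick again."""

    def __init__(self, num_nodes: int, window_s: float, seed=None):
        self.num_nodes = num_nodes
        self.window_s = window_s
        self.rng = random.Random(seed)
        self.state = {}  # flow_id -> (last_seen_time_s, intermediate_node)

    def pick_intermediate(self, flow_id, now_s: float) -> int:
        entry = self.state.get(flow_id)
        if entry is not None and now_s - entry[0] <= self.window_s:
            node = entry[1]  # same burst: same path, so order is preserved
        else:
            node = self.rng.randrange(self.num_nodes)  # new burst: rebalance
        self.state[flow_id] = (now_s, node)
        return node


balancer = MiniFlowBalancer(num_nodes=8, window_s=0.001, seed=42)
first = balancer.pick_intermediate("flowA", now_s=0.0000)
second = balancer.pick_intermediate("flowA", now_s=0.0004)  # within the window
later = balancer.pick_intermediate("flowA", now_s=0.5000)   # after a long gap
print(first == second, later)
```

A real implementation would also need to expire old entries from the table; that is left out to keep the sketch short.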

Problem:
As they point out in the beginning of the paper, power usage, form factor, and reliability are not addressed. Additionally, they focus on achieving high bandwidth, and there is no talk of latency. The cluster topologies they discuss involve routing packets through multiple PCs, and the packet goes up and down the software stack at each node. If this approach is to be used in the core Internet routing infrastructure, then the additional latency will affect most Internet traffic, and will be multiplied by the number of such cluster-based routers that are traversed. A flexible platform for implementing routing protocols need not be purely software-based.

Future Work:
Certainly the PC architecture analysis will be important even in the context of end hosts with high-rate NICs, as rates scale to 40Gbps and 100Gbps. The topology question was not conclusively answered in this paper and should be investigated further. Also, I think the mini-flow idea for multi-path load balancing deserves further investigation.

 

Can Software Routers Scale? April 30, 2009

Filed under: R09. Can Software Routers Scale? — koderaks @ 4:22 pm

This paper examines the use of software routers. The authors observe that current software routers, while flexible, do not scale well beyond 3Gbps, so they examine the problem of making a general-purpose software routing solution that is scalable. The key idea is to use a cluster of servers rather than a single server to do the software routing. They try to predict the performance of a server with known hardware specifications.

  1. Past software routing solutions typically relied on a single server acting as the router. In this paper the authors show that, given clusters of general-purpose servers, it is possible (with optimism) to build a framework that supports rates close to 40Gbps using multi-processor servers with a mesh architecture.
  2. The authors observe that traditional shared bus architectures impose a high penalty due to the FSB speeds, but multi processor mesh architectures remove this FSB barrier and thus have a much better packet processing rate.
  3. Several other techniques such as the use of Direct IOs by the NIC (NIC directly uses the processor cache), and integration of packets and descriptors are used.

Glaring problems: The authors seem too optimistic in their calculations (e.g., they assume a CPI of 1, which may not hold for many processors out there). Also, their solution is highly reliant on CPU and memory speeds, as well as on a good system architecture. It seems that it would be rather difficult to provide performance guarantees for specific bandwidth rates.

Future work: The paper mentions that switching could be a problem but does not provide enough analysis of various software switching/routing algorithms. More work could be done to determine which switching solutions best suit interconnected servers.

 

Can Software Routers Scale? April 30, 2009

Filed under: R09. Can Software Routers Scale? — supritapagad @ 4:22 pm

1. Routers implemented in software running on servers, in spite of providing flexibility, low cost, and ease of programmability, weren’t popular since they didn’t scale and weren’t able to support line speeds. The paper proposes a method for implementing such a router, realized in software running on commodity servers. It achieves this by using multiple interconnected servers to form a cluster, with each server handling one incoming line. It goes on to show that this cluster has the capability to support current line speeds and to scale.

2. The paper also makes a quantitative analysis of such a server, both in theory and by practical implementation. It does this for a server using a shared bus and for one built on a mesh of multi-processors. It identifies the major time-consuming components and estimates the theoretically achievable, best-case forwarding time for data packets. It then goes on to implement the proposed server and compares the estimated times with the observed times. It identifies the communication buses as the bottlenecks.

3. It looks into various topologies for inter-connecting the servers in the cluster. It identifies the issues, which include line-speed communication between the servers and in-order delivery of packets. It proposes a solution for in-order delivery of packets. It takes a look at the scalability of this cluster.

Oversight

The paper talks of providing as many servers as there are lines. Though it discusses how this structure would scale as O(sqrt(N)), it might still end up being quite formidable. Also, the paper does not consider packets requiring greater computation, such as control packets, and the impact such packets would have on the processing times for data packets.

Future Work

The paper presents a new approach to implementing software-based routers. It also proves theoretically that it is capable of supporting line speeds. The idea can be extended to incorporate servers more suited to the application. An attempt can be made to validate that the system works and supports all types of traffic, including data-control packets, and is robust to external network changes. Its internal scalability can also be verified.

 

Can Software Routers Scale? April 30, 2009

Filed under: R09. Can Software Routers Scale? — brokerer @ 4:21 pm

With the Internet constantly evolving, it would be nice if old routers could keep up with the evolution. Using software routers can make changes easier to deploy. The problem with software routers is that they supposedly do not scale. This paper addresses whether the performance cap is fundamental or an architectural fault.
Major Points of Paper:
1.) Clustered software-router architecture
A big reason why today’s software routers cannot scale is that they adopt a “single server as router” architecture. This paper sees the limitation in this architecture and uses a clustered software-router architecture instead, in which an N-port router is built from a cluster of server-class PCs. Each server handles one incoming line of the router, and the N servers are connected so that they can switch packets from input to output lines. To scale to N ports with line rates of R bps, lookups and classification must scale with O(R) and switching must scale with O(N*R). This clustered software-router architecture can take advantage of newer hardware technologies like meshes.
2.) Back-of-the-envelope Analysis
The authors of this paper evaluate whether an architecture can scale as a software router by using back-of-the-envelope analysis and by actually running experiments. The analysis looks at the architecture from a high-level view to see whether or not it can scale. This is more of an upper bound on whether line-rate scaling is possible. They check things like how many transactions occur on the memory bus, FSB, and PCIe buses to see whether forwarding takes too long. From their work it seems that a shared-bus architecture won’t be able to scale, but with emerging server architectures the multi-processor mesh architecture should be able to reach 40Gbps.
3.) Experimentations
Back-of-the-envelope analysis is a nice way to know whether something should work theoretically, but a common problem is whether the implementation backs up the theory. In their case, with large packets (1024 bytes) their server scales to 14.9 Gbps, but for 64-byte packets it is capped at 3.4 Gbps. The bottleneck was found to be related to the memory system. To partially fix this, they edited the Linux memory allocator. Their experiments led them to discover many improvements that they can make to help scaling.
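
The reason small packets hurt so much is simple arithmetic: for a fixed bit rate, smaller packets mean more packets per second and less time to spend on each one. A quick sketch of that calculation is below. The helper names are mine, and the calculation ignores Ethernet framing overhead (preamble, inter-frame gap), so treat the numbers as rough.

```python
def packets_per_second(line_rate_gbps: float, packet_bytes: int) -> float:
    """Packets per second needed to fill a line with fixed-size packets."""
    return line_rate_gbps * 1e9 / (packet_bytes * 8)


def per_packet_budget_ns(line_rate_gbps: float, packet_bytes: int) -> float:
    """Time available to process one packet at that rate, in nanoseconds."""
    return 1e9 / packets_per_second(line_rate_gbps, packet_bytes)


# Same 10 Gbps line, the two packet sizes mentioned above.
for size in (64, 1024):
    pps = packets_per_second(10.0, size)
    budget = per_packet_budget_ns(10.0, size)
    print(f"{size:5d} B: {pps / 1e6:6.2f} Mpps, {budget:7.1f} ns per packet")
```

At 64 bytes the per-packet budget is sixteen times smaller than at 1024 bytes, so any fixed per-packet cost (descriptor handling, memory transactions) dominates.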

Glaring Problem with Paper:
Using a clustered software-router architecture might help software routers scale better, but is it enough? As advances in hardware allow for meshes that make a cluster architecture faster, can it keep up with the advances in network speeds? I think that will be very hard, if it is possible at all. If it is possible, it seems like a lot of changes will have to be made to achieve it.

Future Direction of the Work:
Using a clustered software-router architecture sheds light on the software-router scaling problems. The authors are now building a 40 Gbps prototype that should shed more light on whether software routers can scale. There are bigger questions about configuration, footprint, power and cooling that the authors did not address but that will be important if these software routers ever do scale.

 

Can Software Routers Scale? April 30, 2009

Filed under: R09. Can Software Routers Scale? — mdjacobsen @ 4:20 pm

The authors provide theoretical and observational evidence that PC-based software routing can achieve multi-Gbps routing speeds. The focus is on whether such routers can scale in terms of line and switching rates. They address the processing and bandwidth limitations in PC-based architectures and where bottlenecks arise when forwarding packets at Gbps speeds.

The primary contributions are in identifying where the bottlenecks arise within a PC while doing packet forwarding and how a switching fabric can be constructed that preserves line rate.

For the first, the authors analyze two internal PC topology models: traditional shared bus and point-to-point. They perform some coarse high-level calculations to determine whether the bandwidths of the PCIe bus, memory channels, and front side buses (FSBs) can support Gbps line rates. They consider this an upper bound. This bound is then tested using a real PC and multi-Gbps network traffic generators. The result is that memory layout and access patterns are a bottleneck. Once these are adjusted to suit the packet sizes, the address bus on the FSB becomes the bottleneck. In these experiments they achieve roughly a 4.4 Gbps forwarding rate. They identify further adjustments to packet descriptor handling that would relieve the bottleneck but do not implement them.

The second contribution comes in the area of switching. Because the line rate must be maintained throughout the entire process, the switching fabric has to support much higher processing rates. The solution they propose is a Valiant load-balanced mesh architecture. In this architecture nodes are connected in a mesh, and input arriving at a node is forwarded to a random intermediate node before it is forwarded on to the destination. This helps alleviate overload on any output node but introduces the prospect of packet reordering. They investigate several different switching networks based on this model (and variations of it). The results show the maximum link and node capacities of each.
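
The two-phase forwarding described above can be sketched in a few lines. This is only an illustration of the idea, not the authors' implementation: the function name `vlb_path`, the integer node numbering, and the 8-node example are all assumptions made for the sketch.

```python
import random
from typing import List


def vlb_path(src: int, dst: int, num_nodes: int, rng: random.Random) -> List[int]:
    """Return the node sequence a packet follows under two-phase routing.

    Phase 1: the input node sends the packet to an intermediate node
             chosen uniformly at random, independent of the destination.
    Phase 2: the intermediate node forwards the packet to the output node.

    The intermediate may happen to equal src or dst; in that case one of
    the two hops is simply local.
    """
    intermediate = rng.randrange(num_nodes)
    return [src, intermediate, dst]


rng = random.Random(0)  # seeded only so the demo is repeatable
paths = [vlb_path(src=0, dst=3, num_nodes=8, rng=rng) for _ in range(5)]
for p in paths:
    print(p)
```

Because consecutive packets of the same flow can be sent through different intermediate nodes, they may arrive at the output node out of order, which is the reordering concern mentioned above.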

The paper does a good job of providing both theoretical analysis and empirical results. Unfortunately, I found their experimental setup to be too simplified. Their routing process uses static routes, which considerably limits the amount of work needed to forward packets. This makes the forwarding rates hard to compare with those of real dedicated routers and undermines the applicability of their results.

I’d expect future research to focus on testing PCs using the customized memory and descriptor changes. Additionally, I’d like to see more reasonable “commodity” PC architectures considered. Multi-core point-to-point PC architectures are still not cheap commodity computers. Replacing low-cost dedicated routers with a higher-priced PC that can perform at the same level may not be that surprising a result.

 

Can Software Routers Scale? April 30, 2009

Filed under: R09. Can Software Routers Scale? — liyunjiu @ 4:20 pm

The paper examines whether software routers can scale to speeds achieved by network processors and what architecture optimizations may be required.

1. First, a high level model was used to predict an upper bound and bottleneck for current general purpose architectures. It is estimated that a shared FSB architecture can achieve speeds up to 10Gbps and point-to-point architecture can achieve speeds up to 40Gbps using current hardware.

2. Experiments were performed to validate the high-level model and provide a more realistic lower bound using unoptimized off-the-shelf platforms. A server with 16 GigE cards was tested. With a packet size of 1KB, the server forwarded at 14.9Gbps under a 16Gbps load. With a packet size of 64 bytes, the server could only forward at 3.4Gbps under a 16Gbps load.

3. The initial bottleneck was identified to be the way Linux allocated memory. After the Linux memory allocator was modified, the next bottleneck was discovered to be from the high FSB address bus utilization when sending small packets. This can be partly remedied by modifying the NIC firmware to integrate packets and their descriptors. Other improvements can come from Direct I/O from the NIC to CPU cache and using multi-processor mesh architectures.

The paper then explores switching architectures and topology. This can be a topic of future research to implement routers connected in a Valiant mesh. Additional research can also include optimizations to current hardware and software to achieve network processor speeds. There were no technical flaws in the paper.

 

Can Software Routers Scale? April 30, 2009

Filed under: R09. Can Software Routers Scale? — krishnanadh @ 4:19 pm

Implementing routing algorithms in software on commodity hardware is a very inexpensive and attractive proposition. Through theory and experimentation, the paper explores the upper and lower bounds on the throughput achieved by software routers implemented on processor hardware available today. Commodity hardware has the advantages of low cost, widespread availability and well-understood programmability, but a router implemented on a single server is limited by the peak performance that server can deliver when compared with the raw line speed. So the authors propose a distributed, server-cluster-based architecture to build a software router that compares in scalability to an N-port, R bps ASIC router.

The authors compute the theoretical and practical limits to which today’s processors can scale when servicing a router-like traffic workload. They expect the memory system, the FSB and the I/O buses to limit the performance of the processors. They illustrate a theoretical upper bound on the processor throughput when using shared-bus and mesh architectures. Forwarding a single packet in a shared-bus architecture requires 4 memory accesses for DMA-ing packet data into and out of the processor, plus 2 FSB and PCIe accesses; the authors show that this results in an overall bandwidth of approximately 10Gbps. As opposed to a shared-bus architecture, a multi-processor mesh architecture would give a higher bandwidth (approx. 40Gbps), since it does not incur the cost of an FSB and since each processor has its own dedicated memory subsystem.
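
The style of bound described above can be sketched as follows: if every forwarded byte crosses a bus k times, that bus cannot sustain a forwarding rate above its capacity divided by k, and the smallest such limit is the overall upper bound. This is a simplified model written for illustration, not the authors' exact calculation. The function name is hypothetical, the crossing counts read the summary above as two crossings each for the FSB and PCIe, and the capacities are placeholder values, not the paper's figures.

```python
from typing import Dict, Tuple


def forwarding_upper_bound(capacity_gbps: Dict[str, float],
                           crossings_per_packet: Dict[str, int]) -> Tuple[str, float]:
    """Return (bottleneck bus, upper bound on forwarding rate in Gbps).

    Each bus can sustain at most capacity / crossings, because every
    forwarded byte uses that bus `crossings` times.
    """
    bounds = {
        bus: capacity_gbps[bus] / crossings_per_packet[bus]
        for bus in crossings_per_packet
    }
    bottleneck = min(bounds, key=bounds.get)
    return bottleneck, bounds[bottleneck]


# Per-packet crossings as read from the summary above (an interpretation).
crossings = {"memory": 4, "fsb": 2, "pcie": 2}

# Placeholder capacities in Gbps, NOT taken from the paper; they are
# chosen only to make the arithmetic visible.
capacity = {"memory": 100.0, "fsb": 60.0, "pcie": 80.0}

bottleneck, bound = forwarding_upper_bound(capacity, crossings)
print(f"bottleneck: {bottleneck}, upper bound: {bound:.1f} Gbps")
```

With real bus capacities substituted for the placeholders, the same arithmetic shows which component caps the forwarding rate first.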

Using experimentation, the authors compute a lower bound on the bandwidth delivered by server-scale processors. They observe that while substantial bandwidth is delivered for large packets, small 64-byte packets severely limit the bandwidth, since these transactions incur an additional read/write of per-packet descriptors, leaving a practical bandwidth of only 3.4Gbps. To overcome such limitations the paper proposes improved packet descriptor handling (more packets per descriptor), meshed processor topologies and direct cache access. Having established that servers can indeed be scaled to give router-like performance, the paper proposes a Valiant load-balanced architecture, in which intermediate nodes are used to load balance incoming traffic and subsequently route it to the output ports, and 8-port mesh topologies as a substitute for the Valiant algorithm, because Valiant ends up reordering packets to achieve balanced loads. Both approaches are seen as potentially competitive replacements for the conventional hardware routers used today. Thus the authors push for server-cluster-based distributed software routers for extensibility, programmability and ease of use.

The paper opens up an interesting niche in the router architecture space. It can lead to future research on the possibility of server nodes replacing ASIC routers and on the driver and network stack software required to support such configurations. Also, since the paper only makes a proposal, extensive research needs to be carried out on power usage, form factor, server node reliability and other deployment-related considerations when using servers to replace routers.

 

Can Software Routers Scale? April 30, 2009

Filed under: R09. Can Software Routers Scale? — subhramazumdar @ 4:19 pm

The paper proposes revisiting the feasibility of high-speed software routers that also scale with the rate of incoming traffic and the number of ports they support. The fact that costly hardware routers have already been deployed in the market has made it hard to consider the alternative of a software router. This paper shows that the general perception of software routers as unscalable and low-bandwidth is not actually true. A basic limitation of today’s software routers comes from the fact that they adopt a “single server as router” architecture. Given the evolution of general-purpose platforms, a single-server architecture is unlikely ever to reach the carrier speeds of conventional routers. But a clustered software-router architecture that uses an interconnect of multiple servers can achieve much greater scalability. Also, with the emergence of multicores, packets can be processed in parallel. In fact, current processor architectures available in PCs can indeed support the raw bandwidth needed to process a single data stream at 10Gbps. In such multiprocessor architectures, the memory bandwidth of a bus-based system becomes the real bottleneck. This can be alleviated by using a mesh interconnect model in which every processor has faster access to its own portion of the shared memory.

The further problem that needs to be addressed is fast switching of packets between the different ports, which are essentially individual servers in the cluster. Since the per-node connectivity, in terms of the number of interfaces on a typical server board, is low, the interconnect complexity between the server nodes in a cluster has to be sublinear for it to scale. Thus maintaining fast communication with a low degree of connectivity is a major challenge. Finally, the load across the nodes and their internal data links has to be balanced dynamically for optimal performance.
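
The fan-out problem can be made concrete with a small calculation for a full mesh. The sketch below relies on the standard load-balancing argument, which is an assumption here rather than a figure taken from the paper: if each of N nodes with line rate R spreads traffic uniformly over all N nodes in each of two phases, each internal link needs capacity of about 2R/N, while each node needs N-1 links. The function name and the 10 Gbps line rate are illustrative.

```python
from typing import Dict


def full_mesh_vlb(num_nodes: int, line_rate_gbps: float) -> Dict[str, float]:
    """Per-node fan-out and per-link capacity for a load-balanced full mesh.

    Assumes uniform spreading over all nodes in both phases, so each
    link carries at most R/N per phase, i.e. 2R/N in total.
    """
    return {
        "fanout": num_nodes - 1,
        "link_capacity_gbps": 2 * line_rate_gbps / num_nodes,
    }


for n in (4, 8, 32, 128):
    r = full_mesh_vlb(n, line_rate_gbps=10.0)
    print(f"N={n:>3}: fan-out {r['fanout']:>3}, "
          f"link capacity {r['link_capacity_gbps']:.3f} Gbps")
```

The per-link capacity shrinks as nodes are added, but the fan-out grows linearly with N, which a server board with only a few interfaces cannot provide. That is why a topology with sublinear per-node connectivity is needed.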

The work mentioned in the paper has really opened the possibility of high speed software routers which are also much more flexible in terms of programmability and configurability. Future directions may include more comprehensive study and comparison between conventional and software routers both in terms of power and performance. The result of such study will shed light on the necessity of special purpose architectures for network centric workloads or alternatively on the architectural modifications needed to improve the packet processing capabilities of general purpose PCs.

 

Can Software Routers Scale? April 30, 2009

Filed under: R09. Can Software Routers Scale? — gracewangcse222 @ 4:18 pm

(i) The three most important things the paper says:

  1. Software routers are becoming an increasingly attractive alternative to special-purpose hardware routers for a number of reasons: they are cheaper, more flexible, and are a more familiar programming environment for developers. However, a key obstacle for software routers is their (potential) inability to scale up to the speed required of a router in today’s Internet environment.
  2. Experimental results seemed to indicate that the setup tested in the paper, with front side buses, could not scale beyond 10 Gbps. This was due to the overhead of reading and writing packet descriptors. The paper suggested a number of improvements, but the most promising is likely switching the architecture from front side buses to a multi-processor mesh architecture.
  3. Load-balanced routing is generally preferred over deterministic routing since it can make more efficient use of the overall capacity of the interconnect. However, this choice leads to a few problems: packet reordering and constraints on fan-out for a large number of software router servers. Potential solutions include mini-flows (a burst of packets from the same flow is sent along the same path) and different topologies.
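
The mini-flow idea in point 3 can be sketched as follows. This is a guess at one way to do it, not the paper's mechanism: the class name `MiniFlowBalancer`, the use of an idle-time gap to decide where one burst ends, and the gap value are all assumptions made for illustration.

```python
import random
from typing import Callable, Dict, Tuple


class MiniFlowBalancer:
    """Pick an intermediate node per burst instead of per packet."""

    def __init__(self, num_nodes: int, gap_seconds: float,
                 choose: Callable[[int], int] = random.randrange):
        self.num_nodes = num_nodes
        self.gap_seconds = gap_seconds  # idle time that ends a burst (assumed)
        self.choose = choose            # returns a node index in [0, num_nodes)
        # flow id -> (intermediate node, arrival time of the last packet)
        self.state: Dict[str, Tuple[int, float]] = {}

    def route(self, flow_id: str, now: float) -> int:
        entry = self.state.get(flow_id)
        if entry is not None and now - entry[1] <= self.gap_seconds:
            node = entry[0]                      # same burst: keep the path
        else:
            node = self.choose(self.num_nodes)   # new burst: re-balance
        self.state[flow_id] = (node, now)
        return node


# Deterministic demo: the chooser hands out nodes 2 and then 5.
picks = iter([2, 5])
lb = MiniFlowBalancer(num_nodes=8, gap_seconds=0.001,
                      choose=lambda n: next(picks))
a = lb.route("flow-A", now=0.0000)
b = lb.route("flow-A", now=0.0005)  # within the gap: same node as before
c = lb.route("flow-A", now=0.0100)  # after an idle gap: a new node
print(a, b, c)
```

Packets within a burst follow one path and so stay in order, while load is still rebalanced each time a new burst starts.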

(ii) The most glaring problem with the paper:

This paper discusses two potential architectures (shared bus and multi-processor mesh), but indicates that the shared-bus architecture will likely be bottlenecked by the memory bus at around 10 Gbps. They then show this with experimental results, leading them to say that a multi-processor mesh architecture would have done much better. Unfortunately they have not yet verified these claims with a prototype of their architecture. So, with nothing more than theoretical (and optimistic) estimates, especially since their estimates for the shared-bus architecture overlooked a limiting overhead, it seems rather premature on the evidence presented to claim that they have found a solution for scalable software routers.

(iii) The future research directions of the work:

It seems the paper presents a novel direction in which research on software routers may head. A lot more work should probably be done to verify not only the scalability of the architecture, but also other necessary properties such as reliability, security, etc. I am also curious about the price scaling of software versus hardware routers. For example, a 10-port hardware router may be more expensive than a 10-server software router setup, but would that still be the case with more ports, say 100,000? In other words, is the incremental cost of adding a port to a hardware router more than that of adding a server (and the additional links necessary to connect it to the existing topology)? If not, then one of the advantages of having a software router might be lost.

 

Can Software Routers Scale? April 30, 2009

Filed under: R09. Can Software Routers Scale? — yipiokayyay @ 4:18 pm

(i) The three most important things the paper says:

1) The current line of research, which assumes that software routers adopt a “single server as a router” architecture, will never be feasible. Instead, a “clustered software router” solution shows more promise. The rest of the paper then discusses how this can be achieved with specific algorithms and conventional PC architecture.

2) Pinpointing the FSB as the bottleneck that limits the performance of current conventional PCs. Since they are arguing for the use of conventional PCs, this point immediately allows follow-up research into how to solve this problem and improve performance. Furthermore, they even propose non-radical software and hardware changes in this paper.

3) The observation that a conventional PC cannot hope to scale to the speed of a centralized scheduler. Although this may seem trivial, it is actually very thought provoking, since it prompts researchers to look for alternatives instead of doing the same thing as a centralized scheduler. Later in the paper, they even mention specific possibilities, such as interconnecting PCs to form a switching or sorting network.

(ii) The most glaring problem with the paper:

Although a lot of analysis was put into comparing different switching solutions, the paper would have benefited from some actual experimentation. Without experimentation, there may be unforeseen issues that they missed when comparing with pure calculations.

(iii) The future research directions of the work:

A good area for future research is to consider how conventional PCs can be used to beat dedicated HW routers. Even if the ideas proposed in this paper are successful and they manage to match the speeds of dedicated HW, that may not be enough incentive for users to change, because changes cost money and users would only change if there were a compelling reason such as a speed improvement or cost reduction.

 

Can Software Routers Scale? April 30, 2009

Filed under: R09. Can Software Routers Scale? — jwegan @ 4:17 pm

i)

1. In order to achieve the high performance required of a software router, there must be very close attention to detail to eke the most out of the hardware. For instance, the authors initially thought memory was their bottleneck, but upon further investigation they discovered that by cleverly laying out the packets in memory they were able to improve performance by 30%.

2. In order to scale to speeds like 40 Gbps advances in hardware design such as direct cache access (DCA) or multi-processor mesh architectures will have to make their way into commercially available hardware.

3. In order to achieve a high-throughput packet switching network it is important to be able to balance load. They propose using a load-balancing algorithm such as the Valiant routing algorithm.

ii) I think one flaw of the paper is not addressing the feasibility of removing RAM from the picture and simply using the on-chip L1, L2 and in some cases L3 caches. The caches are getting so large now it might be feasible to assess the possibility of simply having the processors pull the data directly into their caches and bypass main memory.

iii) This paper examined the feasibility of building a 10 Gbps or 40 Gbps software router based on commodity hardware. The next step would obviously be to go and actually try to build one.

 

Can Software Routers Scale April 30, 2009

Filed under: R09. Can Software Routers Scale? — damedeiros @ 3:29 pm

The purpose of this article was to look into whether it would be possible for software routers, through the use of routing clusters, to scale to performance levels equal to and even beyond what today’s specialized routers are capable of. A fully programmable router network would solve many of today’s issues regarding upgradability, security, etc. and could fundamentally impact the way the internet works. The key points in this article were:

1. It appears that through increased cache size and highly accurate lookup table algorithms, it may be possible for PC hardware to scale to the needed speeds of packet processing. The authors are not conclusive on this issue as they did not place it through a full battery of tests. However, the tests that they conducted suggested that the packet processing would not be a bottleneck.

2. Several hardware improvements such as Direct I/O and multi-processor mesh server architectures, some of which are already available, should significantly improve performance by removing the current bottlenecks in a software router. The removal of these bottlenecks is very difficult for a general purpose PC but very likely can be implemented for one specifically configured to be a router.

3. It is becoming increasingly common to place several computers into a cluster in order to improve performance without improving hardware. It could be very possible to do this with software routers in order to cheaply improve the performance to acceptable levels.

The biggest issue that I had with this paper was the lack of specifics about the improvements that could be made and the relatively small number of measurements made to test their theory. While this paper was obviously designed to inspire new research without being too long, I think that a page or two more of analysis would have greatly benefited future research by providing some more direction.

The future research of this paper is going on right now. Research into clustering routers, adding more functionality and extensibility, and virtualization of routers are all topics being actively researched in order to improve the capabilities of the internet. What is most interesting to me is the use of inexpensive clusters as a way to improve performance as I would like to see some programmability introduced to the routing structure as a way to improve the internet at large.