Graduate Networks, UCSD

CSE222 – Spring 2009

MIRO: Multi-path Interdomain ROuting May 29, 2009

Filed under: Extra Papers — ameenakel @ 8:19 pm
Tags: , , ,

(i) the three most important things the paper says

  1. One of the most important topics that this paper addresses is that BGP, in its current form, does not allow for any given AS to (in any reasonably automated way) request a certain route for its packets to traverse, regardless of the motive.  In fact, beyond sending that packet to its next AS-hop, the originating AS has no control whatsoever within the BGP implementation on the Internet today.  Some great examples mentioned in the paper about why this level of control (over which individual AS hops a stream of packets will travel) could be useful are security, performance/bandwidth, or price (among other things, I’m sure).  For example, a government agency may not want to route a certain set of transmissions through a country that is considered hostile and thus would like to request that a certain AS not be used, regardless of the fact that it may be on the shortest and most economical path.  This lack of control is the problem that the authors of this paper have set out to solve with their new scheme: MIRO.
  2. The authors also address the standing issues with the current (most popular) solutions for this problem: source routing and overlay networks, demonstrating why these solutions lack the proper scalability, lack the proper backwards compatibility (which would require a batch change of all Internet hardware), and require far too much overhead to be used as Internet-wide protocols.  Quite notably, on this same note, the authors made it a point to demonstrate why MIRO satisfied these two requirements (at the protocol level).  The authors brought up the point that MIRO continues to keep relatively the same amount of BGP-control messages between ASes as vanilla BGP (because of its pull-based alternate route advertisements), helping scalability.
  3. The authors also describe a method in which it would be possible for MIRO to operate on existing hardware on the intermediate routers of a given AS (to reiterate, not including edge BGP routers).  This is a great advantage of the MIRO solution, as most/all changes to the Internet backbone that require increasingly larger change rarely are implemented.  The solution involves treating an alternate route as if it were another IP address in the given AS’s inner network (essentially adding a level of indirection).  This would essentially require changes solely to the routing tables of each router instead of to the underlying software of hardware.

(ii) the most glaring problem with the paper

One of the largest problems with this paper (and it was quite difficult to find a topic that wasn’t addressed well by the authors) was that it did not address any performance metrics below the level of the protocol level.  Specifically, the authors did not investigate the impact of these new changes on the performance of edge BGP routers.  Since these routers are meant to move packets from their ingress ports to their egress ports (or vice-a-versa) as quickly and accurately as possible, adding to the latency of moving a packet from one end to the other is very undesirable.  If it is the case that this scheme adds significant overhead to these routers, this scheme may find greater resistance in its adoption (as it would also require a hardware or significant software change to each of the edge BGP routers–another big hurdle to overcome).

(iii) the future research directions of the work

Although mentioned briefly in the conclusion, security would be on the mind of the average grad-student reader of this paper from the very beginning.  The level of control can is given (insecurely) to any router that can send a packet is practically limitless, as long as the negotiating AS approves of its requests.  Some sort of secure handshaking protocol or digital signature technique would greatly help to strengthen this scheme.  Another great future direction of this work would be to investigate the true overhead created by this scheme versus the BGP baseline, along with he overheads of other popular replacements for or augmentations to BGP.  It would be useful to quantify the different overheads created by each scheme.

 

Network Coordinates in the Wild May 29, 2009

Summary:

Currently there is a debate about the behavior of real-world distributed systems based on network coordinates. Specifically, their inaccuracy, instability, and fragility in the presence of violations of the triangle inequality. This paper looks at a subset (10,000 nodes) of a deployed real-world network (Azureus BitTorrent), and provides techniques for improving as well as characterizing the behavior of the coordinate system. All coordinate messages are piggy-backed on to existing network messages. This means that the coordinate system must be able to function passively. The Azureus BitTorrent network is characterized as a network with a small dimensionality, a wide spread distribution of round trip times between nodes, and a large number of nodes that violate the triangle inequality. To mitigate these problems, Ledlie et. al. applied three new techniques. First, they added a latency filter. This filter is used to summarize the latency measurements observed between two nodes i and j. This provides a current and stable description of the expected latency between the two nodes. They found that a simple, short, moving median worked well for dealing with anomalous measurements and plateau shifts (actual changes of the latency between nodes). Second, they added an update filter. This filter addresses stability issues when receiving a frequent number of small updates. It uses a heuristic to determine when a change in latency is large enough that the application-level coordinate has undergone a significant migration to a new location relative to other coordinates. Finally, they added neighbor decay. This addresses the issue of the passive nature of the coordinate system and the transient nature of the nodes. In the Azureus BitTorrent network, a node might talk to a large number of transient nodes, but only talk to a small number of nodes consistently. This means that the local coordinate optimization process will become skewed towards this small set of nodes. By scaling the effect each neighbor has on the nodes coordinates with respect to the age of the data, it allows nodes that are heard from infrequently to have a longer lasting effect on the local optimization process. The effect is that data is incorporated into the coordinate system more smoothly, providing a stabilizing effect. They found that by using these techniques in the Azureus BitTorrent network, they provided a 43% improvement in relative error and a four orders-of-magnitude improvement in stability of the network coordinate system.

Three major contributions:

  1. Characterized a real-world internet scale network coordinate system that is currently deployed.
  2. Provided new techniques for managing neighbors in a passively maintained network coordinate system.
  3. Provided evidence that internet scale network coordinate systems can actually work in the real-world.

Major Problem:

Their subset of nodes in the Azureus network consisted of users that are automatically updating their clients to the latest SVN release. They said that these users exhibited normal network behavior, but gave no data to confirm this. Was this an accurate sample of the overall network?

Future implications:

I think this paper shows that in practice network coordinates are a viable way to build internet scale systems. This work was done on a file sharing network where nodes can adapt to changes in latency at a relatively slow pace. How would network coordinates work in an internet scale system that had strict QoS requirements (i.e. content delivery networks, data stream processors, VoIP)?

 

Structured Streams: a New Transport Abstraction May 25, 2009

Filed under: Extra Papers — pefaymon @ 4:17 pm
Tags: ,

The paper “Structured Streams: a New Transport Abstraction” by Bryan Ford of MIT proposes a network transport abstraction based on an application of the composite pattern to network connections between applications. The abstraction provides a lightweight mechanism to create substreams of existing network streams which are forming a tree-like inheritance structure. This approach has a number of advantages, including a reduction of overhead for applications which need to open a number of connections to the same host end point and increasing the level of control an application has over its opened connections. This provides an opportunity to favor the delivery of control packets over data stream packets for example for a multimedia streaming application.

The described Structured Stream Transport (SST) assembles a collection of design patterns current applications running over TCP and/or UDP exhibit and tries to address their flaws and combine their advantages into a new transport protocol. The protocol provides to an application the service of transporting an arbitrary number of ordered, reliable byte streams which are in an hereditary relationship to each other. On the other end, it relies on an underlying protocol which can handle packets and deliver them with best-effort delivery. In between, the proposal multiplexes multiple streams onto channels modeled after IPsec, to provide security and reliability (services which are needed by all streams).

Only for the top-level root stream a 3-way TCP handshake is performed, for the spawned child streams inherit the connection information implicitly by being mapped on the already established channel, therefore reducing overhead. Flow control and security is controlled on a per-stream basis, allowing flexible application designs. Since the congestion control operates on a per-channel basis, it is easier to distribute the network bandwith more fairly between applications, because the number of  open streams has no influence on the congestion control. The delivery of data packets up to the application happens on a per-stream basis again, so packet losses on one stream don’t influence other streams.

Furthermore, the paper indicates a number of ways in which applications can benefit from this abstraction by effectively reusing existing connections and utilizing the flow control / prioritization mechanisms for ongoing streams. Also, a prototypical implementation and experiments are presented. The results indicate the performance for simulated web traffic to be on par with HTTP/1.1 with pipelining, which is not enabled in many browsers due to compatibility concerns, as the paper states. The performance impact of the user space prototype is moderate and below 10% overhead.

What is missing is a more extended and elaborate experimental section, where the protocol would be used on a network with realistic side traffic from many hosts and where other applications than HTTP traffic are tested. This should provide valuable insight into whether it is feasible to include this structured mechanism into future applications and gives also more implementation experience for feedback on the developer’s end.

Future work for this protocol apart from extended experimental validation is to provide an application-level framework for networked applications to facilitate developer adoption. Also, there should be some research into the question if it is possible to leverage this abstraction asymmetrically, only on one side of a connection, at first.

 

High-Bandwidth Data Dissemination for Large-Scale Distributed Systems May 25, 2009

Filed under: Extra Papers — dgaschk @ 4:17 pm

The popularity of large scale file distribution overlay networks such as BitTorrent grows steadily. This document describes Bullet, a self-organizing overlay network that outperforms BitTorrent and other contemporary distribution systems. Bullet uses a hybrid push-pull model for distribution and eschews a tree based topology for a more efficient mesh based model. The source does not retransmit a block until all blocks have been sent at least once.

A tree based distribution model has been shown to offer poor performance to the leaves. The mesh based system allows the formation of dynamic peer groups that exchange data eliminating the starvation that occurs with a tree topology. An established algorithm for peer group formation is used. Under-performing peers and peers without useful data blocks are dropped allowing more useful peers to be added.

Data blocks are sent in disjoint block sets. This makes it more likely that a randomly chosen peer will have required missing pieces. The peer selection algorithm facilitates the formation of peer groups that have complementary data portions. It is not possible that some subset of peers has not received required missing data after the sources initial transmission.

The system is self-organizing. User configuration is kept to a minimum. Adaptive dynamic operation allows the system to perform well under a variety of conditions. Static configuration is not amenable to the changing conditions encountered on the Internet. Self-organizing dynamic operation allows Bullet to outperform its contemporaries.

The self organizing nature of the system does not prohibit interference from self created cross traffic. A greater understanding of the physical topology could help to alleviate this self generated interference. Optimal exploitation of the physical topology could further improve performance.

Overlay networks are unable to perceive physical topology and this limits performance. Physical topology information is available in router databases. Integrating router information into the peer group selection algorithm would improve performance. Network multicast is able to exploit physical topology but for a number of stated reasons it alone is not sufficient for high performance data dissemination. A hybrid overlay-multicast system may prove most efficient.  Localized multicast groups in combination with a self organizing overlay is an interesting research direction.

 

Analysis and simulation of a fair queuing algorithm May 25, 2009

Filed under: Extra Papers — nekumar @ 4:17 pm

Paper presents a fair queuing algorithm for congestion control in datagram networks. Goal is to provide a fair allocation of bandwidth, lower delay for sources using less than their full share of bandwidth.  Congestion at Gateways can be controlled through routing and queuing algorithms. Adaptive routing lessens the congestion by routing packets away from network bottlenecks. Queuing algorithms do not affect congestion directly because they do not change the total traffic on the gateway’s outgoing line. These algorithms determine the way in which packets from different sources interact with each other which in turn affects the collective behaviour of the flow control algorithms. This aspect makes it a crucial component in effective congestion control.

Queuing algorithms is essentially allocating three nearly independent quantities: bandwidth (which packet gets transmitted), promptness (when do these packets get transmitted), and buffer space (which and when packets get discarded by the gateways). Most commonly used queuing algorithm FCFS intertwines all these three aspects because order of arrival ompletely determines all these three metrics. This algorithm is not fair because depending upon the packet size one flow can get more resources compare to other flows. FCFS does not have the property of functioning well even in the presence of ill-behaved sources.  A single source sending data at high rate can capture an arbitrary high fraction of the bandwidth of the outgoing line.

Nagle proposed an algorithm called fair queuing in which gateways maintains separate queues for packets from each individual sources and queues are served in round-robin manner. This scheme prevents a source from arbitrarily increasing its share of the bandwidth or delay of other sources.

Paper proposes a modified version of Nagle’s algorithm and provides simulation data comparing networks with FQ gateways and those with FCFS gateways. As the other components of congestion control mechanism source flow control, gateway routing, and gateway queuing algorithms, interact in an interesting and complicated ways, it is impossible to assess the effectiveness of any algorithm without reference to the other components of the congestion control in operation. Paper evaluates the queuing algorithm with in context of static routing and several widely used flow control algorithms.

Three goals of the paper are: describe a new fair queuing algorithm, provide rigorous understanding of the performance of this new fair queuing algorithm and evaluate the algorithm in the context of real networks.

Paper defines fairness as follows: (1) no user receives more than requested.  (2) no other allocation scheme has a higher minimum allocation. (3) the above 2 should remain true even if the user with the minimum allocation is removed and the total resource is reduced accordingly.

Paper defines the user to be a source-destination pair, which appears to be the best tradeoff between security and efficiency. Fair queuing algorithm works as follows: whenever a packet finishes transmission, the next packet to be transmitted is the one that will finish being sent the earliest (in the non pre-emptive case). In order to account for promptness, paper suggests giving less delay to users who utilize less than their fair share of bandwidth. This gives lower queuing delay to lower throughput sources. Paper also present a delay-throughput curve to show the fairness of this queuing algorithm over the First-Come-First-Serve (FCFS) algorithm (delay tends to infinity as throughput reaches 100% of available bandwidth for fair queuing but not for  FCFS). They also show that the delay is lower for a low throughput source with fair queuing as compared to FCFS.

Paper then briefly describe several flow control algorithms used at the source and present simulations of using these algorithms in conjunction with fair queuing and FCFS. The simulations show the throughput, average RTT  of packets, number of packets retransmitted, and number of packets dropped  for each combination of flow control and queuing algorithms, and for each of  the six scenarios simulated (including underloaded gateway, overloaded  gateway, ill-behaved source, mixed flow control algorthims, multihop path,  and more complicated but underloaded network). Paper concluded that overall fair queuing always performed better that FCFS in terms of fair allocation of bandwidth and buffer space, and in terms of delay. Fair queuing shuts out ill-behaved sources (thus acting as a firewall) and also encourages the use of more intelligent, self-optimizing flow control algorithms at the source resulting in fair, protective, and stable networks.
The paper ends by talking about two objections to their approach: (1) some source-destination pairs need more that their fair share of bandwidth (for which they introduced a simple modification to their algorithm) and (2) deployment issues due to the requirement of fast and smart gateways (which they question themselves and leave as future work to determine).
The paper presents the arguments clearly, but it is limited in its comparison to only one other queuing algorithm and in its evaluation to only a few small networks. Also, paper simplify the flow control algorithms quite a bit, especially the explicit binary feedback algorithm for which they simplify the signalling criteria.

 

Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks May 25, 2009

Filed under: Extra Papers — nekumar @ 4:17 pm

Paper describes the difference between congestion avoidance techniques and congestion control mechanisms. In congestion avoidance techniques, goal is to operate at a point where delay is low and throughput is maximum without congesting the network.  Congestion control mechanism allows the system to recover from the congested state consisting of high delay and low throughput. Key difference between these lies in the fact that Congestion avoidance tries to operate network at a point here queues have just started filling up while congestion control mechanism tries to keep the network from dropping the packets due to full queues. Graphically paper defines congestion avoidance as the point where throughput starts saturating with load and congestion control point where it drops rapidly with the increase in load.

Heterogeneity of technologies result in rate mismatches resulting in the congestion. Congestion control or avoidance mechanism can be classified as dynamic resource management problem. Paper models this as a control system where state is defined by the network load and control is defined as the window or rate at which users put packets in the network.

Analysis presented in the paper was used to propose a mechanism by the authors known as binary feedback scheme. In this scheme resources monitor their usage and depending upon the load level it sends a binary feedback to the users incorporated in the packets. Assumption is that this works synchronously, that is, all users receive same feedback and reacts to it and next feedback is then generated after all users have reacted to the feedback.

Paper analyzes the case of de-centralized decision rather than centralized decision. In de-centralized system users make decisions but resources feed the information regarding current resource usage. Assumption that the system is operating in congestion avoidance mode implies that user demand is same as resource allocation.

Purpose of the congestion avoidance mechanism is to determine the change in window/data rate as a function of system feedback and previous state. This change can be represented simplistically as a linear function describing the next state as a function of previous state and two constants describing the multiplicative and additive coefficients.

Paper presents the constraints on these coefficients to be able to provide congestion a voidance mechanism supporting efficiency, fairness, distributedness, convergence. Paper defines these metrics qualitatively and quantitatively. Efficiency is defined as closeness of the total load on the network to the load corresponding to network load corresponding to empty buffers after which throughput saturates with the load. Fairness is defined as maxmin fairness criteria in which users are partitioned into equivalent classes according to the resource which is their primary bottleneck. Then fairness is defined as equal resource share of the bottleneck resource between the users belonging to same equivalence class. Paper also supports the idea of distributedness which means that resource only tells whether it is overloaded or under loaded via the binary feedback bits. Convergence is defined by two parameters first by the speed with which system approaches the goal state from any staring state. Due to binary nature of feedback this might not be the steady state. Oscillation around this state determines the smoothness of the control.

Paper describes the linear control mechanism with control theoretic approach and provides the constraints on additive and multiplicative coefficients. It further gives the graphical interpretations of of all these constraints. It addresses the all possible solutions that converge to efficient and fair state and comparison among them. Further addressing the distrubted critearia and optimal tradeoff of responsiveness and smoothness defing the convergence. Some important results are: Multiplicative increase/decrease of individual resources does not change the fairness metric.  It proves that additive increase/additive decrease control mechanism does not help to converge for fairness. It can converge to efficiency though. Another non-intuitive result is the fact that fairness does not need to go up during both increase and decrease.

In order to satisfy the requirements of distributed convergence to efficiency and fairness without truncation linear policy decrease should be multiplicative and the linear increase policy should always have an additive component optimally it might have a multiplicative component with the coefficient no less than one. For linear controls with truncation increase and decrease policy both can have both additive and multiplicative components.

It further shows a result quite familiar from the control theory that any attempt to increase responsiveness (settling time) results in decreased smoothness, and vice versa. For both the feasibility and optimal convergence to fairness, increase policy should be additive and decrease policy should be multiplicative. Additive increase gives the quickest convergence to fairness.

It further shows that decrease should always be multiplicative to ensure that at every step the fairness either increases or stays the same as that of current operating point. Optimization constrains further require that changes in fairness and efficiency be maximized every feedback cycle. Paper shows that additive increase with multiplicative decrease is best to achieve this.

Paper further explores the analysis of non-linear control schemes but concludes that these schemes require parameters which are sensitive to the system state and information. Paper proposes future direction in this work as inclusion of delayed feedback impact, marginal utility of increased bits of feedback, estimation of current number of users, and impact of asynchronous operations.

 

Endpoint Admission Control: Architectural Issues and Performance May 25, 2009

Filed under: Extra Papers — dgaschk @ 4:17 pm

Admission control is a way to allow traffic that does not respond to congestion, e.g. non-TCP, to traverse the network without unfairly consuming network capacity. Endpoint admission control requires the source of the flow to probe the network to determine if enough available capacity exists to support the oncoming flow. Only if the probe is successful does the data flow commence. Several techniques for endpoint admission control are compared. The success of these technique are furthered if existing facilities such as DIFF-SERVE is configured throughout the network. DIFF-SERVE allows separate queuing for admission controlled traffic; however, its efficacy is shown even when such queuing is not enabled.

Traffic that does not respond to congestion could “unfairly” consume network capacity on congested links. TCP responds to congestion by throttling the rate at which it transmits. Existing TCP sessions slow the rate at which they send when congestion is experienced allowing new sessions to have their fair share of capacity. Traffic that is non-responsive to congestion will continue to arrive at the congestion point at a constant rate, forcing compliant TCP sessions to slow down. Admission control determines if capacity exists to support the desired flow before the non-responsive flow is initiated.

The authors do not develop any new endpoint admission control techniques rather they survey and compare existing techniques by simulation. The study compares several queuing disciplines and several probing techniques over a variety of topologies and distinct traffic descriptions. Per-flow queuing is examined and discarded due to its unsuitability. A single queue for all traffic is examined as a legacy or in an incremental deployment scenario.

DIFF-SERVE configurations that enable 2 or 3 priority queues were the focus of the study. A 2 queue system places admission controlled traffic in the high queue and best effort in the low queue. A maximum rate is applied to the high priority queue. The maximum rate prohibits the admission controlled traffic from consuming all of the link capacity, but allows the link to go underutilized in the absence of best-effort traffic. A 3 queue configuration places best effort in the low queue and admission controlled data traffic in the high queue. The middle queue carries only the admission control probe packets. A maximum limit is placed on the aggregate of the high and middle queues.

Probe packets are used to determine if capacity exists before starting the data flow. Probe packets are sent at the rate of the oncoming flow. If sufficient loss is experienced then the flow is not started. Graduated or “slow-start” probe flows as well as probe duration are examined.

The authors conclude that endpoint admission control may be a viable solution for admission based traffic. However, their model did not include traffic descriptions that match modern compression algorithms. Modern compression algorithms for real-time traffic vary bandwidth requirements greatly over time. For example, voice compression algorithms typically suppress transmission during periods of silence. Only a single participant speaks at a time meaning that traffic will travel in only a single direction at a time. Probe packets may find that available capacity exists if they happen to travel during a silence period, allowing the erroneous admission of a new flow.

The authors point out that further work is required. They point out that their work does not prove the applicability of endpoint based admission control. They show its potential under a limited number of scenarios. Much further testing and simulation is required. Traffic descriptions that match modern real-time compression algorithms must be included. Deployment of this technique in an enterprise network is a policy decision; however, deployment in the public Internet will require legal agreements. Research into defining such agreements is needed.

 

The End-to-end Effects of Internet Path Selection May 18, 2009

Filed under: Extra Papers — subhramazumdar @ 11:17 pm

The path taken by packets in the internet depends on many factors like routing protocol and per network routing policies. The effect of these on the end-to-end performance experienced by end users is not well understood. This paper does a measurement based study comparing the “default” path taken in the internet to potential performance available along some alternative paths. Performance metrics like round trip time, loss rate and bandwidth were measured between geographically diverse internet hosts and it was found that in 30-80% of the cases there was an alternate path with significantly superior quality. While it is easy to measure the performance seen traversing the default path between two hosts, it is difficult to obtain the same metrics for alternate paths or even to discover what those alternate paths might be. To overcome this, the key observation used is that different hosts have different views of the network since they use different providers and use different degrees of connectivity. Because end-to-end internet paths are a conjunction of local routing policies, this allowed to compare the quality of the default path to the hypothetical path that routes around any inefficiency by traversing through a sequence of hosts. To explore the quality of alternate paths, three new large data sets of paire-wise host measurements were collected over an extended period of time. It was found that n terms of all the metrics viz. RTT, loss rate and bandwidth, there were a large fraction of default paths which had a significantly superior alternate path. They also proposed a hypothesis which claimed that these superior alternate paths were not due to a small number of contributing hosts, high bandwidth links or small number of efficient ASs. They supported their claim by experiments which removed the top most contributing entities in each case and re-computing the alternate paths. In all cases there were still significantly better alternate paths found for large fraction of the default paths. Finally they also experimented to find the contribution of propagation delay vs congestion delay and found that neither of them had a dominating effect on the inefficiency of the default paths and each of them play equally significant roles.

The major problem of the paper is the robustness and confidence of these measurements. Since measurements were not taken based on actual packet routing and computed from the aggregation of discrete performance measurements as observed by different hosts, the results can have significant error margin. They have used the mean for the metrics instead of mode which can be affected if the distribution is highly skewed. Variations in the data sets can also significantly contribute to the results. Some of the potential sources of variation is the upgrade of the network infrastructures during the traces. Time of day at which the measurements were carried on also effects the results. For example most parts of the network are heavily loaded with traffic during the weekdays while carry much less traffic during weekends. Variation of traffic within a day is also significant depending on the hours. Finally since with the increase of traffic along these alternate paths, even their performance can drop, the comparison is not justified to a great extent. Taking into account all these sources of variation, the final measurements can become quite fragile and less reliable. Although they have repeated their experiments in different hours of day over extended period of time, the robustness of the end results are still questionable.

Future directions of the work can include taking the measurements of such alternate path metrics by sending real traffic. This will require cooperation between different networks to allow traffic forwarding along paths that may be closed due to business reasons or other contractual agreements.  Also the effect of change in existing routing protocols must be explored that takes a global view of the network rather than local views based on individual networks, and try to optimize the overall performance. If only such protocols are deployed can the internet enjoy its full power which is otherwise hidden beneath various local policies and restrictions.

 

In VINI Veritas: Realistic and Controlled Network Experimentation May 18, 2009

Filed under: Extra Papers, Paper Reviews — supritapagad @ 11:17 pm
Tags: , , ,

1. Virtual network that provides control and realism

The paper identifies that the two characteristic features a virtual network needs to posses are realism and control. VINI uses traffic generated by real hosts. At the same time it provides a flexible framework that allows its users implement their own protocols over a topology they have complete control over. It implements virtual channels over physical channels while maintaining the characteristics of the latter intact.

2. Support for multiple users

It allows multiple users to deploy their simulation over the same set of VINI nodes. It identifies the issues involved. Each node is provided with a dedicated set of resources. Each user is isolated from the others and can maintain his/her own set of protocols and topology. It also provides for fair-share & real-time priority based  scheduling for CPU time. It ensures that each users simulation is isolated and unaffected  by the others running on the node.

3. The System

It implements the entire system using the Click software router and the XORP routing protocol over IIAS. The structure allows participating nodes to communicate with the rest of the Internet and also intercepts traffic generated by the rest of the Internet directed to the participating nodes. The model also allows multiple users to run their simulation over the same node by providing for separate user-spaces and UDP tunnels per user.

Loopholes

The CPU scheduling suggested by the paper to support multiple users does not provide complete isolation of the various simulations running on the node.The paper talks of fair-share with real time priority based scheduling. This it achieves by allowing a real-time process that becomes runnable to jump to the head of the queue preempting any non-real-time process. However, the number of real and non-real time processes running on a node could vary with time. This would mean a simulation could get different results on different runs of the simulation. This defeats one of the main objectives of simulation closely modelling a real system.

Forcing XORP to run in UML means a simulation would never be able model a routing protocol in the kernel level. This limits the flexibility provided by VINI.

Future Work

Currently the working of VINI has not be verified for multiple simulations running on the same node. This can be implemented and it can be verified that the simulation provides the same results for different runs of the simulation.

Also, enhancements could be made to reduce the amount of indirection in the system to provide greater speed than is currently being achieved. Or, the user applications can be given the flexibility to decide the layer in which they want to run and directly control how their routing protocol interacts with the underlying router.

 

A Delay Tolerant Network Architecture for Challenged Internets May 18, 2009

Filed under: Extra Papers — yipiokayyay @ 11:17 pm
Tags: , , ,

(i) The three most important things the paper says:

1) With challenged networks, you cannot assume that an end to end routing path exists.  Rather, routes can be better thought of as a “cascade of time-dependent contacts”.  They also parameterize contacts in terms of time available, capacity, latency, direction and most importantly a concept of “predictability” is built in.  This was an important observation as it allows DTP to take into account a challenged network’s end system characteristics like low duty cycle operations.   These were factors that existing solutions for challenged networks didn’t consider (e.g. PEP).

2) DTP chooses to use a “hop by hop” custody transfer of messages instead of “end to end”.  In a custody transfer mechanism, once a message arrives at the next hop (usually a DTP device), the sender’s job is considered complete.  This allows messages to be delivered reliability even in situations when only segments of a particular path may be on at a time.  Consider a case in which a sensor network needs to transfer information, however they all come powered on at different intervals.  This is yet another consideration that existing solutions didn’t consider.

3) Another factor that they considered in their solution that was not previously conceived was security between hop points in a challenged network.  They choose to implemenet security in the form of public/private key exchange between hops.  By doing it this way, DTP routers only need to store public keys of nodes of distance one away.  Therefore it makes key management much more simple.  Furthermore, it also supports the “hop by hop” custody transfer that they decided on earlier.

(ii) The most glaring problem with the paper:

The biggest problem with the paper is they “hand waved” a lot of the detailed proofs and explanations.  For instance, they mention that re-transmission issues can be solved with a “time out”, however no futher evidence or proof was given.  In addition, they said that taking into account “contacts” in different regions connected by DTP will allow the DTP to make good routing choices.  However no detail was given about how or what algorithms can be used to make that process reliable.  Some of these “hand waved” points are critical to the accuracy of this paper.

(iii) The future research directions of the work:

An interesting follow on work would be how their choice to use “hop by hop”, for reliability, can actually have an opposite effect.  Considering the overhead cost of a transferring messages via custody transfers, the next hop to the DTP can be compromised.  In such situation when it is compromised, would sending it via end to end solution afford more success?  Also does this change when messages sizes are small?  In the case of small messages, the overhead cost with custody transfer may cause delays that may prevent the message from ever being delivered.

 

Understanding the Performance of TCP Pacing May 18, 2009

Filed under: Extra Papers — mkhoang @ 11:17 pm
Tags: , ,

(i) Three most important things

1. According to queuing theory bursty traffic produces higher queueing delays, more packet losses, and lower throughput and it has been observed that TCP’s congestion control mechanisms can produce bursty traffic flows on high bandwidth and highly multiplexed networks. Based on intuition it would seem that TCP pacing, evenly spacing data transmission across a round-trip time, would be better for slow since packets are less likely to be dropped if they aren’t clumped and better for the network since competing flows will see less queueing delay and burst losses.

2. TCP pacing improves fairness without sacrificing throughput in some cases. TCP pacing achieves much better normalized fairness than TCP Reno with multiple flows of variable round-trip times, while achieving similar throughput.  The higher burstiness of TCP Reno due to the overlap of packet clusters from different flows becomes noticeable.

3. For most cases TCP pacing resulted in lower throughput and higher latencies. By evenly spreading out traffic, TCP pacing delays the congestion signals to the point where the network is already over-subscribed. Since in TCP senders continually increase their traffic in the absence of congestion, pacing can causes flows through the bottleneck to experience simultaneous losses known as “synchronized drops”.

(ii) Most glaring problem

The most glaring problem would be that the paper initially makes us aware that TCP’s congestion control mechanisms produce bursty traffic flows on high bandwidth and highly multiplexed networks and then shows us that TCP pacing degrades performance but doesn’t provide any other possible solutions to improve the negative impact of bursty traffic.

(iii) Future Research Directions

Future research directions for this work would be to use the results from the TCP pacing simulations to help explore other possible solutions to improve the queueing delays, packet losses, and low throughput of bursty traffic.

 

Consistent Overhead Byte Stuffing May 18, 2009

Filed under: Extra Papers — stufflebean @ 11:15 pm
Tags: , , , ,

This paper describes a method with which to stuff bytes into a packet in order to allow the existence of a packet delimiter which can unambiguously be discerned from packet data. Consistent overhead byte stuffing, or COBS, works by breaking the packet into chunks terminated by the byte they are trying to eliminate (zero for the purpose of this paper) and re-encoding the chunk with a length and a data field, neither of which contain the removed byte.

The main points of this paper are as follows:

  1. The existing bit and byte stuffing mechanisms–high-level data link control, or HDLC, and point-to-point, or PPP, encoding–generally have good average-case performance. However, their worst-case performance adds 20% overhead to HDLC and 100% overhead to PPP. In many scenarios, this is unacceptable, since some transmission media have a hard upper limit on the maximum transmission unit (MTU). One example is packet radio, where the FCC mandates a maximum transmission time, and therefore the MTU. In this case, the effective MTU for PPP is only half the actual MTU, since it cannot be determined a priori that a packet will not be a worst-case packet and thus incur the 100% overhead and be untransmittable. COBS, on the other hand, provides very tight bounds on its overhead. While there are some cases in which PPP manages a lower overhead than COBS, the upper bound for COBS is much lower, which in the case of a hard limit on MTU means that the effective MTU using COBS can be set much higher.
  2. Not only are the overheads for COBS much more uniform, but the upper bound is very low. COBS provides an upper bound of 0.40% overhead, since it only incurs an overhead if there is not a zero (zeroes are removed by COBS so that they can be used as packet delimiters). There are a few cases in which PPP outperforms COBS for small packets because COBS adds a “phantom zero” to the end of every packet so that every packet has a valid COBS encoding. This is solved by creating a variant encoding using “zero pair elimination” called COBS/ZPE which can actually reduce the size of the encoded data by setting aside part of the code space for portions of the code space ending in two zeroes instead of just one. Since the short packets which give COBS trouble relative to PPP tend to consist of mostly packet headers, and packet headers tend to have pairs of zero bytes, COBS/ZPE can usually encode these packets using no overhead, and in some cases can incur “negative overhead,” shrinking the size of the packet.
  3. While COBS is targeted at byte-level stuffing, the data link layer also has to perform bit framing. COBS makes this almost trivial, since a COBS-encoded bitstream does not contain any zero bytes, and therefore a sequence of 15 zero bits cannot possibly occur. This insight allows a framing marker of a one followed by fifteen zeros, which unambiguously marks the end of a frame. This is a huge advantage of COBS over PPP, since PPP still requires bit framing, which is typically performed by HDLC. As we mentioned earlier, HDLC has a worst-case overhead of 20%, much worse than COBS, and thus by eliminating the need for HDLC, they are able to make much more believable claims about their worst-case overhead.

The biggest weakness of this paper is that they talk about their worst-case overhead in terms of 255-byte chunks (since that is what COBS operates on). However, it seems that theeir overhead could actually be much worse for a 40-byte packet, if that packet contains no zeros. In that case, since they add the phantom byte, their overhead is actually 2.5% (41/40) instead of their quoted 0.40%. While it may be the case that packet headers usually contain a zero pair and therefore COBS/ZPE will eliminate the mandatory one-byte overhead, this is not assured.

Future research on this topic should look at larger traces of data to determine whether COBS/ZPE is really necessary, and if so, where they should set the threshold in their code space between one trailing zero and two. They admit that their decision is arbitrary and therefore more research might make their decision a bit more believable.

 

Tor: The Second-Generation Onion Router May 12, 2009

Filed under: Extra Papers — stufflebean @ 12:32 am
Tags: , , , ,

This paper describes the architecture of the Tor communication service. Tor provides an overlay on the Internet which provides relatively strong anonymizing guarantees for communication carried over it.

The main points of this paper are as follows:

  1. There are a great number of possible attacks against traffic over the Internet. There are timing attacks to determine the identity of a person accessing a given service, denial-of-service attacks to reduce or eliminate the functionality of a network, exploits which aim to impersonate a trusted service or user, and many others. The Tor network mostly aims to anonymize users, only preventing against other attacks to the extent that it makes the service more desirable or useful. The main mechanism by which Tor guarantees anonymity is by having many onion routers (ORs) over which communication is established in a circuit-based manner. Each hop in the circuit is encrypted using a different key, so as the encrypted message is passed along the circuit, one layer of encryption is “peeled away” at every step (hence the term “onion router”), only revealing the unencrypted data at the time it needs to exit the Tor overlay. This makes it very difficult to correlate data coming into the network with data leaving the network unless the attacker has compromised a large segment of the network.
  2. This system has an interesting blend of end-to-end and per-hop technologies. The onion unwrapping technique is necessarily per-hop, but the system also provides an end-to-end message verification to prevent a hostile OR from injecting or modifying the data in a packet. There is also a blend of centralization and distribution. The network should be comprised of many ORs in order to make it more difficult for a malicious observer to watch all endpoints for timing information, which can be used to deduce who is accessing a given service at a given time. However, to reduce overhead, directory services, which are used to provide an anonymous rendezvous for a responder service–normally only the requester’s identity is protected–are provided by a small number of centralized servers. This opens up further avenues of attack against the network, but should allow the system to operate more efficiently than one based on a gossip-based protocol. The reason that these dichotomies are interesting is that it makes the argument that there are always tradeoffs involved when choosing end-to-end/per-hop or centralized/distributed and that there are generally applications which require aspects of each.
  3. The paper also makes the point that no matter how well the Tor network is designed, it is only part of the communication process, and therefore can only provide a relatively limited set of protections against attack. For example, if a malicious user manages to commandeer a web server which an anonymous user is attempting to access through Tor, unless the user has used an application-level proxy to scrub any personally-identifying information from the HTTP request, then there might be information that could be used to track down the requester even though they used the Tor network. Tor also only provides so much deniability for a basic user. The default method for a user to connect to Tor is using an onion proxy, which packages his or her requests for transit to the overlay. However, since all of the data coming from the proxy consists of requests from that user, it makes it easier to identify which service the user is accessing through the use of traffic analysis at the exit nodes. This is not a failure of the Tor network, but an artifact of the service it provides. This limitation can also be mitigated by the user running a full OR, since this means that onion data leaving the user’s computer could either be a local request or just data forwarded on the behalf of other users.

The largest weakness that I see with the paper is that the Tor network, while effective, depends a great deal on there being a considerable number of trustworthy, reliable nodes to form the OR overlay. Since there is not much of an incentive for the average Internet user to go to such great lengths to protect his or her privacy, the overlay network might have problems becoming large enough to provide the deniability which is so valuable to the anonymous users. Further, since it introduces considerable latency into the communication path, it is unsuitable for some classes of applications which might benefit from anonymity, such as VoIP. It will be interesting to see how useful it is in large-scale deployment, especially when attackers have time to exploit many of the weaknesses listed in the paper.

Future research in this field should include ways to reduce the overhead of communication while incentivizing being a part of the overlay. If the overhead can be reduced to the point where it is tolerable for the average user, then the onion routing approach could be deployed on a very large scale, providing an unprecendented level of anonymity and, to some extent, privacy for the Internet at large.

 

A Framework for Scalable Global IP-Anycast (GIA) May 11, 2009

Filed under: Extra Papers — siva @ 12:39 pm

This paper proposes a scalable architecture for Global IP-Anycast (GIA). IP-anycast is a service that allows a sender to send packets to any one of a group of receivers that share an anycast address. It shares similarities to IP-multicast in that the sender does not know the actual identities of the recipients. However, in IP-multicast, the packets are delivered to all the hosts that have subscribed to the multicast group address while in IP-anycast, they are delivered to just one of the hosts that share the anycast address (ideally the closest host with the particular anycast address).

Although anycast was proposed several years back, one of the chief reasons for its low usage and deployment has been the lack of a scalable architecture. Traditional designs for providing IP-anycast either distribute routes globally to individual IP-anycast groups or confine the anycast group to a small topological region. While confinement reduces the problem of scalability, it does not make it a non issue in spite of the serious limitation of the limited area over which the anycast address can be used.

One of the primary contributions of GIA is the identification of the differences between unicast routing and anycast routing. While unicast routing scales through hierarchical aggregation, anycast defies any form of hierarchical aggregation. Anycast group topology need not comply with unicast topology. So the anycast routing protocol should recognize the characteristic of anycast and benefit from that to scale.

Another important contribution of the paper is the novel address architecture and address assignment. The concept of the home domain for every anycast address facilitates the use of default routes for those anycast addresses in other domains which avoids wasting any resources in the BGP routers. It also allows for easy distribution of administrative responsibilities of anycast addresses to different domains. This facilitates individual domains to control the anycast addresses that they provide to their customers.

The third and perhaps most important contribution of the paper is the actual anycast routing algorithm. The global anycast groups are divided into internal, popular external and unpopular external groups. Routing is performed differently for different categories of groups. The use of default routes reduces the state to be maintained in each BGP router as  well as reduces the scope of advertising of anycast addresses, but also results in non-optimal routes being chosen i.e. the host to which a particular anycast packet actually gets delivered might not be the actual closest host that shares that anycast address. In the worst case, the packet might go all the way to the home domain. So GIA uses a query based inter-domain routing protocol to identify nearby hosts that have a particular anycast address. This is done on based based on popularity of the anycast groups. Only the most popular ones have the routes cached in the BGP routers.

One of the primary challenges to IP-anycast is probably the lack of a killer app. One of the most obvious services that would benefit from IP-anycast would be websites – Several servers can be configured with the same anycast address and serve pages for the same site. While using a common anycast address for several webservers that service the same request has its benefits, most large webservice providers already achieve this through DNS redirection although the DNS service itself might be implemented using IP-anycast. Popular sites like Google, Yahoo, MSN etc, peer with several ISPs to ensure that the their clients are redirected to the closest servers. So IP anycast needs a killer app which goes beyond what can currently be achieved through DNS redirection.

Future directions for research could include writing applications that can benefit from IP anycast. IP anycast is typically best suited for stateless services. A better characterization of the kind of services that would benefit from IP anycast or extending IP anycast to be useful for other types of services is desirable. While IP anycast offers a good form of load balancing, it would be interesting to see how the distance function can be tailored for better load balancing in different deployment scenarios.

 

Modeling TCP Throughput: A Simple Model and its Empirical Validation May 11, 2009

Filed under: Extra Papers — yipiokayyay @ 12:38 pm
Tags: , , ,
(i) The three most important things the paper says:
1) This paper was able to show that the majority of TCP window decreases are due to time outs and not fast retransmits as previously thought.  This was a novel concept at the time and it became their basis for their model which takes into account time out loss indications.
2) They proposed a simplified equation/expression for TCP bandwidth in the paper (Equation 32).  This equation provided an output of performance for TCP taking into consideration “packet loss indication rate”, “RTT” and “max window size”.  Subsequently, they were able to prove the correctness of this model through comparison with traces of real data.
3) It was observed that during a period without loss indications, the window of the sender is dominated by the congestion avoidance algorithm and the receiver’s advertised window.  This shows that as a consequence, a period without loss indications will have a window grow up to Wmax, but will not grow further beyond this value.  This was an important observation in the basis of Equation 32, since it provided a bound for the window size.
(ii) The most glaring problem with the paper:
They made a couple of big assumptions in this paper, which they didn’t really provide significant reasoning and proof (e.g. triple ACK, window sizes, subtleties of fast recovery algorithm).  Their proof of correctness was linked to how their model stacked up against real data.  Since they didn’t dive too deeply into their assumptions, there may be cases (possibly big ones) that their model overlooks.
(iii) The future research directions of the work:
How does this model hold up in a wireless environment, where failures occur much more frequently.  Also, does the assumptions that were used to construct Equation 32 in this paper still hold true in 2009 (almost a decade later the paper was produced).  Assumptions such as, round trip time is independent of the window size being “mostly” true.  In light of such evolutions in technology and assumptions, perhaps “Equation 32” would need to be updated.

(i) The three most important things the paper says:

1) This paper was able to show that the majority of TCP window decreases are due to time outs and not fast retransmits as previously thought.  This was a novel concept at the time and it became their basis for their model which takes into account time out loss indications.

2) They proposed a simplified equation/expression for TCP bandwidth in the paper (Equation 32).  This equation provided an output of performance for TCP taking into consideration “packet loss indication rate”, “RTT” and “max window size”.  Subsequently, they were able to prove the correctness of this model through comparison with traces of real data.

3) It was observed that during a period without loss indications, the window of the sender is dominated by the congestion avoidance algorithm and the receiver’s advertised window.  This shows that as a consequence, a period without loss indications will have a window grow up to Wmax, but will not grow further beyond this value.  This was an important observation in the basis of Equation 32, since it provided a bound for the window size.

(ii) The most glaring problem with the paper:

They made a couple of big assumptions in this paper, which they didn’t really provide significant reasoning and proof (e.g. triple ACK, window sizes, subtleties of fast recovery algorithm).  Their proof of correctness was linked to how their model stacked up against real data.  Since they didn’t dive too deeply into their assumptions, there may be cases (possibly big ones) that their model overlooks.

(iii) The future research directions of the work:

How does this model hold up in a wireless environment, where failures occur much more frequently.  Also, does the assumptions that were used to construct Equation 32 in this paper still hold true in 2009 (almost a decade later the paper was produced).  Assumptions such as, round trip time is independent of the window size being “mostly” true.  In light of such evolutions in technology and assumptions, perhaps “Equation 32” would need to be updated.

 

Cloud Control with Distributed Rate Limiting May 11, 2009

Filed under: Extra Papers — vikrams3 @ 12:36 pm

This is one of the more relevant papers in light of increasing push towards cloud-based services. The main points addressed are:

  1. Motivation for traffic rate limiters: The paper elucidates on how rate limiting benefits both the parties involved in a cloud service: the service provider and the customer. The service provider can hope to reduce the wasted resources because DRL removed the artificial separation between access metering and geography. The customer will not be taken by surprise at massive bills, because he can be sure that his usage will not exceed the limit pre-determined with his service provider.
  2. Identification of the need to distribute the rate limiting of network traffic: Though the cloud services are inherently distributed across multiple sites, currently the rate limiting is done at each site statically without taking into account the flow rates of the same service at other sites. This could lead to under-utilization and unfair allocation. By introducing communication between rate limiters, each site will have the global view of the traffic rates, and can decide the local rate limit more accurately.
  3. Leveraging the congestion-control features of TCP: The paper correctly recognizes that TCP is the most dominant traffic in the Internet today. It takes advantage of the fact that TCP tries to achieve inter-flow fairness, by reducing the detailed information on packet arrival rates — called “Flow Proportional Share.”

One aspect of distributed systems that was not addressed sufficiently in the paper is fault tolerance. We could easily imagine a situation where messages between rate limiters are lost or a rate limiter fails. Unless taken care of, it would lead to bad estimations of traffic rates which will impact the algorithm for rate limiting.

As mentioned, further research can be done to enhance fault tolerance. We could use a broadcast protocol like GBCAST to ensure each site is aware of which other sites are currently in the group, or which sites have joined the group recently. Other aspect is to see if there is a business incentive for companies to implement DRL at all. Would they rather find conventional/simpler methods like static imposition of rate limits and occasional allowance for enormous bills for customers economically viable?

 

DCell: A Scalable and Fault-Tolerant Network Structure for Data Centers May 11, 2009

Filed under: Extra Papers — vikrams3 @ 12:27 pm

Researchers at Microsoft Research introduce a novel topology for Data Center Networks (DCN) called DCell. The main points are:

  1. This paper describes the geometrical elegance of DCell, which is built recursively. The authors argue that such a topology gives a number of paths between any pair of end-hosts, and is hence more resilient to fault tolerance than the hierarchical tree topology.
  2. If there is one aspect that distinguishes this paper from others that have dealt with DCN topologies, it is in the rigor of mathematical proofs presented for various facts inherent in a DCell structure. The authors elaborate on the efficiency of routing in DCell by providing a bound on the properties like maximum path length and  diameter of the network. Also, they provide insight into the recursive properties of DCell.
  3. A good property of DCell is its ability to support incremental growth. It is very important to a data center facility, both on economic and management fronts, to be able to add more servers as the need arises. The authors have correctly identified this and show that a full DCell is not really necessary for functionality.

I have a number of concerns with regard to DCell. I am not convinced that the overhead of wiring explosion that DCell would introduce would be compensated by other benefits like fault tolerance. The robust, fault-tolerant broadcast mentioned in the paper, each sender at level k delivers the message to all its neighbors in level k-1. This seems inefficient leading to an increase in the number of messages in the average case.

The future for DCell holds if more analysis goes into whether it is economically viable for companies to rejig their existing infrastructure in favor of DCell. Typically this is deemed to be an expensive affair, and the benefits have to far exceed the problems associated with the functioning facilities, to convince the companies.

 

Practical Network Support for IP Traceback May 5, 2009

Three important contributions

1.) At first, this paper introduces a comparative analysis of Traceback methods to find the origins of denial-of-service (DOS) attacks. These methods try to map the anonymous (and eventually spoofed) IP source address of an attack origin to its originating network. The described methods begin with simple filtering, testing and logging methods on single routers, which are dependent on the support of each network operator on the attack path between the attack origin and the victim of the DOS-attack. Since this is a time-consuming effort and can only be employed while the attack is still underway, the authors introduce to sampling methods, like ICMP Traceback and their method, probabilistic marking. The overall idea behind this method is to inject information about the path into the network, either via special packets (ICMP Traceback) or by modifying existing packets (Marking).

2.) In general, the marking can be done in more than one way, by means of appending each node, inserting a random node, or inserting a random edge. The authors prefer an edge sampling method, where information about an edge is inserted with low probability into a packet to encode a certain pair of routers on the attack path. Furthermore, the distance from the attacker is kept. Because the marking is inserted into all packets at random, the whole attack graph between the attackers and the victim can be reconstructed due to the fact that farther-away edges are received less often by the victim.

3.) Implementation of this method is done by reusing the IP identification header field. Since the algorithms would use 72 bits (2 * 32 bits + 8 bits distance) and the IP identification header field is only 16 bytes long, a number of encoding tricks and fragmentation is used to transmit the inserted edge information into the normal IP network traffic. The authors deem the reuse of this IP header field to be feasible, because according to literature, only 0.25 percent of the IP network traffic is fragmented, due to techniques like Path MTU discovery. Still, the implementation is not completely backwards compatible.

Glaring Problem

The biggest issue with this method is the problem of widespread deployment, as network operators worldwide have to voluntarily implement this method on their routers, knowing that the implementation may disturb traffic even for a small percentage of their users. Furthermore, it is not at all clear how the knowledge of the attack origins is helpful in mitigating a really distributed denial of service attack by a large-scale botnet, as they’re common today.

Future work

The most appealing future work to me would be to investigate possible methods of mitigating attacks by distributed denial of service attacks at their sources and not at the victims end. The network (or ISPs) should be more aggressive in detecting malicious behaviour from users and be responsible for helping their clients to clean out these large-scale botnet infections on their networks.

 

Unraveling the Complexity of Network Management May 5, 2009

Filed under: Extra Papers — filipposeracini @ 2:10 pm

In the paper a suite of network complexity models is presented. This suite measures the complexity of a network in terms of referential complexity, router roles and inherent complexity. All these metrics focus on Layer-3 of the network stack. For each of the mentioned parameters, the authors created a set of metrics to analyze and assess the network.

The referential complexity models the complexity of the configuration file of the routers constituting the network. It is important to mention that routers’ configuration files in a network are often interrelated between each other, and an update to one of them can imply the update of several other files in several other routers. It is easy to see that this interrelation between different configuration files can make a trivial update challenging. It is indeed fundamental for the correct functioning of the network that the configuration files are consistent among each other.

The router roles analysis assesses the complexity of the network in terms of number of roles existing within the network and/or the number of roles that routers play simultaneously. Their algorithm is able to identify similar configuration files among the routers and hence to count how many roles exist in the network. The authors claim that networks become more complex and challenging to manage as the number of different roles in the network increases.

Finally, the inherent complexity refers to the impact of the reachability and access control policies on the network’s complexity. The basic idea behind this measurement is that the more fine grained the control (i.e. access control list) and the configuration of the network are, the harder is its inherent complexity, and hence the harder is the network to understand and manage.

What I really liked about this paper and the tool presented is that:

  • it gives a nice overview on what the main problems related to network management are.
  • the tool allows to understand the network configuration, structure and topology. This helped quite a lot the network operators with reasoning about the network and also with finding configuration bugs and inconsistencies. I found surprising to read that often the biggest part of the configuration file of network is a “dead” file, that is,  it is related to old configurations of the network system that have not be removed because to understand the effects of such removal would have taken too much time and effort. This tool eases this process.
  • what-if analysis. The tool allows to evaluate a priori how the complexity of a network would change if a particular update were performed. Hence, in this way it is easier to make a smarter decision on what is good or bad when updating the network.

The proposed tool has been tested with the help of network operators that evaluated the results of the program. What the authors did was to ask the operators whether the information retrieved by the tool were comprehensive and reflected the real implementation of the network, as well as whether they agreed with the estimated complexity. Although this approach sounds fine, the way it is presented in the paper leaves a lot of room to interpretation. In particular is not clear how much the operators really agreed with the results from the tool and when and where they did not. This is at my advice, one of the weakest point of the paper.

There are still many network features that this tool cannot assess, such as the effects of NAT packet transformations or the effects of dynamic events such as node failures and load balancing. Extend the tool to these features would be an interesting area of further reaserch.

 

TCP and Explicit Congestion Notification May 5, 2009

Filed under: Extra Papers — filipposeracini @ 2:10 pm

This paper discusses the use of Explicit Congestion Notification (ECN) mechanisms in the TCP/IP protocol.

The paper initially starts with an introduction on how the TCP/IP handles congestion over the network. In a standard situation, the TCP source detects congestion, in the form of dropped packets out of routets’ queues, either from the receipt of three duplicate ACKs or after the timeout of a retransmit order. As reaction to the congestion, the TCP reduces the congestion window. At the current state most of the routers in the Internet do not provide support for explicit notification of congestion nor they have mechanisms for the detection of incipient congestion.

Two possible techniques for ECN are ICMP Source Quench messages and setting an ECN field in the IP header in a ACK packet to notify the source that the route to the receiver is congested. The authors present a set of guidelines on how the TCP’s response to ECN mechanisms should be. In particular they highlight that:

  • The receipt of a single ECN should trigger a response to congestion.
  • TCP should react to an ECN at most once per round trip time. Hence it should ignore further ECN messages as well as it should not reach to three duplicate ACKs since a reaction to congestion has already taken place. In particular, the TCP source should not further reduce the congestion window.
  • The response to an ECN should not trigger the sending of any new or retransmitted data packets.

The paper compares in a comprehensive performance analysis three different gateway implementations in both a LAN and WAN environment:

  1. Drop-Tail gateway with no ECN mechanism. The drop-tail gateway drops incoming packets as soon as its queues are full
  2. RED gateway with no ECN. The Random Early Detection (RED) gateway detects incipient congestion and start dropping incoming packets once the queue lenght goes over a threshold. Packets are dropped accordingly a probabilistic algorithm.
  3. RED gateway with ECN field in the IP header

The results of the simulations over the LAN show that the RED gateway with ECN outperforms the other implementations in terms of lesser sensitivity to the network parameters, smaller delay and bigger throughput. In the WAN case instead, the throughput is similar among the three implementation. However, the RED gateway with ECN still presents a smaller daly for low-bandwidth interactive traffic.

The authors conclude by hoping a more widespread adoption of some ECN mechanisms, even though they admit that both the Source Quence and the ECN field in the IP header present difficulties for their implementation.

 

ROFL: Routing on Flat Labels May 4, 2009

Filed under: Extra Papers — giledelman @ 12:32 am
Tags: , ,

Three Important Things:

  • The motivation behind flat routing is the desire to separate location from identity. Having labels be flat implies that no location information is stored in a host identifier. This fundamental change brings numerous advantages for individual hosts such as mobility, multihoming, and stable identity. These are the benefits of location/identity separation. Removing location altogether brings further benefits; The paper proposes to reuse DNS infrastructure to provide naming, which would leverage existing resources. Allocation must only satisfy uniqueness and does not need to confirm to the physical characterisitics of the network.
  • Intradomain routing is a basic building block for the flat label scheme. A new arriving host is assigned to a first-hop router which is responsible for determining its “successor” and “predecessor,” effectively inserting it into a ring topolgy. In addition, the host is assigned an “external successor” for forwarding packets outside of the local domain. Packets are forwarded along the ring in a greedy fashion towards the destination host. The first-hop router can cache the path . Interdomain routing follows in similar syle, with the additional border router complexity for enforcing ISP policies. One can imagine each internal domain as a small ring, with the border routers for each domain forming their own large interdomain ring.
  • In order to ground their claims of viability, the authors gathered traces from 4 moderate sized (200-600 internal routers) ISPs and performed several simulations. Standout results were 1MB of path caching on a typical router, 40ms host setup latency, and 45 control messages generated per host per join request. These numbers were for intradomain routing, and were used to extrapolate information about Internet scale networks. The authors felt they could do this because of the similarities between interdomain and intradomain routing.

Glaring Problem:

Though several interesting and exciting potential benefits of ROFL were presented, not all of them were explored in depth in the discussion that followed. Possible security applications were only briefly mentioned, and the adaptation of DNS to work with ROFL was not discussed. Host mobility was implied (through a rejoining operation) but not explicitly described.

Future Work:

Given that the performance simulation results for flat routing were largely unspectacular, it would be interesting to explore a hybrid flat approach. In such a scheme a small (possible fixed) number of hierarchical naming levels would exist. The requirement that location be separate from identity would thus be relaxed a little, and so perhaps desirable properties like mobility would be possible not everywhere but within a large geographic area. This would perhaps allow better scalability and performance, making ROFL a viable candidate for deployment.

 

Trickle: A Self-Regulating Algorithm for Code Propagation and Maintenance in Wireless Sensor Networks May 2, 2009

Filed under: Extra Papers — mohit1982 @ 9:16 pm

This paper proposes an algorithm, Trickle, for propagating and maintaining codes in wireless sensor networks. It uses a “policy gossip” policy where motes  periodically broadcast a code summary to local neighbors but stay quiet if the summaries they have heard recently are identical to what they intended to broadcast.

Some challenges in context with wireless sensor networks have been discussed. The reprogramming protocol must send few packets and propagate the new code quickly to be effective. Also, sensor networks exhibit highly transient loss patterns and asymmetric links. The code propagation to all the motes in the network is big challenge since the network membership is not static and therefore, propagation has to be continuous which puts increasing demands on energy and network lifetimes.

The paper discusses three crucial properties, an algorithm for reprogramming the sensor network must possess to be effective. These are:

  • The algorithm shouldn’t have high maintenance overhead which essentially means that metadata exchanges should be infrequent after the network reaches a stable state.
  • It should allow for rapid propagation of code and code must propagate to every mote. Propagation shouldn’t take more than a minute or two.
  • The algorithm should scale with increasing network density.

Trickle satisfies these three requirements. It imposes maintenance overhead on the order of a few packets an hour. It breaks the time into intervals τ and at a random point in each interval, it considers broadcasting its code metadata in accordance with “polite gossip”. The random selection of time point in the interval evenly spreads the transmission energy load across the network. The authors mention that maintenance without synchronization of time intervals can result in short-listen problem and therefore, they modify the trickle algorithm by making the first half of the interval as “listen only” period. The authors achieve the second merit by dynamically scaling τ which allows rapidly propagating updates with a very small cost. There algorithm works in O (logn) of the number of motes and hence, scales for widely varying network densities.

The paper evaluates the implementation of the proposed algorithm, Mate using a Trickle-specific algorithmic simulator, TOSSIM and perform both bit-level and packet-level simulations. The paper ignores the actual policy to propagate code once Trickle detects the need to do so. Also, the algorithm assumes that motes are always on which is far from reality and interferes with power saving goals.

Future work can be done in analyzing the algorithm performance by choosing a model for motes where the motes are on and off with a probability to enforce a realistic setting.

 

Span: An Energy-Efficient Coordination Algorithm for Topology Maintenance in Ad Hoc Wireless Networks May 2, 2009

Filed under: Extra Papers — mohit1982 @ 9:16 pm

This paper propose an algorithm Span which allows the most of the nodes in the ad-hoc wireless network to go to power saving sleep mode without compromising the connectivity or the capacity of the network. It is a distributed randomized algorithm where nodes in the network make local decisions on whether to sleep or coordinate in forwarding the packets. The nodes which take the role of forwarding the packets are called coordinators and are rotated with time to ensure fairness. The authors claim that the proposed algorithm improves communication latency, capacity and system lifetime.

The paper mentions that good power-saving coordination techniques for wireless ad-hoc networks must possess the following characteristics:

  • It should try to maximize the number of nodes with radio receivers off since an idle receive circuit consumes almost as much as an active transmitter.
  • It should forward packets between any source and destination with minimally more delay than if all nodes were awake.
  • The awake node should provide as much capacity as the original network to counter congestion.
  • It should not make assumptions about link layer’s facilities for sleeping and should inter-operate correctly with the routing system used by the ad-hoc network.  

Span achieves the above requirements. It ensures that enough coordinators are elected so that every node is in radio range of at least one coordinator. The paper lay down rules for a node to become a coordinator. It accounts for the conflict resolution in coordinator election using a randomized “slotting-and-damping” method. The method takes into account the left energy with a node over maximum energy available for the node, the utility factor used and a random value in the interval [0, 1] to arrive at a damping value. The algorithm allows for withdrawal of coordinator role to give other nodes chance to become coordinators and also introduces a “grace period” to maintain consistency in the routing protocol till new coordinator takes over the old one.

The paper describes the implementation of the Span algorithm and uses geographic forwarding algorithm in order to advertise to other coordinators. Their algorithm also implements the MAC-layer failure feedback and interface queue traversal. The authors present a performance comparison of the algorithm when used in conjunction with 802.11 power saving mode with the performance of 802.11 power saving mode alone. They report the power consumption of the 802.11 network interface card in transmitter, receiver, idle and sleeping mode and points to a stark difference between the power consumed in idle and sleeping modes.  They ran their simulations on both static and mobile network topologies to determine its effectiveness. The reported results show how Span performs better than 802.11 alone in capacity preservation and decreasing latency.  Simulation results also show that Span elect more coordinators but it divides the responsibility of being a coordinator equally among all the nodes. The most interesting result of the simulation showed that network lifetime is more than a factor of two greater with Span than without using Span.

Future work can be investigation into a more robust and efficient power saving MAC layer, one that minimizes the amount of time each node in power saving mode must stay up.

 

The Case for Separating Routing from Routers April 27, 2009

Filed under: Extra Papers — subhramazumdar @ 5:51 pm

The paper proposes the idea of introducing a Routing Control Protocol (RCP) that is intended to separate the routing information computation overhead from the actual routers, thus leaving them with only the responsibility of forwarding packets by lookups that can be done in efficient way. The complexity of the internet is inherent due to the limitations of today’s distributed infrastructure to scalably cope with the new requirements. Thus the RCP should select the routes on behalf of the routers by gathering the route information from them. The information across various routers is infact not replicated but decomposed. Thus each node does not have a complete consistent view of the entire network. Also distributed path selection causes routing decisions at one router to depend on the configuration of another. Computing routes on a networkwide basis using a consistent view of routing state can reduce interdomain routing’s dependencies on these subtle details. Also dividing functionality into distinct modules with clear interfaces helps control complexity. For example inconsistencies between iBGP and IGP routers can cause forwarding loops and routing oscillations. Hence instead of being ignorant about the IGP forwarding paths the routing can use the available knowledge to explicitly enforce consistency in the router level forwarding paths. Finally small changes in the IGP topology can trigger unnecessary shifts in the BGP routes. By controlling the path selection for each router, RCP can force routers to continue using an egress point (border gateway of AS) even when link failure or small IGP costs change make one egress point slightly closer than another. Thus the routing decisions can be implemented by RCP as a single logically centralized entity in each domain. This centralized logic must be actually implemented in a physically distributed fashion to avoid a single point of failure.

Such separation of route states from routers can potentially introduce robustness, scalability, speed and consistency problems: the challenges that RCP must be able to handle. RCP can become a single point of failure since it serves as a single repository for route information. For these RCP should be distributes across several RCP servers (RCS) which must maintain a consistent view of the available routes to ensure that all routers receive consistent and loop-free paths. Hence the types of inconsistencies that can result from various combinations of partitioning of the ASs has to be studied and taken care of. Also RCP must be able to handle thousands of BGB sessions each with thousands of routes. Such huge memory and computation intensive jobs must be distributed across many physical machines which is difficult and complex. RCPs must also compute the route information and propagate the results in a timely fashion as BGP and IGP topologies change. Hence the convergence speed of such routes should not be worse than in today’s routing architecture. Finally transient inconsistencies may occur if the routers donot receive updates from RCP in a certain order. Such inconsistencies can cause loop formation which are as worse as those occurring in current BGP convergence and hence needs serious attention.

Future directions of the work may include simplification of the underlying routing mechanisms which can in turn simplify configuration languages. For example configuring routing policy using RCP will obviate the need for implementing high level tasks with communities and complex import and export policies on individual routers. Locating configuration state at RCP should make it easier for the operators to specify high level tasks leaving the implementation details to the RCP. The RCP should also impose various invariant conditions on the network configuration to guarantee correctness. Defining those invariants should be an area for future work. Also being a repository of routing states it can help the operators debug routing and performance problems. Finally RCP can improve routing efficiency by aggregating prefixes for a particular routers forwarding table if it can determine that the router will make the same forwarding decision for all of the specific routes. Efficient aggregation of such contiguous prefixes is also left as an open question.

 

Towards an Evolvable Internet Architecture April 27, 2009

Filed under: Extra Papers — krishnanadh @ 5:50 pm

The paper describes the framework of changes that need to be done to contribute towards the evolution of the Internet. The most basic change as described by the authors is to make suitable changes in the IP layer. One naïve solution could be to augment the existing Internet with multiple overlay networks implementing newer architectures. But this as the authors caution would lead to revolution as against evolution since every individual ISP would come up its own augmented IP architecture. The idea behind any evolving change in the Internet should still be to provide Universal access and the authors provide the mechanisms for the universal deployment of an IPvN network given an existing starting point IPv(N-1) network.

Universal access could be implemented by IP multicast wherein each content provider would interface with an IPvN ISP and enable users to access IPvN networks through multicast. But this proposal has a problem of partial deployment of services by the content providers as it requires migration of their applications to support IP multicast. Another strategy for deploying IPvN could be redirection either at the application-level or at the network-level. Application level redirection would involve looking up a service to identify the ISP domains/routers supporting IPvN and the clients could then tunnel their packets through. Lookup service implementation could either by done by the ISPs or third party brokers, but the authors suggest that either form of lookup is infeasible. The first form would require manual lookup/hash-table configuration at the ISP and the second the form would require change in the market structure (i.e. introducing thridy party brokers). Given these limitations of lookup/application-level redirection based IPvN deployment, the authors then put forth the idea of network-level redirection and build a framework around it. They give three mechanisms, namely, anycast, vN-bone construction and packet forwarding in the IP layer to support IPvN.

Network-level redirection means a mechanism in which a network router automatically redirects packets to an IPvN router, but clients should be able to initially connect to non-IPvN offering ISPs and still access IPvN capabilities. This enablement is achieved by the three mechanisms described above. In anycast, a packet is routed to nearest best possible server providing IPvN and thence the IPvN packet is encapsulated in IPv(N-1). This simple anycast strategy allows seamless spread of deployment and reuses existing unicast IP routing infrastructure. But a potential problem of IP anycast is that it is limited to certain individual domains and hence requires the development of a concrete routing scheme. Anycast could be categorized as intra-domain (within the network of IPvN supports) and inter-domain (within a hybrid network of IPv(N-1) and IPvN routers). For instance, in a link-state anycast strategy, intra-domain routing can be accomplished by associating a high-cost ‘advertisement’ with the anycast router. While intra-domain anycast is accomplished relatively simply, inter domain anycast would require global routes with non-aggregatable addresses or aggregatable addresses default routes. The first case would require the designation of a portion of unicast address to anycast and make requires policy changes (not mechanism changes) in BGP support these addresses and has a problem of all ISPs having to support anycast propagation. The second case of aggregatable addres with default routes would require anycast addresses to be allocated from the unicast address space of the first ISP to initiate deployment of IPvN and IPvN routers to be configured to advertise the anycast address in their IGP. Additional ISPs that adopt IPvN also then need to configure their IPvN routers to advertise the same anycast address internally. The paper then describes the formation of a virtual network bone or vN-bone for cooperation between the ‘existing’ IPvN routers built by various ISPs to cooperate and forward the packet. This would require vN-bone spanning multiple ISPs to be built as an overlay over IPv(N-1). Similar to anycast, routing with vn-bone is divided by the authors into intra-domain and inter-domain. Intra-domain vN-bone would simply involve each IPvN router to pick out the k closest routers running IPvN. Whereas, intern-domain vN-bone routing requires the ISPs to setup intern-domain tunnels based on peering policies with other ISPs. Once a vN-bone is successfully established, routing between endhosts then boils down to the problem of routing between ingress and egress vn-bone routers.

Lastly the papers discusses packet forwarding which can be accomplished by leveraging the anycast and vN-bone infrastructure described above. To start with the source encapsulates the IPvN packet in an IPv(N-1) header with destination A(n-1). Then using anycast, the packet is forwarded over legacy IPv(N-1) routers to the closest IPvN router. The IPvN router then strips off the IPv(N-1) header, processes the packet as needed, looks up the next hop to the destination using the vN-Bone forwarding tables, and forwards the packet to the next IPvN router and so on. This is repeated until the packet reaches the egress IPvN router which tunnels the packet through to the destination.

The paper concludes by providing another broad alternative to the above strategies in the form of SSM or source specific multicast which again can function even with partial cooperation among ISPs. The authors reiterate the need for the evolution of internet to support exponentially multiplying endhosts by the day. They propose the strategies above to which can they claim can be seamlessly integrated in the IP framework of today and claim that these changes will encourage the ISPs for immediate adoption and benefit end-users. Although the claims of the authors are relevant and thought provoking, a possible problem is the non-explanation of economics that would encourage the ISPs of today to adopt these IP augmentation policies. The commercial viability of creating anycast and vN-bones need to be practically explored, probably by experimentation, to attract the ISPs to supporting such IP architectures. Finally security concerns and attack vulnerability of such hybrid IPvN/IPv(N-1) architectures need to be thoroughly considered and dealt with and they are made no stab at in the paper.

 

Multicast Routing in Datagram Internetworks and Extended LANs April 27, 2009

Filed under: Extra Papers — koderaks @ 5:50 pm
Tags: ,

This paper talks about extending multicast beyond a single LAN. At the time of writing (1990) multicast was feasible in LANs, but not across LANs. In order to support multicast across multiple LANs, the paper offers extensions to three common routing algorithms for unicast:

  1. Single-Spanning Tree Multicast Routing: Existing bridges of the day would forward broadcast and multicast packets to every segment of the extended LAN. To save bandwidth and processing time, the authors suggest one multicast member from each LAN to periodically send a membership-report to all bridges. When a bridge receives a membership-report from a multicast group G, it remembers which direction the packet arrived from. Any subsequent packets destined to group G would then be only forwarded along the direction of that group. For example if bridge B receives a membership packet from LAN C destined to LAN D, with multicast group G, then the bridge learns to forward packets of G from LAN C to LAN D, but not vice versa nor to any other LAN. This saves bandwidth that would otherwise be wasted by forwarding multicast packets along all segments connected to the bridge
  2. Distance Vector Multicast Routing: by sending routing packets, routers maintain distance vectors to all other routers, thus enabling them to compute shortest-path routes to all other nodes in the network. The authors build on top of several existing unicast algorithms. Using Reverse Path Flooding (RPF), a router forwards a broadcast message from source S to all other links, iff the packet arrived from the shortest path from the router to S. Reverse Path Broadcasting (RPB) modified RPF so that packets are not forwarded to the parent router — router with minimum distance to S. This provides shortest-path broadcasting while avoiding duplicate messages. Truncated RPB (TRPB) builds on top of this but deletes child links that no router uses to reach the multicast sender. Finally, Reverse Path Multicasting (RPM) narrows the broadcast tree further by receiving non-membership messages from routers that are not connected to the multicast-group, and by removing these non-members from the multicast spanning tree.
  3. Link-state Multicast Routing: Using this approach, each router monitors its attached links. Upon detecting a change in a link state, all routers attached to that link broadcast the change to every other router. By doing so, each router learns the complete topology of the internetwork, and can find shortest paths to any node in the network using Dijkstra’s algorithm. To extend this algorithm to support shortest-path multicast, the routers would consider creation of new groups or deletion of old groups as a state change, and broadcast this multicast-group-change to all other routers. Like the algorithms above, the hosts periodically send membership-reports to the routers in order to maintain multicast membership information.

Most glaring problem: The paper proposes many algorithms and ideas, but there is no implementation. The authors speculate the efficiency of their algorithm without providing any experimental results or simulations. Also the algorithms presented are non-trivial, and would require extra router memory, as well as increased processing. It is my impression that there might be too much overhead using the distance-vector algorithms. The link-state algorithm seems very simple and efficient but might not be practical across LANs because the state change is broadcast to the entire network.

Future research: it would be interesting to extend the distance-vector algorithms to use source-routing (I am not aware whether they do this or not). By doing so, a router who finds a shortest path for the multicast packet could specify this path and relieve the other routers from the burden of calculating shortest paths again.

 

Realistic and Responsive Network Traffic Generation April 25, 2009

Filed under: Extra Papers — Mike @ 3:50 pm

(i) The three most important things the paper says:

1. The authors present a new traffic generator named Swing. The generator uses a trace measured at a single point in the network and automatically finds distributions for user, application, and network behavior. It also captures burstiness in ackets and individual bytes as they are flung across it.

2. Swing then has the capabilities to randomly recreate these conditions. This means you can take a sample of the network you will be deploying an application on, then you simulate network performance of your application or network hardware. It is important for the generated trace to be random and not just recreate the trace exactly so unexpected behavior is encountered and you can gain confidence in you system under various conditions.

3. Swing then goes one step further to let you change parameters to extrapolate network conditions. You are able to increase traffic, the prevalence of a certain protocol, or burstiness of your desired conditions.  This allows the user to test his machine in conditions that aren’t readily available even if he were to deploy his system.

(ii) The most glaring problem with the paper:

This paper did not have as glaring problems as the others I have read this far. That being said, something that bothered me is the authors mentions that having a session generator offered better response towards matching network conditions but a lot of randomness was lost. I would think they would have said that this could be something worth pursuing or looking for ways to get around this problem in the future.

(iii) The future research directions of the work:

A feature desired by the authors but not yet deployed is the ability to instead of taking averages and deviation of network data, it would be nice to be able to capture that something, say bandwidth is steadily increasing throughout the trace. A possibility that I find intriguing, not mentioned by the author, would be using multiple traces taken at the same spot that represent different conditions that occur at the location as the seed to the generator. The author lists many other of his own future directions in the future work section.

 

TCP Congestion Control with a Misbehaving Receiver April 25, 2009

(i) Three most important things

1. TCP was designed for a cooperative environment which as a result contains several vulnerabilities that the receiver can exploit to obtain improved service at the expense of others on the network or to implement a denial-of-service attack. End-to-end congestion control mechanisms were implemented for sharing scarce bandwidth between users on the Internet but require both endpoints to cooperate in determining a proper rate at which to send data. However, a misbehaving receiver may send more data quickly than the cooperating host, possibly forcing competing traffic to be delayed or discarded.

2. There are three different approaches the receiver can take to exploit the vulnerabilities of TCP that allow the receiver to enhance performance. Because of the incongruence between byte granularity error control and segment granularity congestion control, the receiver can divide the ACK for a segment received into multiple ACKs so that the TCP sender will grow the congestion window at a faster rate than usual. The receiver can also send duplicate ACKs which tells the sender that segments are leaving the network so the congestion window inflates in order to reflect the additional segments that have left the network. Lastly, the receiver can send ACKs before the data has actually reached the receiver so that the sender will think the round trip time was short and send segments faster.

3. The design of TCP can be modified so that the receiver cannot exploit the vulnerabilities of TCP that allow the receiver to improve service at the expense of other clients on the network, without requiring that the receiver be trusted in any manner. One of the solutions described was the Cumulative Nonce, which requires that the sender sends a unique random number with each segment and the receiver either echoes the current nonce sum (the cumulative sum of all in-sequence acknowledged nonces) or the nonce value sent by the sender, for each ACK in response to a data segment.

(ii) Most glaring problem

The most glaring problem would be that the nonce solution to the majority of the possible misbehaving receiver attacks requires the modification of clients and servers and an addition of a TCP field. This would not be solution that would likely get implemented because the solution requires a whole makeover to TCP that would take a lot of time and effort.

(iii) Future Research Directions

Future research directions for this work would be to see how this research could be extended to other protocols. The Cumulative Nonce could be adapted to any sender-based congestion control scheme which could prove to be more conducive for unreliable transports because it effectively defines a sequencing mechanism between untrusted parties.

 

Timing Analysis of Keystrokes and Timing Attacks on SSH April 25, 2009

Filed under: Extra Papers — jwegan @ 1:04 pm

This paper investigates using timing analysis of packets sent during an SSH session to try and recover users passwords. The paper notes there are two main weaknesses in SSH that can be used as a vector for an attack

1) Packets are padded only to an eight-byte boundary if a block cipher is used

2) In interactive mode, every individual keystroke a user types is sent in a packet right when it is pressed.

The authors proposed using the time between keystrokes to help determine what the password might be. They note that users tend to type their password very quickly without much thought since they type them so often and also that users tend to type out their passwords in groups of 3-4 characters with a small pause before typing the next group of characters.

The approach they used is break down a password into a series of keystroke pairs, so for instance a password of length n would have n-1 keystroke pairs. Next, they conducted studies where they brought in people and had them practice typing the same randomly selected keystroke pair like (e,v) 40 times and then recorded the time between keystrokes. They did this for 142 keystroke pairs and then identified 5 major classes of keystroke combination that had similar time spacings for pairs that fell within that class.

1) Two letters typed with different hands

2) A number and a letter typed with different hands

3) Two letters typed with the same hand, but different fingers

4) Two letters typed with the same finger

5) A letter and a number typed with the same hand

They then used this information to develop hidden Markov model. When given a sequence of timings between keystrokes based on observing the spacing between packets sent from one host to another they then used the n-Viterbi algorithm to select the top n most likely sequence of keystrokes for the password based on the hidden Markov model and the observed timings between keystrokes. They could then attempt to break the password by trying each password that the model indicated was a likely candidate for the password. Overall the results indicated that using this method, only 1/50th login attempts had to be made in comparison with a brute force approach.

The most glaring problem is the authors only conducted the study for users using randomly generated passwords instead of passwords they people might actually use. Some arguments were made about why the approach would likely work for normal passwords, but it wouldn’t have been much trouble for them to actually try it out on a couple human selected passwords. The future research direction would be to see how this attack translates to normal user selected passwords.

 

Untraceable Electronic Mail, Return Addresses and Digital Pseudonyms April 21, 2009

Filed under: Extra Papers — gracewangcse222 @ 11:30 am

(i) The three most important things the paper says:

  1. A problem with using traditional public key cryptography in electronic mail systems (that is, using a cryptographic system such as RSA to encrypt/decrypt messages) is that although the contents of the sealed messages are safe from prying eyes, other information such as the origin and destination of the messages may be easily determined, and messages may be tampered with (added, deleted or changed). In other words, cryptography is employed to guarantee message secrecy, but authentication is not ensured. The paper attempts to obtain some additional security for messages in a mail system even when one or more authorities may be malicious. The idea is to send messages through a number of “mix” computers and to obfuscate the sender’s identity through a layered series of encryptions, so that the sender’s identity is hidden even if all but one of the authorities is malicious.
  2. To send a message, the participant prepares a message Kn(Rn, Kn-1(Rn-1, …, K1(R1, Ka(R0, M), A) …)) (where Ki is the public key of mix i, Ri is a random string, Ka is the addressee’s public key, M is the original message and A is the destination address) which passes through a cascade of n mixes. Each mix peels away a layer of the encryption to determine the “next hop” until the message is delivered to the recipient. A similar idea is used to allow the receiver to respond to the sender while unable to detect the sender’s identity by using an untraceable return address embedded in the prepared response.
  3. The idea of being able to send anonymous messages is also useful in the context of a general election. Applicants send an anonymous letter to the voting authority with their digital pseudonym, a public key that may be used to verify their future actions, as well as other information that would allow the authority to decide whether to accept the application or not. The authority can then prepare a roster from the accepted pseudonyms. Ballots received from applicants are similar to the prepared inputs for n mixes described above. Once the input reaches the authority (the recipient), it is of the form K, K-1(C, V), where K is the pseudonym, K-1 is the corresponding private key of the applicant, C is some constant and V is the vote, the authority verifies that the pseudonym matches an entry in the roster and then decrypts and counts the vote.

(ii) The most glaring problem with the paper:

At the time this paper was written, it may have seemed like a good idea to have a technique for allowing anonymous mail to be send through an e-mail system. However, with the advent of spam and phishing via e-mail, allowing anonymous e-mail on the Internet may no longer be such a good idea.

Furthermore, the government would likely want to be able to obtain certain information pertaining to mail messages without having to subpoena every single mail provider.

(iii) The future research directions of the work:

Although I don’t believe that mix nets are feasible in e-mail systems today given the current climate, there are certainly other applications for a method which allows for anonymous feedback. Perhaps a mix network can be used on a more limited scope to provide anonymous feedback for a variety of situations. Governments may also want to be able to have this capability for sending out anonymous e-mails that cannot be traced back to them (though perhaps this can be done with the use of proxies?). One of the most prominent uses for mix nets in current research is in voting (and many proposals for online voting in particular) to provide ballot secrecy. The idea of leveraging untraceable return addresses may also be used to permit voters to audit an election by checking their own votes.

 

Inferring Internet Denial-of-Service Activity April 21, 2009

Filed under: Extra Papers — jwegan @ 11:29 am

In this paper the authors attempt to gauge the amount of denial-of-service attacks are happening on the internet. Most denial of service attacks attempt to either flood the victim with a large number of small packets (since routers and NICs are limited by packets/second no bytes/second) and/or send a large number of requests that need to be processed by the operating system or an application by sending a large number of TCP SYN packets. The attackers also lie about their ip on the packets they send to the victim so the victim cannot identify attackers or easily filter to the malicious packets (this is known as ip spoofing).

The authors use a technique known as backscatter analysis to try and detect denial-of-service attacks. Backscatter analysis involves listening to a chunk of the ip address space that is not actively in use and logging any unsolicited packets that are recieved. The idea is that attackers spoof their ips in a uniformly random distribution so when the victim sends out a response to the malicious packet the responses are uniformly distributed across the entire ip address space. It should be noted that this paper’s analysis relies on the assumptions that addresses are spoofed in a uniformly random manner, there is reliable delivery of response packets, and unsolicited packets are backscatter from a denial-of-service attack.  The authors verify for attacks that they detect, the responses for that attack are uniformly distribute throughout the chunk of addresses they are listening on. As for reliable delivery or responses,  response packets that get dropped will only cause the intensity of the attack to be underestimated.

The authors logged all packets recieved in a chunk of ip addresses that made up 1/256th of the ip address space over the span of 3 seperate 1 week periods. In order to estimate the intensity of the attacks they simply multiply the number of packets they recieved by 256 since they are theoretically recieving 1/256th of the responses. During the sampling time they detected 12,805 attacks directed at over 5,000 unique ips. TCP SYN floods made up over 50% of the attacks they observed. Of these SYN flood attacks about 46% of them could generate packet rates high enough to overwhelm a typical server and 2.4% of them could overwhelm servers with specialized firewalls to protect against SYN floods.

Other Interesting Observations:

  • 50% of attacks last less than 10 minutes and 90% of attacks lasted less than an hour
  • A significant portion of attacks are directed against home computers indicating they are being used to settle grudges.
  • 2-3% of attacks are directed against network infrastructure such as name servers.
  • Attacks are directed at a wide range of comercial targets ranging from Amazon or AOL to small and medium size businesses.
  • After attacks against URLs in the .com and .net domains, .ro (Romania) and .br (Brazil) were the next most frequently attacked.
  • 65% of victims were only attacked once.
 

How to 0wn the Internet in your spare time April 18, 2009

Filed under: Extra Papers — damedeiros @ 4:34 pm

This paper is primarily a discussion of several internet “worms” that had a major impact on the way people viewed Internet security, the different methods by which they might spread and several proposals that could greatly help with such threats in the future. The three key ideas that I got from this paper were:

1. Worms such as Code Red and Nimbda show the plausibility of gaining control of thousands or more computers in a completely automated fashion in a relatively short period of time. The power that could be wielded by such a botnet is startling and could even approach the sort of power only entrusted to governments. This is a serious threat that must be taken into consideration when designing any kind of software or hardware system.

2. Worms use a variety of methods to spread, some discrete and some not, and Nimbda showed that it does not even have to spread to every computer by the same means. A multi-vector attack can be extremely successful if the creator is talented enough. When designing system security, it is very important to look at all possible angles and implement defense in depth in order to guard against those possibilities that may not have been thought of.

3. A centralized agency with appropriate resources could be of great use in identifying outbreaks, analyzing the virus’s, developing protection mechanisms, and helping to slow or even stop the spread of a large scale worm. The example used in the paper was the CDC and I do believe that this metaphor is fairly appropriate. This problem could be big enough for the government to devote significant funding as it is hard to believe that there are not outside agencies that would not hesitate to release a new large-scale virus attack if it was available to them.

The most glaring weakness of this paper that I saw was that while it did a good job of analyzing the (at the time) recent attacks and discussing large scale controls that could be implemented. I think that it could have focused more attention on the user habits and security lapses that were most common to the spread of these virus’s and the standard controls that could have prevented them. While a large agency would be instrumental after the attack has happened, it is really at a lower level that security controls must be implemented in order to truly prevent a worm from spreading.

Future research in this area has been ongoing, both in their suggestion of a centralized agency and with lower level security. People are much more security conscious now than at the time that Code Red and Nimbda were deployed and attackers have a much more difficult job on their hands. There have also been efforts to centralize computer security analysis and tracking through organizations such as NIST, CERT, and private parties like E-Security, among others. This has let to a much greater understanding of such attacks and a large increase in communication between all of the white hats that are working for the good guys. While we have made improvements, I feel that there is a long way to go and in all likelihood, security research will remain an ongoing project as long as the internet is in existence.

 

On Packet Switches with Infinite Storage April 18, 2009

Filed under: Extra Papers — Mike @ 3:31 pm

(i) The three most important things the paper says:

1. By considering a packet switch network with infinite storage we can ignore the problem that is often focused on, buffer management, and instead concentrate on solving congestion. It is found that with congestion a datagram network may drop all packets.

2. It is possible and not even difficult with today’s technology to make a network switch with “infinite buffer space”. The author states that for an ARPANET-like network with 15s time to live only 420K bytes are needed to guarantee that no packets will be lost due to buffer exhaustion.

3. Not all protocol implementation are done in well behaved manor. That is they do not adhere to the protocol in a way such that maximum throughput can be seen for all users. The author says the best possible solution is one where you make the network perform best when the user is well behaved. This encouragement makes no room for temptation that you can offer better performance for your clients by making adjustments that hurt the network as a whole. To do this the author suggests that each first order network switch (the on closest to the user) the switch should have multiple queues that it services in a round robin fashion rather than one queue which is first come first serve. Since this guarantees every user a chance to go high congestion will not make all packets lost. And if the big congestion is created by just one user only his packets will be dropped and others will not be affected. This means that optimal throughput will be to only send packets at the rate the network can handle them.

(ii) The most glaring problem with the paper:

The author of this paper differentiates between badly behaved and malicious protocol implementations in the following way. He states that badly behaved implementations are those that do not follow the exact specs of the protocol usually for some gain in performance for themselves.  Malicious programs however are just trying to damage the network by creating unneeded congestion with no gain for itself.  The author gives the example of giving packets with many different host fields to clutter the network and slow it down.  The author says that this is a discussion that shall be left to network security and not addressed in this paper. The problem is that a badly behaved implementation could use this as well, since by acting as two hosts he would be able to get two turns for each round and be able to send packets almost twice as fast as other users making the network no longer fair.

(iii) The future research directions of the work:

The author leaves it wide open on what should be done about malicious implementations of the protocol and does not address the case of a badly behaved protocol pretending to be more than one user. The author has not began testing his theory of having a fair network delivery service of round robin, rather than first come first serve.  As such, someone needs to implement such a network and begin getting performance results for such a system to back the authors claim that this is an effective way to only reward well behaved protocol implementations.

 

Jigsaw: Solving the Puzzle of Enterprise 802.11 Analysis April 18, 2009

Filed under: Extra Papers — brokerer @ 3:22 pm

Jigsaw is a paper that tries to answer the question of “What is going on in my wireless network?” It seems now that more and more enterprises are deploying access points(AP) in their sites for wireless internet. The problem with wireless versus wired networks is that wireless networks are not well described as either a single broadcast channel nor as a graph of links. There can be many reasons why a receiver in a wireless network did not receive a frame, including the distance between the receiver and AP, sender’s transmit power, microwaves, etc… With the problems set, Jigsaw differs from other research because it is more broad. Other research has measured wireless networks problems individually while Jigsaw measures a wireless network(aiming at enterprises) as a whole using 150 monitors at the UCSD CSE building to capture all the data and send it to Jigsaw, their centralized measurement application.
The three most important features of this approach are:
1.) Trace Merging
Trace merging allows Jigsaw to make sense of all the traces/data from the monitors. In order for trace merging to be successful, Jigsaw needs three things: unification, synchronization and efficiency. Jigsaw needs unification because multiple monitors might receive a particular frame that so happens to be a duplicate frame. Since the hardware might have different clocks it is important that the timestamps used from the monitors be the same so Jigsaw estimates a global time for each monitor. Lastly trace merging must execute efficiently because to allow online applications. Trace merging is good approach to tackling such a large scale problem.
2.) Reconstruction of Link and Transport Layer
After Jigsaw receives all the frames from the monitors it must reconstruct the link and transport layer. For the link layer, Jigsaw groups frames by MAC address to make sure that transmission request are delivered properly. Sequence numbers are also used on the group of frames. For the transport layer, Jigsaw takes frame exchanges and reconstructs TCP flows using the packet headers. Jigsaw also captures TCP acks to find omitted frames from the packet.
3.) Results
The results that Jigsaw found showed that Jigsaw works. They found out that the 802.11g protection mode for networks that have both b and g were creating a lot of traffic because of the conservative policy timeout. They also found out that the 802.11 management software were creating a lot of ARP request. Multiple copies of the same ARP request will also interfere with each other. These problems are hard to find without Jigsaw.

Potential problems:
The potential problems with Jigsaw is does it really scale? Since monitors are capturing all packets in the wireless network and sending them through the wire network, this can become a problem. Although in this case UCSD’s Gigabyte ethernet could easily handle it, maybe the monitors could be more complex in shorting their frames. An example of this is say a monitor hears a frame and the corresponding ack. Instead of sending both the frame and ack, the monitor can send some message that just states that a frame and corresponding ack came.

Possible future research:
From the results it is clear that Jigsaw is going towards the right direction. It was capable of finding some reasons why wireless networks are slow, via protection mode of 802.11g, ARP, etc… The authors plan to build on top of Jigsaw to further answer why networks are slow and how they can be fixed. It would be interesting if the authors can find a natural flaw in wireless technologies in large enterprises. The gap between measuring wireless and wired networks is shrinking.

 

A Layered Naming Architecture for the Internet April 13, 2009

Filed under: Extra Papers — siva @ 12:44 pm
Tags:

The paper proposes a new naming scheme for the Internet that attempts to fix several architectural problems with today’s Internet. The paper borrows heavily from several earlier attempts at solving the problems and is a great ensemble of these earlier ideas. Today’s Internet has just two global name spaces – DNS names and IP addresses based on administrative domains and network topology respectively. Some of the challenges with today’s Internet include:

  • Hosts are closed interlinked with IP addresses which are location identifiers as well. So this affects mobility and multihoming.
  • Services and data are closely tied to hosts through the DNS resolution to IP addresses. However services might be replicated, or migrated to different hosts etc. Moreover the application is only interested in receiving the service, not on the actual end host servicing the request or the location of the end host.
  • Packets might have to be sent to intermediate nodes like NATs on their way to the actual destination. But with the current Internet, the source or the destination does not actually select these intermediate nodes for forwarding dynamically based on requirements.

The authors propose 4 design principles for naming which attempt to fix the above limitations of the Internet:

Principle 1: What to name:

Names should bind protocols only to the relevant aspects of the underlying structure; binding protocols to irrelevant details unnecessarily limits flexibility and functionality.

The authors propose a layered naming scheme for different ‘objects’ that can provide additional levels of indirection and name each ‘object’ uniquely based on just the requirements of the particular layer. They propose naming services and data directly so that they can be considered as first class Internet objects with unique names. They introduce Service identifiers (SIDs) which are host independent data names, End-point identifiers (EIDs) which are location independent host names. These are in addition to User level descriptors (which are converted into SIDs through search engines etc.) and IP addresses.

Principle 2: What should the names be:

Names, if they are to be persistent, should not impose arbitrary restrictions on the elements to which they refer.

The authors propose using flat name spaces for SIDs and EIDs since they do no impose any restrictions. They describe the use of Distributed Hash Tables (DHTs) to allow scalable resolution while using the flat name  space.

Principle 3: What should the names resolve to (Giving control to recipients):

A network entity should be able to direct resolutions of its name not only to its own location, but also to the locations or names of chosen delegates.

The authors propose that recipients or the destination should be able to direct the connection to a chosen delegate. This allows incorporation of intermediaries into the architecture.

Principle 4: What should the names resolve to (Giving control to sources):

Destinations, as specified by sources and also by the resolution of SIDs and EIDs, should be generalizable to sequences of destinations.

The authors propose that the source application can specify a series of intermediate points (like source routing). Through principles 3 and 4, both senders and receivers can indicate the path of packets through the network.

Their primary contribution include identifying several features/design principles:

  • DHTs make the use of flat name spaces plausible.
  • 4 layered naming scheme allows separation of concerns between the layers.
  • Delegation is a powerful and useful feature which should be integrated into the architecture of the Internet.

The main challenge to their proposals (indicated repeatedly in the paper as well), is that applications and host protocols would have to be changed to use the framework. The resolution of these flat name spaces requires the setup of a new trusted infrastructure of DHTs. Also, the paper does not describe any performance degradation due to introduction of additional layers or the resolution of names through the DHTs.

Future directions for research include studying the impact of using flat name spaces, and how services and data can be used as first class Internet objects. IP will likely remain as the bottom layer since it requires a lot of router infrastructure changes to replace it.

 

Scalability and Accuracy in a Large-Scale Network Emulator April 13, 2009

Filed under: Extra Papers — koderaks @ 12:40 pm
Tags: ,

The paper presents ModelNet, a network simulation environment to test large scale distributed applications. Researchers need to test their distributed-system projects on actual networks in order to evaluate them with realistic network conditions. However, configuring existing distributed networks is non-trivial, and rather difficult and costly. ModelNet offers a scalable environment which closely mimics a large scale network. It allows researchers to evaluate their projects by dynamically varying network conditions (bandwidth constraints, congestion, latency) and observing the responses. ModelNet also drastically simplifies evaluation, since running ModelNet to evaluate a distributed application is a simple task compared to building a large scale system to run the application.

In ModelNet edge nodes are configured to run a user-specified application. The edge nodes are physically connected to a network of server cores which emulate the varying network conditions. Each instance of the user-specified application represents a virtual edge node (VN). The VNs are each configured with a unique IP address. Multiple VNs run on a single edge node, thus multiplexing the resources of the node. However, by doing so, ModelNet trades reduced emulation costs for accuracy (costs in terms of using fewer nodes by running multiple VNs on each node).

VNs transmit packets to the core servers using the physical link between the VN’s edge node and the ModelNet core. The core routes network traffic through a set of emulated routes called pipes. Each pipe has its own packet queue, bandwidth and latency requirements, and loss rates; thus by using pipes ModelNet can emulate the effects of congestion, latency, etc easily. The packet travels several hops between emulated pipes before being delivered to its destination VN by the core server. ModelNet runs a scheduler to periodically check for delivery deadlines for packets waiting in the pipe packet queues. The scheduler either moves the packet from one pipe’s queue-head to the next pipe’s queue on its path, or hands it to the core server’s kernel to deliver it to the destination VN. To achieve high accuracy, ModelNet’s scheduler runs at higher priority than the network interface hardware interrupt handler. This means that with high server CPU utilization, the physical network will drop packets and the core scheduler continues to deliver packets through pipes. Thus ModelNet’s accuracy is inversely proportional to the number of packets dropped by the core servers’ physical layer.

The authors further show the generality of ModelNet by providing several case studies in which they evaluate distributed systems. In one case for example, they evaluate CFS using ModelNet, and show that their results match closely with prior experiments on CFS. Furthermore, ModelNet is able to achieve a worst-case error that is equivalent to one tick of the hardware timer per network hop. For example if a system tick takes 100 micro seconds, a path with 10 hops would have a worst-case error of 1 milli-seconds (compared to packet delivery on an actual network).

Some Major contributions of this paper – as outlined in the paper- are:

  1. The authors show that the characteristics of a large-scale network can be accurately emulated on a single server PC. They provide a solution which can dynamically adjust network conditions such as bandwidth, latency, and data loss.
  2. Unlike previous emulation environments which were specific and static, ModelNet is a very generic solution that can be extended to many applications without the need to modify the underlying application.
  3. Emulation of a network has penalties, but the authors show that ModelNet does not sacrifice a noticeable amount of accuracy. In fact their case studies closely match results published by others without using an emulation environment.

The paper considers clusters of servers running ModelNet software to do the emulation. For future research it would be interesting to consider two other possibilities: (1) implement ModelNet at a lower layer: build a router that specifically emulates network conditions like ModelNet does. (2) implement ModelNet at a higher layer: a lighter version of the emulation system that can run without the need of additional servers would be helpful to do small scale emulation.

 

Paris Metro Pricing for the Internet April 13, 2009

Filed under: Extra Papers — giledelman @ 12:39 pm
Tags:

Paris Metro Pricing (PMP) refers to the system used by the Paris metro system to implement first and second class. Cars in both classes were identical in every way, except that the tickets to first class cost more. The authors present this approach to deal with internet traffic congestion and enable more demanding real-time internet applications like voice and video conferencing to run. The premise here is that the different price points will affect user demand and thus have an effect on network load. Note that this approach relies on a rather crude mechanism (price points), and does not provide any quality of service guarantees. They present this solution as an alternative to the “fat dumb pipe model,” which they claim is not a feasible way to implement the internet because some packets need to be given higher priority than others.
The crucial point of the PMP scheme is whether users are likely to accept pay-as-you go premium price plans with no quality of service guarantees. This after being accustomed to the current flat rate model for internet access. The authors point out that many purchases of goods and services today are made based on expected quality with no guarantees. This includes books, movies, restaurant food, etc. They also argue that a mindset precedent for pay-as-you go charging exists in the form of long distance telephone services.
Another important question concerns the implementation of PMP; How to determine the appropriate number of channels and relative price points? The authors cite the fact that only a few channels are necessary to unlock the potential benefits of staggered pricing. Such simple channel designation can be incorporated into the network using the currently empty IPv4 priority field. The low priority channel could initially have a cost of zero, effectively representing the current internet, and the price for sending packets over the premium channels should be set according to empirical observations of load and could be made to vary based on time of day (off peak rates would be reduced).
The paper focuses on introducing the PMP concept and attempts to address the main questions concerning it, however it tends to gloss over the difficult aspects of actually implementing PMP. There is a convincing case that PMP will do no worse than the current internet in terms of congestion control, however there is no hard evidence as to the extent of the possible improvement PMP can bring.
For future research I would like to see PMP deployed on a relatively large sub-network like UCSD and see how users respond to the pricing scheme as well as how the model holds up under real-time application constraints and heavy loads on the premium channels.

 

An algorithm for distributed computation of a spanning tree in an extended LAN April 13, 2009

Filed under: Extra Papers — mdjacobsen @ 11:36 am
Tags: ,

This paper describes the classic spanning tree algorithm used to route packets between interconnected LANs. The algorithm assumes LANs are connected via bridges (where each bridge has a link into at least 2 different LANs). The links between bridges represent edges in a general mesh graph. By running the spanning tree algorithm, bridges elect a root and identify non-loop paths from every LAN to the root. Thus a spanning tree across the entire network is formed.

The spanning tree protocol is based on sending HELLO messages. Each bridge sends a HELLO message with its unique id out on all its links. The bridge in the interconnected network with the lowest id is eventually elected the root. Until a lower id HELLO message is received, each bridge believes it is the root. The HELLO messages also are used to identify paths from a bridge to the root. The bridge with a link that is closer to the root is identified as the designated bridge for all the LANs to which it is linked (with the exception of the link that provides the path to the root). This way, multiple bridges serving the same LAN will not create a loop in the spanning tree. Only the root and designated bridges send out HELLO messages.

After the spanning tree is created, data traffic helps to build/maintain the routing information kept in each designated bridge’s database. Similarly, periodic HELLO messages sent by the root provide a means to keep the spanning tree up to date. Loss of the root and/or designated bridges will result in missing HELLO messages. After a time period of inactivity, this will cause downstream designated bridges to send out HELLO messages to rebuild the spanning tree.

The major contributions of this paper are the deterministic, fast, and low overhead characteristics of this algorithm. Because any ties are broken by lowest bridge id, the same topology will always result in the same spanning tree. Additionally, the tree is formed after a maximum time of two network diameter round trips. Lastly, because the HELLO messages contain only bridge id and age information, their packet size is small and don’t represent a abundant amount of overhead.

The problem with this algorithm is that it requires some notion of how large the entire network will be. This information is used to set reasonable timeout values for determining if a HELLO message should have been received. Setting this value too small will result in instability as bridges will assume a root (or other designated bridge) has failed when it simply hasn’t received its message. This also implies that this algorithm will not scale to large networks of interconnected LANs.

The authors briefly touch on the idea of using weighted paths to force a spanning tree through certain LANs and not through others. I expect further research along these lines might prove useful for identifying optimal bandwidth paths, or paths that are of lower costs (however “cost” is defined).

 

A Policy-aware Switching Layer for Data Centers April 11, 2009

This paper presents a new approach for enforcing network management policies in a data center setting through the use of pswitches. Currently, data center network managers are faced with difficult problems. Data center networks are very complex, containing thousands of nodes running a large number of different applications. Each of these applications has different policies that need to be enforced (i.e. QoS, or Security).

Administrators deploy a large number of middleboxes throughout the network that enforce these policies. These middleboxes include things like firewalls, load balancers, SSL offloaders, web caches and intrusion prevention boxes. There are many problems with this approach, mostly stemming from the fact that a middlebox is only effective if it is place on the network path between the gateway into the data center and the destination within the data center. If it is not placed on this path, packets will not pass through the middlebox and application policies will be violated. As a result, network administrators are forced to either modify the physical network topology or massage layer-2 spanning tree construction to ensure packet traversal across middleboxes. In large networks, both of these approaches become very complex and are extremely error prone.

For example, if an administrator removes physical links to force packets along a certain path, the data center network has lost some redundancy in the network. In the face of failures, this may lower the availability of a service, something that could be an even higher priority than policies the middleboxes were trying to enforce. Secondly, trying to modify link costs to massage the spanning tree construction to choose a specific path is very difficult as well.

Pswitches act as regular layer-2 switches that enforce middlebox policies as well. An administrator can configure pswitches, and implement various policies all from a central location. Middleboxes are plugged into the pswitches, and packets are forced to the correct destination through the use of indirection. This allows the separation of the physical network topology and the logical network topology.

This approach requires a minimal amount of change to existing data center networks during deployment. The only switches that need to be modified are the ones that face the outside of the data center. Middleboxes, servers, and routers do not need to be modified in any way.

Three major contributions:

  1. Separation of the physical and logical network topology to allow middleboxes to be placed off the physical network path.
  2. Created a centralized way for administrators to implement and enforce network policies.
  3. A fault tolerant method that can survive network churn, including hardware and network failures.

The most glaring problem with the paper:

This approach requires deep packet inspection, as well as application content identification. These are both non-trivial tasks. How much will this increase the cost and complexity of switches?

Future research:

Pswitches appear as layer-2 devices. They require middleboxes to be in the same layer-2 network as all of the nodes that they service. It would be nice if these devices could allow for packet direction across domains. Secondly, pswitches still depend on indirection and spanning tree construction to communicate with each other. If there is a problem with spanning tree construction, or any other part of the layer-2 network protocols then pswitches will not be able to function correctly. It would be interesting to take this idea even fruther, following the 4D architecture, and completely separate the data dissemination layer with the network management layer. As a result, network policies could still be enforced even if network components are not configured properly.

 

Sting: a TCP-based Network Measurement Tool April 11, 2009

Filed under: Extra Papers — brokerer @ 1:35 pm

The goal of Sting was to be able to measure packet loss rate on both the forward and reverse paths between a pair of host by just using the TCP protocol. Being able to differentiate between the packets lost from the forward path and reverse path can help diagnose performance problems and design for future services.

The tools used as network measurement today have two problems. The first problem is loss asymmetry. Tools like ping can measure loss rates in the network but cannot figure out whether these packets are lost on the forward path or the reverse path. Ping also has a problem if the receiving host is not playing nicely, ping cannot tell for sure that the acknowledgements it receives from the receiver were only sent once. The second problem is that some measurement tools like NIMI and Surveyor require that measurement software be installed in both the receiver and sender. ICMP Echo or ICMP Time Exceeded services are also an option but because of security problems, they are often picked off by firewalls.

Sting breaks down measuring packet losses in two ways, Data-seeding and Hole-filling. For measuring forward loss, Sting will send n packets with increasing sequence numbers in the Data-seeding phase. In the Hole-filling phase, Sting will send a packet with sequence number n+1 until this packet is acknowledge. If there are packets lost from the Data-seeding part, the Hole-filling phase will get acknowledgements for them so we can tell how many packets were lost. For example in the data-seeding phase we send packet 1 2 3 4 and two packets are lost(2,3). In the Hole-filling phase we will send packet 5 and get acknowledgement for 2. We then send 2 and get a acknowledgement for 3. After sending 3 we get acknowledgement for 5 and we know we lost 2 packets on the forward route.

For the reverse loss, things are more complicated because we need to make sure that the receiver sends just one acknowledgement for every data packet it receives(ack parity). Some servers use delay acknowledgement so to deal with this they inserted a long delays after each data packet sent. Since Sting knows how many acknowledgements it gets and that each acknowledgement is sent only once, it can tell how many packets are lost in the reverse direction.

To deal with the slow delays require for ack parity, Sting takes advantage of TCP’s fast retransmit. In the Data-seeding phase, Sting sends n-1 packets omitting the packet with sequence number 1 so that the receiver gets all packets out of order. In the hole filling phase, Sting sends a packet with sequence number 1 instead of n+1. This new rule allows Sting to eliminate the long delay after sending data packets because TCP will not use delay acknowledgement for out-of-sequence packet arrivals.

Sting also cleverly deals with TCP server prematurely closing connections. TCP servers send a RST packet to a client to prematurely close a connection which will mess up Sting’s measurements. To deal with this Sting takes advantage of flow control in TCP. By advertising a receiver window of 0 bytes, the TCP server cannot sent the RST packet to close the connection so Sting can finish its measurement.

The main point about Sting is that it seems to work. From the graphs in the paper they tested 50 websites and found that the loss rates increase during business hours and died down during off-peak hours. Sting also showed that reverse loss rates are on average higher than forward loss rates which makes sense because TCP servers are usually sending a lot of the requested data back. The ability of the author to use the TCP protocol in a non-traditional way to measure packet losses will impact the future work of networks. There is work on measuring bottleneck bandwidth using similar tricks on TCP.

 

The Tiny Tera: A Packet Switch Core April 11, 2009

Filed under: Extra Papers — krishnanadh @ 1:35 pm

The Tiny Tera paper proposes a novel architecture for a packet switch to be used in high-speed networks. With a data bandwidth of 10 GB/s across the 32 ports of the switch, Tiny Tera claims the support for an aggregate bandwidth of 320 GB/s. The overall system is built using inexpensive off-the-shelf technology as opposed to esoteric optical switching and uses smart scheduling algorithms and high speed CMOS serial links to support high-speed switching. The three most important things highlighted in the paper are the architecture of the switch, the underlying scheduling algorithms for unicast and multicast traffic and a description of high speed IC-to-IC serial links used in the switch.

The system is architected in the form a central hub and a set of input and output port-cards emerging radially from the hub. The central hub consists of a centralized scheduler and 32×32 PCB crossbar-switch slices stacked over each other where each slice switches a single bit of input to the output. Each arriving packet is divided into 64-bit chunks and input-queued at the scheduler and each bit of the chunk is presented at the crossbar slices for switching based on a set of scheduling decisions. Every Tiny Tera port-card internally consists of an application-independent packet datapath and an application-dependent port-processor which would process 53-bit ATM cells per se. The packet datapath caches N incoming/outgoing 64-bit chunk into an SRAM and the port-processor and scheduler handshake to perform input/output queuing.

Tiny Tera uses a scheme called virtual output queuing (VOQ) for scheduling unicast packets wherein there is a unique FIFO for each output. This model eliminates head of the line (HOL) blocking because each packet is queued in the FIFO at the output destined for it. The underlying algorithm used for this VOQ scheme is the iSLIP which is described in the paper to have great properties like, 100% throughput guarantee, no connection starvation, approx log2N convergence iterations for N packets.

For multicast packets, Tiny Tera uses a fanout-splitting service discipline wherein packets are delivered to output ports over multiple time intervals called slots. Unsuccessful packets in any given slot then continue to contend for output ports in the following slot. Tiny Tera proposes to use TATRA (tetris game style) and WBA (weight based algorithm) algorithms for multicast scheduling efficiency.

High speed intercommunication links are used between the packet datapaths and the crossbar slices. These high speed links require high noise tolerance and clock stability and Tiny Tera uses a feedback integrating control system and a low phase-noise high speed PLL respectively, to achieve these goals. Also multiple inputs on each serial link are phase adjusted using a smart phase control circuitry described in the paper.

Tine Tera being a system still in the design phase with multicast algorithms in blueprint, it needs to address several problems like interface and integration to the network layer 2, multiplexing packets in non-ATM like protocols where packet sizes are non-multiples of 64-bit chunks and realization of full bandwidth in bursty traffic conditions. However, Tiny Tera opens a realm of possibilities in high speed switching required with the imminent fading of layer 2 and layer 3 boundaries and a future where traditional connectionless packet switching routers become non-scalable.

 

ZigZag Decoding: Combating Hidden Terminals in Wireless Networks April 6, 2009

Filed under: Extra Papers — liyunjiu @ 2:38 pm
Tags: , ,

One well known problem today in 802.11 WLANs is the hidden node or hidden terminal problem. WLANs rely on CSMA/CD to limit collisions. This fails when a node is visible to the 802.11 access point  but out of range to other nodes sharing the medium. RTS-CTS is proposed in 802.11 but significantly reduces the overall throughput.

ZigZag is an 802.11 receiver algorithm that decodes collisions with a focus on resolving hidden terminal collisions. ZigZag accomplishes this without requiring any scheduling, power control, synchronization assumptions, coding, changes to the 802.11 MAC, or overhead when there are no collisions. ZigZag only requires modification of the AP, is modulation independent, and can generalize to decoding more than a pair of colliding packets.

ZigZag algorithm exploits two characteristics of 802.11 to resolve collisions. “An 802.11 sender retransmits a packet until it is ACKed or timed out, and hence when two senders collide they tend to collide again on the same packets. 802.11 senders jitter every transmission by a short random interval, and hence collisions start with a random stretch of interference free bits.”

To see how ZigZag works, consider the hidden terminal case where two senders A and B can sense the AP but cannot sense each other. When packets from senders A and B collide, they retransmit and cause a second collision with a different offset. ZigZag saves the signals from both collisions and uses the interference-free chunks of data to bootstrap the decoding process. The interference-free data chunks in the first collision is subtracted out of the second collision. Since the offsets of the collision in the two collisions are different, part of the interfering signal in the second collision is decoded by subtracting out the interference-free data chunks in the first collision. With the new interference-free chunk of data decoded in the second collision signal, we subtract that chunk from the first collision, and iteratively repeat until both packets from A and B are decoded.

For example: Chunk A from the first collision is interference-free so subtract A from the second collision to make C interference-free and use C to decode B in the second collision, etc…
From Node A: XABYZ       XABYZ
From Node B:     CDEFG     CDEFG

The paper lists the following linear time algorithm and can extend to any number of node collisions instead of just two.

  • Step 1: For each of the collisions, decode all the overhanging
    chunks that are interference-free.
  • Step 2: Subtract the known chunks wherever they appear in all
    collisions.
  • Step 3: Decode all the new chunks that become interference free
    as a result of Step 2.
  • Step 4: Repeat the last two steps until all the chunks from all the
    packets are decoded.”

To detect if there was a collision and the offset in the collision, ZigZag exploits correlating the known 802.11 preamble with the received signal. Moving the complex conjugate of the preamble across the received signal and computing the correlation, the correlation will spike only when the preamble in the signal lines up exactily, giving the position and the fact there was a collision if it was detected in the middle of the signal.

The paper then talks about several glaring problems and the ways in which it compensated for them. Inter-symbol interference (ISI), frequency offset, and sampling offset when re-encoding the interference-free chunks to be used in subtracting from the other collision are compensated for. Basically the correlation trick was used again for frequency and sampling offset and reversing the effect of the ISI by filtering. This method was effective in increasing the ZigZag’s decode success rate from 0% to 98.2%. It may get worse with a higher dynamic range for the frequency and sampling offset since data was only taken with stationary terminals.

The ZigZag decoder was then evaluated in a 14-node GNURadio testbed with ZigZag implemented on a PC using USRP GNU radio as the RF front end. ZigZag was shown to reduce packet loss rate at hidden terminals from 72.6% to about 0.7%.

Further research can be done in implementing this decoding technique across a wide range of wireless systems and applications.

 

Efficient Fair Queuing using Deficit Round Robin April 6, 2009

Filed under: Extra Papers — gracewangcse222 @ 3:11 am
Tags:

(i) the three most important things the paper says:

  1. Deficit round robin is a fair queuing approximation scheme which can be used to share network resources. The technique is as follows:The queues are serviced in round-robin order. Each active queue to be serviced is allocated a service quantum per iteration (say 1500 bytes), and any leftover quantum caused by a packet that was too large  will be added to the quantum for the next iteration. That is, if a queue was unable to use 200 bytes from its quantum, then those 200 bytes will be added to the 1500 bytes it receives for the next iteration, so it will be able to send 1700 bytes.
  2. This technique improves on many other fair queuing schemes by having only O(1) expected time required to process a packet, compared to O(log n) work for such schemes as self-clocked fair queuing and bit-by-bit fair queuing. Furthermore, it achieves a small constant for fairness measure (perfect fairness has a fairness measure of 0).
  3. Two applications of  deficit round robin in other scheduling problems are in token rings (where deficit counters can be used to keep track of unused bits during each token opportunity) and load balancing (by having a deficit counter for each output line and a quantum).

(ii) the most glaring problem with the paper:

One issue with the deficit round robin scheme is that DRR may be off by a multiplicative factor in delays. A small packet may experience up to a quantum’s worth of delay from every other flow. Also, the number of deficit counters required to maintain the deficits scales with the number of active queues – this might become problematic with a large number of such queues.

(iii) the future research directions of the work:

One possible research direction is to dynamically determine the optimal quantum size given recent flows. A variant of DRR (from speaking to Professor Varghese) is to leverage “overcharging” along with the “undercharging” used in each round (that is, each queue is permitted to exhaust their quantum plus a little extra in order to send one more packet, but the additional quantum is subtracted from the next quantum). In the variant, no deficit counters are maintained, and each queue either undercharges or overcharges its quantum with probability 1/2. Over a long period of time, the scheme maintains a good fairness bound.

 

ExOR: Opportunistic Multi-Hop Routing for Wireless Networks April 6, 2009

Filed under: Extra Papers — supritapagad @ 3:09 am
Tags:

Contributions of the paper:

The paper proposes a method of routing packets that exploits those features of a wireless network absent in a wireline network. This method of routing provides significant increase in the throughput at the destination node when compared to traditional methods of routing which were developed for wireline networks. In a wireless network, even when a packet is forwarded to a single node, all nodes within the range of the source receive this packet at no additional cost. In a wireless network, it is possible that a packet forwarded to node A might be received by a node B closer to the destination than node A. It is also possible that the packet fails to reach node A but is received by a node closer to the source than node A. The paper exploits this property, amongst others, of wireless networks to propose a new method of routing packets in wireless networks.

1. The paper suggests that, at each hop, a packet is forwarded to a set of nodes instead of only one. Amongst the subset of nodes that receive the packet, the node, from which the cost to route the packet to destination is the least, gets to forward the packet further. This way, the protocol uses the most ideal node to receive a packet, to forward the packet, at each hop. It is possible that a node closer to the destination than the node that would have been picked out by traditional routing receives a forwarded packet. ExOR would then use this node to forward the packet further, thereby, getting the packet to the destination possibly sooner than traditional routing would have. It is also possible that the node (say A) the packet was forwarded to does not receive the packet, but a node B closer to the source node than node A receives the packet. If the packet is forwarded from this intermediate node B instead of the source node, the packet would cover more distance than if it were re-transmitted from the source node. Also, the packets are grouped into batches to minimize overhead. A batch of packets is forwarded at a time.

2. The paper also comes up with all the protocol necessary to achieve this. There is a fair amount of synchronization required between the nodes to which a batch of packets is forwarded. Only the node with least forwarding cost should transmit the packet so that there is no duplication. The source node needs to know the packets that were successfully received. The paper achieves this by making each receiving node forward a “batch map” which contains, in addition of other information, information about the fragment of packets in a batch that were received by the node and the priority, in terms of forwarding cost, of the node. It also makes additions to the packet header to achieve the proposed routing. The protocol avoids explicit acknowledgments and in the process favors networks with asymmetric link conditions where it is possible that the link conditions in the forward direction allow the packet to be delivered but reverse link conditions prevent the acknowledgment from arriving at the transmitting node. In ExOR, the acknowledgement could reach the source from a different node in the set of nodes the packet was forwarded to.

3. The paper also suggests a mechanism to ensure that no two nodes begin forwarding simultaneously. In addition, it uses of a ‘forwarding timer’ and a ‘transmission tracker’, which allows a node (say A), which needs to start transmission when another node (say B) completes transmission, to track the rate at which node B forwards packets. Hence, with knowledge of the number of packets node B needs to forward, node A can determine the time at which node B will complete transmission. This way, node A can start transmission even if it does not receive the last packet transmitted by node B.

Glaring Problem:

Initially it might seem like the communication cost of agreement and synchronization between nodes introduces a large overhead. Experimental results however indicate that the overhead is more than made up for and ExOR does provide increased throughput. However, the protocol is efficient only if there is a large choice of forwarding nodes. Else, the benefits achieved by ExOR become insignificant and the overhead dominates. Also, the protocol is suited for transferring large blocks of data as compared to small chucks. The data needs to be large enough to offset the overhead. The protocol also becomes less effective as the probability that a packet sent to a node was received by the node becomes very large. I.e, in networks in which there is a 0 or 1 probability that a packet sent from any node A to any other node B in the network, ExOR methods of forwarding does not provide any benefit. The protocol is more suited for environments with localized, time varying link conditions.

Future work:

The idea can be extended to vary bit-rate during the second round of forwarding depending on the set of nodes to receive the packets in a batch in the first round of transmission. It can also be enhanced to allow for moving nodes. In addition, since a packet is received by more than one node, it might be possible to exploit this to provide for error checking or error detecting, if the cost is less than the cost of re-transmission or transmission from a node further away from the destination than another node that received an incorrect version of a packet.

 

Random Early Detection Gateways for Congestion Avoidance April 4, 2009

Filed under: Extra Papers — liyunjiu @ 1:12 pm

Gateways on the internet handling large amounts of traffic need to queue packets to accommodate for transient congestion. With a larger queue, the average delay in the network will increase significantly in the presence of congestion.  This paper presents Random Early Detection (RED) algorithm to avoid congestion at gateways in packet switched networks by controlling the queue size.

The paper argues that the most effective place to detect congestion is at the gateway itself and congestion control should not be on a per connection basis. The gateway needs to provide explicit feedback to the source so the transport layer protocol could be informed of congestion and adjust the rate or window size. With TCP/IP, for example, detecting a dropped packet would alert the transport layer of congestion. RED gateways use on this feedback mechanism and randomly mark or drop packets at the gateway when the average queue size at the gateway exceeds a certain threshold and this will maintain the network  in a low delay and high throughput region of operation.

One goal is to avoid a bias against bursty traffic. Another goal is to avoid the global synchronization problem of notifying all connections and having them reduce their windows at the same time which results in a sudden decrease of throughput. RED  gateways accomplishes these two goals by using randomization in choosing which packets to mark or drop. This way, the probability of a packet getting dropped from a connection is proportional to the connection’s share of bandwidth.

The average queue size is calculated using a low-pass filter with an exponential weighted moving average. The short term increase in queue size from bursty traffic and transient congestion do not significantly increase in the average queue size. Given a minimum queue size threshold and that we allow bursts of L packets, we can calculate the time constants for the exponential weighted moving average.

In RED, the average queue size is compared to a minimum threshold and a maximum threshold. If the average queue size is below the minimum threshold, no packets are dropped. If the average queue size is above the maximum threshold, every packet is dropped. If the average queue size is in between the two thresholds, each arriving packet is marked with some probability at random intervals as function of the average queue size and packets since last marked packet. The paper argues that a uniform random distribution is better for the random intervals since too many marked packets close together and longer interval between marked packets can result in global synchronization.

Since there are many parameters, performance is sensitive to those various parameters. Performance will depend on how adequate time constants are for calculating the average queue. Performance will be better if minimum threshold is set sufficiently high to prevent under-utilization. Making max_th – min_th sufficiently large to avoid global synchronization that results from oscillations of the average queue size. Further research topics can be done in finding optimal average queue size for different traffic conditions.

Summary of RED Benefits

- Appropriate time scale: Does not wait a round trip delay before seeing reduction in arrival rate. Time scale matches the time scale required for connection to respond to dropped packets.

- Simplicity: no per connection state information simplifies congestion control job of transport protocol and gateway.

- Congestion avoidance, No global synchronization problem, Fairness: The rate of randomly dropped or marked packets depend on the level of congestion.

 

Lessons from Giant-Scale Services April 3, 2009

Filed under: Extra Papers — mdjacobsen @ 2:36 pm
Tags: , ,

This is an experience paper that focuses on how to build, maintain, and quantify very large scale infrastructure services (i.e. portals, web applications, etc.). For this level of service, the general structure follows the server farm approach. Clients access the service via the network by a single identifier/address. A load manager directs traffic to one of many servers which are connected to (possibly) multiple data stores.

The author categorizes load management as either the job of a layer-4 or layer-7 load balancing device, a layer-7 software based load balancing computer, or the client itself. When the client performs load balancing, it is capable of utilizing multiple addresses (say DNS server IPs). When one does not respond, the client can automatically direct its traffic to another.

The author further argues that there are two models of organization for servers, partitions and replication. Partitions separate different operations and data onto different servers. Replication provides all operations and all data to all servers. Neither model is strictly superior to the other.

In order to quantify the availability and thus justify what model makes the most sense for the needs of the service, the author presents several metrics by which a service can be measured.

uptime = (mean time between failures – mean time to repair)/mean time between failures

yield = queries completed/queries offered

harvest = data available/complete data

DQ = data per query x queries per second

Here, yield and harvest are different aspects of uptime. The author argues that all decisions to design a service can be measured using these metrics. Losing a server may result in a lower DQ which may be the result of a loss of yield (in the case of replication) or of harvest (in the case of partitioning). The best solution depends on which is more appropriate for the specific service.

The paper also covers ways to handle service degradation gracefully. Service degradation can be managed by dropping certain requests to avoid livelock situations. Which requests are dropped determines how graceful the degradation proceeds. The author suggests making these decisions in a cost-based or priority-based manner. Cost-based approaches drop the more complex queries in order to service more simpler ones. Priority-based approaches attempt to service queries that are more important in some respect (such as stock trading requests vs. stock quote requests). Alternatively, under certain situations reducing the freshness of the data by relying on caches provides an acceptable way to avoid query execution (such as with stock quotes).

Lastly, the paper identifies common ways to deal with the constant software updates that are consistent with these types of services. The author describes three methods in terms of the loss of DQ. The first is a fast-reboot procedure, where all the servers are taken offline, updated and rebooted as quickly as possible. This method provides no availability. The second approach is to do an rolling upgrade, where the old and new versions of the software coexist within the cluster. Individual servers are upgraded, one at a time. This can avoid any loss by allowing the other servers to take additional load. The last approach is to perform a big flip. This involves taking half of the servers offline, upgrading them, then switching over to use only the upgraded servers. The other half of the servers are then upgraded and brough back online afterward as well. This causes half of the servers to assume all the load for a short period of time, but does not require both versions of the software to coexist.

All the heuristics and guidelines presented by the author are indeed valuable when designing a giant scale infrastructure service. The only issue is that it does give real consideration other models of server organization. Hybrid based approaches can provide the benefits of many of these solutions. Hybrid approaches are only touched on in passing.

I would expect that future work would include data regarding how well these heuristics have worked in practice as well as a disscussion of the common pitfalls that befoul initial designs for such giant scale services.