Graduate Networks, UCSD

CSE222 – Spring 2009

MIRO: Multi-path Interdomain ROuting May 29, 2009

Filed under: Extra Papers — ameenakel @ 8:19 pm
Tags: , , ,

(i) the three most important things the paper says

  1. One of the most important topics that this paper addresses is that BGP, in its current form, does not allow for any given AS to (in any reasonably automated way) request a certain route for its packets to traverse, regardless of the motive.  In fact, beyond sending that packet to its next AS-hop, the originating AS has no control whatsoever within the BGP implementation on the Internet today.  Some great examples mentioned in the paper about why this level of control (over which individual AS hops a stream of packets will travel) could be useful are security, performance/bandwidth, or price (among other things, I’m sure).  For example, a government agency may not want to route a certain set of transmissions through a country that is considered hostile and thus would like to request that a certain AS not be used, regardless of the fact that it may be on the shortest and most economical path.  This lack of control is the problem that the authors of this paper have set out to solve with their new scheme: MIRO.
  2. The authors also address the standing issues with the current (most popular) solutions for this problem: source routing and overlay networks, demonstrating why these solutions lack the proper scalability, lack the proper backwards compatibility (which would require a batch change of all Internet hardware), and require far too much overhead to be used as Internet-wide protocols.  Quite notably, on this same note, the authors made it a point to demonstrate why MIRO satisfied these two requirements (at the protocol level).  The authors brought up the point that MIRO continues to keep relatively the same amount of BGP-control messages between ASes as vanilla BGP (because of its pull-based alternate route advertisements), helping scalability.
  3. The authors also describe a method in which it would be possible for MIRO to operate on existing hardware on the intermediate routers of a given AS (to reiterate, not including edge BGP routers).  This is a great advantage of the MIRO solution, as most/all changes to the Internet backbone that require increasingly larger change rarely are implemented.  The solution involves treating an alternate route as if it were another IP address in the given AS’s inner network (essentially adding a level of indirection).  This would essentially require changes solely to the routing tables of each router instead of to the underlying software of hardware.

(ii) the most glaring problem with the paper

One of the largest problems with this paper (and it was quite difficult to find a topic that wasn’t addressed well by the authors) was that it did not address any performance metrics below the level of the protocol level.  Specifically, the authors did not investigate the impact of these new changes on the performance of edge BGP routers.  Since these routers are meant to move packets from their ingress ports to their egress ports (or vice-a-versa) as quickly and accurately as possible, adding to the latency of moving a packet from one end to the other is very undesirable.  If it is the case that this scheme adds significant overhead to these routers, this scheme may find greater resistance in its adoption (as it would also require a hardware or significant software change to each of the edge BGP routers–another big hurdle to overcome).

(iii) the future research directions of the work

Although mentioned briefly in the conclusion, security would be on the mind of the average grad-student reader of this paper from the very beginning.  The level of control can is given (insecurely) to any router that can send a packet is practically limitless, as long as the negotiating AS approves of its requests.  Some sort of secure handshaking protocol or digital signature technique would greatly help to strengthen this scheme.  Another great future direction of this work would be to investigate the true overhead created by this scheme versus the BGP baseline, along with he overheads of other popular replacements for or augmentations to BGP.  It would be useful to quantify the different overheads created by each scheme.

 

Network Coordinates in the Wild May 29, 2009

Summary:

Currently there is a debate about the behavior of real-world distributed systems based on network coordinates. Specifically, their inaccuracy, instability, and fragility in the presence of violations of the triangle inequality. This paper looks at a subset (10,000 nodes) of a deployed real-world network (Azureus BitTorrent), and provides techniques for improving as well as characterizing the behavior of the coordinate system. All coordinate messages are piggy-backed on to existing network messages. This means that the coordinate system must be able to function passively. The Azureus BitTorrent network is characterized as a network with a small dimensionality, a wide spread distribution of round trip times between nodes, and a large number of nodes that violate the triangle inequality. To mitigate these problems, Ledlie et. al. applied three new techniques. First, they added a latency filter. This filter is used to summarize the latency measurements observed between two nodes i and j. This provides a current and stable description of the expected latency between the two nodes. They found that a simple, short, moving median worked well for dealing with anomalous measurements and plateau shifts (actual changes of the latency between nodes). Second, they added an update filter. This filter addresses stability issues when receiving a frequent number of small updates. It uses a heuristic to determine when a change in latency is large enough that the application-level coordinate has undergone a significant migration to a new location relative to other coordinates. Finally, they added neighbor decay. This addresses the issue of the passive nature of the coordinate system and the transient nature of the nodes. In the Azureus BitTorrent network, a node might talk to a large number of transient nodes, but only talk to a small number of nodes consistently. This means that the local coordinate optimization process will become skewed towards this small set of nodes. By scaling the effect each neighbor has on the nodes coordinates with respect to the age of the data, it allows nodes that are heard from infrequently to have a longer lasting effect on the local optimization process. The effect is that data is incorporated into the coordinate system more smoothly, providing a stabilizing effect. They found that by using these techniques in the Azureus BitTorrent network, they provided a 43% improvement in relative error and a four orders-of-magnitude improvement in stability of the network coordinate system.

Three major contributions:

  1. Characterized a real-world internet scale network coordinate system that is currently deployed.
  2. Provided new techniques for managing neighbors in a passively maintained network coordinate system.
  3. Provided evidence that internet scale network coordinate systems can actually work in the real-world.

Major Problem:

Their subset of nodes in the Azureus network consisted of users that are automatically updating their clients to the latest SVN release. They said that these users exhibited normal network behavior, but gave no data to confirm this. Was this an accurate sample of the overall network?

Future implications:

I think this paper shows that in practice network coordinates are a viable way to build internet scale systems. This work was done on a file sharing network where nodes can adapt to changes in latency at a relatively slow pace. How would network coordinates work in an internet scale system that had strict QoS requirements (i.e. content delivery networks, data stream processors, VoIP)?

 

Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications May 29, 2009

Paper presents Chord which is a distributed lookup protocol that addresses the problem of locating the node that stores a particular data item. It provides this functionality by just one operation, mapping a key onto a node. Data location can be implemented on top of Chord by associating a key with each data item and sorting the key/data item pair at the node to which the key maps. It adapts naturally as more nodes join or leave the system. It uses a variant of consistent hashing to assign keys to Chord nodes. Consistent hashing automatically balances the load since each node receives roughly the same number of keys and it involves very little movement of keys when a node joins or leaves the system.

One major difference from previous approach is the fact that it does not assume that nodes are aware of other nodes in the system. This property enables it to scale. It needs routing information only for few other nodes. Due to distributive nature of routing a node resolves the hash function by communicating with a few other nodes. In steady state of an N node system, each node maintains information only about other O(logN) nodes and resolves all all lookup via O(logN) messages to other nodes.  It also maintains its routing information as nodes join and leave the system with high probability each such events result in no more than O(log^2N) messages.

Only one piece of information per node needs to be correct for chord to guarantee correct (though slow) routing of queries. Chord can be effectively used for cooperative web caching, with data mapped to caches by consistent hashing functions. It can be used to find the cache that contains the desired web data, with URI as the key for the DHT.

One shortcoming of the Chord is the fact that though it ensures that no node will have to carry more than k/n items (where k is the number of keys and n is the number of nodes), it does not consider the fact that some data items could be substantially larger than others. In web caching scenario this situation might arise due to embedded video data or any large object.

Though Chord has disadvantage n terms of run time while looking for key to node mapping due to distributive nature compared to the centralized solution which is fast. But this disadvantage is outweighed by the advantage provided by the distributive nature resulting in higher reliability and fault tolerance.

 

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications May 29, 2009

This paper talks about a distributed protocol, Chord, which addresses the problem of efficiently locating the node that stores a particular data item in peer-to-peer applications. The protocol also works in the scenario of dynamic disappearance and joining of nodes in peer-to-peer network. Therefore, Chord can answer queries even if the system is continuously changing. The basic Chord protocol provides fast distributed computation of a hash function mapping keys to nodes responsible for them and uses consistent hashing. The consistent hashing assigns each node and key an m-bit identifier using a base hash function. They choose a node’s identifier by hashing its IP address and key identifier by hashing the key. The protocol assumes that the length of m is large enough so that two nodes or keys do not hash to the same identifier. The protocol also needs very small amount of routing information to implement consistent hashing in a distributed environment since each node need only aware of its successor node on the circle. A portion of the Chord protocol also maintains the successors pointers to ensure that all lookups are resolved correctly. The chord protocol makes sure that each node’s successor is correctly maintained and for every key, node successor(k) is responsible for that key in providing the dynamic node joining capability. They maintain finger tables at each node. Chord needs to deal with nodes joining the system concurrently and with nodes that fail or leave voluntarily.  Each Chord node maintains a “successor-list” of its r nearest successors on the Chord ring to account for failures and replication.

 

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications May 29, 2009

Important Things:
1) Chord addresses a general problem faced by P2P applications by implementing a conceptually simple and generally useful mechanism (distributed lookup) which is nontrivial to implement. A wide variety of applications can be built on it, and it removes much of the complexity from P2P applications.
2) Chord is scalable, with the scale of the Internet in mind. Time and storage complexity is O(log N).
3) Chord can handle nodes joining and leaving (or failing) gracefully.

Problems:
The problem of a Chord network that becomes partitioned is not solved. Not only is there no recovery mechanism, but there is also no mechanism to detect the partition. If data is replicated, then the application could continue, each partition having different copies of data, and from that point on their views of the universe can become inconsistent. They did mention the posibility of recovering from this in the future, but then what happens when you try to resolve inconsistencies that develop in the application?

Future Work:
Future work should address problems of how to resolve inconsistencies and security issues that may appear in decentralized applications where no central authority exists.

 

Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications May 29, 2009

This paper presents a beautiful idea for easy name lookup in a distributed system. It asks the question: how can peer-to-peer applications efficiently locate the node that stores a particular data item? Highlights are:

  1. The solution uses a variant of consistent hashing to map key to nodes. Each key and IP address of a node is hashed to an identifier; each node is responsible for maintaining the data items associated with the keys occuring after its predecessor node on a circular number line. Each node maintains information  for O(log N) nodes in a finger table. It is proved that lookup resolution can happen within O(log N) messages.
  2. The solution tackles many difficult issues in distributed systems design, namely, a) load balancing by use of consistent hash function to spread keys evenly over the nodes, b) decentralization: no node is more important than the others, c) scalability: lookup cost grows as log of the number of nodes in the system, d) availability: automatically adjusts internal tables as the system changes, e) flexible naming: no constraints on structure of the keys.
  3. The paper provides an inkling of the applications that can immensely benefit from usage of Chord’s approach to naming. Chord can be used effectively for cooperative mirroring and search engines. Unlike DNS, Chord requires no centralized components which provides effective fault tolerance.

I see a minor security loophole in Chord. It could be the case that it has been fixed subsequently. The paper mentions that the Chord software on the node notifies the application of all changes to the set of keys that node is responsible for. Wouldn’t it be the case that the application is informed of the physical location of a data item that it is not interested in? This information could be misused by an adversarial application and presents a security threat.

Future work should look into the possible reasons why Chord has not been widely employed in many file-sharing systems. Is it the case that name-lookup in a centralized fashion in O(1) time is considered viable instead of O(log N) time here? Since DNS-type lookup is extremely popular, do the benefits of Chord outweigh the overhead of replacing DNS in such systems? Will mere replication of DNS entries ensure better fault tolerance than Chord? These questions need to be looked into.

Chord can be used effectively for cooperative
web caching, with data mapped to caches by consistent hashing functions and Chord used to find
the cache that contains the desired web data, with its URI as the key for the DHT
 

Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications May 29, 2009

The primary contribution of the paper is the Chord algorithm. The algorithm addresses a very fundamental problem in peer-to-peer networks – location of services.In a very large peer-to-peer system, it is not possible for a single node to store the location information for all the services (eg. files in a peer-to-peer file system). So the location information needs to be distributed across different nodes in the system and efficiently looked up. The Chord algorithm provides this service which can be used a low level primitive that is commonly used by several higher level applications built for different purposes.

Another contribution of the paper is the numerous improvements/optimizations suggested to improve the performance of the lookups and the proofs of correctness for all these optimizations. The paper also provides upper bounds on the lookup time in the common case, failure cases etc which is very useful to theoretically evaluate how well the algorithm would perform. The basic idea is that if all the location information is stored in one node (complexity O(N)), then the lookup can be performed in O(1) time. At the other extreme, if each node stores only O(1) location information, then the lookup can be performed in O(N) time. So Chord achieves a balance between the two by storing O(log N) location information in each node and using O(log N) time for a lookup.

The final contribution of the paper is the extensive simulation results which show that the performance of Chord is as expected. These results validate their claims that Chord offers good performance at large scale where some number of nodes would simultaneously be joining or leaving the system at any point of time.

One flaw with Chord (mentioned in the paper as well) is that it does not take advantage of locality. In a large scale peer-to-peer system spread across the Internet, it would be beneficial to place services such that the node with a particular piece of data or service and the node that uses the data are close together. However, Chord does not have a mechanism to support this. In fact, Chord actually explicitly tries to make a uniform distribution in a random fashion which make locality hard to achieve. Another problem with Chord is that the distribution of nodes and keys around the Chord ring need not actually be uniform. If some nodes got lucky and some others got lucky, it is possible that some nodes end up with a very large number of keys that they are responsible to store. The authors claim that this can be taken care of by having several virtual nodes per single physical node, which will increase the probability that the nodes get more evenly spread out in the ring. This however does not solve the problem of keys being unevenly distributed. If the keys are unevenly distributed, even if additional virtual nodes are used, information might not get evenly distributed. Also, it appears like the virtual nodes technique will help when all the nodes in the system introduce several virtual nodes rather than just have a single node. But the paper does not describe how performance can be affected by some physical nodes introducing more virtual nodes than others etc. Another crucial aspect is security. The paper says that application level security is required to take care of that although it is not explicit that handling security at the application layer would not affect performance. For example to avoid DOS attacks, if security is handled entirely at the application level, then the lower Chord layer still does a lot of work to service each of the requests and bring it up to the application layer. This is a lot of overhead. The other problem as stated by the authors is that the algorithm as described in the paper cannot recover from network partitions.

Future research in this direction can look at making the algorithm deal with network partitions better. It also appears like there are some parallels between Chord and IP anycast. They serve similar goal of identifying a node that offers a particular service. Anycast tries to find the closest node while Chord finds ‘the’ node that contains that information. It would be interesting to see if the two ideas can be integrated to achieve better exploitation of locality with Chord.

 

Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications May 29, 2009

The Chord system is designed to be a distributed service which provides a very simple API: mapping a key to a node.

The main points of the paper were as follows:

  1. The Chord service only needs to provide a very basic primitive in order to be functional. While the service only provides key-to-node mappings, it is straightforward to implement richer services on top of it. For example, it is possible to implement a distributed file system by making the node which holds the mapping for a given key to contain the file whose name generates that key. Therefore, when a host wants to find a given file, it generates a key based on the filename and queries the node that Chord provides for that key, which responds with the file. By maintaining a general interface, it is possible to use Chord for many different, seemingly disparate applications without having to modify the underlying protocol
  2. Consistent hashing allows the system to be very efficient. The hashing system is designed in such a way that nodes joining or leaving the network cause a minimum amount of disturbance. A joining node can only steal a subset of the keys mapped to its successor, and when a node leaves the network, the mappings automatically fail over to its successor. The basic idea of a shared ring is quite simple, but much of the paper describes extensions designed to make lookups much faster (essentially by storing exponential probes with each node so that it is possible to search in log(n) instead of n time).
  3. Operating in the face of concurrency affects both correctness and performance. In order to deal with this fact, they add support for a stabilization protocol, which keeps the correctness information up to date. The fingers, which are used to provide the logarithmic searching performance, take longer to update in the face of a rapidly-changing network, but this is acceptable because they only provide performance, and as long as the stabilization algorithm keeps the successor pointers up to date, searches will always succeed, even if it takes longer than logarithmic time.

The main weakness of this paper is that since it only provides such a limited set of features, it is possible that services built on top of it will have to be more sophisticated. For example, a file system built on top of this will have to provide some kind of replication, because an unexpected node failure will leave the node’s successor as the holder of a given file without actually passing it the file data. While it may be argued that replication is easy to build into the system, doing it in such a way that it is reliable enough to be used for a long-term distributed file system is likely a serious undertaking.

Future research into Chord should provide a library on top of it to provide the things that were left out of the initial design, such as replication and security. This will simplify writing applications which are based on Chord by allowing them to a validated set of operations larger than just mapping a key to a node.

 

Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications May 29, 2009

(i) The three most important things the paper says:

1. The most important feature of Chord is that it is constrained. The authors tailored the protocol quickly and simply locate a key in a network and maintain a stabilized network for doing so.  Having the problem be concise and efficient means it can be easily adapted by any applications that can make use of it. There is always a risk that a service will be over designed which doesn’t allow for the flexibility necessary for it to be widely deployed and useful.

2. The goal of the project was to provide quick and efficient lookups of where keys are stored in a distributed hash. They provably achieved this goal by deciding on an algorithm that will tell you which of you neighbors is closest to the key and then passing the request on to him. This makes the algorithm take only logarithmic time since you are dividing the possible hosts that could have the key by some factor at each hop. This was the best they could hope to do with distributed keys.

3. The other key challenge upon implementing Chord was to make the system stable enough to still function even nodes join and leave (by choice or by crashing) the network. The authors have the system periodically running a stabilize function which updates each ones list of which nodes it can access and passes its distance to different hosts to its neighbors. This means that even a request that is in route should be rerouted around faults while in flight.

(ii) The most glaring problem with the paper:

I felt that the paper did not adequately address how to use chord to create a redundant array of data. Where data has copies and if the data storing a certain key goes down ideally it would provide a mechanism for the request to be rerouted to another copy of the data. In many peer to peer applications you wish to locate multiple people with the same data (how does chord facilitate this being that you can only have one of each key?). Given that they had a clear target for the protocol I feel it would have been wise to give application programmers ideas of how to make the best use of their network architecture.

(iii) The future research directions of the work:

Follow up on how Chord performs in real systems will be very useful information. The author mentions that it is unclear if separate loops can form in practice so the deployment should demonstrate whether or not these can occur. They also only simulated the recursive implementation of Chord so it would be nice to have data on the iterative version as well.

 

Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications May 29, 2009

This paper introduces Chord, whose main functionality is to efficiently locate a node in a peer to peer (P2P) network. Given a key, Chord maps it onto a node. It adapts efficiently with nodes leaving and joining the network frequently. Main contributions are:

  1. A modified version of consistent hashing: Chord uses consistent hashing to map keys to nodes. Consistent hashing provides load balancing by evenly distributing keys to nodes (each node is assigned roughly K/N keys), and it also has the nice property that keys don’t change too much as nodes join or leave the network (only O(1/N) fraction of the keys are moved to a new location as a result).  One modification that the authors made is to remove the need to be aware of all other nodes in the network. Based on their implementation, a node needs to know about only O(log N) other nodes in a network of N nodes. Therefore, lookups require O(log N) messages to other nodes. Chord can still function with awareness of fewer nodes, but its performance would suffer (each node only needs to know about one other node, its successor node).
  2. Notion of successor nodes, and finger tables: Chord is able to achieve its scalability and good properties through the use of finger tables. First, the keys are arranged in order on an identifier circle. A function such as SHA1 could be used to hash node IP addresses to keys on the identifier circle. The successor of a key is then the first node on the identifier circle with key equal or greater than the search key. Finger tables are maintained on each node n. The table is a list of successors of n in distances exponential in 2: the first key on the the table is the successor of n. The i-th entry on the table is the id of the first node that succceeds n by at least 2^i-1 on the identifier circle.  To lookup a key, the current node compares the key with its own id and if it’s not a match, it uses the finger table to find the node with id closest to the search key, and forwards the search to that node. In this manner, the distance to the search key halves each time a node is contacted, and thus at most O(log N) nodes are contacted before reaching the destination node.
  3. Chord simplifies P2P systems by providing: load balancing, decentralization, logarithmic scalability, high availability, and flexible naming. Chord avoids a single point of failure, since all nodes in the network have equal responsibility. To join a P2P network, a node only needs to know about one other Chord node in the network; there is no servers that are required. This, along with the fact that keys are distributed evenly on the nodes, provides availability and decentralization. Further, nodes keep track of several successors, so in case of failure they can update the finger tables easily. Scalability is achieved without requiring hierarchy, because nodes only need to keep track of O(log N) other nodes; lookups run in predictable time. Chord requires no naming structure, as opposed to services like DNS. So it offers a very flexible naming scheme. Another nice thing about the design is that it is simple and handles concurrent join and failures very well.

Chord has a very clever design. While it’s hard to critique it, I feel like frequent node join/leaves could hinder its performance. The keys of one or more nodes need to be redistributed once a node joins or leaves the network. If the joins/leaves are too frequent, then the nodes have to redistribute keys too frequently, thus hindering their performance.

Future research: it would be good to take into accoun the distance/latency between the nodes, and rather than selecting a successor based on closest hash, take latency into account as well

 

Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications May 29, 2009

(i) the three most important things the paper says

  1. One of the most important things mentioned in this paper is the fact that, in a distributed peer-to-peer system, no single node needs to know about more than two other nodes in order for that system to function properly (assuming a ring-type topology, and ignoring the requirements to remove a node).  Although slow, this interconnection scheme is the most scalable (in terms of adding nodes) and will continue to maintain an all-inclusive state within the system.  Another item mentioned on the same topic is that a reasonable lookup time can be achieved in this system by only requiring each node to remember the locations of log N nodes (where N is the number of nodes in the system).
  2. Another important observation is the fact that the system will always converge (possibly barring any severe partitions) from any number of joins (whether concurrent or not) using the stabilize machinery given the proper amount of time.  Basically, the authors guarantee that no additions to the system (joins) can ruin its consistency.  All that is required is time for the proper stabilize commands to complete and distribute the predecessor and successor information to each node.
  3. A third important observation of the paper is the fact that probability and statistics of typical usage are on the side of the system’s design.  The authors bring up the point that this system will work well based upon a typical failure rate for hosts on the internet.  I would say that if this system was used for distributing data from server to server, this scheme would never encounter problems with performance that would require it to reduce to minimal performance (because of the sheer number of failures that it would have to deal with).

(ii) the most glaring problem with the paper

Although mentioned in the “Future Work” section, the lack of security to this protocol is a bit disconcerting.  Even though security can be added later, I would think that for a protocol whose main deployment is on the Internet security would be a main concern (and most likely integrated directly into the protocol design and not just patched in later).  It may be that this entire protocol could be invalidated just because of the fact that it would be difficult to guard against a host spoofing the system for malicious purposes.

(iii) the future research directions of the work

I feel that it would be very interesting for the authors to investigate the effect of changing the distance between the entries in each for the finger tables per server (instead of log N).  Another study could display the empirical results of these comparisons alongside the theoretical results, just for the sake of accuracy on both ends.

 

Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications May 29, 2009

Three Important Things:

  • Chord utilizes the consistent hashing algorithm in order to assign responsibility of keys to individual nodes. This is an important and decision and a good choice, because of the properties of the algorithm. The algorithm distributes keys evenly across nodes, ensures minimum disruption when a node joins or leaves the network (efficient rebalancing), and provides that a node needs only a small amount of routing information about other nodes. The paper presents these features as the motivation for picking the algorithm, as well as short description and a reference to the paper that introduced it.
  • Due to the unpredictable behaviors of nodes leaving and joining simultaneously, chord required that every node periodically run a stabilization routine that reconciles the network.
  • The first part of the paper makes a number of theoretical performance claims about the algorithms it utilizes. Because these numbers are essential for demonstrating load balance and scalability, the paper dedicates a section to evaluate these claims. The performance section shows that the path length for a search is indeed on the order of log(n), meaning that as the number of nodes and keys grows the protocol is scalable.

Glaring Problem:

The paper mainly focuses on the description and evaluation of the chord protocol in the interest of constructing a p2p network. It would have been helpful to also include a discussion of some actual applications that utilize the protocol, in order to demonstrate how easily they were constructed and how generally the protocol can applied.

Future Work:

The principles developed and refined by Chord are useful in other areas of routing. Specifically in protected or private networks such as data centers, a scalable flat routing scheme based on Chord would help facilitate useful features such as host mobility and VM migration.

 

Chord a scalable peer to peer lookup service for internet applications May 29, 2009

(i) The three most important things the paper says:

1) By leveraging consistent hashing, Chord is not only able to resolve many of peer to peer’s most difficult problems (load balancing, scalability) , but do it in efficient time.  In addition to the fact that, Stoicia et al proved that this solution allowed nodes to enter and leave the network with minimal disruption as well as to have searches complete in O(log(n)).  I felt this was important, since the discovery does not involve further increasing complexity (e.g. CAN using d-dimensional Cartesian coordinate space) and thus easier from a minimalistic and security perspective.

2) The development of scalable key location in Chord allows it to only need to know about a few number of nodes.  In addition, this was further accelerated by keeping some additional routing information, so it doesn’t need to go through all the successors.       I thought keeping this additional information was important, because not only did it provide an optimization but it also added a way to check for correctness in case the successor information was not kept correctly.  Furthermore, since each node only stores a small bit of redundant information, scalability and robustness is naturally built in.

3) Chord handles failures very gracefully which contributes to the overall robustness and quickness of the system.  Whenever a node fails, a lookup quickly recovers by using information in the successors-list.  This feature, also has the benefits of allow lookups to proceed even during periods of stabilization.  I felt this is important, because without this tid-bit of insight, the failure during instability would be even exponential (currently linear).   This is yet another important feature to ensure that Chord is scalable to n nodes.

(ii) The most glaring problem with the paper:

The biggest problem with the paper is its failure to address all of the lookup failures during stabilization.  According to Figure 12, it shows that as the number of nodes increase, the number of failure due to instability increases linearly.  And although it isn’t an exponential increase, it doesn’t scale logarithmically as the tout in other areas of Chord.  This problem would be further exacerbated in today’s large networks which would spend most of the time in a state of instability.

(iii) The future research directions of the work:

The future research of the work would be to provide anonymity in Chord.  One of the most appealing uses of peer to peer is to share and spread data (e.g. bit torrent).  And these types of systems benefit the most when there are more nodes in the network.  I believe that folks would be more incline to join a peer to peer network, if there was no threat on their identity (think of all the identity theft these days).  Therefore, this system with all its benefits may never be adopted over a less efficient system (e.g. Freenet), if it doesn’t provide anonymity to its users.

 

Chord: A scalable Peer-to-peer Lookup Service for Internet Applications May 29, 2009

This paper introduces Chord, a lookup service using peer-to-peer hosts without reliance on hierarchy. It has applications in all kinds of distributed systems which have no central naming service and are distributing information on a large number of hosts. The contributions of this paper are as following:

The main service performed by Chord is a mapping service that maps keys onto nodes, which are supposed to be providing the value associated to the key. This is realized by using the consistent hashing scheme to distribute the content (which is formed by all values) equally on all nodes participating in the network. By using hashing, a node location (for example its IP address) is hashed to some node identifier. When a new value enters the network, its key hash value (=key identifier) is of the same length as the the node location hash (=node identifier), and the value is put on the closest node, e.g. with the node identifier being the closest successor to the key identifier.

The next contribution is the lookup function used to determine the location of the value in the network. The underlying assumption is that due to scalability concerns, a single node knows only the identifiers of a small subset of nodes in the network. Since a consistent hashing scheme can be thought of as a ring of hashes, a single node always keeps at least the location of his successor in memory for correctness, and for performance improvements also the location of 2-to-the-ith nodes for some small values of i. This helps to break down the search space exponentially with each jump.

The last contribution is a scheme for adding and removing hosts from this network, which means performing the necessary updates on the lookup tables at some hosts and by redistributing some of the content in the network. Correctness only depends on maintaining the successor node correctly. The stabilization operation also handles the case of multiple node joins or multiple leaving/failing nodes.

The most glaring problem of this work seems to me that the consistent hashing scheme just places the nodes arbitrarily on the ring, not leveraging any neighborhood information about hosts. If this system would really provide world-wide lookup services on globe-spanning peer-to-peer networks, not leveraging this information leads potentially to largely inefficient communication patterns.

Future work should as indicated in the paper present a security study on this type of protocol, to make sure that it is resilient to attacks by malicious peers at least to some extent.

 

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications May 29, 2009

The paper proposes a scalable protocol Chord for lookup in a dynamic peer-to-peer system with frequent node arrivals and departures. The primary operation supported by Chord is to map a given key to a node containing a data-item. To this end the authors use consistent hashing which tends to balance load, since each node receives roughly the same number of keys, and involves relatively little movement of keys when nodes join and leave the system. Chord tries to address the problems of load balancing, decentralization, scalability and provably low complexity of implementation. The main points discussed in the paper are Chord’s consistent hashing and key assignment mechanism, the scalability arguments for dynamic node joins and its fault tolerance.

Distributed hash tables provide for easy location of data by lookup on a key in applications such as peer-to-peer file-sharing, content distribution systems, web caching, and dynamic name services, to name a few. Starting from the idea of consistent distributed hashing to assign data to nodes, Chord provides a completely decentralized mechanism that allows a node to find the receptacle for a data item in O(log N) time by tracking only O(log N) locations of other nodes. Chord can be used effectively for cooperative web caching, with data mapped to caches by consistent hashing functions and Chord used to find the cache that contains the desired web data.

Chord is fully distributed where no node is more important than any other thereby improving its robustness and making Chord appropriate for loosely-organized peer-to-peer applications. The cost of a Chord lookup only grows as the log of the number of nodes, so even very large systems are feasible and making it very scalable. Chord automatically adjusts its internal tables to reflect newly joined nodes as well as node failures, ensuring that, barring major failures in the underlying network, the node responsible for a key can always be found. This is true even if the system is in a continuous state of change and hence makes Chord a completely available system. Chord assigns identifiers to nodes by arranging them in a circular fashion and assigning keys to nodes whose identifier is the same or the successor of the current key. For join operation every successor node need only its predecessor whose tables are updated when a node joins the circle of nodes. Optimizations are made to avoid the traversing the entire circle and only stepping though having only O(log N) routes to the destinations. Chord also implements support to handle concurrent node joins and failures.

Replication for fault-tolerance is left up to the applications. This is reasonable respecting the end-to-end argument, if the rate of failure is not particularly bad, which in true for many peer-to-peer scenarios. However application writers may not be aware of just how prone individual nodes can be to failure. For example, if a failure-prone node joins the peer-to-peer system and takes over an application’s data, an application that has come to expect that its data is rarely lost can suddenly be disappointed. If replicated data is not read-only, however, it might be unreasonable to suggest that Chord be adapted to handle that issue.

A possible shortcoming of Chord is that while its load balancing ensures that no node will have to carry more than k/n items (k being the number of keys and n the number of nodes), it doesn’t consider that some data items could be substantially larger than others. In the case of web caching, most cached data could be textual web data, interspersed with some large videos, causing problems for the nodes that have to host the videos in addition to the normal load of textual data items. Whenever it is possible to structure the distribution of data across a number of hosts in a network Chord’s structure is best suited since it uses a DHT to distributing the data objects. Chord has a bright future since it removes the presence of a centralized authority, thus making the distributed nodes of a peer-to-peer system truly independent. However it is worth considering that centralized authority though likely to be more costly provides its advantages of reducing the time required to find the node for a data item from O(log N) to O(log 1).

 

Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications May 29, 2009

Thus paper proposes Chord a distributed lookup protocol that addresses the problem of efficiently locating a node in a peer-to-peer network that stores a particular data item. Chord just supports one operation: given a key it maps to a node. Data location can be easily implemented on top of Chord by associating a key with each data item and storing the key/data item pair at the node which stores the data. Chord uses a variant of consistent hashing algorithm that tends to balance the load since each node receives roughly the same number of keys, and also needs little movement of keys when nodes join and leave the system. In previous consistent hashing algorithm each node were aware of most other nodes in the system, thus making it impossible to scale to a large number of nodes. In contrast, Chord node needs the routing information of only a few other nodes. Because the routing table is distributed, a node resolves the hash function by communicating with a few other nodes. In steady state in N-node system, each node maintains information about only other O(log n) nodes, and resolves all lookups via O(log n) messages. The most important features of Chord are its simplicity, provable correctness and provable performance. Its simplicity results from the routing of key through a sequence of O(log n) other nodes towards the destination. A Chord node requires information about O(log n) other nodes for efficient routing but the performance does not degrade drastically when the information is out of date. This is important because nodes may join and leave the system arbitrarily thus making consistency hard to maintain. Hence even in case of failed nodes, the mapping of the key will eventually succeed with a little more performance overhead.

The problem existing with the Chord system is that it heavily relies on probabilistic behaviors and hence may fail in certain unlikely pathological cases. The ability to handle concurrent joining or failure of nodes efficiently is a challenge. Also in certain cases failure of nodes can entirely partition the network, thus breaking the key lookup chain and effectively making Chord to fail on a key mapping. Finally there are also a lot of security loopholes, specially if the Chord system is used for cryptographic purpose, in which case an adversary can insert a node with a convenient key that may perform a DOS attack for a particular query look up.

Future directions of the work may include building mechanisms for healing partitioned rings. One way to check global consistency would be for each node to periodically ask other nodes to do a Chord look up. If it does not yield the result, there might be an existing partition. Another approach might be for nodes to maintain a long-term memory of random nodes they have encountered in the past. In case of a partition, random sets in one partition are likely to include nodes from other partition. Also for defending security loopholes, techniques like cryptographic hashing of IP addresses can be used which makes an attack difficult. Finally the lookup latency can be improved by using a server selection technique which can measure the network delay of each of the table entries pointed to by a node and use the node with minimum delay to forward the lookups.

 

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications May 29, 2009

(i) Three most important things

1. Peer-to-peer systems and applications should be distributed systems without any centralized control or hierarchical organization, where the software running at each node is equivalent in functionality. The paper presents a scalable lookup protocol known as Chord that is fully distributed so that no node is more important than any other which improves robustness and makes Chord appropriate for loosely organized peer-to-peer applications.

2.  The load between nodes in peer-to-peer systems and applications needs to be balanced. The Chord protocol maps keys to nodes. A node might be responsible for storing a value associated with a key. Chord uses consistent hashing to assign keys to nodes and consistent hashing tends to spread keys evenly over the nodes, providing load balance.

3.  Lookup service for peer-to-peer systems and applications should be scalable to a large number of nodes. Previous work on consistent hashing assumed that nodes were aware of most other nodes in the system but that would be difficult to maintain for a large network. Each Chord node only needs routing information about only a few other nodes because the routing table is distributed so a node can resolve the hash function by communicating with a few other nodes through messages.

(ii) Most glaring problem

The most glaring problem would be that the paper doesn’t consider that data items associated with a key could be substantially larger than others. Chord load balancing ensures no node will carry more than k/n items (k being the number of keys and n the number of nodes) but a node could still be storing a much heavier load than the other nodes.

(iii) Future Research Directions

Future research directions for this work would be to improve lookup latency and robustness against malicious users. A malicious set of Chord users could present an incorrect view of the Chord ring or an attacker could insert a node into the Chord ring with an ID immediately following the item’s key and have the node return errors when asked to retrieve the data.

 

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications May 29, 2009

1. Decentralized lookup with minimal hops
Chord realizes a distributed framework mapping a piece of information to the node corresponding to the information. This allows a node on the network to identify the node maintaining the information. The “information” being spoken of could be any value expressible as a key with a hash function. The mapping is achieved by using a common hash function to map each node identified by its IP address to a key. The information being accessed is mapped to a key as well. Knowing the two keys, a node on the network can identify the node that maintains the piece of information. The framework also supports portability of data over the network and

2. Efficient implementation of the lookup algorithm
In the Chord implementation, a node A seeking information maintained by node B accesses an intermediate node C which maintains the mapping. The hash of the information provides node A with a key which can be used to identify node C. On access node C, node A can determine the node maintaining the information of interest (node B). Chord realizes this while minimizing the amount of mapping entries that need to be maintained by each node and the number of hops that node A would take reach node C, both of which are conflicting requirements. Chord realizes an algorithm that allows node A to identify node C in minimal number of hops.

3. Dynamic and scalable.
The solution provided by the paper supports nodes joining and leaving the network on the fly. The complexity, information to be maintained and delay in lookup increases logarithmically with increasing nodes. Moreover, addition or removal of a node from the network results in the maximum of O(K/N) entries migrating, where K is the number of keys and N the number of nodes on the network.

Drawback

Introduces considerable overhead in maintaining mapping information. Each piece of information of interest, introduces a new entry into the mapping table.
It also expects the node interested in access the piece of information to know a priori the key corresponding to the piece of information.
Might result in triangle routing, wherein the node being used for lookup is further away than the node of interest (i.e. the node maintaining the information).

Future Work
A solution to minimizing the possibility of triangle routing could be worked out. This would take into consideration location of nodes in addition to the keys they generate. Caching could be introduced at different stages to minimize latencies and repeated lookups.

 

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications May 29, 2009

Chord solves the problem of scalable peer-to-peer lookups for applications by providing just one operation: given a key, it maps the key onto a node. Chord is a scalable protocol for lookup in a dynamic peer-to-peer system with frequent node arrivals and departures. It attempts to solve these 5 problems: load balancing, decentralization, scalability, availability and flexible naming.

Major Points:
1.)
Consistent Hashing
Chord uses consistent hashing with sha1 in a circular fashion that allows nodes to enter and leave  with very minimal disruption. The consistent hashing is what gives Chord part of its scalability and the sha1 gives Chord a deterministic behavior. Consistent hashing also balances load with a high probability by giving all nodes roughly the same number of keys.
2.)
Scalable Key Location
It would not be scalable if every node had to maintain information about every other node. For this, Chord has every node maintain a routing table with at most m entries where m is the number of bits in the key/node identifiers. It then uses modules of 2^m to create intervals around the Chord circle so that each node only needs to know about a small number of nodes and the nodes close to them. They use successor and predecessor pointers and a concept of finger tables for maintenance. With this they prove that they can with high probability, only visit O(logN) nodes to find a successor in a N-node network.
3.)
Joining/Leaving
To handle joins and leaves, Chord maintains predecessor pointers along with the successor pointers. Every time a node joins or leaves it must update the nodes that point to it or it points to like a link list. They prove that with high probability, any node in a N-node network will use O(log(N)^2) messages to re-establish the Chord routing invariants and finger tables. It is important that this is scalable because in a peer-to-peer network, nodes leave, fail and join very frequently. To deal with failures, Chord runs a stabilization protocol periodically on every node. This finds bad pointers in the tables and tries to fix them.

Glaring Problem:

Chord’s correctness is based off of probabilistic guarantees. There can be a given time where the pointers for the successor/predecessor are incorrect and data that is in the system can’t be located. It is just prove that with high probability this will not occur. Some applications need stronger guarantees though. Chord is also not very secure because of a bad peer could give bad tables.

Future Work:
Chord is important because it provides a decentralized scalable look-up for peer-to-peer systems. Chord can also be easily used too because of it’s availability as a library and it’s simplicity. Some of the future work on Chord that is mention in the paper include, security against malicious peers and the detection/healing of nodes who know each other.

 

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications May 29, 2009

Three major contributions:

  1. A distributed lookup protocol that does not require consensus, or two-phase commit in order to work. (i.e. nodes do not need a strictly consistent view of the network)
  2. A distributed lookup protocol requiring a node to store O(log N) information about other nodes (for efficient routing).
  3. A distributed lookup protocol that can deal with a reasonable amount of network churn (i.e. nodes joining/leaving).

A major problem:

They do not address load balancing in two situations. First, their placement of nodes is based on SHA-1, an uniform hash function, meaning that nodes have an equal probability of being placed anywhere on the ring. This characteristic, in practice, allows some nodes to be responsible for a large fraction of the ring while other nodes will be responsible for far less. Also, this system assumes that nodes are homogeneous. Fifty percent of lookup requests for a node are going to pass through the nodes predecessor. Therefore, nodes are bottlenecked by the capacity of their predecessor.

Future implications:

Distributed applications now have a scalable way to lookup other resources in a network without a centralized service. Future work could include addressing the above problems.

 

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications May 29, 2009

i)

1. Chord uses consistent hashing to ensure that the number of keys at each node is relatively balanced, and only a small number of keys must be moved when a nodes joins or leaves the network. Chord does this by hashing the IP address of each node and hashing each key as well. Then every node is responsible for all keys that are on the interval from that nodes hash value to the hash value of the node with the next lower hash value (with hash values wrapping around from 0 back to the maximum value). This allows the assignments to be relatively stable since a new node joining/leaving the network will only intersect one interval and only a portion of keys on that interval will need to be transferred between nodes.

2. Chord speeds up location of keys by dividing up the keyspace into m partitions where m is equivalent to the number of bits in the hash value. Then each node n assigns the each entry in it’s table a pointer to the node that is the successor to the key value n+2^(i+1) where i the the entry’s index. When this table reflects the current state of the nodes, this allows a node servicing a request to very quickly route it close to where the key resides rather than making the request travel all around the ring.

3. One problem is dealing with nodes dynamically joining and leaving the network. Chord deals with this by having a stabilization protocol that each node runs periodically. The stabilization protocol has a node n, query it’s successor s, to see if it is that nodes predecessor p, and if not it either updates its state to reflect that p is now n’s successor or that n is now s’s predecessor. This allows the all the nodes pointers to be kept up to date.

ii) One flaw in Chord is that a malicious attacker may be able to compromise the Chord ring by having a large number of clients from different ip addresses join the ring. Those clients would then take up a large amount of the key space and allow the attacker to disrupt the system or return erroneous values.  For instance if a user is looking up the location of a file, the attacker could instead point the user to a piece of malware instead.

iii) Future work would be to investigate how Chord could be made more secure and prevent malicious attackers. One mechanism might be a node could periodically insert a key with some nonce value and then attempt to retrieve the key and check to see if it gets the same nonce value back. If not, a gossip style protocol could be used to propogate the fact that the node thinks the successor to the key it inserted may be malicious.

 

Designing DCCP: Congestion Control Without Reliability May 29, 2009

The primary contribution of the paper is the design of the datagram based unreliable congestion control protocol. The protocol performs congestion control to avoid overwhelming the network, but does not guarantee reliability or in order delivery of packets. These features can be layered on top of DCCP and are typically not required for the applications that DCCP targets. The protocol itself is built in a modular fashion and provides a mechanism to support different congestion control algorithms based on the application need.

The detection of reverse path congestion is another nice idea that has not been achieved in TCP. However, I find it hard to understand how exactly that can be used. One potential possibility that I can think of is  that if congestion is detected on the reverse path, then delayed ACKs can be used with larger delay (more packets acknowledge by a single ACK). This avoids overwhelming the network with the use of ACKs.

The idea of using ACKs to indicate the last received packet rather than the lowest sequence number  not yet received (as in TCP) is another cool idea which leverages the fact that DCCP is an unreliable protocol unlike TCP. Also, the notion of ACKs indicating the receipt of particular packets rather than bytes, is inherent to the idea that DCCP is a datagram protocol rather than a byte stream based protocol.

One of the primary concerns with DCCP is that there is not sufficient incentive for a user to start using DCCP. Other users who use UDP would likely be able to get higher bandwidth from the network since they do not back off due to congestion. Applications like video streaming typically already include some form of application level feedback in the reverse direction which indicates the goodput and can hence be used to switch codecs based on the usable bandwidth (coarse granularity). Typically network congestion is a transient condition and switching codecs at that fine granularity when congestion occurs does not lead to perceivable performance difference for users.

Future research could look at how to provide incentives for users to adopt DCCP for datagram based protocols. Determining different congestion control algorithms to use with DCCP is another direction for research.

 

Designing DCCP: Congestion Control Without Reliability May 29, 2009

As an alternative to UDP, the authors propose DCCP which combines the unreliable nature of UDP’s delivery mechanism with TCP-like congestion control.

  1. Applications such as multimedia video and audio currently use UDP, but they implement their own congestion control mechanism at the application layer. The authors point out that this leads to enormous overhead on part of the application programmer, and any bug will potentially affect other flows in the network. Also, multiple implementations will lead to incompatibility and duplication of effort. This is the motivation behind DCCP protocol which like UDP prioritizes timeliness over reliability, but provides TCP-like congestion control for long-lived datagram applications.
  2. The authors make a credible case for why certain features were not included in DCCP, for what they consider a minimal DCCP design. This gives confidence in their design process. The minimal design omits flow control, selective reliability, streams and multicast, since all of these (except multicast) can be implemented above DCCP layer, without any changes to DCCP itself.
  3. The authors propose two schemes for congestion control. The first one is similar to TCP — as the available bandwidth changes, the sender rate (congestion window) also changes. One significant difference is that acknowledgments in DCCP are also congestion-controlled. The second method is based on TCP-Friendly Rate Control (TFRC). This uses the feedback from the receiver feedback.

My issue with thsi paper is that the authors have conducted little testing of application performance with DCCP. The focus has been mainly on the design decisions, but for a successful protocol, performance is equally if not more important. The obvious advantage of UDP is less connection overhead, which is lost with DCCP employing TCP-like connection management.

Though TCP dominates as of today, the future for a protocol like DCCP is bright because increasingly UDP traffic is steadily on the rise due to videos and internet telephony. The incomplete work on supporting multicast would go a long way in making DCCP more attractive, since live streaming of video on demand is generally thought to be multicast.

 

Designing DCCP: Congestion Control Without Reliability May 29, 2009

(i) the three most important things the paper says

  1. The initial observation that congestion control is necessary in connectionless protocols like UDP (and not as its own layer) is an important one.  The authors argue that having congestion control in any other part of the Internet model (in IP or on its own layer) would require input from each program that implements that protocol; a quite unreasonable demand from application programmers.  The authors also made the observation that having the congestion control in the protocol itself is really the best option for receiving information about packet loss that would be appropriate for a connectionless best-effort protocol.
  2. Another important observation that the authors make is that although there should be a “safe” set of parameters (where all mechanisms would work at today’s link speeds), there should also be a method in which to negotiate a smaller header size so that traffic that uses smaller packet sizes (or any other such difference from the “standard” type of communication) won’t suffer by using congestion control in its traffic.
  3. A third important observation that the authors make is that a customer of this protocol should have his or her choice of congestion control policies and should not be limited to a mandated policy.  Locking down this decision is often a protocol-killer as that particular design choice (that might not be the “right” one all the time) might not work for all applications.

(ii) the most glaring problem with the paper

I feel that the lack of meaningful simulation/verification is a large problem with this paper.  I found that it was hard to believe that this scheme will work entirely without some sort of empirical evidence of it working (also, it would be useful to see how this will affect different types of connectionless traffic, in terms of performance).  The authors provide some minimal simulation, but simulating this over other internet traffic (TCP, etc.) would be necessary in order to verify the usefulness of this protocol. (Let it be known that this paper was very through, and finding holes in it was a bit difficult for me.)

(iii) the future research directions of the work

One future research direction would be to profile applications using this protocol over UDP (mapping the slowdowns that the application will incur–if any at all, the effect on the latency of the connection, how the protocol coexists with other internet protocols, etc.) .  Another great topic to look at would be if it would be in the best interest of the protocol to look at improving its semantics without referring back to TCP for each comparison.  Some of the design decisions made could be better off without relying on a connection-based protocol for holding the “oracle” of how this connection should behave.

 

Designing DCCP: Congestion Control Without Reliability May 29, 2009

i)

1. There are many applications that want to send data in real time and the reliability of all packets being delivered is not important. TCP’s reliable delivery can cause delays when there is packetloss and UDP has no mechanism for congestion control. The authors propose DCCP, a UDP like protocol with congestion control, thus allowing it to be a network friendly protocol.

2. The designers were very worried about byte overhead in the headers. One problem with sequence numbers was they needed a large number of bits (48 bits) to avoid the chance of sequence numbers wrapping around. They came up with a clever design to optionally avoid having to send the 48 bit sequence number every time and instead send only a 24 bit sequence number on data packets. The mechanism they use is they keep track of a predicted upper 24 bits and simply extend the 24 bit sequence number on a received packet by using this prediction.

3. They built in a mechanism to detect misbehaving clients. One problem with a protocol like this is that clients can lie to the sender in order to avoid having the sender cut their sending rate, but also a client can’t be expected to receive every packet from the sender. To prevent the client from lying, a single nonce bit is included in every packet and when the client ACKs several packets it must also include the XOR of all the nonce bits for the packets it is acking.

ii) I think the major flaw in this paper was trying to design a new transport layer protocol. New layer 3 and 4 protocols face major hurdles in adoption due to many middle boxes such as NATs having to understand the protocol as well as the end clients. I think it might have been better to design this as a library using UDP packets and pack the DCCP header information that isn’t in the UDP header in the first few bytes of every UDP packet. In this manner the end client applications will understand the protocol, and middleboxes won’t present an obstacle to adoption.

iii) Future research directions should look at the possibility of designing a protocol that uses UDP packets as its underlying mechanism of delivery. This might entail changes to the protocol as well as examining whether the extra overhead is acceptable.

 

Designing DCCP: Congestion Control Without Reliability May 29, 2009

The Datagram Congestion Control Protocol (DCCP) is a transport layer protocol that values speed over correctness, e.g. streaming media, telephony, etc.

DCCP provides a choice of congestion control schemes. This option is exposed at the application level. The choices are either a selective acknowledgment TCP-style scheme using a congestion window with additional control based on the ratio of data packets to acknowledgments, or a scheme where the sender transmits at a certain rate based on periodic feedback from the receiver.

Retransmissions are not always desirable and are easily added above the transport layer. Thus, everything sent over the transport layer (packets and acknowledgments included) has a unique sequence ID number. Since there are no retransmissions, the acknowledgment ID numbers refer to the latest packet received, leaving the specifics of packets not arriving to other modules within DCCP. Furthermore, for some flows, a 6-byte long sequence number in the header may be a significant amount of overhead, especially when sending e.g. an 8-byte data packet; therefore, DCCP includes an alternate set of headers using a 3-byte long sequence number which is concatenated with 3 upper bytes at the receiver.

 

Designing DCCP: Congestion Control Without Reliability May 29, 2009

Three Important Things:

  • The motivating factor for the design of a new transport protocol is that many internet applications would benefit from unreliable data transfer, and most today use UDP. The UDP protocol lacks congestion control, which represents a significant disadvantage since applications do not want to implement that feature on their own. The solution to this is a new protocol called DCCP. The authors outlined a set of requirements inluding congestion control and robustness. They also outlined a set of omissions including flow control, streams, and multicast. This is an insightful point in protocol design as it illustrates the concept of minimum functionality. The provided features were kept simple and tailored the application needs, and the protocol did not include anything that could be layered on top or make the design overly complex.
  • DCCP makes a fundamental change to the TCP sequence number scheme. All packets including acks are assigned numbers from a 48-bit space, as opposed to bytes of data. Acknowledgements are sent to confirm the delivery of the last packet received as opposed to earliest not received. The consequence of this is there is no cumulative tracking of the data stream, and therefore a vector is required to keep connection history. Acknowledgements are themselves acknowledged periodically to help prune the control state. The large number space increases the robustness of the protocol, however there is still a need for an explicit synchronization mechanism in the case of an unexpected sequence number.
  • The paper proposes two methods for congestion control. The first is very similar to TCP with the added feature of reverse-path congestion detection. This is achieved by keeping track of the ratio of data packets to acknowledgments. Once the ratio leaves the target zone corrections are made to the sending rate. The second method operates on the principle of receiver feedback. The sender attaches sending rate information once per RTT, and based on this the receiver reports back the loss rate. This method results in smoother changes in the sending rate.

Glaring Problem:

The DCCP designers decided to implement a framework to offer selectable congestion control mechanisms via a flag in the packet header, as opposed to implementing a fixed algorithm. After extensive experience with TCP congestion control experimentation it has been shown that different mechanisms often interfere with each other and could lead to unfair bandwidth allocation among flows. The authors do not address this issue in their discussion, which is significant because the stated goal of the new protocol is the ability to perform effective congestion control for all applications.

Future Work:

Given the lack of performance evaluation for DCCP in the paper, there is much room for simulation and deployment research in order to evaluate the success of the new protocol and the possibility for modifying existing applications to run on a modified kernel with a DCCP stack. The parameters for success should be throughput, fairness, coexistence with TCP etc.

 

Designing DCCP: Congestion Control Without Reliability May 29, 2009

(i) Three most important things

1. A new protocol is needed to accommodate increasingly popular applications such as Internet telephony, interactive videoconferencing, streaming media, and interactive games. UDP provides unreliable, connectionless data transport, for applications where timeliness is the primary concern but users are forced to implements congestion control themselves. The paper proposes the Datagram Congestion Control Protocol (DCCP) as a new protocol that does not burden the application developer by providing congestion control.

2. Design goals for a new UDP protocol with congestion control would be minimalism, robustness, and framework for modern congestion control. The protocol should be simple in protocol design, implementation, and ability to leverage for applications and should have a reduction in protocol overhead. The protocol needs to be robust to withstand abuse and the protocol should support the congestion control features of TCP, yet still have the capability to evolve over time.

3. The headers of the new UDP protocol with congestion control must include sequence numbers that measure datagrams and ACK numbers should represent the most recent packet received. Sequence numbers should measure datagrams rather than bytes because unreliable applications generally send and receive datagrams rather than portions of a byte stream. DCCP’s ACKs report the most recent packet received and stores additional info in the ACK Vector, a TCP SACK relative that enables DCCP to report the specific packets received.

(ii) Most glaring problem

The most glaring problem would be that unreliable UDP flows are still a small fraction of the total Internet traffic so the case for DCCP just doesn’t seem convincing currently. For now UDP is good enough until unregulated UDP flows become a larger portion of Internet traffic.

(iii) Future Research Directions

Future research directions for this work would be to test DCCP over the Internet and demonstrate the viability of the new protocol in comparison to UDP and TCP and under what kind of networks loads.

 

Designing DCCP: Congestion Control Without Reliability May 29, 2009

(i) The three most important things the paper says:

1. There is a growing class of applications that are not well suited for the well-established standard transport protocol of today: TCP. This class is comprised of applications that no longer need or even want reliability in there protocol. They are the time-sensitive applications mainly for streaming video, telephony, online gaming etc. These applications reject the notion of retransmitting packets because by the time a retransmission is received the data is no longer pertinent and the bandwidth that could be used to send data for the current time has been wasted. The author presents DCCP as a protocol that can offer the features desired of TCP, like congestion control, without reliability.

2. In normal train of thought this sounds simple. If you view reliability as an extra constraint then you would assume that once this constraint is removed from the protocol the protocol will become simpler and easier to develop. This was however not true in this case. Surprisingly knowing that all packets will eventually arrive allows the acks to be a lot more meaningful since they tell you the entire history of what has been received with just one number. This was just one example of the challenges the authors of DCCP had to overcome.

3. The authors chose to implement variable congestion mechanisms that fit into DCCP. This will allow the applications to choose the one that best fits they’re needs. The first congestion control was very similar to the congestion control implemented by TCP. An improvement was to add an ack ratio so that only some percentage of the packets will be acked thus reducing the congestion added by acknowledgements. The other mechanism implemented by the author was TFRC where the receiver is responsible for acking only periodically and reporting how many losses it has detected. If no ack is received the sending rate must be halved figuring that the link has become very congested.

(ii) The most glaring problem with the paper:

The performance analysis section of this paper was oddly missing. It is hard to know if this protocol has any worth without a comparison on how it performs compared to current methods implemented by these applications using UDP or TCP. Since this crucial information was left out it makes it very hard to know if the goals of the protocol were adequately attained.

(iii) The future research directions of the work:

First, it would be worthwhile to test the current protocols performance before deciding directions in which to pursue researching. The author suggests many possibilities for improvement upon DCCPs current congestion control policies that should be assessed. I was curious if it may prove profitable to only care about the RTT and % of packets being acked instead of focusing much effort on knowing exactly which packets have made it. This makes the sequence numbers much less important.

 

Designing DCCP: Congestion Control Without Reliability May 29, 2009

UDP does not have any congestion control mechanism originally incorporated. It avoids TCP’s built in end-to-end congestion control mechanism. Till now UDP traffic volume was small relative to the congestion-controlled TCP flows hence network did not collapsed. Recent development of application such as VoIP, on-line games, streaming audio has preference for timeliness over reliability and in-order requirements. This long-lived connection is sometime supported by application level congestion control. DCCP provides an enabling technology for existing and new application to transfer data without destabilizing the Internet. Preference for low overhead, clean and minimal protocol results in design decisions to leave reliability or Forward error correction to be layered on top rather than on transport protocol.

Major problems with UDP are problem of causing network failure due to non-congestion-controlled datagram flow, difficulty of correct implementation, lack of standard for congestion feedback transmission. Reason of using UDP over TCP includes, less start-up delay to avoid three-way handshake before data transfer, they do not wish to hold connection state, trading of reliability against timing due to application nature.

DCCP provides a minimal solution both in functionality and mechanism so it does not provide functionality that can be layered above it by the application. It also achieves the minimal header size (important for applications like VoIP). It is robust against data injection, connection closure and denial of service attack. Its goal is to support timing-reliability trade-off needed by UDP based applications.

DCCP implementation is a unicast connection oriented protocol with bidirectional data flow. Connections start and end with three-way handshakes. It depends upon sequence numbers and detects loss without application support. Every packet occupies sequence space and uss a new sequence numbers. It is different from TCP in a way that ack number corresponds to the latest packet received rather than the earliest not received. This requirement is driven by the application type which might not be interested uin lost packet retransmission.

Synchronizations are supported to resolve the unexpected sequence number/ack number. Unique packets are used to help mutual unsynchronized end points to synchronize. DCCP provides a set of congestion control mechanisms appropriate for applications. These choices are made through Congestion Control Ids(CCIDs). This is negotiated at the connection setup time. CCID 2 is TCP type congestion control and ACK uses Ack Vector option which is similar to TCP SACK. It uses a congestion window. CCID 3 uses sending rate rather than congestion window. If no feedbacks are received for several RTTs then sender reduces the rate by 2. It supports checksum to cover less than entire data which prompts underlying layers to forward corrupt datagrams. Congestion loss can be differentiated from corruption los by reporting corruption separately by the receivers.

Overall DCCP is a simple protocol that manages congestion controlled connections without reliability and caters to the UDP type application which requires reliability-timeliness tradeoffs.

 

Designing DCCP: Congestion Control Without Reliability May 29, 2009

This paper talks about a new protocol, Datagram Congestion Control Protocol (DCCP) which provides a midway between a protocol like TCP (which has congestion control but doesn’t emphasize timeliness) and UDP (which doesn’t have congestion control but can be used to enforce timeliness). The authors talk about the need of such a protocol in applications like streaming media, internet telephony, videoconferencing, games etc. where timeliness is preferred over reliability. At the same time, these applications do not want to implement congestion control mechanisms themselves which is taken care of in a protocol like TCP but it can introduce arbitrary delay between the byte streams. Therefore, authors propose a new protocol for time sensitive applications where choice of congestion control mechanism is desired, low per-packet overhead is needed and buffering control is required to stop delivering old data. Also, the feature for Explicit Congestion Notification (ECN) which marks the congested packets and a TCP like explicit connection setup and tear down for NAT and firewall support is required. The proposed protocol borrow certain good features of TCP and UDP like port numbers, checksums, sequence numbers(with difficulty), acks (congestion and ECN info), piggybacked acks and three-way handshake to set up and a two-way with wait to tear down the connection. The protocol introduces new concepts like negotiating congestion control (CC) mechanism and parameters on setup and two half way connections. Half way connection is emphasized since traffic is typically asymmetric with different routes implying different congestion issues. Also, it is better than two one-way connections to work better with firewall and NAT and piggybacking ACKs with data. The selection for both half-connections is done in parallel at startup and is generic and extensible. The basic packet structure used is similar to UDP but is smaller and extensible for additional features. The CC mechanisms are ‘plug and play’ whose parameters for both ways are hosen during setup. The two CC mechanisms talked about are TFRC (control equation for changing the sending rate) and TCP-like congestion control with tweaked parameters. The mechanisms are selected using Congestion Control IDs (CCIDs) which names standardized CC mechanisms. In TFRC, the receiver sends feedback to the sender roughly once very round-trip time reporting the loss event rate it is currently observing. The sender uses this rate to determine its sending rate and defaults to TCP-like halving of rate if feedback is not received for several round-trip times. The checksums are partial and covers DCCP header and optionally any number of bytes in the payload. It allows delivery of slightly damaged data and can be useful for error-prone links like wireless.  In DCCP, ACKs are sent for latest received packet rather than earliest not received in TCP. It allows for windows to be resynchronized when out-of-window packet is received. There is also ECN mechanism to prevent misbehaving receiver.

 

Designing DCCP: Congestion Control Without Reliability May 29, 2009

1. A bare framework in line with the end-to-end argument
DCCP is a congestion control protocol for unreliable connection oriented network. The protocol provides only the bare necessities required for congestion control. All other features have been left for the higher layers to implement. This makes it protocol independent allowing it to support a large number of existing and future protocols. It also makes it application independent which is an important feature considering the type of traffic it was directed at. It is likely that applications for multimedia, VOIP, video streaming etc have advanced mechanisms to implement fine-grained control over the arriving data. Not providing these features in DCCP provides these application the flexibility to implement these feature in a manner most suitable for their needs. This also allows each connection to negotiate its own features depending on the application at hand.

2. Specialized sequencing scheme
DCCP differs from TCP in method it uses for sequence numbering. Every packet in DCCP, regardless of the type of packet it is, gets its own sequence number. Moreover, each packet in DCCP gets a sequence number as compared to a byte-wise sequencing scheme in TCP. Hence the entire exchange maintains a single sequence of numbers for all its packets. This would also imply that each connection would require a larger sequence space.

3. Congestion control specialized for traffic requiring timely delivery over minimize-packet-drop delivery.
DCCP makes several enhancements/changes to TCP congestion control. It provides acknowledgement of acknowledgments. Its sequence number reports the latest packet received as compared to the earliest packet not received since applications here are more concerned with timely delivery as compared to confirmed, every packet delivery. DCCCP also leverages the unidirectional nature of its traffic by maintaining acknowledgment history and order in only a single direction.

Drawbacks
Some of the features provided by them to support real-time traffic adds some over-head over the congestion-control provided for by TCP. A sequence number for every packet implies a larger space for sequence number (48 bits in DCCP as compared to 32 bits used by TCP). The paper does talk about selectively using the upper 24 of the 48 bits. The protocol also requires acknowledgment of acknowledgment packets which would add to the network traffic. The Ack Vector, if used, would lead to even more overhead
In trying to exploit the unidirectional nature of their traffic, a direction goes into a “quiescent” state if there is no traffic in the direction for more than 0.2 second. This could lead to obvious issues if the application involved generates data in bursts.

Future Work
DCCP follows a policy of bare framework. This, while having its advantages, leaves out several hard to implement features to the upper layers. It might be possible to selectively provide these features in DCCP. The paper does mention support for feature negotiation. The more complex features (such as support for multicast) could be added on during feature negotiation. However, there is no end to the additional features that can be provided.

 

Designing DCCP: Congestion Control Without Reliability May 29, 2009

This paper describes the DCCP protocol, which is designed as an alternative to UDP which will provide the same timeliness prioritization over reliability while adding the congestion control that UDP lacks.

The main points of this paper are as follows:

  1. Congestion control should be decoupled from the protocol itself. Different applications have different communications requirements. While some prefer discarding corrupted information, others prefer receiving it. In other cases, some datagrams are more important than others (I-frames vs. B/P-frames in MPEG video), so if some have to be dropped due to buffers, it can be helpful to be able to handle some more carefully than others. DCCP provides flexibility in congestion control by acting as a congestion control framework which different congestion algorithms can be “plugged into.” This has the added benefit of allowing different congestion control for halves of a bidirectional communication, since many of the applications of UDP (and subsequently DCCP) tend to be mostly unidirectional.
  2. The main requirement for congestion control is sequence numbers. These allow the protocol to detect loss and respond accordingly. Even though DCCP does not require retransmission, it also does not preclude it, if the application determines that there is sufficient time to make it worthwhile. However, as opposed to TCP, which numbers bytes in a byte stream, UDP and DCCP provide an abstraction in terms of datagrams, not bytes. Therefore, the sequence numbers in DCCP are specified in terms of packets. They also took the opportunity to specify a variable number of bits in the header for sequence numbers. Whereas TCP only specifies 32 bits which are used to specify bytes in the byte stream (which is becoming more and more of a problem with gigabit and higher bandwidths), DCCP can use up to 48 bits to specify a number of packets. This provides resilience against both malicious users (trying to run reset attacks is much harder than it is against TCP) and mistimed packet arrivals. However, they also recognized that not all applications need the overhead of 48-bit sequence numbers, so they created an option to use 24-bit numbers instead. This opens the attack window for reset attacks, but reduces the overhead when it is unlikely that mistimed packets will overlap a window in 24-bit space. This provides the application developer a choice of resilience or overhead.
  3. Connection management should provide flexibility in usage models. In order to provide congestion management, DCCP is required to use a connection-based paradigm more similar to TCP. However, since connections over DCCP can often be mostly unidirectional, it is helpful to be able to configure the half-connections separately. DCCP provides this through feature negotiation, wherein both sides of a connection can agree on the policies to be implemented at one or both endpoints. In addition to adding per-direction policies, DCCP provides mobility support by allowing multiple connections to tie into a single session. This is made much easier since DCCP is an unreliable protocol and therefore requires much less synchronization upon the change in an endpoint’s location.

The main weakness in this protocol is that it is not backwards compatible in any useful sense with either TCP or UDP. While it is careful to be TCP-friendly (and indeed, compared to UDP, almost anything is friendlier to TCP), it requires fresh implementations on end-nodes, NATs, etc., which is a distinct barrier to deployment.

It would be interesting to see future research on this protocol using an overlay on a UDP system. Since NATs already support UDP, it might be an effective way to test the congestion control policies without requiring multiple nodes on the path to understand a new protocol. However, it remains to be seen whether it would be as effective as an overlay.

 

Designing DCCP: Congestion Control Without Reliability May 29, 2009

TCP is the predominate Internet transport protocol. TCP ensures reliable delivery and limits its transfer rate in order to avoid network congestion. UDP follows TCP in popularity. UDP performs unreliable delivery without congestion control. UDP is used when real-time delivery of data is paramount, i.e., VOIP. DCCP, Datagram Congestion Control Protocol, is new protocol that performs unreliable delivery with congestion control.

DCCP has a modular design that allows the implementation of new congestion control algorithms. The application designer specifies which congestion control algorithm is used in each direction of a DCCP session. The initial design includes 2 protocols. A TCP-like algorithm quickly determines the maximum flow rate and quickly adjusts to changes in available capacity. A second protocol option more easily maintains a constant rate.

DCCP has a default API that is similar to UDP. Its default simplicity will facilitate its use as a UDP replacement. The protocol does allow the user to specify a desire for greater timeliness versus reliability. Real time applications e.g., VOIP, games, often have latency bounds on delivery beyond which the data is useless. A DCCP programmer can specify that newer data be presented and old data be deleted from the receive buffer.

One definition of perfection is when there is nothing left to remove. The DCCP designers followed this motto. Flow control, selective retransmission, streams, and multicast were purposefully not included in DCCP. Consideration for security is included. The minimum facilities to meet the problem were implemented. Additional function is left as an exercise for the application programmer.

Current congestion control methods are limited to the various flavors of TCP. New congestion control algorithms are designed to be TCP friendly. DCCP modular approach to congestion control is a boon for congestion control algorithm designers. The coming onslaught of congestion control algorithms may interact unpredictably. It is no longer sufficient to be merely TCP friendly. An objective method of appraising congestion control algorithms is required.

The document points out the state of the art in congestion control is still advancing. Objective metrics for the analysis of congestion control algorithms are required. Historically, comparision with TCP, i.e., TCP friendliness, has been the metric. TCP friendliness may still be a valid metric if its transitivity can be shown.

 

Designing DCCP: Congestion Control Without Reliability May 29, 2009

Filed under: R19. Designing DCCP: Congestion Control Without Reliability — krishnanadh @ 7:55 pm

The paper proposes the Datagram Congestion Control Protocol (DCCP), an unreliable transport layer protocol with congestion-control. The authors of the paper motivate the need for DCCP by the increase in applications, such as streaming media and telephony, which utilize the UDP protocol. They point out that application developers are reluctant to implement their own congestion control as it is too complex, but are willing adopt one if it is serviced as a part of a the protocol implemented inside the OS. The paper begins by surveying the range of applications that use a connectionless protocol, such as telephony, streaming media, interactive games and examine the requirements of each. Based on these requirements, the paper presents the following primary goals for DCCP: minimal in both functionality and mechanism, robustness in the face of attackers, NATs and firewalls, support for various application types and provide framework for development, self-sufficiency by providing simple API without application intervention and support for applications requiring fine-grained control. To remove the complexity of a TCP like protocol the authors decide to omit flow control, selective reliability, streams and multicast. The main point discussed in the paper are DCCP’s functioning, its connection management and congestion control.

DCCP is connection-oriented protocol that operates on sequence numbers for invidual datagrams rather than bytes of the packet. The protocol begins with a three-way handshake similar to TCP and a new sequence number is used for every packet. Synchronization packets are used to handle the case where a large amount of packet loss results in unexpected sequence numbers. Acknowledgements are handled in a way similar to TCP except that acknowledgments themselves require acknowledgment like the TCP SACK from time to time. These acks of acks are piggybacked in data packets for two-way communications. Also, acknowledgments do not guarantee delivery of data to the application they are only used as hints of probable packet thus removing much retransmission and making the overall protocol unreliable. The senders are often given the flexibility to choose the length of the sequence numbers used by DCCP to reduce the complexity of their application and make them more sensitive to latencies. On the whole DCCP is proven by the authors to be more robust to sequence number attacks than TCP.

Congestion control in DCCP is divided into two main categories, one that is similar to TCP and mimics its functionality and the other TFRC which is TCP friendly. The TCP-like mode differs in that it also employs reverse-path congestion control. It is achieved by continually changing the frequency of acknowledgments the sender requires. The TFRC mode is a rate-based mode rather than window-based. Acknowledgments are thus required once per RTT and the rate is adjusted accordingly at the same frequency. Several measures included for misbehaving senders and receivers are discussed in the paper. Connection management is made efficient by try to avoid denial of service attacks (DoS) that fill up memory on a server with half-open connections. To address this DCCP is designed so that state information is primarily maintained on the client-side thus avoiding server side burden.

The paper puts forth an important proposition when it is states that designing a congestion control mechanism is complex since Internet router behavior is not fully understood. The paper also highlights the difficulty to decouple reliability and congestion control which are tightly intertwined in TCP thus opening up an argument of DCCP-like protocols which require much effort to deploy going forward. However, the deployment of the ideas presented requires revamp of some TCP ideas and requires implementing them from scratch rather than overlaying them on TCP structure by stripping of its features. The most important point proven in the paper is that even unreliable layer4 protocols cannot completely do away with congestion control since efficient bandwidth usage is of utmost importance thereby opening up an arena research areas which could delve deeper into this philosophy and come up with more protocols that suit applications like telephony, streaming media and the like. Overall the authors do a commendable job of presenting a new protocol to compete TCP in certain applications.

 

Designing DCCP: Congestion Control Without Reliability May 29, 2009

(i) The three most important things the paper says:

1) DCCP’s decision to use the “latest packet received” for its ackno, and supplement with “Ack Vector” for summary information.  Kohler et al made this decision since when considering protocols without reliability, the latest information is the most important.  However since summary information is still important for congestion control, they added “Ack Vector” which is an additional cumulative history.  This was an important design choice since this feature was the cornerstone for providing congestion control (Ack Vector) at the same time not loosing the importance of acks (used for security, out of order,…).

2) The use of “feedback” in the implementation of TFRC Congestion Control along with how they deal with trust issues with the receiver.  Kohler et al designed this particular implementation of congestion control by using feedback from the receiver.  This solution was keen enough to avoid the “seesaw” problem of TCP like approach, which is not useful for applications such as VOIP.  Although in my opinion this implementation may not be good enough, since using information from the receiver doesn’t take into account other users of the same bandwidth (flow path), it still offers some smarter logic that can be leveraged by higher level protocols.

3) The carefully selected features that it deliberately left out of the design of DCCP which prevents it from making “assumptions” about higher level applications.  However Kohler et al, didn’t do nothing, instead they provided features that better enabled higher level applications.  For instance, their inclusion of “CCID-3” for congestion control can easily be used to add features on top of DCCP for flow control.  I felt this was an important contribution because minimalism and simplicity help with adaptation and security respectively.  Both of which are important for the success of the protocol.

(ii) The most glaring problem with the paper:

The biggest problem with the paper is the section on acknowledgements and saving sate information.  It made the right statement when it said that “pure acknowledgements must occasionally be acknowledged”.  However when asked the question of “infinite regression”, it simply said that it was sufficient to have a single acknowledgement.  This response was not enough to answer the question of “lost acknowledgements”, considering the Byzantine soldier problem.  Since lossyness is ok, I believe they should have added a “timeout” in which they clear the buffer of really old state information.

(iii) The future research directions of the work:

The future research of the work would be in the area of finding a better alternative that resolves the trade off between “short sequence numbers” vs “security”.  At this time, using short sequence numbers (better performance) will leave a bigger window for attackers to inject data.  Considering that, unlike UDP, there is a handshaking step for connection setup and end, perhaps there is some secret information that can be exchanged.  And this “secret Perhaps adding some sort of “pseudo randomness

 

Designing DCCP: Congestion Control Without Reliability May 29, 2009

Application writers of VOIP, interactive games, and streaming media programs use UDP over TCP since these applications cannot tolerate arbitrary delay and do not require reliability. Using UDP will burden application designers with the need to implement their own congestion control.

The authors present DCCP as a protocol designed for this set of applications with the goals of minimalism, robustness, modular framework for congestion control, self-sufficiency, and support timing-reliability tradeoffs. The authors argue that having each application designer write their own congestion control mechanism is too complex and can be dangerous.

DCCP is connection oriented, support explicit synchronization, has a 48-bit sequence number for security, mobility and multihoming support. Data and ACKs occupy the same sequence space so that ACKs can be acknowledged. The sequence number is incremented at a packet granularity rather than byte granularity. Cumulative ACKs doesn’t make sense in a unreliable protocol so ACK vectors were introduced similar to TCP selective acknowledgements. With this, DCCP can report specific packets received if there were lost packets. ACK vectors can grow to unbounded size so ACKs of ACK vectors will allow the receiver to clear it’s state. DCCP partial checksums can differentiate non congestion related corruption for congestion control over lossy links and also allow for corrupted packets to be accepted.

DCCP implements modular congestion control by negotiating a CCID. CCID = 2 allows for a TCP like congestion control with a congestion window and uses Ack Vectors for feedback. Unlike TCP, ACK  Ratio controls the ratio of data packets per acknowledgement. This provides a reverse-path congestion control and the ratio is capped to provide sufficient timely ACKs. CCID =3 provides TFRC congestion control where instead of a congestion window, the sender uses sending rate and depends on receiver feedback and does not require ACKs for ACK vectors.

The main problem that still needs to be resolved is related to whether routers drop packets or bytes of a packet. This affects the congestion control and fairness at bottleneck routers. The paper didn’t conduct any benchmark for performance differences between applications running UDP w/ own congestion control and DCCP so it’s not clear whether DCCP is a clear winner.

Further research can be done to compare congestion control layer on top of UDP, DCCP, and proprietary protocols. The benefit of a modular congestion control mechanism can open research into new innovation in this area. Further research to resolve middlebox issues can be done to ensure compatibility.

 

Designing DCCP: Congestion Control Without Reliability May 29, 2009

DCCP is a protocol that provides congestion control without reliability. It is a unicast, connection-oriented protocol with bidirectional data flow. This paper addresses the results of designing a protocol with the new challenges that were not present when TCP was designed. Their goals were to make a very minimalistic, robust and self-sufficient protocol. DCCP had in mind telephony, games and streaming media in it’s design of the protocol. These applications prefer to have occasional losses or corrupted packets instead of the overhead of reliability and checksums.

Major Points:
1.) Sequence Numbers
DCCP must be able to detect losses without application support so sequence numbers need to be used. The sequence numbers are measured in datagrams instead of bytes because it does not need to be reliable. DCCP wants to apply congestion control to acknowledgements, negotiations and connection setup/teardown. To do this every packet occupies a sequence space and uses a new sequence number. DCCP also implements a syncing functionality that can sync the receiver and sender when a large burst of data is lost. Although similar to TCP’s sequence numbers, DCCP found a lot of challenges in just implementing a TCP like sequence number scheme.
2.) Congestion Control Framework
DCCP gives the application a choice of congestion control mechanism. This is important because Internet router behavior is not well specified at the time so there is no right answer for congestion control. With the framework many different congestion control mechanism can be tested. The framework also allows different applications to dynamically change their congestion control mechanism without changing protocols. There are special option fields that help two end points negotiate on which congestion control mechanism to use.
3.) Security
Although security was not a big concern when TCP/IP was designed, it is a huge concern to DCCP. To get around some DOS attacks, DCCP tries to push state to the client whenever possible using something like an Init Cookie option. DCCP can also push Time-Wait state onto willing clients and has rate limits. For misbehaving receivers, DCCP sends an unpredictable nonce value that the receiver should echo back. DCCP also uses 48 bit sequence numbers to deal with sequence number attacks.

Glaring Problem:
The biggest problem I found with this paper is that it did not have any benchmarking experiments. It is hard to believe that this protocol is more important than the related work without seeing results that confirm it is.

Future Work:
DCCP is in the waiting process of whether or not it will be adopted. What makes DCCP a major impact is the congestion control framework that they implemented. Since getting congestion control is a big question mark, researchers can use DCCP’s congestion control framework to implement and use different congestion controls. This will help further research in congestion control techniques.

 

Designing DCCP: Congestion Control Without Reliability May 29, 2009

Filed under: R19. Designing DCCP: Congestion Control Without Reliability — filipposeracini @ 7:54 pm

Many multimedia applications deal poorly with the TCP protocol because of its retransmission feature. Applications like Internet telephony tolerate much better a packet lost than a delayed one. The natural choice for this kind of applications is to use the UDP protocol. However UDP does not provide any congestion control mechanism. As result, the applications using UDP are forced to implement the congestion control themselves. This makes application development much harder and error prone.

The protocol presented in this paper, the DCCP (Datagram Congestion Control Protocol), addresses the congestion control over an unreliable transport protocol.

The three most important aspects of this protocol are:

  1. Congestion control framework. DCCP does not provide a single mechanisms to deal with congestion control. Instead it provides a framework. This allows two end-users to establish the congestion control algorithm they want to use during the handshaking phase. Without constraining the congestion control to a single mechanism, DCCP becomes a very flexible protocol that leaves plenty of room for further optimizations and changes. Being a framework has also two other benefits: first, it facilitates the adoption of DCCP among the set of common transport protocols; second, it allows applications to use the congestion control algorithm that better satisfies their needs.
  2. Flexibility. Multimedia application needs can vary broadly. For example, some application may only have a few bits of data to transmit, where others could have a substantial amount. In the former case, the header of the transport protocol should not add too much overhead. DCCP is able to adapt some of its fields, in particular the sequence number field’s length, accordingly to the application’s requirements. In this way it can be adopted by a much wider range of applications.
  3. Security. Nowadays protocol must provide security and be as much robust as possible  against attacks. This was not a prerequisite at the time when TCP was created. The authors of DCCP instead set security as one of the main requirements of the protocol. Through a very long sequence number field, explicit synchronization of the sequence numbers between the sender and the receiver and changes in the handshaking and closing connection procedures, DCCP was conceived to be a quite robust protocol that should resist to the most common types of attacks. For instance, DCCP should be able to resist a typical DoS attack because it moved the Time-Wait state from the server to the client, hence preventing the server from running out of resources.

The most glaring point of this paper is that a performance analysis and comparison with existing unreliable transport protocols is completely missing. Indeed the performance of a protocol is as much important as its mechanisms. Another point that is missing is a real test of its robustness against attacks. Accordingly to the paper, the only testing that has been done on DCCP is some sort of formal modeling. As direction for further research I would definitely go through a performance analysis and comparison and then I would try to identify types of applications where DCCP works better or worse than existing solutions.

 

Designing DCCP: Congestion Control Without Reliability May 26, 2009

This article described DCCP or the Datagram Congestion Control Protocol. This protocol was an attempt by the authors to develop a protocol with the low overhead of UDP but with congestion control built in, similar to TCP. The driving reason behind this was the rapid growth of real time Internet operations such as telephony and streaming media. The key points of this paper were:

1.Congestion control in UDP applications is currently only implemented in the application layer. However, this is problematic due to the fact that congestion control is very difficult to implement correctly, as witnessed by the long evolution of TCP. It is also something that developers would rather not have to deal with, possibly don’t care about (to the detriment of the network) and might possibly implement incorrectly, causing even greater harm. By standardizing a protocol, the implementation would be taken from the developers, theoretically to the benefit of themselves and the network.

2.DCCP is implemented as a flexible framework that allows the developers to choose which method of congestion control suits them best, as opposed to one algorithm that they all must follow. The reason for this is that different applications may have different requirements and the developer is able to tailor the controls to best suit their application.

3.DCCP supports explicit synchronization and packet sequence numbers in order to ensure that the two host do not become out of sync and that the receiver can recognize what packets are relevant. The reason that this is done, rather than the method utilized by TCP is that the unreliable protocol does not need to ensure that every packet is received by the end host. This actually has advantages to the method used by TCP in that connections can by re-synced rather than just reset should an unexpected situation arise.

The main issue that I saw with this paper was the lack of measurements comparing this protocol to TCP and UDP. While I thought that this paper did an excellent job of describing an overview of DCCP, I would have like to have seen them back of their claims, at least to a small degree. This protocol seems like it might have a lot of promise but it will be difficult for operating systems to justify including it and for developers to implement with it if they are not sure that it really will improve performance.

Research in this area is likely to increase as the amount of real-time and streaming data across the Internet increases. People will continue to try to improve on existing protocols in order to optimize their application and maximize the available bandwidth. It does seem interesting that there are only two primary protocols as it would seem that several more could be adopted with minimal confusion in order to more efficiently deal with the traffic of emerging communication trends.

 

Structured Streams: a New Transport Abstraction May 25, 2009

Filed under: Extra Papers — pefaymon @ 4:17 pm
Tags: ,

The paper “Structured Streams: a New Transport Abstraction” by Bryan Ford of MIT proposes a network transport abstraction based on an application of the composite pattern to network connections between applications. The abstraction provides a lightweight mechanism to create substreams of existing network streams which are forming a tree-like inheritance structure. This approach has a number of advantages, including a reduction of overhead for applications which need to open a number of connections to the same host end point and increasing the level of control an application has over its opened connections. This provides an opportunity to favor the delivery of control packets over data stream packets for example for a multimedia streaming application.

The described Structured Stream Transport (SST) assembles a collection of design patterns current applications running over TCP and/or UDP exhibit and tries to address their flaws and combine their advantages into a new transport protocol. The protocol provides to an application the service of transporting an arbitrary number of ordered, reliable byte streams which are in an hereditary relationship to each other. On the other end, it relies on an underlying protocol which can handle packets and deliver them with best-effort delivery. In between, the proposal multiplexes multiple streams onto channels modeled after IPsec, to provide security and reliability (services which are needed by all streams).

Only for the top-level root stream a 3-way TCP handshake is performed, for the spawned child streams inherit the connection information implicitly by being mapped on the already established channel, therefore reducing overhead. Flow control and security is controlled on a per-stream basis, allowing flexible application designs. Since the congestion control operates on a per-channel basis, it is easier to distribute the network bandwith more fairly between applications, because the number of  open streams has no influence on the congestion control. The delivery of data packets up to the application happens on a per-stream basis again, so packet losses on one stream don’t influence other streams.

Furthermore, the paper indicates a number of ways in which applications can benefit from this abstraction by effectively reusing existing connections and utilizing the flow control / prioritization mechanisms for ongoing streams. Also, a prototypical implementation and experiments are presented. The results indicate the performance for simulated web traffic to be on par with HTTP/1.1 with pipelining, which is not enabled in many browsers due to compatibility concerns, as the paper states. The performance impact of the user space prototype is moderate and below 10% overhead.

What is missing is a more extended and elaborate experimental section, where the protocol would be used on a network with realistic side traffic from many hosts and where other applications than HTTP traffic are tested. This should provide valuable insight into whether it is feasible to include this structured mechanism into future applications and gives also more implementation experience for feedback on the developer’s end.

Future work for this protocol apart from extended experimental validation is to provide an application-level framework for networked applications to facilitate developer adoption. Also, there should be some research into the question if it is possible to leverage this abstraction asymmetrically, only on one side of a connection, at first.

 

High-Bandwidth Data Dissemination for Large-Scale Distributed Systems May 25, 2009

Filed under: Extra Papers — dgaschk @ 4:17 pm

The popularity of large scale file distribution overlay networks such as BitTorrent grows steadily. This document describes Bullet, a self-organizing overlay network that outperforms BitTorrent and other contemporary distribution systems. Bullet uses a hybrid push-pull model for distribution and eschews a tree based topology for a more efficient mesh based model. The source does not retransmit a block until all blocks have been sent at least once.

A tree based distribution model has been shown to offer poor performance to the leaves. The mesh based system allows the formation of dynamic peer groups that exchange data eliminating the starvation that occurs with a tree topology. An established algorithm for peer group formation is used. Under-performing peers and peers without useful data blocks are dropped allowing more useful peers to be added.

Data blocks are sent in disjoint block sets. This makes it more likely that a randomly chosen peer will have required missing pieces. The peer selection algorithm facilitates the formation of peer groups that have complementary data portions. It is not possible that some subset of peers has not received required missing data after the sources initial transmission.

The system is self-organizing. User configuration is kept to a minimum. Adaptive dynamic operation allows the system to perform well under a variety of conditions. Static configuration is not amenable to the changing conditions encountered on the Internet. Self-organizing dynamic operation allows Bullet to outperform its contemporaries.

The self organizing nature of the system does not prohibit interference from self created cross traffic. A greater understanding of the physical topology could help to alleviate this self generated interference. Optimal exploitation of the physical topology could further improve performance.

Overlay networks are unable to perceive physical topology and this limits performance. Physical topology information is available in router databases. Integrating router information into the peer group selection algorithm would improve performance. Network multicast is able to exploit physical topology but for a number of stated reasons it alone is not sufficient for high performance data dissemination. A hybrid overlay-multicast system may prove most efficient.  Localized multicast groups in combination with a self organizing overlay is an interesting research direction.

 

Analysis and simulation of a fair queuing algorithm May 25, 2009

Filed under: Extra Papers — nekumar @ 4:17 pm

Paper presents a fair queuing algorithm for congestion control in datagram networks. Goal is to provide a fair allocation of bandwidth, lower delay for sources using less than their full share of bandwidth.  Congestion at Gateways can be controlled through routing and queuing algorithms. Adaptive routing lessens the congestion by routing packets away from network bottlenecks. Queuing algorithms do not affect congestion directly because they do not change the total traffic on the gateway’s outgoing line. These algorithms determine the way in which packets from different sources interact with each other which in turn affects the collective behaviour of the flow control algorithms. This aspect makes it a crucial component in effective congestion control.

Queuing algorithms is essentially allocating three nearly independent quantities: bandwidth (which packet gets transmitted), promptness (when do these packets get transmitted), and buffer space (which and when packets get discarded by the gateways). Most commonly used queuing algorithm FCFS intertwines all these three aspects because order of arrival ompletely determines all these three metrics. This algorithm is not fair because depending upon the packet size one flow can get more resources compare to other flows. FCFS does not have the property of functioning well even in the presence of ill-behaved sources.  A single source sending data at high rate can capture an arbitrary high fraction of the bandwidth of the outgoing line.

Nagle proposed an algorithm called fair queuing in which gateways maintains separate queues for packets from each individual sources and queues are served in round-robin manner. This scheme prevents a source from arbitrarily increasing its share of the bandwidth or delay of other sources.

Paper proposes a modified version of Nagle’s algorithm and provides simulation data comparing networks with FQ gateways and those with FCFS gateways. As the other components of congestion control mechanism source flow control, gateway routing, and gateway queuing algorithms, interact in an interesting and complicated ways, it is impossible to assess the effectiveness of any algorithm without reference to the other components of the congestion control in operation. Paper evaluates the queuing algorithm with in context of static routing and several widely used flow control algorithms.

Three goals of the paper are: describe a new fair queuing algorithm, provide rigorous understanding of the performance of this new fair queuing algorithm and evaluate the algorithm in the context of real networks.

Paper defines fairness as follows: (1) no user receives more than requested.  (2) no other allocation scheme has a higher minimum allocation. (3) the above 2 should remain true even if the user with the minimum allocation is removed and the total resource is reduced accordingly.

Paper defines the user to be a source-destination pair, which appears to be the best tradeoff between security and efficiency. Fair queuing algorithm works as follows: whenever a packet finishes transmission, the next packet to be transmitted is the one that will finish being sent the earliest (in the non pre-emptive case). In order to account for promptness, paper suggests giving less delay to users who utilize less than their fair share of bandwidth. This gives lower queuing delay to lower throughput sources. Paper also present a delay-throughput curve to show the fairness of this queuing algorithm over the First-Come-First-Serve (FCFS) algorithm (delay tends to infinity as throughput reaches 100% of available bandwidth for fair queuing but not for  FCFS). They also show that the delay is lower for a low throughput source with fair queuing as compared to FCFS.

Paper then briefly describe several flow control algorithms used at the source and present simulations of using these algorithms in conjunction with fair queuing and FCFS. The simulations show the throughput, average RTT  of packets, number of packets retransmitted, and number of packets dropped  for each combination of flow control and queuing algorithms, and for each of  the six scenarios simulated (including underloaded gateway, overloaded  gateway, ill-behaved source, mixed flow control algorthims, multihop path,  and more complicated but underloaded network). Paper concluded that overall fair queuing always performed better that FCFS in terms of fair allocation of bandwidth and buffer space, and in terms of delay. Fair queuing shuts out ill-behaved sources (thus acting as a firewall) and also encourages the use of more intelligent, self-optimizing flow control algorithms at the source resulting in fair, protective, and stable networks.
The paper ends by talking about two objections to their approach: (1) some source-destination pairs need more that their fair share of bandwidth (for which they introduced a simple modification to their algorithm) and (2) deployment issues due to the requirement of fast and smart gateways (which they question themselves and leave as future work to determine).
The paper presents the arguments clearly, but it is limited in its comparison to only one other queuing algorithm and in its evaluation to only a few small networks. Also, paper simplify the flow control algorithms quite a bit, especially the explicit binary feedback algorithm for which they simplify the signalling criteria.

 

Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks May 25, 2009

Filed under: Extra Papers — nekumar @ 4:17 pm

Paper describes the difference between congestion avoidance techniques and congestion control mechanisms. In congestion avoidance techniques, goal is to operate at a point where delay is low and throughput is maximum without congesting the network.  Congestion control mechanism allows the system to recover from the congested state consisting of high delay and low throughput. Key difference between these lies in the fact that Congestion avoidance tries to operate network at a point here queues have just started filling up while congestion control mechanism tries to keep the network from dropping the packets due to full queues. Graphically paper defines congestion avoidance as the point where throughput starts saturating with load and congestion control point where it drops rapidly with the increase in load.

Heterogeneity of technologies result in rate mismatches resulting in the congestion. Congestion control or avoidance mechanism can be classified as dynamic resource management problem. Paper models this as a control system where state is defined by the network load and control is defined as the window or rate at which users put packets in the network.

Analysis presented in the paper was used to propose a mechanism by the authors known as binary feedback scheme. In this scheme resources monitor their usage and depending upon the load level it sends a binary feedback to the users incorporated in the packets. Assumption is that this works synchronously, that is, all users receive same feedback and reacts to it and next feedback is then generated after all users have reacted to the feedback.

Paper analyzes the case of de-centralized decision rather than centralized decision. In de-centralized system users make decisions but resources feed the information regarding current resource usage. Assumption that the system is operating in congestion avoidance mode implies that user demand is same as resource allocation.

Purpose of the congestion avoidance mechanism is to determine the change in window/data rate as a function of system feedback and previous state. This change can be represented simplistically as a linear function describing the next state as a function of previous state and two constants describing the multiplicative and additive coefficients.

Paper presents the constraints on these coefficients to be able to provide congestion a voidance mechanism supporting efficiency, fairness, distributedness, convergence. Paper defines these metrics qualitatively and quantitatively. Efficiency is defined as closeness of the total load on the network to the load corresponding to network load corresponding to empty buffers after which throughput saturates with the load. Fairness is defined as maxmin fairness criteria in which users are partitioned into equivalent classes according to the resource which is their primary bottleneck. Then fairness is defined as equal resource share of the bottleneck resource between the users belonging to same equivalence class. Paper also supports the idea of distributedness which means that resource only tells whether it is overloaded or under loaded via the binary feedback bits. Convergence is defined by two parameters first by the speed with which system approaches the goal state from any staring state. Due to binary nature of feedback this might not be the steady state. Oscillation around this state determines the smoothness of the control.

Paper describes the linear control mechanism with control theoretic approach and provides the constraints on additive and multiplicative coefficients. It further gives the graphical interpretations of of all these constraints. It addresses the all possible solutions that converge to efficient and fair state and comparison among them. Further addressing the distrubted critearia and optimal tradeoff of responsiveness and smoothness defing the convergence. Some important results are: Multiplicative increase/decrease of individual resources does not change the fairness metric.  It proves that additive increase/additive decrease control mechanism does not help to converge for fairness. It can converge to efficiency though. Another non-intuitive result is the fact that fairness does not need to go up during both increase and decrease.

In order to satisfy the requirements of distributed convergence to efficiency and fairness without truncation linear policy decrease should be multiplicative and the linear increase policy should always have an additive component optimally it might have a multiplicative component with the coefficient no less than one. For linear controls with truncation increase and decrease policy both can have both additive and multiplicative components.

It further shows a result quite familiar from the control theory that any attempt to increase responsiveness (settling time) results in decreased smoothness, and vice versa. For both the feasibility and optimal convergence to fairness, increase policy should be additive and decrease policy should be multiplicative. Additive increase gives the quickest convergence to fairness.

It further shows that decrease should always be multiplicative to ensure that at every step the fairness either increases or stays the same as that of current operating point. Optimization constrains further require that changes in fairness and efficiency be maximized every feedback cycle. Paper shows that additive increase with multiplicative decrease is best to achieve this.

Paper further explores the analysis of non-linear control schemes but concludes that these schemes require parameters which are sensitive to the system state and information. Paper proposes future direction in this work as inclusion of delayed feedback impact, marginal utility of increased bits of feedback, estimation of current number of users, and impact of asynchronous operations.

 

Endpoint Admission Control: Architectural Issues and Performance May 25, 2009

Filed under: Extra Papers — dgaschk @ 4:17 pm

Admission control is a way to allow traffic that does not respond to congestion, e.g. non-TCP, to traverse the network without unfairly consuming network capacity. Endpoint admission control requires the source of the flow to probe the network to determine if enough available capacity exists to support the oncoming flow. Only if the probe is successful does the data flow commence. Several techniques for endpoint admission control are compared. The success of these technique are furthered if existing facilities such as DIFF-SERVE is configured throughout the network. DIFF-SERVE allows separate queuing for admission controlled traffic; however, its efficacy is shown even when such queuing is not enabled.

Traffic that does not respond to congestion could “unfairly” consume network capacity on congested links. TCP responds to congestion by throttling the rate at which it transmits. Existing TCP sessions slow the rate at which they send when congestion is experienced allowing new sessions to have their fair share of capacity. Traffic that is non-responsive to congestion will continue to arrive at the congestion point at a constant rate, forcing compliant TCP sessions to slow down. Admission control determines if capacity exists to support the desired flow before the non-responsive flow is initiated.

The authors do not develop any new endpoint admission control techniques rather they survey and compare existing techniques by simulation. The study compares several queuing disciplines and several probing techniques over a variety of topologies and distinct traffic descriptions. Per-flow queuing is examined and discarded due to its unsuitability. A single queue for all traffic is examined as a legacy or in an incremental deployment scenario.

DIFF-SERVE configurations that enable 2 or 3 priority queues were the focus of the study. A 2 queue system places admission controlled traffic in the high queue and best effort in the low queue. A maximum rate is applied to the high priority queue. The maximum rate prohibits the admission controlled traffic from consuming all of the link capacity, but allows the link to go underutilized in the absence of best-effort traffic. A 3 queue configuration places best effort in the low queue and admission controlled data traffic in the high queue. The middle queue carries only the admission control probe packets. A maximum limit is placed on the aggregate of the high and middle queues.

Probe packets are used to determine if capacity exists before starting the data flow. Probe packets are sent at the rate of the oncoming flow. If sufficient loss is experienced then the flow is not started. Graduated or “slow-start” probe flows as well as probe duration are examined.

The authors conclude that endpoint admission control may be a viable solution for admission based traffic. However, their model did not include traffic descriptions that match modern compression algorithms. Modern compression algorithms for real-time traffic vary bandwidth requirements greatly over time. For example, voice compression algorithms typically suppress transmission during periods of silence. Only a single participant speaks at a time meaning that traffic will travel in only a single direction at a time. Probe packets may find that available capacity exists if they happen to travel during a silence period, allowing the erroneous admission of a new flow.

The authors point out that further work is required. They point out that their work does not prove the applicability of endpoint based admission control. They show its potential under a limited number of scenarios. Much further testing and simulation is required. Traffic descriptions that match modern real-time compression algorithms must be included. Deployment of this technique in an enterprise network is a policy decision; however, deployment in the public Internet will require legal agreements. Research into defining such agreements is needed.

 

Ten Things Lawyers Should Know About the Internet May 25, 2009

Important things:
1) We need to change legal policies to allow more research of the Internet in order to make better policy decisions in the future.
2) There are serious problems with the Internet that need to be addressed, and even though some solutions have been conceived, the deployment of them is hindered by economic, political, and social factors.
3) Solving the challenges that the Internet currently faces will require more than short-term profit incentives of individual providers.

Problems:
I can sense through the paper frustration that she must experience as she tries to go about her research in studying Internet structure and traffic characteristics. This is of course a real problem, though she might come off as complaining about how difficult it is to get her own work done. The topic addressed by this paper is fairly broad and can have many more dimensions than those discussed.

Future Work:
Future work in this area should address the conflicting concerns that are make innovation and development on the Internet difficult.

 

Ten Things Lawyers Should know about the Internet May 25, 2009

Filed under: R18. Ten Things Lawyers Should Know About the Internet — nekumar @ 4:16 pm

Paper brings to the attention about various legal and economic issues of internet associated with internet which prevents us to understand Internet behaviour, usage pattern,   architectural limitations. These issues are mostly due to economics, ownership or trust rather than technical.

. Understanding of these issues is key to solving legal challenges faced by technology advancements.  Even with these limitations few available data points presents a dire picture of the situation including the limited number of IP addresses, vulnerabilities in different layers.

Data collection and analysis which has been established for financial systems has not succeeded for Internet. This has been partly due to muddy legal territory and objection to any kind of surveillance in US. Inability to do research has resulted in the contradiction of the view points for basic issues such as whether QoS based services make sense or not. In the absence of access to private data, interferences drawn from studies are highly biased favouring the amount of personal information extracted from data.

Paper further emphasise the aspects of the internet that there is not much government structure defining all the aspects of internet.  The opaqueness of the infrastructure to empirical studies have complicated the solution to two most important issues: IPV4 addresses and scalability limitations of the routing architectures.

Finally paper concludes that the assumption that compete ting entities will finally cooperate for architectural changes which are not upto their short term interest  is flawed. Engineers have provided solution to all the technical challenges expecting the private sectors to adopt these solutions. Another important consideration in this is the fact that solutions have to be employed beyond the boundaries hence any solution that emphasizes the country boundaries is doomed to fail.

This fact that competing at the middle infrastructure while cooperating at the edge has resulted in lots of innovations but these innovations do not mean that regulations should be removed or transparent or accountable experimentation is not required. Any decision on technology requires the information of underlying infrastructure to weed out any false belief or any decision on the mechanism.

Public investment in knowledge production gains from universal connectivity including distribution of resulting products to all taxpayers on zero marginal cost. It is in the interest of taxpayers to directly fund for universal deployment of network infrastructure.

If these aspect are not addressed then scientific researchers in the danger of doing science without data.

 

Ten Things Lawyers Should Know About the Internet May 25, 2009

Filed under: R18. Ten Things Lawyers Should Know About the Internet — brokerer @ 4:16 pm

This paper is literally 10 problems that occur on the internet due mostly from economics and government security.

Major Points:
1.) Lack of Data
Since ISP’s are rather private entities that do not reveal any information to researchers; researchers are doing work without real data. This is a problem because there is a lot of money being thrown into Internet research that do not have data from operational Internet infrastructure so they are basically working in the dark. Government security is also a limitation of data being revealed.
2.) The Dire Picture
Even with the fact that data is hidden from operational Internet infrastructure, researchers still have a dire picture from a few available data points. Just from these little points, researchers show that there is a lot that we need to be concern with. So there are problems and the solutions to these problems failed to solve them because of the economics and business side of the Internet.
3.) Regulations
It is very hard to tell whether or not data is being used for malicious reasons. It is also hard tell who really owns and sent the malicious data. This calls for restructuring the internet which at this point is impossible. Regulating wiretapping, disaster recover etc… causes more concern than hope too.

Glaring Problem:
This paper reveals a lot of problems with Internet research and even talks about the people that need to do something. They do not provide an real concrete solutions to all of their problems but they do mention that the solutions will cross boundaries.

Future Work:
With the internet growing so fast, the government needs to catch up. Although the internet was develop for file sharing, it is getting limited by copyright and security problems. Privateness and economic issues are severely slowing down Internet research and as time goes by our Internet infrastructure may not last without real Internet research helping it. This paper is challenging the policy makers to help create a better researching environment on the Internet.

 

Ten Things Lawyers Should Know About the Internet May 25, 2009

The article gives a good overview of several legal aspects related to the Internet and how research is hampered by regulatory issues related to the Internet that do not encourage research in this area. The Internet was initially developed as a research poject with the assumption that all the users would cooperate with each other. However, it has turned into a mammoth infrastructure that is used by billions of people today for communication.

The other aspect about the Internet is that several technical problems with the Internet have been discovered by the research community and the research community has proposed several ideas and techniques to solve them. However, the incentives to adopt these new ideas have been very few, and often the ecnomics of the change or legal frameworks do not allow such changes. This has led to the Internet turning into a fairly inefficient system. Although it works, it could be made a lot better.

An important factor about the Internet is that although different countries impose certain laws on the usage of the measured data, taking measurements etc, the Internet is largely unregulated or rather regulated in a distributed fashion. There is no single authority that dictates how the Internet should be used. This makes regulation a very difficult problem.

The article does not really provide any directions to solve these problems about the regulatory and business aspects of the Internet. While several problems are suggested, it would have been nice to have some directions towards possible solutions as well.

It would be interesting to explore how to create competitive advantages to ideas to solve the problems of the Internet. This would offer incentives to ISPs to adopt them. Also, there should be incentive for the government to support active research in the area. The laws governing the use of the Internet should be modified in the wake of the rapidly changing and deregulated Internet.

 

Ten things lawyers should know about the internet May 25, 2009

Filed under: R18. Ten Things Lawyers Should Know About the Internet — mohit1982 @ 4:15 pm

This paper tries to convey a message of cooperative measurement and modeling of open networked systems and the challenges and roadblocks that exist in pursuit of this goal. The author has tried to convey certain thoughts to the lawyers about how to update legal frameworks to best accomodate information technologies in the 21st century. The author has tried to list her thoughts which are:

  • There is a need to update existing legal frameworks to accommodate empirically grounded research into the things that have to be built, used and costs to sustain them. She stresses that current frameworks prevent sharing of data with researchers to scientifically investigate the technology advances. The author thinks that the scientific knowledge about internet is weak because of its shared and distributed nature and the roadblocks are economic rather than technical.
  • The author also argues that there are several known limitations with whatever data that researchers have and that itself call for concern. There are concerted efforts too in that direction but most of them happen in small pockets and that too with several pitstops.
  • The author stresses repeatedly the lack of data available in the field which the progress towards knowing a lot about this complex system and finding out ways to deal with it. There are a lot of restrictions on data sharing put by the government (for security reasons) or private competent players for their own economic motives.
  • The author comments that network community is not able to perform research on the very network backbones provided to them to do so.
  • The author also points out the fact that a growing number of segments of society spend large amount of money in performing internet measurements but they are not in pursuit of answers related to technical weaknesses but they are happening for purposes of security or business to maximize the amount of personal information that can be extracted from the data.
  • The author argues that government organizations who are policymakers are themselves reluctant to disclose the much needed information on how the internet is engineered, used and financed.
  • The author notices a positive side to this end that internet’s practical promise is for individual freedom, dem­ocratic engagement, and economic em­powerment, is also unparalleled. This promise is sufficient inspiration for an open, technically literate conversation about how to invest in technologies and policies to support articulated so­cial objectives.
  • The author also states that available data shows clear directions for solutions which cross policy-technology boundaries similar to interdisciplinary research into the network which is not emphasized a lot.
 

Ten Things Lawyers Should Know About the Internet May 25, 2009

Although this paper has ten points which concern laws and the Internet, they can be boiled down to three:

  1. The laws relating to copyright, privacy, wiretapping, and common carriage are not relevant to the Internet today. Furthermore, trying to apply them to the Internet has many deleterious effects with no real gains, since the usage model of the Internet differs so much from the services provided at the time when the laws were written.
  2. Furthermore, we actually know very little about the Internet from an empirical perspective. This is due to two factors. First, much of the current legislation (see point 1) makes it impossible (or rather, illegal) to gather the very information that would illuminate the usage patterns on the Internet. This hamstrings researchers, who are able to come up with many wonderful theories about current or future traffic patterns, algorithms, etc., but are unable to test or validate them on the Internet at large. Secondly, even when legal, these same efforts are frustrated by the fact that the Internet is composed of many private companies only motivated by relatively short-term financial concerns and unconcerned with the viability of their practices in the mid- to far future.
  3. With both the legal and financial restrictions on what can and cannot be implemented, the Internet as it stands today is in a fair amount of risk. The underlying routing and naming services are insecure because they were designed for an Internet that is very different from the one present today. Not only are they insecure, but they are not designed to scale to the size of the modern Internet. Even though these problems are evident from even the limited amount of data available under current laws and from the companies operating the core routers in a secretive manner, the educated people who have exposed these problems are helpless to enact any change at a global level.

The main problem with this paper (and likely this summary) is that it views the current state of the Internet from a very pessimistic perspective. While it may be true that there are many unresolved problems on the Internet, it is risky for the researchers to look down from an ivory tower and proclaim that the Internet is doomed unless we do X, Y, and Z. What does need to happen is that government regulation needs to be designed in such a way as to not stifle the approaches that the market takes to optimizing the Internet for its current and future needs.

Future research needs to look at ways to incentivize the change that researchers think needs to be implemented. Unless they can provide economic incentives for the companies that control the Internet, then the changes will fall on deaf ears.

 

Ten Things Lawyers Should Know About the Internet May 25, 2009

Filed under: R18. Ten Things Lawyers Should Know About the Internet — vikrams3 @ 4:15 pm

This paper dares to raise some pertinent questions that Internet researchers have long overlooked. Main points are:

  1. The author provides ten reasons why she thinks Internet research is not progressing in the way other sciences have. She identifies the major impediment as the incentive model. The major stakeholders in the Internet like ISPs, Government, Internet companies have strong disincentives against sharing usage data with the researchers. This in turn places the intelligent researchers at a great disadvantage because they will have to work with assumptions and simulations without access to any real data.
  2. The author rightly says that the pace of our understanding of the working of Internet has been far lower than the pace of Internet growth. This has led to a situation where multi-billion dollar investments are being made by businesses and Government without a thorough knowledge of what is going on behind the scenes. No one has the right intuition as why Internet is working as it is today.
  3. The security companies have a tendency to exaggerate the dire consequences facing Internet security, which has led to disproportionate funding in that area. Such hysterical environment suits those companies best too. This has often come at the cost of other more fundamental areas critical to through understanding of Internet dynamics. Moreover, since there is no one agency that can be in-charge of Internet as a whole, most wonderful research ideas and solutions are far from being deployed. A case in point is IPv6 in layer-3. No one is willing to take the responsibility to overcome the economic implications of this transition.

Since this is an article on the state of Internet research, there are no obvious drawbacks. The author makes cogent arguments about the reasons behind our lack of understanding of the Internet. What is missing though is a proposal for an Internet where the major players have a direct incentive to share data without compromising the privacy of their users with researchers.

There needs to be an open culture in the Internet innovation, where good ideas find their way to the real world. Unless these ideas percolate, we will be stuck with a system that we know is dificient, but we are not able to do anything about it. Future studies should focus on how to achieve these goals.

 

Ten Things Lawyers Should Know About The Internet May 25, 2009

Filed under: R18. Ten Things Lawyers Should Know About the Internet — liyunjiu @ 4:15 pm

1. Internet research is hampered primarily by issues of economics, ownership, and trust. NSF and DARPA pulled out of funding network research. Topology mapping and simulation is blocked by  ownership and trust issues. The researchers who have access to privately owned traffic and topographical data are not allowed to share their data with other researchers to reproduce research results.

2. There are vulnerabilities in naming and routing of internet infrastructure for solutions have been developed but failed to gain traction due to political and economic constraints. ICANN, FCC, other agencies lacks the architecture, legitimacy, and authority to enforce any regulations.

3. Researchers are unable to perform proper basic network research under all the EOT constraints.  With funding and on networks established to support academic network research, researchers still cannot do basic scientific research using proper scientific method researchers of other fields expect.

This paper provides an excellent overview of the problems with the internet and internet research. Further research can be done on ways to solve the problems with the internet and internet research highlighted in the paper.

 

Ten Things Lawyers Should Know About the Internet May 25, 2009

Filed under: R18. Ten Things Lawyers Should Know About the Internet — dorkin222 @ 4:15 pm
  1. It is difficult to make accurate measurements of the Internet due to issues of security and policy. For instance, ISPs do not want their topologies publicly known, as it may provide an edge over their competitors. Most companies would deny outside parties access to its traffic to protect trade secrets and keep various other information out of public knowledge. Also, many home users prefer to keep their network usage habits private. In addition, for those entities that do provide some form of information, the information from one entity may not be compatible with those from other entities (i.e. different bases of measurement), or may not be reliable or even useful.
  2. Because it is difficult to obtain accurate measurements of the Internet, it is difficult to set or change policies related to it. “It works” is not a meaningful metric, and if no problems are apparent or imminent, then the easiest thing to do is nothing; conversely, if one entity wishes to further its own agenda through publishing biased information, it becomes difficult to oppose due to lack of evidence to the contrary.
  3. Even if accurate measurements are available and policy is set, it is difficult to enforce. There is no “governing body” with any real authority over the Internet. In general, connections are usually overseen by the government where they are located, but one agency cannot enforce anything under another’s jurisdiction. The IETF may hold some sway, but there is nothing to stop anyone from ignoring it.
 

Ten Things Lawyers Should Know About the Internet May 25, 2009

Filed under: R18. Ten Things Lawyers Should Know About the Internet — ameenakel @ 4:15 pm

(i) the three most important things the paper says

The most important topic that this paper addresses is the fact that our current laws regarding the internet are completely counterintuitive when looking at improving the internet’s infrastructure.  It completely hinders any progress that relates to actual internet topology and requires researchers to make extrapolations that may or may not be accurate.  I believe that the author is completely correct when stating that legislation around the use of internet topology must make its way to a more cooperative standpoint.  Another important idea that this paper addresses is that if the legislation around the internet does not change, we could cause quite a bit of disaster by choosing incorrect directions for the future of the internet.  IPv6 is a great example of this.  Researchers were not truly able to perform the research necessary to profile this new protocol without physically putting it out in the wild, so to speak.  The paper calls for laws, or at least guidelines, to be put into place so that acquiring an internet topology isn’t so prohibitively difficult, expensive, or in some cases illegal.  Not only would researchers be able to validate their own observations, but other teams of researchers would also be able to help validate the results of others (given this consistent set of data about the internet).  Another important idea addressed by the paper is that through this sort of secrecy about the internet, not even those organizations that have power over internet legislation and organization actually know what’s going on.  This makes it very difficult for even them to make decisions about how the architecture of the internet should progress.  This alone should constitute some sort of change about how information about the topologies of the internet are shared.

(ii) the most glaring problem with the paper

I don’t mean to pick on minor details about the paper, but i found the method of citation very distracting and disturbing to the flow of the paper.  I am much more comfortable with the list of references appearing at the end of the paper, so not to distract the reader throughout the paper itself.  I also found the paper to be very repetitive throughout some of the sections.

(iii) the future research directions of the work.

The future directions of this work are heavily based in legislation.  Those who have legislative power must act on changing the laws surrounding the internet for any of the topics addressed in this paper to actually gain legislative visibility on the proper scale.

 

Ten Things Lawyers Should Know About the Internet May 25, 2009

Filed under: R18. Ten Things Lawyers Should Know About the Internet — ctrezzo @ 4:15 pm

Three major contributions:

  1. Made it clear that, because of economic and political pressures, it is extremely difficult to obtain data from real world networks. This makes it impossible to do any real scientific studies about the behavior and characteristics of the Internet. As a result, policies that could have a profound impact on the world will not be based on fact, but intuition about what might be going on in the Internet.
  2. It is not clear how vulnerable the Internet is in reality. People say that it can’t be that bad because “it still works,” but this is only the case because the criminals depend on the Internet being functional as much as the law abiding users.
  3. Lawmakers and Politicians have not caught up to the progression of technology. Industries that have a tremendous amount of economic and political power today are founded on laws that are very out of date. (i.e. copywriting, security, common transport)

Major problem:

While this paper is informative, it paints a very gloomy picture for the future of the internet and offers no “next step” to make any progress in a correct direction. In addition, the paper criticizes other network research publications for lack of real world data but provides no real world data itself.

Future implications:

Once again, the human race will have to deal with the unintended consequences of their actions. The Internet is the largest and most complex system ever built by man, and it is not being used for what it was originally designed for. Politicians and business people will have to work together in order to redefine the policies and industries that are currently in place today. This presents an extremely difficult social as well as technical problem.

 

Ten Things Lawyers Should Know About The Internet May 25, 2009

This booklet is a list of points explaining legal challenges surrounding the current state of legality of network research and the impact of this state on the research itself. Also it provides an overview of incentives (or the lack of) that are responsible for sharing of knowledge and data between researchers.

The main point of the paper is the need for updates to the legal framework surrounding network research, because the current state effectively hinders the acquisition of large-scale representative data from the Internet, which makes the measurement research and interpretations weak. This has also effects on the law-making process and regulation for network providers, because no reliable third-party information is available to guide these processes.

Furthermore, the paper points out that the research that is done points to problems in the current state of the internet architecture, but the impact of those problems cannot be estimated due to missing knowledge about the inner workings in provider networks and missing public information about the Internet topology. Also, the author criticizes the shortness of research outlooks, where funding agencies only fund research proposals which return results on a time scale of less than five years, which hinders researchers in getting to the fundamental problems surrounding the Internet.

For glaring problems and future research, I personally think that the technical community has to communicate more and more with politics, regulation agencies and lawyers involved in the lawmaking process to sensibilize them for ordinary people’s issues for their daily Internet usage, but also to uncover the incentives for companies and lobbyists which are representing network providers. If we don’t improve the education on technical/regulatory issues for these people, the impact of the research work will be limited.

 

Ten Things Lawyers Should Know About the Internet May 25, 2009

Filed under: R18. Ten Things Lawyers Should Know About the Internet — dgaschk @ 4:15 pm

Technical issues that may begin to limit the growth of the Internet are tied to the administrative policy of service providers. This policy lacks the legal regulation that could enforce a comprehensive strategy to insure unhindered growth. The biggest challenges are address space and global routing. A solution exists for the address space problem. The routing problem which requires cooperation from competing entities has not been addressed.

Regulation needs to be based on scientific knowledge. Comprehensive scientific knowledge of the Internet does not exist because competing service providers keep confidential the operational mechanics of their private networks. The lack of addresses will eventually force conversion to IPv6 which provides a much larger address space. Optimized global routing will reduce costs and ensure reachability. Routing is determined by confidential agreements between service providers. The proprietary nature of the Internet limits its study and thus hinders regulation and prevents global policy.

The premise that these problems have consequences cannot yet be proven. Much like global warming, the indicators exist and scientists agree that there is a problem and it will have consequences. While the Environment is too large and complex to make any proof about the future, the Internet is not. EOT issues prevent the study of the Internet.

The author does an excellent job of presenting the issues. However, no solutions are presented. How are we to proceed in order to address these problems? Further data collection is required. What data is required from the ISPs? Can they be motivated to provide this data? Is regulation that certain data be provided possible without hindering competition? If global strategy requires regulation and regulation requires data then the first step may be regulation that allows the collection of such information.

 

Ten Things Lawyers Should Know About The Internet May 25, 2009

(i) Three most important things

1. Updating legal frameworks to accommodate technological advancement of communications capabilities requires first updating other legal frameworks to accommodate empirically grounded research into what we have built, how it is used, and what it costs to sustain. Unfortunately, current legal frameworks for network provisioning prevents sharing of data with researchers, making Internet behavior, usage, patterns, architectural limitations, and economics constraints difficult to understand.

2.  Public policy intended to protect individual user privacy places the research community in the illogical situation of not being able to do the most basic network research even on the networks established explicitly to support academic network research. This inability to do research on our own networks has lead to contradictions in the field of science that cannot be resolved. Two papers (Why Premium IP Service Has Not Deployed and The Evolving Internet – Traffic, Engineering, and Roles) on the costs and benefits of using QOS to support multiple service classes and how these service classes should be determined has contradictory arguments.

3.  The opaqueness of the Internet infrastructure to experimental analysis has generated many problematic responses from rigidly constrained communities trying to get their jobs done. The IETF attempted to solve the technical limitations of the current IPv4 protocol, primarily the insufficient number of addresses and the inherent scalability limitations of the routing architecture. The IETF couldn’t find an approach that would yield an architecture that made progress on both problems and so solely focused on building a new network architecture that had a larger number of addresses. Yet a larger number of addresses actually exacerbates the routing problem so many experts are grim about the future of IPv6.

(ii) Most glaring problem

The most glaring problem would be that the paper lists all these issues on how the government restricts them from researching in depth the Internet infrastructure due to privacy concerns without mentioning how researchers could work with policymakers and the FCC to possibly retrieve the data needed.

(iii) Future Research Directions

Future research directions for this work would be for scientific researches to make progress by working with legal experts in answering Internet research problems such as the ongoing discrepancies between supposedly scientific studies and suggest what data is needed to investigate these problems.

 

Ten Things Lawyers Should Know About The Internet May 25, 2009

1)

i. Internet research is largely not based on real data. This is because there are no good tools for simulating the internet. Furthermore ISPs do not share information about their infrastructure or who they connect with. It is also very hard to analyze internet traffic due to privacy concerns. The end result is that the internet is a very complex network and no one has a really good idea of how it works.

ii. ISPs economic model is not sustainable. They just move packets around and try to maximize profit by determining to what peer they pass the packet on to.

iii. The internet was not designed for what it is today. It was originally supposed to be a government supported file sharing system. Now it has been leveraged into a cooperative network of ISPs and now supports streaming video, VoIP, P2P file sharing, and many other uses it was never designed for.

2) The biggest flaw in this paper was the fact it was the worst article I have ever read in my entire life (academic or otherwise) and I am not exaggerating. There was so much fluff and very little substance, I would say about maybe 5-10% of the article was actual substance. And for things that were actually interesting such as why the ISPs economic model is not sustainable, little to no reason was given to back it up. Also it is not necessary to have a citation for every single word. Kimberly Claffy should not be allowed to write anything ever again.

3) Clearly future research needs to be done in getting a better understanding of how the internet works. Perhaps research efforts that collaborated with ISPs, but make an effort to protect an ISPs interest of not letting competitors know about its infrastructure could get something done.

 

The Things Lawyers Should Know about the Internet May 25, 2009

(i) The three most important things the paper says:

1) The actual data that we have regarding the Internet is very shallow and the primary reason for this is economical.  Claffy explains that while funding is being provided to areas such as national security, however zero dollars on infrastructure.  She goes on to say that the current timing is also bleak, since we are at a time of sustaining internet growth, it would require an extraordinary amount of capital and realignment of incentives to make this possible.  I feel this is an important because although the Internet was originally derived for research and national security, we as society lost sight of that.  In fact the original purpose of the internet was lost as it became commercialize, therefore this point is important to capture.

2) The silver lining is that perhaps its not bad since the internet has successfully “connected people” even though the focus was initially to provide computing.  Claffy argued that the internet would not have grown at the fast rate if it were not for its effect in connecting people.  I feel that this is an important point, since it provides reasoning to think contrary to popular economic arguments.  Based on the Internet’s success in connecting people and providing individual freedom, perhaps we should keep it deregulated and anonymous.  What if changing its current model will break the Internet and have irrevocable “social” effects?

3) The problem of not having enough information about internet infrastructure is not new; in fact there has already been attempts to address it unsuccessfully.  Claffy indicated several institutions (e,g, National Science Foundation) and projects (e.g. PREDICT) to remedy this problem. I feel that these are important points, as it indicates what has been tried and failed.  Moving forward as we try to solve this problem, we should learn from the mistakes of the past, such as how to get around the fact that ISP may be liable for its traffic.  Perhaps if we solve that problem, ISPs will be more willing to provide help to researchers for analysis of the internet infrastructure.

(ii) The most glaring problem with the paper:

The biggest problem with the paper is there lack of consideration of how the commercial space has also help the internet infrastructure as well.  For instance the addition of 3G wireless overlay to expand connectivity.  This paper seems a bit bias in blaming the economic incentives of corporations in preventing research.  However it doesn’t talk about how companies has spent much R&D budget to further the internet infrastructure as well.

(iii) The future research directions of the work:

The future research of the work would be to either find new ways to monitor Internet or find ways to battle the legalities currently hindering research.  For instance, how do we protect ISPs and or music publishers in a way that facilitates the discimnation of internet traffic data.  Or another area of research could be how to better monetize inteternet traffic, since this would encourage ISPs to upgrade their networks.  In summary, now that Claffy indicated the problems and the unsuccessful attempts at solving it; the next logical steps is to address the problems.

 

Ten Things Lawyers Should Know About the Internet May 25, 2009

Filed under: R18. Ten Things Lawyers Should Know About the Internet — giledelman @ 4:15 pm

Three Important Things:

  • There are currently many outdated legal frameworks in place regarding the internet, copyright, and privacy that are no longer relevant in today’s world. It is clear to see that these frameworks cannot keep up with the growth of the internet and its rapidly evolving usage patterns. As a consequence the internet is unregulated by government bodies such as the FCC, but at the same time largely opaque to those who would attempt to formally observe it. This is because of a legal deadlock where ISPs are unwilling to release traffic information for fear of being sued on the basis of draconian copyright acts.
  • Today the structure and vital statistics of the internet are largely a mystery. Researchers have been unable to come up with meaningful results for bandwidth, quality of service, and traffic patterns. However there is a general consensus that IPv4 addresses are soon to be exhausted, the routing system is unwieldy, and that a large number of hosts and traffic are compromised from a security perspective.
  • Despite the grim picture there are several lessons to be learned moving forward. The conclusion is that many problems are caused by competing entities vying for short term gain. It is the responsibility of the research community to come up with a more stable infrastructure that address scalability, fairness and security. It is up to the government to enable this research and ultimately ensure that it is implemented, even against short term cost considerations of ISPs.

Glaring Problem:

The structure of the paper detracts somewhat from its message, as the arguments presented are at times disjointed. There is some but little mention of specific bodies such as the RIAA that exacerbate and represent the legal issues plaguing the internet. Along this line, there should have been more background information on the technical usage details given the intended audience.

Future Work:

In light of the arguments presented in this paper, there is a lot of joint work to be undertaken combining the efforts of computer scientists and lawyers. This is crucial for the success of any new legal framework, as it will have to be grounded by facts and reality. The focus should first be on finding ways to circumvent the restrictions on internet measurements, and later on the prospects of general usage enforcement that enables the activities of the majority (such as file sharing) as opposed to limiting the natural evolution of the internet.

 

Ten Things Lawyers Should Know About the Internet May 25, 2009

(i) The three most important things the paper says:

1. There are numerous ominous signs of major problems looming over the internet today. A few examples listed by the author are: It is that it is understood that millions of machines compromised by botnets supporting criminal activity, A vast majority of Internet traffic today is illegal by current laws, routing and addressing is not scalable enough to handle current growth rate, and the economic architecture shows little hope for sustainability.

2. The biggest problem the author finds with internet today is that so little is understood about it. It is a complex hard to measure organism that must be studied so problems can begin to be addressed. The problem is that research is underfunded and data collection for research is often deemed illegal by current laws. For the few companies that manage to obtain legal data collection there are laws preventing them from sharing their data with other researches, so the credibility of their findings is always questionable because it cannot be backed up by scientific data.

3. The author decides that someone must be held accountable for regulating the internet and preventing it from succumbing to the plight that faces the internet today. At the current state the FCC has jurisdiction over regulating the internet but unfortunately the author implies that they stay blissfully unaware of the true nature of the internet. For this reason it is the job of the lawyers to take up the fight force action that enables research and through it progress for the internet.

(ii) The most glaring problem with the paper:

The biggest problems I found with the paper is that there wasn’t a clear no call to action or potential solution presented by the author. In a way this makes the paper almost seem as fear mongering more likely to cause panic than helpful results. Another thing that bothered me was that the pictures were unexplained and seemingly random. Lastly the paper seemed to jump around and not have a clear well thought out path from beginning to end.

(iii) The future research directions of the work:

The biggest research direction called for by the paper is coming up with ways to enable research and data collection easy and legal. After gaining further understanding a myriad of research topics will emerge to address problems and suggest improvements on the current internet architecture. The author presented many of already prevalent problems that all must be evalutated and answered in the upcoming years.

 

Ten Things Lawyers Should Know About the Internet May 25, 2009

Filed under: R18. Ten Things Lawyers Should Know About the Internet — gracewangcse222 @ 4:15 pm

(i) The three most important things the paper says:

  1. There is a dearth of good data with regard to the internet. There are a number of contributing factors, with economics, ownership and trust being the greatest contributors. Although technical expertise has the ability to yield potentially vital data, the search for and distribution of collected data (or the failure to do so) is driven by privacy and security concerns and short-term profit margins. Meanwhile, policies are being made without data and researchers are blocked from producing insightful data.
  2. The internet was not originally built to support the kinds of demands that we currently require of it (for example, IPv4 is no longer sufficient due to internet growth, and network infrastructure companies are trying to move more intelligence into the network in order to profit from internet traffic). It is furthermore quite difficult to make fundamental changes to support these demands.
  3. The “practical promise” of the internet means that it is needed by everyone, even though it is complex and messy. There are many people with the knowledge and ability to make meaningful changes to the policy and infrastructure of the internet.

(ii) The most glaring problem with the paper:

I feel that a number of the points made in the paper seem a bit too bleak and fatalistic. Although security on the internet is definitely a major concern for the US government, I think the “big brother” scenario painted in parts of the paper is not completely justified. Wiretapping for example would still fall under search and seizure, and be protected against under the Fourth Amendment. A number of such precedents are covered in Orin Kerr’s work.

(iii) The future research directions of the work:

I think that one of the problems in meshing network research with policy is the lack of a common language spoken by both policy-makers and researchers. An important aspect of crossing the boundaries between policy and technology is ensuring that the two sides understand each other. Another potential research direction may lie in the “clean slate” internet that has been proposed by a number of people. It is possible that such a measure, properly informed by both researchers and policy-makers, might improve the future internet.

 

Ten Things Lawyers Should Know About the Internet May 25, 2009

The article is a list of serious problems in the social, political, and legal Internet landscape. It provides a rather eye opening account of how difficult it is to legally gather network data on the Internet. The article lists 10 top issues, but several seemed to be elaborations on the same basic ideas. I consider the article’s main contributions to be the following.

ISPs don’t share information for primarily business reasons. The terms of agreement between their customers and peers even prohibit details, such as logs and peering locations, to be collected or released to third parties. Because of the competitive nature of these businesses, it is unlikely that even basic topology information will be made available.

As a result, the amount of network data available to researchers is extremely limited. This presents difficulties because there can be no validation of claims made by any organization. There are many claims that “most” of the data traffic going over the backbone consists of p2p data. As the author points out, these claims cannot be backed up by any data set. The result is an environment of fact free arguments driving network policy and contradicting research publications.

The current laws are outmoded and vaguely worded to the extent that most Internet research can be classified as illegal. The author is understandably frustrated with this situation, especially given the fact that the U.S. government is actively funding network research. I find it completely ridiculous that basic network topology probing for research purposes can be considered illegal.

The problems with this article are that it provides too many cluttered points and no suggestions. If indeed this is a condensed version of a talk, the author could have spent more time distiling the key issues that need to be communicated. Furthermore, there are no suggestions for improving the landscape. A few recommendations for moving in the right direction would really be appropriate for this type of article.

I’d like to see a more concise version of this article make it to a larger reaching publication. As it is now, I don’t think it’s terribly effective despite being well researched.

 

Ten Things Lawyers Should Know About The Internet May 25, 2009

Filed under: R18. Ten Things Lawyers Should Know About the Internet — filipposeracini @ 4:15 pm

In this article, Kimberly Claffy gives an overview of the biggest legal and economical issues related to the Internet as it is today.

The most important problem with the Internet is that basically it is an unknown entity, such as it is really hard to draw any conclusion on its status. Many attempts has been done in the last decades to acquire some knowledge but all of them ended with small or inconclusive results. This is due to the fact that even if researchers have developed several instruments and platforms to measure the Internet, they still have to fight with economics, ownership and trust (EOT) issues. There is not a well defined legal framework that regulates this topic and the collection of network an communication data is pretty much illegal. Moreover, data sharing among both researchers and companies, such as ISPs, simply does not happen. Companies, as well as federal institutions, are very reluctant to share network data for basically two reasons: the first is legal. Since most of the Internet traffic is estimated to be illegal, such as content sharing of material under copyrights, the ISPs do not want to make it official because they are scared of possible legal action against them. The government institutions instead do not want to tell what they are observing, since their data is a clear violation of privacy laws. Indeed federal offices observe the Internet to gather intelligence related and national security related data.  The second biggest reason instead is that entities that manage the Internet do not want to provide information on how they structured their business to competitors, for obvious economical reasons.

However, not being able to have network real data prevents researchers to fully understand and analyse the network. This process is the core part of every attempt to modify, update or improve the existing infrastructure. How is it possible to make something better if we are not able to understand what is really going wrong with the existing implementation? That is the fundamental reason why when it comes to the real Internet everything becomes extremely challenging.

Until there will not be a clear legal regulation about what kind of data can be gathered and shared, as well as until there will not be a coherent will of the both ISPs and the governements to work together to improve the existing Internet, it will be hard to get to any real good change and knowledge of it.

 

Internet Mapping: from Art to Science May 25, 2009

Filed under: R17. Internet Mapping: from Art to Science — vikrams3 @ 4:13 pm

The title of the paper describes the efforts of the authors to make Internet research more compatible with other science fields.

  1. Unlike other science areas, Internet researchers are faced with a dearth of real data, which keeps them from realistically monitoring the state and patterns of network traffic. This lack of understanding has led to many major investments being made in the dark, after getting misguided by businesses’ individual interests. Hence there is an immediate need to develop tools that make monitoring of Internet simpler.
  2. The measurement system Ark that is described in the paper has several useful features. It helps in easy development and rapid prototyping of a measurement idea, and hence indirectly helps the researcher devote more time to innovative ideas. Another important feature is the distributed measurement and coordination (aggregation of results) among several nodes in the experimental system.
  3. A very useful feature of Ark is that its coordination is based on tuple-space distributed shared memory. Ark provides simple interface and usage semantics to the user, who can use simple queries to get data on the dynamics behind ping, traceroute programs, RTT as a funtion of distance etc. The results are stored in the form of tuples.

One problem that I find is in the usage of distributed shared memory for storing data in tuple-space. DSM is known to have many problems like complex semantics, availability issues etc. Not clear from the paper whether the authors have addressed these issues.

It will be interesting to see the measurements from the largest set of IP topology data that the authors are currectly corlecting. Also, it will be nice to see if there are any substantial advantages of IPv6 over IPv4 based on the measurement data from te Ark tools developed.

 

Internet Mapping: from Art to Science May 25, 2009

Filed under: R17. Internet Mapping: from Art to Science — giledelman @ 4:13 pm

Three Important Things:

  • The Ark architecture was developed as a means to deploy active internet measurements. It was built around good software engineering principles of rapid prototyping and low barrier-to-entry coding. Recognizing that internet measurement is an embarrassingly parallel operation, the architecture incorporates several “monitor” nodes that can cooperate and coordinate their operations. The implemented communication channel (tuple space) allows unicast and multicast communication, and serves as the building block for decentralized management and data aggregation.
  • The process for consolidating path information into AS-level topologies is non-trivial in the context of the paper. Still, a high level view of the internet is desirable, and so the authors outline three steps to achieve such a map. First they establish ways to map IP addresses to routers, and then IP addresses to ASes. A router level topology is established by consolidating IP addresses that belong to the same routers. Interdomain and intradomain links are identified when two adjacent addresses map to different entities.
  • With an infrastructure of 31 monitors at the time that this paper was written, the team members were able to gather 2.1 billion traceroutes. This was done with random probing of /24 IP prefixes using the “scamper” tool. This demonstrates the scope and ultimate feasibility of internet measurement.

Glaring Problem:

Lacking in this paper was a discussion on creating a geographic map of the internet. With information about the IP-router mappings this should be possible to obtain.

Future Work:

Ark is presented as an extensible and distributed architecture for performing internet measurements. As such there are many more possible uses for it. Possible future applications are: bandwidth estimation, quality of service monitoring, content sampling, traffic pattern observation based on time of day and location.

 

Internet Mapping: from Art to Science May 25, 2009

Filed under: R17. Internet Mapping: from Art to Science — ctrezzo @ 4:13 pm

Three major contributions:

  1. Developed and designed a new architecture for Internet topology measurement and mapping. It is called Archipelago (Ark).
  2. Currently collecting the largest set of IP topology data for use by academic researchers.
  3. Provide a method for Internet researchers to quickly and easily deploy new measurement techniques at a low cost. This allows more experimentation with measurement techniques that could lead to greater discovery.

Major problem:

They claim that it is easy and cheap for researchers to deploy measurement experiments, but they do not provide any proof for this claim. Also, they do not show if their measurement architecture is actually producing realistic data.

Future implications:

This will allow for greater collaboration between Internet researchers. Also, this system will hopefully provide larger real world data sets that researchers can use to substantiate their work. Questions about the political (security) and economic effect this system will have on the world need to be answered.

 

Internet Mapping: from Art to Science May 25, 2009

Filed under: R17. Internet Mapping: from Art to Science — Mike @ 4:13 pm

(i) The three most important things the paper says:

1. Ark was conceived to give researchers a platform capable of many different kinds of internet measurements. They distributed system nodes worldwide to enable measurements of separate internet cultures as well as combined measurements of the internet as a whole. Ark also has the capability to create AS topologies and IP topologies of the internet through use of novel identification techniques.

2. Since the system is completely distributed, coordination was a huge concern and forte of Ark’s design. Without coordination that allows the nodes to communicate quickly and effectively the researchers would be extremely limited on the kinds of measurements they could take and most of the advantages of having wide deployment would vanquish. Ark employs the tuple space coordination language. This language takes full advantage of pattern-matching to allow autonomous measurements while allowing communication and control between nodes.

3. Since parameters, commands and return values can all be expressed in the forms of tuples, the user can easily write programs in the functional model used my most programming languages today. This abstraction allows the programmer to stay far away from the network-level details of his implementation and focus his effort on taking sufficient measurements for his needs.

(ii) The most glaring problem with the paper:

The most glaring problem I noticed with the paper is that it doesn’t discuss the shortcomings of Ark. One of the most useful things to know about an application is what it fails to achieve. Even though it is hard for an author to want to present weaknesses it allows others to further understand the design and it brings to light further research topics in the area. Unfathomable that Ark has perfectly addressed of its original objectives.

(iii) The future research directions of the work:

Ark in itself presents a wide  range of future directions for research. Finding novel ways to exploit ark’s design to acheive different measurements would be a rewarding avenue to persue. There is also the opportunity to try to build a better reasearch platforms than Ark which could be tailored towards certain applications or even more flexible.

 

Internet Mapping: from Art to Science May 25, 2009

Filed under: R17. Internet Mapping: from Art to Science — mohit1982 @ 4:13 pm

This paper talks about the importance of internet infrastructure measurement and assessment and proposes a secure measurement platform to accomplish it. They claim to be using the best available techniques for IP topology mapping and developing some new techniques and software for data analysis, topology generation, and interactive visualization of result-oriented annotated graphs. The authors talk about their next generation skitter-based active measurement infrastructure, Archipelago (Ark). They stress on three qualities that Ark strives for:

  • The authors see easy deployment and rapid prototyping as very important for internet measurements and therefore, Ark supports software development at high-level of abstraction using dynamic scripting languages and pre-built API’s and services. They use Ruby as the primary implementation language.
  • They stress that many desirable measurements require dynamism and coordination among measurement nodes. To enable this, they provide a tuple-space coordination model called Marinda which is a distributed shared memory with a small number of easy to use operations. It also allows for decoupling of measurement processes in time and space.
  • Another quality pursued is the support of measurement services. This support for services is made possible by the tuple space, which acts as the unified mechanism for transport and messaging, in the terminology of the web services protocol stack.  The service architecture based on tuple space has advantages of low deployment effort and cost, no special privileges required, decentralized management, ease of implementation, ease of aggregation and diverse communication patterns.

The authors have deployed these Ark monitors in diverse geographical regions. This infrastructure systematically measures IP-levels paths to a dynamically generated list of IP addresses covering all /24 prefixes in routed IPv4 address space. They also perform DNS lookup of all IP addresses seen in the IPv4 Routed /24 topology dataset.   Reconstructing the router-level topology from this data requires grouping

multiple IP addresses belonging to the same router. This grouping process is called alias resolution. The authors have developed several IP alias resolution heuristic techniques to this end. They derive an AS-level topology map of the internet from Ark and route views data.

Future work would involve gathering large dataset of IP topology data and the expansion of the analysis tool required to make inferences of trends out of the data.

 

Internet Mapping: from Art to Science May 25, 2009

Filed under: R17. Internet Mapping: from Art to Science — ameenakel @ 4:13 pm

(i) the three most important things the paper says

One thing that the paper says is that in order for their system of measurement to work properly, their system must be deployed diversely around the world.  Thinking about this observation, this is really the only way for a scalable measurement of the internet as a whole, as it is quite vast and complicated.  Also, since these types of measurements will require such hardware, the measurement hardware itself cannot be prohibitively expensive.  Another important point that the paper makes is that even measuring traffic around the internet will no alone give insight as to how the internet is actually organized.  It requires much more analysis of that data to even think of developing some sort of meaningful organization of ASes and routers.  The authors even note that these algorithms are not trivial either.  The authors also point out that ICMP-based traffic seems to reach the greatest number of router links on the internet, but unfortunately does not return very desirable observations about the network being analyzed.  This goes to show how difficult of a field internet measurement can be.

(ii) the most glaring problem with the paper

It seems that, although the researchers here have invented a great system for measuring certain aspects about the internet, they have yet to develop a system that measures the data that they are most interested in.  They claim that they must use pretty complicated heuristics (and that they’re still searching for more) in order to extrapolate data from their results.  It seems as if the project wasn’t quite ready for a paper submission (as the system itself–including interpretive models–isn’t even ready yet).

(iii) the future research directions of the work.

I can’t quite think of a better research direction than the one the authors are already pursuing.  It is of great importance that the network community study an infrastructure that mimics that of the internet in order to make decisions about future innovations and changes to its underlying protocols.

 

Internet Mapping: from Art to Science May 25, 2009

Filed under: R17. Internet Mapping: from Art to Science — siva @ 4:13 pm

One of the contributions of the Ark project is the simplicity and speed of development and deployment of different measurement experiments related to Internet measurement. It provides a high level of abstraction to users, which allows for faster development of measurement experiments.

Another important contribution of the project is the ability to create experiments using several nodes with a high degree of coordination between them. This is made possible through the tuple space abstraction. This allows different heterogeneous hosts to coordinate with each other and perform a measurement task efficiently.

An characteristic feature of the Ark project is the creation of the service oriented measurement infrastructure for mapping the internet topology. Users can implement and deploy mesurement related services using the the tuple space abstraction. This allows other the research community to easily use the services developed by other users.

One of the drawbacks of the paper is that it does not describe how the Ark project is different from other similar efforts on Internet measurement like iPlane. There is no comparison to other related efforts.

Future research in this direction could be on how to use this topology data to determine interesting properties about the internet. Eg. Studying how the topology changes over time (both on a short timescale and over a longer period) combined with other measurements on bandwidth demands can provide us good pointers on provisioning resources as the internet grows.

 

Internet Mapping: from Art to Science May 25, 2009

Filed under: R17. Internet Mapping: from Art to Science — stufflebean @ 4:13 pm
Tags: , , ,

This paper describes Archipelago (Ark), which is a measurement infrastructure being used primarily for mapping the IP topology of the Internet. The main body of the paper describes their efforts to measure the topology of the Internet, but they start by describing the three features they are striving toward in the Ark infrastructure. We will examine how their topology mapping benefits from the architecture of the Ark infrastructure by examining their goals:

  1. Easy development and rapid prototyping. The process of generating IP topology data requires multiple steps involving creative uses of (primarily) ICMP and UDP probing. Since the Ark infrastructure provides high-level APIs and services, like the scamper measurement tool, generating these probes is a very straightforward process involving only a few commands in a high-level scripting language (they use Ruby). In addition, Ark facilitates writing new measurement tools, like those comparing ICMP traceroute against UDP link measurement and analyzing probes from spoofer clients, which have been developed by other researchers for use on Ark.
  2. Dynamic and coordinated measurements. The Ark infrastructure proves a tuple space which allows storage and communication of data through a map-like interface. This tuple space provides shared state which is vital for creating distributed services, since it allows them to communicate without having to coexist temporally or geographically. It also provides greater flexibility since messages don’t have to be directed to an address which might not be known a priori.
  3. Measurement services. Another useful attribute of the tuple space is that it enables the construction of named services without having to know who is providing them. For example, by placing the tuple ["PING", "<some IP address>"] into the tuple space, the user or program requesting the ping is counting on someone providing a ping service to the tuple space. However, the requester does not have to know who is providing the service in order to receive the response. It also allows great parallelism, since multiple service providers have access to the tuple space and one or many of them can respond to a given request, depending on its semantics.

The weakness of this peper is that the Ark service does not seem to have any real provisions for security. Anyone who is connected to the tuple space can serve requests without having to provide credentials certifying that they will produce a valid result. While this may not be a problem at the earlier stages of the network (similarly to the Internet, where initially all users were assumed to be trusted), it might be a problem if it is used on a larger scale or for commercial purposes.

Future research might include a layer of security that is simple to use in a similar manner to the tuple paradigm already in effect. For example, a credential server could provide service registration through a shared key system whereby only the registered service providers had the private key necessary to decrypt requests.

 

Internet Mapping: from Art to Science May 25, 2009

Filed under: R17. Internet Mapping: from Art to Science — liyunjiu @ 4:13 pm

The Archipelago measurement infrastructure is CAIDA’s newest active measurement infrastructure which uses state-of-the-art strategic measurement and analysis methods to provide a comprehensive and coherent view of the internet topology.

1. Ark allows for rapid prototyping by developing at a high level of abstraction using dynamic scripting languages. They provide a library for controlling tools like scamper and iffinder from a Ruby script which allows users to collect distributed measurements.

2. Ark focus on coordination between Ark monitor nodes to obtain results. Ark employs a new implementation called Marinda which utilizes distributed shared memory which stores tuples with a small number of operations.

3. Ark supports measurement services which a user can write a program which interpret tuples as commands and performs measurements, and return the result as a tuple. This approach allows researchers to build on the work of others at a granularity of services.

There are no serious flaws in the paper. Further research topics include measurement tools for probing the internet, legal ramifications of providing AS-level topology that providers will not want to be public.

 

Internet Mapping: from Art to Science May 25, 2009

Filed under: R17. Internet Mapping: from Art to Science — koderaks @ 4:13 pm

This paper is about CAIDA’s newest measurement platform. The main contributions of the paper are:

  1. Archipelago  (Ark): The paper describes the new generation of CAIDA’s measurement infrastructure. Ark is designed with three goals in mind: (1) Easy deployment; this is achieved by choosing Ruby as the primary implementation language for measurements, with a set of APIs. (2) Dynamic measurements; the goal is to achieve a dynamic environment for the targets where coordination among the measurement nodes is possible. Ark achieves this by utilizing a tuple-space model, where the notion of distributed shared memory is made feasible. (3) Providing measurement services; the goal is to provide a set of services that the researchers can build on, and to hide the complexity of network programming by utilizing these services.
  2. Two alias resolution methods are used to group multiple IP addresses belonging to the same router. The iffinder tool and kapar (modified version of APAR) are used towards obtaining a router-level map of the Internet. This creates a IP-to-router mapping.
  3. The infrastructure presented in the paper enables the creation of a AS-router internet topology. A trace-route is first captured, including IP hops along the path from a source to a destination, and then Route Views is used to map IP addresses to AS numbers. Two different ASes within two consecutive IP hops of a trace usually represent a link between the two ASes. This topology is represented using a simple undirected, unweighted graph. This is an IP-to-AS mapping. This mapping together with the mapping mentioned above (IP-to-router) are merged together to create a dual AS-router graph: links between ASes are annotated with router IDs, and routers are annotated with AS that they belong to. This map is a big step forward in the internet mapping desired by the authors.

Problems: The only problem I can think of is the placement of Ark monitors. The map shows no monitors in Africa and the ones in Asia are all concentrated around China/Japan. Though these areas represent most of the internet, the other areas that are under-represented by the monitors could affect the final mapping of AS-to-router significantly.

Future: there seems to be many scenarios where it’s not possible to determine precisely which AS owns which IP address. A good number of these scenarios are avoided by the measurement protocol (ie, the AS-sets  and multi-origin ASes). More research to find ways to determine mapping for these ambiguous cases could result in a more precise mapping .

 

Internet Mapping: from Art to Science May 25, 2009

(i) Three most important things

1. We critically depend on the Internet for our professional, personal, and political lives but we know little about what keeps the Internet stable as the Internet becomes continuously challenging to research and analyze. The paper presents the design of an infrastructure and operating system platform known as Ark that supports large-scale active measurements studies of the global Internet for Internet topology mapping.

2.  Easy development and rapid prototyping are important factors in how they promote discovery. A researcher can explore more risky ideas which could have higher returns, by lowering the cost needed in time and effort to implement a measurement idea. Ark supports rapid prototyping by promoting software development at a high-level of abstraction using dynamic scripting languages and pre-built API’s and services.

3. It should be easy for researchers to use and to build upon the work of others at the granularity of services. Ark supports measurement services by providing a tuple space which acts as the unified mechanism for transport and messaging. A user can easily deploy a measurement service by writing a program that interprets tuples as commands, performs some measurement, and returns the result as a tuple.

(ii) Most glaring problem

The most glaring problem would be that paper doesn’t discuss much how Ark has helped study Internet topology. The paper mentions a couple researches that have implemented Ark but doesn’t really provide any actual results and conclusions.

(iii) Future Research Directions

Future research directions for this work would be to have Ark implemented by other research communities so that more data can be collected on Internet topology and expand Ark to perform more IPv6 topology measurements.

 

Internet Mapping: from Art to Science May 25, 2009

Filed under: R17. Internet Mapping: from Art to Science — yipiokayyay @ 4:13 pm
Tags: , , ,

(i) The three most important things the paper says:

1) The building of their system based on a concept of easy development and rapid prototyping are important factors.  Claffy et al points out that this not only allows an increase of productivity, but also how they promote increased discovery.  I agree that this is the case because they allow researchers to focus on developing new strategies for research instead of tons of time on development.  Because if development of a program for research may take a long time, they may limit the types of experiments that they complete.  However if the facilities to perform this research is easy, then they may be encourage to perform more types of analysis.

2) The paper indicated that “In a few cases when we cannot determine a Provider-Customer relationship for a set of ASes accessing the same  router, we assign this router to the AS with the smallest outdegree.”  Basically in this case they are making assumptions using a best effort educated guess.  What I think its important is this type of approach, in which they are actively looking for ways to continue their research and not be bounded by external factors.   This is unlike the Claffy paper “Ten Things Lawyers Should Know About the Internet ”, which didn’t really address the limitations that they complained about.  Furthermore, they did mentioned that moving forward they will fill in these assumptions with accurate data as they get them.  This approach that they promote is very important to this study since so many bits of information are blank.

3) They are actively looking to obtain information about IPv6 as it is being deployed.  I believe this is an important fact, since they are in the unique position to capture information about a technology at its early adoption phase.  This type of “infrastructure data” is invaluable for future research as indicated in “Ten Things Lawyers Should Know About the Internet”. .   In addition, they have already indicated a good use of this data to help with dealing with IPv4 exhaustion.  This further reinforces the point that this is an important topic.

(ii) The most glaring problem with the paper:

The biggest problem with the paper is the fact that they didn’t talk about the errors in the ARK system.  No system is perfect and many assumptions were made in the building of ARK and in the measurement it collected (e.g. AS information in traces).  However, there was no mention about potential errors in the calculation of the topology, only assumptions.  Also, when it presented the assumptions that were made, Claffy et al didn’t address the implications of these underlying assumptions.  The reader is left to speculate the outcome.

(iii) The future research directions of the work:

The future research of the work would work to resolve inconsistency or gaps in their data by building business relationships.  Perhaps there is some investigation into the legal options in which they can partner with an ISP.  Ultimately this system seems like they are from the “outside looking in”.  However if we want something accurate and precise, we will need to have data on the inside.

 

Internet Mapping: from Art to Science May 25, 2009

The paper describes an active measurement infrastructure currently in use by CAIDA, the Cooperative Association for Internet Data Analysis. It states that the need for this measurement effort is justified by the large dependency in today’s society on the function of the Internet.

Mainly, this paper contributes a description of the Archipelago (short: Ark) measurement infrastructure, which provides a high-level abstraction for the service of measuring network properties in a controlled, distributed manner. A Ruby API ensures rapid prototyping. A tuple-space coordinates the different measurement nodes by providing shared state between the nodes.

Second, this paper introduces a number of measurements which are supported by the Ark infrastructure and describes the methods and goals of these measurements. To capture IP topology, traceroute and ping tools are run from measurement nodes targeting all routable /24 prefixes in the IPv4 Internet. These measurements obtain topology information annotated with round-trip times over time. Furthermore, DNS resolution and IP to router mappings are measured. With these annotations, an AS-level map of the internet is generated by mapping routers to ASes.

One problem of these approaches is a possible bias in all those measurements because the Ark measurement nodes are likely not placed in representative places in the network, because only academic nodes are involved. This view of the network may differ from what a normal DSL customer experiences on the internet.

Future research involves mainly coming up with new measurement ideas to leverage the data collected by this infrastructure and to further extend the toolset. Also, it would be worthwhile to compare the collected results with the real topology existent at a large provider. While this topology is likely to be private, the providers could at least acknowledge the precision of those measurement methods.

 

Internet Mapping: from Art to Science May 25, 2009

Filed under: R17. Internet Mapping: from Art to Science — erubow @ 4:13 pm

Internet Mapping: from Art to Science

Important Things:
1) The structure, performance limits, dynamics, and evolution of the global Internet is not currently well-understood. Structural information is hard to come by due to the distributed ownership and maintenance of it, and concerns about sharing information. At such a large scale, it is a challenge to perform infrastructure measurements.
2) They have built a flexible Internet measurement infrastructure, Archipelago (or Ark), which runs on several Ark monitors distributed throughout the globe (31 as of mid-December 2008). It makes use of such tools as ping, traceroute, and DNS queries to gather data and places the results in a shared tuple-space to enable coordination. A library makes the development of experiments easy. The experiments can be static or dynamic.
3) The results will be validated through surveys of infrastructure owners, and this infrastructure will continue to be refined to profide useful Internet data for research.

Problems:
Measurements and analysis at this early stage may be prone to some degree of error. It is not yet clear to me how much error is invovled, or how much of the Internet can currently be captured. She also questioned the Internet’s ability to “maintain and strengthen its role as the world’s communications substrate” in the first paragraph without elaborating on this.

Future Work:
This is definitely very exciting work, shedding light on the structure of the Internet through external measurements. This should certainly continue, eventually giving a view of its evolution over time. I think it will be important to attempt to quantify the degree of error and completeness in the resulting data, as well as to add richer information to the Internet map.

 

Internet Mapping: from Art to Science May 25, 2009

Filed under: R17. Internet Mapping: from Art to Science — supritapagad @ 4:13 pm
Tags: , ,

1. Distributed and scalable measurement architecture

The paper suggests a measurement/monitoring architecture that allows different groups of users dispersed over geographically separated locations to make topology measurements. It also provides a common, shared database that is updated with all measurements made. This way, not only is support for a large user base provided, but also, diverse and time skewed measurements of the Internet topology is  obtained. It makes use of tuple-space co-ordination model to achieve this shared memory structure. Decentralized measurement also introduces randomness into measurements which more accurately captures topology data.

2. Incremental development

The architecture makes it feasible for the users to build upon the work and development done by others. It shields the users from underlying complexities and allows them to use a scripting language to develop their codes for the measurement.

3. Interpreting and processing gathered information

The paper suggests some interesting methods for deriving topology information from the raw data gathered. For eg. they mention a technique to obtain the IP addresses of hosts connected to a router from data obtained by measuring paths to a list of IP addresses covering all /24 prefixes in an address space.

Short-comings

A shared memory to update measurements and a common code to which any user can make additions mean a faulty or malicious code and compromise the entire structure. However, this is a trade-off one needs to make with any form of open-ware. In addition, there is no mention of accommodating for dynamic and changing network topologies while allowing different users to update the database with measurements made at different instances of time. The method of alias resolution mentioned by them seems to only support end routers with hosts directly connected to the router of concern.

Future Work

The idea is still young and can grow in multiple dimensions. The API can be developed further to provide for greater ease 0f use and deployment. Increased functionality can be added to their data interpretation tools to glean greater insight into the network topology from the raw data gathered by the system. Attempt can be made to increase the variety of data gathered by the system as well.

 

Internet Mapping: from Art to Science May 25, 2009

Filed under: R17. Internet Mapping: from Art to Science — mdjacobsen @ 4:12 pm
Tags: ,

The authors describe a distributed network measurement infrastructure, Archipelago (Ark), which they have deployed around the world to help researchers coordinate large scale Internet measurements. The Ark infrastructure is deployed at over 31 sites around the globe and can coordinate using a tuple-based directory service. This is the primary contribution of this paper.

Ark is deployed as individual monitors (nodes) that run the Ark software. The interface is written in Ruby, though lower level access using C/C++ is supported as well. The distributed directory service support is based on Marinda, a tuple-space distributed shared memory. Monitors can modify this tuple-space to implement support for dynamic coordinate and even measurement services. The tuple-space serves as the layer of abstraction that services can use to read/write domain/range values.

As described, such an infrastructure by itself is not such a new or useful tool. However, the designers wrap the implementation with an easy to use API (in Ruby). This allows measurement studies to be written in a high level scripting language, which is conducive to rapid development and high adoption. Furthermore, the designers decided to allow access to anyone who wants it. No special authorization is needed.

Using this system, the authors have been able to gather Internet topology information that is annotated with link latencies, router ids, and AS numbers (whenever possible). This well annotated global topology can be used by many researchers for macroscopic studies. The authors claim that using Ark, they have been able to coordinate with other researchers and develop improved methods for such measurement tasks as: reconstructing router level topology, dns resolution, and ICMP & UDP topology probing.

One of the most attractive features of Ark is that it is open to anyone who wishes to gather measurement data. The paper describes the Ark infrastructure as secure, but there is not explanation of how it is secure. Indeed, if anyone can use the system, access the tuple-space shared memory to coordinate with other services/researchers, and there is no authorization, how is the system secure? This is not addressed in the paper. Some discussion of the security claim is in order.

I’d expect to see further development for Ark in the direction of built-in measurement services. It seems like the addition of a library of highly optimized (best practice) basic measurement routines would be very useful for any researcher starting out with a measurement project.

 

Internet Mapping: from Art to Science May 25, 2009

Filed under: R17. Internet Mapping: from Art to Science — brokerer @ 4:12 pm

This paper is about a measurement platform capable of performing various types of Internet infrastructure measurements and assessments. The authors use various techniques for IP topology mapping while supporting software for data analysis, topology generation etc… It is important that we have a good Internet topology for research and ownership of network devices. No one really knows what really keeps the Internet stable and Ark is a stepping stone in that direction. The major points are all ways that help widespread the platform as the IP topology techniques were based off of various known techniques.

Major Points:
1.) Easy development and rapid prototyping
It is very important that development on ARK is fast because ARK is a tool that should aid researchers. Researchers are not going to adopt the tool unless it is easy to develop on and it is fast. Using a language like RUBY and keeping everything in a high level view, allows developers to get more done without getting messy with the low level development work.
2.) Coordination
ARK focuses on coordination to allow the heterogeneous pieces of a measurement infrastructure to work efficiently toward a common task. They use a new implementation call Marinda that uses a tuple-space coordination model. The tuple space stores tuples and the clients retrieve tuples by pattern matching. The tuple space is easy to use and hides the complexities of network so it lowers the barrier to deploying sophisticated distributed measurements.
3.) Measurement Services
To allow researchers to build on top of the work of others, ARK supports services using the tuple space. This allows a user to easily deploy a measurement service by writing a program that interprets tuples as commands, performs some measurement and returns the result as a tuple. This has a lot of advantages like: low deployment effort and cost, anyone can provide a service, decentralized management, etc…

Glaring Problem:
Assigning routers to ASes is a difficult task and ARK’s approach works fine when the IP address on both sides of the link belong to the same AS but they run into cases where they can’t find a Provider-Customer relationship for a set of ASes so they just assign the router to the AS with the smallest outdegree. This leads to an incomplete map.

Future Work:
The whole idea of ARK is to help expand the future of network research. By making their platform easy to develop with, ARK has already been used by two researchers outside of CAIDA. As the authors stated, researchers and policymakers are analyzing a trillion-dollar ecosystem in the dark. With ARK on it’s way, these researchers and policymakers will have a new dimension of information to work with. The authors also mention working on a IPv6 topology measurement, exploring more dynamic IPv4 topology measurements using our new ad-hoc topology measurement facility and implementing a new visualizations of IP- and AS-level topology.

 

Internet Mapping: from Art to Science May 25, 2009

Filed under: R17. Internet Mapping: from Art to Science — subhramazumdar @ 4:12 pm

The paper discusses the design, implementation and deployment of a secure measurement platform that is capable of performing various types of internet infrastructure measurement and assessments. The state of the art technology has been used to build a coherent view of the internet topology that will help understand the strength and pitfalls of the network, thus helping it’s faster and secured use. Internet topology measurement and mapping data have been gathered to build a consolidated map for use by academic researchers. Current agencies charged with infrastructure protection have little awareness regarding the global dynamics and operational threats. In face of such darkness, it is imperative to build up a coherent view of the internet and understand it much better. The problem arises from the distributed ownership of the various part of the network and the lack of availability of internal data to others. This paper has proposed a secure measurement platform for collecting and processing data, analyzing and topology generation and interactive visualization of large annotated graphs as a view of the internet topology. They have described Archipelago or Ark as the measurement infrastructure which supports rapid prototyping by promoting software development at a very high level of abstraction. Ark focuses on coordination which concerns planning, execution and controlling a bunch of distributed computations working towards a common goal. For this purpose Ark uses a tupple-space coordination model which acts as a channel for one-to-one and many-to-one communication. The Ark does a systematic measurement of the IP-level paths by dynamically distributing the task among the members. From the IP trace routes, an IP-level topology of the internet is constructed. Also DNS look ups for the IPs give a data set for studying characteristics of DNS name servers. Detail analysis also yields data such as relationship between organizations. AS level topology are also built from the route views data which requires mapping of trace-route observed IPs to ASs. Finally a dual AS-router level internet topology is built by merging the AS level and router level maps that gives an integrated view of the links and nodes in both graphs, consistently annotated with relevant metadata.

The major challenge of the project seems to be the deployment of task force, collection of such large scales of data and analyzing them to get a coherent view of the internet topology. Since the big picture of the internet is built from numerous individual trace routes, integrating them into one consistent data set and representing information in a comprehensive way is difficult. Also due to the lack of certain detailed information and access, certain information needs to be extrapolated from the collected data set which might not be true in all cases.

The project is still in the early stages and future work will include increasing its extensiveness and large scale deployment. The set of analysis tools needs to be expanded and also the queries to be asked on the dataset has to be properly modeled. The emerging IPv6 topology also needs to be measured. Finally new visualizations for IP and AS level topology has to be implemented that will enable regular provision of rich topology maps of observable internet infrastructure and support security objectives.

 

Internet Mapping: From Art to Science May 25, 2009

Filed under: R17. Internet Mapping: from Art to Science — gracewangcse222 @ 4:12 pm

(i) The three most important things the paper says:

  1. It is currently difficult to obtain a clear view of the current topology of the internet, as well as many characteristics crucial to research and making policies about the internet. The paper describes a measurement platform called Archipelago which aims to collect data about internet topology.
  2. The Ark service architecture uses tuples to allow users to create measurement services (with tuples representing the input and output to the measurement program) that are easy and cheap to develop, decentralized, and able to support a number of communication patterns.
  3. The authors used 31 Ark monitors deployed around the world to obtain traceroute measurements to all routed /24s by parallelizing the measurement tasks (randomized probing) dynamically among groups of monitors. A desired result of obtaining this data is to determine both a router-level topology and an AS-level topology of the internet.

(ii) The most glaring problem with the paper:

I’m not sure how consistent DNS responses tend to be over time, but 1 or 2 days seems like a while to wait before doing a DNS lookup. Furthermore, I would be curious as to how malicious internet behaviour, such as poisoning the DNS cache, would affect the validity of the resulting data.

(iii) The future research directions of the work:

Once the macroscopic maps of the internet are complete, a number of things can be done with the data. This information would certainly be of great use to lawmakers who are in charge of internet policy, as well as to various ASes who might be interested in developing the best possible set of business relationships and informing their own routing policies to maximize their own efficiency. The maps will also be helpful to researchers as they develop new techniques, and to aid and inform deployment of new technologies.

 

Internet Mapping: from Art to Science May 25, 2009

Filed under: R17. Internet Mapping: from Art to Science — dgaschk @ 4:12 pm

Determining the topology of the Internet is a difficult proposition. Internet protocol provides no intrinsic facility for collecting this data and network operators are reticent to share their proprietary operational information. Several efforts to collect Internet infrastructure data are ongoing and the Ark architecture provides a new tool for data collection.

Ark monitors have been distributed on all continents. Additional Ark devices are deployed monthly. The system provides a foundation for collection experiments that require coordinated distributed measurement locations. Deployment of new measurement tasks is facilitated by the Ark system. The system allows collected data to be processed in multiple stages over time. Existing tasks may be used as components in newer measurement endeavors.

The document describes a useful and extensible system for data collection. It fails to provide motivation for the data collection. Examples of how the data will be used is required. It is interesting to know the topology and routing characteristics of the Internet but is there a higher purpose?

The system is limited by the number of deployed measurement devices. Promulgating an appealing motivation would help to recruit additional participants. Ensuring that the type of information collected is limited and providing ways for participants to guarantee that devices placed on their networks only collects agreed upon data will allay security fears and help to further deployment.

 

Internet Mapping: from Art to Science May 25, 2009

Filed under: R17. Internet Mapping: from Art to Science — krishnanadh @ 4:12 pm

The paper presents an overview of Archipelago (Ark), the platform for active measurement, analysis and mapping of Internet IP topology data for use by network research communities worldwide. The authors indicate that the Internet infrastructure protection agencies and regulatory authorities require constructive comprehension of Internet topology data to provide better services to address the ever changing dynamics of Internet and deter operational threats faced by it. Topology data is also made available to the research community which has traditionally relied on heuristics to solve network related challenges. In addition to providing a network measurement infrastructure the authors also develop software for data processing and analysis. The main points discussed in the paper are the Ark architecture, its deployment and accomplishments and future goals.

The Ark architecture primarily aims to enable ease of development and rapid prototyping, dynamic and coordinated measurements and provide a set of measurement services that come handy to researchers. Ark allows users to avail network measurement services over the network through dynamic scripting languages and pre-built APIs. Measurement like path diversity within a given prefix and monitoring prefixes containing critical infrastructure require dynamism and coordination between the measuring nodes distributed in space. Ark supports this dynamism and coordination using a tuple-space model called Marinda. Tuple-space acts like a shared memory between communicating processes, the contents of which can retrieved by simple pattern matching thereby enabling distributed measurements. Various measurements services like ping ands trace-route can be built and deployed at monitor nodes on top of tuple-space which acts as the underlying transport/messaging medium. Such service architecture allows flexibilities like decentralized management and ease of aggregation for diverse communication patterns.

The authors plan to deploy Ark monitors in under-represented regions and in regions with IPv6 connectivity to enable active measurements on it. Among the measurements that the Ark infrastructure makes it to calculate the delay of IP paths between a dynamically generated set of /24 prefix hosts. The task is parallelized by dividing it among multiple parallel monitors at different geographical sites constituting three teams which randomly probe the prefixes. These monitors poll the scamper measurement tool server node which supports IPv4, IPv6, ping and similar services and implements trace-routes measurements for TCP, UDP and ICMP. Apart from trace-route measurement on these /24 prefixes, the authors also perform DNS look up on them and get data like IP to-hostname maps and raw DNS query/response traffic. The third major measurement of alias resolution is used by the authors to get a router level map of the Internet which will allow them to identify more realistic physical links between routers rather than IP interfaces. Alias resolution is implemented by two heuristics techniques, CAIDA iffinder and APAR. Further the authors extract an AS-level Internet maps which can be used to maximize the number of valid paths in the AS topology. Lastly the authors construct dual AS-router level Internet topologies The resulting dual map merges router and AS-level graphs into an integrated view where links and nodes in both graphs are consistently annotated with semantically relevant meta-data and increase researchers’ situational awareness of the critical Internet infrastructure and open grounds for understanding sand modeling Internet evolution.

Overall Ark provides a great platform that research users can use to quickly design, implement and coordinate the execution of experiments across a distributed set of dedicated monitors. The Ark implementation is a big step towards providing synergic Internet topology data to various research communities across the globe and requires careful inclusion and expansion to encompass geographical regions spanning different continents and reflecting disparate Internet traffic characteristics. The main problems that arise from this vision could be resistance from service providers and regulatory authorities for inclusive growth and sharing of critical Internet data and also economies of scale which at some level might not be acceptable to all communities. Also maintainability is major issue since trace-route type measurements need series active nodes and failure of any one of them might disrupt further experimentation.

 

Internet Mapping: from Art to Science May 25, 2009

Filed under: R17. Internet Mapping: from Art to Science — filipposeracini @ 4:12 pm

This paper describes some of the features of a measurement platform capable of performing several measurements of the Internet infrastructure. This platform, called Ark, is a Ruby based operating system that support large scale active measurements of the global Internet. Ark is composed of 31 monitors located around the world. The plan is to increase the number of monitors in order to have a wider representation of the Internet. The Ark project goes along the lines with PlanetLab, iPlane, etc.

This measurement infrastructure allows researchers to run experiments and prototypes using a high-level language API. One of the most important features of Ark is the coordination between all the monitors that allow the deployment of sophisticated distributed measurements. Such coordination takes place in the form of a service oriented architecture (SOA). Each measurement can be seen as a service and the whole experiment reduces to the orchestration of such services. SOA is known to be able to scale up very well. Hence I believe that it is a very sound approach in a such large scale context as Ark.

The Ark project is quite recent, but already achieved some good results. In particular it was able to create a first representation of the Internet IP infrastructure at the router granularity. Other experiments to increase the information associated with it are currently running and under development.

It is hard to make an evaluation of this paper because it is basically just a shopping list of what they did and what they are planning to do. It will be interesting to see how this project evolves and the results that eventually will come out of it. I think one of the trickiest part for Ark will be to establish itself as the platform where to run experiments over the Internet, hence avoid to be only yet another measurement infrastructure.

 

CUBIC: A New TCP-Friendly High-Speed TCP Variant May 25, 2009

Filed under: R16. CUBIC: A New TCP-Friendly High-Speed TCP Variant — nekumar @ 4:10 pm

Many variants of TCP address the under-utilization problem due to slow growth of TCP congestion window. In large bandwidth-delay product network tackling the fairness issues of new protocols in terms of fairness with respect to existing TCP traffic and fair bandwidth sharing with other competing high-speed flows with same or different round-trip delays (Intra/Inter protocol fairness, and RTT fairness) is addressed in the paper.

TCP fairness is defined as follows: under high loss rate regions where TCP is well behaving the protocol should behave like TCP, and under low loss rate regions where TCP has low utilization problem, it can use more bandwidth than TCP. Most of the protocols have TCP mode in which they behave like TCP. Paper proposes that the regime where TCP performs well should be defined by congestion epoch time (not by window size) which is real-time period between two consecutive loss events.

It notes that though network capacity is in terms of packets but growth should not be characterized in terms of packets because growth rate depends upon the RTT. Hence TCP region of the above mentioned protocols should be defined in terms of real-time.

Paper proposes a variant of TCP which enhances the fairness property of BIC while retaining the scalability and stability. Here window growth function is defined in real-time so that its growth is independent of RTT. Congestion epoch period is determined by the packet loss rate alone. Though TCP throughput is defined by packet loss rate as well as RTT Cubit is defined by only the packet loss rate.

Paper proposes a cubic function of growth along with the linear growth. Window grows very fast upon a window reduction but as it gets closer to Wmax it slows down its growth. Slow growth around Wmax enhances the stability of the protocol and increases the utilization of network while fast growth away from Wmax ensures the scalability of the protocol. Inter flow fairness is maintained because both flows drop by a same multiplicative factor growth function allows that flows with larger Wmax will increase more slowly.

Paper shows that CUBIC has good TCP friendliness and is better stability than other protocols even for rates over 20 Mbps.

 

CUBIC: A New TCP-Friendly High-Speed TCP Variant May 25, 2009

Filed under: R16. CUBIC: A New TCP-Friendly High-Speed TCP Variant — ameenakel @ 4:10 pm

(i) the three most important things the paper says

One important thing that the paper addresses is the fact that congestion epoch time should be taken into consideration when determining protocol behavior, instead of solely relying upon the window size.  The authors claim that the scalability problem in TCP is characterized in real time, and thus real time must be an important part of the computation.  Another point that the authors address (since their implementation is based upon BIC’s) is that BIC might be too aggressive for TCP.  They address this by creating a new window growth function that coexists with TCP much more easily (and fairly).  Another addition to CUBIC from BIC adds a new TCP mode that also helps to emulate TCP on a packet loss event.  The scheme in fact will calculate what the CUBIC scheme thinks that it should do and then also what it believes TCP would do and then chooses the most optimal of the two.

(ii) the most glaring problem with the paper

I feel that the biggest problem with this paper is not in the content of the material but how it’s presented.  I felt that the writing wasn’t very clear and was difficult to understand, which added to the difficulty of the paper.  The paper also didn’t address much background information, but instead jumped right into the meat of the implementation.  This made it very difficult to understand much of the references to other protocols.

(iii) the future research directions of the work

It would be interesting to see how much faster this protocol can be made to converge to an optimal window size without impeding on TCP or causing congestion.  From this paper, I couldn’t tell how much thought was put into optimizing the BIC algorithm in this sense (versus just its TCP friendliness).

 

CUBIC: A New TCP-Friendly High-Speed TCP Variant May 25, 2009

Filed under: R16. CUBIC: A New TCP-Friendly High-Speed TCP Variant — liyunjiu @ 4:10 pm

CUBIC is a variant of TCP which is an enhancement of BIC which provides TCP-friendliness and RTT independent fairness which works well in high bandwidth-delay product networks.

1. CUBIC introduces a new window growth function that is a cubic. The congestion window is determined by Wcubic = C(t-K)^3 + Wmax and is independent of RTT.

 

CUBIC: A New TCP-Friendly High-Speed TCP Variant May 25, 2009

(i) The three most important things the paper says:

1. CUBIC achieves fairness far beyond its competing protocols by using a new window controller that is decoupled from RTT. They model the BIC growth function as a cubic which slows more majorly as fairness is close and reaches that point much faster by modeling the growth as a cubic function rather than a linear function followed by a logrithmic function.

2. This protocol also improves on BIC in terms of TCP friendliness. Because of its low variation as it approaches maximum utilization it allows TCP to continue to grow while not causing packet loss by fluctuization. This means that TCP can grow its utilization without disruption and makes it the most friendly of 4 different protocols analyzed by the author.

3. It is also very important that CUBIC retains the performance of BIC. This was accomplished as shown by the figures that bandwidth utilization of CUBIC is very similar to BIC with less fluctuation. This means that CUBIC has retained the slow increase around the maximum window size and improved the stability of its predacessor.

(ii) The most glaring problem with the paper:

The worst problem with this paper was the writing. The first notable flaw is the paper is extremely repetitive. I dont know how many times they defined what CUBIC was with a similar if not copied definition. The author also includes very little analysis on what these results mean and will allow if the protocol is adopted. He also includes very little future direction of his work.

(iii) The future research directions of the work:

Future research topics should include weather a fair version of BIC is even worth using. The stats presented do not confirm that its utilization of bandwidth is near optimimum but only show that it is fair to itself and that it is friendly to TCP. We should only be transitioning to a new protocol if we feel the throughput gains are especially strong.

 

CUBIC: A New TCP-Friendly High-Speed TCP Variant May 25, 2009

Filed under: R16. CUBIC: A New TCP-Friendly High-Speed TCP Variant — stufflebean @ 4:10 pm

This paper describes CUBIC, a modification to the TCP window growth function. The goal of this is to increase performance after a packet drop without becoming unfair to existing TCP traffic.

The main points of this paper are as follows:

  1. The primary motivation for CUBIC (and its predecessor BIC) is that the TCP window usually grows too slowly after a packet-loss-induced multiplicative decrease. This leaves performance on the table unnecessarily, especially when the ideal window size is just below the size that caused the packet drop. CUBIC solves this by using a cubic growth function to rapidly return the window size to near the size that caused the packet drop, while slowing down its rate of growth near that point so as not to overshoot it. If the capacity has increased since the time of the packet drop, the cubic function then enters a cubic growth rate increase, quickly converging on the new rate. This provides the same basic rate detection as normal TCP while converging much more quickly.
  2. CUBIC is also offers good fairness to others using the same protocol because the growth rate is tied to the real time of the packet drop. This means that all flows are on equal footing if they have a synchronized drop.
  3. While it is friendly to TCP, CUBIC grows the window more slowly on short RTT networks. Therefore they need to modify their algorithm to emulate the TCP algorithm in order to “keep up.” This allows them to be competitive with TCP without being overly aggressive like some of the other protocols are.

The biggest weakness of this paper is that they don’t summarize their work with a clean conclusion. While this could be an artifact of the short paper length, ending on a bunch of graphs does not really state how they achieved their goals.

Future research on this work should evaluate its performance over a larger-scale network, such as the Internet. The variability in BDP and RTT over different hops in the Internet would be a good test of its effectiveness as compared to artificial network topologies.

 

CUBIC: A New TCP-Friendly High-Speed TCP Variant May 25, 2009

Filed under: R16. CUBIC: A New TCP-Friendly High-Speed TCP Variant — ctrezzo @ 4:10 pm

Three major contributions:

  1. A TCP variant that uses a window growth function based on a real-time congestion epoch. (i.e. the real-time between two loss events)
  2. A window growth function based on a cubic function. This allows the protocol to be stable, fair across different protocols, and scalable.
  3. A TCP variant that reacts to changes in congestion quicker, making it a better suited protocol for high bandwidth networks.

Major problem:

There is a negative incentive to adopt this protocol: in short RTT networks CUBIC’s window growth is slower than TCP. This means that in short RTT networks, flows that use CUBIC will grab less bandwidth than flows using traditional TCP. Secondly, CUBIC claims to have made a simpler window growth function, but both C (the scaling factor) and B (a constant multiplication decrease factor) need to be tuned in order to achieve correct performance.

Future implications:

This protocol still depends on pushing the network to 100% congestion (i.e. packets are lost), but how would you design a protocol that could sense the maximum bandwidth available without having to push the link to capacity? Also, how would you design a protocol that does not have a negative incentive for adoption, but still plays fair with traditional TCP?

 

CUBIC: A New TCP-Friendly High-Speed TCP Variant May 25, 2009

Filed under: R16. CUBIC: A New TCP-Friendly High-Speed TCP Variant — supritapagad @ 4:10 pm

1. RTT independent growth maintains fairness to TCP

CUBIC defines the instances it enters TCP mode so that fairness with TCP traffic on the node is maintained, based on “congestion epoch time”. This is the time period between two consecutive loss events. Using a constant, time-invarient parameter like window size to determine this transition isn’t accurate and does not take into consideration RTT. Hence when the loss rate is high and/or RTT is small, CUBIC operates in TCP mode.

2.Real-time estimation of window function

The function used to determine window size is determined dynamically so that it is independent of RTT. The function is such that, in case of a decrease, a flow is reduced by a factor proportional to its window size. Hence a flow with larger window size is decreased to a greater extend and that with smaller window size is decremented to a smaller extend. In case of an increase, the opposite is achived, with a flow with larger window increasing to a smaller extend. Hence fairness between flows is maintained.

3. Additive increase in regions far from W_max and logrithmic increase in regions close to W_max

The CUBIC window funtion is such that it makes an estimate of the maximum possible window size. In the vicinity of this maximum, it increasese the window in small increments (logrithmic increase). However, when the currently window size is much smaller or much larger than the estimated maximum, it increses its current window size additively.

Shortcoming

The protocol increases window size by a fixed amount S_max, during the linear increase phase. This still means a large catch up time is involved until maximum utilization is achived. Also, S_max is a constant. Greater efficiency might have been achieved if S_max was varied as a function of RTT and current window size.

Future Work

The protocol could be extended to incorporate flexibility into the variable S_max and S_min. A very basic version of fairness control (fairness between various CUBIC flows) is implemented in the protocol. It could extended to enhance the definition of fairness used in the paper.

 

CUBIC: A New TCP-Friendly High-Speed TCP Variant May 25, 2009

Filed under: R16. CUBIC: A New TCP-Friendly High-Speed TCP Variant — dgaschk @ 4:10 pm

CUBIC is an improvement of the BIC TCP variant. TCP typically increases its window size until loss is encountered. BIC TCP uses an algorithm that increases its window size faster than that of traditional TCP slow start. CUBIC improves on BIC by replacing BICs two stage growth algorithm with a single simpler algorithm. BICs window growth algorithm is also more TCP-friendly and not sensitive to RTT.

CUBIC adjusts its window size based on the elapsed time since the last adjustment. Since RTT is removed from the window scaling algorithm the algorithm enhances fairness among CUBIC flows with varying RTTs. CUBICs windows size grows more slowly that that of TCP over short RTT connections. This enhances CUBICs TCP friendliness.

The short document ends rather abruptly after presenting experimental results. A conclusion that summarizes these results in light of CUBICs novelties would be appreciated.

The CUBIC algorithm shows a great deal of promise especially in light of BICs widespread Linux deployment. Integrating a congestion avoidence mechanism, e.g., TCP-Vegas, could further improve this TCP variant.

 

CUBIC: A New TCP-Friendly High-Speed TCP Variant May 25, 2009

Three Important Things:

  • CUBIC represents a departure from the traditional TCP notion of ACK clocking for window growth. Instead the growth function is based on real time, effectively decoupling it from RTT and thus making it more deployable on high latency links. On packet loss, the windows size is reduced by a multiplicative factori, and the subsequent growth is a cubic function of time and the old window size. This scheme results in speedy rampup of the throughput and ensures fairness among competing flows. All flows will converge to the same window size because of the proportional decrease
  • The paper makes the argument that fair coexistence with TCP is crucial for the deployment of any new high speed protocol. Fairness is defined as behaving like TCP when there is high loss, to ensure that legacy flows are not dominated. CUBIC achieves this by entering TCP compatibilty mode and simulating the multiplicative adjustment of TCP. It essentially reverts to TCP behavior during periods of loss and large relative cmputed window size
  • Simulation experiments comparing 5 high-speed TCP variants showed that CUBIC achieved better TCP-friendliness, link utilization, balancing, and stability than any other protocol. This results held across variations of bottleneck bandwidth, RTT, background traffic, and flow competition.

Glaring Problem:

The new protocol is shown to perform well relative to others, and its success relies on several fundamental formulas and appropriately chosen constants. Most of these are presented “as is” in the paper, with little or no formal analysis and justification. There was also no discussion of the methods used to arrive at the material presented, beyond a few vague references to control theory.

Future Work:

The ultimate goal of any TCP protocol is to converge as quickly as possible to the optimal window size while being responsive to changes congestions changes. Graphically speaking this behavior will maximize the area under the throughput curve. Future work should analyze how much better it is possible to do with TCP, meaning that no help is provided from the network. If protocols like CUBIC are getting to close to the point of “as good as can be hoped for” then this should result in a shift of attention to other protocol research possible internet philisophy changes.

 

CUBIC: A new TCP-Friendly High-Speed TCP Variant May 25, 2009

The paper “CUBIC: A new TCP-Friendly High-Speed TCP Variant” by Injong Rhee and Lisong Xu presents yet another TCP variant building on their previous work on BIC. The goal of the paper is to improve congestion control in TCP by fixing the slow growth of the congestion window in large bandwith-delay product links.

Their first contribution is to base the part of the protocol variant that behaves like TCP (the so-called TCP region) not on window size but on wall clock time. This helps in accustoming the protocol variant to networks with RTT values differing over more than one order of magnitude. The proposed window growth function is a cubic function, which is essentially flat around the origin (which is defined to be the last observed maximum window size). Therefore, the authors have a large area of stability around the maximum window size to reduce oscillation and still have a high rate of change when the network conditions have changed significantly.

The second contribution is an emulation mode for TCP, which emulates the Additive Increase Multiplicative Decrease protocol used in current versions of TCP. This emulation is run in parallel to the window growth function provided by CUBIC, to ensure that CUBIC doesn’t give an unfair share to TCP on small RTT networks.

The last contribution is an comparative study of TCP variants, especially their behavior on networks with different RTTs in respect to the ongoing (old) TCP connections. The authors are able to show that some proposed variants have an heavy impact on TCP connections based on the current implementation, especially for slow link speeds. Finally, the authors look into the stability of the throughput of different variants.

This paper ignores deployment for the new TCP variant and fails to report the impact of the improved TCP friendliness on its own flows.

Future research in this area could involve studying stability and throughput on networks with flows using different TCP variants. This could result in interesting scenarios for the deployment, where compatibility issues might hinder the adoption of certain TCP variants.

 

CUBIC: A New TCP-Friendly High-Speed TCP Variant May 25, 2009

Filed under: R16. CUBIC: A New TCP-Friendly High-Speed TCP Variant — subhramazumdar @ 4:10 pm

Te paper proposes a new transport layer protocol called the CUBIC for high speed networks. CUBIC is an enhanced version of the BIC, but it simplifies BIC window control and improves its TCP friendliness and RTT fairness. The window growth function of CUBIC is governed by a cubic function in terms of the elapsed time since the last loss event. The inherent nature of the cubic function is found to be very well suited in this case and provides good stability and scalability. Also the realtime nature of the protocol keeps the window growth rate independent of the RTT which keeps the protocol TCP firendly under both long and short RTT paths. Although BIC achieves pretty good scalability, fairness and stability under current high networks, its growth function was still too aggressive, especially under short RTTs. Also several different phases of the window growth added complexity in analyzing protocol. CUBIC is proposed as a high speed TCP variant with a cubic window growth function whose shape is very similar to the BIC. The window grows very fast upon a reduction but when closer to maximum window size, it slows down its growth. Thus the slow growth around window max enhances the stability of the protocol while fast growth away from it ensured the scalability. Also CUBIC is found to remain TCP friendly for a much wider range of traffic and RRT than other existing variants of TCP.

Although the paper suggests an improved control function for the window size, its still banking on packet losses as an indication of congestion and does not use explicit congestion signaling. This may lead to slight improvement of the protocol in terms of efficiency, but in the long run will fall behind when the networks becomes really high bandwidth and long latency. Using explicit congestion control mechanism is inevitable under such circumstances.

Future directions of the work may include exploring more control functions and comparing them to find an optimal. The degree of polynimal functions may also be chosen depending on the state of congestion so that the growth is not too aggressive to casue heavy drop of packest while at the same time be efficient enough to utilize the full available bandwidth.

 

CUBIC: A New TCP-Friendly High-Speed TCP Variant May 25, 2009

Important Things:
1) CUBIC is a high-speed TCP variant which is shown to be more friendly to TCP than other high-speed variants. It uses a window growth function that depends on real time, independent of RTT. They also emulate TCP’s window adjustment algorithm after a packet loss. They prove (but not in the paper), that their algorithm is fair to TCP for certain values of congestion epoch duration, RTT, and packet loss rate, and that this is better than other high-speed variants.
2) CUBIC retains the stability and scalability of BIC while improving its TCP-fairness and simplifying its window control.
3) CUBIC is evaluated along with four other high-speed TCP variants in simulation (using NS-2), testing for TCP friendliness in short-RTT networks, TCP friendliness in long-RTT networks, and stability. The right metric for measuring stability is not entirely clear, but they conclude that CUBIC shows good stability. CUBIC is also shown to be more TCP-friendly than the other high-speed TCP variants in all cases shown.

Problems:
The TCP-friendliness of all the high-speed TCP variants is very low in long-RTT networks. It may be impossible to maintain TCP-friendliness and have good performance when RTT is high, but whether or not this is the case is not clear to me yet.

Future Work:
All these TCP variants show that protocol design and network design is not an exact science, and it is not clear what is the “right” way to do things. We’ve gotten to this point by just trying different things. Future work should continue the search for the “right” way to network, and this should be done both with backward compatibility in mind (as in this paper), and without.