The paper presents an overview of Archipelago (Ark), a platform for active measurement, analysis, and mapping of Internet IP topology for use by network research communities worldwide. The authors argue that Internet infrastructure protection agencies and regulatory authorities need a sound understanding of Internet topology data in order to provide better services, keep up with the ever-changing dynamics of the Internet, and deter the operational threats it faces. Topology data is also made available to the research community, which has traditionally relied on heuristics to solve network-related challenges. In addition to providing a network measurement infrastructure, the authors develop software for data processing and analysis. The main points discussed in the paper are the Ark architecture, its deployment and accomplishments, and future goals.
The Ark architecture primarily aims to enable ease of development and rapid prototyping, to support dynamic and coordinated measurements, and to provide a set of measurement services that are useful to researchers. Ark lets users access measurement services over the network through dynamic scripting languages and pre-built APIs. Measurements such as path diversity within a given prefix, or monitoring of prefixes that contain critical infrastructure, require dynamism and coordination among geographically distributed measurement nodes. Ark supports this dynamism and coordination using a tuple-space model called Marinda. A tuple space acts like shared memory between communicating processes; its contents can be retrieved by simple pattern matching, which enables distributed measurements. Various measurement services such as ping and traceroute can be built and deployed at monitor nodes on top of the tuple space, which acts as the underlying transport and messaging medium. This service architecture allows flexibility such as decentralized management and ease of aggregation across diverse communication patterns.
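The tuple-space idea described above can be illustrated with a minimal, single-process sketch. This is not Marinda's actual API: the class name `TupleSpace`, the methods `write`, `read`, and `take`, the use of `None` as a wildcard, and the tuple layouts are all assumptions made for illustration, and a real deployment would share the space across the network.

```python
from typing import List, Optional

WILDCARD = None  # assumption: None in a template matches any value


class TupleSpace:
    """Toy in-memory tuple space (illustrative only; not Marinda's API)."""

    def __init__(self) -> None:
        self._tuples: List[tuple] = []

    def write(self, tup: tuple) -> None:
        """Publish a tuple for any process to pick up."""
        self._tuples.append(tup)

    @staticmethod
    def _matches(template: tuple, tup: tuple) -> bool:
        # Same length, and every non-wildcard field must be equal.
        return len(template) == len(tup) and all(
            t is WILDCARD or t == v for t, v in zip(template, tup)
        )

    def read(self, template: tuple) -> Optional[tuple]:
        """Return the first matching tuple without removing it."""
        for tup in self._tuples:
            if self._matches(template, tup):
                return tup
        return None

    def take(self, template: tuple) -> Optional[tuple]:
        """Return and remove the first matching tuple."""
        for i, tup in enumerate(self._tuples):
            if self._matches(template, tup):
                return self._tuples.pop(i)
        return None


space = TupleSpace()
# A coordinator posts measurement requests (hypothetical tuple layout).
space.write(("PING", "monitor-a", "192.0.2.1"))
space.write(("TRACE", "monitor-b", "198.51.100.7"))

# A traceroute service claims any TRACE request by pattern matching.
job = space.take(("TRACE", WILDCARD, WILDCARD))
# ... and posts a result tuple that the coordinator can later match on.
space.write(("RESULT", job[1], job[2], "done"))
result = space.read(("RESULT", "monitor-b", WILDCARD, WILDCARD))
```

Because requesters and services only agree on tuple shapes, neither needs to know where the other runs, which is the kind of decoupling that makes decentralized management and aggregation easier.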
The authors plan to deploy Ark monitors in under-represented regions and in regions with IPv6 connectivity, to enable active IPv6 measurements. One of the measurements that the Ark infrastructure performs is calculating the delay of IP paths to a dynamically generated set of hosts in /24 prefixes. The task is parallelized by dividing it among multiple monitors at different geographical sites, organized into three teams that randomly probe the prefixes. These monitors poll the server node of the scamper measurement tool, which supports IPv4, IPv6, ping, and similar services and implements traceroute measurements over TCP, UDP, and ICMP. Apart from traceroute measurements on these /24 prefixes, the authors also perform DNS lookups on them and collect data such as IP-to-hostname maps and raw DNS query/response traffic. The third major measurement, alias resolution, is used to obtain a router-level map of the Internet, which lets the authors identify more realistic physical links between routers rather than between IP interfaces. Alias resolution is implemented with two heuristic techniques, CAIDA iffinder and APAR. Further, the authors extract AS-level Internet maps, which can be used to maximize the number of valid paths in the AS topology. Lastly, the authors construct dual AS-router-level Internet topologies. The resulting dual map merges the router-level and AS-level graphs into an integrated view in which links and nodes in both graphs are consistently annotated with semantically relevant metadata. This increases researchers' situational awareness of critical Internet infrastructure and opens the ground for understanding and modeling Internet evolution.
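To make the alias-resolution step concrete, the following sketch shows how interface-level links could be collapsed into router-level links once alias pairs are known. It does not reproduce iffinder or APAR; it assumes their output is simply a list of address pairs judged to belong to the same router. The names `AliasSets` and `router_links`, the treatment of links as undirected, and the example addresses are all hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


class AliasSets:
    """Union-find over interface addresses; each set stands for one router."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def find(self, ip: str) -> str:
        """Return the representative address of the router owning `ip`."""
        self._parent.setdefault(ip, ip)
        root = ip
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node directly at the root.
        while self._parent[ip] != root:
            nxt = self._parent[ip]
            self._parent[ip] = root
            ip = nxt
        return root

    def union(self, a: str, b: str) -> None:
        """Record that interfaces `a` and `b` are aliases of one router."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Keep the lexicographically smaller address as representative.
            self._parent[max(ra, rb)] = min(ra, rb)


def router_links(
    ip_links: Iterable[Tuple[str, str]],
    alias_pairs: Iterable[Tuple[str, str]],
) -> Set[Tuple[str, str]]:
    """Collapse interface-level links into undirected router-level links."""
    sets = AliasSets()
    for a, b in alias_pairs:
        sets.union(a, b)
    links: Set[Tuple[str, str]] = set()
    for src, dst in ip_links:
        r1, r2 = sets.find(src), sets.find(dst)
        if r1 != r2:  # drop links between two interfaces of one router
            links.add((min(r1, r2), max(r1, r2)))
    return links


# Hypothetical input: two interfaces that a heuristic judged to be aliases.
alias_pairs = [("192.0.2.1", "198.51.100.1")]
ip_links = [
    ("192.0.2.1", "203.0.113.5"),
    ("198.51.100.1", "203.0.113.5"),
    ("192.0.2.1", "198.51.100.1"),
]
links = router_links(ip_links, alias_pairs)
```

In this example the three interface-level links reduce to a single router-level link, which is the sense in which a router-level map is closer to the physical topology than an interface-level one.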
Overall, Ark provides a strong platform that researchers can use to quickly design, implement, and coordinate the execution of experiments across a distributed set of dedicated monitors. The Ark implementation is a big step towards providing synergistic Internet topology data to research communities across the globe, and it requires careful expansion to encompass geographical regions spanning different continents and reflecting disparate Internet traffic characteristics. The main problems that could arise from this vision are resistance from service providers and regulatory authorities to inclusive growth and to the sharing of critical Internet data, and economies of scale that at some level might not be acceptable to all communities. Maintainability is also a major issue, since traceroute-type measurements need a series of active nodes, and the failure of any one of them might disrupt further experimentation.