Inferring hidden features in the Internet (PhD thesis)
Gursun, Gonca
The Internet is a large-scale decentralized system composed of thousands of independent networks. Two of its main components, interdomain routing and traffic, are vital inputs for tasks such as traffic engineering, security, and business intelligence. However, because the Internet is decentralized, global knowledge of both interdomain routing and traffic is hard to come by. In this dissertation, we address a set of statistical inference problems with the goal of extending knowledge of the interdomain-level Internet.
In the first part of this dissertation we investigate the relationship between the interdomain topology and an individual network’s inference ability. We first frame the questions through abstract analysis of idealized topologies, and then use actual routing measurements and topologies to study the ability of real networks to infer traffic flows.
In the second part, we study the ability of networks to identify which paths pass through them. We first show that answering this question is surprisingly hard because of the design of interdomain routing systems, in which each network can learn only a limited set of routes. Network operators therefore have to rely on observed traffic. However, observed traffic can only confirm that a particular route passes through the network; it cannot establish that a route does not pass through it. To solve this routing inference problem, we propose a nonparametric inference technique that infers routes accurately. The key idea behind our technique is to measure the distances between destinations. To accomplish that, we define a metric called Routing State Distance (RSD), which measures distance in terms of routing similarity.
Finally, in the third part, we study our new metric, RSD, in detail. Using RSD, we address the important and difficult problem of characterizing the set of paths between networks. The collection of paths across networks is a rich source for understanding important phenomena in the Internet, because path selection is driven by the economic and performance considerations of the networks. We show that RSD has a number of appealing properties that help uncover these hidden phenomena.
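As an illustration of measuring distance between destinations "in terms of routing similarity", the following minimal Python sketch counts the nodes whose next hop differs for two destinations. This abstract does not give the formal definition of RSD, so the counting rule used here is an assumption, and the function name, the table layout, and the toy topology are all hypothetical.

```python
from typing import Dict, Hashable

# Assumed layout: next_hop[node][destination] is the neighbor that `node`
# forwards to when sending traffic toward `destination`.
NextHopTable = Dict[Hashable, Dict[Hashable, Hashable]]


def routing_state_distance(next_hop: NextHopTable, d1: Hashable, d2: Hashable) -> int:
    """Count the nodes that use different next hops toward d1 and d2.

    Assumption: a node with no entry for either destination is treated
    as agreeing (both lookups return None).
    """
    return sum(1 for hops in next_hop.values() if hops.get(d1) != hops.get(d2))


# Hypothetical toy routing state: three nodes, three destinations.
TOY: NextHopTable = {
    "A": {"X": "B", "Y": "B", "Z": "C"},
    "B": {"X": "X", "Y": "X", "Z": "C"},
    "C": {"X": "B", "Y": "A", "Z": "Z"},
}

if __name__ == "__main__":
    # X and Y are routed alike by most nodes; X and Z are routed differently.
    print(routing_state_distance(TOY, "X", "Y"))
    print(routing_state_distance(TOY, "X", "Z"))
```

Under this reading, two destinations are close when most networks route toward them in the same way, which is what would allow routing information about one destination to inform inferences about another.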