Unconstrained endpoint profiling (googling the internet)

  • Authors:
  • Ionut Trestian;Supranamaya Ranjan;Aleksandar Kuzmanovic;Antonio Nucci

  • Affiliations:
  • Northwestern University, Evanston, IL, USA;Narus Inc., Mountain View, CA, USA;Northwestern University, Evanston, IL, USA;Narus Inc., Mountain View, CA, USA

  • Venue:
  • Proceedings of the ACM SIGCOMM 2008 conference on Data communication
  • Year:
  • 2008

Quantified Score

Hi-index 0.02

Abstract

Understanding Internet access trends at a global scale, i.e., what do people do on the Internet, is a challenging problem that is typically addressed by analyzing network traces. However, obtaining such traces presents its own set of challenges owing to either privacy concerns or to other operational difficulties. The key hypothesis of our work here is that most of the information needed to profile the Internet endpoints is already available around us - on the web. In this paper, we introduce a novel approach for profiling and classifying endpoints. We implement and deploy a Google-based profiling tool, which accurately characterizes endpoint behavior by collecting and strategically combining information freely available on the web. Our 'unconstrained endpoint profiling' approach shows remarkable advances in the following scenarios: (i) Even when no packet traces are available, it can accurately predict application and protocol usage trends at arbitrary networks; (ii) When network traces are available, it dramatically outperforms state-of-the-art classification tools; (iii) When sampled flow-level traces are available, it retains high classification capabilities when other schemes literally fall apart. Using this approach, we perform unconstrained endpoint profiling at a global scale: for clients in four different world regions (Asia, South and North America and Europe). We provide the first-of-its-kind endpoint analysis which reveals fascinating similarities and differences among these regions.
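
The abstract does not spell out how web-derived information is combined into an endpoint profile. As a rough illustration only, the sketch below assumes that text snippets mentioning an IP address have already been collected, and tags the endpoint by counting class keywords in them. The keyword table, the class labels, the `profile_endpoint` function, and the sample snippets are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical keyword-to-class table; these labels and keywords are
# illustrative assumptions, not taken from the paper.
KEYWORD_CLASSES = {
    "mail server": "mail",
    "smtp": "mail",
    "game server": "gaming",
    "counter-strike": "gaming",
    "torrent": "p2p",
    "tracker": "p2p",
    "proxy": "proxy",
}


def profile_endpoint(ip, snippets):
    """Tag an endpoint by counting class keywords in text that mentions its IP."""
    votes = Counter()
    for text in snippets:
        lowered = text.lower()
        # Plain substring match: a simplification that would also accept
        # longer addresses sharing this prefix.
        if ip not in lowered:
            continue  # ignore snippets that do not mention this endpoint
        for keyword, label in KEYWORD_CLASSES.items():
            if keyword in lowered:
                votes[label] += 1
    # Return labels ordered from most to least supported.
    return [label for label, _ in votes.most_common()]


# Made-up snippets standing in for web search results about one address.
snippets = [
    "Public tracker list: 192.0.2.10:6969 torrent announce",
    "192.0.2.10 seen on torrent swarm",
    "Mail server blacklist entry for 192.0.2.10 (SMTP relay)",
    "Unrelated page about 198.51.100.7 game server",
]
print(profile_endpoint("192.0.2.10", snippets))
```

A vote count is used here so that an endpoint seen in several roles keeps all of its labels, ranked by how often each is supported, rather than being forced into a single class.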