Digging into HTTPS: flow-based classification of webmail traffic

  • Authors: Dominik Schatzmann; Wolfgang Mühlbauer; Thrasyvoulos Spyropoulos; Xenofontas Dimitropoulos

  • Affiliations: ETH Zurich, Zurich, Switzerland (all authors)

  • Venue: IMC '10 Proceedings of the 10th ACM SIGCOMM conference on Internet measurement
  • Year: 2010

Abstract

Recently, webmail interfaces such as Horde and Outlook Web Access, and webmail platforms such as GMail, Yahoo!, and Hotmail, have seen a tremendous boost in popularity. Given the importance of e-mail for personal and business use alike, and its exposure to imminent threats, there is a need for a comprehensive view of the Internet mail system that includes webmail traffic. In this paper, we propose a novel, passive approach that identifies webmail traffic based solely on network-level data. Key to our approach is that we leverage correlations across protocols and time to introduce three novel features for HTTPS webmail classification. Our first feature is based on the finding that webmail servers tend to reside close to legacy mail servers, e.g., IMAP and POP servers, which can be easily identified. Our second feature exploits the fact that the use of webmail services results in distinct patterns in session duration and in the diurnal and weekly traffic profile. Our third feature exploits the observation that traffic flows to webmail platforms exhibit inherent periodicities, because AJAX-based clients periodically check for new messages. We use these three features to build a simple classifier and detect webmail traffic in real-world NetFlow traces from a medium-sized backbone network. We believe that the major contribution of this paper, a set of new features that can classify applications running over HTTPS ports based solely on NetFlow data, will stimulate more general advances in the field of traffic classification.
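
To make the third feature concrete, the following is a minimal Python sketch of how periodicity in flow start times might be detected. It is an illustration under stated assumptions, not the authors' method: the abstract does not specify how periodicity is measured, so the autocorrelation approach, the function names (binned_counts, autocorrelation, dominant_period), the 1-second bin width, and the lag limits are all hypothetical choices. The input is assumed to be a list of flow start timestamps, in seconds, for a single client-server pair.

```python
from statistics import mean


def binned_counts(start_times, bin_width=1.0):
    """Count flow starts per time bin, relative to the earliest flow."""
    if not start_times:
        return []
    t0 = min(start_times)
    n_bins = int((max(start_times) - t0) // bin_width) + 1
    counts = [0] * n_bins
    for t in start_times:
        counts[int((t - t0) // bin_width)] += 1
    return counts


def autocorrelation(series, lag):
    """Normalized autocorrelation of a series at one lag (biased estimator)."""
    mu = mean(series)
    var = sum((x - mu) ** 2 for x in series)
    if var == 0:
        return 0.0
    cov = sum(
        (series[i] - mu) * (series[i + lag] - mu)
        for i in range(len(series) - lag)
    )
    return cov / var


def dominant_period(start_times, bin_width=1.0, min_lag=2, max_lag=600):
    """Return (period_in_seconds, score) for the lag with the highest
    autocorrelation, or (None, 0.0) if no lag has a positive score.

    min_lag and max_lag are hypothetical tuning parameters, given in bins.
    """
    counts = binned_counts(start_times, bin_width)
    # Only consider lags that leave at least half of the series overlapping.
    max_lag = min(max_lag, len(counts) // 2)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = autocorrelation(counts, lag)
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None:
        return None, 0.0
    return best_lag * bin_width, best_score


if __name__ == "__main__":
    # Synthetic example: a client that polls the server every 30 seconds.
    polling = [30 * k for k in range(20)]
    print(dominant_period(polling))
```

A high score at a stable lag would suggest a client polling at a fixed interval, which a classifier could combine with the other two features. Real NetFlow data would additionally need handling for timestamp jitter and for several clients sharing one server, which this sketch ignores.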