On web browsing privacy in anonymized NetFlows

Authors:
S. E. Coull;M. P. Collins;C. V. Wright;F. Monrose;M. K. Reiter
Affiliations:
Johns Hopkins University;Carnegie Mellon University;Johns Hopkins University;Johns Hopkins University;Carnegie Mellon University
Venue:
SS'07: Proceedings of the 16th USENIX Security Symposium
Year:
2007

Abstract

Anonymization of network traces is widely viewed as a necessary condition for releasing such data for research purposes. For obvious privacy reasons, an important goal of trace anonymization is to suppress the recovery of web browsing activities. While several studies have examined the possibility of reconstructing web browsing activities from anonymized packet-level traces, we argue that these approaches fail to account for a number of challenges inherent in real-world network traffic and, moreover, are unlikely to be successful on coarser NetFlow logs. By contrast, we develop new approaches that identify target web pages within anonymized NetFlow data and address many real-world challenges, such as browser caching and session parsing. We evaluate the effectiveness of our techniques in identifying front pages from the 50 most popular web sites on the Internet (as ranked by alexa.com), both in a closed-world experiment similar to that of earlier work and in tests with real network flow logs. Our results show that certain types of web pages with unique and complex structure remain identifiable despite the use of state-of-the-art anonymization techniques. The concerns raised herein pose a threat to web browsing privacy insofar as the attacker can approximate the web browsing conditions represented in the flow logs.