CrowdLogging: distributed, private, and anonymous search logging

Authors:
Henry Allen Feild, James Allan, Joshua Glatt
Affiliations:
University of Massachusetts Amherst, Amherst, MA, USA (all three authors)
Venue:
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Year:
2011

Abstract

We describe CrowdLogging, an approach for distributed search log collection, storage, and mining, with the dual goals of preserving privacy and making the mined information broadly available. Most search log mining approaches and most privacy-enhancing schemes have focused on centralized search logs and methods for disseminating them to third parties. In our approach, a user's search log is encrypted and shared in such a way that (a) the source of a search behavior artifact, such as a query, is unknown and (b) extremely rare artifacts, that is, artifacts more likely to contain private information, are not revealed. The approach works with any search behavior artifact that can be extracted from a search log, including queries, query reformulations, and query-click pairs. In this work, we: (1) present a distributed search log collection, storage, and mining framework; (2) compare several privacy policies, including differential privacy, showing the trade-offs between strong guarantees and the utility of the released data; (3) demonstrate the impact of our approach using two existing research query logs; and (4) describe a pilot study for which we implemented a version of the framework.
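
The release rule in (a) and (b) can be illustrated with a minimal plain-text sketch. It is not the paper's protocol: the encryption step is omitted entirely, and the function name `release_artifacts`, the threshold `k`, and the sample logs are all hypothetical. The sketch assumes that "extremely rare" means "contributed by fewer than `k` distinct users".

```python
from collections import defaultdict


def release_artifacts(user_logs, k):
    """Return {artifact: distinct-user count} for artifacts seen by >= k users.

    Hypothetical helper: a plain-text stand-in for a threshold release
    policy, with no encryption or anonymizing transport.
    """
    supporters = defaultdict(set)
    for user_id, artifacts in user_logs.items():
        for artifact in artifacts:
            # A set ignores repeats, so one user counts once per artifact.
            supporters[artifact].add(user_id)
    # User ids are dropped here: only the artifact and its count are released.
    return {a: len(users) for a, users in supporters.items() if len(users) >= k}


# Made-up sample data: each user contributes a list of queries.
logs = {
    "u1": ["weather boston", "sigir 2011", "weather boston"],
    "u2": ["weather boston", "my home address 12 elm st"],
    "u3": ["weather boston", "sigir 2011"],
}

released = release_artifacts(logs, k=2)
print(released)  # {'weather boston': 3, 'sigir 2011': 2}
```

The query issued by a single user is withheld, while the two shared queries are released with counts only and no user identifiers. Raising `k` strengthens the guarantee at the cost of releasing fewer artifacts, which is the kind of privacy-utility trade-off the abstract says the paper compares across policies.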