Embellishing text search queries to protect user privacy

Authors:
HweeHwa Pang; Xuhua Ding; Xiaokui Xiao
Affiliations:
Singapore Management University; Singapore Management University; Nanyang Technological University
Venue:
Proceedings of the VLDB Endowment
Year:
2010

Abstract

Users of text search engines are increasingly wary that their activities may disclose confidential information about their business or personal profiles. It would be desirable for a search engine to perform document retrieval for users while protecting their intent. In this paper, we identify the privacy risks arising from semantically related search terms within a query, and from recurring high-specificity query terms in a search session. To counter these risks, we propose a solution that lets a similarity-based text retrieval system offer anonymity and plausible deniability for the query terms, and hence for the user intent, without degrading the system's precision-recall performance. The solution comprises a mechanism that embellishes each user query with decoy terms that exhibit a specificity spread similar to that of the genuine terms, but that point to plausible alternative topics. We also provide an accompanying retrieval scheme that enables the search engine to compute the encrypted document relevance scores from only the genuine search terms, yet remain oblivious to which terms are genuine and which are decoys. Empirical evaluation results are presented to substantiate the effectiveness of our solution.
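
The idea of embellishing a query with decoys of similar specificity can be illustrated with a minimal sketch. It is not the authors' mechanism: it assumes that inverse document frequency (IDF) is an acceptable proxy for term specificity, it groups terms into fixed-width IDF buckets, and it picks decoys at random from the bucket of each genuine term. It makes no attempt to keep the decoys topically coherent, and it does not cover the encrypted scoring scheme. The names `specificity_bucket`, `embellish`, `DOC_FREQ`, and `NUM_DOCS`, as well as the toy statistics, are hypothetical.

```python
import math
import random


def specificity_bucket(term, doc_freq, num_docs, width=1.0):
    """Map a term to a bucket index from its IDF (assumed proxy for specificity)."""
    idf = math.log(num_docs / doc_freq[term])
    return int(idf // width)


def embellish(query, doc_freq, num_docs, decoys_per_term=2, seed=None):
    """Return the genuine terms mixed with decoys of similar specificity."""
    rng = random.Random(seed)
    used = set(query)      # never reuse a genuine term or an earlier decoy
    result = list(query)
    for term in query:
        target = specificity_bucket(term, doc_freq, num_docs)
        # Candidates: unused terms that fall in the same specificity bucket.
        candidates = sorted(
            t for t in doc_freq
            if t not in used
            and specificity_bucket(t, doc_freq, num_docs) == target
        )
        decoys = rng.sample(candidates, min(decoys_per_term, len(candidates)))
        used.update(decoys)
        result.extend(decoys)
    rng.shuffle(result)    # hide which positions hold the genuine terms
    return result


# Hypothetical toy statistics: term -> number of documents containing it.
NUM_DOCS = 1000
DOC_FREQ = {
    "osteosarcoma": 3,
    "glioblastoma": 4,
    "archaeopteryx": 3,
    "stalactite": 5,
    "therapy": 120,
    "weather": 110,
    "history": 130,
}

embellished = embellish(["osteosarcoma", "therapy"], DOC_FREQ, NUM_DOCS, seed=7)
print(embellished)
```

In this sketch the rare term "osteosarcoma" is padded with equally rare decoys and the common term "therapy" with equally common ones, so the embellished query has the same mix of specificities for every candidate reading. Choosing decoys that also form plausible alternative topics would need semantic information that is not modeled here.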