Similar Document Detection with Limited Information Disclosure

Authors:
Wei Jiang; Mummoorthy Murugesan; Chris Clifton; Luo Si
Affiliations:
Dept. of Computer Science, Purdue University, 305 N. University Street, West Lafayette, IN 47907, USA (all authors). wjiang@cs.purdue.edu; mmuruges@cs.purdue.edu; clifton@cs.purdue.edu; lsi@cs.purdue.edu
Venue:
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Year:
2008

Cited by (9):

Privacy-preserving similarity-based text retrieval. ACM Transactions on Internet Technology (TOIT).
Scaling-invariant boundary image matching using time-series matching techniques. Data & Knowledge Engineering.
Turning privacy leaks into floods: surreptitious discovery of social network friendships and other sensitive binary attribute vectors. Proceedings of the 9th annual ACM workshop on Privacy in the electronic society.
N-gram based secure similar document detection. DBSec'11 Proceedings of the 25th annual IFIP WG 11.3 conference on Data and applications security and privacy.
An efficient and secure data sharing framework using homomorphic encryption in the cloud. Proceedings of the 1st International Workshop on Cloud Intelligence.
Efficient privacy-aware record integration. Proceedings of the 16th International Conference on Extending Database Technology.
Case based time series prediction using biased time warp distance for electrical evoked potential forecasting in visual prostheses. Applied Soft Computing.
Reassembling multilingual temporal news datasets with incomplete information. AusDM '11 Proceedings of the Ninth Australasian Data Mining Conference, Volume 121.
EsPRESSO: Efficient privacy-preserving evaluation of sample set similarity. Journal of Computer Security.

Abstract

Similar document detection plays an important role in many applications, such as file management, copyright protection, and plagiarism prevention. Existing protocols assume that the contents of files stored on a server (or multiple servers) are directly accessible. This assumption limits their use in many practical settings, e.g., detecting plagiarized documents between two conferences, where submissions are confidential. We propose novel protocols to detect similar documents between two entities whose documents cannot be openly shared with each other. We also conduct experiments to demonstrate the practical value of the proposed protocols.
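The abstract does not say how similarity is measured or how documents are kept private, so the following is only an illustrative sketch under stated assumptions, not the authors' protocol. It assumes cosine similarity over term-frequency vectors built on a vocabulary both parties agree on in advance (a hypothetical simplification), and it swaps in a standard building block, a secure dot product using Paillier additively homomorphic encryption, to show how party B can score party A's document without seeing it. All names (`unit_vector`, `party_a_send`, `party_b_respond`, `similar`, `SCALE`, the 0.8 threshold) are hypothetical. The key size is a toy and `random` is not cryptographically secure; a real deployment would need proper key generation and the `secrets` module. Requires Python 3.9+ (`math.lcm`).

```python
import math
import random
from collections import Counter

# Toy Paillier key: two Mersenne primes (2^31 - 1 and 2^61 - 1). Far too small
# for real security; chosen only so the example runs instantly.
P, Q = 2**31 - 1, 2**61 - 1
SCALE = 1000  # hypothetical fixed-point scale applied to unit-length vectors


def keygen(p=P, q=Q):
    """Return the public modulus n and the private key (lambda, mu)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)


def encrypt(n, m, rng):
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    # With g = n + 1, g^m mod n^2 equals 1 + m*n mod n^2.
    return (1 + m * n) * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def unit_vector(text, vocab):
    """Term-frequency vector over a shared vocabulary, normalized, scaled to ints."""
    counts = Counter(text.lower().split())
    tf = [counts[w] for w in vocab]
    norm = math.sqrt(sum(x * x for x in tf)) or 1.0
    return [round(SCALE * x / norm) for x in tf]


def party_a_send(n, vec_a, rng):
    # Party A encrypts each component; party B only ever sees ciphertexts.
    return [encrypt(n, x, rng) for x in vec_a]


def party_b_respond(n, enc_a, vec_b, rng):
    # Homomorphic dot product: prod Enc(a_i)^(b_i) = Enc(sum a_i * b_i).
    n2 = n * n
    c = encrypt(n, 0, rng)  # a fresh Enc(0) re-randomizes the reply
    for ea, b in zip(enc_a, vec_b):
        c = c * pow(ea, b, n2) % n2
    return c


def similar(text_a, text_b, vocab, threshold=0.8, seed=0):
    rng = random.Random(seed)
    n, priv = keygen()
    enc_a = party_a_send(n, unit_vector(text_a, vocab), rng)
    reply = party_b_respond(n, enc_a, unit_vector(text_b, vocab), rng)
    score = decrypt(n, priv, reply) / SCALE**2  # approximate cosine similarity
    return score, score >= threshold


vocab = ["privacy", "document", "similarity", "protocol", "encryption", "conference"]
print(similar("privacy preserving document similarity protocol",
              "document similarity protocol with privacy", vocab))
```

In this sketch party A learns only the (approximate) cosine score, and party B learns nothing about A's vector beyond its length. The re-randomizing `Enc(0)` matters: without it, A could try small values of each `b_i` against the known ciphertexts and recover B's vector. The shared vocabulary is itself a possible leak, which this sketch does not address.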