Similar Document Detection with Limited Information Disclosure

  • Authors:
  • Wei Jiang; Mummoorthy Murugesan; Chris Clifton; Luo Si

  • Affiliations:
  • Dept. of Computer Science, Purdue University, 305 N. University Street, West Lafayette, IN 47907, USA (all authors). wjiang@cs.purdue.edu; mmuruges@cs.purdue.edu; clifton@cs.purdue.edu; lsi@cs.purdue.edu

  • Venue:
  • ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
  • Year:
  • 2008

Abstract

Similar document detection plays an important role in many applications, such as file management, copyright protection, and plagiarism prevention. Existing protocols assume that the contents of files stored on a server (or multiple servers) are directly accessible. This assumption rules out some practical applications, e.g., detecting plagiarized documents between two conferences, where submissions are confidential. We propose novel protocols to detect similar documents between two entities when the documents cannot be openly shared between them. We also conduct experiments to show the practical value of the proposed protocols.