A document comparison scheme for secure duplicate detection

  • Authors:
  • Federica Mandreoli; Riccardo Martoglia; Paolo Tiberio

  • Affiliations:
  • Università di Modena e Reggio Emilia, Dipartimento di Ingegneria dell’Informazione, via Vignolese 905, 41100, Modena, Italy (all three authors)

  • Venue:
  • International Journal on Digital Libraries
  • Year:
  • 2004

Abstract

The ever-growing volume of textual information from various sources has fostered the development of digital libraries, making digital content readily accessible but also easy for malicious users to plagiarize, which raises security problems. In this paper, we introduce a duplicate detection scheme that can determine, with particularly high accuracy, the degree to which one document is similar to another. Our pairwise document comparison scheme detects resemblance between document contents by considering document chunks, which represent contexts of words selected from the text. The resulting duplicate detection technique offers a good level of security for the protection of intellectual property while improving the availability of the data stored in the digital library and the correctness of search results. Finally, the paper addresses efficiency and scalability issues by introducing new data reduction techniques.
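As a rough illustration of what chunk-based pairwise comparison can look like, the Python sketch below splits each document into overlapping fixed-size word windows and measures how many of these windows the two documents share. This is a generic shingling-style approach assumed for illustration only, not the authors' scheme: the chunk definition (consecutive word windows), the hypothetical `size` parameter, the simple tokenizer, and the Jaccard and containment measures are all assumptions. The abstract describes chunks as contexts of *selected* words, and the paper's actual similarity measure may differ.

```python
import re
from typing import List, Set, Tuple


def tokenize(text: str) -> List[str]:
    # Assumed preprocessing: lowercase and keep word characters only.
    return re.findall(r"\w+", text.lower())


def chunks(text: str, size: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of overlapping word windows ("chunks") of length `size`.

    Hypothetical chunking: every run of `size` consecutive words is a chunk.
    A document shorter than `size` yields a single chunk with all its words.
    """
    words = tokenize(text)
    if not words:
        return set()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def resemblance(a: str, b: str, size: int = 3) -> float:
    """Symmetric Jaccard resemblance of the two chunk sets, in [0, 1]."""
    ca, cb = chunks(a, size), chunks(b, size)
    if not ca and not cb:
        return 1.0  # two empty documents are treated as identical
    return len(ca & cb) / len(ca | cb)


def containment(a: str, b: str, size: int = 3) -> float:
    """Asymmetric score: fraction of a's chunks that also occur in b."""
    ca, cb = chunks(a, size), chunks(b, size)
    if not ca:
        return 0.0
    return len(ca & cb) / len(ca)


if __name__ == "__main__":
    doc1 = "the quick brown fox jumps over the lazy dog"
    doc2 = "the quick brown fox leaps over the lazy dog"
    print(resemblance(doc1, doc2))  # 4 shared chunks out of 10 distinct: 0.4
    print(containment(doc1, doc2))  # 4 of doc1's 7 chunks appear in doc2
```

In this sketch, changing a single word ("jumps" to "leaps") breaks every 3-word window that contains it, so the resemblance drops to 0.4 even though the sentences are nearly identical; the choice of chunk size trades sensitivity to small edits against robustness to coincidental matches. The asymmetric containment score can be useful when a short document may have been copied into a much longer one.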