Evaluating text reuse discovery on the web

  • Authors: Stanford Chiu; Ibrahim Uysal; W. Bruce Croft
  • Affiliations: University of Massachusetts, Amherst, MA, USA (all authors)
  • Venue: Proceedings of the Third Symposium on Information Interaction in Context
  • Year: 2010

Abstract

Text reuse detection aims to identify duplicates, reformulations, or partial rewrites of a given text. Some previous research has focused on accurately identifying instances of text reuse in local corpora. However, the practicality of finding text reuse on the web has remained largely untested. In this work, we 1) introduce a novel text reuse search interface for the web, based on a previously proposed architecture, 2) evaluate its feasibility, and 3) investigate techniques to improve both effectiveness and efficiency. Our results show that exhaustive query submission using n-grams can dramatically reduce the execution time with only small losses in accuracy.