Analysis of source identified text corpora: exploring the statistics of the reused text and authorship

  • Authors:
  • Akiko Aizawa

  • Affiliations:
  • National Institute of Informatics, Chiyoda-ku, Tokyo, Japan

  • Venue:
  • ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
  • Year:
  • 2003

Abstract

This paper aims to provide a view of text that authors themselves recycle within a short period of time. We first present a simple and general method for extracting reused term sequences, then analyze several author-identified text collections and compare their statistical quantities. The recycling ratio is also measured for each collection. Finally, related research topics are introduced, together with a discussion of future research directions.
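
The abstract does not describe the extraction method in detail, so the following is only an illustrative sketch, not the paper's algorithm. It assumes whitespace tokenization, treats a "reused term sequence" as a word n-gram that appears in at least two documents, and defines a recycling ratio as the fraction of tokens covered by such n-grams. The function names (`reused_ngrams`, `recycling_ratio`) and the default `n=3` are hypothetical choices made for this example.

```python
from collections import defaultdict


def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def reused_ngrams(docs, n=3):
    """Return the n-grams that occur in at least two different documents.

    Assumption: lowercased whitespace tokens stand in for 'terms'.
    """
    seen_in = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for gram in ngrams(text.lower().split(), n):
            seen_in[gram].add(doc_id)
    return {gram for gram, ids in seen_in.items() if len(ids) >= 2}


def recycling_ratio(docs, n=3):
    """Fraction of all tokens covered by at least one reused n-gram.

    Assumption: this coverage-based definition is one plausible reading
    of 'ratio of recycling'; the paper may define it differently.
    """
    reused = reused_ngrams(docs, n)
    covered = 0
    total = 0
    for text in docs:
        tokens = text.lower().split()
        mask = [False] * len(tokens)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) in reused:
                # Mark every token position inside the reused window.
                for j in range(i, i + n):
                    mask[j] = True
        covered += sum(mask)
        total += len(tokens)
    return covered / total if total else 0.0


if __name__ == "__main__":
    sample = [
        "we propose a simple method for text analysis",
        "here we propose a simple method again",
    ]
    print(sorted(reused_ngrams(sample, 3)))
    print(recycling_ratio(sample, 3))
```

Under these assumptions, grouping the input documents by author before calling `recycling_ratio` would give one ratio per author-identified collection, which is the kind of per-collection comparison the abstract mentions.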