Extraction of field-coherent passages

  • Authors:
  • Samuel Sangkon Lee; Masami Shishibori; Toru Sumitomo; Jun-ichi Aoe

  • Affiliations:
  • Department of Information Science and Intelligent Systems, University of Tokushima, 2-1 Minami Josanjima, 770-8506 Tokushima, Japan (all four authors)

  • Venue:
  • Information Processing and Management: an International Journal
  • Year:
  • 2002

Abstract

It is important to identify text that is substantially independent of adjacent material. This paper presents a technique for dividing text into field-coherent passages. The method is based on extracting field-associated words or phrases from the text by determining how topics grow, shrink, and shift from sentence to sentence. We propose measures of topic continuity and transition and suggest how they may be used to find passage boundaries. Using a collection of 12,500 documents, we obtained an average precision of 88% and a recall of 78% on a training document set.
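
To make the idea of field-coherent passages concrete, the following is a minimal sketch and not the method of the paper. The abstract does not define its continuity and transition measures, so the sketch assumes a much simpler stand-in: each sentence is labelled with the field of its most frequent field-associated words, and a passage boundary is placed wherever that field changes. The lexicon FIELD_WORDS, the function names, and the boundary rule are all hypothetical and chosen only for illustration. A small helper also shows how precision and recall of predicted boundaries could be computed against reference boundaries.

```python
from collections import Counter

# Hypothetical lexicon of field-associated words (illustrative only,
# not taken from the paper).
FIELD_WORDS = {
    "pitcher": "sports", "inning": "sports", "goal": "sports",
    "election": "politics", "senate": "politics", "ballot": "politics",
}


def dominant_field(sentence):
    """Return the field with the most field-associated words, or None."""
    tokens = (tok.strip(".,;:!?\"'()").lower() for tok in sentence.split())
    profile = Counter(FIELD_WORDS[t] for t in tokens if t in FIELD_WORDS)
    return profile.most_common(1)[0][0] if profile else None


def split_passages(sentences):
    """Group sentences into (field, sentences) passages.

    Assumed rule: a new passage starts when a sentence's dominant field
    differs from the field of the current passage. Sentences without any
    field-associated word continue the current passage.
    """
    passages = []
    current, current_field = [], None
    for sentence in sentences:
        field = dominant_field(sentence)
        shifted = (
            field is not None
            and current_field is not None
            and field != current_field
        )
        if shifted:
            passages.append((current_field, current))
            current = []
        if field is not None:
            current_field = field
        current.append(sentence)
    if current:
        passages.append((current_field, current))
    return passages


def boundary_precision_recall(predicted, reference):
    """Precision and recall of predicted boundary positions (as sets)."""
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    text = [
        "The pitcher struck out three batters in the first inning.",
        "It was a close game.",
        "The senate called an early election.",
        "Every ballot was counted twice.",
    ]
    for field, passage in split_passages(text):
        print(field, len(passage))
```

In this toy input the second sentence carries no field-associated word and therefore stays with the preceding passage; the boundary falls before the third sentence, where the dominant field changes. A real system would need a far larger lexicon and a measure that reflects how strongly a topic grows or shrinks, rather than a single dominant label per sentence.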