Linear text segmentation using classification techniques

  • Authors:
  • Raji R. Pillai; Sumam Mary Idicula

  • Affiliations:
  • Cochin University of Science and Technology, Kochi, India; Cochin University of Science and Technology, Kochi, India

  • Venue:
  • Proceedings of the 1st Amrita ACM-W Celebration on Women in Computing in India
  • Year:
  • 2010

Abstract

Automatic segmentation of a text stream into topically coherent segments is an important component of natural language processing tasks such as information retrieval and document summarization. Machine learning techniques can play a vital role in building an efficient text segmentation system. This paper describes a method for identifying segment boundaries in an unstructured text document with the aid of multiple linguistic features, including word repetition, lexical chains, the presence of pronouns, conversation, named entities, and paragraph structure, among others. Segmentation is modeled as a binary classification problem in which the two classes correspond to the presence or absence of a segment boundary. An experiment in text segmentation using an efficient classifier function is presented to show the effectiveness of the new approach.
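
The binary-classification formulation in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the classifier or define the features precisely, so a plain perceptron is swapped in as a stand-in, and only three simplified features are used (word overlap as a proxy for word repetition, a leading pronoun, and a paragraph break). The names `gap_features`, `train_perceptron`, `predict`, and the `PRONOUNS` list are all hypothetical.

```python
import re

# Hypothetical pronoun list; the paper's actual pronoun feature is not specified.
PRONOUNS = {"he", "she", "it", "they", "this", "these", "those", "his", "her", "their"}


def tokens(sentence):
    """Lowercase word tokens of a sentence."""
    return re.findall(r"[a-z']+", sentence.lower())


def gap_features(prev_sentence, next_sentence, paragraph_break):
    """Feature vector for the gap between two adjacent sentences.

    Returns [word overlap (Jaccard), next sentence starts with a pronoun,
    paragraph break present], each as a float.
    """
    prev_words = set(tokens(prev_sentence))
    next_tokens = tokens(next_sentence)
    next_words = set(next_tokens)
    union = prev_words | next_words
    overlap = len(prev_words & next_words) / len(union) if union else 0.0
    starts_with_pronoun = 1.0 if next_tokens and next_tokens[0] in PRONOUNS else 0.0
    return [overlap, starts_with_pronoun, 1.0 if paragraph_break else 0.0]


def predict(weights, bias, features):
    """Return 1 (segment boundary) or 0 (no boundary) for one gap."""
    score = sum(w * v for w, v in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train_perceptron(samples, labels, epochs=20, learning_rate=0.5):
    """Train a perceptron (assumed stand-in classifier) on labeled gaps."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = label - predict(weights, bias, features)
            weights = [w + learning_rate * error * v for w, v in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias
```

Each gap between adjacent sentences becomes one training example labeled 1 if a topic boundary falls there and 0 otherwise; at prediction time, the gaps classified as 1 are the proposed segment boundaries. Richer features from the abstract, such as lexical chains or named entities, would extend the vector returned by `gap_features`.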