Using multiple discriminant analysis approach for linear text segmentation

  • Authors:
  • Zhu Jingbo; Ye Na; Chang Xinzhi; Chen Wenliang; Benjamin K Tsou

  • Affiliations:
  • Natural Language Processing Laboratory, Institute of Computer Software and Theory, Northeastern University, Shenyang, P.R. China (Zhu Jingbo, Ye Na, Chang Xinzhi, Chen Wenliang); Language Information Sciences Research Centre, City University of Hong Kong, HK (Benjamin K Tsou)

  • Venue:
  • IJCNLP'05: Proceedings of the Second International Joint Conference on Natural Language Processing
  • Year:
  • 2005

Abstract

Research on linear text segmentation has been an ongoing focus in NLP for the last decade, and it has great potential for a wide range of applications such as document summarization, information retrieval and text understanding. However, linear text segmentation poses two critical problems: automatic boundary detection and automatic determination of the number of segments in a document. In this paper, we propose a new domain-independent statistical model for linear text segmentation. In our model, a Multiple Discriminant Analysis (MDA) criterion function is used to achieve global optimization in finding the best segmentation, that is, the one with the largest word similarity within segments and the smallest word similarity between segments. To alleviate the high computational complexity introduced by the model, genetic algorithms (GAs) are used. Comparative experimental results show that our method based on MDA criterion functions achieves better Pk scores (Beeferman et al.) than a baseline system using the TextTiling algorithm.
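The abstract does not spell out the exact criterion, chromosome encoding, or GA operators, so what follows is only a minimal Python sketch under stated assumptions, not the authors' method. It assumes each sentence is a term-frequency vector and takes the MDA-style criterion to be the trace ratio tr(S_B)/tr(S_W) of between-segment to within-segment scatter. It also fixes the number of segments k in advance; the paper's procedure for determining k automatically is not reproduced. The GA uses elitism, binary tournament selection, union-based crossover and boundary-shift mutation, all of which are assumptions. A Pk function (Beeferman et al.'s error probability, lower is better) is included for evaluation. Every function name here (`mda_criterion`, `ga_segment`, `pk` and the helpers) is hypothetical.

```python
import numpy as np


def mda_criterion(X, boundaries):
    """Trace-ratio MDA-style score J = tr(S_B) / tr(S_W) for one segmentation.

    X: (n_sentences, n_terms) array of sentence term vectors.
    boundaries: sorted segment start indices, excluding 0.
    Higher J means tighter segments that lie further apart.
    """
    overall_mean = X.mean(axis=0)
    s_w = 0.0  # tr(S_W): scatter of sentences around their segment mean
    s_b = 0.0  # tr(S_B): size-weighted scatter of segment means around the overall mean
    for seg in np.split(X, boundaries):
        m = seg.mean(axis=0)
        s_w += float(((seg - m) ** 2).sum())
        s_b += len(seg) * float(((m - overall_mean) ** 2).sum())
    return s_b / (s_w + 1e-9)  # epsilon avoids division by zero for pure segments


def random_individual(n, k, rng):
    # A chromosome is a sorted list of k-1 distinct boundary positions in 1..n-1.
    return sorted(rng.choice(np.arange(1, n), size=k - 1, replace=False).tolist())


def crossover(a, b, k, rng):
    # The child draws its k-1 boundaries from the union of both parents' boundaries.
    pool = sorted(set(a) | set(b))
    return sorted(rng.choice(pool, size=k - 1, replace=False).tolist())


def mutate(ind, n, rng, rate):
    # With probability `rate`, move one boundary to a currently unused position.
    ind = list(ind)
    if rng.random() < rate:
        free = sorted(set(range(1, n)) - set(ind))
        if free:
            ind[int(rng.integers(len(ind)))] = int(rng.choice(free))
    return sorted(ind)


def ga_segment(X, k, pop_size=40, generations=100, mutation_rate=0.3, seed=0):
    """Search for the k-segment split of X that maximizes mda_criterion."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    assert 2 <= k <= n, "need at least 2 segments and at most one per sentence"
    pop = [random_individual(n, k, rng) for _ in range(pop_size)]

    def pick(scores):
        # Binary tournament selection over the current population.
        i, j = rng.integers(pop_size, size=2)
        return pop[i] if scores[i] >= scores[j] else pop[j]

    for _ in range(generations):
        scores = [mda_criterion(X, ind) for ind in pop]
        new_pop = [pop[int(np.argmax(scores))]]  # elitism: keep the best unchanged
        while len(new_pop) < pop_size:
            child = crossover(pick(scores), pick(scores), k, rng)
            new_pop.append(mutate(child, n, rng, mutation_rate))
        pop = new_pop
    scores = [mda_criterion(X, ind) for ind in pop]
    return pop[int(np.argmax(scores))]


def pk(ref, hyp, n, k=None):
    """Pk error (lower is better); boundaries use the same format as mda_criterion."""
    def labels(bounds):
        lab = np.zeros(n, dtype=int)
        for b in bounds:
            lab[b:] += 1  # segment id of each sentence
        return lab

    r, h = labels(ref), labels(hyp)
    if k is None:
        k = max(1, round(n / (len(ref) + 1) / 2))  # half the mean reference segment length
    errors = sum((r[i] == r[i + k]) != (h[i] == h[i + k]) for i in range(n - k))
    return errors / (n - k)
```

For example, on 15 sentences made of three blocks of five identical term vectors, `ga_segment(X, 3)` should recover the boundaries `[5, 10]`, and `pk` of that result against the reference `[5, 10]` is 0. Fixing k is deliberate: tr(S_W) + tr(S_B) equals the constant total scatter, and adding a boundary can never increase tr(S_W), so refining a segmentation never lowers this bare ratio. An unconstrained search would therefore drift toward one sentence per segment, and choosing the number of segments automatically needs something beyond this sketch.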