Using multiple discriminant analysis approach for linear text segmentation

Authors:
Zhu Jingbo; Ye Na; Chang Xinzhi; Chen Wenliang; Benjamin K Tsou
Affiliations:
Natural Language Processing Laboratory, Institute of Computer Software and Theory, Northeastern University, Shenyang, P.R. China (Zhu Jingbo, Ye Na, Chang Xinzhi, Chen Wenliang); Language Information Sciences Research Centre, City University of Hong Kong, HK (Benjamin K Tsou)
Venue:
IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing
Year:
2005

References (16):
Automatic text decomposition using text segments and text themes. Proceedings of the seventh ACM conference on Hypertext.
Machine Learning.
Text Segmentation by Topic. ECDL '97 Proceedings of the First European Conference on Research and Advanced Technology for Digital Libraries.
Pattern Classification (2nd Edition).
Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics.
TextTiling: segmenting text into multi-paragraph subtopic passages. Computational Linguistics.
Advances in domain independent linear text segmentation. NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference.
Optimal multi-paragraph text segmentation by dynamic programming. ACL '98 Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 2.
Intention-based segmentation: human reliability and correlation with linguistic cues. ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics.
Text segmentation based on similarity between words. ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics.
Multi-paragraph segmentation of expository text. ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics.
An automatic method of finding topic boundaries. ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics.
Statistical models for topic segmentation. ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics.
Linear text segmentation using a dynamic programming algorithm. EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1.
A statistical model for domain-independent text segmentation. ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics.
Intonational features of local and global discourse structure. HLT '91 Proceedings of the workshop on Speech and Natural Language.

Cited by (1):
A dynamic programming model for text segmentation based on min-max similarity. AIRS'08 Proceedings of the 4th Asia information retrieval conference on Information retrieval technology.

Abstract

Research on linear text segmentation has been an ongoing focus in NLP for the last decade, and it has great potential for a wide range of applications such as document summarization, information retrieval and text understanding. Linear text segmentation poses two critical problems: automatic detection of segment boundaries and automatic determination of the number of segments in a document. In this paper, we propose a new domain-independent statistical model for linear text segmentation. In our model, a Multiple Discriminant Analysis (MDA) criterion function is used to find the globally optimal segmentation, that is, the segmentation with the largest word similarity within segments and the smallest word similarity between segments. To alleviate the high computational complexity introduced by the model, genetic algorithms (GAs) are used to search for this optimum. Comparative experimental results show that our method based on MDA criterion functions achieves better performance, as measured by the Pk metric (Beeferman et al.), than a baseline system using the TextTiling algorithm.
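To make the idea of an MDA-style segmentation criterion concrete, here is a minimal sketch that treats sentences as samples and contiguous segments as classes, and scores a candidate segmentation with the trace ratio tr(S_B) / tr(S_W) of between-segment to within-segment scatter, a scalar simplification of the multiple discriminant analysis criterion in Duda, Hart and Stork's Pattern Classification. A simple genetic algorithm then searches over boundary positions. The sentence representation (L2-normalized bag-of-words vectors), the trace-ratio criterion, the GA operators (tournament selection, union-based crossover, single-boundary mutation, elitism), and all function names are assumptions for illustration, not the authors' model. In particular, the number of segments `k` is fixed here rather than determined automatically.

```python
# Hypothetical sketch: trace-ratio MDA-style criterion + a simple GA (not the paper's exact model).
import random
from collections import Counter
from functools import lru_cache
from itertools import combinations

def sentence_vectors(sentences):
    """Assumption: L2-normalised bag-of-words vectors over a shared vocabulary."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    vecs = []
    for s in sentences:
        counts = Counter(s.lower().split())
        v = [float(counts[w]) for w in vocab]
        norm = sum(x * x for x in v) ** 0.5 or 1.0
        vecs.append([x / norm for x in v])
    return vecs

def _mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mda_score(vecs, boundaries):
    """tr(S_B) / tr(S_W) for the contiguous segments implied by `boundaries`.

    `boundaries` are sorted sentence indices at which a new segment starts
    (index 0 is implied). Higher is better: tight segments, distant means.
    """
    cuts = [0, *boundaries, len(vecs)]
    overall = _mean(vecs)
    within = between = 0.0
    for start, end in zip(cuts, cuts[1:]):
        seg = vecs[start:end]
        m = _mean(seg)
        within += sum(_sqdist(x, m) for x in seg)   # tr(S_W) contribution
        between += len(seg) * _sqdist(m, overall)  # tr(S_B) contribution
    return between / (within + 1e-9)               # epsilon avoids 0-division

def brute_force_segment(vecs, k):
    """Exhaustive search over all C(n-1, k-1) boundary sets (small n only)."""
    return list(max(combinations(range(1, len(vecs)), k - 1),
                    key=lambda b: mda_score(vecs, b)))

def ga_segment(vecs, k, pop_size=30, generations=60, mutation_rate=0.3, seed=0):
    """Genetic search for k segments (k - 1 boundaries) maximising mda_score."""
    assert 1 <= k <= len(vecs) and pop_size >= 2
    rng = random.Random(seed)
    positions = range(1, len(vecs))  # legal boundary positions
    fitness = lru_cache(maxsize=None)(lambda ind: mda_score(vecs, ind))

    def crossover(a, b):
        # Child draws its k - 1 boundaries from the union of the parents' boundaries.
        return tuple(sorted(rng.sample(sorted(set(a) | set(b)), k - 1)))

    def mutate(ind):
        # Move one boundary to a position that is not already in use.
        free = [p for p in positions if p not in ind]
        if not ind or not free:
            return ind
        out = set(ind)
        out.remove(rng.choice(ind))
        out.add(rng.choice(free))
        return tuple(sorted(out))

    def select(pop):
        # Binary tournament selection.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    pop = [tuple(sorted(rng.sample(positions, k - 1))) for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best]  # elitism: the best individual so far always survives
        while len(children) < pop_size:
            child = crossover(select(pop), select(pop))
            if rng.random() < mutation_rate:
                child = mutate(child)
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return list(best)

# Toy document (hypothetical): two topics with disjoint vocabularies.
SENTENCES = [
    "cat kitten purr", "kitten cat meow", "cat purr meow",
    "stock market price", "market price trade", "stock trade price",
]

if __name__ == "__main__":
    vecs = sentence_vectors(SENTENCES)
    print(ga_segment(vecs, k=2), brute_force_segment(vecs, k=2))
```

Because tr(S_W) + tr(S_B) is constant for a given document, maximizing this ratio for a fixed `k` is equivalent to minimizing within-segment scatter. Exhaustive search needs C(n-1, k-1) evaluations, or 2^(n-1) if `k` is also unknown, which is the kind of cost a GA is meant to sidestep.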
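The evaluation uses the Pk measure of Beeferman et al. In its common fixed-window form, Pk is the fraction of sentence pairs k positions apart that the reference and the hypothesis classify inconsistently (same segment in one, different segments in the other), so it is an error rate and lower values are better. The following is a minimal sketch; the boundary-list input format, the default window of half the mean reference segment length, and the function name `pk` are assumptions for illustration, not details taken from the paper.

```python
from bisect import bisect_right

def pk(ref_boundaries, hyp_boundaries, n, k=None):
    """Pk error of a hypothesised segmentation against a reference.

    Segmentations are sorted sentence indices at which a new segment starts
    (index 0 implied); n is the number of sentences. Lower is better, and 0.0
    means the two segmentations agree on every probe pair.
    """
    if k is None:
        # Common convention: half the mean reference segment length.
        k = max(1, round(n / (len(ref_boundaries) + 1) / 2))
    if n <= k:
        raise ValueError("document has too few sentences for window size k")

    def same_segment(bounds, i, j):
        # Sentences i < j share a segment iff no boundary lies in (i, j].
        return bisect_right(bounds, i) == bisect_right(bounds, j)

    disagreements = sum(
        same_segment(ref_boundaries, i, i + k) != same_segment(hyp_boundaries, i, i + k)
        for i in range(n - k)
    )
    return disagreements / (n - k)

if __name__ == "__main__":
    print(pk([3], [3], n=6))  # identical segmentations: prints 0.0
    print(pk([3], [], n=6))   # hypothesis misses the only boundary: prints 0.5
```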