Improving Text Segmentation Using Latent Semantic Analysis: A Reanalysis of Choi, Wiemer-Hastings, and Moore (2001)

Authors:
Yves Bestgen
Affiliations:
Center for Text and Discourse Studies, PSOR, Place du Cardinal Mercier 10, B-1348 Louvain-la-Neuve, Belgium
Venue:
Computational Linguistics
Year:
2006

Abstract

Choi, Wiemer-Hastings, and Moore (2001) proposed using Latent Semantic Analysis (LSA) to extract semantic knowledge from corpora in order to improve the accuracy of a text segmentation algorithm. By comparing the accuracy of the very same algorithm with and without this complementary semantic knowledge, they were able to show the benefit derived from it. In their experiments, however, the semantic knowledge was acquired from a corpus that contained the texts to be segmented in the test phase. If this hyper-specificity of the LSA corpus accounts for most of the benefit, one may wonder whether LSA can be used to acquire generic semantic knowledge that helps segment new texts. The two experiments reported here show that the presence of the test materials in the LSA corpus has a substantial effect, but also that generic semantic knowledge derived from large corpora clearly improves segmentation accuracy.
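
To make concrete what "complementary semantic knowledge" can change, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the algorithm evaluated in the article: it assumes that sentence similarity is measured by a cosine, computed either on raw word counts or on summed word vectors of the kind an LSA space provides. The function names and the two-dimensional word vectors are hypothetical and hand-made; real vectors would come from a singular value decomposition of a term-by-document matrix built on a corpus.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def count_vector(sentence):
    """Baseline representation: a bag of the sentence's surface words."""
    return dict(Counter(sentence))


def semantic_vector(sentence, word_vectors):
    """Semantic representation: sum of the vectors of the known words."""
    total = {}
    for word in sentence:
        for dim, value in word_vectors.get(word, {}).items():
            total[dim] = total.get(dim, 0.0) + value
    return total


# Hypothetical word vectors, invented for this illustration only.
WORD_VECTORS = {
    "doctor": {0: 0.9, 1: 0.1},
    "nurse": {0: 0.8, 1: 0.2},
    "bank": {0: 0.1, 1: 0.9},
    "loan": {0: 0.2, 1: 0.8},
}

s1 = ["doctor"]
s2 = ["nurse"]
s3 = ["loan"]

# Without semantic knowledge, sentences that share no word look unrelated.
baseline_sim = cosine(count_vector(s1), count_vector(s2))

# With word vectors, related sentences score higher than unrelated ones.
semantic_related = cosine(
    semantic_vector(s1, WORD_VECTORS), semantic_vector(s2, WORD_VECTORS)
)
semantic_unrelated = cosine(
    semantic_vector(s1, WORD_VECTORS), semantic_vector(s3, WORD_VECTORS)
)

print(baseline_sim, semantic_related, semantic_unrelated)
```

In this toy setting the count-based similarity between "doctor" and "nurse" is zero because the sentences share no word, whereas the vector-based similarity separates the related pair from the unrelated one. The question raised in the abstract concerns where such vectors come from: a corpus that includes the test texts, or a large generic corpus that does not.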