SLSP'13 Proceedings of the First international conference on Statistical Language and Speech Processing
Transcript-based topic segmentation of TV programs faces several difficulties arising from transcription errors, from the presence of potentially short segments, and from the limited number of word repetitions available to enforce lexical cohesion, i.e., the lexical relations that exist within a text and give it a certain unity. To overcome these problems, we extend a probabilistic measure of lexical cohesion based on generalized probabilities computed with a unigram language model. On the one hand, confidence measures and semantic relations are considered as additional sources of information; on the other hand, language model interpolation techniques are investigated for better language model estimation. Experimental topic segmentation results are reported on two corpora with distinct characteristics, composed respectively of broadcast news and of reports on current affairs. Significant improvements are obtained on both corpora, demonstrating the effectiveness of the extended lexical cohesion measure for spoken TV content, as well as its genericity across different programs.
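To make the idea concrete, the following is a minimal sketch of how lexical cohesion can be scored with generalized probabilities under an interpolated unigram language model. The function name, the Laplace smoothing, the interpolation weight, and the vocabulary size are illustrative assumptions for this sketch, not the paper's actual formulation.

```python
import math
from collections import Counter

def interpolated_unigram_logprob(segment, background, lam=0.7, vocab_size=50000):
    """Score a candidate segment by the log-probability of its words under a
    segment-internal unigram model interpolated with a background model.

    Hypothetical sketch: `lam` and `vocab_size` are illustrative values.
    `background` maps words to background unigram probabilities; unseen words
    fall back to a uniform 1/vocab_size estimate.
    """
    logprob = 0.0
    for i, word in enumerate(segment):
        # Generalized (incremental) probability: estimate the word's
        # probability from the words seen so far in the segment,
        # Laplace-smoothed over the vocabulary. Repeated words thus
        # receive higher probability, rewarding lexical cohesion.
        seen = Counter(segment[:i])
        p_seg = (seen[word] + 1) / (i + vocab_size)
        # Linear interpolation with the background language model.
        p_bg = background.get(word, 1.0 / vocab_size)
        logprob += math.log(lam * p_seg + (1 - lam) * p_bg)
    return logprob
```

Under this scoring, a segment that repeats its vocabulary (high lexical cohesion) obtains a higher log-probability than an equally long segment of unrelated words, which is the signal a segmentation algorithm can maximize when placing topic boundaries.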