SeLeCT: a lexical cohesion based news story segmentation system

Authors:
Nicola Stokes; Joe Carthy; Alan F. Smeaton
Affiliations:
Department of Computer Science, University College Dublin, Ireland; Department of Computer Science, University College Dublin, Ireland; School for Computer Applications and Centre for Digital Video Processing, Dublin City University, Ireland
Venue:
AI Communications - STAIRS 2002
Year:
2004

References:
Statistical Models for Text Segmentation. Machine Learning - Special issue on natural language learning.
Information Retrieval.
A critique and improvement of an evaluation metric for text segmentation. Computational Linguistics.
Text Segmentation by Topic. ECDL '97 Proceedings of the First European Conference on Research and Advanced Technology for Digital Libraries.
Topic segmentation: algorithms and applications.
Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics.
TextTiling: segmenting text into multi-paragraph subtopic passages. Computational Linguistics.
Advances in domain independent linear text segmentation. NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference.
Intention-based segmentation: human reliability and correlation with linguistic cues. ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics.
Text segmentation based on similarity between words. ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics.
Word sense disambiguation and text segmentation based on lexical cohesion. COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 2.
Spoken and written news story segmentation using lexical chains. NAACLstudent '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: Proceedings of the HLT-NAACL 2003 student research workshop - Volume 3.

Abstract

In this paper we compare the performance of three distinct approaches to lexical-cohesion-based text segmentation. Most work in this area has focused on the discovery of textual units that discuss subtopic structure within documents. In contrast, our segmentation task requires the discovery of topical units of text, i.e., distinct news stories from broadcast news programmes. Our approach to news story segmentation (the SeLeCT system) is based on an analysis of lexical cohesive strength between textual units using a linguistic technique called lexical chaining. We evaluate the relative performance of SeLeCT with respect to two other cohesion-based segmenters: TextTiling and C99. Using a recently introduced evaluation metric, WindowDiff, we contrast the segmentation accuracy of each system on both "spoken" (CNN news transcripts) and "written" (Reuters newswire) news story test sets extracted from the TDT1 corpus.
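To make the idea of lexical cohesive strength between textual units concrete, the sketch below is a minimal illustration under simplifying assumptions. It is not SeLeCT's implementation. Chains are formed by exact word repetition only (lexical chains in general also link synonyms and other semantically related words). The stopword list is a hypothetical placeholder. The gap score, the number of chains ending in the unit before a gap multiplied by the number starting in the unit after it, is an illustrative choice. High-scoring gaps are treated as candidate story boundaries.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical, deliberately tiny stopword list for this illustration only.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was", "for", "at", "with"}


def repetition_chains(units: List[str]) -> Dict[str, Tuple[int, int]]:
    """Map each repeated content word to the (first, last) unit it occurs in.

    Simplification: only exact repetition links words into a chain.
    """
    positions: Dict[str, List[int]] = defaultdict(list)
    for idx, text in enumerate(units):
        # set() records each word at most once per unit.
        for word in set(re.findall(r"[a-z]+", text.lower())):
            if word not in STOPWORDS:
                positions[word].append(idx)
    # A word found in only one unit does not link units, so it forms no chain.
    return {w: (p[0], p[-1]) for w, p in positions.items() if len(p) >= 2}


def boundary_strengths(units: List[str]) -> List[int]:
    """Score gap g (between units g and g+1) as ends(g) * starts(g+1).

    Assumed scoring rule: many chains ending just before a gap and many
    starting just after it suggest a change of topic at that gap.
    """
    chains = repetition_chains(units)
    ends = [0] * len(units)
    starts = [0] * len(units)
    for first, last in chains.values():
        starts[first] += 1
        ends[last] += 1
    return [ends[g] * starts[g + 1] for g in range(len(units) - 1)]


if __name__ == "__main__":
    units = [
        "Flooding hit the river town as rain fell.",
        "Rain and flooding forced the town to evacuate.",
        "The election campaign opened in the capital.",
        "Voters in the capital discussed the election.",
    ]
    print(boundary_strengths(units))  # -> [0, 6, 0]
```

Here the "flooding", "town" and "rain" chains end in unit 1 while the "election" and "capital" chains start in unit 2, so the gap between those two units scores highest. A real segmenter would also need a rule for how many boundaries to place, for example a threshold on the score.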
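WindowDiff (Pevzner and Hearst) slides a window of k sentences across the text and counts a window as an error whenever the reference and the hypothesised segmentation place a different number of boundaries inside it; the error count is divided by the number of windows, so lower is better and 0.0 is a perfect match. The sketch below is a minimal implementation assuming each segmentation is encoded as one 0/1 flag per gap between adjacent sentences (an encoding chosen for this sketch). When k is not given it defaults to half the mean reference segment length, the conventional setting.

```python
from typing import Optional, Sequence


def window_diff(reference: Sequence[int], hypothesis: Sequence[int],
                k: Optional[int] = None) -> float:
    """WindowDiff error between two segmentations of the same text.

    Each segmentation holds one 0/1 flag per gap between adjacent
    sentences; 1 means a boundary is placed in that gap.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same number of gaps")
    n_gaps = len(reference)
    if k is None:
        # Half the mean reference segment length (in sentences), rounded.
        n_sentences = n_gaps + 1
        n_segments = sum(reference) + 1
        k = max(1, round(n_sentences / n_segments / 2))
    if not 1 <= k <= n_gaps:
        raise ValueError("window size k must lie between 1 and the number of gaps")
    # A window spanning sentences i..i+k covers gaps i..i+k-1.
    n_windows = n_gaps - k + 1
    errors = sum(
        1
        for i in range(n_windows)
        if sum(reference[i:i + k]) != sum(hypothesis[i:i + k])
    )
    return errors / n_windows


if __name__ == "__main__":
    ref = [0, 0, 1, 0, 0, 1, 0, 0]   # 9 sentences, three stories of 3
    hyp = [0, 1, 0, 0, 0, 1, 0, 0]   # first boundary placed one gap early
    print(window_diff(ref, hyp))     # 2 of 7 windows disagree: 0.2857...
```

Unlike a strict boundary-match score, a near miss like the one above is only partially penalised, which is one reason the metric suits comparing segmenters such as SeLeCT, TextTiling and C99.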