Automatic segmentation of a text stream into topically coherent segments is an important component of natural language processing tasks such as information retrieval and document summarization. Machine learning techniques can play a vital role in building an efficient text segmentation system. This paper describes a method for identifying segment boundaries in an unstructured text document with the aid of multiple linguistic features, including word repetition, lexical chains, the presence of pronouns, conversation, named entities, and paragraphs. Segmentation is modeled as a binary classification problem in which the two classes correspond to the presence or absence of a segment boundary. An experiment in text segmentation using an efficient classifier function is presented to show the effectiveness of the new approach.
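To make the binary-classification framing concrete, the following is a minimal sketch in Python. It treats every gap between adjacent sentences as a candidate boundary, describes each gap with two simplified features (word repetition across the gap and a pronoun opening the next sentence), and labels the gap 1 (boundary) or 0 (no boundary). The feature definitions, the pronoun list, the function names, and the perceptron are all illustrative assumptions; the abstract does not specify the classifier or the exact feature computations used in the paper.

```python
import re

# Assumption: a small hand-picked pronoun list, for illustration only.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them", "his", "their", "its"}


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def gap_features(sentences, i):
    """Features for the candidate boundary between sentences[i] and sentences[i + 1].

    Returns [bias, word_overlap, starts_with_pronoun]. Both features are
    simplified stand-ins for the richer linguistic features named in the abstract.
    """
    left = set(tokenize(sentences[i]))
    right_tokens = tokenize(sentences[i + 1])
    right = set(right_tokens)
    union = left | right
    # Word repetition: Jaccard overlap of the vocabularies on both sides of the gap.
    overlap = len(left & right) / len(union) if union else 0.0
    # A pronoun opening the next sentence suggests the topic continues.
    starts_with_pronoun = 1.0 if right_tokens and right_tokens[0] in PRONOUNS else 0.0
    return [1.0, overlap, starts_with_pronoun]


def predict(weights, features):
    """Return 1 (segment boundary) or 0 (no boundary) for one candidate gap."""
    score = sum(w * x for w, x in zip(weights, features))
    return 1 if score > 0 else 0


def train_perceptron(X, y, epochs=20):
    """Train a plain perceptron; a stand-in for whatever classifier is actually used."""
    weights = [0.0] * len(X[0])
    for _ in range(epochs):
        for features, label in zip(X, y):
            error = label - predict(weights, features)
            if error != 0:
                weights = [w + error * x for w, x in zip(weights, features)]
    return weights


if __name__ == "__main__":
    sentences = [
        "The cat sat on the mat.",
        "It purred on the mat.",
        "Stock markets fell sharply today.",
    ]
    X = [gap_features(sentences, i) for i in range(len(sentences) - 1)]
    y = [0, 1]  # hypothetical labels: a boundary only before the third sentence
    weights = train_perceptron(X, y)
    print([predict(weights, x) for x in X])
```

A linear model is used here only because it keeps the sketch short; any binary classifier that accepts a feature vector per candidate gap fits the same framing.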