Linear text segmentation using classification techniques

Authors:
Raji R. Pillai; Sumam Mary Idicula
Affiliations:
Cochin University of Science and Technology, Kochi, India; Cochin University of Science and Technology, Kochi, India
Venue:
Proceedings of the 1st Amrita ACM-W Celebration on Women in Computing in India
Year:
2010

Abstract

Automatic segmentation of a text stream into topically coherent segments is an important component of natural language processing tasks such as information retrieval and document summarization. Machine learning techniques can play a vital role in building an efficient text segmentation system. This paper describes a method for identifying segment boundaries in an unstructured text document using multiple linguistic features, including word repetition, lexical chains, the presence of pronouns, conversation, named entities, and paragraphs. Segmentation is modeled as a binary classification problem in which the two classes correspond to the presence or absence of a segment boundary. An experiment using an efficient classifier is presented to demonstrate the effectiveness of the approach.
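
The binary-classification framing can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the classifier or define the features precisely, so the three features below (word repetition as Jaccard overlap, a leading pronoun, a paragraph break), the pronoun list, the perceptron, and the toy training data are all assumptions chosen for illustration.

```python
import re

# Hypothetical pronoun list; the paper's actual feature definitions are not given.
PRONOUNS = {"he", "she", "it", "they", "this", "these", "those", "his", "her", "their"}


def tokens(sentence):
    """Return the lowercased word tokens of a sentence."""
    return re.findall(r"[a-z']+", sentence.lower())


def boundary_features(prev_sentence, next_sentence, paragraph_break):
    """Feature vector for the candidate boundary between two adjacent sentences."""
    next_tokens = tokens(next_sentence)
    prev_words, next_words = set(tokens(prev_sentence)), set(next_tokens)
    union = prev_words | next_words
    # Word repetition: Jaccard overlap of the two sentence vocabularies.
    repetition = len(prev_words & next_words) / len(union) if union else 0.0
    # A leading pronoun often refers back to the previous sentence.
    leading_pronoun = 1.0 if next_tokens and next_tokens[0] in PRONOUNS else 0.0
    return [repetition, leading_pronoun, 1.0 if paragraph_break else 0.0]


def score(model, features):
    """Linear score of a feature vector under (weights, bias)."""
    weights, bias = model
    return sum(w * v for w, v in zip(weights, features)) + bias


def is_boundary(model, features):
    """Classify a candidate position: True means 'segment boundary'."""
    return score(model, features) > 0


def train_perceptron(samples, labels, epochs=20, learning_rate=0.1):
    """Train a perceptron (a stand-in for the unspecified classifier)."""
    model = ([0.0] * len(samples[0]), 0.0)
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - (1 if is_boundary(model, x) else 0)
            weights, bias = model
            weights = [w + learning_rate * error * v for w, v in zip(weights, x)]
            model = (weights, bias + learning_rate * error)
    return model


# Toy, hand-made training data (label 1 = boundary, 0 = no boundary).
TRAIN_X = [[0.5, 1.0, 0.0], [0.4, 0.0, 0.0], [0.0, 0.0, 1.0], [0.1, 0.0, 1.0]]
TRAIN_Y = [0, 0, 1, 1]
model = train_perceptron(TRAIN_X, TRAIN_Y)
```

Each gap between adjacent sentences becomes one classification instance, so a document with n sentences yields n - 1 candidate boundaries. In practice the feature set would be extended with the other cues the abstract lists (lexical chains, conversation, named entities), and the perceptron could be swapped for any binary classifier.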