In a context where information retrieval is extended to spoken "documents" including conversations, it will be important to provide users with the ability to seek informational content, rather than socially motivated small talk that appears in many conversational sources. In this paper we present a preliminary study aimed at automatically identifying "irrelevance" in the domain of telephone conversations. We apply a standard machine learning algorithm to build a classifier that detects off-topic sections with better-than-chance accuracy and that begins to provide insight into the relative importance of features for identifying utterances as on topic or not.
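The abstract does not name the "standard machine learning algorithm" used. The sketch below shows what an on-topic/off-topic utterance classifier of this kind could look like. It assumes a multinomial Naive Bayes model over bag-of-words features with add-one smoothing, which may well differ from the paper's method. The class name `NaiveBayesTopicClassifier`, the whitespace tokenizer, the `"on"`/`"off"` labels, and the toy utterances are all hypothetical illustrations, not the paper's features or data. The per-word smoothed log-likelihood ratio serves as one simple proxy for "relative importance of features".

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption, not the paper's features)."""
    return text.lower().split()


class NaiveBayesTopicClassifier:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training utterances
        self.vocab = set()

    def fit(self, utterances, labels):
        for text, label in zip(utterances, labels):
            tokens = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def _log_likelihood(self, word, label):
        # Smoothed log P(word | label) = log((count + 1) / (total + |V|)).
        counts = self.word_counts[label]
        total = sum(counts.values())
        return math.log((counts[word] + 1) / (total + len(self.vocab)))

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for label in self.doc_counts:
            score = math.log(self.doc_counts[label] / n_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # unseen words are ignored
                    score += self._log_likelihood(word, label)
            scores[label] = score
        return max(scores, key=scores.get)

    def top_features(self, label, other, k=3):
        """Words whose log-likelihood ratio most favours `label` over `other` (ties broken alphabetically)."""
        ratio = {w: self._log_likelihood(w, label) - self._log_likelihood(w, other)
                 for w in self.vocab}
        return sorted(self.vocab, key=lambda w: (-ratio[w], w))[:k]


# Toy utterances invented for illustration only.
train = [
    ("how are the kids doing", "off"),
    ("the weather has been lovely here", "off"),
    ("nice to talk to you again", "off"),
    ("the budget report is due friday", "on"),
    ("we should finalize the budget numbers", "on"),
    ("send me the report draft", "on"),
]
clf = NaiveBayesTopicClassifier().fit([t for t, _ in train], [l for _, l in train])
print(clf.predict("is the budget report ready"))  # on
print(clf.top_features("on", "off", k=2))         # ['budget', 'report']
```

Naive Bayes is used here only because it is a compact, well-understood baseline whose word-level ratios are easy to inspect. Spoken-conversation features such as prosody or speaker turns would require a richer feature representation than this bag of words.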