Multi-strategy learning for topic detection and tracking: a joint report of CMU approaches to multilingual TDT

Authors:
Yiming Yang; Jaime Carbonell; Ralf Brown; John Lafferty; Thomas Pierce; Thomas Ault
Affiliations:
School of Computer Science, Carnegie Mellon University (CMU), Pittsburgh, PA (all authors)
Venue:
Topic Detection and Tracking
Year:
2002

Abstract

This chapter reports on CMU's work on all five TDT-1999 tasks: segmentation (story boundary identification), topic tracking, topic detection, first-story detection, and story-link detection. We addressed these tasks as supervised or unsupervised classification problems and, for comparison, applied a variety of statistical learning algorithms to each. For segmentation we used exponential language models and decision trees; for topic tracking we primarily used k-nearest-neighbor (kNN) classification, along with language models, decision trees, and a variant of the Rocchio approach; for topic detection we used a combination of incremental clustering and agglomerative hierarchical clustering; and for first-story detection and story-link detection we used a cosine-similarity-based measure. We also studied the effect of combining the outputs of alternative methods to produce joint classification decisions in topic tracking, and found that combining multiple methods typically improved the classification of new topics compared with using any single method. We evaluated our approaches on multilingual corpora, including stories in English, Mandarin, and Spanish, and on multimedia corpora consisting of newswire text and automatic speech recognition (ASR) output for broadcast news sources. The methods worked reasonably well under all of these conditions.
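The abstract says only that first-story detection and story-link detection relied on a cosine-similarity-based measure. The following is a minimal sketch of that idea under explicit assumptions: stories become raw term-frequency vectors via lowercase whitespace tokenization (a real system would normally add term weighting such as TF-IDF and stopword removal), and the 0.2 similarity threshold is hypothetical. The helper names `tf_vector`, `cosine`, `first_story_flags`, and `linked` are illustrative and do not come from the chapter.

```python
import math
from collections import Counter


def tf_vector(text):
    # Assumption: lowercase + whitespace tokenization, raw term counts (no TF-IDF).
    return Counter(text.lower().split())


def cosine(u, v):
    # Cosine similarity between two sparse term vectors stored as dicts.
    dot = sum(weight * v.get(term, 0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def first_story_flags(stories, threshold=0.2):
    # Process stories in arrival order; flag a story as a first story when
    # its maximum similarity to all earlier stories is below the
    # (hypothetical) threshold.
    seen = []
    flags = []
    for text in stories:
        vec = tf_vector(text)
        max_sim = max((cosine(vec, old) for old in seen), default=0.0)
        flags.append(max_sim < threshold)
        seen.append(vec)
    return flags


def linked(story_a, story_b, threshold=0.2):
    # Story-link detection: treat two stories as discussing the same topic
    # when their cosine similarity reaches the (hypothetical) threshold.
    return cosine(tf_vector(story_a), tf_vector(story_b)) >= threshold


stories = [
    "earthquake strikes turkey killing hundreds",
    "rescue teams search rubble after turkey earthquake",
    "stock markets rally on interest rate cut",
]
print(first_story_flags(stories))      # [True, False, True]
print(linked(stories[0], stories[1]))  # True
print(linked(stories[0], stories[2]))  # False
```

Both tasks reduce to thresholding the same similarity: first-story detection compares a story against everything seen so far, while link detection compares a single pair.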
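For topic tracking, the abstract names kNN classification as the primary method but does not give its scoring rule. The sketch below uses one simple kNN variant, labeled here as an assumption: among the `k` training stories most similar to an incoming story, it adds the similarities of on-topic neighbors, subtracts those of off-topic neighbors, and tracks the story when the sum exceeds a threshold. The values of `k`, the threshold, the training stories, and the function names `knn_score` and `knn_track` are all hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    # Assumption: lowercase + whitespace tokenization, raw term counts.
    return Counter(text.lower().split())


def cosine(u, v):
    # Cosine similarity between two sparse term vectors stored as dicts.
    dot = sum(weight * v.get(term, 0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_score(story, training, k=3):
    # training: list of (text, is_on_topic) pairs for one tracked topic.
    # Assumed rule: among the k most similar training stories, add the
    # similarity of on-topic neighbors and subtract that of off-topic ones.
    vec = tf_vector(story)
    scored = [(cosine(vec, tf_vector(text)), on_topic) for text, on_topic in training]
    neighbors = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    return sum(sim if on_topic else -sim for sim, on_topic in neighbors)


def knn_track(story, training, k=3, threshold=0.0):
    # Binary tracking decision: on-topic when the score exceeds the threshold.
    return knn_score(story, training, k) > threshold


training = [
    ("earthquake hits turkey", True),
    ("turkey earthquake death toll rises", True),
    ("football team wins cup final", False),
    ("election results announced today", False),
]
print(knn_track("aftershock rattles turkey after earthquake", training))  # True
print(knn_track("football cup final tonight", training))                  # False
```

Keeping off-topic examples as negative neighbors lets the same tracker reject stories that resemble the background rather than the target topic.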
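The abstract reports that combining the outputs of alternative tracking methods typically helped, but it does not specify how the scores were fused. As one common fusion scheme, shown here purely as an assumption, each method's scores over a batch of stories are z-score normalized so that methods with different scales (for example a kNN similarity sum versus a Rocchio-style score) become comparable; they are then averaged with optional weights and thresholded. The method names, scores, equal weights, and threshold of 0 are hypothetical.

```python
from statistics import mean, pstdev


def zscore(scores):
    # Normalize one method's scores to zero mean and unit (population) variance.
    mu = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sd for s in scores]


def combine(method_scores, weights=None):
    # method_scores: dict mapping a method name to its scores for the same
    # stories, in the same order. Returns the weighted mean of z-scores.
    names = list(method_scores)
    if weights is None:
        weights = {name: 1.0 for name in names}
    normalized = {name: zscore(method_scores[name]) for name in names}
    total = sum(weights[name] for name in names)
    n_stories = len(method_scores[names[0]])
    return [
        sum(weights[name] * normalized[name][i] for name in names) / total
        for i in range(n_stories)
    ]


def decide(combined, threshold=0.0):
    # Joint on-topic / off-topic decision from the fused scores.
    return [score > threshold for score in combined]


scores = {
    "knn": [0.9, 0.2, 0.5],        # hypothetical kNN scores for three stories
    "rocchio": [12.0, 3.0, 4.0],   # hypothetical Rocchio-style scores
}
print(decide(combine(scores)))  # [True, False, False]
```

Normalizing before averaging matters here: without it, the larger-scale method would dominate the combined score regardless of how reliable it is.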