Explorations within topic tracking and detection

Authors:
James Allan;Victor Lavrenko;Russell Swan
Affiliations:
Center for Intelligent Information Retrieval, Department of Computer Science, University of Massachusetts, Amherst, MA;Center for Intelligent Information Retrieval, Department of Computer Science, University of Massachusetts, Amherst, MA;Center for Intelligent Information Retrieval, Department of Computer Science, University of Massachusetts, Amherst, MA
Venue:
Topic detection and tracking
Year:
2002


Abstract

This chapter presents the system used by the Center for Intelligent Information Retrieval (CIIR) at the University of Massachusetts for its participation in four of the five TDT tasks: tracking, detection, first story detection, and story link detection. For each task, we discuss the parameter setting approach that we used and the results of our system on the test data.

For the task of link detection, we look more carefully at score normalization across different languages and media types. We find that we can improve results noticeably though not substantially by normalizing scores differently depending upon the source language. We also consider smoothing the vocabulary in stories using a "query expansion" technique from Information Retrieval to add additional words from the corpus to each story. This results in substantial improvements.

In addition, we use TDT evaluation approaches to show that the tracking performance that sites are achieving is what is expected from Information Retrieval technology. We further show that any first story detection system based on a tracking approach is unlikely to be sufficiently accurate for most purposes. Finally, we present an overview of an automatic timeline generation system that we developed using TDT data.
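As a rough illustration of what source-dependent score normalization for link detection might look like, the Python sketch below compares two stories with a plain cosine similarity and then rescales the raw score using statistics keyed by the language pair. The abstract does not specify the similarity measure or the normalization formula, so everything here is an assumption: the z-score form, the `STATS` values, the language labels, and the function names are hypothetical and are not taken from the CIIR system.

```python
from collections import Counter
from math import sqrt


def cosine(a: str, b: str) -> float:
    """Cosine similarity between raw term-frequency vectors of two texts."""
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ta[w] * tb[w] for w in ta.keys() & tb.keys())
    norm = sqrt(sum(v * v for v in ta.values())) * sqrt(sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0


# Hypothetical (mean, std) of raw scores per language pair, as might be
# estimated from training story pairs. The numbers are invented.
STATS = {
    ("english", "english"): (0.20, 0.10),
    ("english", "mandarin"): (0.10, 0.05),
    ("mandarin", "mandarin"): (0.25, 0.10),
}


def normalized_score(raw: float, lang_a: str, lang_b: str) -> float:
    """Rescale a raw score with the statistics of its language pair (assumed z-score)."""
    mean, std = STATS[tuple(sorted((lang_a, lang_b)))]
    return (raw - mean) / std


def linked(raw: float, lang_a: str, lang_b: str, threshold: float = 1.0) -> bool:
    """Decide whether two stories are linked, using one global threshold
    on the normalized score rather than on the raw score."""
    return normalized_score(raw, lang_a, lang_b) >= threshold
```

Under these invented statistics, the same raw score of 0.25 is unremarkable for an English/English pair but well above average for an English/Mandarin pair, so a single threshold on the normalized score treats the two conditions differently.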