Corpora for topic detection and tracking

Authors:
Christopher Cieri;Stephanie Strassel;David Graff;Nii Martey;Kara Rennert;Mark Liberman
Affiliations:
Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA;Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA;Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA;Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA;Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA;Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA
Venue:
Topic detection and tracking
Year:
2002

Citing 0
Cited 15

Improving realism of topic tracking evaluation

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Flexible intrinsic evaluation of hierarchical clustering for TDT

CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Simple Semantics in Topic Detection and Tracking

Information Retrieval
A month to topic detection and tracking in Hindi

ACM Transactions on Asian Language Information Processing (TALIP)
Forming test collections with no system pooling

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Building an information retrieval test collection for spontaneous conversational speech

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Investigations on event evolution in TDT

NAACLstudent '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: Proceedings of the HLT-NAACL 2003 student research workshop - Volume 3
Evaluation of resources for question answering evaluation

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Building a reusable test collection for question answering

Journal of the American Society for Information Science and Technology - Research Articles
Relevance models for topic detection and tracking

HLT '02 Proceedings of the second international conference on Human Language Technology Research
Intelligent scientific authoring tools: Interactive data mining for constructive uses of citation networks

Information Processing and Management: an International Journal
New event detection and topic tracking in Turkish

Journal of the American Society for Information Science and Technology
Feeding the world: a comprehensive dataset and analysis of a real world snapshot of web feeds

Proceedings of the 13th International Conference on Information Integration and Web-based Applications and Services
Named entity patterns across news domains

FDIA'07 Proceedings of the 1st BCS IRSG conference on Future Directions in Information Access
Clustering in extreme learning machine feature space

Neurocomputing

Abstract

The TDT corpora, developed to support the DARPA-sponsored program in Topic Detection and Tracking, combine data collected over a nine-month period from 8 English and 3 Chinese sources. The published corpora contain audio, reference text (written news text and transcripts of the broadcast audio), boundary tables segmenting the broadcasts into stories, and relevance tables resulting from millions of human judgments. Sections of the corpora have undergone topic-story, first story, and story link annotation. Both the TDT-2 and TDT-3 text corpora and the accompanying broadcast audio are now available from the Linguistic Data Consortium. This paper describes the raw material collected for the corpora, the annotation of that material to prepare it for research use, and the formats in which it is distributed. Special attention is paid to the quality control measures developed for these data sets.