Representing documents with named entities for story link detection (SLD)

Authors:
Chirag Shah;W. Bruce Croft;David Jensen
Affiliations:
University of Massachusetts, Amherst, MA;University of Massachusetts, Amherst, MA;University of Massachusetts, Amherst, MA
Venue:
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Year:
2006

Abstract

Several information organization, access, and filtering systems can benefit from different kinds of document representations than those used in traditional Information Retrieval (IR). Topic Detection and Tracking (TDT) is an example of such an application. In this paper we demonstrate that named entities are better units for document representation than all words. To test this hypothesis, we study the effect of word-based and entity-based representations on Story Link Detection (SLD), a core task in TDT research. Experiments on TDT corpora show that entity-based representations give significant improvements for SLD. We also propose a mechanism to expand the set of named entities used for document representation, which enhances performance in some cases. We then go a step further and analyze the limitations of using only named entities for document representation. Our studies and experiments indicate that adding topical terms can help address these limitations.
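
The contrast between word-based and entity-based representations can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a similarity measure or an entity recognizer, so the sketch assumes cosine similarity over raw counts and a hand-supplied entity list. The function names (`tokenize`, `word_vector`, `entity_vector`, `cosine`, `linked`) and the `threshold` value are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_vector(text):
    """All-words representation: raw term counts."""
    return Counter(tokenize(text))


def entity_vector(text, entities):
    """Entity-based representation: counts of known entity strings only.

    `entities` stands in for the output of a named-entity recognizer
    (an assumption; the abstract does not say which one is used).
    """
    normalized = " ".join(tokenize(text))
    counts = Counter()
    for entity in entities:
        key = " ".join(tokenize(entity))
        hits = len(re.findall(r"\b" + re.escape(key) + r"\b", normalized))
        if hits:
            counts[key] = hits
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def linked(a, b, threshold=0.2):
    """Declare two stories linked if similarity reaches the threshold.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return cosine(a, b) >= threshold


story_1 = "Kofi Annan visited Baghdad on Monday."
story_2 = "In Baghdad, Kofi Annan met officials."
known_entities = ["Kofi Annan", "Baghdad", "United Nations"]

word_sim = cosine(word_vector(story_1), word_vector(story_2))
entity_sim = cosine(
    entity_vector(story_1, known_entities),
    entity_vector(story_2, known_entities),
)
print(f"all words: {word_sim:.2f}, entities only: {entity_sim:.2f}")
```

In this toy pair the two stories share every entity but only half of their words, so the entity-only similarity is higher. A story with no recognized entities yields an empty vector and a similarity of zero, which hints at the limitation the abstract proposes to address by adding topical terms.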