A weakly-supervised detection of entity central documents in a stream

Authors:
Ludovic Bonnefoy;Vincent Bouvier;Patrice Bellot
Affiliations:
University of Avignon CERI-LIA / iSmart, Avignon, France;Aix-Marseille University / LSIS CNRS / Kware, Marseille, France;Aix-Marseille University / LSIS CNRS, Marseille, France
Venue:
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Year:
2013

Abstract

Filtering a time-ordered corpus for documents that are highly relevant to an entity is a task that has received growing attention in recent years. One application is to reduce the delay between the moment a piece of information about an entity is first observed and the moment the entity's entry in a knowledge base is updated. Current state-of-the-art approaches are highly supervised and require training examples for each monitored entity. We propose an approach that does not require new training data when processing a new entity. To capture intrinsic characteristics of highly relevant documents, our approach relies on three types of features: document-centric features, entity-profile features, and time features. Evaluated within the framework of the "Knowledge Base Acceleration" track at TREC 2012, it outperforms current state-of-the-art approaches.
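
The three feature families named in the abstract can be illustrated with a minimal Python sketch. The abstract does not list the individual features, so every concrete feature below (mention count, position of the first mention, profile-term overlap, number of recent mentions) is an assumption chosen for illustration, as are the names `Document`, `extract_features`, `profile_terms`, `mention_hours`, and `window_hours`. The point it shows is that none of the values depends on which entity is being tracked, which is what would let a single classifier trained on some entities be applied to a new one without new training data.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Document:
    text: str
    timestamp_hour: int  # hours since the start of the stream (hypothetical unit)


def extract_features(
    doc: Document,
    entity_name: str,
    profile_terms: List[str],
    mention_hours: List[int],
    window_hours: int = 24,
) -> Dict[str, float]:
    """Return entity-independent features for one (document, entity) pair.

    All four features are illustrative assumptions, not the paper's feature set.
    """
    text = doc.text.lower()
    name = entity_name.lower()

    # Document-centric: how often and how early the entity name appears.
    mention_count = text.count(name)
    first_pos = text.find(name)
    if first_pos >= 0 and text:
        first_mention_pos = first_pos / len(text)
    else:
        first_mention_pos = 1.0  # entity name never appears

    # Entity-profile: share of profile terms that occur in the document.
    tokens = set(re.findall(r"\w+", text))
    profile = {term.lower() for term in profile_terms}
    profile_overlap = len(profile & tokens) / len(profile) if profile else 0.0

    # Time: number of earlier mentions inside the window before this document.
    recent_mentions = sum(
        1 for hour in mention_hours
        if 0 <= doc.timestamp_hour - hour <= window_hours
    )

    return {
        "mention_count": float(mention_count),
        "first_mention_pos": first_mention_pos,
        "profile_overlap": profile_overlap,
        "recent_mentions": float(recent_mentions),
    }
```

Under these assumptions, the resulting dictionaries could be fed to any standard classifier trained on labeled documents from a set of known entities and then applied unchanged to documents about an unseen entity.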