Boosting and Rocchio applied to text filtering
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Maximum likelihood estimation for filtering thresholds
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Linking entities to a knowledge base with query expansion
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Expert Systems with Applications: An International Journal
Entity linking with effective acronym expansion, instance selection and topic modeling
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Volume Three
Named entity disambiguation in streaming data
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Hi-index | 0.00 |
Filtering a time-ordered corpus for documents that are highly relevant to an entity is a task receiving more and more attention over the years. One application is to reduce the delay between the moment an information about an entity is being first observed and the moment the entity entry in a knowledge base is being updated. Current state-of-the-art approaches are highly supervised and require training examples for each entity monitored. We propose an approach which does not require new training data when processing a new entity. To capture intrinsic characteristics of highly relevant documents our approach relies on three types of features: document centric features, entity profile related features and time features. Evaluated within the framework of the "Knowledge Base Acceleration" track at TREC 2012, it outperforms current state-of-the-art approaches.