Leveraging context in user-centric entity detection systems

Authors:
Vadim von Brzeski;Utku Irmak;Reiner Kraft
Affiliations:
Yahoo!, Inc., Santa Clara, CA;Yahoo!, Inc., Santa Clara, CA;Yahoo!, Inc., Santa Clara, CA
Venue:
Proceedings of the sixteenth ACM conference on information and knowledge management
Year:
2007

Abstract

A user-centric entity detection system is one in which the primary consumer of the detected entities is a person who can act on them (e.g., perform a search, view a map, or shop). We contrast this with machine-centric detection systems, where the primary consumer of the detected entities is a machine. Machine-centric detection systems typically focus on the quantity of detected entities, measured by precision and recall metrics, with the goal of correctly identifying every single entity in a document. However, the simple precision/recall scores of machine-centric entity detection systems fail to accurately reflect the quality of detected entities in user-centric systems, where users may not necessarily want to "see" every possible entity. We posit that not all of the detected entities in a given piece of text are necessarily relevant to the main topic of the text, nor are they necessarily interesting enough to the user to warrant further action. In fact, presenting all of the detected entities to a user may annoy the user to the point of turning this capability off completely, an undesirable outcome. Therefore, we propose to measure the quality and utility of user-centric entity detection systems in three core dimensions: the accuracy, the interestingness, and the relevance of the entities they present to the user. We show that leveraging surrounding context can greatly improve the performance of such systems in all three dimensions by employing novel algorithms for generating a concept vector and for finding concept extensions using search query logs. We extensively evaluate the proposed algorithms within Contextual Shortcuts, a large-scale user-centric entity detection platform, using 1,586 entities detected over 1,519 documents. The results confirm the importance of using context within user-centric entity detection systems, and validate the usefulness of the proposed algorithms by showing how they improve the overall entity detection quality within Contextual Shortcuts.