Leveraging context in user-centric entity detection systems

  • Authors:
  • Vadim von Brzeski;Utku Irmak;Reiner Kraft

  • Affiliations:
  • Yahoo!, Inc., Santa Clara, CA;Yahoo!, Inc., Santa Clara, CA;Yahoo!, Inc., Santa Clara, CA

  • Venue:
  • Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

A user-centric entity detection system is one in which the primary consumer of the detected entities is a person who can perform actions on the detected entities (e.g. perform a search, view a map, shop, etc.). We contrast this with machine-centric detection systems where the primary consumer of the detected entities is a machine. Machine-centric detection systems typically focus on the quantity of detected entities, measured by precision and recall metrics, with the goal of correctly identifying every single entity in a document. However, the simple precision/recall scores of machine-centric entity detection systems fail to accurately reflect the quality of detected entities in user-centric systems, where users may not necessarily want to "see" every possible entity. We posit that not all of the detected entities in a given piece of text are necessarily relevant to the main topic of the text, nor are they necessarily interesting enough to the user to warrant further action. In fact, presenting all of the detected entities to a user may annoy the user to the point where he decides to turn this capability off completely, an undesirable outcome. Therefore, we propose to measure the quality and utility of user-centric entity detection systems in three core dimensions: the accuracy, the interestingness, and the relevance of the entities it presents to the user. We show that leveraging surrounding context can greatly improve the performance of such systems in all three dimensions by employing novel algorithms for generating a concept vector and for finding concept extensions using search query logs. We extensively evaluate the proposed algorithms within Contextual Shortcuts - a large-scale user-centric entity detection platform - using 1,586 entities detected over 1,519 documents. The results confirm the importance of using context within user-centric entity detection systems, and validate the usefulness of the proposed algorithms by showing how they improve the overall entity detection quality within Contextual Shortcuts.