Efficiently linking text documents with relevant structured information

  • Authors:
  • Venkatesan T. Chakaravarthy; Himanshu Gupta; Prasan Roy; Mukesh Mohania

  • Affiliations:
  • IBM India Research Lab, New Delhi, India; IBM India Research Lab, New Delhi, India; IBM India Research Lab, New Delhi, India; IBM India Research Lab, New Delhi, India

  • Venue:
  • VLDB '06 Proceedings of the 32nd international conference on Very large data bases
  • Year:
  • 2006

Abstract

Faced with growing knowledge management needs, enterprises are increasingly realizing the importance of interlinking critical business information distributed across structured and unstructured data sources. We present a novel system, called EROCS, for linking a given text document with relevant structured data. EROCS views the structured data as a predefined set of "entities" and identifies the entities that best match the given document. EROCS also embeds the identified entities in the document, effectively creating links between the structured data and segments within the document. Unlike prior approaches, EROCS identifies such links even when the relevant entity is not explicitly mentioned in the document. EROCS uses an efficient algorithm that performs this task while keeping the amount of information retrieved from the database to a minimum. Our evaluation shows that EROCS achieves high accuracy with reasonable overheads.
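
As a rough illustration of the idea of matching document segments to entities, the sketch below scores each sentence against each entity by the terms they share and links the sentence to the highest-scoring entity. It is not the EROCS algorithm, which the abstract does not spell out: the sentence-level segmentation, the IDF-style weighting, and the names `tokenize`, `term_weights`, and `link_segments` are all assumptions made for this example, and the entity data is invented. In particular, it loads every entity's terms up front and so does nothing to minimize database access.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_weights(entities):
    """Weight each term by its rarity across entities (an IDF-style assumption)."""
    n = len(entities)
    df = Counter(t for terms in entities.values() for t in set(terms))
    return {t: math.log(1 + n / count) for t, count in df.items()}


def link_segments(document, entities):
    """Return (segment, entity_id or None) for each sentence in the document.

    `entities` maps a hypothetical entity id to the set of terms that
    describe it in the structured data (for example, values from related rows).
    """
    weights = term_weights(entities)
    links = []
    # Treat each sentence as one segment; this granularity is an assumption.
    for segment in re.split(r"(?<=[.!?])\s+", document.strip()):
        tokens = set(tokenize(segment))
        best_id, best_score = None, 0.0
        for entity_id, terms in entities.items():
            score = sum(weights[t] for t in tokens & set(terms))
            if score > best_score:
                best_id, best_score = entity_id, score
        links.append((segment, best_id))
    return links


# Hypothetical entities: two transactions described by context terms only.
entities = {
    "txn-1001": {"alice", "laptop", "delhi"},
    "txn-1002": {"bob", "printer", "mumbai"},
}
document = (
    "The customer from Delhi complained that her laptop arrived damaged. "
    "Thanks for your help."
)
for segment, entity_id in link_segments(document, entities):
    print(entity_id, "<-", segment)
```

In this toy run the first sentence never mentions a transaction identifier, yet it can still be linked through the context terms it shares with one entity, while the second sentence shares no terms with any entity and stays unlinked.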