On Precision and Recall of Multi-Attribute Data Extraction from Semistructured Sources

  • Authors:
  • Guizhen Yang;Saikat Mukherjee;I. V. Ramakrishnan

  • Affiliations:
  • -;-;-

  • Venue:
  • ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

Machine learning techniques for data extraction fromsemistructured sources exhibit different precision and recallcharacteristics. However to date the formal relationship betweenlearning algorithms and their impact on these twometrics remains unexplored. This paper proposes a formalizationof precision and recall of extraction and investigatesthe complexity-theoretic aspects of learning algorithms formulti-attribute data extraction based on this formalism. Weshow that there is a tradeoff between precision/recall of extractionand computational efficiency and present experimentalresults to demonstrate the practical utility of theseconcepts in designing scalable data extraction algorithmsfor improving recall without compromising on precision.