Labeling data extracted from the web

  • Authors:
  • Altigran S. Da Silva;Denilson Barbosa;João M. B. Cavalcanti;Marco A. S. Sevalho

  • Affiliations:
  • Universidade Federal do Amazonas, Manaus, AM, Brazil; University of Calgary, Calgary, AB, Canada; Universidade Federal do Amazonas, Manaus, AM, Brazil; Universidade Federal do Amazonas, Manaus, AM, Brazil

  • Venue:
  • OTM'07 Proceedings of the 2007 OTM Confederated international conference on On the move to meaningful internet systems: CoopIS, DOA, ODBASE, GADA, and IS - Volume Part I
  • Year:
  • 2007

Abstract

We consider finding descriptive labels for anonymous, structured datasets, such as those produced by state-of-the-art Web wrappers. We give a probabilistic model to estimate the affinity between attributes and labels, and describe a method that uses a Web search engine to populate the model. We discuss a method for finding good candidate labels for unlabeled datasets. Ours is the first unsupervised labeling method that does not rely on mining the HTML pages containing the data. Experimental results with data from 8 different domains show that our methods achieve high accuracy even with very few search engine accesses.