Post processing wrapper generated tables for labeling anonymous datasets

  • Authors:
  • Emdad Ahmed; Hasan M. Jamil

  • Affiliations:
  • Wayne State University, Detroit, MI, USA; Wayne State University, Detroit, MI, USA

  • Venue:
  • Proceedings of the eleventh international workshop on Web information and data management
  • Year:
  • 2009

Abstract

Many wrappers generate tables without column names because the tables are meant for human consumption: the meaning of each column is apparent from the context and easy for a person to understand. In emerging applications, however, labels are needed for autonomous assignment and schema mapping, where machines try to understand the tables. Autonomous label assignment is critical in high-volume data processing that involves ad hoc mediation, extraction, and querying. We propose an algorithm, Lads (Labeling Anonymous Datasets), that can holistically label tabular web documents. The algorithm has been tested on anonymous datasets from a number of sites (e.g., music, movie, political, demographic, and athletic) obtained through different search engines such as Google, Yahoo, and MSN. We present the comparative probabilities of attributes being candidate labels; the results appear very promising, with a probability as high as 93% of assigning a good label to an anonymous attribute. To the best of our knowledge, this is the first approach of its kind to assign labels based on the recommendations of multiple search engines.
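
The abstract does not describe how Lads works internally, so the following is only an illustrative sketch of one way "comparative probabilities" of candidate labels could be derived from several search engines. It assumes that each engine has already returned some evidence count (for example, hit counts) for each candidate label of an anonymous column. The function names, the engine keys, the per-engine normalization, and the numbers are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def comparative_probabilities(hit_counts):
    """Combine per-engine evidence into one score per candidate label.

    hit_counts maps an engine name to {candidate_label: count}.
    This input shape is an assumption made for illustration only.
    """
    totals = defaultdict(float)
    for counts in hit_counts.values():
        engine_total = sum(counts.values())
        if engine_total == 0:
            continue  # this engine returned no evidence; skip it
        for label, count in counts.items():
            # Normalize within each engine so that an engine with
            # larger raw counts does not dominate the others.
            totals[label] += count / engine_total

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    # Rescale so the scores over all candidate labels sum to 1.
    return {label: score / grand_total for label, score in totals.items()}


def best_label(hit_counts):
    """Return the candidate with the highest combined score, or None."""
    probs = comparative_probabilities(hit_counts)
    if not probs:
        return None
    return max(probs, key=probs.get)


# Hypothetical evidence for one anonymous column of a music table.
example = {
    "engine_a": {"artist": 90, "album": 10},
    "engine_b": {"artist": 80, "album": 20},
}
probs = comparative_probabilities(example)
chosen = best_label(example)
```

Normalizing within each engine before combining is a design choice of this sketch: it gives each engine an equal vote regardless of how large its raw counts are. Whether Lads weights engines this way is not stated in the abstract.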