Wikipedia driven autonomous label assignment in wrapper induced tables with missing column names

  • Authors:
  • Mohammad Shafkat Amin;Anupam Bhattacharjee;Hasan Jamil

  • Affiliations:
  • Wayne State University;Wayne State University;Wayne State University

  • Venue:
  • Proceedings of the 2010 ACM Symposium on Applied Computing
  • Year:
  • 2010

Abstract

As the volume of information available on the internet grows exponentially, it is clear that most of this information will have to be processed and digested by computers to produce useful information for human consumption. Unfortunately, most web content is currently designed for direct human consumption, on the assumption that a human reader will decipher the information presented in some context and will be able to connect the missing dots, if any. In particular, information presented in tabular form is often not accompanied by descriptive titles or column names analogous to attribute names in tables. While such omissions are not really an issue for humans, they make it truly hard for autonomous systems to extract information, since a machine is expected to understand the meaning of the table presented and extract the right information in the context of the query. The task is even more difficult when the information needed is distributed across the globe and involves semantic heterogeneity. In this paper, our goal is to address the issue of how to interpret tables with missing column names by developing a method for assigning attribute names to an arbitrary table extracted from the web in a fully autonomous manner. We propose a novel approach that leverages Wikipedia, for the first time, for column name discovery for the purpose of table annotation. We show that this leads to an improved likelihood of capturing the context and interpretation of the table accurately and of producing a semantically meaningful query response.