Automatically Extracting Form Labels

  • Authors:
  • Hoa Nguyen; Eun Yong Kang; Juliana Freire

  • Affiliations:
  • School of Computing, University of Utah, Salt Lake City, UT, USA. thanhhoa@cs.utah.edu; School of Computing, University of Utah, Salt Lake City, UT, USA. ekang@cs.utah.edu; School of Computing, University of Utah, Salt Lake City, UT, USA. juliana@cs.utah.edu

  • Venue:
  • ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
  • Year:
  • 2008

Abstract

We describe a machine-learning-based approach for extracting attribute labels from Web form interfaces. These labels are required by several techniques that attempt to retrieve and integrate data residing in online databases hidden behind form interfaces, including schema matching, clustering, and hidden-Web crawlers. Whereas previous approaches to this problem have relied on heuristics and manually specified extraction rules, our technique uses learning classifiers to identify form labels. Preliminary experiments indicate that the approach is promising and achieves high accuracy.
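The abstract does not say which features or classifier the authors use, so the following is only a rough sketch of the general idea: treat every (text fragment, form field) pair as an example, train a classifier to recognize "this text is the field's label", and label each field with its highest-scoring candidate. Everything concrete here is an assumption, not the paper's method: the page layout is assumed to be available as bounding boxes, the names `Box`, `Text`, `features`, `train_perceptron`, `make_examples` and `assign_labels` are hypothetical, the features (gaps, relative position, a trailing colon) are illustrative, and a plain perceptron stands in for whatever learning classifier the paper actually uses.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Rendered bounding box of a page element (assumed layout input)."""
    x: float  # left edge, in pixels
    y: float  # top edge, in pixels
    w: float  # width
    h: float  # height


@dataclass
class Text:
    """A candidate label: a text fragment and where it is rendered."""
    text: str
    box: Box


def features(cand: Text, field: Box) -> list[float]:
    """Hypothetical features describing how a text fragment sits relative to a field."""
    t = cand.box
    gap_x = abs(field.x - (t.x + t.w)) / 100.0  # text right edge to field left edge
    gap_y = abs((field.y + field.h / 2) - (t.y + t.h / 2)) / 100.0  # offset of vertical centers
    left_of = 1.0 if t.x + t.w <= field.x else 0.0  # text ends before the field starts
    above = 1.0 if t.y + t.h <= field.y else 0.0  # text ends above the field
    colon = 1.0 if cand.text.rstrip().endswith(":") else 0.0  # "Title:"-style label
    return [1.0, gap_x, gap_y, left_of, above, colon]  # leading 1.0 is the bias term


def dot(w: list[float], x: list[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_perceptron(examples, max_epochs=1000):
    """Linear perceptron on (features, +1/-1) pairs; stops after an error-free epoch."""
    w = [0.0] * len(examples[0][0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            if y * dot(w, x) <= 0:  # misclassified or on the boundary: update
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            return w, True  # converged: every training pair is on the correct side
    return w, False


def make_examples(forms):
    """Every (text, field) pair of a labelled form becomes one training example."""
    examples = []
    for fields, texts, gold in forms:
        for name, fbox in fields.items():
            for cand in texts:
                y = 1 if gold[name] == cand.text else -1
                examples.append((features(cand, fbox), y))
    return examples


def assign_labels(w, fields, texts):
    """Label each field with the candidate text the classifier scores highest."""
    return {
        name: max(texts, key=lambda c: dot(w, features(c, fbox))).text
        for name, fbox in fields.items()
    }


# Toy form (made-up coordinates): a heading plus two labelled text boxes.
texts = [
    Text("Search our catalog", Box(10, 0, 200, 20)),
    Text("Title:", Box(10, 30, 50, 20)),
    Text("Author:", Box(10, 60, 60, 20)),
]
fields = {"title": Box(70, 30, 150, 20), "author": Box(80, 60, 150, 20)}
gold = {"title": "Title:", "author": "Author:"}

w, converged = train_perceptron(make_examples([(fields, texts, gold)]))
print(converged, assign_labels(w, fields, texts))
# prints: True {'title': 'Title:', 'author': 'Author:'}
```

Because the hypothetical features depend only on relative positions, the learned weights carry over to forms placed anywhere on a page. The sketch always picks some candidate for every field; handling fields with no visible label would need something extra, such as a minimum score, which is likewise an assumption rather than something the abstract describes.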