Information extraction from web tables

  • Authors:
  • Mahmoud Shaker; Hamidah Ibrahim; Aida Mustapha; Lili Nurliyana Abdullah

  • Affiliations:
  • Universiti Putra Malaysia, Serdang, Malaysia; Universiti Putra Malaysia, Serdang, Malaysia; Universiti Putra Malaysia, Serdang, Malaysia; Universiti Putra Malaysia, Serdang, Malaysia

  • Venue:
  • Proceedings of the 11th International Conference on Information Integration and Web-based Applications & Services
  • Year:
  • 2009

Abstract

Nowadays, many users rely on web search engines to find and gather information, and they face a growing number of web pages as information sources. Correlating, integrating, and presenting related information to users has therefore become important. When a user queries a search engine such as Yahoo or Google for specific information, the results indicate not only whether the desired information is available but also which other pages mention it. Extracting information from web pages has also become very important, because the number of diverse web information sources available to users on the Internet is massive and increasing, and the variety of web pages makes information extraction from the web a challenging problem. This paper proposes an approach for extracting information from web tables based on standard classifications. The proposed approach consists of four main phases: (i) pre-processing, (ii) extraction, (iii) classification, and (iv) simplification. The approach is evaluated through experiments on a number of web pages from the Nokia products domain since, to the best of our knowledge, this is the only product that has complete and complex standard classifiers.
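
The abstract names the four phases but does not describe how each one works, so the following is only a rough sketch of what such a pipeline might look like, not the authors' method. It assumes that each table row is a label/value pair, that a "standard classification" can be modelled as a mapping from category to known attribute labels, and that "simplification" means flattening the result. The names `TableRows`, `CLASSIFIERS`, `extract`, `classify`, and `simplify`, as well as the sample HTML, are all hypothetical and invented for illustration.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of every <tr> on a page as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row being built, or None outside a <tr>
        self._cell = None  # text fragments of the open cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Pre-processing (assumed): collapse whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by malformed markup


# Hypothetical classification scheme: category -> known attribute labels.
CLASSIFIERS = {
    "Display": {"size", "resolution"},
    "Battery": {"stand-by", "talk time"},
}


def extract(html):
    """Extraction: return all table rows found in the HTML text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def classify(rows):
    """Classification: file (label, value) rows under a known category."""
    result = {}
    for row in rows:
        if len(row) != 2:
            continue  # this sketch only handles label/value rows
        label, value = row
        for category, labels in CLASSIFIERS.items():
            if label.lower() in labels:
                result.setdefault(category, {})[label] = value
    return result


def simplify(classified):
    """Simplification (assumed): flatten to 'Category.label' -> value."""
    return {
        f"{category}.{label}": value
        for category, attrs in classified.items()
        for label, value in attrs.items()
    }


# Made-up sample input, not taken from the paper's data set.
SAMPLE = """
<table>
  <tr><th>Size</th><td>2.4   inches</td></tr>
  <tr><td>Talk time</td><td>Up to 5 h</td></tr>
  <tr><td>Colour</td><td>Black</td></tr>
</table>
"""

result = simplify(classify(extract(SAMPLE)))
```

In this sketch the `Colour` row is dropped because no category in the hypothetical `CLASSIFIERS` mapping lists it; a real system would need a policy for unrecognized labels, as well as handling for nested tables and for cells that span rows or columns.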