Information extraction from web tables

Authors:
Mahmoud Shaker; Hamidah Ibrahim; Aida Mustapha; Lili Nurliyana Abdullah
Affiliations:
Universiti Putra Malaysia, Serdang, Malaysia (all authors)
Venue:
Proceedings of the 11th International Conference on Information Integration and Web-based Applications & Services
Year:
2009

Abstract

Many users now rely on web search engines to find and gather information, and they face a growing number of diverse web pages as information sources. Correlating, integrating, and presenting related information to users has therefore become important. When a user queries a search engine such as Yahoo or Google for a specific piece of information, the results indicate not only whether the desired information is available but also which other pages mention it. Extracting information from web pages has likewise become very important, yet the massive and growing number of web pages available on the Internet, together with their variety, makes information extraction from the web a challenging problem. This paper proposes an approach for extracting information from web tables based on standard classifications. The proposed approach consists of four main phases: (i) pre-processing, (ii) extraction, (iii) classification, and (iv) simplification. The approach is evaluated through experiments on a number of web pages from the Nokia products domain because, to the best of our knowledge, this is the only product that has complete and complex standard classifiers.