Simultaneous Product Attribute Name and Value Extraction from Web Pages

  • Authors:
  • Bo Wu; Xueqi Cheng; Yu Wang; Yan Guo; Linhai Song

  • Venue:
  • WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 03
  • Year:
  • 2009

Abstract

Much work has been done in the area of template-independent web data extraction. However, existing approaches either handle attribute value extraction and annotation in separate phases or are constrained to a predefined set of attributes, which makes them highly ineffective. In this paper, we perform attribute extraction and annotation simultaneously by extracting each attribute name and its value as a pair. Our approach uses a co-training algorithm with a naive Bayesian classifier to identify candidate attribute name-value pairs in unlabeled pages. These candidate pairs are then used to detect the specification block of the product in each web page. Finally, all attribute name-value pairs in the specification block are discovered. We conduct experiments on three types of products and obtain promising results.
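As a rough illustration of the co-training step, the sketch below trains two small naive Bayes classifiers on two views of each candidate and lets the more confident view label unlabeled candidates. It is not the authors' implementation: the abstract does not say which views or features the paper uses, so the split into name tokens and value-shape tokens, the labels "pair" and "other", the threshold, and all function and variable names are assumptions made for this example.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny multinomial naive Bayes over token lists (Laplace smoothing)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = set().union(*self.counts.values())
        return self

    def predict_proba(self, tokens):
        total = sum(self.prior.values())
        log_p = {}
        for c in self.classes:
            # The extra +1 in the denominator reserves mass for unseen tokens.
            denom = sum(self.counts[c].values()) + len(self.vocab) + 1
            log_p[c] = math.log(self.prior[c] / total) + sum(
                math.log((self.counts[c][t] + 1) / denom) for t in tokens)
        top = max(log_p.values())
        norm = sum(math.exp(v - top) for v in log_p.values())
        return {c: math.exp(v - top) / norm for c, v in log_p.items()}


def train_views(labeled):
    """Fit one classifier per view: x[0] is view A, x[1] is view B."""
    labels = [y for _, y in labeled]
    clf_a = NaiveBayes().fit([x[0] for x, _ in labeled], labels)
    clf_b = NaiveBayes().fit([x[1] for x, _ in labeled], labels)
    return clf_a, clf_b


def co_train(labeled, unlabeled, rounds=5, threshold=0.75):
    """Grow the labeled set with candidates that either view labels confidently."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        clf_a, clf_b = train_views(labeled)
        confident, rest = [], []
        for x in pool:
            # Both views vote; keep the single most confident (prob, label).
            votes = [(p, c) for clf, view in ((clf_a, x[0]), (clf_b, x[1]))
                     for c, p in clf.predict_proba(view).items()]
            prob, label = max(votes)
            (confident if prob >= threshold else rest).append((x, label))
        if not confident:
            break
        labeled += confident
        pool = [x for x, _ in rest]
    return train_views(labeled)  # final classifiers on the enlarged set


# Hypothetical toy data: (name tokens, value-shape tokens) per candidate.
seed = [
    ((["weight"], ["NUM", "UNIT"]), "pair"),
    ((["screen", "size"], ["NUM", "UNIT"]), "pair"),
    ((["home"], ["LINK"]), "other"),
    ((["login"], ["LINK"]), "other"),
]
pool = [(["resolution"], ["NUM", "UNIT"]), (["contact"], ["LINK"])]

baseline_a, _ = train_views(seed)
clf_a, clf_b = co_train(seed, pool)
print(baseline_a.predict_proba(["resolution"])["pair"])
print(clf_a.predict_proba(["resolution"])["pair"])
```

In this toy run the name-view classifier has never seen "resolution" in the seed set, so its "pair" score starts below 0.5. The value view labels that candidate confidently, and after retraining the name view scores it above 0.5. Detecting the specification block from the resulting candidates is a separate step that this sketch does not cover.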