Semantic annotation of web objects using constrained conditional random fields

  • Authors:
  • Yongquan Dong;Qingzhong Li;Yongqing Zheng;Xiaoyang Xu;Yongxin Zhang

  • Affiliations:
  • School of Computer Science and Technology, Shandong University, Jinan, China and School of Computer Science and Technology, Xuzhou Normal University, Xuzhou, China;School of Computer Science and Technology, Shandong University, Jinan, China;School of Computer Science and Technology, Shandong University, Jinan, China;School of Computer Science and Technology, Shandong University, Jinan, China;School of Computer Science and Technology, Shandong University, Jinan, China

  • Venue:
  • WAIM'10 Proceedings of the 11th international conference on Web-age information management
  • Year:
  • 2010

Abstract

Semantic annotation of Web objects is a key problem in Web information extraction. The Web contains an abundance of useful semi-structured information about real-world objects, and empirical study shows that Web information about objects of the same type exhibits strong sequence characteristics across different Web sites. Conditional Random Fields (CRFs) are the state-of-the-art approach for exploiting these sequence characteristics to improve labeling. However, previous CRFs are limited in that they cannot efficiently handle the variety of logical constraints that hold between Web object elements. This paper presents a Constrained Conditional Random Fields (Constrained CRFs) model for semantic annotation of Web objects. The model incorporates a novel inference procedure based on integer linear programming and extends CRFs to support all kinds of logical constraints naturally and efficiently. Experimental results on a large amount of real-world data collected from diverse domains show that the proposed approach can significantly improve the semantic annotation accuracy of Web objects.
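
To make the idea of constrained inference concrete, the following is a minimal sketch and not the paper's method. It scores label sequences with a linear-chain model and picks the best sequence that satisfies a logical constraint. Exhaustive enumeration stands in for the integer linear programming solver the abstract mentions, so it is only practical for very short sequences. The label set, the scores, and the constraint ("an object has at most one name element") are all hypothetical values invented for illustration.

```python
from itertools import product

# Hypothetical label set and scores, invented for illustration only.
LABELS = ["name", "price", "other"]

# EMISSION[t][label]: local score for giving `label` to element t.
EMISSION = [
    {"name": 2.0, "price": 0.1, "other": 0.5},
    {"name": 1.5, "price": 1.0, "other": 0.2},
    {"name": 0.1, "price": 1.8, "other": 0.3},
]

# TRANSITION[(prev, cur)]: score for an adjacent label pair (0 if absent).
TRANSITION = {("name", "price"): 0.5, ("name", "name"): 0.2}


def score(seq):
    """Linear-chain score: sum of emission and transition scores."""
    total = sum(EMISSION[t][label] for t, label in enumerate(seq))
    total += sum(TRANSITION.get(pair, 0.0) for pair in zip(seq, seq[1:]))
    return total


def at_most_one_name(seq):
    """Hypothetical logical constraint: at most one 'name' element."""
    return seq.count("name") <= 1


def decode(constraints=()):
    """Return the highest-scoring sequence that satisfies every constraint.

    Brute-force enumeration is used here in place of an ILP solver.
    """
    candidates = (
        seq
        for seq in product(LABELS, repeat=len(EMISSION))
        if all(check(seq) for check in constraints)
    )
    return max(candidates, key=score)


unconstrained = decode()
constrained = decode([at_most_one_name])
print(unconstrained)  # ('name', 'name', 'price')
print(constrained)    # ('name', 'price', 'price')
```

Without the constraint, the highest-scoring sequence labels two elements as names. With it, the decoder falls back to the best sequence that respects the rule. An ILP formulation would encode the same objective and constraint as linear expressions over binary indicator variables, which avoids enumerating every sequence.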