Combining Similarity and Distribution Features to Match Attributes

  • Authors:
  • Yu Wang; Binxing Fang; Yan Guo

  • Venue:
  • WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 03
  • Year:
  • 2009

Abstract

The Web contains much useful semi-structured information that can be organized into web objects, many of which are commercially valuable. The inner structures of these web objects are so heterogeneous that web objects from different web sites cover different subsets of the useful attributes. The complete set of attributes can be mined from web pages with attribute extraction algorithms. However, to construct a high-quality web object schema, some mined attributes should be merged because they are synonyms for the same concept. Our empirical study shows that the features used by traditional schema matching and deep web integration methods are usually domain-specific, so they are not applicable to matching attributes extracted from the Web. To overcome this problem, this paper proposes new features that describe attribute distribution characteristics and uses machine learning techniques to combine these distribution features with attribute similarity features. We empirically compare the proposed method with existing methods that use other features, and the results show the effectiveness of our method.
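The abstract does not name the concrete features or the learner, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. Two similarity features are assumed: name similarity (via Python's `difflib.SequenceMatcher`) and Jaccard overlap of observed values. One distribution feature is assumed: the fraction of sites on which both attributes appear together, on the hypothetical intuition that a single site rarely lists two synonyms for the same concept. A small hand-written logistic regression stands in for the unspecified machine learning technique. The toy `SITES` data and all function names are hypothetical.

```python
import math
from difflib import SequenceMatcher

# Hypothetical toy data: site -> {attribute name: set of observed values}.
SITES = {
    "site_a": {"price": {"$10", "$12"}, "author": {"Smith", "Lee"}},
    "site_b": {"cost": {"$10", "$15"}, "writer": {"Smith", "Kim"}},
    "site_c": {"price": {"$12"}, "author": {"Lee"}, "isbn": {"123"}},
}

def values(attr):
    """Union of all values observed for an attribute across sites."""
    out = set()
    for attrs in SITES.values():
        out |= attrs.get(attr, set())
    return out

def name_similarity(a, b):
    """Similarity feature (assumed): string similarity of attribute names."""
    return SequenceMatcher(None, a, b).ratio()

def value_overlap(a, b):
    """Similarity feature (assumed): Jaccard overlap of observed values."""
    va, vb = values(a), values(b)
    union = va | vb
    return len(va & vb) / len(union) if union else 0.0

def co_occurrence(a, b):
    """Distribution feature (assumed): share of sites listing both attributes."""
    with_a = {s for s, attrs in SITES.items() if a in attrs}
    with_b = {s for s, attrs in SITES.items() if b in attrs}
    either = with_a | with_b
    return len(with_a & with_b) / len(either) if either else 0.0

def features(a, b):
    return [name_similarity(a, b), value_overlap(a, b), co_occurrence(a, b)]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(pairs, labels, epochs=2000, lr=0.5):
    """Plain stochastic-gradient logistic regression over the feature vectors."""
    data = [(features(a, b), y) for (a, b), y in zip(pairs, labels)]
    w, bias = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + bias)
            g = p - y  # gradient of log loss w.r.t. the linear score
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            bias -= lr * g
    return w, bias

def predict(model, a, b):
    """Estimated probability that attributes a and b are synonyms."""
    w, bias = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(a, b))) + bias)

# Hypothetical labeled pairs: 1 = synonyms, 0 = distinct concepts.
PAIRS = [("price", "cost"), ("author", "writer"), ("price", "author"),
         ("cost", "writer"), ("author", "isbn"), ("price", "isbn")]
LABELS = [1, 1, 0, 0, 0, 0]
```

For example, `predict(train(PAIRS, LABELS), "price", "cost")` should exceed 0.5 on this toy data. Note that name similarity alone does not separate the pairs here ("price" and "cost" share almost no characters); the value and co-occurrence features do, which illustrates why combining feature kinds can help.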