Combining Similarity and Distribution Features to Match Attributes

Authors:
Yu Wang;Binxing Fang;Yan Guo
Venue:
WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 03
Year:
2009

Abstract

The Web contains much useful semistructured information that can be organized into web objects, many of which are commercially valuable. The inner structures of these web objects are highly heterogeneous: web objects from different web sites cover different subsets of useful attributes. The complete set of attributes can be mined from web pages through attribute extraction algorithms. However, to construct a high-quality web object schema, some mined attributes should be merged, since they are synonyms for the same concept. Our empirical study shows that the features used by traditional schema matching and deep web integration methods are usually domain specific, so they are not applicable to matching attributes extracted from the Web. To overcome this problem, this paper proposes new features that describe attribute distribution characteristics and uses machine learning techniques to combine these distribution characteristics with attribute similarity features. We empirically compare the proposed method with existing methods that use other features, and the results show the effectiveness of our method.
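
The abstract does not say which similarity or distribution features the paper uses, nor which learner combines them. As a rough illustration only, the sketch below assumes one name-similarity feature (a character-level ratio from Python's difflib) and one distribution feature (how often two attributes appear on the same site, on the assumption that synonyms rarely co-occur). The function names, the toy SITES data, and both feature definitions are hypothetical and are not taken from the paper.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Set


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity of two attribute names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cooccurrence_rate(a: str, b: str, site_attrs: Dict[str, Set[str]]) -> float:
    """Jaccard overlap of the sets of sites on which each attribute appears.

    Assumed stand-in for a distribution feature: a low value means the two
    attributes are seldom used together on the same site.
    """
    sites_a = {s for s, attrs in site_attrs.items() if a in attrs}
    sites_b = {s for s, attrs in site_attrs.items() if b in attrs}
    union = sites_a | sites_b
    return len(sites_a & sites_b) / len(union) if union else 0.0


def pair_features(a: str, b: str, site_attrs: Dict[str, Set[str]]) -> List[float]:
    """Combine both kinds of feature into one vector for an attribute pair."""
    return [name_similarity(a, b), cooccurrence_rate(a, b, site_attrs)]


# Hypothetical toy data: attributes extracted from three product sites.
SITES: Dict[str, Set[str]] = {
    "site1": {"price", "brand", "weight"},
    "site2": {"cost", "brand", "weight"},
    "site3": {"price", "manufacturer"},
}

if __name__ == "__main__":
    for pair in [("price", "cost"), ("price", "brand")]:
        print(pair, pair_features(pair[0], pair[1], SITES))
```

In a setup like this, vectors for labeled attribute pairs (synonym or not) would be passed to a trained classifier, which would learn how much weight to give each feature. The paper's actual features and learner may differ.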