Statistical Schema Integration across the Deep Web
The Web contains much useful semistructured information that can be organized into web objects, many of which are commercially valuable. The inner structures of these web objects are so heterogeneous that web objects from different web sites cover different subsets of the useful attributes. The complete set of attributes can be mined from web pages with attribute extraction algorithms. However, to construct a high-quality web object schema, some mined attributes must be merged because they are synonyms for the same concept. Our empirical study shows that the features used by traditional schema matching and deep web integration methods are usually domain specific, so they are not applicable to matching attributes extracted from the Web. To overcome this problem, this paper proposes new features that describe attribute distribution characteristics and uses machine learning techniques to combine them with attribute similarity features. We empirically compare the proposed method with existing methods that use other features, and the results show the effectiveness of our method.
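
The abstract does not spell out the proposed features, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes one possible distribution feature (how often two attributes co-occur in the same web object, on the intuition that synonyms rarely appear together), one possible similarity feature (a string ratio from Python's difflib), and a small logistic regression as the learner that combines them. The toy objects, attribute names, labels, and function names are all hypothetical.

```python
import math
from difflib import SequenceMatcher

# Hypothetical toy data: each web object is the set of attribute names it exposes.
objects = [
    {"price", "brand", "weight"},
    {"cost", "brand", "color"},
    {"price", "manufacturer", "color"},
    {"cost", "manufacturer", "weight"},
]

def name_similarity(a, b):
    """Similarity feature (assumed): case-insensitive string ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def cooccurrence_rate(a, b, objs):
    """Distribution feature (assumed): share of objects holding either
    attribute that hold both. Synonyms are expected to score low."""
    either = sum(1 for attrs in objs if a in attrs or b in attrs)
    both = sum(1 for attrs in objs if a in attrs and b in attrs)
    return both / either if either else 0.0

def features(a, b, objs):
    return [name_similarity(a, b), cooccurrence_rate(a, b, objs)]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=500):
    """Plain stochastic gradient descent on the logistic loss."""
    w = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(bias + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            bias -= lr * err
    return w, bias

def match_probability(a, b, objs, w, bias):
    x = features(a, b, objs)
    return sigmoid(bias + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical labels: 1 = synonyms that should be merged, 0 = distinct attributes.
labeled_pairs = [
    ("price", "cost", 1),
    ("brand", "manufacturer", 1),
    ("price", "brand", 0),
    ("cost", "color", 0),
    ("manufacturer", "weight", 0),
]
X = [features(a, b, objects) for a, b, _ in labeled_pairs]
y = [label for _, _, label in labeled_pairs]
w, bias = train_logistic(X, y)
```

In this toy setup the synonym pairs never co-occur while the distinct pairs do, so the learned weight on the co-occurrence feature comes out negative. A call such as match_probability("price", "cost", objects, w, bias) then scores a candidate pair. A realistic setting would need far more labeled pairs and richer features than these two.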