Proceedings of the 10th international conference on World Wide Web
Learning to match ontologies on the Semantic Web
The VLDB Journal — The International Journal on Very Large Data Bases
Integrating web directories by learning their structures
Proceedings of the 16th international conference on World Wide Web
We address the problem of integrating documents from a source catalog into a master catalog. Current approaches either treat it as a flat category integration problem, ignoring the useful hierarchy information in the catalogs, or handle it hierarchically but without a rigorous model. In contrast, our method is based on correctly identifying relationships among categories, namely Match, Disjoint, SubConcept, SuperConcept, and Overlap, which correspond to the relations between sets in set theory. Compared with the traditional Match/NotMatch relationship used in the literature, this set of relationships is more expressive. The relationships among categories are first learned probabilistically and then refined by considering the hierarchy context. Our preliminary experiments show that the method helps to identify category relationships correctly and thus increases the accuracy of document integration.
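The five relationships can be illustrated with exact set comparison. The following is only a minimal sketch: it assumes each category is represented as a set of document identifiers, and it reads SubConcept as "the source category is a proper subset of the master category" (the direction is an assumption). The names `Relation` and `classify` are hypothetical, and the sketch does not reproduce the probabilistic learning or the hierarchy-based refinement described above.

```python
from enum import Enum


class Relation(Enum):
    MATCH = "Match"
    DISJOINT = "Disjoint"
    SUB_CONCEPT = "SubConcept"
    SUPER_CONCEPT = "SuperConcept"
    OVERLAP = "Overlap"


def classify(source: set, master: set) -> Relation:
    """Classify the source category against the master category.

    Hypothetical helper: uses exact set comparison on document
    identifiers, not a learned probabilistic estimate.
    """
    if source == master:
        return Relation.MATCH
    if source.isdisjoint(master):
        return Relation.DISJOINT
    if source < master:  # proper subset (assumed direction for SubConcept)
        return Relation.SUB_CONCEPT
    if source > master:  # proper superset
        return Relation.SUPER_CONCEPT
    # Shares some documents, but neither set contains the other.
    return Relation.OVERLAP
```

With real catalogs the two categories rarely share the exact same documents, which is presumably why the relationships are estimated probabilistically rather than computed this way; the sketch only fixes the meaning of the five labels.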