Wikipedia has grown into a huge, multi-lingual source of encyclopedic knowledge. Apart from textual content, a large and ever-increasing number of articles feature so-called infoboxes, which provide factual information about the articles' subjects. Because the different language versions evolve independently, they provide different information on the same topics. Correspondences between infobox attributes in different language editions can be leveraged for several use cases, such as the automatic detection and resolution of inconsistencies in infobox data across language versions, or the automatic augmentation of infoboxes in one language with data from other language versions. We present an instance-based schema matching technique that exploits information overlap in infoboxes across different language editions. As a prerequisite, we present a graph-based approach that identifies articles in different languages representing the same real-world entity by using (and correcting) the interlanguage links in Wikipedia. To account for the untyped nature of infobox schemas, we present a robust similarity measure that can reliably quantify the similarity of strings with mixed types of data. A qualitative evaluation based on manually labeled attribute correspondences between infoboxes in four of the largest Wikipedia editions demonstrates the effectiveness of the proposed approach.
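The general idea of instance-based matching over linked articles can be illustrated with a minimal Python sketch. It is not the method evaluated in the paper: all function names are hypothetical, the entity grouping is a plain connected-components pass over interlanguage links (no link correction is attempted), the string similarity is a stand-in based on `difflib.SequenceMatcher` rather than the mixed-type measure described above, and the greedy one-to-one selection and its threshold are assumptions made only for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import product


def entity_clusters(links):
    """Group articles into connected components of the interlanguage-link graph.

    `links` is an iterable of (article, article) pairs; an article is any
    hashable value, e.g. a (language, title) tuple. Assumption: every
    component is treated as one entity, with no correction of bad links.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return list(clusters.values())


def value_similarity(a, b):
    """Stand-in string similarity in [0, 1]; NOT the paper's measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score_attribute_pairs(infobox_pairs):
    """Average value similarity of each attribute pair over all article pairs.

    `infobox_pairs` is a list of (infobox_a, infobox_b) dicts, one per pair of
    articles assumed to describe the same entity in two language editions.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for box_a, box_b in infobox_pairs:
        for (attr_a, val_a), (attr_b, val_b) in product(box_a.items(), box_b.items()):
            totals[(attr_a, attr_b)] += value_similarity(val_a, val_b)
            counts[(attr_a, attr_b)] += 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


def greedy_matches(scores, threshold=0.5):
    """Select one-to-one correspondences greedily by descending score.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    used_a, used_b, result = set(), set(), []
    for (attr_a, attr_b), score in sorted(scores.items(), key=lambda kv: -kv[1]):
        if score < threshold:
            break
        if attr_a not in used_a and attr_b not in used_b:
            result.append((attr_a, attr_b, score))
            used_a.add(attr_a)
            used_b.add(attr_b)
    return result


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    links = [(("en", "Berlin"), ("de", "Berlin")), (("de", "Berlin"), ("fr", "Berlin"))]
    print(entity_clusters(links))

    pairs = [({"population": "3,644,826", "area_km2": "891.7"},
              {"Einwohner": "3644826", "Fläche": "891,7"})]
    for attr_a, attr_b, score in greedy_matches(score_attribute_pairs(pairs)):
        print(attr_a, "<->", attr_b, round(score, 3))
```

In this sketch, attribute names are never compared directly: a correspondence is proposed only because the values of two attributes tend to agree across articles about the same entity. This is why an instance-based approach can work across languages without translating the attribute labels.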