Multilingual schema matching for Wikipedia infoboxes

Authors:
Thanh Nguyen;Viviane Moreira;Huong Nguyen;Hoa Nguyen;Juliana Freire
Affiliations:
University of Utah;UFRGS-Brazil;University of Utah;University of Utah;NYU Poly
Venue:
Proceedings of the VLDB Endowment
Year:
2011

Abstract

Recent research has used Wikipedia's multilingualism as a resource for cross-language information retrieval and machine translation, and has proposed techniques for enriching its cross-language structure. The availability of documents in multiple languages also opens up new opportunities for querying structured Wikipedia content, in particular for producing answers that straddle different languages. As a step towards supporting such queries, we propose a method for identifying mappings between attributes of infoboxes that come from pages in different languages. Our approach finds mappings in a completely automated fashion. Because it does not require training data, it is scalable: it can be used to find mappings between many language pairs, and it remains effective for under-represented languages that lack sufficient training samples. Another important benefit is that the approach does not depend on syntactic similarity between attribute names, so it can be applied to language pairs with distinct morphologies. We have performed an extensive experimental evaluation on a corpus of pages in Portuguese, Vietnamese, and English. The results show that our approach obtains high precision and recall and outperforms state-of-the-art techniques. We also present a case study demonstrating that the multilingual mappings we derive lead to substantial improvements in answer quality and coverage for structured queries over Wikipedia content.
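
The abstract does not spell out how mappings are found, so the following is only an illustrative sketch of the general idea that attributes can be aligned without comparing their names. It is not the paper's algorithm. It assumes (hypothetically) that infoboxes for the same entity have already been paired, for example through cross-language links, and that attribute values can be compared directly after simple normalization; in practice values would usually need translation or further normalization. The function names, the Jaccard score, and the 0.5 threshold are all assumptions made for the example.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def collect_values(infoboxes):
    """Map each attribute name to the set of (entity index, normalized value) pairs it takes."""
    values = {}
    for idx, box in enumerate(infoboxes):
        for attr, val in box.items():
            values.setdefault(attr, set()).add((idx, val.strip().lower()))
    return values


def match_attributes(src_boxes, tgt_boxes, threshold=0.5):
    """Pair each source attribute with the target attribute whose values overlap most.

    src_boxes[i] and tgt_boxes[i] are assumed to describe the same entity.
    Attribute names are never compared, only the values they hold.
    """
    src_vals = collect_values(src_boxes)
    tgt_vals = collect_values(tgt_boxes)
    mappings = {}
    for src_attr, s in src_vals.items():
        scored = [(jaccard(s, t), tgt_attr) for tgt_attr, t in tgt_vals.items()]
        if not scored:
            continue
        score, best = max(scored, key=lambda pair: pair[0])
        if score >= threshold:  # hypothetical cut-off, chosen for illustration
            mappings[src_attr] = best
    return mappings


# Toy, hand-written infoboxes (not data from the paper).
pt_infoboxes = [
    {"direção": "Bernardo Bertolucci", "lançamento": "1987", "duração": "160 min"},
    {"direção": "Fernando Meirelles", "lançamento": "2002"},
]
en_infoboxes = [
    {"directed by": "Bernardo Bertolucci", "release date": "1987", "running time": "160 min"},
    {"directed by": "Fernando Meirelles", "release date": "2002", "country": "Brazil"},
]

mappings = match_attributes(pt_infoboxes, en_infoboxes)
```

On this toy input, the Portuguese attributes are paired with English ones purely through shared values, and the unmatched English attribute `country` is left out because no Portuguese attribute shares its values.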