The rapid globalization of Wikipedia is generating a parallel, multilingual corpus of unprecedented scale. Pages on the same topic in many different languages emerge both through manual translation and through independent development. Unfortunately, these pages may appear at different times and vary in size, scope, and quality. Furthermore, differential growth rates make the conceptual mapping between articles in different languages both complex and dynamic. These disparities create the opportunity for a powerful form of information arbitrage: leveraging articles in one or more languages to improve the content in another. Analyzing four large language editions (English, Spanish, French, and German), we present Ziggurat, an automated system for aligning Wikipedia infoboxes, creating new infoboxes as necessary, filling in missing information, and detecting discrepancies between parallel pages. Our method uses self-supervised learning, and our experiments demonstrate its feasibility even in the absence of dictionaries.
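The core alignment idea can be illustrated with a minimal sketch. The code below is a hypothetical, simplified illustration (not Ziggurat's actual implementation): it proposes cross-language infobox attribute pairs whenever their values are nearly identical strings, exploiting the fact that dates, numbers, and proper nouns often survive translation unchanged. All function and attribute names are invented for the example.

```python
from difflib import SequenceMatcher


def value_similarity(a, b):
    """String overlap between two attribute values.

    Dates, numbers, and names tend to be language-independent,
    so raw character similarity is a usable dictionary-free signal.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def align_attributes(infobox_a, infobox_b, threshold=0.8):
    """Propose attribute pairs across languages whose values match closely.

    In a self-supervised setting, high-confidence pairs like these could
    serve as training examples for a richer alignment classifier.
    """
    matches = []
    for attr_a, val_a in infobox_a.items():
        for attr_b, val_b in infobox_b.items():
            if value_similarity(val_a, val_b) >= threshold:
                matches.append((attr_a, attr_b))
    return matches


# Toy parallel infoboxes (English / Spanish) with invented attribute names.
en = {"birth_date": "1928-08-06", "occupation": "artist"}
es = {"fecha_de_nacimiento": "1928-08-06", "ocupación": "artista"}
print(align_attributes(en, es))
# → [('birth_date', 'fecha_de_nacimiento'), ('occupation', 'ocupación')]
```

Note that even the cognate pair "artist"/"artista" matches on surface similarity alone; a real system would combine many such noisy signals and use the resulting high-confidence alignments to bootstrap further learning.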