The rapid globalization of Wikipedia is generating a parallel, multilingual corpus of unprecedented scale. Pages on the same topic in many different languages emerge both through manual translation and through independent development. Unfortunately, these pages may appear at different times and vary in size, scope, and quality. Furthermore, differential growth rates make the conceptual mapping between articles in different languages both complex and dynamic. These disparities create the opportunity for a powerful form of information arbitrage: leveraging articles in one or more languages to improve the content in another. Analyzing four large language editions (English, Spanish, French, and German), we present Ziggurat, an automated system for aligning Wikipedia infoboxes, creating new infoboxes as necessary, filling in missing information, and detecting discrepancies between parallel pages. Our method uses self-supervised learning, and our experiments demonstrate its feasibility even in the absence of dictionaries.
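The core alignment idea can be illustrated with a minimal sketch. The code below is a hypothetical, simplified illustration (not Ziggurat's actual implementation): it proposes cross-language infobox attribute pairs whenever their values are nearly identical strings, exploiting the fact that dates, numbers, and proper nouns often survive translation unchanged. All function and attribute names are invented for the example.

```python
from difflib import SequenceMatcher


def value_similarity(a, b):
    """String overlap between two attribute values.

    Dates, numbers, and names tend to be language-independent,
    so raw character similarity is a usable dictionary-free signal.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def align_attributes(infobox_a, infobox_b, threshold=0.8):
    """Propose attribute pairs across languages whose values match closely.

    In a self-supervised setting, high-confidence pairs like these could
    serve as training examples for a richer alignment classifier.
    """
    matches = []
    for attr_a, val_a in infobox_a.items():
        for attr_b, val_b in infobox_b.items():
            if value_similarity(val_a, val_b) >= threshold:
                matches.append((attr_a, attr_b))
    return matches


# Toy parallel infoboxes (English / Spanish) with invented attribute names.
en = {"birth_date": "1928-08-06", "occupation": "artist"}
es = {"fecha_de_nacimiento": "1928-08-06", "ocupación": "artista"}
print(align_attributes(en, es))
# → [('birth_date', 'fecha_de_nacimiento'), ('occupation', 'ocupación')]
```

Note that even the cognate pair "artist"/"artista" matches on surface similarity alone; a real system would combine many such noisy signals and use the resulting high-confidence alignments to bootstrap further learning.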