Managing information disparity in multilingual document collections

Authors: Kevin Duh; Ching-Man Au Yeung; Tomoharu Iwata; Masaaki Nagata
Affiliations: NTT Communication Science Laboratories, Japan; NTT Communication Science Laboratories, Huawei, Hong Kong; NTT Communication Science Laboratories, Japan; NTT Communication Science Laboratories, Japan
Venue: ACM Transactions on Speech and Language Processing (TSLP)
Year: 2013



Abstract

Information disparity is a major challenge with multilingual document collections. When documents are dynamically updated in a distributed fashion, information content among different language editions may gradually diverge. We propose a framework for assisting human editors to manage this information disparity, using tools from machine translation and machine learning. Given source and target documents in two different languages, our system automatically identifies information nuggets that are new with respect to the target and suggests positions to place their translations. We perform both real-world experiments and large-scale simulations on Wikipedia documents and conclude that our system is effective in a variety of scenarios.
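The abstract names two subtasks: deciding which source-side content is new relative to the target, and suggesting where its translation should go. The sketch below is a hypothetical, much-simplified illustration of those two steps, not the authors' method. It assumes the source sentences have already been machine-translated into the target language, substitutes bag-of-words cosine similarity with an arbitrary threshold for whatever features and models the paper actually uses, and places each new sentence after the target sentence matched by its nearest preceding source sentence. The names `bag_of_words`, `cosine`, `find_new_nuggets`, and the value `threshold=0.5` are illustrative assumptions.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased word counts; a crude stand-in for richer features (assumption)."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_new_nuggets(translated_source, target, threshold=0.5):
    """Hypothetical novelty detection plus placement.

    Returns a list of (source_index, insert_after_target_index) pairs, one per
    source sentence whose best similarity to any target sentence is below
    `threshold`. An insert position of -1 means "insert at the start".
    """
    target_vecs = [bag_of_words(t) for t in target]
    suggestions = []
    last_match = -1  # target index matched by the most recent non-new source sentence
    for i, sentence in enumerate(translated_source):
        vec = bag_of_words(sentence)
        scores = [cosine(vec, t) for t in target_vecs]
        # Index of the most similar target sentence, or -1 if the target is empty.
        best = max(range(len(scores)), key=scores.__getitem__, default=-1)
        if best >= 0 and scores[best] >= threshold:
            last_match = best  # covered by the target; anchors later insertions
        else:
            suggestions.append((i, last_match))  # new: place after last anchor
    return suggestions
```

For example, with the (already translated) source `["Kyoto is a city in Japan.", "It was the imperial capital for over a thousand years.", "The city has many temples."]` and the target `["Kyoto is a city in Japan.", "The city has many famous temples."]`, the call returns `[(1, 0)]`: the sentence about the imperial capital has no close match in the target and would be suggested for insertion after target sentence 0. A purely lexical similarity like this misses paraphrases and translation noise, which is one reason a practical system would rely on stronger, machine-translation-aware signals.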