In this paper, we present wikiBABEL, a collaborative framework for the efficient and effective creation of multilingual content by a community of users. The wikiBABEL framework leverages fairly stable content in a source language (typically English) and a reasonable, though not necessarily perfect, machine translation system between the source language and a given target language to create rough initial content in the target language, which is then published on a collaborative platform. The platform provides an intuitive user interface and a set of linguistic tools with which a community of users collaboratively corrects the rough content, yielding clean content in the target language. We describe the architectural components that implement the wikiBABEL framework, namely the systems for source- and target-language content management, the mechanisms for coordination and collaboration, and the intuitive user interface for multilingual editing and review. Importantly, we discuss the integrated linguistic resources and tools, such as bilingual dictionaries and machine translation and transliteration systems, that help users during content correction and creation. In addition, we analyze and present the prime factors (user-interface features, or linguistic tools and resources) that significantly influence the user experience in multilingual content creation. Beyond the creation of multilingual content, another significant motivation for the wikiBABEL framework is the creation of parallel corpora as a by-product. Parallel corpora are very valuable resources for both Statistical Machine Translation (SMT) and Cross-Lingual Information Retrieval (CLIR) research, and may be mined effectively from multilingual data with significant content overlap, such as the data created in the wikiBABEL framework.
Creating parallel corpora with professional translators is very expensive, and hence SMT and CLIR research has been largely confined to a handful of languages. Our attempt to engage the large and diverse population of Internet users may aid the economical creation of such linguistic resources, and may make computational linguistics research possible and practical in many languages of the world.
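
As a rough illustration of the workflow described above (source content, rough machine-translated draft, community correction, parallel corpus as a by-product), the following minimal Python sketch may help. It is not the wikiBABEL implementation: the `Segment` class, the `toy_translate` placeholder, and the function names are all hypothetical and assumed only for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Segment:
    """One source sentence, its rough machine draft, and an optional user correction."""
    source: str
    machine_draft: str
    corrected: Optional[str] = None  # set once a community member has fixed the draft


def toy_translate(sentence: str) -> str:
    """Hypothetical stand-in for an imperfect machine translation system."""
    return f"<draft of: {sentence}>"


def create_rough_content(
    source_sentences: List[str],
    translate: Callable[[str], str] = toy_translate,
) -> List[Segment]:
    """Produce the rough initial target-language content from stable source content."""
    return [Segment(source=s, machine_draft=translate(s)) for s in source_sentences]


def harvest_parallel_corpus(segments: List[Segment]) -> List[Tuple[str, str]]:
    """Collect (source, corrected target) pairs; uncorrected drafts are skipped."""
    return [(seg.source, seg.corrected) for seg in segments if seg.corrected]


if __name__ == "__main__":
    segments = create_rough_content(["Hello world.", "Good morning."])
    segments[0].corrected = "Hola mundo."  # a user corrects the first draft
    print(harvest_parallel_corpus(segments))
```

In this sketch only user-corrected segments enter the corpus, which reflects the assumption that raw machine drafts are too noisy to serve as parallel data on their own.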