We present a grand challenge to build a corpus that will include all of the world's languages, in a consistent structure that permits large-scale cross-linguistic processing, enabling the study of universal linguistics. The focal data types, bilingual texts and lexicons, relate each language to one of a set of reference languages. We propose that the ability to train systems to translate into and out of a given language be the yardstick for determining when we have successfully captured a language. We call on the computational linguistics community to begin work on this Universal Corpus, pursuing the many strands of activity described here, as their contribution to the global effort to document the world's linguistic heritage before more languages fall silent.