CoNLL-X shared task on multilingual dependency parsing
CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
The human language project: building a Universal Corpus of the world's languages
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Hi-index | 0.00 |
We describe the design of a comparable corpus that spans all of the world's languages and facilitates large-scale cross-linguistic processing. This Universal Corpus consists of text collections aligned at the document and sentence level, multilingual wordlists, and a small set of morphological, lexical, and syntactic annotations. The design encompasses submission, storage, and access. Submission preserves the integrity of the work, allows asynchronous updates, and facilitates scholarly citation. Storage employs a cloud-hosted filestore containing normalized source data together with a database of texts and annotations. Access is permitted to the filestore, the database, and an application programming interface. All aspects of the Universal Corpus are open, and we invite community participation in its design and implementation, and in supplying and using its data.