The languages most commonly subjected to large-scale linguistic annotation tend to be those with the largest speaker populations or with recent histories of linguistic scholarship. In this paper we discuss the problems that lower-density languages pose for the development of linguistically annotated resources. We frame our work around three key questions: how lower-density languages should be defined, how the resources available for them can be increased, and how data requirements can be reduced. We identify a number of steps toward increasing the number of lower-density language corpora with linguistic annotations.