Frontiers in linguistic annotation for lower-density languages

Authors:
Mike Maxwell;Baden Hughes
Affiliations:
University of Maryland;The University of Melbourne
Venue:
LAC '06 Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006
Year:
2006

Citing 9
Cited 4

Mining the web to create minority language corpora

Proceedings of the tenth international conference on Information and knowledge management
The Web as a parallel corpus

Computational Linguistics - Special issue on web as corpus
The surprise language exercises

ACM Transactions on Asian Language Information Processing (TALIP)
Linguistic resource creation for research and technology development: A recent experiment

ACM Transactions on Asian Language Information Processing (TALIP)
Experiments with a Hindi-to-English transfer-based MT system under a miserly data scenario

ACM Transactions on Asian Language Information Processing (TALIP)
MT for Minority Languages Using Elicitation-Based Learning of Syntactic Transfer Rules

Machine Translation
Universal grammar and lexis for quick ramp-up of MT systems

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Improving Machine Translation Performance by Exploiting Non-Parallel Corpora

Computational Linguistics
The Proposition Bank: An Annotated Corpus of Semantic Roles

Computational Linguistics

Implementing NLP projects for noncentral languages: instructions for funding bodies, strategies for developers

Machine Translation
The human language project: building a Universal Corpus of the world's languages

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
A scalable method for preserving oral literature from small languages

ICADL'10 Proceedings of the role of digital libraries in a time of global change, and 12th international conference on Asia-Pacific digital libraries
A smartphone-based ASR data collection tool for under-resourced languages

Speech Communication

Abstract

The languages that are most commonly subject to linguistic annotation on a large scale tend to be those with the largest populations or with recent histories of linguistic scholarship. In this paper we discuss the problems associated with lower-density languages in the context of the development of linguistically annotated resources. We frame our work with three key questions regarding the definition of lower-density languages, increasing available resources, and reducing data requirements. A number of steps forward are identified for increasing the number of lower-density language corpora with linguistic annotations.