Linguistic resource creation for research and technology development: A recent experiment

Authors:
Stephanie Strassel;Mike Maxwell;Christopher Cieri
Affiliations:
University of Pennsylvania Linguistic Data Consortium, Philadelphia PA;University of Pennsylvania Linguistic Data Consortium, Philadelphia PA;University of Pennsylvania Linguistic Data Consortium, Philadelphia PA
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2003

Abstract

Advances in statistical machine learning encourage language-independent approaches to linguistic technology development. Experiments in "porting" technologies to handle new natural languages have revealed great potential for multilingual computing, but also a frustrating lack of linguistic resources for most languages. Recent efforts to address the lack of available resources have focused either on intensive resource development for a small number of languages or on development of technologies for rapid porting. The Linguistic Data Consortium recently participated in an experiment falling primarily under the latter approach, the surprise language exercise. This article describes linguistic resource creation within this context, including the overall methodology for surveying and collecting language resources, as well as details of the resources developed during the exercise. The article concludes with a discussion of a new approach to solving the problem of limited linguistic resources, one that has recently proven effective in identifying core linguistic resources for less commonly studied languages.