Automatic annotation of bibliographical references for descriptive language materials
CLEF'11: Proceedings of the Second International Conference on Multilingual and Multimodal Information Access Evaluation
In a large-scale project to list bibliographical references for all of the approximately 7,000 languages of the world, the need arises to automatically annotate the bibliographical entries with ISO 639-3 language identifiers. The task can be seen as a special case of a more general information extraction problem: classifying short text snippets written in various languages into a large number of classes. We explore supervised and unsupervised approaches, motivated by the distributional characteristics of this specific domain and by the availability of data sets. In all cases, we make use of a database of language names and their identifiers. The proposed methods are rigorously evaluated on a new, representative data set.
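To make the role of the name database concrete, here is a minimal sketch of a naive dictionary-lookup baseline. It is not the authors' method. It assumes a tiny, hypothetical name table (`LANGUAGE_NAMES`) standing in for the full database of language names and identifiers, and a hypothetical function `annotate` that returns every ISO 639-3 code whose language name occurs as a whole word in a bibliographical entry.

```python
import re
from typing import Dict, List

# Hypothetical miniature name table (name -> ISO 639-3 code).
# A real system would load the full database of language names and identifiers.
LANGUAGE_NAMES: Dict[str, str] = {
    "english": "eng",
    "german": "deu",
    "french": "fra",
    "yoruba": "yor",
    "hausa": "hau",
}


def annotate(entry: str, names: Dict[str, str] = LANGUAGE_NAMES) -> List[str]:
    """Return the sorted ISO 639-3 codes whose language names occur in `entry`."""
    text = entry.lower()
    found = set()
    for name, code in names.items():
        # Whole-word match only, so "Englishness" does not count as "English".
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found.add(code)
    return sorted(found)


print(annotate("A grammar of Yoruba, with notes on Hausa loanwords"))
# prints ['hau', 'yor']
```

A lookup like this cannot tell the described language from the language the entry is written in. For example, "Grammaire du haoussa" names no language from the table, while an English-language title about Hausa would also return `eng` if it said "in English". This gap is one reason why the task benefits from the supervised and distribution-aware approaches mentioned above.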