The Europe Media Monitor (EMM) family of applications is a set of multilingual tools that currently gather, cluster and classify news in fifty languages and extract named entities and quotations (reported speech) in twenty languages. In this paper, we describe the recent effort of adding the African Bantu language Swahili to EMM. EMM is designed in an entirely modular way, so that a new language can be plugged in by providing the language-specific resources for that language. We therefore describe the types of language-specific resources needed, the effort involved, and ways of bootstrapping the generation of these resources in order to keep the effort of adding a new language to a minimum. The text analysis applications pursued in our efforts include clustering, classification, recognition and disambiguation of named entities (persons, organisations and locations), recognition and normalisation of date expressions, and the identification of reported speech quotations by and about people.
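The modular design described above can be illustrated with a small sketch. It is not EMM's implementation: the `LanguageResources` class, the `RESOURCES` table and the `normalise_date` function are hypothetical names introduced here, and the month lexicons are tiny illustrative samples. The sketch assumes that the language-independent logic (a date pattern and the normalisation step) is shared, so that adding Swahili only means supplying a new resource entry.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class LanguageResources:
    """Hypothetical container for the language-specific part of the pipeline."""
    code: str
    month_names: Dict[str, int]  # lower-cased month name -> month number


# Illustrative sample lexicons only; these are not EMM resource files.
RESOURCES = {
    "en": LanguageResources("en", {"january": 1, "march": 3, "december": 12}),
    "sw": LanguageResources("sw", {"januari": 1, "machi": 3, "desemba": 12}),
}

# Language-independent pattern: day number, month word, four-digit year.
DATE_PATTERN = re.compile(r"\b(\d{1,2})\s+([^\W\d_]+)\s+(\d{4})\b")


def normalise_date(text: str, lang: str) -> Optional[str]:
    """Return the first recognised date in `text` as YYYY-MM-DD, or None."""
    resources = RESOURCES[lang]
    for day, month_word, year in DATE_PATTERN.findall(text):
        # The only language-specific step: look the month word up in the lexicon.
        month = resources.month_names.get(month_word.lower())
        if month is not None and 1 <= int(day) <= 31:
            return f"{int(year):04d}-{month:02d}-{int(day):02d}"
    return None


if __name__ == "__main__":
    print(normalise_date("Tarehe 9 Desemba 2011", "sw"))
    print(normalise_date("Published on 3 March 2010", "en"))
```

In this sketch the function body never changes when a language is added; only `RESOURCES` grows, which mirrors the idea of keeping the per-language effort to a minimum.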