Multilingual text processing is useful because the information found in different languages is complementary, with regard to both facts and opinions. While Information Extraction and other text mining software can, in principle, be developed for many languages, most text analysis tools have only been applied to small sets of languages because the development effort per language is large. Self-training tools alleviate the problem, but even the effort of providing training data and of manually tuning the results is usually considerable. In this paper, we gather insights from various multilingual system developers on how to minimise the effort of developing natural language processing applications for many languages. We also explain the main guidelines underlying our own effort to develop complex text mining software for tens of languages. While these guidelines (above all, extreme simplicity) can be very restrictive and limiting, we believe we have shown the feasibility of the approach through the development of the Europe Media Monitor (EMM) family of applications (http://emm.newsbrief.eu/overview.html). EMM is a set of complex media monitoring tools that process and analyse up to 100,000 online news articles per day in between twenty and fifty languages. We will also touch upon the kind of language resources that would make it easier for all to develop highly multilingual text mining applications. We will argue that, to achieve this, the most needed resources would be freely available, simple, parallel and uniform multilingual dictionaries, corpora and software tools.