Developing multilingual text mining workflows in UIMA and U-Compare

Authors:
Georgios Kontonasios;Ioannis Korkontzelos;Sophia Ananiadou
Affiliations:
National Centre for Text Mining, School of Computer Science, The University of Manchester, UK;National Centre for Text Mining, School of Computer Science, The University of Manchester, UK;National Centre for Text Mining, School of Computer Science, The University of Manchester, UK
Venue:
NLDB'12 Proceedings of the 17th international conference on Applications of Natural Language Processing and Information Systems
Year:
2012

Abstract

We present a generic, language-independent method for constructing multilingual text mining workflows. The proposed mechanism is implemented as an extension of U-Compare, a platform built on top of the Unstructured Information Management Architecture (UIMA) that allows the construction, comparison and evaluation of interoperable text mining workflows. UIMA previously supported strictly monolingual workflows. Building multilingual workflows poses challenging problems, such as representing multilingual document collections and executing language-dependent components in parallel. As an application of our method, we develop a multilingual workflow that extracts terms from a parallel collection using a new heuristic. For our experiments, we construct a parallel French-English corpus of approximately 188,000 PubMed article titles. Our application is evaluated against C-value, a popular monolingual term extraction method.
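
The C-value baseline is named in the abstract but not defined on this page. As a rough illustration only, the sketch below scores candidate terms with a simplified form of the commonly cited C-value formula: a term's frequency is reduced by the average frequency of the longer candidates that contain it, then weighted by the base-2 logarithm of its length in words. The function names and the toy frequencies are invented for this example, and nothing here is taken from the paper's own implementation or settings.

```python
import math


def contains(longer, shorter):
    """True if the word tuple `shorter` occurs contiguously inside `longer`."""
    n, m = len(longer), len(shorter)
    return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))


def c_value(freqs):
    """Map each candidate term (a tuple of words) to a simplified C-value score."""
    scores = {}
    for term, freq in freqs.items():
        # Frequencies of the longer candidates in which this term is nested.
        nesting = [f for other, f in freqs.items() if contains(other, term)]
        # Non-nested terms keep their raw frequency; nested terms are
        # discounted by the mean frequency of the terms that contain them.
        adjusted = freq - sum(nesting) / len(nesting) if nesting else freq
        # Note: with this weighting, single-word candidates always score 0.
        scores[term] = math.log2(len(term)) * adjusted
    return scores


# Hypothetical candidate frequencies, for illustration only.
candidates = {
    ("soft", "contact", "lens"): 5,
    ("contact", "lens"): 8,
    ("hard", "contact", "lens"): 1,
}
scores = c_value(candidates)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 3))
```

In this toy input, "contact lens" occurs 8 times but is nested in two longer candidates with a mean frequency of 3, so its adjusted frequency drops to 5 before the length weight is applied.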