Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
Wrapping the Naive Bayes classifier to relax the effect of dependences
IDEAL'07 Proceedings of the 8th international conference on Intelligent data engineering and automated learning
Hi-index | 0.00 |
The use of domain ontologies is becoming increasingly popular in Medical Natural Language Processing Systems. A wide variety of knowledge bases in multiple languages has been integrated into the Unified Medical Language System (UMLS) to create a huge knowledge source that can be accessed with diverse lexical tools. MetaMap (and its java version MMTx) is a tool that allows extracting medical concepts from free text, but currently there not exists a Spanish version. Our ongoing research is centered on the application of biomedical concepts to cross-lingual text classification, what makes it necessary to have a Spanish MMTx available. We have combined automatic translation techniques with biomedical ontologies and the existing English MMTx to produce a Spanish version of MMTx. We have evaluated different approaches and applied several types of evaluation according to different concept representations for text classification. Our results prove that the use of existing translation tools such as Google Translate produce translations with a high similarity to original texts in terms of extracted concepts.