We report on the development of a cross-language information retrieval (CLIR) system that translates user queries by categorizing them into terms listed in a controlled vocabulary. Unlike usual automatic text categorization systems, which rely on data-intensive models induced from large training data, our categorization tool applies data-independent classifiers: a vector-space engine and a pattern matcher are combined to improve the ranking of Medical Subject Headings (MeSH). The categorizer also benefits from the availability of large thesauri, in which variants of MeSH terms can be found. For evaluation, we use OHSUMED, an English collection of MedLine records. French OHSUMED queries (translated from the original English queries by domain experts) are mapped into French MeSH terms; the MeSH controlled vocabulary then serves as an interlingua to translate the French MeSH terms into English MeSH terms, which are finally used to query the OHSUMED document collection. The first part of the study focuses on the text-to-MeSH categorization task, using a set of MedLine abstracts as input documents to tune the categorization system. The second part compares a machine-translation-based CLIR system with the categorization-based system: the former achieves a CLIR ratio (cross-language effectiveness relative to monolingual retrieval) close to 60%, while the latter achieves a ratio above 80%. A final experiment combining both approaches achieves a ratio above 90%.
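The translation-by-categorization pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system. The three-entry vocabulary and its descriptor IDs (D1 to D3) are hypothetical placeholders. A plain term-frequency cosine stands in for the vector-space engine, and an exact-phrase matcher stands in for the pattern matcher. The fusion weight `alpha`, the cut-off `k` and all helper names are also hypothetical. The shared descriptor ID plays the role of the MeSH interlingua, linking a French label to its English counterpart.

```python
import math
import re
from collections import Counter

# Hypothetical toy vocabulary: descriptor ID -> French label, French thesaurus
# variants, English label. Real MeSH descriptors carry their own unique IDs.
VOCAB = {
    "D1": {"fr": "diabète sucré", "fr_variants": ["diabète"], "en": "Diabetes Mellitus"},
    "D2": {"fr": "hypertension", "fr_variants": ["hypertension artérielle"], "en": "Hypertension"},
    "D3": {"fr": "insuffisance cardiaque", "fr_variants": [], "en": "Heart Failure"},
}


def tokenize(text):
    # Lowercase and split on non-word characters (\W is Unicode-aware for str).
    return [t for t in re.split(r"\W+", text.lower()) if t]


def cosine(a, b):
    # Cosine similarity between two term-frequency vectors.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def labels(entry):
    # Preferred label plus its thesaurus variants.
    return [entry["fr"]] + entry["fr_variants"]


def vector_space_score(query_toks, entry):
    # Best cosine over all labels of the descriptor (stand-in vector-space engine).
    return max(cosine(query_toks, tokenize(lab)) for lab in labels(entry))


def contains_phrase(seq, phrase):
    # True if `phrase` occurs as a contiguous token run inside `seq`.
    n = len(phrase)
    return any(seq[i:i + n] == phrase for i in range(len(seq) - n + 1))


def pattern_score(query_toks, entry):
    # Stand-in pattern matcher: 1.0 on a verbatim label match, else 0.0.
    hit = any(contains_phrase(query_toks, tokenize(lab)) for lab in labels(entry))
    return 1.0 if hit else 0.0


def categorize(query, alpha=0.5, k=2):
    # Rank descriptors by a linear fusion of both scores; keep the top k.
    toks = tokenize(query)
    scored = []
    for did, entry in VOCAB.items():
        s = alpha * vector_space_score(toks, entry) + (1 - alpha) * pattern_score(toks, entry)
        if s > 0:
            scored.append((s, did))
    scored.sort(key=lambda x: (-x[0], x[1]))  # ties broken by descriptor ID
    return [did for _, did in scored[:k]]


def translate(query_fr, k=2):
    # Interlingua step: French query -> descriptor IDs -> English labels.
    return [VOCAB[did]["en"] for did in categorize(query_fr, k=k)]


def clir_ratio(cross_language_score, monolingual_score):
    # Cross-language effectiveness as a percentage of the monolingual baseline.
    return 100.0 * cross_language_score / monolingual_score


if __name__ == "__main__":
    print(translate("diabète sucré et hypertension artérielle chez l'adulte"))
```

For the sample query, D1 and D2 both match a label verbatim and tie on the fused score, so both English labels are returned. D3 shares no tokens with the query and is dropped. A linear fusion is only one simple way to combine the two classifiers. The weight would need tuning on held-out data, in the spirit of the MedLine-based tuning step mentioned in the abstract.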