Corpus microsurgery: criteria optimization for medical cross-language IR
Proceedings of the 17th ACM conference on Information and knowledge management
An under-explored question in cross-language information retrieval (CLIR) is how strongly the performance of CLIR methods depends on the availability of high-quality translation resources for particular domains. To address this question, we evaluate several competitive CLIR methods, with different training corpora, on test documents in the medical domain. Our results show severe performance degradation when using a general-purpose training corpus or a commercial machine translation system (SYSTRAN) instead of a domain-specific training corpus. A related unexplored question is whether CLIR performance can be improved by systematically analyzing training resources and optimally matching them to target collections. We begin exploring this problem by proposing a simple criterion for automatically matching training resources to target corpora. Using the cosine similarity between training and target corpora as resource weights, we obtained an average improvement of 5.6% over using all resources unweighted. The same metric yields 99.4% of the performance obtained when an oracle chooses the optimal resource every time.
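The weighting criterion can be illustrated with a minimal sketch. The abstract does not say how corpora are represented or how the similarities are turned into weights, so everything below is an assumption made for illustration: each corpus is reduced to a bag-of-words term-count vector with whitespace tokenization, and the cosine similarities are normalized to sum to one. The function names (term_vector, cosine_similarity, resource_weights) are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def term_vector(documents):
    """Collapse a corpus into one bag-of-words count vector.

    Assumption: lowercased whitespace tokens; the paper may use
    a different representation (e.g. stemming or tf-idf).
    """
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def resource_weights(training_corpora, target_docs):
    """Weight each training resource by its similarity to the target corpus.

    Assumption: weights are the similarities normalized to sum to 1;
    if no resource overlaps the target, fall back to uniform weights.
    """
    if not training_corpora:
        return {}
    target_vec = term_vector(target_docs)
    sims = {
        name: cosine_similarity(term_vector(docs), target_vec)
        for name, docs in training_corpora.items()
    }
    total = sum(sims.values())
    if total == 0:
        return {name: 1.0 / len(sims) for name in sims}
    return {name: s / total for name, s in sims.items()}
```

Under these assumptions, a training resource whose vocabulary overlaps heavily with the target collection receives a larger weight than a general-purpose one, and a resource with no shared vocabulary receives weight zero. The oracle baseline mentioned in the abstract would instead pick the single best-performing resource per query, which requires relevance judgments that this similarity-based criterion does not need.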