Biomedical text retrieval in languages with a complex morphology

Authors:
Stefan Schulz;Martin Honeck;Udo Hahn
Affiliations:
Freiburg University Hospital;Freiburg University Hospital;Freiburg University
Venue:
BioMed '02 Proceedings of the ACL-02 workshop on Natural language processing in the biomedical domain - Volume 3
Year:
2002

Abstract

Document retrieval in languages with a rich and complex morphology, particularly in terms of derivation and (single-word) composition, suffers from serious performance degradation with the stemming-only query-term-to-text-word matching paradigm. We propose an alternative approach in which morphologically complex word forms are segmented into relevant subwords (such as stems, named entities, acronyms), and subwords constitute the basic unit for indexing and retrieval. We evaluate our approach on a large biomedical document collection.
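
The idea of indexing subwords rather than stemmed word forms can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon, the longest-match segmentation with backtracking, the fallback to the whole word, and the function names segment, build_index, and search are all assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical toy subword lexicon; a real system would rely on a
# curated subword dictionary, which is not reproduced here.
SUBWORDS = {"gastro", "enter", "itis", "hepat"}


def segment(word, lexicon=SUBWORDS):
    """Split a word into lexicon subwords, preferring the longest prefix.

    Returns a list of subwords, or None if no complete segmentation exists.
    """
    word = word.lower()
    if not word:
        return []
    # Try the longest candidate prefix first, backtracking on failure.
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head in lexicon:
            rest = segment(word[end:], lexicon)
            if rest is not None:
                return [head] + rest
    return None


def units(text, lexicon=SUBWORDS):
    """Yield indexing units: subwords where possible, else the whole word."""
    for token in re.findall(r"[A-Za-z]+", text):
        parts = segment(token, lexicon)
        # Assumption: unsegmentable words are indexed unchanged.
        yield from (parts if parts is not None else [token.lower()])


def build_index(docs, lexicon=SUBWORDS):
    """Build an inverted index mapping each unit to the set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for unit in units(text, lexicon):
            index[unit].add(doc_id)
    return index


def search(query, index, lexicon=SUBWORDS):
    """Return ids of documents that contain every unit of the query."""
    query_units = list(units(query, lexicon))
    if not query_units:
        return set()
    result = set(index.get(query_units[0], set()))
    for unit in query_units[1:]:
        result &= index.get(unit, set())
    return result


docs = {
    1: "Acute gastroenteritis in children",
    2: "Chronic hepatitis therapy",
}
index = build_index(docs)
print(segment("Gastroenteritis"))
print(search("enteritis", index))
```

In this sketch the query "enteritis" retrieves the document containing "gastroenteritis" because both share the subwords "enter" and "itis", a match that suffix stripping alone would not produce since the query term is embedded in a single-word compound.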