Probabilistic term variant generator for biomedical terms

Authors:
Yoshimasa Tsuruoka; Jun'ichi Tsujii
Affiliations:
CREST, JST (Japan Science and Technology Corporation), Saitama, Japan and University of Tokyo, Tokyo, Japan; University of Tokyo, Tokyo, Japan and CREST, JST (Japan Science and Technology Corporation), Saitama, Japan
Venue:
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2003

Abstract

This paper presents an algorithm to generate possible variants for biomedical terms. The algorithm gives each variant its generation probability representing its plausibility, which is potentially useful for query and dictionary expansion. The probabilistic rules for generating variants are automatically learned from raw texts using an existing abbreviation extraction technique. Our method, therefore, requires no linguistic knowledge or labor-intensive natural language resources. We conducted an experiment using 83,142 MEDLINE abstracts for rule induction and 18,930 abstracts for testing. The results indicate that our method will significantly increase the number of retrieved documents for long biomedical terms.
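
The idea of attaching a generation probability to each variant can be illustrated with a minimal sketch. It is not the authors' implementation: the rewrite rules and their probabilities below are invented for illustration rather than learned from MEDLINE, and the sketch assumes that a variant's generation probability is the product of the probabilities of the rules applied to reach it. The names `RULES`, `apply_rules`, and `generate_variants` are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical rewrite rules: (source substring, replacement, probability).
# The values are made up for illustration; they are NOT learned from data.
RULES: List[Tuple[str, str, float]] = [
    (" ", "-", 0.3),
    ("-", " ", 0.3),
    ("alpha", "a", 0.1),
    ("1", "I", 0.2),
]


def apply_rules(term: str) -> List[Tuple[str, float]]:
    """Return every (variant, rule probability) reachable by one rule application."""
    results = []
    for src, dst, prob in RULES:
        start = term.find(src)
        while start != -1:
            variant = term[:start] + dst + term[start + len(src):]
            results.append((variant, prob))
            start = term.find(src, start + 1)
    return results


def generate_variants(term: str, threshold: float = 0.05) -> Dict[str, float]:
    """Best-first expansion that keeps each variant's highest probability.

    Assumption: a variant's probability is the product of the probabilities
    of the rules applied along the path from the original term.
    """
    best: Dict[str, float] = {term: 1.0}
    heap = [(-1.0, term)]  # max-heap via negated probabilities
    while heap:
        neg_p, current = heapq.heappop(heap)
        p = -neg_p
        if p < best.get(current, 0.0):
            continue  # stale heap entry; a better path was already found
        for variant, rule_p in apply_rules(current):
            new_p = p * rule_p
            # Prune variants below the threshold and paths that do not improve.
            if new_p >= threshold and new_p > best.get(variant, 0.0):
                best[variant] = new_p
                heapq.heappush(heap, (-new_p, variant))
    return best


if __name__ == "__main__":
    ranked = sorted(generate_variants("NF kappa B").items(), key=lambda kv: -kv[1])
    for variant, prob in ranked:
        print(f"{prob:.2f}  {variant}")
```

For query or dictionary expansion, the probability threshold would control the trade-off between recall (more variants) and noise (implausible variants); the value 0.05 here is arbitrary.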