Tokenization is a fundamental preprocessing step in information retrieval (IR) systems, in which text is turned into index terms. This paper quantifies and compares the influence of various simple tokenization techniques on document retrieval effectiveness in two domains: biomedicine and news. As expected, biomedical retrieval is more sensitive to small changes in the tokenization method. The tokenization strategy can make the difference between a mediocre and a well-performing IR system, especially in the biomedical domain.
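To make the sensitivity concrete, the sketch below applies three simple tokenizers to a short biomedical-style string. The three functions and the sample string are illustrative assumptions, not necessarily the techniques or data evaluated in the paper; they only show how small rule changes yield different index terms for names that mix letters, digits, and hyphens.

```python
import re


def tokenize_whitespace(text):
    """Lowercase and split on whitespace only; punctuation stays attached."""
    return text.lower().split()


def tokenize_alnum(text):
    """Lowercase and treat every non-alphanumeric character as a separator."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tokenize_alpha_digit_split(text):
    """Like tokenize_alnum, but also split at letter/digit boundaries."""
    return re.findall(r"[a-z]+|[0-9]+", text.lower())


# Hypothetical sample text, chosen only to exercise hyphens and digits.
sample = "MIP-1alpha and IL2 expression"

for tokenizer in (tokenize_whitespace, tokenize_alnum, tokenize_alpha_digit_split):
    print(f"{tokenizer.__name__}: {tokenizer(sample)}")
```

Under these assumed rules, the same gene-style name is indexed as one, two, or three terms, so a query written as "MIP 1 alpha" would match the document only under the third tokenizer. Ordinary words such as "expression" come out identical in all three, which is consistent with the abstract's observation that news text is less affected.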