In text mining, calculating precise keyword frequency distributions for a particular document collection requires mapping different keywords that denote the same entity to a single canonical form. In the life science domain, we can construct a large dictionary of canonical forms and their variants from information in external resources and use this dictionary for term aggregation. However, such an automatically generated dictionary contains many invalid entries, which distort the calculated keyword frequencies. In this paper, we propose and test methods to detect invalid entries in the dictionary.
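To make the aggregation step concrete, the following is a minimal sketch of dictionary-based term aggregation and of how a single invalid entry skews the resulting frequencies. It is an illustration under stated assumptions, not the method proposed in the paper: the dictionary entries, the function names `build_variant_index` and `aggregate_frequencies`, and the lowercase-only normalization are all hypothetical, and the sketch does not attempt to detect invalid entries.

```python
from collections import Counter

# Hypothetical dictionary: canonical form -> variants (illustrative entries only).
CANONICAL_TO_VARIANTS = {
    "tumor necrosis factor": ["tnf", "tnf-alpha", "tumour necrosis factor"],
    "interleukin 2": ["il-2", "il2"],
}


def build_variant_index(canonical_to_variants):
    """Invert the dictionary so each lowercased surface form maps to its canonical form."""
    index = {}
    for canonical, variants in canonical_to_variants.items():
        index[canonical.lower()] = canonical
        for variant in variants:
            index[variant.lower()] = canonical
    return index


def aggregate_frequencies(keywords, variant_index):
    """Count keywords after mapping known variants to their canonical form.

    Keywords missing from the index are counted under their lowercased form.
    """
    counts = Counter()
    for keyword in keywords:
        key = keyword.lower()
        counts[variant_index.get(key, key)] += 1
    return counts


keywords = ["TNF", "tnf-alpha", "IL-2", "tumour necrosis factor", "il2", "p53"]

index = build_variant_index(CANONICAL_TO_VARIANTS)
counts = aggregate_frequencies(keywords, index)

# A deliberately wrong, hypothetical entry: "p53" is not a variant of interleukin 2.
bad_index = dict(index)
bad_index["p53"] = "interleukin 2"
bad_counts = aggregate_frequencies(keywords, bad_index)

print(counts)
print(bad_counts)
```

With the clean index, the three surface forms of tumor necrosis factor collapse into one count and "p53" is counted separately. With the invalid entry, the "p53" occurrence is silently folded into the interleukin 2 count, which is the kind of distortion that motivates detecting invalid dictionary entries.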