Detecting invalid dictionary entries for biomedical text mining

  • Authors:
  • Hironori Takeuchi;Issei Yoshida;Yohei Ikawa;Kazuo Iida;Yoko Fukui

  • Affiliations:
  • IBM Research, Tokyo Research Laboratory, IBM Japan, Ltd., Shimotsuruma, Yamato-shi Kanagawa, Japan;IBM Research, Tokyo Research Laboratory, IBM Japan, Ltd., Shimotsuruma, Yamato-shi Kanagawa, Japan;IBM Research, Tokyo Research Laboratory, IBM Japan, Ltd., Shimotsuruma, Yamato-shi Kanagawa, Japan;Research Institute of Bio-system Informatics, Tohoku Chemical Co., Ltd., Japan;Research Institute of Bio-system Informatics, Tohoku Chemical Co., Ltd., Japan

  • Venue:
  • KDLL'06 Proceedings of the 2006 international conference on Knowledge Discovery in Life Science Literature
  • Year:
  • 2006

Abstract

In text mining, calculating precise keyword frequency distributions for a particular document collection requires mapping different keywords that denote the same entity to a single canonical form. In the life science domain, we can construct a large dictionary of canonical forms and their variants from information in external resources and use this dictionary for term aggregation. However, such an automatically generated dictionary contains many invalid entries that distort the calculated keyword frequencies. In this paper, we propose and test methods to detect invalid entries in the dictionary.
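The term aggregation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dictionary contents, the `aggregate` function, and the sample keywords are all hypothetical, chosen only to show how one questionable variant can inflate a canonical form's count. The abstract does not describe the detection methods themselves, so they are not sketched here.

```python
from collections import Counter

# Hypothetical variant -> canonical-form dictionary (illustrative entries only).
VARIANT_TO_CANONICAL = {
    "tp53": "TP53",
    "p53": "TP53",
    "tumor protein p53": "TP53",
    # Potentially invalid entry: "cat" is a gene symbol for catalase,
    # but it is also a common English word.
    "cat": "CAT",
}


def aggregate(keywords, dictionary):
    """Count keywords after mapping each one to its canonical form.

    Keywords missing from the dictionary are counted under their own surface form.
    """
    counts = Counter()
    for keyword in keywords:
        canonical = dictionary.get(keyword.lower(), keyword)
        counts[canonical] += 1
    return counts


# Hypothetical keywords extracted from a document collection.
keywords = ["p53", "TP53", "tumor protein p53", "cat", "cat", "catalase"]
counts = aggregate(keywords, VARIANT_TO_CANONICAL)
print(counts)
```

In this sketch the three surface forms of TP53 are merged into one count, which is the intended effect of aggregation. Every occurrence of "cat" is also counted as the gene CAT, whether or not the text was about the gene, which is the kind of distortion an invalid entry introduces.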