MaxMatcher: biological concept extraction using approximate dictionary lookup

  • Authors:
  • Xiaohua Zhou;Xiaodan Zhang;Xiaohua Hu

  • Affiliations:
  • College of Information Science & Technology, Drexel University, Philadelphia, PA;College of Information Science & Technology, Drexel University, Philadelphia, PA;College of Information Science & Technology, Drexel University, Philadelphia, PA

  • Venue:
  • PRICAI'06 Proceedings of the 9th Pacific Rim international conference on Artificial intelligence
  • Year:
  • 2006

Abstract

Dictionary-based biological concept extraction is still the state-of-the-art approach to large-scale biomedical literature annotation and indexing. Exact dictionary lookup is a very simple approach, but it always yields low extraction recall, because a biological term often has many variants and no dictionary can collect all of them. We propose a generic extraction approach, referred to as approximate dictionary lookup, to cope with term variation, and implement it as an extraction system called MaxMatcher. The basic idea of this approach is to match only the significant words of a concept rather than all of its words. The new approach dramatically improves extraction recall while maintaining precision. In a comparative study on the GENIA corpus, the new approach reaches 57% recall, whereas exact dictionary lookup achieves only 26%.
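
To make the contrast between exact and approximate lookup concrete, the following is a minimal Python sketch. It is not MaxMatcher itself: the abstract does not specify how word significance is computed, so the sketch assumes a simple inverse-frequency weighting (a word shared by fewer concepts counts for more), a bag-of-words match over the whole input, and an arbitrary threshold of 0.6. The toy dictionary, the concept IDs, and the function names `build_index`, `approximate_lookup`, and `exact_lookup` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy dictionary: concept ID -> one known name per concept.
DICTIONARY = {
    "C1": "nuclear factor kappa b",
    "C2": "nuclear receptor",
    "C3": "interleukin 2 receptor",
}


def build_index(dictionary):
    """Return per-concept word weights that sum to 1.0 for each concept.

    Assumption: a word's significance to a concept is inversely
    proportional to the number of concepts whose name contains it.
    """
    word_to_concepts = defaultdict(set)
    for cid, name in dictionary.items():
        for word in name.split():
            word_to_concepts[word].add(cid)

    weights = {}
    for cid, name in dictionary.items():
        raw = {w: 1.0 / len(word_to_concepts[w]) for w in set(name.split())}
        total = sum(raw.values())
        weights[cid] = {w: v / total for w, v in raw.items()}
    return weights


def approximate_lookup(text, weights, threshold=0.6):
    """Return {concept ID: score} for concepts whose matched word weight
    reaches the threshold. Word order and proximity are ignored."""
    tokens = set(text.lower().split())
    hits = {}
    for cid, word_weights in weights.items():
        score = sum(v for w, v in word_weights.items() if w in tokens)
        if score >= threshold:
            hits[cid] = score
    return hits


def exact_lookup(text, dictionary):
    """Return the concept IDs whose full name occurs verbatim in the text."""
    lowered = text.lower()
    return {cid for cid, name in dictionary.items() if name in lowered}


if __name__ == "__main__":
    index = build_index(DICTIONARY)
    variant = "factor kappa b activation"  # a variant that drops "nuclear"
    print(exact_lookup(variant, DICTIONARY))
    print(approximate_lookup(variant, index))
```

In this toy setting the variant that drops "nuclear" is missed by exact lookup but still matched approximately, because "factor", "kappa", and "b" carry most of the concept's weight. A realistic system would also need to restrict matching to nearby words in the text and to handle tokenization and morphological variants, which this sketch leaves out.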