Mixing statistical and symbolic approaches for chemical names recognition

  • Authors:
  • Florian Boudin;Juan Manuel Torres-Moreno;Marc El-Bèze

  • Affiliations:
  • Laboratoire Informatique d'Avignon, Avignon Cedex 9, France;Laboratoire Informatique d'Avignon, Avignon Cedex 9, France and École Polytechnique de Montréal, Département de génie informatique, Montréal, Québec, Canada;Laboratoire Informatique d'Avignon, Avignon Cedex 9, France

  • Venue:
  • CICLing'08 Proceedings of the 9th international conference on Computational linguistics and intelligent text processing
  • Year:
  • 2008

Abstract

This paper investigates the problem of automatic chemical Term Recognition (TR) and proposes to tackle it by fusing symbolic and statistical techniques. Unlike other solutions described in the literature, which rely solely on complex and costly human-made rule-based matching algorithms, we show that combining a seven-rule matching algorithm with a naïve Bayes classifier achieves high performance. Through experiments performed on different kinds of available organic chemistry texts, we show that our hybrid approach is also consistent across different data sets.
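The general idea of pairing a small rule set with a naïve Bayes classifier can be illustrated with a minimal sketch. Everything below is an assumption made for illustration only: the abstract does not list the seven rules, the features, or the training data, so the two regular expressions, the character-trigram features, the toy training set, and the names `RULES`, `NaiveBayes`, and `is_chemical` are all hypothetical and should not be read as the authors' method.

```python
import math
import re
from collections import Counter

# Hypothetical high-precision surface rules (NOT the paper's seven rules).
RULES = [
    re.compile(r"^\d+(?:,\d+)*-[A-Za-z]"),           # locant prefix, e.g. "2,4-dinitro..."
    re.compile(r"^(?=.*\d)(?:[A-Z][a-z]?\d*)+$"),    # formula-like token with a digit, e.g. "H2O"
]

def trigrams(token):
    """Character trigrams of the lowercased token, with boundary markers."""
    padded = f"^{token.lower()}$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

class NaiveBayes:
    """Multinomial naive Bayes over character trigrams, with add-one smoothing."""

    def __init__(self):
        self.counts = {}        # label -> Counter of trigram counts
        self.docs = Counter()   # label -> number of training tokens
        self.vocab = set()      # all trigrams seen in training

    def fit(self, examples):
        for token, label in examples:
            grams = trigrams(token)
            self.docs[label] += 1
            self.counts.setdefault(label, Counter()).update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, token):
        total = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label, grams in self.counts.items():
            score = math.log(self.docs[label] / total)       # log prior
            denom = sum(grams.values()) + len(self.vocab)     # smoothed denominator
            for g in trigrams(token):
                score += math.log((grams[g] + 1) / denom)     # log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training data, invented for this sketch.
TRAIN = (
    [(w, "chem") for w in ["methanol", "ethanol", "propanol", "benzene",
                           "toluene", "acetone", "butane", "pentane"]]
    + [(w, "other") for w in ["reaction", "mixture", "solution", "temperature",
                              "yield", "added", "stirred", "with"]]
)
model = NaiveBayes().fit(TRAIN)

def is_chemical(token, model):
    """Symbolic rules decide first; the statistical model handles the rest."""
    if any(rule.search(token) for rule in RULES):
        return True
    return model.predict(token) == "chem"
```

In this sketch the rules fire only on patterns that are rarely ambiguous, and the classifier handles tokens such as "hexanol" that no rule covers. How the paper actually orders or weights the two components is not stated in the abstract.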