Inductive improvement of part-of-speech tagging and its effect on a terminology of molecular biology

  • Authors:
  • Ahmed Amrani;Mathieu Roche;Yves Kodratoff;Oriane Matte-Tailliez

  • Affiliations:
  • ESIEA Recherche, Paris, France;LRI, UMR CNRS 8623, Bât 490, Université de Paris-Sud 11, Orsay, France;LRI, UMR CNRS 8623, Bât 490, Université de Paris-Sud 11, Orsay, France;LRI, UMR CNRS 8623, Bât 490, Université de Paris-Sud 11, Orsay, France

  • Venue:
  • AI'05 Proceedings of the 18th Canadian Society conference on Advances in Artificial Intelligence
  • Year:
  • 2005

Abstract

In the context of Part-of-Speech (PoS) tagging of specialized corpora, we propose an inductive approach that focuses on the most 'important' PoS tags, because mistaking them can lead to a total misunderstanding of the text. After a standard tagging of a biological corpus with Brill's tagger, we noted persistent errors that are very hard to deal with. As an application, we studied two cases of different nature: first, confusion between past participle, adjective, and preterit for verbs that end in '-ed'; second, confusion between plural nouns and verbs in the third person singular present. Using a user-friendly interface, the expert corrected the examples. Then, from these well-annotated examples, we induced rules using a propositional rule induction algorithm. Experimental validation showed an improvement in tagging precision. The relevance of the terminology of the considered field, here molecular biology, is greatly improved when the number of these tagging errors decreases.
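To illustrate what inducing propositional rules from expert-corrected examples can look like, here is a minimal sketch for the '-ed' case (VBN = past participle, JJ = adjective, VBD = preterit). The abstract does not name the induction algorithm, so this uses a simplified greedy sequential-covering learner with single-condition rules of the form "if feature == value then tag". The context features (`prev_tag`, `next_tag`), the toy examples, and the choice to fall back to the tag Brill's tagger originally assigned when no rule fires are all hypothetical assumptions, not the authors' setup.

```python
from collections import Counter

# Hypothetical setup: each example pairs context features of one ambiguous '-ed'
# token with the tag an expert assigned (VBN, JJ, or VBD).
Example = tuple[dict[str, str], str]
Rule = tuple[str, str, str]  # (feature, value, tag): "if feature == value then tag"


def learn_rules(examples: list[Example]) -> list[Rule]:
    """Greedy sequential covering with single-condition propositional rules."""
    remaining = list(examples)
    rules: list[Rule] = []
    while remaining:
        best = None  # (precision, coverage, feature, value, tag)
        # Candidate conditions are the feature/value pairs seen in uncovered examples.
        candidates = sorted({(f, v) for feats, _ in remaining for f, v in feats.items()})
        for f, v in candidates:
            covered = [tag for feats, tag in remaining if feats.get(f) == v]
            tag, hits = Counter(covered).most_common(1)[0]
            score = (hits / len(covered), len(covered))
            # Prefer the purest condition, then the one covering the most examples.
            if best is None or score > best[:2]:
                best = (*score, f, v, tag)
        _, _, f, v, tag = best
        rules.append((f, v, tag))
        # Drop the examples the new rule covers and learn on the rest.
        remaining = [(feats, t) for feats, t in remaining if feats.get(f) != v]
    return rules


def retag(rules: list[Rule], feats: dict[str, str], brill_tag: str) -> str:
    """Apply the first matching rule; otherwise keep the tagger's original tag."""
    for f, v, tag in rules:
        if feats.get(f) == v:
            return tag
    return brill_tag


# Toy expert-corrected examples (hypothetical contexts).
examples = [
    ({"prev_tag": "DT", "next_tag": "NN"}, "JJ"),    # "the activated gene"
    ({"prev_tag": "DT", "next_tag": "NNS"}, "JJ"),   # "the purified proteins"
    ({"prev_tag": "VBD", "next_tag": "IN"}, "VBN"),  # "was phosphorylated by"
    ({"prev_tag": "VBZ", "next_tag": "IN"}, "VBN"),  # "is expressed in"
    ({"prev_tag": "NN", "next_tag": "DT"}, "VBD"),   # "kinase phosphorylated the"
    ({"prev_tag": "NNS", "next_tag": "DT"}, "VBD"),  # "cells expressed the"
]

rules = learn_rules(examples)
print(rules)
# [('next_tag', 'DT', 'VBD'), ('next_tag', 'IN', 'VBN'), ('prev_tag', 'DT', 'JJ')]
```

Because each step picks the purest condition and removes what it covers, the learner always terminates, but on real data it would need a minimum-coverage threshold or richer conjunctive conditions to avoid overfitting to single examples.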