Inductive improvement of part-of-speech tagging and its effect on a terminology of molecular biology

Authors:
Ahmed Amrani;Mathieu Roche;Yves Kodratoff;Oriane Matte-Tailliez
Affiliations:
ESIEA Recherche, Paris, France;LRI, UMR CNRS 8623, Bât 490, Université de Paris-Sud 11, Orsay, France;LRI, UMR CNRS 8623, Bât 490, Université de Paris-Sud 11, Orsay, France;LRI, UMR CNRS 8623, Bât 490, Université de Paris-Sud 11, Orsay, France
Venue:
AI'05 Proceedings of the 18th Canadian Society conference on Advances in Artificial Intelligence
Year:
2005

Abstract

In the context of part-of-speech (PoS) tagging of specialized corpora, we proposed an inductive approach that focuses on the most ‘important’ PoS tags, because mistaking them can lead to a total misunderstanding of the text. After a standard tagging of a biological corpus with Brill's tagger, we noted persistent errors that are very hard to deal with. As an application, we studied two cases of a different nature: first, confusion between past participle, adjective, and preterit for verbs that end in ‘ed’; second, confusion between plural nouns and third-person singular present verbs. Using a user-friendly interface, the expert corrected the examples. Then, from these well-annotated examples, we induced rules using a propositional rule induction algorithm. Experimental validation showed an improvement in tagging precision. The relevance of the terminology of the field under consideration, here molecular biology, is greatly improved when the number of these tagging errors decreases.
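
The abstract does not name the propositional rule induction algorithm, so the following is only a minimal sketch of the general idea: learning an ordered list of correction rules from expert-annotated examples of words ending in ‘ed’. The greedy sequential-covering learner, the two context features (`prev_tag`, `next_tag`), and the example data are all assumptions made for illustration, not the authors' method or corpus. Tags follow the Penn Treebank convention (VBN past participle, VBD preterit, JJ adjective).

```python
from collections import Counter

# Hypothetical expert-corrected examples for words ending in "ed".
# Each item pairs context features with the tag the expert assigned.
EXAMPLES = [
    ({"prev_tag": "VBZ", "next_tag": "IN"}, "VBN"),
    ({"prev_tag": "VBZ", "next_tag": "TO"}, "VBN"),
    ({"prev_tag": "VBP", "next_tag": "IN"}, "VBN"),
    ({"prev_tag": "DT", "next_tag": "NN"}, "JJ"),
    ({"prev_tag": "DT", "next_tag": "NNS"}, "JJ"),
    ({"prev_tag": "NN", "next_tag": "DT"}, "VBD"),
    ({"prev_tag": "NNS", "next_tag": "DT"}, "VBD"),
    ({"prev_tag": "JJ", "next_tag": "NN"}, "JJ"),
]


def best_rule(examples):
    """Pick the single-condition rule with the highest precision.

    Ties on precision are broken by the number of examples the rule
    tags correctly. Returns a (feature, value, tag) triple.
    """
    stats = {}
    for feats, tag in examples:
        for feature, value in feats.items():
            stats.setdefault((feature, value), Counter())[tag] += 1
    best_score, best = None, None
    for (feature, value), counts in stats.items():
        tag, hits = counts.most_common(1)[0]
        score = (hits / sum(counts.values()), hits)
        if best_score is None or score > best_score:
            best_score, best = score, (feature, value, tag)
    return best


def induce_rules(examples):
    """Greedy sequential covering: learn a rule, drop what it covers, repeat."""
    rules, remaining = [], list(examples)
    while remaining:
        feature, value, tag = best_rule(remaining)
        rules.append((feature, value, tag))
        remaining = [(f, t) for f, t in remaining if f.get(feature) != value]
    return rules


def apply_rules(rules, feats, default):
    """Return the tag of the first matching rule, else the default tag."""
    for feature, value, tag in rules:
        if feats.get(feature) == value:
            return tag
    return default


RULES = induce_rules(EXAMPLES)
for feature, value, tag in RULES:
    print(f"IF {feature} = {value} THEN tag = {tag}")
```

In a pipeline like the one described, such rules would be applied only to tokens in the targeted confusion class, with the initial tagger's output passed as `default` so that tokens no rule covers keep their original tag.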