Tagging with hidden Markov models using ambiguous tags

Authors:
Alexis Nasr;Frédéric Béchet;Alexandra Volanschi
Affiliations:
LaTTice - Université Paris;Laboratoire d'Informatique, d'Avignon;LaTTice - Université Paris
Venue:
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Year:
2004

Abstract

Part-of-speech taggers based on Hidden Markov Models rely on a series of hypotheses that make certain errors inevitable. The idea developed in this paper is to allow a limited, controlled ambiguity in the output of the tagger in order to avoid a number of these errors. The ambiguity takes the form of ambiguous tags, which denote subsets of the tagset. These tags are used when the tagger hesitates between the different components of an ambiguous tag. They are introduced into an existing lexicon and 3-gram database. Their lexical and syntactic counts are computed from the lexical and syntactic counts of their constituents, using impurity functions. The tagging process itself, based on the Viterbi algorithm, is unchanged. Experiments conducted on the Brown corpus show a recall of 0.982 for an ambiguity rate of 1.233. This is to be compared with a baseline recall of 0.978 for an ambiguity rate of 1.414 using the same ambiguous tags, and with a recall of 0.955 for the one-best solution of standard tagging (without ambiguous tags).
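
The abstract does not define the two evaluation figures it reports. The sketch below shows one way they could be computed. It assumes that recall is the fraction of tokens whose correct tag belongs to the (possibly ambiguous) output tag, and that the ambiguity rate is the mean number of elementary tags output per token. The function name `recall_and_ambiguity` and the toy data are hypothetical and are not taken from the paper or from the Brown corpus.

```python
from typing import FrozenSet, List, Tuple


def recall_and_ambiguity(
    predicted: List[FrozenSet[str]], gold: List[str]
) -> Tuple[float, float]:
    """Score a tagging in which each output tag is a set of elementary tags.

    Assumed definitions (not stated in the abstract):
      recall         = tokens whose gold tag is in the predicted set / tokens
      ambiguity rate = total elementary tags output / tokens
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")

    n_tokens = len(gold)
    # A token counts as correct when its gold tag is one of the
    # components of the (possibly ambiguous) predicted tag.
    hits = sum(1 for tags, g in zip(predicted, gold) if g in tags)
    # An unambiguous tag contributes 1, an ambiguous tag its set size.
    n_tags = sum(len(tags) for tags in predicted)
    return hits / n_tokens, n_tags / n_tokens


# Toy example, invented for illustration only.
gold = ["DT", "NN", "VBZ", "RB"]
predicted = [
    frozenset({"DT"}),
    frozenset({"NN", "VB"}),  # ambiguous tag that contains the gold tag
    frozenset({"VBZ"}),
    frozenset({"JJ"}),        # unambiguous but wrong
]

recall, ambiguity = recall_and_ambiguity(predicted, gold)
print(f"recall={recall:.3f} ambiguity rate={ambiguity:.3f}")
```

Under these assumed definitions, a standard one-best tagging always has an ambiguity rate of exactly 1, so the trade-off described in the abstract is between a higher recall and an ambiguity rate somewhat above 1.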