Deterministic part-of-speech tagging with finite-state transducers

Authors: Emmanuel Roche; Yves Schabes
Affiliations: MERL; MERL
Venue: Computational Linguistics
Year: 1995

Citing 13
Cited 40

Quantified Score

Hi-index	0.02

Abstract

Stochastic approaches to natural language processing have often been preferred to rule-based approaches because of their robustness and their automatic training capabilities. This was the case for part-of-speech tagging until Brill showed how state-of-the-art part-of-speech tagging can be achieved with a rule-based tagger by inferring rules from a training corpus. However, current implementations of the rule-based tagger run more slowly than previous approaches. In this paper, we present a finite-state tagger, inspired by the rule-based tagger, that operates in optimal time in the sense that the time to assign tags to a sentence corresponds to the time required to follow a single path in a deterministic finite-state machine. This result is achieved by encoding the application of the rules found in the tagger as a nondeterministic finite-state transducer and then turning it into a deterministic transducer. The resulting deterministic transducer yields a part-of-speech tagger whose speed is dominated by the access time of mass storage devices. We then generalize the techniques to the class of transformation-based systems.
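
The idea of tagging by following a single path in a deterministic machine can be illustrated with a minimal Python sketch. It is not the authors' implementation: the contextual rule (retag "vbd" as "vbn" when the next tag is "by"), the tag names, and the identifiers `TRANSITIONS`, `FINAL_OUTPUT`, and `apply_rule` are assumptions chosen for illustration. The machine is written out by hand in its deterministic form; it delays the output for "vbd" until the following tag is known, so each input tag is read exactly once.

```python
from typing import Dict, List, Tuple

# Hypothetical rule, for illustration only:
#   change tag "vbd" to "vbn" if the next tag is "by".
# Applied naively, the rule needs lookahead. The deterministic transducer
# below avoids that by delaying its output while a "vbd" is pending.

START, PENDING = 0, 1  # PENDING: a "vbd" was read but not yet emitted

# (state, input tag) -> (next state, tags emitted on this transition)
TRANSITIONS: Dict[Tuple[int, str], Tuple[int, List[str]]] = {
    (START, "vbd"): (PENDING, []),            # hold the tag back
    (PENDING, "by"): (START, ["vbn", "by"]),  # rule fires
    (PENDING, "vbd"): (PENDING, ["vbd"]),     # flush old "vbd", hold the new one
}

# Tags emitted when the input ends in a given state.
FINAL_OUTPUT: Dict[int, List[str]] = {START: [], PENDING: ["vbd"]}


def apply_rule(tags: List[str]) -> List[str]:
    """Apply the rule in one left-to-right pass over the tag sequence."""
    state = START
    output: List[str] = []
    for tag in tags:
        if (state, tag) in TRANSITIONS:
            state, emitted = TRANSITIONS[(state, tag)]
        elif state == PENDING:
            # Any other tag: the pending "vbd" is left unchanged.
            state, emitted = START, ["vbd", tag]
        else:
            # Default transition: copy the tag through.
            state, emitted = START, [tag]
        output.extend(emitted)
    output.extend(FINAL_OUTPUT[state])
    return output


if __name__ == "__main__":
    print(apply_rule(["np", "vbd", "by", "np"]))
```

Each input tag triggers exactly one table lookup, so the running time of this sketch is linear in the sentence length and does not grow with lookahead. The abstract describes a system that obtains such a machine automatically, by encoding the rules as a nondeterministic transducer and then determinizing it; the sketch skips that construction and shows only the resulting kind of machine.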