A syntax-based part-of-speech analyser

Authors:
Atro Voutilainen
Affiliations:
Research Unit for Multilingual Language Technology, University of Helsinki, Finland
Venue:
EACL '95 Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics
Year:
1995

Abstract

There are two main methodologies for constructing the knowledge base of a natural language analyser: the linguistic and the data-driven. Recent state-of-the-art part-of-speech taggers are based on the data-driven approach. Because of the known feasibility of the linguistic rule-based approach at related levels of description, the success of the data-driven approach in part-of-speech analysis may appear surprising. In this paper, a case is made for the syntactic nature of part-of-speech tagging. A new tagger of English that uses only linguistic distributional rules is outlined and empirically evaluated. Tested against a benchmark corpus of 38,000 words of previously unseen text, this syntax-based system reaches an accuracy of above 99%. Compared to the 95–97% accuracy of its best competitors, this result suggests the feasibility of the linguistic approach also in part-of-speech analysis.