An Integrated Statistical Model for Tagging and Chunking Unrestricted Text

Authors:
Ferran Pla;Antonio Molina;Natividad Prieto
Venue:
TSD '00 Proceedings of the Third International Workshop on Text, Speech and Dialogue
Year:
2000

Citing 12
Cited 2

Learning language models through the ECGI method

Speech Communication - Eurospeech '91
Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging

Computational Linguistics
Learning grammatical structure using statistical decision-trees

ICGI '96 Proceedings of the 3rd International Colloquium on Grammatical Inference: Learning Syntax from Sentences
Building a large annotated corpus of English: the Penn Treebank

Computational Linguistics - Special issue on using large corpora: II
Tagging English text with a probabilistic model

Computational Linguistics
A stochastic parts program and noun phrase parser for unrestricted text

ANLC '88 Proceedings of the second conference on Applied natural language processing
Finding clauses in unrestricted text by finitary and stochastic methods

ANLC '88 Proceedings of the second conference on Applied natural language processing
Incremental finite-state parsing

ANLC '97 Proceedings of the fifth conference on Applied natural language processing
Developing a hybrid NP parser

ANLC '97 Proceedings of the fifth conference on Applied natural language processing
A syntax-based part-of-speech analyser

EACL '95 Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics
A memory-based approach to learning shallow natural language patterns

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Surface grammatical analysis for the extraction of terminological noun phrases

COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 3

Shallow Parsing Using Probabilistic Grammatical Inference

ICGI '02 Proceedings of the 6th International Colloquium on Grammatical Inference: Algorithms and Applications
Language Understanding Using Two-Level Stochastic Models with POS and Semantic Units

TSD '01 Proceedings of the 4th International Conference on Text, Speech and Dialogue

Abstract

In this paper, we present a corpus-based approach to tagging and chunking. The formalism is based on stochastic finite-state automata, so it can accommodate n-gram models as well as any stochastic finite-state automaton learnt through grammatical inference techniques. Because the models in our system are learnt automatically, the system is flexible and portable across languages and chunk definitions. To show the viability of the approach, we present tagging and chunking results for different combinations of bigrams and more complex automata learnt with the Error Correcting Grammatical Inference (ECGI) algorithm. The experiments were carried out on the Wall Street Journal corpus for English and on the Lexesp corpus for Spanish.
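
As a rough illustration of the bigram case mentioned in the abstract, the sketch below trains a bigram model over combined labels and decodes with the Viterbi algorithm. Several things here are assumptions rather than details from the paper: the combined "POS|chunk" label scheme (a simplification that folds tagging and chunking into one state set, not the authors' automaton construction), the add-alpha smoothing, the toy training sentences, and the function names train_bigram, log_prob, and viterbi. The ECGI-learnt automata are not covered.

```python
import math
from collections import defaultdict


def train_bigram(tagged_sentences):
    """Count label-bigram transitions and word emissions from labelled sentences."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    for sentence in tagged_sentences:
        prev = "<s>"
        for word, label in sentence:
            trans[prev][label] += 1
            emit[label][word] += 1
            prev = label
        trans[prev]["</s>"] += 1
    return trans, emit


def log_prob(counts, key, n_outcomes, alpha=1.0):
    """Add-alpha smoothed log probability (the smoothing scheme is an assumption)."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + alpha) / (total + alpha * n_outcomes))


def viterbi(words, trans, emit):
    """Return the most probable label sequence for `words` under the bigram model."""
    if not words:
        return []
    labels = sorted(emit)
    vocab = {w for label in emit for w in emit[label]}
    n_labels = len(labels) + 1   # real labels plus the end marker
    n_words = len(vocab) + 1     # known words plus one slot for unknown words

    # best[i][label] = (log score of best path ending in label, previous label)
    best = [{}]
    for label in labels:
        score = (log_prob(trans["<s>"], label, n_labels)
                 + log_prob(emit[label], words[0], n_words))
        best[0][label] = (score, None)

    for i in range(1, len(words)):
        best.append({})
        for label in labels:
            e = log_prob(emit[label], words[i], n_words)
            best[i][label] = max(
                (best[i - 1][p][0] + log_prob(trans[p], label, n_labels) + e, p)
                for p in labels
            )

    # Pick the final label, including the transition to the end marker.
    last = max(labels,
               key=lambda l: best[-1][l][0] + log_prob(trans[l], "</s>", n_labels))
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


# Toy data with hypothetical combined "POS|chunk" labels.
train = [
    [("the", "DT|B-NP"), ("dog", "NN|I-NP"), ("barks", "VBZ|O")],
    [("a", "DT|B-NP"), ("cat", "NN|I-NP"), ("sleeps", "VBZ|O")],
    [("the", "DT|B-NP"), ("cat", "NN|I-NP"), ("barks", "VBZ|O")],
]
trans, emit = train_bigram(train)
prediction = viterbi(["a", "dog", "sleeps"], trans, emit)
pos_tags = [label.split("|")[0] for label in prediction]
chunk_tags = [label.split("|")[1] for label in prediction]
```

Splitting each predicted label on "|" recovers the part-of-speech sequence and the chunk sequence from a single decoding pass, which is the sense in which this toy model is integrated.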