In this paper, we present a corpus-based approach to tagging and chunking. The formalism is based on stochastic finite-state automata, so it can accommodate n-gram models as well as any stochastic finite-state automaton learnt with grammatical inference techniques. Because all the models in our system are learnt automatically, the system is flexible and easily ported to different languages and chunk definitions. To show the viability of our approach, we report tagging and chunking results obtained with different combinations of bigrams and more complex automata learnt by means of the Error-Correcting Grammatical Inference (ECGI) algorithm. The experiments were carried out on the Wall Street Journal corpus for English and on the Lexesp corpus for Spanish.
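To make the role of the stochastic finite-state models concrete, the following sketch (not the authors' code; all probabilities are toy values chosen by hand) shows how a bigram tag model can be read as a stochastic automaton whose states are tags, with Viterbi decoding selecting the most probable tag sequence for a sentence:

```python
def viterbi(words, tags, trans, emit, start):
    """Most probable tag sequence under a bigram (first-order) tag model.

    States are tags; trans[s][t] is the automaton's transition
    probability P(t | s), emit[t][w] the emission probability P(w | t),
    and start[t] the initial-state probability.
    """
    # best[t] = probability of the best path ending in tag t
    best = {t: start.get(t, 0.0) * emit[t].get(words[0], 0.0) for t in tags}
    back = []  # backpointers, one dict per word after the first
    for w in words[1:]:
        prev, best, ptr = best, {}, {}
        for t in tags:
            # pick the predecessor tag maximising the path probability
            p, arg = max((prev[s] * trans[s].get(t, 0.0), s) for s in tags)
            best[t] = p * emit[t].get(w, 0.0)
            ptr[t] = arg
        back.append(ptr)
    # recover the best final tag and follow the backpointers
    last = max(best, key=best.get)
    seq = [last]
    for ptr in reversed(back):
        seq.append(ptr[seq[-1]])
    return list(reversed(seq))

# Toy model with two hypothetical tags.
tags = ["DT", "NN"]
start = {"DT": 0.8, "NN": 0.2}
trans = {"DT": {"NN": 0.9, "DT": 0.1}, "NN": {"DT": 0.4, "NN": 0.6}}
emit = {"DT": {"the": 0.9, "dog": 0.0}, "NN": {"the": 0.0, "dog": 0.7}}
print(viterbi(["the", "dog"], tags, trans, emit, start))  # -> ['DT', 'NN']
```

Replacing the bigram transition table with the transition function of an automaton inferred by ECGI leaves the decoding step unchanged, which is what makes the framework portable across model types.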