In this paper, we compare three approaches to building a probabilistic context-free grammar for natural language parsing from a treebank corpus: (1) a model that simply extracts the rules contained in the corpus and counts the occurrences of each rule; (2) a model that also stores information about the parent node's category; and (3) a model that estimates the probabilities according to a generalized k-gram scheme for trees with k = 3. The last model allows for faster parsing and considerably decreases the perplexity of test samples.
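The first two models can be sketched briefly. The following Python fragment is a minimal illustration, not the authors' implementation: it assumes parse trees represented as nested tuples `(label, child, ...)` with string leaves, extracts rules with occurrence counts (model 1), optionally annotates each nonterminal with its parent's category (model 2), and turns counts into relative-frequency rule probabilities.

```python
from collections import Counter

def extract_rules(tree, parent=None, annotate_parent=False, counts=None):
    """Count CFG rules in a tree given as (label, child, ...) tuples with
    string leaves. With annotate_parent=True, every nonterminal label is
    extended with its parent's category, as in the parent-annotated model.
    (Illustrative sketch; tree encoding and naming are assumptions.)"""
    if counts is None:
        counts = Counter()
    label, children = tree[0], tree[1:]
    lhs = f"{label}^{parent}" if annotate_parent and parent else label
    rhs = tuple(
        (f"{c[0]}^{label}" if annotate_parent else c[0])
        if isinstance(c, tuple) else c
        for c in children
    )
    counts[(lhs, rhs)] += 1
    for c in children:
        if isinstance(c, tuple):
            extract_rules(c, parent=label,
                          annotate_parent=annotate_parent, counts=counts)
    return counts

def rule_probabilities(counts):
    """Maximum-likelihood estimate: the count of each rule divided by the
    total count of rules sharing the same left-hand side."""
    totals = Counter()
    for (lhs, _), n in counts.items():
        totals[lhs] += n
    return {rule: n / totals[rule[0]] for rule, n in counts.items()}

tree = ("S", ("NP", "she"), ("VP", ("V", "saw"), ("NP", "stars")))
probs = rule_probabilities(extract_rules(tree))
```

With `annotate_parent=True`, the same tree yields rules such as `NP^S -> she`, so the two `NP` expansions are conditioned on their contexts instead of being pooled, which is what distinguishes model 2 from model 1.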