Class-based n-gram models of natural language
Computational Linguistics
Automatic retrieval and clustering of similar words
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Probabilistic CFG with latent annotations
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Learning accurate, compact, and interpretable tree annotation
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
On statistical parsing of French with supervised and semi-supervised strategies
CLAGI '09 Proceedings of the EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference
Word representations: a simple and general method for semi-supervised learning
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Handling unknown words in statistical latent-variable parsing models for Arabic, English and French
SPMRL '10 Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages
SPMRL '10 Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages
SPMRL '10 Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages
Benchmarking of statistical dependency parsers for French
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Language models as representations for weakly-supervised NLP tasks
CoNLL '11 Proceedings of the Fifteenth Conference on Computational Natural Language Learning
Decreasing lexical data sparsity in statistical syntactic parsing: experiments with named entities
MWE '11 Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
A word clustering approach to domain adaptation: effective parsing of biomedical texts
IWPT '11 Proceedings of the 12th International Conference on Parsing Technologies
French parsing enhanced with a word clustering method based on a syntactic lexicon
SPMRL '11 Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages
Discriminative strategies to integrate multiword expression recognition and parsing
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Semi-supervised dependency parsing using lexical affinities
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Biased representation learning for domain adaptation
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Word segmentation, unknown-word resolution, and morphological agreement in a hebrew parsing system
Computational Linguistics
Parsing models for identifying multiword expressions
Computational Linguistics
Combining compound recognition and PCFG-LA parsing with word lattices and conditional random fields
ACM Transactions on Speech and Language Processing (TSLP) - Special issue on multiword expressions: From theory to practice and use, part 2
Hi-index | 0.00 |
We present a semi-supervised method to improve statistical parsing performance. We focus on the well-known problem of lexical data sparseness and present experiments of word clustering prior to parsing. We use a combination of lexicon-aided morphological clustering that preserves tagging ambiguity, and unsupervised word clustering, trained on a large unannotated corpus. We apply these clusterings to the French Treebank, and we train a parser with the PCFG-LA unlexicalized algorithm of (Petrov et al., 2006). We find a gain in French parsing performance: from a baseline of F1=86.76% to F1=87.37% using morphological clustering, and up to F1=88.29% using further unsupervised clustering. This is the best known score for French probabilistic parsing. These preliminary results are encouraging for statistically parsing morphologically rich languages, and languages with small amount of annotated data.