On statistical parsing of French with supervised and semi-supervised strategies

Authors:
Marie Candito;Benoît Crabbé;Djamé Seddah
Affiliations:
Université Paris, Ufrl et Inria (Alpage), Paris, France;Université Paris, Ufrl et Inria (Alpage), Paris, France;Université Paris, LaLIC et Inria (Alpage), Paris, France
Venue:
CLAGI '09 Proceedings of the EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference
Year:
2009

Citing 13
Cited 6

Head-driven statistical models for natural language parsing
PCFG models of linguistic tree representations

Computational Linguistics
TnT: a statistical part-of-speech tagger

ANLC '00 Proceedings of the sixth conference on Applied natural language processing
A maximum-entropy-inspired parser

NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Probabilistic parsing for German using sister-head dependencies

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Accurate unlexicalized parsing

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Intricacies of Collins' Parsing Model

Computational Linguistics
Probabilistic CFG with latent annotations

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Lexicalization in crosslinguistic probabilistic parsing: the case of French

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Learning accurate, compact, and interpretable tree annotation

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Effective self-training for parsing

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Is it really that difficult to parse German?

EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
A dependency-based method for evaluating broad-coverage parsers

IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2

Improving generative statistical parsing with semi-supervised word clustering

IWPT '09 Proceedings of the 11th International Conference on Parsing Technologies
Handling unknown words in statistical latent-variable parsing models for Arabic, English and French

SPMRL '10 Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages
Lemmatization and lexicalized statistical parsing of morphologically rich languages: the case of French

SPMRL '10 Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages
Joint Hebrew segmentation and parsing using a PCFG-LA lattice parser

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Coupling an annotated corpus and a lexicon for state-of-the-art POS tagging

Language Resources and Evaluation
Word segmentation, unknown-word resolution, and morphological agreement in a hebrew parsing system

Computational Linguistics

Abstract

This paper reports results on grammatical induction for French. We investigate how best to train a parser on the French Treebank (Abeillé et al., 2003), viewing the task as a trade-off between generalizability and interpretability. We compare, for French, a supervised lexicalized parsing algorithm with a semi-supervised unlexicalized algorithm (Petrov et al., 2006), along the lines of Crabbé and Candito (2008). We report the best results known to us on French statistical parsing, which we obtained with the semi-supervised learning algorithm. The reported experiments offer insights into grammatical learning for a morphologically rich language with a relatively limited amount of training data, annotated with a rather flat structure.