Building a large annotated corpus of English: the penn treebank
Computational Linguistics - Special issue on using large corpora: II
Experiments in parallel-text based grammar induction
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Unsupervised grammar induction by distribution and attachment
CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Unsupervised multilingual grammar induction
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Natural language grammar induction with a generative constituent-context model
Pattern Recognition
Reducing the size of the representation for the uDOP-estimate
EMNLP '11 Proceedings of the First Workshop on Unsupervised Learning in NLP
Semi-supervised constituent grammar induction based on text chunking information
CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
Semantic separator learning and its applications in unsupervised Chinese text parsing
Frontiers of Computer Science: Selected Publications from Chinese Universities
Hi-index | 0.01 |
This paper describes a new method for unsupervised grammar induction based on the automatic extraction of certain patterns in the texts. Our starting hypothesis is that there exist some classes of words that function as separators, marking the beginning or the end of new constituents. Among these separators we distinguish those which trigger new levels in the parse tree. If we are able to detect these separators we can follow a very simple procedure to identify the constituents of a sentence by taking the classes of words between separators. This paper is devoted to describe the process that we have followed to automatically identify the set of separators from a corpus only annotated with Part-of-Speech (POS) tags. The proposed approach has allowed us to improve the results of previous proposals when parsing sentences from the Wall Street Journal corpus.