An all-subtrees approach to unsupervised parsing
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Prototype-driven grammar induction
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
A Graph Based Method for Building Multilingual Weakly Supervised Dependency Parsers
GoTAL '08 Proceedings of the 6th international conference on Advances in Natural Language Processing
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
History-Based Inside-Outside Algorithm
Proceedings of the 2006 conference on ECAI 2006: 17th European Conference on Artificial Intelligence, August 29 – September 1, 2006, Riva del Garda, Italy
Unsupervised Grammar Induction Using a Parent Based Constituent Context Model
Proceedings of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence
Unsupervised parsing with U-DOP
CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
Unsupervised grammar induction by distribution and attachment
CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Automatic selection of high quality parses created by a fully unsupervised parser
CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
The NVI clustering evaluation measure
CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Unsupervised induction of labeled parse trees by clustering with syntactic features
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Two approaches for building an unsupervised dependency parser and their other applications
AAAI'07 Proceedings of the 22nd National Conference on Artificial Intelligence - Volume 2
A second language acquisition model using example generalization and concept categories
PMHLA '05 Proceedings of the Workshop on Psychocomputational Models of Human Language Acquisition
Unsupervised argument identification for Semantic Role Labeling
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Unsupervised multilingual grammar induction
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Grammar Induction by Unification of Type-logical Lexicons
Journal of Logic, Language and Information
From baby steps to Leapfrog: how "Less is More" in unsupervised dependency parsing
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Variable-length Markov models and ambiguous words in Portuguese
YIWCALA '10 Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas
Improvements in unsupervised co-occurrence based parsing
CoNLL '10 Proceedings of the Fourteenth Conference on Computational Natural Language Learning
Improved fully unsupervised parsing with zoomed learning
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Neutralizing linguistically problematic annotations in unsupervised dependency parsing evaluation
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Unsupervised multilingual learning
A comparative study on Chinese word clustering
ICCPOL'06 Proceedings of the 21st International Conference on Computer Processing of Oriental Languages: Beyond the Orient: The Research Challenges Ahead
Searching for smallest grammars on large sequences and application to DNA
Journal of Discrete Algorithms
Lateen EM: unsupervised training with multiple objectives, applied to dependency grammar induction
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Unsupervised dependency parsing without gold part-of-speech tags
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Choosing word occurrences for the smallest grammar problem
LATA'10 Proceedings of the 4th international conference on Language and Automata Theory and Applications
A feature-rich constituent context model for grammar induction
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
Towards a self-learning assistive vocal interface: vocabulary and grammar learning
SMIAE '12 Proceedings of the 1st Workshop on Speech and Multimodal Interaction in Assistive Environments
Towards unsupervised learning of temporal relations between events
Journal of Artificial Intelligence Research
Semantic separator learning and its applications in unsupervised Chinese text parsing
Frontiers of Computer Science: Selected Publications from Chinese Universities
Smoothing for bracketing induction
IJCAI'13 Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence
Bayesian Constituent Context Model for Grammar Induction
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)
There is precisely one complete language processing system to date: the human brain. Though there is debate on how much built-in bias human learners might have, we definitely acquire language in a primarily unsupervised fashion. On the other hand, computational approaches to language processing are almost exclusively supervised, relying on hand-labeled corpora for training. This reliance is largely due to unsupervised approaches having repeatedly exhibited discouraging performance.

In particular, the problem of learning syntax (grammar) from completely unannotated text has received a great deal of attention for well over a decade, with little in the way of positive results. We argue that previous methods for this task have generally underperformed because of the representations they used. Overly complex models are easily distracted by non-syntactic correlations (such as topical associations), while overly simple models aren't rich enough to capture important first-order properties of language (such as directionality, adjacency, and valence).

In this work, we describe several syntactic representations and associated probabilistic models which are designed to capture the basic character of natural language syntax as directly as possible. First, we examine a nested, distributional method which induces bracketed tree structures. Second, we examine a dependency model which induces word-to-word dependency structures. Finally, we demonstrate that these two models perform better in combination than they do alone. With these representations, high-quality analyses can be learned from surprisingly little text, with no labeled examples, in several languages (we show experiments with English, German, and Chinese). Our results show above-baseline performance in unsupervised parsing in each of these languages.

Grammar induction methods are useful since parsed corpora exist for only a small number of languages.
More generally, most high-level NLP tasks, such as machine translation and question-answering, lack richly annotated corpora, making unsupervised methods extremely appealing even for common languages like English. Finally, while the models in this work are not intended to be cognitively plausible, their effectiveness can inform the investigation of what biases are or are not needed in the human acquisition of language.
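The nested, distributional method mentioned in the abstract works from the yields and contexts of sentence spans. The sketch below illustrates that kind of distributional signal on a toy part-of-speech sequence; it is an illustrative simplification, not the cited system, and all names in it are invented for this example.

```python
# Illustrative sketch (not the cited system): every contiguous span of a
# tag sequence is described by its yield (the tags inside it) and its
# context (the tags immediately flanking it). A distributional induction
# model can learn which (yield, context) signatures look constituent-like.

def span_signatures(tags):
    """Return (yield, context) pairs for every contiguous span of `tags`."""
    padded = ["<s>"] + list(tags) + ["</s>"]  # sentence-boundary markers
    signatures = []
    n = len(tags)
    for i in range(n):
        for j in range(i + 1, n + 1):
            span_yield = tuple(tags[i:j])
            # Context is the tag just left of position i and just right of j.
            context = (padded[i], padded[j + 1])
            signatures.append((span_yield, context))
    return signatures

# Example: in "DT NN VBD", the span "DT NN" has context (<s>, VBD) --
# a signature that recurs wherever noun phrases begin a sentence.
sigs = span_signatures(["DT", "NN", "VBD"])
```

A sentence of length n yields n(n+1)/2 such spans, so even small unannotated corpora provide many signatures for a model to cluster.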