Interest in unsupervised grammar induction is growing: it requires no syntactic annotation, but it is less accurate than supervised approaches. To improve the accuracy of the unsupervised approach, we draw on additional information that is easier to obtain. Shallow parsing, or chunking, identifies the constituents of a sentence (noun phrases, verb phrases, etc.) without specifying their internal structure. Highly accurate systems exist for this task, so chunk information is available even for languages that lack large syntactically annotated corpora. In this work we investigate how the results of a pattern-based unsupervised grammar induction system change as data on new kinds of phrases are added, and find that this leads to a significant improvement in performance. We analyze the results for three different languages. We also show that the results of the unsupervised system improve significantly when the chunks are provided by automatic chunkers.
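To make concrete what a chunker contributes, the following minimal sketch turns a chunker's per-token output into flat phrase spans with no internal structure. It assumes the chunks are encoded with BIO tags (`B-NP`, `I-NP`, `O`), a common convention that the abstract does not specify; the function name `bio_to_chunks` and the example sentence are hypothetical and are not taken from the system described here.

```python
from typing import List, Tuple


def bio_to_chunks(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO chunk tags into (label, start, end) spans, end exclusive.

    Assumption: tags follow the BIO convention ("B-NP", "I-NP", "O").
    """
    chunks: List[Tuple[str, int, int]] = []
    label, start = None, 0
    for i, tag in enumerate(tags):
        # The open chunk continues only on an "I-" tag with the same label.
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            chunks.append((label, start, i))
            label = None
        # Any non-"O" tag that does not continue a chunk opens a new one.
        if tag != "O" and not continues:
            label, start = tag[2:], i
    if label is not None:
        chunks.append((label, start, len(tags)))
    return chunks


# Hypothetical example sentence and chunker output.
words = ["The", "old", "dog", "chased", "a", "cat"]
tags = ["B-NP", "I-NP", "I-NP", "B-VP", "B-NP", "I-NP"]

for label, start, end in bio_to_chunks(tags):
    print(label, words[start:end])
```

Each span records only a phrase label and its boundaries, which is the kind of flat constituent information that can be supplied to an induction system as additional input.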