EXPLOITING SUBTREES IN AUTO-PARSED DATA TO IMPROVE DEPENDENCY PARSING

Authors:
Wenliang Chen;Jun’ichi Kazama;Kiyotaka Uchimoto;Kentaro Torisawa
Affiliations:
Language Infrastructure Group, MASTAR Project, National Institute of Information and Communications Technology, Tokyo, Japan and Human Language Technology, Institute for Infocomm Researc ...;Language Infrastructure Group, MASTAR Project, National Institute of Information and Communications Technology, Tokyo, Japan;Language Infrastructure Group, MASTAR Project, National Institute of Information and Communications Technology, Tokyo, Japan;Language Infrastructure Group, MASTAR Project, National Institute of Information and Communications Technology, Tokyo, Japan

Venue:

Computational Intelligence

Year:

2012

Abstract

Dependency parsing has attracted considerable interest from researchers and developers in natural language processing. However, obtaining a high-accuracy dependency parser with supervised techniques requires a large volume of hand-annotated data, which is extremely expensive to produce. This paper presents a simple and effective approach that improves dependency parsing by using subtrees derived from unannotated data, which is easy to obtain. First, we use a baseline parser to parse large-scale unannotated data. Then, we extract subtrees from the dependency trees in the auto-parsed data. Next, we classify the extracted subtrees into several sets according to their frequency. Finally, we design new features based on these subtree sets for the parsing algorithms. To demonstrate the effectiveness of the proposed approach, we conduct experiments on the English Penn Treebank and the Chinese Penn Treebank. The results show that our approach significantly outperforms the baseline systems. It also achieves the best accuracy on the Chinese data and accuracy competitive with the best known systems on the English data. © 2012 Wiley Periodicals, Inc.
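The four steps above can be illustrated with a small sketch. This is a minimal Python illustration under stated assumptions, not the authors' implementation. It assumes each auto-parsed sentence is a list of (word, head-index) pairs, considers only bigram (head-dependent) subtrees together with arc direction, and uses hypothetical rank cutoffs: the top 10% of subtrees by frequency as HF, up to the top 30% as MF, the rest as LF, and unseen pairs as ZERO. The set names, cutoffs, and function names (`extract_bigram_subtrees`, `classify_by_frequency`, `subtree_feature`) are illustrative assumptions. Step 1 (running a baseline parser) is not implemented; a toy list stands in for its output.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input format: each auto-parsed sentence is a list of
# (word, head) pairs, where head is the 1-based index of the word's head
# and 0 marks the artificial root.
Sentence = List[Tuple[str, int]]
# A bigram subtree: (head word, dependent word, side of the dependent).
Bigram = Tuple[str, str, str]


def arc_direction(head_idx: int, dep_idx: int) -> str:
    """'L' if the dependent precedes its head, 'R' otherwise."""
    return "L" if dep_idx < head_idx else "R"


def extract_bigram_subtrees(parsed: List[Sentence]) -> Counter:
    """Step 2: count head-dependent word pairs in the auto-parsed trees."""
    counts: Counter = Counter()
    for sent in parsed:
        for dep_idx, (dep_word, head_idx) in enumerate(sent, start=1):
            if head_idx == 0:
                continue  # arcs from the artificial root carry no word pair
            head_word = sent[head_idx - 1][0]
            counts[(head_word, dep_word, arc_direction(head_idx, dep_idx))] += 1
    return counts


def classify_by_frequency(counts: Counter, hf_ratio: float = 0.1,
                          mf_ratio: float = 0.3) -> Dict[Bigram, str]:
    """Step 3: bucket subtrees by frequency rank (cutoffs are assumptions)."""
    ranked = [subtree for subtree, _ in counts.most_common()]
    n = len(ranked)
    sets: Dict[Bigram, str] = {}
    for rank, subtree in enumerate(ranked):
        if rank < hf_ratio * n:
            sets[subtree] = "HF"  # most frequent subtrees
        elif rank < mf_ratio * n:
            sets[subtree] = "MF"
        else:
            sets[subtree] = "LF"
    return sets


def subtree_feature(sets: Dict[Bigram, str], head_word: str, dep_word: str,
                    head_idx: int, dep_idx: int) -> str:
    """Step 4: feature for a candidate arc; pairs never seen map to ZERO."""
    key = (head_word, dep_word, arc_direction(head_idx, dep_idx))
    return "bigram_set=" + sets.get(key, "ZERO")


if __name__ == "__main__":
    # Toy stand-in for the output of a baseline parser (step 1).
    auto_parsed = [
        [("He", 2), ("ate", 0), ("fish", 2)],
        [("She", 2), ("ate", 0), ("fish", 2)],
        [("He", 2), ("ran", 0)],
    ]
    sets = classify_by_frequency(extract_bigram_subtrees(auto_parsed))
    print(subtree_feature(sets, "ate", "fish", 2, 3))  # bigram_set=HF
    print(subtree_feature(sets, "ate", "dog", 2, 3))   # bigram_set=ZERO
```

Bucketing by frequency rank rather than by raw counts keeps the set boundaries comparable as the unannotated corpus grows. In a feature-based parser, a string like `bigram_set=HF` could be conjoined with other arc features such as POS tags; that combination is left out of this sketch.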