Learning Chinese bracketing knowledge based on a bilingual language model

Authors:
Yajuan Lü;Sheng Li;Tiejun Zhao;Muyun Yang
Affiliations:
Harbin Institute of Technology, Harbin, China;Harbin Institute of Technology, Harbin, China;Harbin Institute of Technology, Harbin, China;Harbin Institute of Technology, Harbin, China
Venue:
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Year:
2002



Abstract

This paper proposes a new method for automatically acquiring Chinese bracketing knowledge from English-Chinese sentence-aligned bilingual corpora. Bilingual sentence pairs are first aligned at the level of syntactic structure by combining English parse trees with a statistical bilingual language model. Chinese bracketing knowledge is then extracted automatically from the aligned structures. Preliminary experiments show that the automatically learned knowledge agrees well with manually annotated brackets. The method is particularly useful for acquiring bracketing knowledge for a less-studied language that lacks the tools and resources available for a better-studied second language. Although this paper discusses experiments with Chinese and English, the method is also applicable to other language pairs.
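
The general idea of transferring brackets from a parsed language to an unparsed one can be illustrated with a minimal sketch. This is not the paper's bilingual language model: it assumes a word alignment is already given and simply projects each English constituent onto the Chinese tokens it aligns to, keeping the projected span only when no Chinese token inside it is aligned to an English word outside the constituent. The function name `project_brackets`, the span encoding, and the example sentence pair are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def project_brackets(
    english_spans: List[Span],
    alignment: Dict[int, Set[int]],
) -> List[Span]:
    """Project English constituent spans onto Chinese token positions.

    Hypothetical helper, not the paper's model: `alignment` maps an English
    token index to the set of Chinese token indices it is aligned to.
    """
    projected = set()
    for start, end in english_spans:
        # Chinese tokens aligned to words inside the English constituent.
        inside: Set[int] = set()
        for e in range(start, end + 1):
            inside |= alignment.get(e, set())
        if len(inside) < 2:
            continue  # fewer than two aligned tokens yields no useful bracket
        lo, hi = min(inside), max(inside)
        # Chinese tokens aligned to words outside the English constituent.
        outside: Set[int] = set()
        for e, targets in alignment.items():
            if e < start or e > end:
                outside |= targets
        # Keep the span only if it is not interrupted by outside material.
        if not any(lo <= c <= hi for c in outside):
            projected.add((lo, hi))
    return sorted(projected)


# Illustrative pair (assumed): "the book on the table" / "桌子 上 的 书"
# English tokens: 0 the, 1 book, 2 on, 3 the, 4 table
# Chinese tokens: 0 桌子, 1 上, 2 的, 3 书
example_alignment = {1: {3}, 2: {1}, 4: {0}}
example_spans = [(0, 1), (3, 4), (2, 4), (0, 4)]
print(project_brackets(example_spans, example_alignment))
```

In this example the English prepositional phrase "on the table" projects to the Chinese span covering 桌子 上, and the full noun phrase projects to the whole Chinese string; the two-word spans "the book" and "the table" are dropped because each aligns to only one Chinese token. A statistical model such as the one the abstract describes would score competing structural alignments rather than apply a hard contiguity test.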