PCFG parsing for restricted classical Chinese texts

Authors:
Liang Huang;Yinan Peng;Huan Wang;Zhenyu Wu
Affiliations:
Shanghai Jiaotong University, Shanghai, P. R. China;Shanghai Jiaotong University, Shanghai, P. R. China;East China Normal University, Shanghai, P. R. China;Shanghai Jiaotong University, Shanghai, P. R. China
Venue:
SIGHAN '02 Proceedings of the first SIGHAN workshop on Chinese language processing - Volume 18
Year:
2002

Citing 4

Natural language understanding (2nd ed.)
Taggers for parsers. Artificial Intelligence - Special volume on empirical methods
Statistical Part-of-Speech Tagging for Classical Chinese. TSD '02 Proceedings of the 5th International Conference on Text, Speech and Dialogue
Joint and conditional estimation of tagging and parsing models. ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics

Cited 2

Pseudo context-sensitive models for parsing isolating languages: classical Chinese-a case study. CICLing'03 Proceedings of the 4th international conference on Computational linguistics and intelligent text processing
A dependency treebank of classical Chinese poems. NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Abstract

The Probabilistic Context-Free Grammar (PCFG) model is widely used for parsing natural languages, including Modern Chinese. For Classical Chinese, however, computational processing is only beginning. Our previous study on part-of-speech (POS) tagging of Classical Chinese was a pioneering work in this area. In this paper, we move on to PCFG parsing of Classical Chinese texts. We use the same tagset and corpus as in our previous study, and apply the bigram-based forward-backward algorithm to obtain the context-dependent probabilities. For the PCFG model, we restrict the rewriting rules to binary and unary rules, which simplifies the implementation. We developed a small rule set that accounts for the grammatical phenomena occurring in the corpus. The texts are restricted in that the number of proper nouns and difficult characters is limited. In our preliminary experiments, the parser achieves a promising accuracy of 82.3%.
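
To illustrate why restricting a PCFG to binary and unary rules simplifies the implementation, the following is a minimal sketch of a CKY-style Viterbi parser over such a grammar. It is not the parser described in the paper: the CKY algorithm is swapped in here as a standard technique for grammars of this shape, and the rule set, the symbols (S, NP, VP, N, V), and all probabilities are hypothetical. The per-character tag probabilities in `tag_probs` merely stand in for the context-dependent probabilities that a forward-backward tagger would supply.

```python
from collections import defaultdict

# Hypothetical toy grammar; NOT the paper's rule set or tagset.
BINARY = {  # (left child, right child) -> [(parent, rule probability)]
    ("NP", "VP"): [("S", 1.0)],
    ("V", "NP"): [("VP", 0.6)],
}
UNARY = {  # child -> [(parent, rule probability)]
    "N": [("NP", 1.0)],
    "V": [("VP", 0.4)],
}


def apply_unary(cell):
    """Close one chart cell under unary rules, keeping the best score per symbol."""
    agenda = list(cell)
    while agenda:
        child = agenda.pop()
        child_score = cell[child][0]
        for parent, p in UNARY.get(child, []):
            score = p * child_score
            if score > cell.get(parent, (0.0, None))[0]:
                cell[parent] = (score, (child,))  # 1-tuple marks a unary step
                agenda.append(parent)


def cky(tag_probs):
    """Fill a Viterbi chart; chart[i, j][sym] = (best score, backpointer)."""
    n = len(tag_probs)
    chart = defaultdict(dict)
    # Width-1 spans: seed with tag probabilities (assumed to come from a tagger).
    for i, probs in enumerate(tag_probs):
        cell = chart[i, i + 1]
        for tag, p in probs.items():
            cell[tag] = (p, None)
        apply_unary(cell)
    # Wider spans: combine two adjacent sub-spans with binary rules.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = chart[i, j]
            for k in range(i + 1, j):
                for left, (ls, _) in chart[i, k].items():
                    for right, (rs, _) in chart[k, j].items():
                        for parent, p in BINARY.get((left, right), []):
                            score = p * ls * rs
                            if score > cell.get(parent, (0.0, None))[0]:
                                cell[parent] = (score, (k, left, right))
            apply_unary(cell)
    return chart


def build_tree(chart, i, j, sym):
    """Follow backpointers to recover the best tree as nested tuples."""
    _, back = chart[i, j][sym]
    if back is None:  # a tag directly over position i
        return (sym, i)
    if len(back) == 1:  # unary rule
        return (sym, build_tree(chart, i, j, back[0]))
    k, left, right = back  # binary rule with split point k
    return (sym, build_tree(chart, i, k, left), build_tree(chart, k, j, right))


# Hypothetical tag probabilities for a three-character sentence.
tag_probs = [{"N": 0.9, "V": 0.1}, {"V": 0.8, "N": 0.2}, {"N": 1.0}]
chart = cky(tag_probs)
best_score, _ = chart[0, len(tag_probs)]["S"]
tree = build_tree(chart, 0, len(tag_probs), "S")
print(tree, best_score)
```

With only binary and unary rules, every chart cell is filled by one loop over split points plus a unary closure, so no rule binarization step is needed at parse time. Leaves in the returned tree are (tag, position) pairs rather than characters, since this sketch assumes the parser works on tag probabilities alone.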