A Chinese efficient analyser integrating word segmentation, part-of-speech tagging, partial parsing and full parsing

Authors:
GuoDong Zhou;Jian Su
Affiliations:
Institute for Infocomm Research, Singapore;Institute for Infocomm Research, Singapore
Venue:
SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
Year:
2003

Citing 5
Cited 2

Procedure for quantitatively comparing the syntactic coverage of English grammars

HLT '91 Proceedings of the workshop on Speech and Natural Language
Three generative, lexicalised models for statistical parsing

ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Introduction to the CoNLL-2000 shared task: chunking

ConLL '00 Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7
Hybrid text chunking

ConLL '00 Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7
Error-driven HMM-based chunk tagger with context-dependent lexicon

EMNLP '00 Proceedings of the 2000 Joint SIGDAT conference on Empirical methods in natural language processing and very large corpora: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 13

Parsing the internal structure of words: a new paradigm for Chinese word segmentation

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
A lexicon-constrained character model for chinese morphological analysis

IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper introduces an efficient analyser for the Chinese language, which efficiently and effectively integrates word segmentation, part-of-speech tagging, partial parsing and full parsing. The Chinese efficient analyser is based on a Hidden Markov Model (HMM) and an HMM-based tagger. That is, all the components are based on the same HMM-based tagging engine. One advantage of using the same single engine is that it largely decreases the code size and makes the maintenance easy. Another advantage is that it is easy to optimise the code and thus improve the speed while speed plays a critical important role in many applications. Finally, the performances of all the components can benefit from the optimisation of existing algorithms and/or adoption of better algorithms to a single engine. Experiments show that all the components can achieve state-of-art performances with high efficiency for the Chinese language.