Chinese word segmentation based on maximum matching and word binding force

Authors:
Pak-kwong Wong;Chorkin Chan
Affiliations:
The University of Hong Kong, Hong Kong;The University of Hong Kong, Hong Kong
Venue:
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
Year:
1996

Abstract

A Chinese word segmentation algorithm based on forward maximum matching and word binding force is proposed in this paper. The algorithm plays a key role in post-processing the output of a character or speech recognizer: it determines the proper word sequence corresponding to an input line of character images or a speech waveform. To support the algorithm, a text corpus of over 63 million characters is used to enrich an 80,000-word lexicon in terms of its word entries and word binding forces. At present, given an input line of text, the word segmentor processes 210,000 characters per second on average when running on an IBM RISC System/6000 3BT workstation, with a correct word identification rate of 99.74%.
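
To make the forward maximum matching component concrete, the following is a minimal sketch of the generic technique, not the authors' implementation. The function name `forward_maximum_match`, the `max_word_len` parameter, and the four-entry toy lexicon are assumptions made for illustration. The abstract does not say how word binding force is computed, so that part is left out.

```python
def forward_maximum_match(text, lexicon, max_word_len=4):
    """Greedy left-to-right segmentation (illustrative sketch only).

    At each position, take the longest substring found in the lexicon;
    fall back to a single character when nothing matches.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted, so the scan always advances.
            if length == 1 or candidate in lexicon:
                words.append(candidate)
                i += length
                break
    return words


# Hypothetical toy lexicon; the paper's lexicon has about 80,000 entries.
toy_lexicon = {"研究", "研究生", "生命", "起源"}
segments = forward_maximum_match("研究生命起源", toy_lexicon)
print(segments)  # ['研究生', '命', '起源']
```

In this toy case the greedy match picks 研究生 first and strands 命, whereas the intended reading is 研究 / 生命 / 起源. Errors of this kind are a well-known weakness of pure maximum matching, and they suggest why a matcher might be paired with an additional statistic such as word binding force.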