Word identification for Mandarin Chinese sentences

Authors:
Keh-Jiann Chen;Shing-Huan Liu
Affiliations:
Institute of Information Science, Academia Sinica;Institute of Information Science, Academia Sinica
Venue:
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 1
Year:
1992

Abstract

Chinese sentences are composed of strings of characters with no blanks to mark word boundaries. However, the basic unit for sentence parsing and understanding is the word. Therefore, the first step in processing Chinese sentences is to identify the words. The difficulties of identifying words include (1) identifying complex words, such as Determinative-Measure compounds, reduplications, and derived words; (2) identifying proper names; and (3) resolving ambiguous segmentations. In this paper, we propose possible solutions to these difficulties. We adopt a matching algorithm with 6 different heuristic rules to resolve the ambiguities and achieve a success rate of 99.77%. The statistical data support the conclusion that maximal matching is the most effective heuristic.
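To make the idea of maximal matching concrete, the following is a minimal sketch of a generic greedy forward maximal-matching segmenter. It is not the authors' algorithm: the abstract does not spell out the six heuristic rules, so none of them are reproduced here. The function name `max_match`, the `max_len` parameter, the toy lexicon, and the example sentence are all assumptions made for illustration.

```python
def max_match(sentence, lexicon, max_len=4):
    """Greedy forward maximal matching (illustrative sketch, not the paper's method).

    At each position, take the longest substring (up to max_len characters)
    that appears in the lexicon; fall back to a single character otherwise.
    """
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first and shrink until a lexicon entry matches.
        for j in range(min(len(sentence), i + max_len), i, -1):
            if sentence[i:j] in lexicon:
                break
        else:
            j = i + 1  # nothing matched: emit one character as its own token
        words.append(sentence[i:j])
        i = j
    return words


# Hypothetical toy lexicon and sentence, chosen only to show an ambiguity.
lexicon = {"研究", "研究生", "生命", "起源", "的"}
print(max_match("研究生命的起源", lexicon))
# → ['研究生', '命', '的', '起源']
```

In this toy case the greedy choice of 研究生 ("graduate student") blocks the intended reading 研究 / 生命 / 的 / 起源 ("study the origin of life"). Overlapping candidates of this kind are an instance of the ambiguous segmentations listed as difficulty (3), which plain longest-match alone does not resolve.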