Chinese word segmentation is the first step in any Chinese NLP system. This paper presents a new algorithm for segmenting Chinese text without using any lexicon or hand-crafted linguistic resources. The statistics the algorithm requires, namely the mutual information and the difference of t-scores between characters, are derived automatically from raw Chinese corpora. A preliminary experiment shows that the segmentation accuracy of the algorithm is acceptable. We hope this approach will help improve the performance of existing segmenters, especially their ability to cope with unknown words and to adapt to different domains, although the algorithm can also be used on its own as a stand-alone segmenter in some NLP applications.
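The abstract names the two statistics but not their formulas or how they are combined, so the following is only a minimal sketch under stated assumptions. Character and adjacent-pair counts come from raw, unsegmented text. Mutual information is the standard pointwise form log2 p(xy) / (p(x) p(y)). The t-score of a character y between neighbours x and z is assumed to be (p(z|y) - p(y|x)) / sqrt(var(p(z|y)) + var(p(y|x))), with each variance approximated as f(ab) / f(a)^2. The "difference of t-score" at the gap x:y in context v x y w is assumed to be ts(v, x, y) - ts(x, y, w). The break rule (split where MI + weight * dts < threshold) and all names (`train_stats`, `mutual_info`, `t_score`, `dts`, `segment`, `threshold`, `weight`) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def train_stats(corpus):
    """Count characters and adjacent character pairs in raw, unsegmented text."""
    uni, bi = Counter(), Counter()
    for sent in corpus:
        uni.update(sent)
        bi.update(sent[i:i + 2] for i in range(len(sent) - 1))
    return uni, bi


def mutual_info(x, y, uni, bi):
    """Pointwise mutual information log2 p(xy) / (p(x) p(y)) of adjacent characters."""
    if bi[x + y] == 0:
        return float("-inf")  # never seen together: strongest evidence for a break
    n_uni, n_bi = sum(uni.values()), sum(bi.values())
    p_xy = bi[x + y] / n_bi
    return math.log2(p_xy / ((uni[x] / n_uni) * (uni[y] / n_uni)))


def t_score(x, y, z, uni, bi):
    """HYPOTHETICAL t-score of y between x and z: positive if y leans toward z."""
    if uni[x] == 0 or uni[y] == 0:
        return 0.0
    p_left = bi[x + y] / uni[x]   # p(y|x)
    p_right = bi[y + z] / uni[y]  # p(z|y)
    # Assumed variance approximation: f(ab) / f(a)^2 for each conditional probability.
    var = bi[x + y] / uni[x] ** 2 + bi[y + z] / uni[y] ** 2
    return 0.0 if var == 0 else (p_right - p_left) / math.sqrt(var)


def dts(text, i, uni, bi):
    """HYPOTHETICAL t-score difference at the gap between text[i] and text[i+1]."""
    x, y = text[i], text[i + 1]
    # ts(v, x, y) > 0: x leans right toward y; ts(x, y, w) < 0: y leans left toward x.
    # Both raise dts, i.e. favour keeping x and y in one word.
    left = t_score(text[i - 1], x, y, uni, bi) if i > 0 else 0.0
    right = t_score(x, y, text[i + 2], uni, bi) if i + 2 < len(text) else 0.0
    return left - right


def segment(text, uni, bi, threshold=0.0, weight=1.0):
    """Split between adjacent characters whose combined score is below threshold."""
    if not text:
        return []
    words, start = [], 0
    for i in range(len(text) - 1):
        score = mutual_info(text[i], text[i + 1], uni, bi) + weight * dts(text, i, uni, bi)
        if score < threshold:
            words.append(text[start:i + 1])
            start = i + 1
    words.append(text[start:])
    return words


corpus = ["我们学习", "我们工作", "学习中文", "他们工作"]
uni, bi = train_stats(corpus)
result = segment("我们中文", uni, bi)  # → ['我们', '中文']
```

In this toy corpus, the pair 们中 never occurs, so its mutual information is minus infinity and the sketch always breaks there. With realistic corpora, `threshold` and `weight` would need tuning on held-out data, and the paper's actual decision procedure may differ from this additive rule.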