Given the lack of word delimiters in written Japanese, word segmentation is generally considered a crucial first step in processing Japanese texts. Typical Japanese segmentation algorithms rely either on a lexicon and grammar or on pre-segmented data. In contrast, we introduce a novel statistical method utilizing unsegmented training data, with performance on kanji sequences comparable to and sometimes surpassing that of morphological analyzers over a variety of error metrics.