An Unsupervised Algorithm for Segmenting Categorical Timeseries into Episodes
Proceedings of the ESF Exploratory Workshop on Pattern Detection and Discovery
Accessor variety criteria for Chinese word extraction
Computational Linguistics
Contextual dependencies in unsupervised word segmentation
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Unsupervised segmentation of Chinese text by use of branching entropy
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Bayesian unsupervised word segmentation with nested Pitman-Yor language modeling
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
A fast decoder for joint word segmentation and POS-tagging using a single discriminative model
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
A new unsupervised approach to word segmentation
Computational Linguistics
Entropy as an indicator of context boundaries: an experiment using a web search engine
IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing
Hi-index | 0.00 |
In this paper, we present an unsupervized segmentation system tested on Mandarin Chinese. Following Harris's Hypothesis in Kempe (1999) and Tanaka-Ishii's (2005) reformulation, we base our work on the Variation of Branching Entropy. We improve on (Jin and Tanaka-Ishii, 2006) by adding normalization and viterbi-decoding. This enable us to remove most of the thresholds and parameters from their model and to reach near state-of-the-art results (Wang et al., 2011) with a simpler system. We provide evaluation on different corpora available from the Segmentation bake-off II (Emerson, 2005) and define a more precise topline for the task using cross-trained supervized system available off-the-shelf (Zhang and Clark, 2010; Zhao and Kit, 2008; Huang and Zhao, 2007)