We empirically test Harris's hypothesis that morpheme and word boundaries can be detected from changes in the complexity of phoneme sequences. We reformulate the hypothesis from an information-theoretic viewpoint and use a corpus to test whether it holds. The hypothesis holds for morphemes in both English and Chinese, with an F-score of about 80%. For word boundaries, however, English and Chinese yield contrary results, which reflects a difference in the nature of the two languages.
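The abstract does not spell out the information-theoretic reformulation. One common reading of Harris's idea is branching entropy: the entropy of the symbol that follows a context, with a boundary hypothesized wherever that entropy rises as the context is extended by one symbol. The Python sketch below assumes that criterion. It is an illustration, not the authors' exact procedure. The function names (`successor_counts`, `branching_entropy`, `boundaries`), the maximum context length `max_n`, and the toy corpus are hypothetical, and characters stand in for phonemes.

```python
import math
from collections import Counter, defaultdict


def successor_counts(corpus, max_n):
    """For every context of length 1..max_n, count the symbols that follow it.
    (Hypothetical helper; characters stand in for phonemes.)"""
    table = defaultdict(Counter)
    for text in corpus:
        for n in range(1, max_n + 1):
            for i in range(len(text) - n):
                table[text[i:i + n]][text[i + n]] += 1
    return table


def branching_entropy(successors):
    """Shannon entropy in bits of the successor distribution of one context."""
    total = sum(successors.values())
    if total == 0:
        return 0.0  # unseen context: no successor evidence
    return -sum((c / total) * math.log2(c / total) for c in successors.values())


def boundaries(text, table, max_n):
    """Assumed criterion: propose a boundary at offset `end` whenever extending
    the context text[start:end - 1] to text[start:end] raises the entropy."""
    cuts = set()
    for start in range(len(text)):
        prev = None
        for end in range(start + 1, min(start + max_n, len(text)) + 1):
            h = branching_entropy(table.get(text[start:end], Counter()))
            if prev is not None and h > prev:
                cuts.add(end)
            prev = h
    return sorted(cuts)


corpus = ["thecat", "thedog", "thepig"]
table = successor_counts(corpus, max_n=3)
print(boundaries("thecat", table, max_n=3))  # [3], i.e. "the|cat"
```

In this toy corpus, "t", "th", and "h" each have a single possible successor (entropy 0), while "the" can be followed by "c", "d", or "p" (entropy log2 3). The jump in uncertainty after "the" is what places the cut. Using entropy rather than a raw count of distinct successors weights each continuation by how often it occurs.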