From phoneme to morpheme: another verification using a corpus

Authors:
Kumiko Tanaka-Ishii; Zhihui Jin
Affiliations:
Graduate School of Information Science and Technology, University of Tokyo; Graduate School of Information Science and Technology, University of Tokyo
Venue:
ICCPOL'06 Proceedings of the 21st international conference on Computer Processing of Oriental Languages: beyond the orient: the research challenges ahead
Year:
2006

Abstract

We empirically test Harris's hypothesis that morpheme and word boundaries can be detected from changes in the complexity of phoneme sequences. We reformulate the hypothesis from an information-theoretic viewpoint and use a corpus to test whether it holds. We found that the hypothesis holds for morphemes, with an F-score of about 80% in both English and Chinese. For word boundaries, however, we obtained contrary results for English and Chinese, which reflects a difference in the nature of the two languages.
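
The abstract does not spell out the information-theoretic formulation, so the following is only a minimal sketch of one common reading of the idea: estimate the entropy of the next symbol given the preceding context, and propose a boundary wherever that entropy rises. The function names, the fixed context length `n`, and the toy corpus are illustrative assumptions, not the authors' method or data.

```python
import math
from collections import Counter, defaultdict


def successor_entropy(corpus, max_n=3):
    """Map each context of 1..max_n symbols to the entropy (in bits)
    of the symbol that follows it in the corpus."""
    followers = defaultdict(Counter)
    for text in corpus:
        for i in range(len(text)):
            for n in range(1, max_n + 1):
                if i + n < len(text):
                    followers[text[i:i + n]][text[i + n]] += 1
    entropy = {}
    for context, counts in followers.items():
        total = sum(counts.values())
        entropy[context] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy


def boundaries_by_entropy_rise(text, entropy, n=2):
    """Return positions i where the entropy after the n-symbol context
    ending at i is higher than at the previous position.
    Assumption: a fixed context length n, chosen for simplicity."""
    cuts = []
    prev = None
    for i in range(n, len(text)):
        h = entropy.get(text[i - n:i])
        if h is not None and prev is not None and h > prev:
            cuts.append(i)
        prev = h
    return cuts


# Toy corpus (hypothetical): "the" is followed by three different symbols,
# so uncertainty about the next symbol rises right after it.
corpus = ["thecat", "thedog", "thepig"]
entropy = successor_entropy(corpus)
cuts = boundaries_by_entropy_rise("thecat", entropy)
```

On this toy corpus the only proposed cut falls between "the" and "cat". A real test, as in the paper, would run over a large phoneme or character corpus and score the proposed boundaries against gold-standard morpheme and word boundaries.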