Normalized Cuts and Image Segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence
Using eigenvectors of the bigram graph to infer morpheme identity
MPL '02 Proceedings of the ACL-02 workshop on Morphological and phonological learning - Volume 6
Without prior knowledge, distinguishing between languages can be a hard task, especially when their borders are permeable. We develop an extension of spectral clustering, a powerful unsupervised classification toolbox, which is shown to accurately resolve the task of soft language distinction. At the heart of our approach, we replace the usual hard membership assignment of spectral clustering with a soft, probabilistic assignment, which has the added advantage of bypassing a well-known complexity bottleneck of the method. Experiments with a readily available system demonstrate the potential of the method, which yields a visually appealing soft distinction between the languages that may together make up a whole corpus.
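To make the idea of a soft membership concrete, here is a minimal sketch of a two-cluster spectral embedding whose output is a probability per item rather than a hard label. It is not the algorithm of the paper, whose details the abstract does not give. The function name `soft_spectral_memberships`, the `temperature` parameter, the logistic mapping of the second eigenvector, and the toy affinity matrix are all assumptions made for illustration.

```python
import numpy as np


def soft_spectral_memberships(W, temperature=1.0):
    """Soft two-cluster memberships from a symmetric affinity matrix W.

    Illustrative assumption: memberships are a logistic function of the
    standardized second eigenvector. Assumes a connected graph in which
    every node has positive degree.
    """
    W = np.asarray(W, dtype=float)
    degrees = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degrees)

    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; column 1 is the
    # eigenvector of the second-smallest eigenvalue.
    _, eigvecs = np.linalg.eigh(L)
    embedding = eigvecs[:, 1] * d_inv_sqrt

    # Standardize, then squash to (0, 1) instead of thresholding at zero.
    z = (embedding - embedding.mean()) / (embedding.std() + 1e-12)
    p = 1.0 / (1.0 + np.exp(-z / temperature))
    return np.column_stack([p, 1.0 - p])


# Toy affinity matrix (hypothetical data): two groups of three items,
# strongly linked within a group and weakly linked across groups.
W = np.full((6, 6), 0.05)
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)

P = soft_spectral_memberships(W)
print(P.round(3))
```

Each row of `P` sums to one and can be read as a degree of belonging to each of the two groups. Which column corresponds to which group is arbitrary, because an eigenvector is only defined up to sign. A lower `temperature` pushes the memberships toward a hard assignment.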