A Hybrid Approach t Word Segmentation
ILP '98 Proceedings of the 8th International Workshop on Inductive Logic Programming
Hi-index | 0.00 |
We report on experiments which demonstrate that by abductive inference it is possible to learn enough simple phonotactics to distinguish words from non-words for a simplified set of Dutch, the monosyllables. The monosyllables are distinguished in input so that segmentation is not problematic. Frequency information is withheld as is negative data. The methods are all tested using ten-fold cross-validation as well as a fixed number of randomly generated strings. Orthographic and phonetic representations are compared. The work presented in this chapter is part of a larger project comparing different machine learning techniques on linguistic data.