Many unsupervised learning methods for recognising patterns in data streams assume fixed-length data sequences, which makes them unsuitable for applications where sequences vary in length, such as speech recognition, behaviour recognition and text classification. Applying these methods to variable-length sequences requires a pre-processing step in which the data are manually segmented and appropriate features are selected, which is often impractical in real-world applications. In this paper we propose an unsupervised learning method that handles variable-length data sequences by identifying structure in the data stream using text compression and the edit distance between 'words'. We demonstrate that this method can automatically cluster unlabelled data in a data stream and perform segmentation. We evaluate the proposed method on both fixed-length and variable-length benchmark datasets, comparing it to the Self-Organising Map on the fixed-length data. The results show a promising improvement over baseline recognition systems.
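To make the two ingredients named in the abstract concrete, the following is a minimal sketch rather than the method evaluated in the paper. It assumes the data stream has already been encoded as a string of symbols, and it uses an LZ78-style dictionary parse as a stand-in for the text compression step, since the abstract does not say which compressor is used. The function names `lz78_phrases` and `edit_distance` are hypothetical and chosen only for illustration.

```python
def lz78_phrases(stream):
    """Split a symbol string into LZ78-style phrases (candidate 'words').

    Assumption: LZ78-style parsing stands in for the unspecified
    text compression step mentioned in the abstract.
    """
    seen = set()
    phrases = []
    current = ""
    for symbol in stream:
        current += symbol
        if current not in seen:
            # First time this phrase occurs: record it and start a new one.
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # Trailing phrase that was already in the dictionary.
        phrases.append(current)
    return phrases


def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, symbol_a in enumerate(a, start=1):
        current = [i]
        for j, symbol_b in enumerate(b, start=1):
            cost = 0 if symbol_a == symbol_b else 1
            current.append(min(
                previous[j] + 1,         # delete symbol_a
                current[j - 1] + 1,      # insert symbol_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]
```

For example, `edit_distance("kitten", "sitting")` is 3. Phrases returned by `lz78_phrases` that lie within a small edit distance of one another could then be treated as noisy variants of the same underlying pattern; how the paper actually groups them is not stated in the abstract.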