An Efficient, Probabilistically Sound Algorithm for Segmentation andWord Discovery
Machine Learning - Special issue on natural language learning
Information Retrieval
Discovery of Frequent Episodes in Event Sequences
Data Mining and Knowledge Discovery
SPIRIT: Sequential Pattern Mining with Regular Expression Constraints
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
A compression-based algorithm for Chinese word segmentation
Computational Linguistics
Mostly-unsupervised statistical segmentation of Japanese: applications to kanji
NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Identifying hierarchical structure in sequences: a linear-time algorithm
Journal of Artificial Intelligence Research
Voting experts: An unsupervised algorithm for segmenting sequences
Intelligent Data Analysis
Features for learning local patterns in time-stamped data
LPD'04 Proceedings of the 2004 international conference on Local Pattern Detection
Mining semantic structures in movies
INAP'04/WLP'04 Proceedings of the 15th international conference on Applications of Declarative Programming and Knowledge Management, and 18th international conference on Workshop on Logic Programming
New malicious code detection using variable length n-grams
ICISS'06 Proceedings of the Second international conference on Information Systems Security
Episode based masquerade detection
ICISS'05 Proceedings of the First international conference on Information Systems Security
Unsupervized word segmentation: the case for Mandarin Chinese
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
Hi-index | 0.00 |
This paper describes an unsupervised algorithm for segmenting categorical time series into episodes. The Voting-Experts algorithm first collects statistics about the frequency and boundary entropy of ngrams, then passes a window over the series and has two "expert methods" decide where in the window boundaries should be drawn. The algorithm successfully segments text into words in four languages. The algorithm also segments time series of robot sensor data into subsequences that represent episodes in the life of the robot. We claim that VOTING-EXPERTS finds meaningful episodes in categorical time series because it exploits two statistical characteristics of meaningful episodes.