We describe a statistical signature of chunks and an algorithm for finding chunks. While there is no formal definition of chunks, they may be reliably identified as configurations with low internal entropy (unpredictability) and high entropy at their boundaries. We show that the log frequency of a chunk is a measure of its internal entropy. The Voting-Experts algorithm exploits the signature of chunks to find word boundaries in text from four languages and episode boundaries in the activities of a mobile robot.
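The chunk signature described above can be illustrated with a small sketch. This is not the Voting-Experts algorithm itself, only a minimal illustration under stated assumptions: the function names (`ngram_stats`, `internal_entropy`, `boundary_entropy`), the fixed n-gram length, and the toy input string are all hypothetical. Internal entropy is approximated as the negative log of an n-gram's relative frequency, and boundary entropy as the entropy of the distribution of symbols that follow the n-gram.

```python
import math
from collections import Counter, defaultdict


def ngram_stats(seq, n):
    """Count every n-gram in seq and the symbols that follow each one."""
    counts = Counter()
    successors = defaultdict(Counter)
    for i in range(len(seq) - n + 1):
        gram = seq[i:i + n]
        counts[gram] += 1
        if i + n < len(seq):
            successors[gram][seq[i + n]] += 1
    return counts, successors


def internal_entropy(gram, counts):
    """Negative log2 of the n-gram's relative frequency (assumed proxy)."""
    total = sum(counts.values())
    return -math.log2(counts[gram] / total)


def boundary_entropy(gram, successors):
    """Entropy (bits) of the distribution of symbols following the n-gram."""
    following = successors.get(gram, Counter())
    total = sum(following.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in following.values())


# Toy, unsegmented input (hypothetical example data).
text = "thecatthedogthepig"
counts, successors = ngram_stats(text, 3)

for gram in ("the", "hec"):
    print(gram,
          round(internal_entropy(gram, counts), 3),
          round(boundary_entropy(gram, successors), 3))
```

In this toy input, the frequent n-gram `the` has lower internal entropy and higher boundary entropy than `hec`, which straddles a word boundary; that contrast is the signature the abstract refers to.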