In this paper, we describe techniques for learning probabilistic k-testable tree models, a generalization of the well-known k-gram models, which can be used to compress or classify structured data. These models are easy to infer from samples and allow for incremental updates. Moreover, as shown here, back-off schemes can be defined to address data sparseness, a problem that often arises when trees are used to represent the data. These features make the models suitable for compressing structured data files at a better rate than string-based methods.
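
To illustrate how such a model can be inferred from samples and updated incrementally, the following is a minimal sketch for the simplest case, k = 2, where the probability of each node's sequence of child labels depends only on the node's own label. It is not the method of the paper: the class name `TwoTestableTreeModel`, the `(label, children)` tree encoding, and the plain relative-frequency estimates are assumptions made for this example, and no back-off scheme is included, so unseen events receive probability zero.

```python
import math
from collections import Counter, defaultdict

# Assumed encoding: a tree is a (label, children) pair,
# where children is a list of trees (empty for a leaf).


class TwoTestableTreeModel:
    """Hypothetical k = 2 tree model estimated by relative frequencies."""

    def __init__(self):
        self.root_counts = Counter()
        # label -> Counter over tuples of child labels
        self.rule_counts = defaultdict(Counter)

    def update(self, tree):
        """Add one sample tree; the counts can be updated incrementally."""
        self.root_counts[tree[0]] += 1
        stack = [tree]
        while stack:
            label, children = stack.pop()
            expansion = tuple(child[0] for child in children)
            self.rule_counts[label][expansion] += 1
            stack.extend(children)

    def log2_prob(self, tree):
        """Base-2 log probability of a tree; -inf if any event is unseen."""
        root_count = self.root_counts[tree[0]]
        if root_count == 0:
            return -math.inf
        logp = math.log2(root_count / sum(self.root_counts.values()))
        stack = [tree]
        while stack:
            label, children = stack.pop()
            expansions = self.rule_counts.get(label)
            if not expansions:
                return -math.inf
            expansion = tuple(child[0] for child in children)
            count = expansions[expansion]
            if count == 0:
                return -math.inf
            logp += math.log2(count / sum(expansions.values()))
            stack.extend(children)
        return logp

    def code_length_bits(self, tree):
        """Ideal code length in bits, i.e. -log2 of the tree probability."""
        return -self.log2_prob(tree)


model = TwoTestableTreeModel()
seen = ("a", [("b", []), ("c", [])])
model.update(seen)
model.update(seen)
model.update(("a", [("b", [])]))

unseen = ("a", [("c", [])])
print(model.code_length_bits(seen))    # finite: every event was observed
print(model.code_length_bits(unseen))  # infinite: expansion never observed
```

In this sketch the unseen tree gets an infinite code length, which is the data sparseness problem in its simplest form; a back-off scheme would replace the zero estimate with a probability derived from a less specific model.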