Smoothing and compression with stochastic k-testable tree languages

Authors:
Juan Ramón Rico-Juan;Jorge Calera-Rubio;Rafael C. Carrasco
Affiliations:
Departament de Llenguatges i Sistemes Informàtics, Universitat d'Alacant, E-03071 Alacant, Spain;Departament de Llenguatges i Sistemes Informàtics, Universitat d'Alacant, E-03071 Alacant, Spain;Departament de Llenguatges i Sistemes Informàtics, Universitat d'Alacant, E-03071 Alacant, Spain
Venue:
Pattern Recognition
Year:
2005



Abstract

In this paper, we describe techniques for learning probabilistic k-testable tree models, a generalization of the well-known k-gram models, which can be used to compress or classify structured data. These models are easy to infer from samples and allow for incremental updates. Moreover, as shown here, backing-off schemes can be defined to alleviate data sparseness, a problem that often arises when trees are used to represent the data. These features make the models suitable for compressing structured data files at a better rate than string-based methods.
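
As an illustration only, the following minimal sketch shows how a tree model of this kind might be estimated from counts and updated incrementally. It is not the authors' implementation. It assumes the simplest case, k = 2, in which each node is described by its own label together with the labels of its children, and it uses plain relative frequencies with no backing-off. The class name `TwoTestableTreeModel`, its methods, and the nested-tuple tree encoding are hypothetical choices made for this example.

```python
from collections import Counter


class TwoTestableTreeModel:
    """Hypothetical sketch: relative-frequency tree model for the k = 2 case."""

    def __init__(self):
        self.root_counts = Counter()   # how often each label appears at the root
        self.rule_counts = Counter()   # how often (label, child labels) is seen
        self.label_counts = Counter()  # how often each label is expanded

    def _rules(self, tree):
        # A tree is a nested tuple: (label, child_1, ..., child_m).
        label, children = tree[0], tree[1:]
        yield (label, tuple(child[0] for child in children))
        for child in children:
            yield from self._rules(child)

    def update(self, tree):
        # Incremental update: adding a sample only increments counters.
        self.root_counts[tree[0]] += 1
        for rule in self._rules(tree):
            self.rule_counts[rule] += 1
            self.label_counts[rule[0]] += 1

    def probability(self, tree):
        # Product of the root probability and one rule probability per node.
        total_roots = sum(self.root_counts.values())
        if total_roots == 0:
            return 0.0
        p = self.root_counts[tree[0]] / total_roots
        for rule in self._rules(tree):
            seen = self.label_counts[rule[0]]
            if seen == 0:
                return 0.0
            p *= self.rule_counts[rule] / seen
        return p


model = TwoTestableTreeModel()
model.update(("S", ("a",), ("b",)))
model.update(("S", ("a",), ("a",)))

print(model.probability(("S", ("a",), ("b",))))  # 0.5
print(model.probability(("S", ("b",), ("b",))))  # 0.0, configuration never seen
```

The second query returns zero because the configuration of S with two b children never occurred in the sample. This is the data-sparseness problem that the backing-off schemes mentioned in the abstract are meant to address; this sketch makes no attempt to reproduce them.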