Dependency trees in sub-linear time and bounded memory

Authors:
Dan Pelleg; Andrew Moore
Affiliations:
IBM Haifa Labs, Israel; Robotics Institute, Carnegie Mellon University
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2006



Abstract

We focus on the problem of learning dependency trees efficiently. Once grown, a dependency tree can serve as a special case of a Bayesian network, as an approximation to a probability density function (PDF), and in many other applications. Given the data, a well-known algorithm fits an optimal tree in time that is quadratic in the number of attributes and linear in the number of records. We show how to modify this algorithm to exploit partial knowledge about edge weights. Experimental results show a running time that is near-constant in the number of records, with no significant loss in the accuracy of the generated trees.
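
The abstract does not name the baseline algorithm. The sketch below assumes it is the classic Chow-Liu procedure: weight every attribute pair by its empirical mutual information, then take a maximum-weight spanning tree. It illustrates only the baseline cost (quadratic in attributes, linear in records), not the authors' modification. The function names `mutual_information` and `dependency_tree` and the tuple-of-discrete-values record layout are hypothetical choices made for this illustration.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(records, i, j):
    """Empirical mutual information (in nats) between attributes i and j."""
    n = len(records)
    count_i = Counter(r[i] for r in records)
    count_j = Counter(r[j] for r in records)
    count_ij = Counter((r[i], r[j]) for r in records)
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
    return mi


def dependency_tree(records):
    """Return the edges (i, j) of a maximum-weight spanning tree over attributes."""
    m = len(records[0])
    # Each pair needs a pass over all n records: O(m^2 * n) in total.
    weighted = sorted(
        ((mutual_information(records, i, j), i, j)
         for i, j in combinations(range(m), 2)),
        reverse=True,
    )
    parent = list(range(m))  # union-find forest for Kruskal's algorithm

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = []
    for _, i, j in weighted:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two components, so no cycle
            parent[root_i] = root_j
            edges.append((i, j))
    return edges


# Toy data (made up): attributes 0 and 1 always agree, attribute 2 is unrelated.
toy_records = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
toy_edges = dependency_tree(toy_records)
```

In this sketch every edge weight is computed exactly from all records before the tree is built. That full pass is the linear-in-records cost which, per the abstract, the modified algorithm avoids by working with partial knowledge about edge weights.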