This paper addresses a ubiquitous problem in data management: hierarchical model estimation for large sets of distributions. The problem arises in many applications, including classification, top-k query processing, clustering, and outlier detection. Our aim is to continuously and incrementally estimate the model parameters of 'typical' distributions that describe the characteristics of a database. Our approach handles arbitrary types of data (e.g., categorical and numerical data) incrementally, quickly, and with little resource consumption. Beyond incremental algorithms for model fitting, we propose a modeling framework in which the learning approach recognizes hierarchical groups of distributions with similar characteristics and updates the model parameters of each group separately, without scanning all the distributions in the database. It can therefore return a response, i.e., the parameters of typical distribution models, at an arbitrary level of granularity and at any time. Finally, we demonstrate the utility of the approach by applying it to two specific problems that arise in data management.
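To make the idea of incremental, per-group parameter updates concrete, the following is a minimal sketch and not the paper's algorithm. It assumes each group is summarized by a single Gaussian and uses Welford's online update as a stand-in for the model-fitting step. The class `GroupNode`, its fields, and the group names are hypothetical. Each observation updates only its own group and that group's ancestors, so no pass over the full database is needed, and parameters can be read at any level of the hierarchy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GroupNode:
    """Hypothetical hierarchy node holding running Gaussian statistics."""
    name: str
    parent: Optional["GroupNode"] = None
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def _absorb(self, x: float) -> None:
        # Welford's online update: constant time and memory per observation.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def observe(self, x: float) -> None:
        # Update this group and every ancestor; sibling groups are untouched.
        node: Optional["GroupNode"] = self
        while node is not None:
            node._absorb(x)
            node = node.parent

    @property
    def variance(self) -> float:
        # Population variance of everything seen so far in this subtree.
        return self.m2 / self.count if self.count > 0 else 0.0


# Assumed toy hierarchy: one root with two leaf groups.
root = GroupNode("all")
group_a = GroupNode("group_a", parent=root)
group_b = GroupNode("group_b", parent=root)

for x in (1.0, 2.0, 3.0):
    group_a.observe(x)
for x in (10.0, 12.0):
    group_b.observe(x)

# Parameters are available at any granularity, at any time.
print(group_a.mean, group_a.variance)
print(root.mean, root.variance)
```

Reading `root` gives a coarse summary of all data, while reading `group_a` or `group_b` gives a finer one. A real system would replace the single Gaussian with whatever model family suits the data, including categorical attributes.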