Clustering of distributed databases facilitates knowledge discovery by learning new concepts that characterise the common features of, and differences between, datasets. General patterns can therefore be learned, rather than learning being restricted to specific databases whose rules may not generalise. We cluster databases that hold aggregate count data on categorical attributes classified according to either homogeneous or heterogeneous classification schemes. Datasets are clustered via the probability distributions that describe their respective aggregates. The homogeneous case is straightforward, since all datasets share the same categories. For heterogeneous data we investigate a number of clustering strategies; the most efficient of these avoid the need to compute a dynamic shared ontology to homogenise the classification schemes before clustering.
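The homogeneous case can be illustrated with a minimal sketch. It assumes that each dataset is reduced to a vector of aggregate counts over one shared list of categories, and that datasets are grouped by fitting a finite mixture of multinomial distributions with EM. This is one plausible reading of "clustering via the probability distributions that describe the aggregates", not necessarily the authors' model. The names `normalise`, `farthest_first`, `cluster_count_tables` and the smoothing parameter `alpha` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normalise(counts: Sequence[float]) -> List[float]:
    """Turn an aggregate count vector into an empirical probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def farthest_first(dists: List[List[float]], k: int) -> List[List[float]]:
    """Pick k well-separated distributions (L1 distance) as initial cluster centres."""
    centres = [dists[0]]
    while len(centres) < k:
        def gap(d: List[float]) -> float:
            return min(sum(abs(a - b) for a, b in zip(d, c)) for c in centres)
        centres.append(max(dists, key=gap))
    return [list(c) for c in centres]


def cluster_count_tables(
    tables: List[Sequence[int]], k: int, iters: int = 50, alpha: float = 1.0
) -> Tuple[List[int], List[List[float]]]:
    """Hypothetical sketch: fit a k-component multinomial mixture to count tables via EM.

    Assumes every table uses the same (homogeneous) category list and alpha > 0.
    Returns a hard cluster label per table and each cluster's category distribution.
    """
    n_cat = len(tables[0])
    # Laplace-smoothed starting distributions, so no log(0) occurs in the E-step.
    theta = farthest_first([normalise([n + alpha for n in t]) for t in tables], k)
    weights = [1.0 / k] * k
    resp: List[List[float]] = []
    for _ in range(iters):
        # E-step: posterior probability that each table came from each cluster.
        # The multinomial coefficient is the same for every cluster, so it cancels.
        resp = []
        for t in tables:
            logp = [
                math.log(weights[c])
                + sum(n * math.log(theta[c][j]) for j, n in enumerate(t))
                for c in range(k)
            ]
            m = max(logp)  # log-sum-exp shift to avoid underflow
            ws = [math.exp(lp - m) for lp in logp]
            z = sum(ws)
            resp.append([w / z for w in ws])
        # M-step: re-estimate mixing weights and smoothed category probabilities.
        for c in range(k):
            r_c = [r[c] for r in resp]
            weights[c] = max(sum(r_c) / len(tables), 1e-12)
            expected = [sum(r * t[j] for r, t in zip(r_c, tables)) for j in range(n_cat)]
            total = sum(expected) + alpha * n_cat
            theta[c] = [(e + alpha) / total for e in expected]
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return labels, theta
```

For example, `cluster_count_tables([[80, 15, 5], [90, 8, 2], [5, 10, 85], [3, 12, 85]], k=2)` should place the first two datasets in one cluster and the last two in the other. The heterogeneous case is not covered here: the sketch only applies once all count tables are expressed over the same categories.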