The SaintEtiQ database summarization system provides multi-resolution summaries of structured data stored in a centralized database. Summaries are computed online with a conceptual hierarchical clustering algorithm. However, most companies work in distributed legacy environments, so the current centralized version of SaintEtiQ is either not feasible (because of privacy constraints) or not desirable (because of resource limitations). To address this problem, we propose new algorithms that generate a single summary hierarchy from two distinct hierarchies without scanning the raw data. The Greedy Merging Algorithm (GMA) takes all leaves of both hierarchies and generates the optimal partitioning of the considered data set with regard to a cost function (compactness and separation). A hierarchical organization of summaries is then built by agglomerating or dividing clusters, so that the cost function may emphasize either local or global patterns in the data. We thus obtain two different hierarchies, depending on the optimization performed. However, this approach does not scale because of its exponential time complexity. We therefore propose two alternative approaches whose time complexity is constant with respect to the number of data items. The first, called Merge by Incorporation Algorithm (MIA), relies on the SaintEtiQ engine, whereas the second, named Merge by Alignment Algorithm (MAA), rearranges summaries level by level in a top-down manner. We then compare these approaches using an original quality measure that quantifies how good the merged hierarchies are. Finally, an experimental study on real data sets shows that the merging processes (MIA and MAA) are efficient in terms of computational time.
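To make the idea of merging two hierarchies from their leaves alone more concrete, here is a minimal sketch. It is not the GMA, MIA, or MAA procedure described above: it assumes, purely for illustration, that each leaf summary is reduced to a hypothetical `(mean, count)` pair over one numeric attribute, and it replaces the compactness-and-separation cost function with a simple closest-pair criterion. The names `merge` and `greedy_merge` are invented for this example.

```python
from itertools import combinations


def merge(a, b):
    """Combine two leaf summaries given as (mean, count) pairs.

    Assumption: a summary is only a weighted mean and a record count;
    real SaintEtiQ summaries are richer than this.
    """
    (mean_a, n_a), (mean_b, n_b) = a, b
    n = n_a + n_b
    return ((mean_a * n_a + mean_b * n_b) / n, n)


def greedy_merge(leaves_a, leaves_b, k):
    """Pool the leaves of both hierarchies and greedily merge them.

    At each step the two clusters with the closest means are merged,
    until only k clusters remain. Only summaries are read, never raw data.
    The closest-pair rule is a stand-in for the paper's cost function.
    """
    clusters = list(leaves_a) + list(leaves_b)
    while len(clusters) > k:
        # Pick the pair of cluster indices whose means are nearest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: abs(clusters[p[0]][0] - clusters[p[1]][0]),
        )
        merged = merge(clusters[i], clusters[j])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)


# Two sites, each holding two leaf summaries of the same attribute.
site_1 = [(1.0, 2), (10.0, 1)]
site_2 = [(2.0, 2), (11.0, 3)]
result = greedy_merge(site_1, site_2, k=2)
```

In this toy setting the merge only touches per-leaf statistics, which is the property the abstract emphasizes: the cost depends on the number of summaries, not on the number of underlying data items.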