To speed up clustering algorithms, data summarization methods have been proposed that first summarize the data set by computing suitable representative objects. A clustering algorithm is then applied to these representatives only, and a clustering structure for the whole data set is derived from the result for the representatives. Most previous methods, however, are limited in their application domain: they are generally based on sufficient statistics, such as the linear sum of a set of points, which assume that the data come from a vector space. Yet in many important applications the data come from a non-vector metric space, and only the distances between objects can be exploited to construct effective data summarizations. In this paper, we develop a new data summarization method that is based only on distance information and can therefore be applied directly to non-vector data. An extensive performance evaluation shows that our method is very effective in finding the hierarchical clustering structure of non-vector data using only a very small number of data summarizations, resulting in a large reduction in runtime while sacrificing very little clustering quality.
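To make the summarize-then-cluster idea concrete for non-vector data, here is a minimal sketch that touches objects only through a distance function. It is not the authors' method. The helper names (`levenshtein`, `pick_seeds`, `summarize`, `single_link`, `cluster_labels`, `Summary`) are hypothetical, and so are the design choices: strings under edit distance as the metric space, farthest-first traversal to pick seeds, a medoid (an actual data object, not a mean vector) as each group's representative, and single-link clustering on representative-to-representative distances.

```python
from dataclasses import dataclass
from typing import Callable, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance: a metric on strings, which have no vector representation."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


@dataclass
class Summary:
    rep: object          # medoid of the group: a real data object (assumption)
    members: List[int]   # indices of the objects this summary stands for
    extent: float        # average distance from rep to members (assumption)


def pick_seeds(objects, k, dist) -> List[int]:
    """Hypothetical seeding by farthest-first traversal; uses only distances."""
    seeds = [0]
    nearest = [dist(o, objects[0]) for o in objects]
    while len(seeds) < min(k, len(objects)):
        nxt = max(range(len(objects)), key=nearest.__getitem__)
        if nearest[nxt] == 0:  # only duplicates of existing seeds remain
            break
        seeds.append(nxt)
        nearest = [min(d, dist(o, objects[nxt])) for d, o in zip(nearest, objects)]
    return seeds


def summarize(objects, k, dist) -> List[Summary]:
    """Assign every object to its nearest seed (n * k distance calls)."""
    seeds = pick_seeds(objects, k, dist)
    groups = {s: [] for s in seeds}
    for i, o in enumerate(objects):
        groups[min(seeds, key=lambda s: dist(o, objects[s]))].append(i)
    summaries = []
    for members in groups.values():
        # Medoid: the member with the smallest total distance to the others.
        rep_i = min(members, key=lambda m: sum(dist(objects[m], objects[j]) for j in members))
        extent = sum(dist(objects[rep_i], objects[j]) for j in members) / len(members)
        summaries.append(Summary(objects[rep_i], members, extent))
    return summaries


def single_link(summaries, dist, num_clusters) -> List[List[int]]:
    """Naive agglomerative single-link clustering on the representatives only."""
    clusters = [[i] for i in range(len(summaries))]
    d = [[dist(a.rep, b.rep) for b in summaries] for a in summaries]
    while len(clusters) > num_clusters:
        pairs = ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters)))
        x, y = min(pairs, key=lambda p: min(d[i][j] for i in clusters[p[0]] for j in clusters[p[1]]))
        clusters[x].extend(clusters.pop(y))  # y > x, so index x stays valid
    return clusters


def cluster_labels(objects, k: int, num_clusters: int, dist: Callable) -> List[int]:
    """Cluster the summaries, then propagate each label to the summarized objects."""
    summaries = summarize(objects, k, dist)
    labels = [-1] * len(objects)
    for label, cluster in enumerate(single_link(summaries, dist, num_clusters)):
        for s in cluster:
            for i in summaries[s].members:
                labels[i] = label
    return labels


words = ["apple", "apples", "appel", "aple", "banana", "bananas", "banan", "bananna"]
print(cluster_labels(words, k=4, num_clusters=2, dist=levenshtein))
# prints [0, 0, 0, 0, 1, 1, 1, 1]
```

The expensive clustering step sees only the k representatives rather than all n objects. Building the summaries costs about n·k distance computations for the assignment, plus the within-group medoid search. No linear sum or other vector statistic is needed at any point.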