Clustering is a data mining method that discovers interesting data distributions in very large databases. Its applications include customer segmentation, catalog design, store layout, and stock market segmentation. In this paper, we consider the problem of discovering similarity-based clusters in a large database of event sequences. We introduce a hierarchical algorithm that uses sequential patterns found in the database to efficiently generate both the clustering model and the data clusters. The algorithm iteratively merges smaller, similar clusters into larger ones until the requested number of clusters is reached. Because event sequences have no well-defined metric space, we propose a similarity measure for use in cluster merging. The advantage of the proposed measure is that evaluating inter-cluster similarities requires no additional access to the source database.
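The merging loop described above can be illustrated with a minimal sketch. The abstract does not define the similarity measure, so the code below substitutes a plain Jaccard overlap as a stand-in; it is not the measure proposed in the paper. The names `merge_clusters`, `jaccard`, and `pattern_support`, and the choice to start with one cluster per sequential pattern, are assumptions made only for this illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Stand-in similarity: set overlap (an assumption, NOT the paper's measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_clusters(pattern_support, k):
    """Agglomeratively merge clusters until only k remain.

    pattern_support (hypothetical input) maps a sequential pattern to the set
    of sequence ids that contain it. Each pattern starts as its own cluster.
    """
    # Each cluster is a pair: (set of patterns, set of covered sequence ids).
    clusters = [({p}, set(ids)) for p, ids in sorted(pattern_support.items())]
    while len(clusters) > k:
        # Find the most similar pair using only the stored id sets, so the
        # source database is not read again during merging.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: jaccard(clusters[ij[0]][1], clusters[ij[1]][1]),
        )
        merged = (clusters[i][0] | clusters[j][0], clusters[i][1] | clusters[j][1])
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return clusters


if __name__ == "__main__":
    support = {
        "a->b": {1, 2, 3},
        "a->c": {2, 3, 4},
        "x->y": {7, 8},
        "x->z": {8, 9},
    }
    for patterns, ids in merge_clusters(support, k=2):
        print(sorted(patterns), sorted(ids))
```

In this sketch every similarity is computed from per-pattern id sets gathered once during pattern mining, which mirrors the stated property that merging needs no further database access. The quadratic pair search is kept for readability, not efficiency.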