Clustering is a popular data analysis and data mining technique. However, applying traditional clustering algorithms directly to a database is not straightforward, because a database usually consists of structured and related data; moreover, there may be several object views of the database to be clustered, depending on a data analyst's particular interest. Finally, in many cases there is a data model discrepancy between the format used to store the database to be analyzed and the representation format that clustering algorithms expect as their input. These discrepancies have been mostly ignored by current research.

This paper focuses on identifying those discrepancies and on analyzing their impact on the application of clustering techniques to databases. We are particularly interested in the question of how clustering algorithms can be generalized to become more directly applicable to real-world databases. The paper introduces methodologies, techniques, and tools that serve this purpose. We propose a data set representation framework for database clustering that characterizes objects to be clustered through sets of tuples, and introduce preprocessing techniques and tools to generate object views based on this framework. Moreover, we introduce bag-oriented similarity measures and clustering algorithms that are suitable for the proposed data set representation framework. We also demonstrate that, through bag-oriented clustering, our approach is capable of dealing with the relationship information commonly found in databases. Finally, we argue that our bag-oriented data representation framework is more suitable for database clustering than the commonly used flat file format and produces better-quality clusters.
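To make the idea of an object view and a bag-oriented similarity measure more concrete, the following is a minimal sketch and not the paper's actual method. The abstract does not specify the measures, so everything here is an assumption chosen for illustration: the `customers` and `purchases` tables, the helper names `build_object_view`, `tuple_similarity`, and `bag_similarity`, the fixed `AMOUNT_RANGE` used for normalization, and the choice of an average-link style measure that averages pairwise tuple similarities between two bags.

```python
from collections import defaultdict
from itertools import product

# Hypothetical one-to-many data: each customer has several purchase tuples.
customers = [{"id": 1}, {"id": 2}, {"id": 3}]
purchases = [
    {"customer_id": 1, "category": "books", "amount": 20.0},
    {"customer_id": 1, "category": "music", "amount": 15.0},
    {"customer_id": 2, "category": "books", "amount": 25.0},
    {"customer_id": 3, "category": "garden", "amount": 200.0},
    {"customer_id": 3, "category": "garden", "amount": 180.0},
]

AMOUNT_RANGE = 200.0  # assumed normalization constant for the amount attribute


def build_object_view(parents, children):
    """Group related child tuples into one bag per parent object."""
    bags = defaultdict(list)
    for row in children:
        bags[row["customer_id"]].append((row["category"], row["amount"]))
    # Parents without related tuples get an empty bag.
    return {p["id"]: bags.get(p["id"], []) for p in parents}


def tuple_similarity(t1, t2):
    """Similarity in [0, 1]: exact match on category, scaled distance on amount."""
    category_sim = 1.0 if t1[0] == t2[0] else 0.0
    amount_sim = 1.0 - min(abs(t1[1] - t2[1]) / AMOUNT_RANGE, 1.0)
    return 0.5 * category_sim + 0.5 * amount_sim


def bag_similarity(bag_a, bag_b):
    """Average-link style measure: mean similarity over all cross-bag tuple pairs."""
    if not bag_a or not bag_b:
        # Assumption: two empty bags are identical, an empty and a non-empty bag are not.
        return 1.0 if not bag_a and not bag_b else 0.0
    pairs = list(product(bag_a, bag_b))
    return sum(tuple_similarity(a, b) for a, b in pairs) / len(pairs)


view = build_object_view(customers, purchases)
sim_1_2 = bag_similarity(view[1], view[2])
sim_1_3 = bag_similarity(view[1], view[3])
print(sim_1_2, sim_1_3)
```

In this sketch, customers 1 and 2 come out as more similar than customers 1 and 3, because their bags share a category and have closer amounts. A flat file representation would instead have to join or aggregate the purchases into one row per customer, which either duplicates customer data or loses per-purchase detail; keeping the related tuples as a bag avoids that step. Any pairwise similarity of this kind could then be fed into a clustering algorithm that works on a similarity matrix.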