Mining high-dimensional data is an urgent problem of great practical importance. Although data mining models such as frequent patterns and clusters have proven very successful for analyzing very large data sets, they have limitations. Frequent patterns are inadequate for describing the quantitative correlations among nominal members. Traditional cluster models ignore the distances between some pairs of members, so two members of the same large cluster may be far apart. As a combination of, and complement to, both techniques, we propose the Maximal-Correlated-Member-Cluster (MCMC) model in this paper. The MCMC model is based on a statistical measure of the relationship between nominal variables, and every pair of members in a cluster satisfies the same unified constraints. Moreover, to improve the algorithm's efficiency, we introduce pruning techniques that reduce the search space: in the first phase, a Tri-correlation inequation is used to eliminate unrelated member pairs, and in the second phase, an Inverse-Order-Enumeration-Tree (IOET) method is designed to share common computations. Experiments on both synthetic and real-life datasets are performed to examine the algorithm's performance. The results show that our algorithm is much more efficient than the naïve algorithm and that the model can discover meaningful correlated patterns in high-dimensional databases.
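The abstract does not specify the statistical measure, the exact form of the Tri-correlation inequation, or how IOET works. As a rough illustration of the model's core idea only, the sketch below assumes Cramér's V as the measure of association between two nominal members and a single `threshold` as the "unified constraint". Pairs below the threshold are discarded (loosely mirroring the first phase), and the maximal member sets in which every pair is correlated are then enumerated as maximal cliques using Bron–Kerbosch with pivoting, a standard algorithm substituted here for the paper's IOET method. The names `cramers_v`, `correlated_pairs`, `maximal_clusters`, `threshold`, and `min_size` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cramers_v(xs, ys):
    """Cramér's V between two nominal columns (assumed measure; 0 = independent, 1 = fully associated)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row = Counter(xs)
    col = Counter(ys)
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            observed = joint[(a, b)]  # Counter returns 0 for unseen pairs
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return math.sqrt(chi2 / (n * k))


def correlated_pairs(columns, threshold):
    """Keep only member pairs whose association meets the (hypothetical) threshold."""
    adj = {name: set() for name in columns}
    for u, v in combinations(columns, 2):
        if cramers_v(columns[u], columns[v]) >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def maximal_clusters(adj, min_size=2):
    """Maximal member sets in which every pair is correlated (maximal cliques).

    Uses Bron-Kerbosch with pivoting as a stand-in; this is NOT the paper's IOET method.
    """
    result = []

    def expand(r, p, x):
        if not p and not x:
            if len(r) >= min_size:  # skip isolated members
                result.append(sorted(r))
            return
        # Pivot on the vertex covering most candidates to skip redundant branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(result)


# Toy data: A, B, C share one pattern; D is only weakly associated with them.
example = {
    "A": ["x", "x", "y", "y", "x", "y"],
    "B": ["p", "p", "q", "q", "p", "q"],
    "C": ["s", "s", "t", "t", "s", "t"],
    "D": ["u", "v", "u", "v", "v", "u"],
}
clusters = maximal_clusters(correlated_pairs(example, threshold=0.5))
print(clusters)  # [['A', 'B', 'C']]
```

Here D's association with each of A, B, and C is 1/3, below the assumed threshold of 0.5, so D forms no cluster. Pivoting only prunes branches that cannot yield new maximal sets; it does not reproduce the computation sharing the abstract attributes to IOET.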