An algorithm is presented for finding patterns between sets of continuous attributes and item sets. In contrast to most pattern mining approaches, the algorithm treats multiple continuous attributes as a single vector attribute. This introduces a separate level of abstraction and allows multiple vector attributes to be considered. We show that the pattern mining process can uncover relationships between the vector data and item sets. Filtering according to these patterns can be seen as feature selection at the level of vector attributes rather than individual continuous attributes. In the evaluation, we show that the pattern mining algorithm achieves this filtering more effectively and efficiently than a direct application of classification algorithms. Patterns are identified by relating item data to the distribution of objects within the vector space spanned by a set of continuous attributes. The Kullback-Leibler divergence provides a quantitative measure of whether the subset defined by an item set differs from the overall distribution of data points. Because the subset is drawn from the full set of data points, the two samples are not independent, which violates i.i.d. assumptions and requires an adaptation of standard algorithms for computing the Kullback-Leibler divergence. The algorithm is evaluated on gene expression data and on an example classification problem constructed from time series data.
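To make the central measure concrete, the following is a minimal sketch of a plain histogram estimate of the Kullback-Leibler divergence between the points selected by an item set and the full set of points. It is not the adapted algorithm described in the abstract: it does not correct for the set-subset dependence, and the function name `kl_subset_vs_all`, the `bins` parameter, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def kl_subset_vs_all(points, in_subset, bins=4):
    """Histogram estimate of KL(P_subset || P_all), in nats.

    points    : (n, d) array, one row per object in the vector space
    in_subset : length-n boolean mask, True where the item set is present
    bins      : bins per dimension (hypothetical tuning parameter)
    """
    points = np.asarray(points, dtype=float)
    if points.ndim == 1:
        points = points[:, None]          # treat 1-D input as n points in 1 dimension
    in_subset = np.asarray(in_subset, dtype=bool)
    if not in_subset.any():
        raise ValueError("the item set selects no points")

    # Bin edges come from the full data set and are reused for the subset,
    # so both histograms describe the same cells of the vector space.
    all_counts, edges = np.histogramdd(points, bins=bins)
    sub_counts, _ = np.histogramdd(points[in_subset], bins=edges)

    p = sub_counts / sub_counts.sum()     # distribution of the subset
    q = all_counts / all_counts.sum()     # overall distribution

    # Every subset point is also in the full set, so q > 0 wherever p > 0
    # and the estimate is always finite.
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / q[nz])))

# Toy example: four 1-D points, the item set selects the two at 0.0.
toy_points = np.array([0.0, 0.0, 1.0, 1.0])
toy_mask = np.array([True, True, False, False])
print(kl_subset_vs_all(toy_points, toy_mask, bins=2))  # log(2), about 0.693
```

A value near zero suggests the item set selects points that follow the overall distribution, while larger values suggest the item set is related to the vector data. Deciding what counts as "large" is where the dependence between subset and full set matters, and this sketch leaves that question open.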