Point-distribution algorithm for mining vector-item patterns

Authors:
Anne M. Denton; Jianfei Wu; Dietmar H. Dorr
Affiliations:
North Dakota State University, Fargo, ND;North Dakota State University, Fargo, ND;Research and Development Thomson Reuters, St. Paul, MN
Venue:
Proceedings of the ACM SIGKDD Workshop on Useful Patterns
Year:
2010



Abstract

An algorithm is presented for finding patterns between sets of continuous attributes and item sets. In contrast to most pattern mining approaches, the algorithm treats multiple continuous attributes as a single vector attribute. This introduces a separate level of abstraction and allows multiple vector attributes to be considered. We show that the pattern mining process can uncover relationships between the vector data and item sets. Filtering according to these patterns can be seen as feature selection at the level of vector attributes rather than individual continuous attributes. In the evaluation, we show that the pattern mining algorithm achieves this filtering more effectively and efficiently than a direct application of classification algorithms. Patterns are identified by relating item data to the distribution of objects within the vector space spanned by the sets of continuous attributes. The Kullback-Leibler divergence provides a quantitative measure of whether the subset defined by an item set differs from the overall distribution of data points. Because the subset is drawn from the full set of data points, the two samples violate i.i.d. assumptions, and standard algorithms for computing the Kullback-Leibler divergence must be adapted. The algorithm is evaluated on gene expression data and on a classification example problem constructed from time series data.
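
The abstract does not spell out how the divergence is computed, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a simple histogram estimate of the Kullback-Leibler divergence between the points selected by an item set and all points, and it compares that value with random subsets of the same size as one possible way to account for the set-subset relationship. The function names, the grid-based estimator, and the random-subset baseline are all assumptions made for illustration.

```python
import numpy as np


def subset_kl_divergence(points, mask, bins=4):
    """Histogram estimate of KL(subset || all points) on a shared grid.

    Assumption: `points` is an (n, d) array of vector-attribute values and
    `mask` is a boolean array marking the objects that contain the item set.
    The mask must select at least one point.
    """
    points = np.asarray(points, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    # Use the same bin edges for both histograms so the cells line up.
    edges = [
        np.linspace(points[:, d].min(), points[:, d].max(), bins + 1)
        for d in range(points.shape[1])
    ]
    all_counts, _ = np.histogramdd(points, bins=edges)
    sub_counts, _ = np.histogramdd(points[mask], bins=edges)
    p = sub_counts.ravel() / sub_counts.sum()
    q = all_counts.ravel() / all_counts.sum()
    # Empty subset cells contribute nothing. Because the subset is part of
    # the full set, q is positive wherever p is positive.
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / q[nz])))


def random_subset_baseline(points, subset_size, n_trials=200, bins=4, seed=0):
    """KL values for random subsets of the same size (hypothetical baseline)."""
    rng = np.random.default_rng(seed)
    n = len(points)
    values = []
    for _ in range(n_trials):
        mask = np.zeros(n, dtype=bool)
        mask[rng.choice(n, size=subset_size, replace=False)] = True
        values.append(subset_kl_divergence(points, mask, bins=bins))
    return np.array(values)


# Usage on synthetic data: the "item" is present for points with a large
# first coordinate, so the subset is concentrated in part of the space.
demo_rng = np.random.default_rng(1)
demo_points = demo_rng.normal(size=(400, 2))
demo_mask = demo_points[:, 0] > 0.5
observed = subset_kl_divergence(demo_points, demo_mask)
baseline = random_subset_baseline(demo_points, int(demo_mask.sum()))
print(observed, baseline.mean())
```

A divergence well above the values obtained for random subsets suggests that the item set is related to the vector attribute. A fixed grid becomes sparse quickly as the number of dimensions grows, so this estimator is only reasonable for low-dimensional vector attributes.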