Data mining of vector–item patterns using neighborhood histograms

Authors:
Anne M. Denton;Jianfei Wu
Affiliations:
North Dakota State University, Department of Computer Science and Operations Research, 58108-6050, Fargo, ND, USA;North Dakota State University, Department of Computer Science and Operations Research, 58108-6050, Fargo, ND, USA
Venue:
Knowledge and Information Systems
Year:
2009

Abstract

The representation of multiple continuous attributes as dimensions in a vector space has been among the most influential concepts in machine learning and data mining. We consider sets of related continuous attributes as vector data and search for patterns that relate a vector attribute to one or more items. The presence of an item set defines a subset of vectors that may or may not show unexpected density fluctuations. We test for fluctuations by studying density histograms. A vector–item pattern is considered significant if its density histogram significantly differs from what is expected for a random subset of transactions. Using two different density measures, we evaluate the algorithm on two real data sets and one that was artificially constructed from time series data.
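
The significance test described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: the abstract does not name the two density measures or the statistical test, so the code below assumes a fixed-radius neighbor count as the density measure and a Monte Carlo comparison against random subsets, with an L1 histogram distance as the test statistic. All function and parameter names (`neighbor_counts`, `density_histogram`, `vector_item_p_value`, `radius`, `n_bins`, `n_random`) are hypothetical.

```python
import math
import random


def neighbor_counts(vectors, subset, radius):
    """For each vector in the subset, count the other subset members within `radius`.

    Assumed density measure; the abstract does not specify which measures are used.
    """
    counts = []
    for i in subset:
        close = sum(
            1 for j in subset
            if j != i and math.dist(vectors[i], vectors[j]) <= radius
        )
        counts.append(close)
    return counts


def density_histogram(counts, n_bins):
    """Histogram of neighbor counts; counts of n_bins - 1 or more share the last bin."""
    hist = [0] * n_bins
    for c in counts:
        hist[min(c, n_bins - 1)] += 1
    return hist


def l1_distance(hist_a, hist_b):
    """Sum of absolute bin differences (assumed test statistic)."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


def vector_item_p_value(vectors, has_item, radius, n_bins=5, n_random=200, seed=0):
    """Estimate how unusual the item-defined subset's density histogram is.

    The observed histogram is compared with histograms of random subsets of the
    same size. For simplicity, the same random subsets provide both the expected
    histogram and the null distribution of the statistic.
    """
    rng = random.Random(seed)
    subset = [i for i, flag in enumerate(has_item) if flag]
    observed = density_histogram(neighbor_counts(vectors, subset, radius), n_bins)

    random_hists = []
    for _ in range(n_random):
        sample = rng.sample(range(len(vectors)), len(subset))
        random_hists.append(
            density_histogram(neighbor_counts(vectors, sample, radius), n_bins)
        )

    # Expected histogram: bin-wise mean over the random subsets.
    expected = [sum(h[b] for h in random_hists) / n_random for b in range(n_bins)]

    observed_stat = l1_distance(observed, expected)
    at_least = sum(l1_distance(h, expected) >= observed_stat for h in random_hists)
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least + 1) / (n_random + 1)


if __name__ == "__main__":
    # Synthetic data for illustration only: 20 tightly clustered vectors that
    # carry the item, plus 180 uniformly scattered vectors that do not.
    rng = random.Random(1)
    cluster = [(5 + rng.uniform(-0.05, 0.05), 5 + rng.uniform(-0.05, 0.05))
               for _ in range(20)]
    background = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(180)]
    vectors = cluster + background
    has_item = [True] * 20 + [False] * 180
    print(vector_item_p_value(vectors, has_item, radius=0.5))
```

In this sketch a small returned value means that the vectors carrying the item are distributed differently from a typical random subset of the same size, which corresponds to the notion of a significant vector–item pattern in the abstract. The choice of radius, number of bins, and number of random subsets are free parameters here and would need tuning for real data.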