Quantifiable data mining using ratio rules

Authors:
Flip Korn;Alexandros Labrinidis;Yannis Kotidis;Christos Faloutsos
Affiliations:
AT&T Labs - Research, Florham Park, NJ 07932, USA/ E-mail: flip@research.att.com;University of Maryland, College Park, MD 20742, USA/ E-mail: {labrinid,kotidis}@cs.umd.edu;University of Maryland, College Park, MD 20742, USA/ E-mail: {labrinid,kotidis}@cs.umd.edu;Carnegie Mellon University, Pittsburgh, PA 15213, USA/ E-mail: christos@cs.cmu.edu
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2000


Abstract

Association rule mining algorithms operate on a data matrix (e.g., customers $\times$ products) to derive association rules [AIS93b, SA96]. We propose a new paradigm, Ratio Rules, which are quantifiable in that the “goodness” of a set of discovered rules can be measured. As that measure we propose the “guessing error”: the root-mean-square error of the reconstructed values of the cells of the given matrix when we pretend that they are unknown. Another contribution is a novel method to guess missing or hidden values from the Ratio Rules that our method derives. For example, if somebody bought $10 of milk and $3 of bread, our rules can “guess” the amount spent on butter. Thus, unlike association rules, Ratio Rules can perform a variety of important tasks such as forecasting, answering “what-if” scenarios, detecting outliers, and visualizing the data. Moreover, we show that Ratio Rules can be computed in a single pass over the data set with small memory requirements (a few small matrices), in contrast to association rule mining methods, which require multiple passes and/or large memory. Experiments on several real data sets (e.g., basketball and baseball statistics, biological data) demonstrate that the proposed method: (a) leads to rules that make sense; (b) can find large itemsets in binary matrices, even in the presence of noise; and (c) consistently achieves a “guessing error” up to 5 times lower than that obtained with straightforward column averages.
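
The abstract does not spell out how Ratio Rules are constructed, so the following is only a minimal sketch under a stated assumption: the rules are taken to be the leading eigenvectors of the column covariance matrix, which can be accumulated in one pass from the column sums and the product $X^T X$. The function names (`ratio_rules_single_pass`, `guess_hidden`, `guessing_error`), the parameter `k`, and the toy milk/bread/butter amounts are all hypothetical and chosen for illustration.

```python
import numpy as np


def ratio_rules_single_pass(rows, k):
    """One pass over the rows, keeping only column sums and X^T X.

    Assumption: the rules are the top-k eigenvectors of the column covariance.
    """
    n, col_sum, xtx = 0, None, None
    for row in rows:
        r = np.asarray(row, dtype=float)
        if col_sum is None:
            col_sum = np.zeros_like(r)
            xtx = np.zeros((r.size, r.size))
        n += 1
        col_sum += r
        xtx += np.outer(r, r)
    mean = col_sum / n
    cov = xtx / n - np.outer(mean, mean)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    return mean, eigvecs[:, top]


def guess_hidden(row, hidden, mean, rules):
    """Fill the hidden columns of one row from its known columns.

    The known, mean-centered values are fitted to the rules by least squares;
    the fitted coefficients are then used to reconstruct the hidden columns.
    """
    filled = np.array(row, dtype=float)
    known = np.setdiff1d(np.arange(mean.size), hidden)
    centered = filled[known] - mean[known]
    coeffs, *_ = np.linalg.lstsq(rules[known, :], centered, rcond=None)
    filled[hidden] = mean[hidden] + rules[hidden, :] @ coeffs
    return filled


def guessing_error(data, mean, rules):
    """Root-mean-square error when each cell in turn is treated as unknown."""
    data = np.asarray(data, dtype=float)
    squared = 0.0
    for row in data:
        for j in range(data.shape[1]):
            guess = guess_hidden(row, [j], mean, rules)[j]
            squared += (guess - row[j]) ** 2
    return np.sqrt(squared / data.size)


# Toy data (hypothetical): spending on milk, bread, butter in a fixed 10:3:2 ratio.
rows = [[10.0 * t, 3.0 * t, 2.0 * t] for t in range(1, 6)]
mean, rules = ratio_rules_single_pass(rows, k=1)
filled = guess_hidden([10.0, 3.0, np.nan], [2], mean, rules)
print(filled[2])                          # guessed butter amount
print(guessing_error(rows, mean, rules))  # near zero on perfectly ratio-aligned data
```

Because the toy rows lie exactly on one line, a single rule reconstructs every hidden cell almost perfectly; on real data the guessing error would be compared against the column-average baseline mentioned in the abstract.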