Mining Adaptive Ratio Rules from Distributed Data Sources

Authors:
Jun Yan;Ning Liu;Qiang Yang;Benyu Zhang;Qiansheng Cheng;Zheng Chen
Affiliations:
LMAM, Department of Information Science, School of Mathematical Science, Peking University, Beijing, P.R. China 100871;Department of Mathematical Science, Tsinghua University, Tsinghua, P.R. China 100084;Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong, P.R. China;Microsoft Research Asia, Beijing, P.R. China 100080;LMAM, Department of Information Science, School of Mathematical Science, Peking University, Beijing, P.R. China 100871;Microsoft Research Asia, Beijing, P.R. China 100080
Venue:
Data Mining and Knowledge Discovery
Year:
2006

Abstract

Different from traditional association-rule mining, a new paradigm called Ratio Rule (RR) was proposed recently. Ratio rules aim to capture quantitative association knowledge. We extend this framework to mining ratio rules from distributed and dynamic data sources, which is a novel and challenging problem. The traditional technique for ratio rule mining is an eigen-system analysis, which can often fall victim to noise; this has greatly limited the application of ratio rule mining. Distributed data sources impose the additional constraint that the mining procedure be robust in the presence of noise, because it is difficult to clean all the data sources in real time in real-world tasks. In addition, the traditional batch methods for ratio rule mining cannot cope with dynamic data. In this paper, we propose an integrated method for mining ratio rules from distributed and changing data sources: we first mine the ratio rules from each data source separately with a novel robust and adaptive one-pass algorithm, called Robust and Adaptive Ratio Rule (RARR), and then integrate the rules of each data source in a simple probabilistic model. In this way, we can acquire the global rules from all the local information sources adaptively. We show that the RARR technique can converge to a fixed point and is robust as well. Moreover, the integration of rules is efficient and effective. Both theoretical analysis and experiments illustrate that the performance of RARR and the proposed information integration procedure is satisfactory for the purpose of discovering latent associations in distributed dynamic data sources.
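
To make the "eigen-system analysis" mentioned above concrete, the following is a minimal sketch of the traditional batch approach only, not of RARR, whose details the abstract does not give. It assumes a ratio rule can be read off the leading eigenvector of the sample covariance matrix, and it approximates that eigenvector by power iteration. The function names and the toy purchase data are hypothetical and chosen purely for illustration.

```python
import math

def covariance(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def dominant_eigenvector(matrix, iterations=200):
    """Power iteration: unit-length approximation of the leading eigenvector."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d  # start from a uniform unit vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate case: all-zero covariance matrix
            break
        v = [x / norm for x in w]
    return v

# Hypothetical toy data: amounts spent on two items, in roughly a 2:1 ratio.
purchases = [[2.0, 1.0], [4.0, 2.1], [6.0, 2.9], [8.0, 4.0], [10.0, 5.1]]
rule = dominant_eigenvector(covariance(purchases))
ratio = rule[0] / rule[1]  # close to 2 for this toy data
```

Because this batch computation needs all rows at once and weights every row equally, a single corrupted record can shift the covariance matrix, which is the sensitivity to noise and to changing data that the abstract sets out to address.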