A new distributed data mining model based on similarity

Authors:
Tao Li; Shenghuo Zhu; Mitsunori Ogihara
Affiliations:
University of Rochester, Rochester, NY; University of Rochester, Rochester, NY; University of Rochester, Rochester, NY
Venue:
Proceedings of the 2003 ACM symposium on Applied computing
Year:
2003

Citing 12
Cited 10

Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Distributed cooperative Bayesian learning strategies

COLT '97 Proceedings of the tenth annual conference on Computational learning theory
A framework for measuring changes in data characteristics

PODS '99 Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Fast algorithms for projected clustering

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Advances in Distributed and Parallel Knowledge Discovery

Advances in Distributed and Parallel Knowledge Discovery
Efficient Mining of Association Rules in Distributed Databases

IEEE Transactions on Knowledge and Data Engineering
MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases

Proceedings of the 17th International Conference on Data Engineering
Efficiently Mining Maximal Frequent Itemsets

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Finding Similar Time Series

PKDD '97 Proceedings of the First European Symposium on Principles of Data Mining and Knowledge Discovery
Towards Real Time Discovery from Distributed Information Sources

PAKDD '98 Proceedings of the Second Pacific-Asia Conference on Research and Development in Knowledge Discovery and Data Mining
Distributed data mining of probabilistic knowledge

ICDCS '97 Proceedings of the 17th International Conference on Distributed Computing Systems (ICDCS '97)
An Algorithm for Non-Distance Based Clustering in High Dimensional Spaces

An Algorithm for Non-Distance Based Clustering in High Dimensional Spaces

Association-based similarity testing and its applications

Intelligent Data Analysis
Higher order mining

ACM SIGKDD Explorations Newsletter
Parallel TID-based frequent pattern mining algorithm on a PC Cluster and grid computing system

Expert Systems with Applications: An International Journal
Research of distributed data mining association rules model based on similarity

HCI'07 Proceedings of the 12th international conference on Human-computer interaction: applications and services
Tidset-based parallel FP-tree algorithm for the frequent pattern mining problem on PC clusters

GPC'08 Proceedings of the 3rd international conference on Advances in grid and pervasive computing
A novel parallel algorithm for frequent pattern mining with privacy preserved in cloud computing environments

International Journal of Ad Hoc and Ubiquitous Computing
Things to know about a (dis)similarity measure

KES'11 Proceedings of the 15th international conference on Knowledge-based and intelligent information and engineering systems - Volume Part I
ROC analysis as a useful tool for performance evaluation of artificial neural networks

ICANN'06 Proceedings of the 16th international conference on Artificial Neural Networks - Volume Part II
Product recommendation with temporal dynamics

Expert Systems with Applications: An International Journal
Load balancing approach parallel algorithm for frequent pattern mining

PaCT'07 Proceedings of the 9th international conference on Parallel Computing Technologies

Abstract

Distributed Data Mining (DDM) has been a very active research area and has enjoyed a growing amount of attention since its inception. Current DDM techniques regard the distributed data sets as a single virtual table and assume that there exists a global model which could be generated if the data were combined and centralized. This paper proposes a similarity-based distributed data mining (SBDDM) framework which explicitly takes the differences among distributed sources into consideration. A new similarity measure is introduced, and its effectiveness is evaluated and validated. This paper also illustrates the limitations of current DDM techniques through three concrete case studies. Finally, distributed clustering within the SBDDM framework is discussed.