Global Classifier for Confidential Data in Distributed Datasets

Authors:
Omar Jasso-Luna; Victor Sosa-Sosa; Ivan Lopez-Arevalo
Affiliations:
Laboratory of Information Technology, Center for Research and Advanced Studies, Cd. Victoria, Tam., Mexico (all authors)
Venue:
MICAI '08 Proceedings of the 7th Mexican International Conference on Artificial Intelligence: Advances in Artificial Intelligence
Year:
2008

Abstract

Every day, many institutions produce huge amounts of data. In most cases these data are stored on centralized servers, where they are usually analyzed to extract knowledge. This knowledge takes the form of patterns or trends that become valuable assets for decision makers. Analyzing such data requires high-performance computing, which has motivated the development of Distributed Data Mining (DDM) architectures. DDM uses several distributed data sources to build a global classifier. Building a global classifier typically requires integrating all of the data sources into a single global dataset, which means that every participant has to share its private data. In some cases this amounts to an intrusion on data privacy that data owners do not want. This paper describes a DDM application in which participants work interactively to build a global classifier for the data mining process without needing to share their original data. Results show that a global classifier built in this way performs better than classifiers built individually by each participant, while avoiding data privacy intrusion.
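The core idea of the abstract is that participants can exchange trained models instead of raw records. The following is a minimal sketch of that idea, not the authors' method. It assumes that each site trains a local model on its private data and ships only that model, and that the global classifier combines the local models by majority vote. The names `CentroidModel` and `GlobalClassifier`, the nearest-centroid learner, and the voting rule are all hypothetical choices made to keep the example short.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (feature vector, class label)


class CentroidModel:
    """Hypothetical local model trained at one site: one mean vector per class.

    Only these centroids leave the site; the raw examples never do.
    """

    def __init__(self, data: List[Example]) -> None:
        sums: Dict[str, List[float]] = {}
        counts: Counter = Counter()
        for features, label in data:
            acc = sums.setdefault(label, [0.0] * len(features))
            for i, value in enumerate(features):
                acc[i] += value
            counts[label] += 1
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }

    def predict(self, features: Sequence[float]) -> str:
        # Pick the label whose centroid is closest (squared Euclidean distance).
        def dist(label: str) -> float:
            return sum((a - b) ** 2 for a, b in zip(features, self.centroids[label]))

        return min(self.centroids, key=dist)


class GlobalClassifier:
    """Hypothetical global classifier: majority vote over the local models."""

    def __init__(self, local_models: List[CentroidModel]) -> None:
        self.local_models = local_models

    def predict(self, features: Sequence[float]) -> str:
        votes = Counter(m.predict(features) for m in self.local_models)
        # Ties go to the label that was voted for first (Counter keeps insertion order).
        return votes.most_common(1)[0][0]


# Three participants, each holding private toy data that is never pooled.
site_a = [([1.0, 1.0], "low"), ([1.2, 0.8], "low"), ([5.0, 5.0], "high")]
site_b = [([0.9, 1.1], "low"), ([4.8, 5.2], "high"), ([5.1, 4.9], "high")]
site_c = [([1.1, 0.9], "low"), ([5.2, 5.1], "high")]

global_clf = GlobalClassifier([CentroidModel(d) for d in (site_a, site_b, site_c)])
print(global_clf.predict([1.0, 0.9]))  # low
print(global_clf.predict([4.9, 5.0]))  # high
```

The nearest-centroid learner was chosen only because it fits in a few lines. Any local learner, such as a decision tree, could be substituted, provided that only the trained model leaves each site.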