Distributed mining of classification rules

Authors:
Vincent Cho;Beat Wüthrich
Affiliations:
Department of Management, Hong Kong Polytechnic University, Hong Kong;Human Resources, UBS Abu, Financial Service Group, Switzerland
Venue:
Knowledge and Information Systems
Year:
2002

Abstract

Many successful data-mining techniques and systems have been developed. These techniques usually apply to centralized databases and place less stringent requirements on learning and response time. Comparatively little effort has been put into mining distributed databases or into real-time issues. In this paper, we investigate issues of fast distributed data mining. We assume either that merging the distributed databases into a single one would be too costly (distributed case), or that the individual fragments are non-uniform, so that mining only one fragment would bias the result (fragmented case). The goal is to classify the objects O of the database into one of several mutually exclusive classes C_i. Our approach to making mining fast and feasible is as follows. From each data site or fragment db_k, only a single rule r_ik is generated for each class C_i. A small subset {r_i1, ..., r_ih} of these individual rules is selected to form a rule set R_i for each class C_i. These rule subsets adequately represent the hidden knowledge of the entire database. Various selection criteria for forming R_i are discussed, both theoretically and experimentally.
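
The pipeline described in the abstract (one rule per class per fragment, then a small selected subset per class) can be illustrated with a minimal Python sketch. The abstract does not specify the rule form or the selection criteria, so everything concrete here is an assumption made for illustration: rules are single attribute-value tests, each fragment's rule is the test with the highest confidence for the class, the top h rules by confidence are kept, and a record is assigned to the class whose strongest firing rule has the highest confidence. The names `best_rule`, `mine_rule_sets`, and `classify` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def best_rule(fragment, target_class):
    """Return one rule r_ik for class C_i from fragment db_k, or None if empty.

    Assumed rule form (not from the paper): `attr == value -> target_class`.
    The rule with the highest confidence is kept; ties go to higher support.
    """
    covered = defaultdict(int)   # records matching each (attr, value) test
    correct = defaultdict(int)   # matching records that belong to target_class
    for record, label in fragment:
        for attr, value in record.items():
            covered[(attr, value)] += 1
            if label == target_class:
                correct[(attr, value)] += 1

    best = None
    for (attr, value), n in covered.items():
        candidate = {
            "attr": attr,
            "value": value,
            "confidence": correct[(attr, value)] / n,
            "support": correct[(attr, value)],
        }
        if best is None or (
            (candidate["confidence"], candidate["support"])
            > (best["confidence"], best["support"])
        ):
            best = candidate
    return best


def mine_rule_sets(fragments, classes, h):
    """Build a rule set R_i of at most h rules for every class C_i.

    Assumed selection criterion: keep the h rules with the highest confidence.
    """
    rule_sets = {}
    for c in classes:
        rules = [best_rule(fragment, c) for fragment in fragments]
        rules = [r for r in rules if r is not None]
        rules.sort(key=lambda r: (r["confidence"], r["support"]), reverse=True)
        rule_sets[c] = rules[:h]
    return rule_sets


def classify(record, rule_sets):
    """Pick the class whose strongest firing rule has the highest confidence."""
    scores = {}
    for c, rules in rule_sets.items():
        fired = [
            r["confidence"] for r in rules if record.get(r["attr"]) == r["value"]
        ]
        scores[c] = max(fired, default=0.0)
    return max(scores, key=scores.get)
```

In this sketch only the selected rules would need to leave each site, which reflects the motivation of avoiding a costly merge of the fragments. A fragment is passed as a list of `(record, label)` pairs, where `record` is a dictionary of attribute values.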