One of the most challenging problems in data mining is to develop scalable algorithms that can mine massive data sets too large to fit in a computer's main memory. In this paper, we propose a new decision tree algorithm, named SURPASS (for Scaling Up Recursive Partitioning with Sufficient Statistics), that is highly effective in handling such large data. SURPASS incorporates linear discriminants into the recursive partitioning process of decision trees. In SURPASS, the information required to build a decision tree is summarized in a set of sufficient statistics, which can be gathered incrementally by reading one subset of the data at a time from storage into main memory. As a result, the size of the data that the algorithm can handle is independent of memory size. We apply SURPASS to three large data sets pertaining to pattern recognition and intrusion detection problems. The results indicate that SURPASS scales well to large data sets and produces decision tree models of very high quality.
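The idea of gathering sufficient statistics incrementally can be illustrated with a minimal sketch. It is not the SURPASS implementation: it assumes that the statistics are per-class counts, feature sums, and sums of outer products, and that the linear discriminant is Fisher's two-class discriminant. All names (`ClassStats`, `gather`, `fisher_direction`) are hypothetical, and the sketch assumes the within-class scatter matrix is non-singular.

```python
from collections import defaultdict


class ClassStats:
    """Running count, feature sums, and sums of outer products for one class."""

    def __init__(self, dim):
        self.n = 0
        self.s = [0.0] * dim
        self.ss = [[0.0] * dim for _ in range(dim)]

    def update(self, x):
        self.n += 1
        for i, xi in enumerate(x):
            self.s[i] += xi
            for j, xj in enumerate(x):
                self.ss[i][j] += xi * xj


def gather(chunks, dim):
    """Scan the data one chunk at a time; only the statistics stay in memory."""
    stats = defaultdict(lambda: ClassStats(dim))
    for chunk in chunks:
        for x, label in chunk:
            stats[label].update(x)
    return stats


def mean(st):
    return [v / st.n for v in st.s]


def within_scatter(stats, dim):
    """Within-class scatter: sum over classes of (SS - n * m m^T)."""
    sw = [[0.0] * dim for _ in range(dim)]
    for st in stats.values():
        m = mean(st)
        for i in range(dim):
            for j in range(dim):
                sw[i][j] += st.ss[i][j] - st.n * m[i] * m[j]
    return sw


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def fisher_direction(stats, a, b, dim):
    """Fisher direction w = Sw^{-1} (mean_b - mean_a), from the statistics alone."""
    sw = within_scatter(stats, dim)
    ma, mb = mean(stats[a]), mean(stats[b])
    return solve(sw, [mb[i] - ma[i] for i in range(dim)])


# Toy data split into two chunks, standing in for blocks read from storage.
chunk1 = [((0.0, 0.0), 0), ((1.0, 0.0), 0), ((3.0, 0.0), 1), ((4.0, 0.0), 1)]
chunk2 = [((0.0, 1.0), 0), ((1.0, 1.0), 0), ((3.0, 1.0), 1), ((4.0, 1.0), 1)]
stats = gather([chunk1, chunk2], dim=2)
w = fisher_direction(stats, 0, 1, dim=2)
```

Because the statistics are additive, scanning the data in two chunks or in one pass yields the same totals, so memory use depends on the number of features and classes rather than on the number of records.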