We present an analytic evaluation of the runtime behavior of the C4.5 algorithm that highlights some efficiency improvements. Based on this evaluation, we have implemented a more efficient version of the algorithm, called EC4.5. It improves on C4.5 by adopting the best of three strategies for computing the information gain of continuous attributes. All three strategies use a binary search for the threshold in the whole training set, starting from the local threshold computed at a node. The first strategy computes the local threshold using the algorithm of C4.5, which, in particular, sorts cases by means of the quicksort method. The second strategy also uses the algorithm of C4.5, but adopts a counting sort method. The third strategy calculates the local threshold using a main-memory version of the RainForest algorithm, which does not need sorting. Our implementation computes the same decision trees as C4.5, with a performance gain of up to five times.
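The ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the EC4.5 implementation. All function names are hypothetical, and several details are assumptions: the local threshold is taken as the midpoint between adjacent distinct values at the node, the final threshold as the largest training-set value not exceeding it, and the counting sort as keyed by integer ranks of the attribute values.

```python
import bisect
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a class-count mapping."""
    total = sum(counts.values())
    return -sum(
        (c / total) * math.log2(c / total) for c in counts.values() if c > 0
    )


def counting_sort_cases(ranks, num_ranks):
    """Return case indices ordered by integer rank in O(n + num_ranks) time.

    Assumption: each attribute value has already been mapped to its rank
    among the distinct values of the whole training set.
    """
    counts = [0] * (num_ranks + 1)
    for r in ranks:
        counts[r + 1] += 1
    for r in range(num_ranks):
        counts[r + 1] += counts[r]  # counts[r] becomes the start slot of rank r
    order = [0] * len(ranks)
    for idx, r in enumerate(ranks):
        order[counts[r]] = idx
        counts[r] += 1
    return order


def best_local_threshold(values, labels):
    """Return (gain, threshold) of the best binary split `value <= t` at a node.

    Candidate thresholds are midpoints between adjacent distinct values.
    """
    cases = sorted(zip(values, labels))  # comparison sort, as in the first strategy
    total = Counter(labels)
    base = entropy(total)
    n = len(cases)
    left = Counter()
    best = (0.0, None)
    for i in range(n - 1):
        value, label = cases[i]
        left[label] += 1
        next_value = cases[i + 1][0]
        if value == next_value:
            continue  # no threshold fits between equal values
        right = total - left
        k = i + 1
        gain = base - (k / n) * entropy(left) - ((n - k) / n) * entropy(right)
        if gain > best[0]:
            best = (gain, (value + next_value) / 2)
    return best


def snap_to_training_value(sorted_training_values, local_threshold):
    """Binary search for the largest training-set value <= local_threshold."""
    i = bisect.bisect_right(sorted_training_values, local_threshold)
    if i == 0:
        raise ValueError("local threshold is below every training value")
    return sorted_training_values[i - 1]


# Toy data: the node holds only a subset of the training cases.
training_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0]  # sorted once, up front
node_values = [2.0, 3.0, 6.0, 8.0]
node_labels = ["a", "a", "b", "b"]

gain, local_t = best_local_threshold(node_values, node_labels)
threshold = snap_to_training_value(training_values, local_t)
```

Because the training-set values are sorted once, each node pays only a logarithmic cost to turn its local threshold into a value that actually occurs in the data. The counting sort is shown on its own because it replaces the comparison sort inside the local-threshold step only when values have been mapped to a bounded range of integer ranks.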