Classification is an important data mining problem. Given a training database of records, each tagged with a class label, the goal of classification is to build a concise model that can be used to predict the class label of future, unlabeled records. Decision trees are a very popular class of classifiers. All current algorithms to construct decision trees, including all main-memory algorithms, make one scan over the training database per level of the tree.

We introduce a new algorithm (BOAT) for decision tree construction that improves upon earlier algorithms in both performance and functionality. BOAT constructs several levels of the tree in only two scans over the training database, resulting in an average performance gain of 300% over previous work. The key to this performance improvement is a novel optimistic approach to tree construction in which we construct an initial tree using a small subset of the data and refine it to arrive at the final tree. We guarantee that any difference with respect to the "real" tree (i.e., the tree that would be constructed by examining all the data in a traditional way) is detected and corrected. The correction step occasionally requires additional scans over subsets of the data; this situation rarely arises and can be addressed with little added cost.

Beyond offering faster tree construction, BOAT is the first scalable algorithm with the ability to incrementally update the tree with respect to both insertions and deletions over the dataset. This property is valuable in dynamic environments such as data warehouses, in which the training dataset changes over time. The BOAT update operation is much cheaper than completely rebuilding the tree, and the resulting tree is guaranteed to be identical to the tree that would be produced by a complete rebuild.
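The optimistic "guess from a subset, then detect and correct" idea can be illustrated with a toy sketch for a single tree node. This is not the BOAT algorithm itself: the abstract does not specify the split criterion, the sampling scheme, or the verification procedure, so the Gini impurity, the uniform random sample, and every function name below are assumptions made for illustration. In particular, the check here simply recomputes the exact split in memory, which a scalable implementation would avoid; the sketch only shows the contract that the returned split always equals the one obtained from all the data.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (assumed split criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(records, attr, threshold):
    """Weighted impurity of splitting records on row[attr] <= threshold."""
    left = [label for row, label in records if row[attr] <= threshold]
    right = [label for row, label in records if row[attr] > threshold]
    n = len(records)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


def best_split(records, n_attrs):
    """Exhaustive search; returns (impurity, attribute index, threshold).

    Returns None when no attribute has two distinct values.
    """
    best = None
    for attr in range(n_attrs):
        values = sorted({row[attr] for row, _ in records})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            score = split_impurity(records, attr, threshold)
            if best is None or score < best[0]:
                best = (score, attr, threshold)
    return best


def optimistic_split(records, n_attrs, sample_size, seed=0):
    """Guess a split from a small sample, then check it against all data.

    Returns (final_split, guess_ok). final_split is always the full-data
    split; guess_ok reports whether the sample-based guess already chose
    the same attribute and threshold, i.e. whether no correction was needed.
    """
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    guess = best_split(sample, n_attrs)
    # Hypothetical stand-in for the verification pass: a plain recomputation.
    exact = best_split(records, n_attrs)
    guess_ok = guess is not None and exact is not None and guess[1:] == exact[1:]
    return exact, guess_ok
```

Here each record is a pair `(row, label)` with `row` a tuple of numeric attribute values. When `guess_ok` is true, the sample alone was enough for this node; when it is false, the guess is discarded in favor of the full-data split, mirroring the "detected and corrected" guarantee in spirit only.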