Learning very fast decision tree from uncertain data streams with positive and unlabeled samples

Authors:
Chunquan Liang;Yang Zhang;Peng Shi;Zhengguo Hu
Affiliations:
College of Mechanical and Electronic Engineering, Northwest A&F Univ., Shaanxi, China;College of Information Engineering, Northwest A&F Univ., Shaanxi, China and State Key Laboratory for Novel Software Technology, Nanjing Univ., Nanjing, China;Dept. of Computing and Mathematical Sciences, Univ. of Glamorgan, Pontypridd, UK and Sch. of Engineering and Science, Victoria Univ., Melbourne, Vic., Australia;College of Mechanical and Electronic Engineering, Northwest A&F Univ., Shaanxi, China
Venue:
Information Sciences: an International Journal
Year:
2012

Abstract

Most data stream classification algorithms require a large amount of precisely labeled input data. However, in many data stream applications, the streaming data contain inherent uncertainty, and labeled samples are difficult to collect, while unlabeled data are abundant. In this paper, we focus on classifying uncertain data streams when only positive and unlabeled samples are available. Based on the concept-adapting very fast decision tree (CVFDT) algorithm, we propose an algorithm named puuCVFDT (CVFDT for positive and unlabeled uncertain data). Experimental results on both synthetic and real-life datasets demonstrate that puuCVFDT handles concept drift with uncertainty effectively and efficiently under the positive and unlabeled learning scenario. Even when 90% of the samples in the stream are unlabeled, the classification performance of the proposed algorithm is still comparable to that of CVFDT, which is trained on fully labeled data without uncertainty.