Learning from data streams with only positive and unlabeled data

Authors:
Xiangju Qin; Yang Zhang; Chen Li; Xue Li
Affiliations:
College of Information Engineering, Northwest A&F University, Yangling, China; College of Information Engineering, Northwest A&F University, Yangling, China and State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; College of Information Engineering, Northwest A&F University, Yangling, China; School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, Australia
Venue:
Journal of Intelligent Information Systems
Year:
2013

References:
Random sampling with a reservoir. ACM Transactions on Mathematical Software (TOMS).
C4.5: programs for machine learning.
Mining high-speed data streams. Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining.
Data mining: concepts and techniques.
Space-efficient online computation of quantile summaries. SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data.
Mining time-changing data streams. Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining.
A streaming ensemble algorithm (SEA) for large-scale classification. Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining.
PAC Learning from Positive Statistical Queries. ALT '98 Proceedings of the 9th International Conference on Algorithmic Learning Theory.
PEBL: positive example based learning for Web page classification using SVM. Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining.
General MC: Estimating Boundary of Positive Class from Small Positive Data. ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining.
Building Text Classifiers Using Positive and Unlabeled Examples. ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining.
PEBL: Web Page Classification without Negative Examples. IEEE Transactions on Knowledge and Data Engineering.
Mining concept-drifting data streams using ensemble classifiers. Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining.
Accurate decision trees for mining high-speed data streams. Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining.
Efficient decision tree construction on streaming data. Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining.
Text classification from positive and unlabeled documents. CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management.
Functional Trees. Machine Learning.
RCV1: A New Benchmark Collection for Text Categorization Research. The Journal of Machine Learning Research.
Dynamic Classifier Selection for Effective Mining from Noisy Data Streams. ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining.
Text Classification without Negative Examples Revisit. IEEE Transactions on Knowledge and Data Engineering.
Single-Class Classification with Mapping Convergence. Machine Learning.
Estimating the Support of a High-Dimensional Distribution. Neural Computation.
Learning from positive and unlabeled examples. Theoretical Computer Science - Algorithmic learning theory (ALT 2000).
An automatic construction and organization strategy for ensemble learning on data streams. ACM SIGMOD Record.
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems).
Decision trees for mining data streams. Intelligent Data Analysis.
Adaptive-Size Reservoir Sampling over Data Streams. SSDBM '07 Proceedings of the 19th International Conference on Scientific and Statistical Database Management.
Learning Bayesian classifiers from positive and unlabeled examples. Pattern Recognition Letters.
Learning classifiers from only positive and unlabeled data. Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining.
Categorizing and mining concept drifting data streams. Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining.
Active Learning from Data Streams. ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining.
One-Class Classification of Text Streams with Concept Drift. ICDMW '08 Proceedings of the 2008 IEEE International Conference on Data Mining Workshops.
New ensemble methods for evolving data streams. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining.
OcVFDT: one-class very fast decision tree for one-class classification of data streams. Proceedings of the Third International Workshop on Knowledge Discovery from Sensor Data.
Adaptive Learning from Evolving Data Streams. IDA '09 Proceedings of the 8th International Symposium on Intelligent Data Analysis: Advances in Intelligent Data Analysis VIII.
Mining Data Streams with Labeled and Unlabeled Training Examples. ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining.
Handling numeric attributes in Hoeffding trees. PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining.
One-class learning and concept summarization for data streams. Knowledge and Information Systems - Special Issue on Data Warehousing and Knowledge Discovery from Sensors and Streams.
Learning very fast decision tree from uncertain data streams with positive and unlabeled samples. Information Sciences: an International Journal.
A similarity-based approach for data stream classification. Expert Systems with Applications: An International Journal.


Abstract

Many studies on streaming data classification have been based on a paradigm in which a fully labeled stream is available for learning purposes. However, manually labeling a data stream for training is often too labor-intensive and time-consuming. This difficulty can make conventional supervised learning approaches infeasible in many real-world applications, such as credit fraud detection, intrusion detection, and rare event prediction. In previous work, Li et al. suggested treating these applications as a Positive and Unlabeled (PU) learning problem and proposed a learning algorithm, OcVFDT, as a solution (Li et al. 2009). Their method requires only a set of positive examples and a set of unlabeled examples, both of which are easy to obtain in a streaming environment, making it widely applicable to real-life applications. Here, we enhance Li et al.'s solution by adding three features: an efficient method to estimate the percentage of positive examples in the training stream, the ability to handle numeric attributes, and the use of more appropriate classification methods at the tree leaves. Experimental results on synthetic and real-life datasets show that our enhanced solution (called PUVFDT) achieves very good classification performance and learns effectively from data streams with only positive and unlabeled examples. Furthermore, PUVFDT reduces the learning time of OcVFDT by about an order of magnitude. Even with 80% of the examples in the training data stream unlabeled, PUVFDT still achieves classification performance competitive with that of VFDTcNB (Gama et al. 2003), a supervised learning algorithm.
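The abstract does not spell out how a tree learner can score splits when negative examples are never labeled. As a hedged illustration only, the sketch below shows the Bayes-rule estimate used by POSC4.5-style positive-unlabeled decision trees (Denis et al., "Learning from positive and unlabeled examples", listed in the references): at a node n, P(y=1 | n) ≈ (|POS_n| / |POS|) × (|UNL| / |UNL_n|) × PosLevel, where PosLevel is the percentage of positive examples in the stream, the quantity PUVFDT is said to estimate. We assume, without the full text confirming it, that PUVFDT scores candidate splits in a similar way. The function names, the (positive, unlabeled) count pairs, and the fixed `pos_level` argument are hypothetical, and the sketch does not reproduce PUVFDT's own estimator for that percentage.

```python
import math
from typing import Sequence, Tuple

# Hypothetical layout: a node is summarised by (positive_count, unlabeled_count),
# i.e. how many labeled positives and how many unlabeled examples reached it.
Counts = Tuple[int, int]


def estimate_positive_fraction(pos_at_node: int, pos_total: int,
                               unl_at_node: int, unl_total: int,
                               pos_level: float) -> float:
    """Estimate P(y=1 | node) by Bayes' rule from positive and unlabeled counts.

    P(node | y=1) ~ pos_at_node / pos_total, P(node) ~ unl_at_node / unl_total,
    and P(y=1) = pos_level (the assumed share of positives in the stream).
    """
    if unl_at_node == 0 or pos_total == 0:
        # No unlabeled mass to estimate P(node): treat the node as pure
        # (positive if any positives arrived). This fallback is an assumption.
        return 1.0 if pos_at_node > 0 else 0.0
    ratio = (pos_at_node / pos_total) * (unl_total / unl_at_node) * pos_level
    return min(1.0, ratio)  # the raw estimate can exceed 1, so clip it


def binary_entropy(p: float) -> float:
    """Entropy in bits of a two-class distribution with P(positive) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def pu_information_gain(parent: Counts, children: Sequence[Counts],
                        pos_total: int, unl_total: int,
                        pos_level: float) -> float:
    """Information gain of a candidate split computed from P and U counts only."""
    parent_pos, parent_unl = parent
    if parent_unl == 0:
        return 0.0
    h_parent = binary_entropy(estimate_positive_fraction(
        parent_pos, pos_total, parent_unl, unl_total, pos_level))
    h_children = 0.0
    for child_pos, child_unl in children:
        # Branch weights come from unlabeled counts, which are assumed to be
        # drawn from the overall input distribution.
        weight = child_unl / parent_unl
        h_children += weight * binary_entropy(estimate_positive_fraction(
            child_pos, pos_total, child_unl, unl_total, pos_level))
    return h_parent - h_children


if __name__ == "__main__":
    # Hypothetical counts: 100 positives and 400 unlabeled examples so far,
    # with an assumed positive share of 0.25.
    gain = pu_information_gain((100, 400), [(90, 120), (10, 280)],
                               pos_total=100, unl_total=400, pos_level=0.25)
    print(f"{gain:.3f}")  # prints 0.412
```

In this toy split the left branch gets an estimated positive share of 0.75 and the right branch 1/28, and their unlabeled-weighted average recovers the root's 0.25, which is a quick sanity check on the estimate. Weighting branches by unlabeled rather than positive counts follows the usual PU assumption that unlabeled examples are a sample of the whole input distribution.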