Classification spanning correlated data streams

Authors:
Yabo Xu; Ke Wang; Ada Wai-Chee Fu; Rong She; Jian Pei
Affiliations:
Simon Fraser University; Simon Fraser University; The Chinese University of Hong Kong; Simon Fraser University; Simon Fraser University
Venue:
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Year:
2006

Abstract

In many applications, classifiers must be built from multiple related data streams. For example, stock streams and news streams are related, and the classification patterns may involve features from both. Thus, instead of mining a single isolated stream, we need to examine multiple related data streams to find such patterns and build an accurate classifier. Other examples of related streams include traffic reports and car accidents, and sensor readings of different types or at different locations. In this paper, we consider the classification problem defined over the sliding-window join of several input data streams. Because the data streams arrive at a fast pace and the many-to-many join relationship inflates the data arrival rate even further, it is impractical to compute the join and then rebuild the classifier each time the window slides forward. We present an efficient algorithm for building a Naïve Bayesian classifier in this setting. Our method does not perform the join operations, yet it builds exactly the same classifier as one built on the joined result. It examines each input tuple only twice, independent of the number of tuples it joins with in other streams, and is therefore able to keep pace with fast-arriving data streams in the presence of many-to-many join relationships. Experiments confirm that our classification algorithm is more efficient than conventional methods while maintaining good classification accuracy.
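
The idea of obtaining the join's Naïve Bayes counts without materializing the join can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes only two streams, a single join attribute, one static window snapshot, and a class label carried by the first stream. All names (`nb_counts_without_join`, `nb_counts_with_join`, the tuple layouts) are hypothetical. Each tuple is touched twice, once to build per-key summaries and once to add its weighted contribution, and the result is checked against counts taken from an explicit join.

```python
from collections import Counter, defaultdict
from itertools import product


def nb_counts_without_join(s1, s2):
    """Naive Bayes counts of the join of s1 and s2, computed without joining.

    s1: iterable of (key, a_value, class_label)  -- assumed layout
    s2: iterable of (key, b_value)               -- assumed layout
    """
    s1, s2 = list(s1), list(s2)

    # Pass 1: summarize each stream per join key.
    s2_per_key = Counter(key for key, _ in s2)
    s1_class_per_key = defaultdict(Counter)
    for key, _, label in s1:
        s1_class_per_key[key][label] += 1

    class_count, a_count, b_count = Counter(), Counter(), Counter()

    # Pass 2 over s1: a tuple appears in the join once per matching s2 tuple.
    for key, a, label in s1:
        weight = s2_per_key[key]
        if weight:
            class_count[label] += weight
            a_count[(a, label)] += weight

    # Pass 2 over s2: a tuple pairs with every s1 tuple sharing its key,
    # so it contributes the per-class tallies of that key.
    for key, b in s2:
        for label, weight in s1_class_per_key.get(key, {}).items():
            b_count[(b, label)] += weight

    return class_count, a_count, b_count


def nb_counts_with_join(s1, s2):
    """Reference version: materialize the join, then count."""
    class_count, a_count, b_count = Counter(), Counter(), Counter()
    for (k1, a, label), (k2, b) in product(list(s1), list(s2)):
        if k1 == k2:
            class_count[label] += 1
            a_count[(a, label)] += 1
            b_count[(b, label)] += 1
    return class_count, a_count, b_count


if __name__ == "__main__":
    # Toy window contents; key "k1" has a many-to-many match (2 x 2).
    stream1 = [("k1", "x", "pos"), ("k1", "y", "neg"),
               ("k2", "x", "pos"), ("k3", "y", "neg")]
    stream2 = [("k1", "u"), ("k1", "v"), ("k2", "u"), ("k4", "v")]
    fast = nb_counts_without_join(stream1, stream2)
    slow = nb_counts_with_join(stream1, stream2)
    print(fast == slow)
```

The class priors and conditional probabilities of a Naïve Bayesian classifier follow from these counts by normalization, so equal counts give an identical classifier. Handling a sliding window or more than two streams would need incremental maintenance of the per-key summaries, which this sketch leaves out.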