On the Optimality of the Simple Bayesian Classifier under Zero-One Loss
Machine Learning - Special issue on learning with probabilistic representations
Tracking join and self-join sizes in limited storage
PODS '99 Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Mining high-speed data streams
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Models and issues in data stream systems
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Processing complex aggregate queries over data streams
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
RainForest - A Framework for Fast Decision Tree Construction of Large Datasets
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Building decision tree classifier on private data
CRPIT '14 Proceedings of the IEEE international conference on Privacy, security and data mining - Volume 14
Approximate join processing over data streams
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Mining concept-drifting data streams using ensemble classifiers
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Accurate decision trees for mining high-speed data streams
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
MAIDS: mining alarming incidents from data streams
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Join-distinct aggregate estimation over update streams
Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Sequential Pattern Mining in Multiple Streams
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Online clustering of parallel data streams
Data & Knowledge Engineering
Monitoring streams: a new class of data management applications
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
StatStream: statistical monitoring of thousands of data streams in real time
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Processing sliding window multi-joins in continuous queries over data streams
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Memory-limited execution of windowed stream joins
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Classification spanning private databases
AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
On classifying drifting concepts in P2P networks
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part I
Regression on evolving multi-relational data streams
Proceedings of the 2011 Joint EDBT/ICDT Ph.D. Workshop
Boosting tuple propagation in multi-relational classification
Proceedings of the 15th Symposium on International Database Engineering & Applications
Hi-index | 0.00 |
In many applications, classifiers need to be built based on multiple related data streams. For example, stock streams and news streams are related, where the classification patterns may involve features from both streams. Thus instead of mining on a single isolated stream, we need to examine multiple related data streams in order to find such patterns and build an accurate classifier. Other examples of related streams include traffic reports and car accidents, sensor readings of different types or at different locations, etc. In this paper, we consider the classification problem defined over sliding-window join of several input data streams. As the data streams arrive in fast pace and the many-to-many join relationship blows up the data arrival rate even more, it is impractical to compute the join and then build the classifier each time the window slides forward. We present an efficient algorithm to build a Naïve Bayesian classifier in such context. Our method does not need to perform the join operations but is still able to build exactly the same classifier as if built on the joined result. It only examines each input tuple twice, independent of the number of tuples it joins in other streams, therefore, is able to keep pace with the fast arriving data streams in the presence of many-to-many join relationships. The experiments confirmed that our classification algorithm is more efficient than conventional methods while maintaining good classification accuracy.