Incrementally maintaining classification using an RDBMS

Authors:
M. Levent Koc;Christopher Ré
Affiliations:
University of Wisconsin-Madison;University of Wisconsin-Madison
Venue:
Proceedings of the VLDB Endowment
Year:
2011

Abstract

The proliferation of imprecise data has motivated both researchers and the database industry to push statistical techniques into relational database management systems (RDBMSes). We study strategies for maintaining model-based views for a popular statistical technique, classification, inside an RDBMS as the set of training examples is updated. We make three technical contributions: (1) a strategy that incrementally maintains classification inside an RDBMS; (2) an analysis showing that this strategy is optimal among all deterministic algorithms (and asymptotically within a factor of 2 of a non-deterministic optimal strategy); and (3) a novel hybrid architecture, based on the technical ideas underlying the strategy, that allows us to store only a fraction of the entities in memory. We apply our techniques to text processing and demonstrate that our algorithms provide an order-of-magnitude improvement over non-incremental approaches to classification on a variety of data sets, such as Citeseer and DBLife.
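
To make the idea of incrementally maintaining a classification view concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the algorithm from the paper: it assumes a linear model that labels an entity x by the sign of w.x - b, caches each entity's margin under a stored model, and after a model change relabels only the entities whose cached margin is too close to zero for the label to be guaranteed. The class name IncrementalLinearView, its methods, and the toy data are all hypothetical.

```python
import bisect
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sign(m):
    return 1 if m >= 0 else -1


class IncrementalLinearView:
    """Hypothetical view of labels sign(w.x - b), maintained incrementally."""

    def __init__(self, entities, w, b):
        self.x = dict(entities)                      # id -> feature vector
        self.stored_w, self.stored_b = list(w), b    # model the cache refers to
        # Margins are computed once, under the stored model.
        self.eps = {k: dot(w, x) - b for k, x in self.x.items()}
        # Ids sorted by cached margin, playing the role of an ordered index.
        self.order = sorted(self.eps, key=self.eps.get)
        self.sorted_eps = [self.eps[k] for k in self.order]
        self.max_norm = max(math.sqrt(dot(x, x)) for x in self.x.values())
        self.labels = {k: sign(e) for k, e in self.eps.items()}
        self.band = []                               # ids relabeled last time

    def update_model(self, new_w, new_b):
        """Install a new model; return how many entities were re-scored."""
        # Entities relabeled by the previous update fall back to the label
        # implied by their cached margin before the new band is computed.
        for k in self.band:
            self.labels[k] = sign(self.eps[k])
        dw = [a - c for a, c in zip(new_w, self.stored_w)]
        # Cauchy-Schwarz: a margin moves by at most ||dw|| * ||x|| + |db|.
        bound = math.sqrt(dot(dw, dw)) * self.max_norm + abs(new_b - self.stored_b)
        # Only cached margins in [-bound, bound] can change sign.
        lo = bisect.bisect_left(self.sorted_eps, -bound)
        hi = bisect.bisect_right(self.sorted_eps, bound)
        self.band = self.order[lo:hi]
        for k in self.band:
            self.labels[k] = sign(dot(new_w, self.x[k]) - new_b)
        return len(self.band)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    view = IncrementalLinearView(
        {"a": [2.0, 0.0], "b": [0.1, 0.0], "c": [-2.0, 0.0], "d": [-0.1, 0.0]},
        w=[1.0, 0.0],
        b=0.0,
    )
    print(view.update_model([1.0, 0.0], 0.2), view.labels)
```

The design choice in this sketch is that cached margins are never rewritten: every update is compared against the same stored model, so entities far from the decision boundary are not touched at all. A practical system would also need a policy for when to recompute the cache against a fresh stored model, which this sketch leaves out.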