The proliferation of imprecise data has motivated both researchers and the database industry to push statistical techniques into relational database management systems (RDBMSes). We study strategies to maintain model-based views for a popular statistical technique, classification, inside an RDBMS in the presence of updates to the set of training examples. We make three technical contributions: (1) a strategy that incrementally maintains classification inside an RDBMS; (2) an analysis showing that this algorithm is optimal among all deterministic algorithms (and asymptotically within a factor of 2 of a non-deterministic optimal strategy); and (3) a novel hybrid architecture, based on the technical ideas that underlie the algorithm, that allows us to store only a fraction of the entities in memory. We apply our techniques to text processing, and we demonstrate that our algorithms provide an order of magnitude improvement over non-incremental approaches to classification on a variety of data sets, such as Citeseer and DBLife.
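The abstract does not spell out the maintenance strategy, so the following is only a minimal sketch of one way a classification view could be maintained incrementally. It assumes a linear classifier with label sign(w·f − b), which the abstract does not state. The class name `IncrementalLinearView`, its methods, and the drift bound are illustrative assumptions, not the authors' algorithm. The idea is to keep entities sorted by their score under a stored model; when the model changes, the Cauchy-Schwarz inequality bounds how far any score can move, so only entities whose stored score lies inside that band need to be reclassified.

```python
import bisect
import math


class IncrementalLinearView:
    """Hypothetical sketch: a view of labels sign(w.f - b) over a fixed entity set."""

    def __init__(self, entities, w, b):
        # entities: dict mapping entity id -> feature vector (list of floats)
        self.entities = entities
        self.w0, self.b0 = list(w), b  # the "stored" model the scores refer to
        self.max_norm = max(
            math.sqrt(sum(x * x for x in f)) for f in entities.values()
        )
        scored = sorted((self._score(f, w, b), eid) for eid, f in entities.items())
        self.scores = [s for s, _ in scored]   # ascending stored scores
        self.ids = [eid for _, eid in scored]  # entity ids in the same order
        self.labels = {eid: s > 0 for s, eid in scored}
        self.band = (0, 0)  # index range rechecked by the previous update

    @staticmethod
    def _score(f, w, b):
        return sum(wi * fi for wi, fi in zip(w, f)) - b

    def update_model(self, w, b):
        """Switch to model (w, b); return how many entities were rescored."""
        # |new score - stored score| <= ||w - w0|| * ||f|| + |b - b0|
        drift = (
            math.sqrt(sum((a - c) ** 2 for a, c in zip(w, self.w0))) * self.max_norm
            + abs(b - self.b0)
        )
        lo = bisect.bisect_left(self.scores, -drift)
        hi = bisect.bisect_right(self.scores, drift)
        prev_lo, prev_hi = self.band
        start = min(lo, prev_lo) if prev_hi > prev_lo else lo
        stop = max(hi, prev_hi) if prev_hi > prev_lo else hi
        for i in range(start, stop):
            eid = self.ids[i]
            if lo <= i < hi:
                # Inside the band: the label is uncertain, so rescore.
                self.labels[eid] = self._score(self.entities[eid], w, b) > 0
            else:
                # Left the band since the last update: the stored score's
                # sign is guaranteed to be correct again.
                self.labels[eid] = self.scores[i] > 0
        self.band = (lo, hi)
        return hi - lo
```

In this sketch the band widens as the model drifts from the stored one, so a real system would occasionally have to re-sort under the current model. When to pay that reorganization cost is the kind of trade-off the optimality analysis in contribution (2) addresses, though the policy itself is not given in the abstract.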