A course in density estimation
A course in density estimation
Elements of information theory
Elements of information theory
Efficient agnostic PAC-learning with simple hypothesis
COLT '94 Proceedings of the seventh annual conference on Computational learning theory
Machine Learning
On Changing Continuous Attributes into Ordered Discrete Attributes
EWSL '91 Proceedings of the European Working Session on Machine Learning
PVFS: a parallel file system for linux clusters
ALS'00 Proceedings of the 4th annual Linux Showcase & Conference - Volume 4
ChiMerge: discretization of numeric attributes
AAAI'92 Proceedings of the tenth national conference on Artificial intelligence
Towards NIC-based intrusion detection
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Toward Unsupervised Correlation Preserving Discretization
IEEE Transactions on Knowledge and Data Engineering
Hi-index | 0.00 |
Most current work in data mining assumes that the database is static, and a database update requires rediscovering all the patterns by scanning the entire old and new database. Such approaches can waste a lot of computational and I/O resources, and result in relatively slow response times, to essentially an interactive process. In this paper we address this issue in the context of 2-dimensional discretization within a multiattribute database. Discretization, an important problem in data mining, is typically used to partition the range of continuous attribute(s) into intervals which highlight the behavior of a related discrete attribute. It can be used to build decision trees and to determine appropriate aggregations for On-Line Analytical Processing. We first propose a time-optimal solution to the problem. We then parallelize and incrementalize the algorithm so that it can dynamically maintain the required information even in the presence of data updates without re-executing the algorithm on the entire dataset. Experimental results confirm that our approach results in execution time improvements of up to several orders of magnitude on large datasets.