Robust regression and outlier detection
Robust regression and outlier detection
Improved histograms for selectivity estimation of range predicates
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Histogram-based estimation techniques in database systems
Histogram-based estimation techniques in database systems
Mining the most interesting rules
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Supporting Data Mining of Large Databases by Visual Feedback Queries
Proceedings of the Tenth International Conference on Data Engineering
Analyzing Quantitative Databases: Image is Everything
Proceedings of the 27th International Conference on Very Large Data Bases
Hi-index | 0.00 |
Finding local interrelations (hypotheses) among attributeswithin very large databases of high dimensionalityis an acute problem for many databases and data miningapplications. These include, dependency modeling, clusteringlarge databases, correlation and link analysis.Traditional statistical methods are concerned with the corroborationof (a set of) hypotheses on a given body ofdata. Testing all of the hypotheses that can be generatedfrom a database with millions of records and dozens offields is clearly infeasible. Generating, on the other hand,a set of the most "promising" hypotheses (to be corroborated)requires much intuition and ingenuity.In this paper we present an efficient method for rankingthe multidimensional hypotheses using image processingof data visualization. In the heart of the method lies theuse of visualization techniques and image processing ideasto rank subsets of attributes according to the relation betweenthem in the databases. Some of the scalability issuesare solved by concise generalized histograms and by usingan efficient on-line computation of clustering around amedian with only five additional memory words. In additionto presenting our algorithmic methodology, we demonstrateits efficiency and performance by applying it to realcensus data sets, as well as synthetic data sets.