Regression analysis for massive datasets
Data & Knowledge Engineering
According to Lindley's paradox, most point null hypotheses will be rejected when the sample size is very large. In this paper, a two-stage block testing procedure is proposed for regression analysis of massive data. New variable selection criteria, used together with the classical stepwise procedure, are also developed to select significant explanatory variables. The approach is computationally simple for massive data, and a simulation study confirms that it is more accurate in the sense of achieving the nominal significance level for huge data sets. A real example with moderate sample size verifies that the proposed procedure is accurate compared with the classical method, and the procedure is also applied to a huge real data set to select appropriate regressors.
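The large-sample effect mentioned in the first sentence can be illustrated with a minimal sketch. This is not the paper's two-stage block procedure. It only shows, under assumed values, how a fixed and practically negligible deviation from a point null produces an ever smaller p-value as the sample size grows. The function name `two_sided_p_value`, the effect size of 0.01, and the sample sizes are hypothetical choices made for illustration, and the observed sample mean is assumed to equal the effect exactly.

```python
import math


def two_sided_p_value(effect, sigma, n):
    """Two-sided z-test p-value for H0: mean = 0.

    Assumes the observed sample mean equals `effect` and that the
    standard deviation `sigma` is known (illustrative assumptions).
    """
    z = effect * math.sqrt(n) / sigma
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical setting: a deviation of 0.01 standard deviations from the null.
for n in (100, 10_000, 1_000_000):
    p = two_sided_p_value(effect=0.01, sigma=1.0, n=n)
    print(f"n = {n:>9,d}  p-value = {p:.3g}")
```

With these assumed values the same tiny deviation is far from significant at n = 100 but is rejected at any conventional level at n = 1,000,000, which is the behaviour that motivates testing procedures designed for massive data.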