Databases in the terabyte range are now common. In many domains, mining all the data available in reasonable time is already beyond the reach of current systems. Yet the size of databases continues to grow rapidly. Is subsampling unavoidable? Or should it be avoided at all costs? If we subsample, what is the best way to do it? What issues must be taken into account? The KDD-2001 Panel on When and How to Subsample addressed these and related questions, with the twin goals of developing practical guidelines and identifying key research issues. It was chaired by Pedro Domingos (University of Washington), and the participants were Surajit Chaudhuri (Microsoft Research), David Jensen (University of Massachusetts at Amherst), Ronny Kohavi (Blue Martini), and Foster Provost (New York University). Below is each panelist's summary of his position.