C4.5: programs for machine learning
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6
A fast and simple method for extracting relevant content from news webpages
Proceedings of the 18th ACM conference on Information and knowledge management
Cross validation is fundamental to machine learning because it provides a reliable way to evaluate both algorithms and the overall quality of the corpora in use. In typical cross validation, the corpus is first divided into training and validation segments, which are then crossed over in successive rounds so that each data segment is validated against the remaining ones. This process is prohibitively time-consuming and laborious, and it is often passed over in favor of computationally cheaper alternatives, such as heuristics. In this paper we introduce a cloud-based architecture for running cross validation jobs. Our solution makes heavy use of computational resources in the cloud by proposing a strategy built on two distinct, consecutive map-reduce cycles: the first performs the algorithmic target computation, and the second produces the cross validation data used to retrofit the machine learning process. We demonstrate the feasibility of the proposed approach by implementing a web segmentation algorithm.
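As an illustration of the two-cycle strategy described above, the following is a minimal, single-process sketch, not the architecture from the paper. It assumes a toy threshold classifier in place of the web segmentation algorithm, and the names `k_fold_splits`, `map_reduce`, and `run_cross_validation` are hypothetical. The first cycle trains on each fold's training segment and labels its validation segment; the second cycle turns those outputs into a cross validation score.

```python
from collections import defaultdict
from statistics import mean


def k_fold_splits(n_items, k):
    """Yield (fold, train_indices, valid_indices); each item is validated once."""
    for fold in range(k):
        valid = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield fold, train, valid


def map_reduce(records, mapper, reducer):
    """In-process stand-in for one map-reduce cycle (no cluster involved)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def run_cross_validation(data, k):
    """data is a list of (feature, label) pairs with labels 0 or 1."""

    # Cycle 1 (assumed toy target computation): fit a threshold on the
    # training segment, then label every item of the validation segment.
    def target_mapper(split):
        fold, train, valid = split
        threshold = mean(data[i][0] for i in train)
        for i in valid:
            x, label = data[i]
            yield fold, (int(x >= threshold), label)

    def collect(_key, values):
        return values

    # Cycle 2: score each fold, then average the scores under a single key.
    def score_mapper(item):
        _fold, pairs = item
        correct = sum(1 for predicted, label in pairs if predicted == label)
        yield "accuracy", correct / len(pairs)

    def average(_key, values):
        return mean(values)

    predictions = map_reduce(k_fold_splits(len(data), k), target_mapper, collect)
    scores = map_reduce(predictions.items(), score_mapper, average)
    return predictions, scores["accuracy"]
```

Calling `run_cross_validation(data, 5)` on a list of `(feature, label)` pairs returns the per-fold predictions together with the mean accuracy across folds. In a real deployment each cycle would presumably run as a separate distributed job, with the output of the first stored where the second can read it.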