Shared State for Distributed Interactive Data Mining Applications

  • Authors:
  • Srinivasan Parthasarathy; Sandhya Dwarkadas

  • Affiliations:
  • Computer and Information Science, Ohio State University, Columbus, OH 43235, USA. srini@cis.ohio-state.edu; Computer Science, University of Rochester, Rochester, NY 14627, USA. sandhya@cs.rochester.edu

  • Venue:
  • Distributed and Parallel Databases - Special issue: Parallel and distributed data mining
  • Year:
  • 2002

Abstract

Distributed data mining applications involving user interaction are now feasible due to advances in processor speed and network bandwidth. These applications are traditionally implemented using ad-hoc communication protocols, which are often either cumbersome or inefficient. This paper presents and evaluates a system for sharing state among such interactive distributed data mining applications, developed with the goal of providing both ease of programming and efficiency. Our system, called InterAct, supports data sharing efficiently by allowing caching, by communicating only the modified data, and by allowing relaxed coherence requirements to be specified for reduced communication overhead, as well as data placement for improved locality, on a per-client and per-data-structure basis. Additionally, our system can supply clients with consistent copies of shared data even while the data is being modified. We evaluate the performance of the system on a set of data mining applications that perform queries on data structures summarizing information from the databases of interest. We demonstrate that providing a runtime system such as InterAct results in a 10- to 30-fold improvement in execution time, due to shared data caching, the applications' ability to tolerate stale data (client-controlled coherence), and the ability to off-load some of the computation from the server to the client. Performance is improved without requiring complex communication protocols to be built into the application, because the runtime system uses knowledge about application behavior (encoded by specifying coherence requirements) to automatically optimize the resources used for communication. We also demonstrate that, for our benchmark tests, the quality of the results generated does not deteriorate significantly due to the use of more relaxed coherence protocols.
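The abstract does not describe InterAct's programming interface. The sketch below illustrates two of the ideas it names: shipping only modified data, and letting each client choose how stale its cached copy may be. It rests on a simplified model that we assume for illustration. The server bumps a global version counter on every write. Each client declares a staleness bound in versions. When the bound is exceeded, the client fetches only the entries changed since its last refresh. `SharedStore`, `CachedClient`, and `max_staleness` are hypothetical names, not InterAct's API. A version-lag bound is only one of many possible relaxed-coherence policies.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


class SharedStore:
    """Hypothetical server-side store that versions every update."""

    def __init__(self) -> None:
        self.version = 0
        self.data: Dict[str, Any] = {}
        # key -> global version at which that key was last modified
        self.modified_at: Dict[str, int] = {}

    def write(self, key: str, value: Any) -> None:
        self.version += 1
        self.data[key] = value
        self.modified_at[key] = self.version

    def diff_since(self, since: int) -> Tuple[int, Dict[str, Any]]:
        """Return the current version and only the entries modified after `since`."""
        changed = {k: self.data[k] for k, v in self.modified_at.items() if v > since}
        return self.version, changed


@dataclass
class CachedClient:
    """Hypothetical client cache with a per-client staleness bound (in versions)."""
    store: SharedStore
    max_staleness: int
    version: int = 0
    cache: Dict[str, Any] = field(default_factory=dict)
    fetches: int = 0
    entries_transferred: int = 0

    def read(self, key: str) -> Any:
        # Assumption: in this single-process sketch the client peeks at
        # store.version directly; a real system would learn it from messages.
        if self.store.version - self.version > self.max_staleness:
            # Refresh with a diff: only entries changed since our cached version.
            self.version, changed = self.store.diff_since(self.version)
            self.cache.update(changed)
            self.fetches += 1
            self.entries_transferred += len(changed)
        return self.cache.get(key)


if __name__ == "__main__":
    store = SharedStore()
    strict = CachedClient(store, max_staleness=0)   # always sees the latest data
    relaxed = CachedClient(store, max_staleness=3)  # tolerates up to 3 versions of lag
    for i in range(6):
        store.write(f"itemset{i % 2}", i)
        strict.read("itemset0")
        relaxed.read("itemset0")
    print(strict.fetches, relaxed.fetches)
```

In this run the strict client refreshes after every write (6 fetches). The relaxed client refreshes once, and that refresh carries a 2-entry diff. It then answers later reads from a slightly stale cache. This mirrors, in miniature, the trade-off the abstract reports between communication and result freshness.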