Memory Characterization of a Parallel Data Mining Workload

  • Authors:
  • Jin-Soo Kim; Xiaohan Qin; Yarsun Hsu

  • Venue:
  • WWC '98 Proceedings of the Workload Characterization: Methodology and Case Studies
  • Year:
  • 1998

Abstract

This paper studies a parallel data mining workload, a representative of an important class of emerging applications. The application, extracted from the IBM Intelligent Miner, identifies groups of records that are mathematically similar based on a neural network model called the self-organizing map. We examine and compare in detail two implementations of the application with respect to: (1) temporal locality or working set sizes; (2) spatial locality and memory block utilization; (3) communication characteristics and scalability; and (4) TLB performance. First, we find that the working set hierarchy of the application is governed by two parameters, namely the size of an input record and the size of the prototype array; it is independent of the number of input records. Second, the application shows good spatial locality, with the implementation optimized for sparse data sets having slightly worse spatial locality. Third, due to the batch update scheme, the application incurs very little communication. Finally, a 2-way set-associative TLB may result in severely skewed TLB performance in a multiprocessor environment, caused by a large discrepancy in the number of conflict misses. Increasing the set associativity is more effective in mitigating the problem than increasing the TLB size.
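
The TLB observation above can be illustrated with a minimal sketch. It is not the paper's simulator or workload: the function name `tlb_misses`, the LRU replacement policy, the modulo set-indexing scheme, the TLB sizes, and the access pattern are all assumptions chosen only to show how conflict misses arise in a set-associative TLB. The constructed pattern touches three pages that map to the same set, so a 2-way TLB thrashes regardless of its total size, while a 4-way TLB of the smaller size holds all three pages.

```python
from collections import OrderedDict


def tlb_misses(pages, num_entries, ways):
    """Count misses for a set-associative TLB with LRU replacement.

    Assumptions (hypothetical, for illustration only): the set index is
    the page number modulo the number of sets, and replacement is LRU.
    """
    num_sets = num_entries // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for page in pages:
        entries = sets[page % num_sets]
        if page in entries:
            entries.move_to_end(page)        # mark as most recently used
        else:
            misses += 1
            if len(entries) >= ways:
                entries.popitem(last=False)  # evict the least recently used
            entries[page] = True
    return misses


# Constructed access pattern: three page numbers, 128 apart, touched
# round-robin. They collide in one set for every configuration below.
stride = 128
pages = [0, stride, 2 * stride] * 100

print(tlb_misses(pages, num_entries=128, ways=2))  # 300: every access misses
print(tlb_misses(pages, num_entries=256, ways=2))  # 300: doubling the size does not help
print(tlb_misses(pages, num_entries=128, ways=4))  # 3: only the first touch of each page misses
```

The numbers printed here follow from the constructed pattern alone and say nothing about the miss rates measured in the paper; the sketch only shows the mechanism by which associativity, rather than capacity, can remove conflict misses.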