Parallel implementation of information retrieval clustering models

Authors:
Daniel Jiménez;Vicente Vidal
Affiliations:
Department of Computer Systems and Computation, Polytechnic University of Valencia, Spain;Department of Computer Systems and Computation, Polytechnic University of Valencia, Spain
Venue:
VECPAR'04 Proceedings of the 6th international conference on High Performance Computing for Computational Science
Year:
2004

Abstract

Information Retrieval (IR) is fundamental today, all the more so since the appearance of the Internet and the huge amount of information now available in electronic format. This information is of little use unless it can be searched efficiently and effectively. For large collections, parallelization is important because the data volume is enormous. A single computer is usually not sufficient to manage all the data, let alone in a reasonable time. Parallelization also matters because in many situations the document collection is already distributed, and centralizing it is not advisable. For these reasons, we present parallel algorithms for information retrieval systems. We propose two parallel clustering algorithms: α-Bisecting K-Means and α-Bisecting Spherical K-Means. We have also carried out a set of experiments to compare the computational performance of the algorithms. These experiments were run on a cluster of PCs with 20 dual-processor nodes, using two different document collections.
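The abstract names the algorithms but does not describe them. As a rough illustration only, here is a minimal sketch of plain bisecting spherical k-means with the documents already split across simulated nodes. Documents and concept vectors are kept at unit length, so the dot product is the cosine similarity. Each node assigns only its own documents and produces partial per-cluster sums, and a global reduction combines them. In a real cluster that reduction would be a collective such as an MPI Allreduce. This is not the authors' method. The role of α is not described in this abstract, so it is omitted. Always splitting the largest cluster is an assumed selection rule. All function names (`local_partial_sums`, `global_reduce`, `bisect_cluster`, `bisecting_spherical_kmeans`) are hypothetical.

```python
import math
import random

def normalize(v):
    """Scale v to unit Euclidean length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def nearest(doc, centroids):
    """Index of the concept vector with the highest cosine similarity to doc."""
    return max(range(len(centroids)), key=lambda j: dot(doc, centroids[j]))

def local_partial_sums(shard, centroids):
    """Work done by one simulated node on its local documents only."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for doc in shard:
        j = nearest(doc, centroids)
        counts[j] += 1
        for i, x in enumerate(doc):
            sums[j][i] += x
    return sums, counts

def global_reduce(partials):
    """Stand-in for a collective sum (e.g. MPI Allreduce) over all nodes."""
    sums = [[sum(col) for col in zip(*rows)]
            for rows in zip(*(p[0] for p in partials))]
    counts = [sum(c) for c in zip(*(p[1] for p in partials))]
    return sums, counts

def bisect_cluster(shards, iters, rng):
    """Split the documents held in `shards` into two clusters (spherical 2-means)."""
    docs = [d for s in shards for d in s]
    centroids = [normalize(d) for d in rng.sample(docs, 2)]
    for _ in range(iters):
        sums, counts = global_reduce([local_partial_sums(s, centroids) for s in shards])
        # New concept vector = normalized member sum; keep the old one if a cluster empties.
        centroids = [normalize(v) if c else old
                     for v, c, old in zip(sums, counts, centroids)]
    # Each node splits its own shard locally; documents never move between nodes.
    halves = ([], [])
    for s in shards:
        parts = ([], [])
        for doc in s:
            parts[nearest(doc, centroids)].append(doc)
        halves[0].append(parts[0])
        halves[1].append(parts[1])
    return halves

def bisecting_spherical_kmeans(shards, k, iters=10, seed=0):
    """Bisect the largest cluster until k clusters exist.

    `shards` holds one list of document vectors per node; every returned
    cluster is likewise a list with one (possibly empty) shard per node.
    """
    rng = random.Random(seed)
    clusters = [[[normalize(d) for d in s] for s in shards]]

    def size(cluster):
        return sum(len(s) for s in cluster)

    while len(clusters) < k:
        clusters.sort(key=size)
        target = clusters.pop()  # largest cluster (assumed selection rule)
        if size(target) < 2:
            clusters.append(target)
            break
        clusters.extend(bisect_cluster(target, iters, rng))
    return clusters
```

In this sketch, only k partial-sum vectors and counts per node would need to cross the network on each iteration, never the documents themselves. That is one way to respect the abstract's point that an already-distributed collection need not be centralized.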