Minersoft: Software retrieval in grid and cloud computing infrastructures

  • Authors:
  • Marios D. Dikaiakos; Asterios Katsifodimos; George Pallis

  • Affiliations:
  • University of Cyprus, Cyprus; LRI Université Paris-Sud XI and INRIA, Saclay; University of Cyprus, Cyprus

  • Venue:
  • ACM Transactions on Internet Technology (TOIT)
  • Year:
  • 2012

Abstract

One of the main goals of Cloud and Grid infrastructures is to make their services easily accessible and attractive to end-users. In this article we investigate the problem of supporting keyword-based search for software files installed on the nodes of large-scale, federated Grid and Cloud computing infrastructures. We address a number of challenges that arise from the unstructured nature of software and the unavailability of software-related metadata in large-scale networked environments. We present Minersoft, a harvester that visits Grid/Cloud infrastructures, crawls their file systems, identifies and classifies software files, and discovers implicit associations between them. The results of Minersoft harvesting are encoded in a weighted, typed graph called the Software Graph. A number of information retrieval (IR) algorithms are used to enrich this graph with structural and content associations, to annotate software files with keywords, and to build inverted indexes that support keyword-based searching for software. We evaluate our approach on a real testbed, using data extracted from production-quality Grid and Cloud computing infrastructures. Experimental results show that Minersoft is a powerful tool for software search and discovery.
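
The pipeline outlined in the abstract (a weighted, typed graph of files, enriched with keywords and then indexed for search) can be illustrated with a small sketch. This is not Minersoft's implementation: the class name `SoftwareGraph`, the methods `add_file`, `add_edge`, `enrich`, and `build_inverted_index`, the `search` helper, the file paths, the edge type `"describes"`, and the weight threshold are all hypothetical, chosen only to show how keyword annotation over graph edges can feed an inverted index.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SoftwareGraph:
    """Toy weighted, typed graph: nodes are files, edges carry a type and a weight."""
    node_types: dict = field(default_factory=dict)  # path -> file class
    keywords: dict = field(default_factory=dict)    # path -> set of keywords
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_file(self, path, file_type, keywords):
        self.node_types[path] = file_type
        self.keywords[path] = {k.lower() for k in keywords}

    def add_edge(self, src, dst, edge_type, weight):
        self.edges[src].append((dst, edge_type, weight))

    def enrich(self, min_weight=0.5):
        """Copy keywords one hop along sufficiently strong edges.

        An assumed stand-in for the enrichment step; the threshold is arbitrary.
        """
        inherited = defaultdict(set)
        for src, outgoing in self.edges.items():
            for dst, _edge_type, weight in outgoing:
                if weight >= min_weight:
                    inherited[dst] |= self.keywords.get(src, set())
        # Apply after collecting, so propagation stays single-hop.
        for path, extra in inherited.items():
            self.keywords.setdefault(path, set()).update(extra)

    def build_inverted_index(self):
        """Map each keyword to the set of file paths annotated with it."""
        index = defaultdict(set)
        for path, kws in self.keywords.items():
            for keyword in kws:
                index[keyword].add(path)
        return index


def search(index, query):
    """Return the files that match every term of the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical example: a README that describes a binary.
graph = SoftwareGraph()
graph.add_file("/usr/bin/blastall", "binary", ["blast", "alignment"])
graph.add_file("/usr/share/doc/blast/README", "documentation",
               ["blast", "sequence", "protein"])
graph.add_edge("/usr/share/doc/blast/README", "/usr/bin/blastall", "describes", 0.9)
graph.enrich()
index = graph.build_inverted_index()
hits = search(index, "protein alignment")
```

In this sketch the binary carries no `protein` keyword of its own; it becomes findable by that term only because the documentation file linked to it does, which is the kind of implicit association the abstract refers to.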