A First Study on Clustering Collections of Workflow Graphs

Authors:
Emanuele Santos; Lauro Lins; James P. Ahrens; Juliana Freire; Cláudio T. Silva
Affiliations:
Scientific Computing and Imaging Institute, University of Utah; Scientific Computing and Imaging Institute, University of Utah; Los Alamos National Lab; School of Computing, University of Utah; Scientific Computing and Imaging Institute, University of Utah, and School of Computing, University of Utah
Venue:
Provenance and Annotation of Data and Processes
Year:
2008

Abstract

As workflow systems become more widely used, the number of workflows and the volume of provenance they generate have grown considerably. New tools and infrastructure are needed to allow users to interact with, reason about, and reuse this information. In this paper, we explore the use of clustering techniques to organize large collections of workflow and provenance graphs. We propose two different representations for these graphs and present an experimental evaluation, using a collection of 1,700 workflow graphs, in which we study the trade-offs between these representations and the effectiveness of alternative clustering techniques.
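The abstract does not spell out which two representations or which clustering algorithms are used. As a hedged illustration only, the Python sketch below assumes two simple representations of a workflow graph: a bag-of-modules vector compared with cosine similarity, and a set of typed module-to-module connections compared with Jaccard similarity. It then groups workflows with average-link agglomerative clustering. The `(modules, edges)` tuple format, the module names, and all function names are hypothetical and are not the authors' implementation.

```python
from collections import Counter
from math import sqrt

# Hypothetical input format (an assumption, not from the paper):
# each workflow is (list of module type labels, list of (source, target) connections).
workflows = [
    (["FileReader", "Isosurface", "Renderer"],
     [("FileReader", "Isosurface"), ("Isosurface", "Renderer")]),
    (["FileReader", "Isosurface", "Smooth", "Renderer"],
     [("FileReader", "Isosurface"), ("Isosurface", "Smooth"), ("Smooth", "Renderer")]),
    (["CSVReader", "Histogram", "Plot"],
     [("CSVReader", "Histogram"), ("Histogram", "Plot")]),
    (["CSVReader", "Filter", "Histogram", "Plot"],
     [("CSVReader", "Filter"), ("Filter", "Histogram"), ("Histogram", "Plot")]),
]

# Representation A (assumed): count of each module type, ignoring connections.
module_vectors = [Counter(modules) for modules, _ in workflows]
# Representation B (assumed): set of typed connections, a cheap structural proxy.
edge_sets = [set(edges) for _, edges in workflows]


def cosine_sim(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def jaccard_sim(a: set, b: set) -> float:
    """Jaccard similarity between two connection sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def agglomerative(items, sim, k):
    """Average-link agglomerative clustering into k clusters.

    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    n = len(items)
    # Precompute all pairwise similarities once.
    S = [[sim(items[i], items[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Average similarity over all cross-cluster pairs.
                total = sum(S[i][j] for i in clusters[x] for j in clusters[y])
                avg = total / (len(clusters[x]) * len(clusters[y]))
                if best is None or avg > best[0]:
                    best = (avg, x, y)
        _, x, y = best
        clusters[x].extend(clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    print("by modules:    ", agglomerative(module_vectors, cosine_sim, 2))
    print("by connections:", agglomerative(edge_sets, jaccard_sim, 2))
```

On this toy input both representations separate the visualization pipelines (indices 0 and 1) from the plotting pipelines (indices 2 and 3). The two representations can disagree in general: workflows that use the same modules but wire them differently look identical to the module-count vector yet differ under the connection-set measure, which is one kind of trade-off a study like this would need to weigh.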