As workflow systems become more widely used, the number of workflows and the volume of provenance they generate have grown considerably. New tools and infrastructure are needed to let users interact with, reason about, and reuse this information. In this paper, we explore the use of clustering techniques to organize large collections of workflow and provenance graphs. We propose two different representations for these graphs and present an experimental evaluation on a collection of 1,700 workflow graphs, in which we study the trade-offs between these representations and the effectiveness of alternative clustering techniques.
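
To make the idea of clustering workflow graphs concrete, here is a minimal sketch. The abstract does not say what the two representations are, so everything here is an assumption for illustration: each workflow is reduced to a bag of module labels, workflows are compared by cosine similarity, and a simple greedy "leader" pass groups them. The module names, the `threshold` value, and the helper functions are hypothetical, and the leader pass stands in for whichever clustering techniques the evaluation actually compares.

```python
from collections import Counter
from math import sqrt


def vectorize(module_labels):
    """Turn a workflow's list of module labels into a term-count vector."""
    return Counter(module_labels)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def leader_cluster(vectors, threshold=0.5):
    """Greedy single-pass clustering (an assumed stand-in technique).

    Each vector joins the first cluster whose leader (its first member)
    is at least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of indices; index 0 is the leader
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical workflows, each listed as the labels of its modules.
workflows = [
    ["FileReader", "Isosurface", "Renderer"],
    ["FileReader", "Isosurface", "Smooth", "Renderer"],
    ["HTTPFetch", "ParseCSV", "Histogram"],
    ["HTTPFetch", "ParseCSV", "Histogram", "SavePlot"],
]
clusters = leader_cluster([vectorize(w) for w in workflows])
print(clusters)
```

A bag-of-labels vector ignores how modules are connected, so a representation that keeps the graph structure would separate workflows that use the same modules in different arrangements. That is one kind of trade-off a comparison between representations could expose.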