Scientific workflows are gaining popularity, and repositories of workflows are starting to emerge. In this paper we present initial experiences with information discovery in repositories of scientific workflows. In the first part of the paper we consider a collection of VisTrails workflows and explore how this collection may be summarized when workflow modules are used as features. We present a hierarchical, browsable view of the repository in which categories are derived using frequent itemset mining or latent Dirichlet allocation, and we demonstrate that both approaches may be used for effective data exploration. In the second part of the paper we focus on a collection of Taverna workflows from myExperiment.org and consider how these workflows may be browsed using modules and tags as features. Finally, we outline open challenges and describe the conditions under which these techniques work well for repositories of scientific workflows, and those under which additional work is needed for effective data exploration.
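To make the idea of deriving browsable categories from workflow modules more concrete, the following is a minimal sketch, not the method used in the paper. It assumes each workflow is reduced to the set of module names it contains, and it counts module combinations by brute force rather than with Apriori-style candidate pruning. The workflow identifiers, module names, and the helper functions `frequent_itemsets` and `categorize` are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(workflows, min_support, max_size=3):
    """Count module sets (up to max_size modules) and keep those that
    occur in at least min_support workflows.

    Brute-force enumeration: fine for a toy example, too slow for a
    real repository, where Apriori or FP-growth would be used instead.
    """
    counts = Counter()
    for modules in workflows:
        unique = sorted(set(modules))  # ignore repeated modules in one workflow
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def categorize(workflows_by_id, itemsets):
    """Treat each frequent itemset as a category and list the workflows
    whose module set contains it."""
    categories = {}
    for itemset in itemsets:
        members = [
            wf_id
            for wf_id, modules in workflows_by_id.items()
            if itemset <= set(modules)
        ]
        categories[itemset] = sorted(members)
    return categories


# Hypothetical toy repository: workflow id -> modules used (made-up names).
workflows = {
    "wf1": ["ReadFile", "Isosurface", "Render"],
    "wf2": ["ReadFile", "Isosurface", "Render", "Smooth"],
    "wf3": ["ReadFile", "Histogram", "Plot"],
    "wf4": ["ReadFile", "Histogram", "Plot"],
}

itemsets = frequent_itemsets(workflows.values(), min_support=2)
categories = categorize(workflows, itemsets)

# Larger itemsets nest under their subsets, which suggests a hierarchy:
# {ReadFile} -> {ReadFile, Histogram} -> {ReadFile, Histogram, Plot}.
for itemset in sorted(categories, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), categories[itemset])
```

In this sketch a module that every workflow uses (here `ReadFile`) yields a category containing the whole repository, which hints at why a support threshold alone may not produce useful browsing categories.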