Scientific workflows are abstractions used to model and execute in silico scientific experiments. They are key resources for scientists and are enacted and managed by engines called Scientific Workflow Management Systems (SWfMS). Each SWfMS has its own workflow language, and this heterogeneity of languages and formats makes it difficult for scientists to search for, discover, and reuse workflows stored in distributed repositories. The workflows already in these repositories could be used to identify and build families of workflows (clusters) that share a particular goal. However, their structures are hard to compare because they are modeled in different formats. An alternative is to compare workflow metadata, such as the natural language descriptions usually found in workflow repositories, instead of comparing workflow structure. In this scenario, we expect that classical text mining techniques, used effectively, can cluster a set of workflows into families, allowing scientists to find and reuse existing workflows and potentially reducing the complexity of modeling a new experiment. This paper presents Athena, a cloud-based approach that clusters workflows from dispersed repositories using their natural language descriptions, thereby integrating these repositories and making it easier to search for and reuse workflows.
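The abstract does not detail Athena's text-mining pipeline. As an illustration only, the following Python sketch clusters workflows by their natural language descriptions using classical techniques: stop-word removal, TF-IDF weighting, cosine similarity, and single-link grouping of any pair whose similarity reaches a threshold. The stop-word list, the 0.2 threshold, the function names, and the sample descriptions are all hypothetical; this is not Athena's actual method.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "and", "to", "for", "in", "from", "with", "using", "that", "this"}


def tokenize(text: str) -> list[str]:
    """Lowercase, keep alphabetic runs, and drop stop words (no stemming)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build one sparse TF-IDF vector (term -> weight) per description."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many descriptions each term appears.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster(docs: list[str], threshold: float = 0.2) -> list[list[int]]:
    """Single-link clustering: join two workflows if their similarity >= threshold."""
    vecs = tfidf_vectors(docs)
    parent = list(range(len(docs)))  # union-find forest over workflow indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(j)] = find(i)

    groups: dict[int, list[int]] = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical workflow descriptions, as might be harvested from different repositories.
DESCRIPTIONS = [
    "Align protein sequences with BLAST and build a phylogenetic tree",
    "Build a phylogenetic tree from aligned protein sequences",
    "Render astronomy images from telescope survey data",
    "Mosaic telescope images from an astronomy sky survey",
]

if __name__ == "__main__":
    print(cluster(DESCRIPTIONS))  # [[0, 1], [2, 3]]
```

Without stemming, "align" and "aligned" count as different terms, so adding a stemmer or lemmatizer is a natural refinement. Single-link grouping needs no preset number of clusters, unlike k-means, but at low thresholds it can chain loosely related workflows into one family.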