The demand for advanced data analysis skills spans many areas of science, computing, and business analytics. This paper discusses how non-expert users reuse workflows that experts have created to represent complex data mining processes for text analytics. These include workflows for document classification, document clustering, and topic detection, all assembled from components available in well-known text analytics software libraries. The workflows expose to non-experts the expert-level knowledge of how these individual components must be combined with data preparation and feature selection steps to make the underlying statistical learning algorithms most effective. The framework lets non-experts experiment with different combinations of data analysis processes, represented as workflows of computations that they can easily reconfigure. We report on our experiences to date with having users who have limited data analytics knowledge, and in some cases only basic programming skills, apply these workflows to their data.