Making data analysis expertise broadly accessible through workflows

  • Authors:
  • Matheus Hauder;Yolanda Gil;Ricky Sethi;Yan Liu;Hyunjoon Jo

  • Affiliations:
  • University of Southern California, Marina del Rey, CA, USA;University of Southern California, Marina del Rey, CA, USA;University of Southern California, Marina del Rey, CA, USA;University of Southern California, Los Angeles, CA, USA;University of Southern California, Los Angeles, CA, USA

  • Venue:
  • Proceedings of the 6th workshop on Workflows in support of large-scale science
  • Year:
  • 2011

Abstract

The demand for advanced skills in data analysis spans many areas of science, computing, and business analytics. This paper discusses how non-expert users reuse workflows that are created by experts and that represent complex data mining processes for text analytics. These include workflows for document classification, document clustering, and topic detection, all assembled from components available in well-known text analytics software libraries. The workflows expose to non-experts expert-level knowledge of how these individual components must be combined with data preparation and feature selection steps to make the underlying statistical learning algorithms most effective. The framework allows non-experts to easily experiment with different combinations of data analysis processes, represented as workflows of computations that they can readily reconfigure. We report on our experiences to date with having users who have limited data analytics knowledge, and even limited basic programming skills, apply workflows to their data.
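The abstract does not say how the workflows are encoded or executed. As a rough illustration only, the following is a minimal, hypothetical Python sketch of the general idea it describes: a text-classification workflow that chains data preparation, feature selection, and a learning step, which a non-expert reconfigures by changing which components run rather than by writing new code. Every name here (`run_workflow`, `remove_stopwords`, `select_top_k`, the tiny stopword list), the frequency-based feature selection, and the nearest-centroid classifier are illustrative assumptions, not the authors' framework or the text analytics libraries they used.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

# Hypothetical component types: a document is a list of tokens.
Docs = List[List[str]]
PrepStep = Callable[[Docs], Docs]

# Illustrative stopword list (an assumption, far from complete).
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is"}


def tokenize(texts: Sequence[str]) -> Docs:
    """Data preparation: lowercase each text and split on whitespace."""
    return [t.lower().split() for t in texts]


def remove_stopwords(docs: Docs) -> Docs:
    """Optional data-preparation step: drop very common function words."""
    return [[w for w in d if w not in STOPWORDS] for d in docs]


def select_top_k(docs: Docs, k: int) -> List[str]:
    """Feature selection (assumed): keep the k most frequent training terms."""
    counts = Counter(w for d in docs for w in d)
    return [w for w, _ in counts.most_common(k)]


def vectorize(docs: Docs, vocab: List[str]) -> List[List[int]]:
    """Bag-of-words term counts restricted to the selected vocabulary."""
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0] * len(vocab)
        for w in d:
            if w in index:
                v[index[w]] += 1
        vectors.append(v)
    return vectors


def train_centroids(X: List[List[int]], y: List[str]) -> Dict[str, List[float]]:
    """Learning step: nearest-centroid classifier, a stand-in for any library learner."""
    sums: Dict[str, List[float]] = {}
    counts = Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, val in enumerate(x):
            acc[i] += val
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids: Dict[str, List[float]], X: List[List[int]]) -> List[str]:
    """Assign each vector to the label whose centroid is closest (squared Euclidean)."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda c: dist(centroids[c], x)) for x in X]


def run_workflow(train_texts, train_labels, new_texts,
                 prep_steps: List[PrepStep], k: int) -> List[str]:
    """Hypothetical workflow: reconfiguring it means changing prep_steps or k."""
    train_docs, new_docs = tokenize(train_texts), tokenize(new_texts)
    for step in prep_steps:  # apply the same preparation to both splits
        train_docs, new_docs = step(train_docs), step(new_docs)
    vocab = select_top_k(train_docs, k)
    model = train_centroids(vectorize(train_docs, vocab), train_labels)
    return predict(model, vectorize(new_docs, vocab))


# Toy data, invented purely for illustration.
train_texts = ["the match ended in a goal", "the team scored a goal",
               "the market fell and stocks dropped", "stocks and bonds in the market"]
train_labels = ["sports", "sports", "finance", "finance"]
new_texts = ["a late goal won the match", "the stock market rallied"]

print(run_workflow(train_texts, train_labels, new_texts, [remove_stopwords], k=6))
# ['sports', 'finance']
print(run_workflow(train_texts, train_labels, new_texts, [], k=6))
```

The second call reuses the same workflow with stopword removal switched off; the user only edits the configuration, while the expert-chosen ordering of preparation, feature selection, and learning stays fixed. On this toy data, skipping stopword removal lets function words such as "the" take up slots in the selected vocabulary.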