A survey of open source data mining systems

  • Authors:
  • Xiaojun Chen; Yunming Ye; Graham Williams; Xiaofei Xu

  • Affiliations:
  • Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China; Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China; Australian Taxation Office, Australia; Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China

  • Venue:
  • PAKDD'07 Proceedings of the 2007 international conference on Emerging technologies in knowledge discovery and data mining
  • Year:
  • 2007

Abstract

Open source data mining software represents a new trend in data mining research, education and industrial applications, especially in small and medium enterprises (SMEs). With open source software, an enterprise can easily initiate a data mining project using the most current technology. Often the software is available at no cost, allowing the enterprise to focus instead on ensuring that its staff can freely learn data mining techniques and methods. Open source also lets staff understand exactly how the algorithms work by examining the source code, if they so desire, and fine-tune the algorithms to suit the specific purposes of the enterprise. However, diversity, instability, scalability and poor documentation can be major concerns in using open source data mining systems. In this paper, we survey open source data mining systems currently available on the Internet. We compare 12 open source systems on several aspects, such as general characteristics, data source accessibility, data mining functionality, and usability. We discuss the advantages and disadvantages of these open source data mining systems.