Meta mining system for supervised learning

  • Authors:
  • Lukasz Andrzej Kurgan;Krzysztof Cios

  • Affiliations:
  • -;-

  • Venue:
  • Meta mining system for supervised learning
  • Year:
  • 2003

Abstract

Supervised inductive machine learning is one of several powerful methodologies that can be used to perform a Data Mining task. Data Mining aims to find previously unknown, implicit patterns that are hidden in large data sets. These patterns describe potentially valuable knowledge. Data Mining techniques have focused on finding knowledge, often expressed in terms of rules, directly from data. More recently, a new Data Mining concept, called Meta Mining, was introduced. It generates knowledge using a two-step procedure: first, meta-data is generated from the input data; next, the meta-data is used to generate meta-rules that constitute the final data model. In this dissertation we examine a new approach to the generation of knowledge, using supervised inductive learning methodologies combined with Meta Mining. We propose a novel data mining system, called MetaSqueezer, for extraction of useful patterns that carry new information about the input supervised data set. The major contribution of this thesis is the design and development of this system, supported by extensive benchmarking results. Two key advantages of the system are its scalability, which results from its linear complexity, and the high compactness of the user-friendly data models that it generates. These two features make it suitable for applications that use megabytes, or even gigabytes, of data. The fields contributing to this research are Inductive Machine Learning, Data Mining and Knowledge Discovery, and Meta Mining. A study of existing Machine Learning methodologies that give similar results is provided to properly situate the research and to help in evaluating the system. The usefulness of the system is evaluated both theoretically and empirically, via thorough testing. The results show that the system generates very compact data models. They also confirm the linear complexity of the system, which makes it highly applicable to real data.
Results of applying the system to cystic fibrosis data are provided. This application generated very useful results, as evaluated by domain experts.