Distributed mining of censored production rules in data streams: an evolutionary approach

  • Authors:
  • Saroj Saroj; K. K. Bharadwaj

  • Affiliations:
  • Department of Computer Science and Engineering, Guru Jambheshwar University of Science and Technology, Hisar, India; School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India

  • Venue:
  • AIKED'08: Proceedings of the 7th WSEAS International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases
  • Year:
  • 2008

Abstract

As distributed data streams gain importance in a growing number of applications, the centralized approach to data mining becomes inappropriate for distributed and ubiquitous data mining environments. Conventional mining algorithms for data streams depend on computationally expensive update procedures to incorporate the changing patterns in the streaming data. Moreover, the most common knowledge structure learnt in knowledge discovery is the standard Production Rule (PR) of the form If P Then D. However, PRs treat exceptions as noise and ignore them. Because of their rigid structure, PRs are not efficient for approximate reasoning and cannot exhibit variable precision logic in the reasoning process. A Censored Production Rule (CPR) is an extension of a PR and has the form If P Then D Unless C, where C is the censor representing an exception condition. The 'If P Then D' part of a CPR holds frequently, while the censor C holds rarely. With a rule of this type we are free to ignore the exception conditions when the resources needed to establish their presence are tight. Thus the 'If P Then D' part of a CPR expresses the important information, while the 'Unless C' part acts only as a switch that changes the polarity of D to ~D whenever the censor evaluates to true. This paper proposes an evolutionary approach for distributed mining of CPRs using a cumulative learning scheme. Local classifiers consisting of PRs and CPRs are generated for the data streams at distributed sites, and a meta-classifier is then produced by combining the local classifiers. A Genetic Algorithm is designed with a fixed-length chromosome encoding that allows variable-length rules. Appropriate genetic operators are suggested for this encoding, and a fitness function incorporating the constraints of CPRs is formulated. Experimental results are presented to demonstrate the effectiveness of CPRs for cumulative learning in data streams.
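To make the CPR semantics concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes a hypothetical encoding in which a fixed-length chromosome holds one premise gene and one censor gene per attribute, with `None` acting as a "don't care" value so that rules with different numbers of conditions fit the same chromosome length. The names `CPR`, `decode`, `matches`, and `evaluate`, and the `check_censor` flag (which models skipping the censor test when resources are tight), are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CPR:
    premise: Dict[str, str]  # P: conjunction of attribute == value tests
    decision: str            # D
    censor: Dict[str, str]   # C: exception condition (empty means a plain PR)


def matches(conditions: Dict[str, str], example: Dict[str, str]) -> bool:
    """True if every attribute == value test in `conditions` holds."""
    return all(example.get(attr) == val for attr, val in conditions.items())


def decode(chromosome: List[Optional[str]], attributes: List[str], decision: str) -> CPR:
    """Hypothetical decoding: genes[0:n] encode P, genes[n:2n] encode C.

    A None gene is "don't care", so the fixed-length chromosome yields
    variable-length premise and censor conjunctions.
    """
    n = len(attributes)
    if len(chromosome) != 2 * n:
        raise ValueError("chromosome must hold one premise and one censor gene per attribute")
    premise = {a: v for a, v in zip(attributes, chromosome[:n]) if v is not None}
    censor = {a: v for a, v in zip(attributes, chromosome[n:]) if v is not None}
    return CPR(premise, decision, censor)


def evaluate(rule: CPR, example: Dict[str, str], check_censor: bool = True) -> Optional[str]:
    """Return D, ~D, or None when the premise P does not fire."""
    if not matches(rule.premise, example):
        return None
    # The censor only flips the polarity of D; skipping it falls back to the PR part.
    if check_censor and rule.censor and matches(rule.censor, example):
        return "~" + rule.decision
    return rule.decision


if __name__ == "__main__":
    attrs = ["type", "species"]
    # If type = bird Then flies Unless species = penguin
    rule = decode(["bird", None, None, "penguin"], attrs, "flies")
    print(evaluate(rule, {"type": "bird", "species": "canary"}))   # flies
    print(evaluate(rule, {"type": "bird", "species": "penguin"}))  # ~flies
    print(evaluate(rule, {"type": "bird", "species": "penguin"}, check_censor=False))  # flies
```

Setting `check_censor=False` mirrors the variable precision behaviour described above: the rule still answers with D, trading certainty for the cost of not establishing the exception condition.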