Reduct generation of microarray dataset using rough set and graph theory for unsupervised learning

Authors:
Asit Kumar Das;Soumen Kumar Pati;Saikat Chakrabarty
Affiliations:
Bengal Engineering and Science University, Howrah, West Bengal, India;St. Thomas' College of Engineering and Technology, Kolkata, West Bengal, India;Bengal Engineering and Science University, Howrah, West Bengal, India
Venue:
Proceedings of the Second International Conference on Computational Science, Engineering and Information Technology
Year:
2012

Abstract

Microarray gene datasets are often high-dimensional, which makes clustering and classification difficult. Datasets with a huge number of attributes increase complexity and therefore degrade data-handling performance. Often, not all the measured features of these high-dimensional datasets are relevant for understanding the underlying phenomena of interest. Dimensionality reduction by reduct generation is hence performed as an important step before clustering and classification. The reduced attribute set has the same characteristics as the entire set of attributes in the information system. In this paper, a new attribute reduction technique for unsupervised learning, based on directed minimal spanning trees and rough set theory, is proposed. The method first computes a similarity factor between each pair of attributes using the indiscernibility relation, a concept from rough set theory. Based on the similarity factors, an attribute similarity set is formed, from which a directed weighted graph is constructed with the attributes as vertices and the inverse of the similarity factor as edge weights. Then, all possible minimal spanning trees of the graph are generated. From each tree, the most important vertex is iteratively included in the reduct set and all its outgoing edges are removed. The process stops when the edge set is empty; since it is applied to each tree, it produces multiple reducts. The proposed method and some well-known attribute reduction techniques have been applied to several microarray gene datasets for gene selection. The results obtained show the effectiveness of the method.
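The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the similarity factor or the criterion for the "most important" vertex, so both are assumptions here: similarity is taken to be the rough-set dependency degree between two attributes (the fraction of objects whose indiscernibility class under one attribute lies inside a single class under the other), and importance is taken to be the number of remaining outgoing edges. All function names are hypothetical, the data are assumed to be already discretized, and the generation of the directed minimal spanning trees is not shown; `reduct_from_tree` simply accepts the edge list of one such tree.

```python
from collections import defaultdict


def partition(table, attr):
    """Equivalence classes of the indiscernibility relation IND({attr})."""
    blocks = defaultdict(set)
    for obj, row in enumerate(table):
        blocks[row[attr]].add(obj)
    return list(blocks.values())


def similarity(table, a, b):
    """ASSUMED similarity factor: fraction of objects whose IND({a}) class
    is contained in a single IND({b}) class (dependency degree of b on a)."""
    classes_b = partition(table, b)
    positive = sum(
        len(block)
        for block in partition(table, a)
        if any(block <= cb for cb in classes_b)
    )
    return positive / len(table)


def build_edges(table, attrs):
    """Directed edges (a, b, weight) with weight = 1 / similarity.
    Pairs with zero similarity get no edge (an assumption)."""
    edges = []
    for a in attrs:
        for b in attrs:
            if a != b:
                s = similarity(table, a, b)
                if s > 0:
                    edges.append((a, b, 1.0 / s))
    return edges


def reduct_from_tree(tree_edges):
    """Greedy selection over one tree, given as (source, target) pairs.
    ASSUMED importance: the vertex with the most remaining outgoing edges.
    Stops when the edge set is empty, as in the abstract."""
    remaining = set(tree_edges)
    reduct = []
    while remaining:
        out_degree = defaultdict(int)
        for src, _dst in remaining:
            out_degree[src] += 1
        # Ties are broken alphabetically so the result is deterministic.
        best = max(sorted(out_degree), key=out_degree.get)
        reduct.append(best)
        remaining = {(s, d) for s, d in remaining if s != best}
    return reduct


# Toy discretized table: four samples, three genes (illustrative data only).
table = [
    {"g1": 0, "g2": 0, "g3": 1},
    {"g1": 0, "g2": 0, "g3": 0},
    {"g1": 1, "g2": 1, "g3": 1},
    {"g1": 1, "g2": 1, "g3": 0},
]
edges = build_edges(table, ["g1", "g2", "g3"])
reduct = reduct_from_tree([("g1", "g2"), ("g1", "g3"), ("g3", "g4")])
```

In this toy table `g1` and `g2` induce the same partition, so they are connected in both directions with weight 1.0, while `g3` is not connected to either. For the hand-written tree in the last line, the sketch selects `g1` first and then `g3`. Running the selection once per minimal spanning tree would yield one candidate reduct per tree.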