Rough Sets and Few-Objects-Many-Attributes Problem: The Case Study of Analysis of Gene Expression Data Sets

Authors:
Dominik Slezak
Venue:
FBIT '07 Proceedings of the 2007 Frontiers in the Convergence of Bioscience and Information Technologies
Year:
2007

Abstract

We discuss two methodologies developed within the theory of rough sets that may be used to analyze data with a relatively low number of objects (rows, experiments) and a large number of attributes (columns, features). We illustrate their applicability with gene expression data sets, which are widely known to suffer from this few-objects-many-attributes problem. The first methodology is based on the notion of a reduct and aims at a very clear and simple representation of (possibly inexact) functional dependencies between groups of attributes in a data table. In particular, we discuss collections of decision reducts and association reducts, useful, respectively, for classifying new cases and for data-based knowledge representation; genes are interpreted as attributes, and reducts represent (inexact) functional dependencies between the expressions of groups of genes. The second methodology is referred to as rough discretization (sometimes also called dynamic discretization or roughfication, in contrast to fuzzification known from fuzzy logic). Given a data set with numeric attributes, rough discretization produces a new data set with the same number of attributes, but with the number of records squared (n original records yield n² new ones) and with symbolic values. Basically, it uses each of the original records to discretize the others through mutual comparisons of values over all numeric attributes, without needing any additional, externally tuned discretization parameters. Such roughfied data sets are a well-suited source for computing decision and association reducts, which yield rules interpreted in terms of inequalities involving the original attributes and their respective value domains. In summary, the two presented rough-set-based methods, applied together, provide a simple, easily interpretable, and noninvasive framework for gene expression data analysis.
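The two steps named in the abstract can be illustrated with a small Python sketch. It is not the author's implementation, and several choices are assumptions. The function names `roughfy`, `determines`, and `decision_reducts` are hypothetical. For each ordered pair (x, y) of original records, the sketch assumes the symbolic value of attribute a is ">=" when a(y) >= a(x) and "<" otherwise, which is one plausible reading of "mutual comparisons". Decision reducts are found by brute force over exact functional dependencies only, so the inexact variants and association reducts mentioned above are not covered. The gene table at the bottom is made-up illustrative data, not a result.

```python
from itertools import combinations


def roughfy(rows):
    """Hypothetical roughfication sketch: n numeric rows -> n*n symbolic rows.

    Assumption: for each ordered pair (x, y), attribute a gets ">=" if
    a(y) >= a(x) and "<" otherwise, so row x supplies the cut points for row y.
    No externally tuned discretization parameters are involved.
    """
    return [
        tuple(">=" if vy >= vx else "<" for vx, vy in zip(x, y))
        for x in rows
        for y in rows
    ]


def determines(rows, decisions, subset):
    """True if the attributes in `subset` functionally determine the decision,
    i.e. rows that agree on all of `subset` never carry different decisions."""
    seen = {}
    for row, d in zip(rows, decisions):
        key = tuple(row[i] for i in subset)
        if seen.setdefault(key, d) != d:
            return False
    return True


def decision_reducts(rows, decisions):
    """Brute-force search for all decision reducts: minimal attribute subsets
    that determine the decision. Exponential in the number of attributes."""
    n_attrs = len(rows[0])
    if not determines(rows, decisions, range(n_attrs)):
        return []  # the full table is inconsistent, so no exact reduct exists
    reducts = []
    for size in range(n_attrs + 1):
        for subset in combinations(range(n_attrs), size):
            # Supersets of a reduct found earlier are not minimal; skip them.
            if any(set(r) <= set(subset) for r in reducts):
                continue
            if determines(rows, decisions, subset):
                reducts.append(subset)
    return reducts


# Hypothetical symbolic table: three genes (columns), four samples (rows).
expr = [
    ("high", "low", "high"),
    ("high", "high", "low"),
    ("low", "low", "low"),
    ("low", "high", "high"),
]
labels = ["tumor", "tumor", "normal", "normal"]

print(decision_reducts(expr, labels))  # [(0,), (1, 2)]
print(roughfy([[1.0, 5.0], [2.0, 3.0]]))
```

The example table has two decision reducts: gene 0 alone, and genes 1 and 2 together (the label is "tumor" exactly when their levels differ). The exhaustive search is only practical for toy tables; with thousands of genes, as in the few-objects-many-attributes setting, a heuristic search over attribute subsets would be needed instead.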