In gene expression data analysis, several researchers have recently emphasized the promise of applying local pattern discovery techniques (e.g., association rules, closed sets) to Boolean matrices that encode gene properties. Detecting local patterns by means of complete constraint-based mining techniques turns out to be an important complement to heuristic global model mining. To get the most from local set pattern mining approaches, a necessary step is the encoding of gene expression properties (e.g., over-expression). This preprocessing phase has a crucial impact on both the quantity and the quality of the extracted patterns. In this paper, we study the impact of discretization techniques through a sound comparison between dendrograms, i.e., the trees generated by a hierarchical clustering algorithm, built from the raw numerical expression data and from its various derived Boolean matrices. Thanks to a new similarity measure, we can select the Boolean property encoding technique that preserves the similarity structures holding in the raw data. The discussion relies on several experimental results for three gene expression data sets. We believe our framework is an interesting direction of work for the many application domains in which (a) local set patterns have proved useful, and (b) Boolean properties have to be derived from raw numerical data.
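The comparison described above can be illustrated with a minimal sketch. It is not the similarity measure proposed in the paper, which the abstract does not specify. As a stand-in, it uses the Pearson correlation between the cophenetic distances (merge heights) of two single-linkage dendrograms. The encoding rule (a value counts as over-expressed when it reaches a given fraction of its row maximum), the `fraction` parameter, and all function names are assumptions made for illustration only.

```python
from itertools import combinations
from math import sqrt


def encode_over_expression(matrix, fraction=0.7):
    """Hypothetical encoding: 1 when a value reaches `fraction` of its row
    maximum, else 0. Assumes non-negative expression values."""
    return [[1 if v >= fraction * max(row) else 0 for v in row] for row in matrix]


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cophenetic_distances(rows):
    """Single-linkage agglomerative clustering of the rows.

    Returns, for every pair (i, j) with i < j, the height at which the two
    rows first fall into the same cluster of the dendrogram."""
    n = len(rows)
    dist = {(i, j): euclidean(rows[i], rows[j]) for i, j in combinations(range(n), 2)}
    clusters = {i: [i] for i in range(n)}
    coph = {}
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # Single linkage: distance between the closest members.
            d = min(dist[(min(i, j), max(i, j))]
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Every cross pair is joined for the first time at height d.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[(min(i, j), max(i, j))] = d
        clusters[a] = clusters[a] + clusters.pop(b)
    return [coph[pair] for pair in combinations(range(n), 2)]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A flat dendrogram (all heights equal) carries no structure to compare.
    return cov / (sx * sy) if sx and sy else 0.0


def dendrogram_similarity(raw, boolean):
    """Stand-in similarity: correlation of the two dendrograms' merge heights."""
    return pearson(cophenetic_distances(raw), cophenetic_distances(boolean))


# Toy data (invented): rows are genes, columns are biological situations.
raw = [[1.0, 9.0, 8.0], [1.2, 8.5, 9.0], [9.0, 1.0, 1.5], [8.0, 1.2, 1.0]]
scores = {f: dendrogram_similarity(raw, encode_over_expression(raw, f))
          for f in (0.1, 0.7)}
```

Under these assumptions, one would keep the encoding parameter with the highest score, on the grounds that its Boolean matrix yields a dendrogram closest to the one built from the raw data. An encoding that sets every cell to 1 produces a flat dendrogram and scores 0.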