A Method for Evaluating the Quality of String Dissimilarity Measures and Clustering Algorithms for EST Clustering

Venue: BIBE '04 Proceedings of the 4th IEEE Symposium on Bioinformatics and Bioengineering
Year: 2004

Abstract

We present a method for evaluating the suitability of different string dissimilarity measures and clustering algorithms for EST clustering, one of the main techniques used in transcriptome projects. The method consists of generating simulated ESTs with user-specified parameters and then evaluating the quality of the clusterings produced when different dissimilarity measures and different clustering algorithms are used. We implemented two tools for this purpose: ESTSim (EST Simulator), which generates simulated EST sequences from mRNAs/cDNAs using user-specified parameters, and ECLEST (Evaluator for CLusterings of ESTs), which computes and evaluates a clustering of a set of input ESTs, where the dissimilarity measure, the clustering algorithm, and the clustering validity index can be specified independently. We demonstrate the method on a sample of 699 cDNAs, from which approximately 16,000 simulated ESTs were generated. We conducted two experiments comparing subword-based dissimilarity measures with alignment-based ones and obtained statistically significant results.
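The evaluation loop can be illustrated with a minimal Python sketch covering three stages: simulating ESTs, clustering them under a chosen dissimilarity measure, and scoring the clustering with a validity index against the known mRNA of origin. This is not ESTSim or ECLEST, and every choice below is an assumption made for illustration. The simulator `simulate_ests` (fixed-length reads with uniform substitution errors) is hypothetical. The subword-based measure is the d2 distance in its simplest, non-windowed form, computed as the squared Euclidean distance between k-mer count vectors. The clustering algorithm is threshold single-linkage, and the validity index is the pair-counting Jaccard index. All function names, parameter values, and the threshold are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def simulate_ests(mrna, n, length, error_rate, rng):
    """Hypothetical simulator: draw n fixed-length reads from random positions
    of one mRNA (requires len(mrna) >= length) and apply substitution errors."""
    ests = []
    for _ in range(n):
        start = rng.randrange(len(mrna) - length + 1)
        read = list(mrna[start:start + length])
        for i in range(len(read)):
            if rng.random() < error_rate:
                # Substitute a different base, chosen uniformly.
                read[i] = rng.choice([b for b in "ACGT" if b != read[i]])
        ests.append("".join(read))
    return ests


def kmer_counts(s, k):
    """Count every overlapping word (k-mer) of length k in s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def d2(a, b, k=6):
    """Non-windowed d2: squared Euclidean distance between k-mer count vectors."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum((ca[w] - cb[w]) ** 2 for w in set(ca) | set(cb))


def single_linkage(seqs, dist, threshold):
    """Merge any two sequences with dissimilarity <= threshold (union-find).
    Returns one cluster label (a root index) per sequence."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if dist(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(seqs))]


def jaccard_index(pred, truth):
    """Pair-counting Jaccard index of a clustering against the true clustering."""
    a = b = c = 0
    for i, j in combinations(range(len(pred)), 2):
        same_pred, same_true = pred[i] == pred[j], truth[i] == truth[j]
        a += same_pred and same_true      # together in both
        b += same_pred and not same_true  # together only in the prediction
        c += same_true and not same_pred  # together only in the truth
    return a / (a + b + c) if a + b + c else 1.0


if __name__ == "__main__":
    rng = random.Random(0)
    # Three random 600-nt "mRNAs"; each EST's true cluster is its source mRNA.
    mrnas = ["".join(rng.choice("ACGT") for _ in range(600)) for _ in range(3)]
    ests, truth = [], []
    for label, mrna in enumerate(mrnas):
        for est in simulate_ests(mrna, n=10, length=300, error_rate=0.01, rng=rng):
            ests.append(est)
            truth.append(label)
    # Illustrative, untuned threshold on the d2 score (k = 6).
    labels = single_linkage(ests, d2, threshold=400)
    print("Jaccard index:", jaccard_index(labels, truth))
```

Because each stage is a separate function, the dissimilarity measure (for example, an alignment-based score), the clustering algorithm, and the validity index can each be swapped out independently. This mirrors the separation the abstract describes for ECLEST.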