A Method for Evaluating the Quality of String Dissimilarity Measures and Clustering Algorithms for EST Clustering

  • Venue: BIBE '04 Proceedings of the 4th IEEE Symposium on Bioinformatics and Bioengineering
  • Year: 2004

Abstract

We present a method for evaluating the suitability of different string dissimilarity measures and clustering algorithms for EST clustering, one of the main techniques used in transcriptome projects. The method comprises generating simulated ESTs with user-specified parameters, and then evaluating the quality of clusterings produced when different dissimilarity measures and different clustering algorithms are used. We implemented two tools to do this: ESTSim (EST Simulator), which generates simulated EST sequences from mRNAs/cDNAs using user-specified parameters, and ECLEST (Evaluator for CLusterings of ESTs), which computes and evaluates a clustering of a set of input ESTs, where the dissimilarity measure, the clustering algorithm, and the clustering validity index can be specified independently. We demonstrate the method on a sample of 699 cDNAs, generating approximately 16,000 simulated ESTs. We conducted two experiments and derived statistically significant results from this study comparing subword-based dissimilarity measures to alignment-based ones.
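
The simulate-then-evaluate workflow described in the abstract can be illustrated with a small sketch. This is not the ESTSim or ECLEST implementation: every function name, parameter, and threshold below is hypothetical. The error model (uniform random substitutions), the subword measure (squared difference of k-mer counts), the alignment-style measure (global edit distance), the clustering algorithm (single linkage with a cutoff), and the validity index (Rand index) are all simple stand-ins chosen for brevity.

```python
import random
from collections import Counter
from itertools import combinations


def simulate_ests(mrna, n, min_len, max_len, error_rate, rng):
    """Draw n random fragments of mrna; each base is replaced by a uniformly
    random base (possibly the same one) with probability error_rate."""
    ests = []
    for _ in range(n):
        length = rng.randint(min_len, min(max_len, len(mrna)))
        start = rng.randint(0, len(mrna) - length)
        fragment = mrna[start:start + length]
        ests.append("".join(
            rng.choice("ACGT") if rng.random() < error_rate else base
            for base in fragment
        ))
    return ests


def word_count_distance(a, b, k=4):
    """Subword-based stand-in: squared Euclidean distance of k-mer counts."""
    ca = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    cb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    return sum((ca[w] - cb[w]) ** 2 for w in ca.keys() | cb.keys())


def edit_distance(a, b):
    """Alignment-based stand-in: global Levenshtein distance (row-wise DP)."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        cur = [i]
        for j, ch_b in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                    # deletion
                           cur[j - 1] + 1,                 # insertion
                           prev[j - 1] + (ch_a != ch_b)))  # match/substitution
        prev = cur
    return prev[-1]


def single_linkage(seqs, dist, threshold):
    """Merge any two sequences whose distance is <= threshold (union-find)."""
    parent = list(range(len(seqs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(seqs)), 2):
        if dist(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(seqs))]


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which the two labelings agree
    (same cluster in both, or different clusters in both)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)


if __name__ == "__main__":
    rng = random.Random(0)
    # Two random "mRNAs"; the true cluster of each EST is its source index.
    mrnas = ["".join(rng.choice("ACGT") for _ in range(300)) for _ in range(2)]
    ests, truth = [], []
    for idx, mrna in enumerate(mrnas):
        reads = simulate_ests(mrna, 5, 80, 150, 0.02, rng)
        ests.extend(reads)
        truth.extend([idx] * len(reads))
    # Thresholds are arbitrary illustrative values, not tuned.
    for name, dist, cutoff in [("word-count", word_count_distance, 150),
                               ("edit", edit_distance, 60)]:
        labels = single_linkage(ests, dist, cutoff)
        print(name, "Rand index:", rand_index(truth, labels))
```

Because the source of every simulated EST is known, each computed clustering can be scored against the true grouping. Swapping the `dist` argument changes the dissimilarity measure without touching the clustering or scoring code, which mirrors the independence of measure, algorithm, and validity index that the abstract describes.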