A method for similarity-based grouping of biological data

Authors:
Vaida Jakonienė;David Rundqvist;Patrick Lambrix
Affiliations:
Department of Computer and Information Science, Linköpings universitet, Linköping, Sweden;Department of Computer and Information Science, Linköpings universitet, Linköping, Sweden;Department of Computer and Information Science, Linköpings universitet, Linköping, Sweden
Venue:
DILS'06 Proceedings of the Third international conference on Data Integration in the Life Sciences
Year:
2006

Abstract

Similarity-based grouping of data entries in one or more data sources underlies many data management tasks, such as structuring search results, removing redundancy in databases, and data integration. For life science data sources this grouping is not trivial, because the stored data is complex, highly correlated, and represented at different levels of granularity. The contribution of this paper is two-fold: 1) we propose a method for similarity-based grouping, and 2) we show results from test cases. The main steps of the method are specification of grouping rules, pairwise grouping between entries, actual grouping of similar entries, and evaluation and analysis of the results. Often, different strategies can be used in the different steps. The method enables exploration of the influence of these choices and supports evaluation of the results with respect to given classifications. The grouping method is illustrated by test cases based on different strategies and classifications. The results show the complexity of similarity-based grouping tasks and give deeper insights into the selected grouping tasks, the analyzed data source, and the influence of different strategies on the results.
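The four steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which similarity functions, grouping strategies, or evaluation measures the paper uses. The sketch assumes a grouping rule made of a Jaccard similarity over term sets plus a threshold, grouping by connected components, and purity against a given classification as the evaluation measure. All function names, entry identifiers, and data values are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two term sets (illustrative choice of similarity function)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_pairs(entries, threshold):
    """Pairwise step: return the id pairs whose similarity meets the rule's threshold."""
    return [
        (i, j)
        for i, j in combinations(sorted(entries), 2)
        if jaccard(entries[i], entries[j]) >= threshold
    ]


def group(entries, pairs):
    """Grouping step: connected components over the similar pairs, via union-find."""
    parent = {e: e for e in entries}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in pairs:
        parent[find(i)] = find(j)

    groups = {}
    for e in entries:
        groups.setdefault(find(e), set()).add(e)
    return sorted(groups.values(), key=lambda g: sorted(g))


def purity(groups, classification):
    """Evaluation step: share of entries that belong to the majority class of their group."""
    total = sum(len(g) for g in groups)
    agree = 0
    for g in groups:
        counts = {}
        for e in g:
            counts[classification[e]] = counts.get(classification[e], 0) + 1
        agree += max(counts.values())
    return agree / total


# Hypothetical entries (id -> annotation terms) and a hypothetical reference classification.
entries = {
    "e1": {"kinase", "atp", "binding"},
    "e2": {"kinase", "atp", "transfer"},
    "e3": {"membrane", "transport"},
    "e4": {"membrane", "transport", "ion"},
}
classification = {"e1": "A", "e2": "A", "e3": "B", "e4": "B"}

pairs = similar_pairs(entries, threshold=0.5)   # rule: Jaccard >= 0.5
groups = group(entries, pairs)
score = purity(groups, classification)
```

Changing the threshold or replacing `jaccard` or `group` with another strategy, then recomputing `score`, is one simple way to explore how choices in each step influence the result.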