Network Similarity Decomposition (NSD): A Fast and Scalable Approach to Network Alignment

Authors: Giorgos Kollias; Shahin Mohammadi; Ananth Grama
Affiliations: Purdue University, West Lafayette (all authors)
Venue: IEEE Transactions on Knowledge and Data Engineering
Year: 2012

Abstract

As graph-structured data sets become commonplace, there is an increasing need for efficient ways of analyzing them. These analyses include conservation, alignment, differentiation, and discrimination, among others. When defined on general graphs, these problems are considerably harder than their well-studied counterparts on sets and sequences. In this paper, we study the problem of global alignment of large sparse graphs. Specifically, we investigate efficient methods for computing approximations to the state-of-the-art IsoRank solution for finding pairwise topological similarity between nodes in two networks (or within the same network). Pairs of nodes with high similarity can be used to seed global alignments. We present a novel approach to this computationally expensive problem based on uncoupling and decomposing the ranking calculations associated with the computation of similarity scores. Uncoupling refers to independent preprocessing of each input graph. Decomposition implies that pairwise similarity scores can be explicitly broken down into contributions from different link patterns, traced back to a low-rank approximation of the initial conditions for the computation. Together, these two concepts yield significant improvements in computational cost, interpretability of similarity scores, and the nature of supported queries. On real data sets, we show a performance improvement of over two orders of magnitude over IsoRank/Random Walk formulations, and of over an order of magnitude over constrained matrix-triple-product formulations.
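
The uncoupling and decomposition ideas can be illustrated with a small sketch. The abstract does not state the underlying recurrence, so the following assumes the commonly cited IsoRank-style iteration X <- alpha * A~ X B~^T + (1 - alpha) * H, with column-normalized adjacency matrices A~ and B~ and a rank-one initial condition H = w z^T. The function names, the choice of normalization, and the parameter defaults are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np

def column_normalize(adj):
    """Scale each column to sum to 1 (assumed normalization; zero columns stay zero)."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    return adj / col_sums

def power_vectors(norm_adj, seed, n_iter):
    """Uncoupled step: [seed, M seed, ..., M^n seed], computed for ONE graph only."""
    vecs = [np.asarray(seed, dtype=float)]
    for _ in range(n_iter):
        vecs.append(norm_adj @ vecs[-1])
    return vecs

def decomposed_similarity(A, B, w, z, alpha=0.8, n_iter=20):
    """Sum of rank-one terms; term k reflects k-step link patterns in each graph."""
    wk = power_vectors(column_normalize(A), w, n_iter)  # depends on A only
    zk = power_vectors(column_normalize(B), z, n_iter)  # depends on B only
    X = alpha**n_iter * np.outer(wk[n_iter], zk[n_iter])
    for k in range(n_iter):
        X += (1 - alpha) * alpha**k * np.outer(wk[k], zk[k])
    return X

def coupled_similarity(A, B, w, z, alpha=0.8, n_iter=20):
    """Reference: the coupled matrix-triple-product iteration, for comparison."""
    At, Bt = column_normalize(A), column_normalize(B)
    H = np.outer(w, z)
    X = H.copy()
    for _ in range(n_iter):
        X = alpha * At @ X @ Bt.T + (1 - alpha) * H
    return X

# Usage: a 3-node path graph versus a 4-node cycle, with uniform seed vectors.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
B = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
w, z = np.full(3, 1 / 3), np.full(4, 1 / 4)
X = decomposed_similarity(A, B, w, z)
```

Under these assumptions, the decomposed form only needs matrix-vector products on each graph separately, followed by a sum of outer products, and it reproduces the result of the coupled iteration for the same number of steps. A higher-rank H would be handled by summing the same computation over its rank-one components.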