Answering label-constraint reachability in large graphs
Proceedings of the 20th ACM international conference on Information and knowledge management
Efficient processing of label-constraint reachability queries in large graphs
Information Systems
Hi-index | 0.00 |
The increased availability of data describing biological interactions provides important clues on how complex chains of genes and proteins interact with each other. Most previous approaches either restrict their attention to analyzing simple substructures such as paths or trees in these graphs, or use heuristics that do not provide performance guarantees when general substructures are analyzed. We investigate a formulation to model pathway structures directly and give a probabilistic algorithm to find an optimal path structure in $O(4^{k}n^{2t}k^{t+\log(t+1)+2.92}t^{2})$ time and $O(n^{t}k\log k+m)$ space, where n and m are respectively the number of vertices and the number of edges in the given network, k is the number of vertices in the path structure, and t is the maximum number of vertices (i.e., "width") at each level of the structure. Even for the case t = 1 which corresponds to finding simple paths of length k, our time complexity $4^{k}n^{O(1)}$ is a significant improvement over previous probabilistic approaches. To allow for the analysis of multiple pathway structures, we further consider a variant of the algorithm that provides probabilistic guarantees for the top suboptimal path structures with a slight increase in time and space. We show that our algorithm can identify pathway structures with high sensitivity by applying it to protein interaction networks in the DIP database.