Large-scale continuous subgraph queries on streams

  • Authors:
  • Sutanay Choudhury; Lawrence Holder; George Chin; John Feo

  • Affiliations:
  • Pacific Northwest National Laboratory, Richland, WA, USA; Washington State University, Pullman, WA, USA; Pacific Northwest National Laboratory, Richland, WA, USA; Pacific Northwest National Laboratory, Richland, WA, USA

  • Venue:
  • Proceedings of the first annual workshop on High performance computing meets databases
  • Year:
  • 2011

Abstract

Graph pattern matching involves finding exact or approximate matches for a query subgraph in a larger graph. It has been studied extensively and has strong applications in domains such as computer vision, computational biology, social networks, security and finance. The problem of exact graph pattern matching is often described in terms of subgraph isomorphism, which is NP-complete. The exponential growth in streaming data from online social networks, news and video streams, together with the continual need for situational awareness, motivates a solution for finding patterns in streaming updates. This is also the prime driver for the real-time analytics market. Development of incremental algorithms for graph pattern matching on streaming inputs to a continually evolving graph is a nascent area of research. Some of the challenges associated with this problem are the same as those found in continuous query (CQ) evaluation on streaming databases. This paper reviews some of the representative work from the exhaustively researched field of CQ systems and identifies important semantics, constraints and architectural features that are also appropriate for HPC systems performing real-time graph analytics. For each of these features, we briefly discuss the challenge encountered in the database realm and the approach to its solution, and we state its relevance in a high-performance, streaming graph processing framework.

[…] graph of Gd and vertices of Gq such that all vertex adjacencies are preserved. Dynamic graphs are graphs that evolve over time through the addition or deletion of vertices and edges. Accordingly, the problem of graph pattern matching on dynamic graphs can be described as the continuous process of searching for patterns in the graph as it is updated. News [23], finance [7], and cyber security and intelligence [10] are among the primary domains that drive the real-time analytics market [1, 19] and motivate the development of HPC systems.
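The definition of continuous pattern matching on a dynamic graph can be made concrete with a small sketch. The Python below is a naive illustration, not the authors' method: it assumes undirected graphs stored as adjacency sets, the non-induced (edge-preserving) form of subgraph isomorphism, and insert-only updates, and it also applies the vertex type-equality constraint described in the next paragraph. The function names `extend`, `all_matches` and `insert_edge`, and the example graphs, are hypothetical. The point it illustrates is the incremental step: when an edge arrives, only matches that use that edge can be new, so the search is seeded from the new edge instead of recomputing all matches.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # undirected graph as adjacency sets
Types = Dict[str, str]       # vertex -> type label
Match = Dict[str, str]       # query vertex -> data vertex


def extend(mapping: Match, q_adj: Graph, q_types: Types,
           d_adj: Graph, d_types: Types) -> List[Match]:
    """Backtrack from a partial injective mapping to all complete matches."""
    if len(mapping) == len(q_adj):
        return [dict(mapping)]
    unmapped = [u for u in q_adj if u not in mapping]
    # Prefer a query vertex adjacent to the mapped part to prune early.
    u = next((x for x in unmapped if any(w in mapping for w in q_adj[x])),
             unmapped[0])
    anchors = [mapping[w] for w in q_adj[u] if w in mapping]
    # If u has a mapped neighbor, its image must be adjacent to that neighbor's image.
    candidates = d_adj[anchors[0]] if anchors else d_adj.keys()
    used = set(mapping.values())
    results: List[Match] = []
    for cand in candidates:
        if cand in used or d_types[cand] != q_types[u]:
            continue  # injectivity and type-equality constraints
        # Every query edge to an already-mapped neighbor must exist in the data graph.
        if all(a in d_adj[cand] for a in anchors):
            mapping[u] = cand
            results.extend(extend(mapping, q_adj, q_types, d_adj, d_types))
            del mapping[u]
    return results


def all_matches(q_adj: Graph, q_types: Types,
                d_adj: Graph, d_types: Types) -> List[Match]:
    """Full (non-incremental) search over the current data graph."""
    return extend({}, q_adj, q_types, d_adj, d_types)


def insert_edge(a: str, b: str, q_adj: Graph, q_types: Types,
                d_adj: Graph, d_types: Types) -> List[Match]:
    """Add the undirected edge {a, b} and return only the matches it creates.

    Assumes a and b already have types in d_types. Under insert-only updates,
    every new match maps exactly one ordered query edge (u, v) onto (a, b),
    so seeding the search with {u: a, v: b} yields each new match once.
    """
    if a == b or b in d_adj.get(a, set()):
        return []  # self-loop or existing edge: no new matches
    d_adj.setdefault(a, set()).add(b)
    d_adj.setdefault(b, set()).add(a)
    new: List[Match] = []
    for u in q_adj:
        for v in q_adj[u]:
            if q_types[u] == d_types[a] and q_types[v] == d_types[b]:
                new.extend(extend({u: a, v: b}, q_adj, q_types, d_adj, d_types))
    return new


if __name__ == "__main__":
    # Hypothetical typed query: two users that both connect to the same host.
    q_adj = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}
    q_types = {"x": "user", "y": "user", "z": "host"}
    d_adj = {"u1": {"u2", "h1"}, "u2": {"u1"}, "h1": {"u1"}}
    d_types = {"u1": "user", "u2": "user", "h1": "host"}
    print(len(all_matches(q_adj, q_types, d_adj, d_types)))  # 0
    print(len(insert_edge("u2", "h1", q_adj, q_types, d_adj, d_types)))  # 2
```

The two matches reported after the insertion differ only by swapping `x` and `y`, an automorphism of the query; a real system would likely canonicalize such duplicates and would also have to handle edge and vertex deletions, which this sketch ignores.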
These domains present data sources that lend themselves naturally to a graph-based representation and, additionally, provide semantic information in terms of types, labels and timestamps, which can be described more generally as attributes of the vertices and edges of the graph. The availability of these attributes influences the isomorphism computation, because assigning a correspondence between a pair of vertices in the search and query graphs requires them to satisfy equality constraints on type and possibly on other attributes as well. All these domains are also characterized by massive streaming data that continuously provide updates from social networks, financial markets and malicious activities on the internet, with a high emphasis on time-to-insight: the capability of learning about an event as soon as it happens. This motivates