Using substructure mining to identify misbehavior in network provenance graphs

Authors:
David DeBoer;Wenchao Zhou;Lisa Singh
Affiliations:
Georgetown University;Georgetown University;Georgetown University
Venue:
First International Workshop on Graph Data Management Experiences and Systems
Year:
2013

Abstract

As distributed systems grow more widespread and more complex, the need for efficient, scalable tools to analyze them increases. Network provenance graphs offer a rich framework for this task: they map dependencies between system states and make it possible to explain those states. In this paper, we investigate methods for more efficient substructure mining on network provenance graphs. Specifically, we aim to identify frequent substructures that can serve as a feature set for modeling common execution patterns. Knowing these patterns helps network administrators detect misbehaving nodes in the distributed system. To that end, we apply and scale up substructure mining for network provenance graphs by incorporating a graph database (neo4j) into the mining process and implementing optimizations that improve its efficiency. Our results show that using the neo4j graph database together with our algorithmic optimizations greatly improves the run time of our algorithm without significantly affecting the quality of the substructures returned.
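
To make the idea of frequent substructures as a feature set concrete, the following is a minimal sketch and not the authors' algorithm. It assumes each host's provenance graph is given as a label map plus a list of directed edges, and it swaps in a much simpler technique than general substructure mining: it counts only labeled one-edge and two-edge paths, keeps those that appear in a sufficient fraction of the graphs, and flags hosts whose graph contains a path pattern outside that frequent set. It does not use neo4j. The function names, the vertex labels (`link`, `route`, `forward`, `drop`), the host names, and the 0.6 support threshold are all hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict


def path_patterns(labels, edges):
    """Return the set of label patterns for all 1- and 2-edge paths in one graph."""
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)
    patterns = set()
    for src, dst in edges:
        patterns.add((labels[src], labels[dst]))
        for nxt in successors[dst]:
            patterns.add((labels[src], labels[dst], labels[nxt]))
    return patterns


def frequent_patterns(graphs, min_support):
    """Patterns present in at least a `min_support` fraction of the graphs."""
    counts = Counter()
    for labels, edges in graphs.values():
        # Each graph contributes at most once per pattern (set semantics).
        counts.update(path_patterns(labels, edges))
    threshold = min_support * len(graphs)
    return {p for p, c in counts.items() if c >= threshold}


def suspicious_hosts(graphs, min_support=0.6):
    """Hosts whose graph contains at least one pattern that is not frequent."""
    frequent = frequent_patterns(graphs, min_support)
    return sorted(
        host
        for host, (labels, edges) in graphs.items()
        if path_patterns(labels, edges) - frequent
    )


# Hypothetical toy data: three hosts forward packets, one drops them.
normal = ({0: "link", 1: "route", 2: "forward"}, [(0, 1), (1, 2)])
odd = ({0: "link", 1: "route", 2: "drop"}, [(0, 1), (1, 2)])
example_graphs = {"n1": normal, "n2": normal, "n3": normal, "n4": odd}

flagged = suspicious_hosts(example_graphs)
print(flagged)
```

In this toy input, the patterns ending in `drop` occur in only one of four graphs, so only that host is flagged. A real miner would need to handle arbitrary subgraph shapes and much larger graphs, which is where the scaling concerns described in the abstract arise.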