As distributed systems become more widespread and more complex, the need for efficient, scalable tools to analyze them grows. Network provenance graphs offer a rich framework for this task: they map dependencies between system states and make it possible to explain those states. In this paper, we investigate methods for more efficient substructure mining on network provenance graphs. Specifically, we aim to identify frequent substructures that can serve as a feature set for modeling common execution patterns. Such patterns help network administrators detect misbehaving nodes in the distributed system. To this end, we apply and scale up substructure mining for network provenance graphs by incorporating a graph database (neo4j) into the mining process and implementing optimizations that improve its efficiency. Our results show that using neo4j together with our algorithmic optimizations greatly reduces the run time of our algorithm without significantly affecting the quality of the substructures returned.
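To make the idea of frequent substructures as a feature set concrete, the following is a minimal sketch and not the algorithm used in the paper. It assumes each node's provenance graph can be reduced to a list of directed edges between labels, and it only considers one-edge and two-edge path patterns. The names `path_patterns`, `frequent_patterns`, and `feature_vector`, the labels, and the example graphs are all hypothetical, and no graph database is involved.

```python
from __future__ import annotations

from collections import Counter
from itertools import product


def path_patterns(edges: list[tuple[str, str]]) -> set[tuple[str, ...]]:
    """Return the distinct 1-edge and 2-edge label paths found in one graph."""
    patterns: set[tuple[str, ...]] = set(edges)
    for (a, b), (c, d) in product(edges, repeat=2):
        if b == c:  # the two edges chain: a -> b -> d
            patterns.add((a, b, d))
    return patterns


def frequent_patterns(
    graphs: dict[str, list[tuple[str, str]]], min_support: int
) -> dict[tuple[str, ...], int]:
    """Keep patterns that occur in at least min_support graphs."""
    support: Counter = Counter()
    for edges in graphs.values():
        # Each graph contributes at most once to a pattern's support.
        support.update(path_patterns(edges))
    return {p: n for p, n in support.items() if n >= min_support}


def feature_vector(
    edges: list[tuple[str, str]], features: list[tuple[str, ...]]
) -> list[int]:
    """Encode one graph as 0/1 indicators over the chosen frequent patterns."""
    present = path_patterns(edges)
    return [1 if f in present else 0 for f in features]


# Hypothetical provenance graphs, one per node of the distributed system.
graphs = {
    "n1": [("recv", "route"), ("route", "send")],
    "n2": [("recv", "route"), ("route", "send")],
    "n3": [("recv", "route"), ("route", "drop")],
}

freq = frequent_patterns(graphs, min_support=2)
features = sorted(freq)
vectors = {name: feature_vector(edges, features) for name, edges in graphs.items()}
```

In this toy setting, a graph whose vector is missing patterns that most other graphs contain (here `n3`) would be a candidate for closer inspection. A real miner would search larger substructures and use a more careful notion of support.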