Analyzing huge amounts of log data is often difficult, especially when it must be done in real time (e.g., fraud detection) or when the analysis requires large amounts of stored data. Graphs are a data structure frequently used in log analysis; examples include clique analysis and communities of interest (COI). However, little attention has been paid to large distributed graphs that sustain a high throughput of updates at very low latency. In this paper, we present a distributed graph mining system that processes around 39 million log entries per second on a 50-node cluster while keeping processing latencies below 10 ms. We validate our approach with two example applications: telephony fraud detection and Internet attack detection. A thorough evaluation demonstrates the scalability and near-real-time properties of our system.
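To make the idea of a log-driven, distributed graph more concrete, here is a minimal single-process sketch, not the system described above. It assumes each log entry is a directed interaction (e.g., a call record `src -> dst`), that vertices are hash-partitioned across workers (simulated here as in-memory partitions), and that a vertex's community of interest is its top-k neighbours by an exponentially decayed edge weight. The decay-based COI definition is a swapped-in, commonly used technique, and the names `PartitionedGraph`, `add_log_entry`, `community_of_interest`, and `decay_rate` are hypothetical.

```python
import math
import zlib
from collections import defaultdict


class PartitionedGraph:
    """Hypothetical sketch: a directed, weighted graph whose vertices are
    hash-partitioned across `num_partitions` workers (simulated in-process)."""

    def __init__(self, num_partitions: int, decay_rate: float):
        self.num_partitions = num_partitions
        # Assumed parameter: lambda in w * exp(-lambda * dt).
        self.decay_rate = decay_rate
        # One adjacency map per partition: src -> dst -> [weight, last_update_time]
        self.partitions = [defaultdict(dict) for _ in range(num_partitions)]

    def owner(self, vertex: str) -> int:
        # crc32 is stable across runs, unlike Python's salted str hash.
        return zlib.crc32(vertex.encode("utf-8")) % self.num_partitions

    def _decayed(self, weight: float, last_t: float, now: float) -> float:
        # Age a stored weight from its last update time to `now`.
        return weight * math.exp(-self.decay_rate * (now - last_t))

    def add_log_entry(self, src: str, dst: str, timestamp: float) -> None:
        """Apply one log entry as an edge update. Only the partition owning
        `src` is touched, so different sources can be updated independently.
        Assumes timestamps per edge are non-decreasing."""
        edges = self.partitions[self.owner(src)][src]
        if dst in edges:
            weight, last_t = edges[dst]
            edges[dst] = [self._decayed(weight, last_t, timestamp) + 1.0, timestamp]
        else:
            edges[dst] = [1.0, timestamp]

    def community_of_interest(self, vertex: str, now: float, k: int):
        """Return up to k (neighbour, decayed weight) pairs, heaviest first."""
        # .get() on a defaultdict does not create an empty entry.
        edges = self.partitions[self.owner(vertex)].get(vertex, {})
        scored = [(dst, self._decayed(w, t, now)) for dst, (w, t) in edges.items()]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    g = PartitionedGraph(num_partitions=50, decay_rate=0.1)
    g.add_log_entry("alice", "bob", 0.0)
    g.add_log_entry("alice", "bob", 1.0)
    g.add_log_entry("alice", "carol", 1.0)
    # "bob" ranks first: two contacts, the older one slightly decayed.
    print(g.community_of_interest("alice", now=1.0, k=2))
```

Routing each update to the single partition that owns its source vertex is what lets such a design scale out. For scale, about 39 million entries per second on 50 nodes works out to roughly 780,000 entries per node per second, assuming the load is spread evenly.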