Wikipedia has become a standard source of reference online, and many people (some unknowingly) now trust this corpus of knowledge as an authority to fulfil their information requirements. In doing so they task the human contributors of Wikipedia with maintaining the accuracy of articles, a job that these contributors have been performing admirably. We study the problem of monitoring the Wikipedia corpus with the goal of automated, online anomaly detection. We present Wiki-watchdog, an efficient distribution-based methodology that monitors distributions of revision activity for changes. We show that using our methods it is possible to detect the activity of bots, flash events, and outages as they occur. Our methods are proposed to support the monitoring of the contributors: they speed up anomaly detection and identify events that are hard to detect manually. We demonstrate the efficacy and low false-positive rate of our methods through experiments on the revision history of Wikipedia. Our results show that distribution-based anomaly detection has a higher detection rate than traditional methods based on either volume or entropy alone. Unlike previous work on anomaly detection in information networks, which worked with a static network graph, our methods consider the network as it evolves and monitor its properties for changes. Although our methodology is developed and evaluated on Wikipedia, we believe it is an effective generic anomaly detection framework in its own right.
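To make the contrast between distribution-based detection and volume- or entropy-based detection concrete, the following is a minimal sketch and not the Wiki-watchdog algorithm itself. It assumes revision activity is summarised per time window as a list of editor names, and it uses the Jensen-Shannon divergence between consecutive windows as the change measure. That measure, the function names, and the threshold of 0.3 bits are all illustrative assumptions; the abstract does not specify any of them.

```python
import math
from collections import Counter


def distribution(events):
    """Turn a non-empty list of categorical events into a probability distribution."""
    counts = Counter(events)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


def entropy(p):
    """Shannon entropy in bits of a distribution given as a dict."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support)."""
    keys = set(p) | set(q)
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mix(a):
        # Kullback-Leibler divergence from a to the mixture distribution.
        return sum(v * math.log2(v / mix[k]) for k, v in a.items() if v > 0)

    return 0.5 * kl_to_mix(p) + 0.5 * kl_to_mix(q)


def flag_changes(windows, threshold=0.3):
    """Return indices of windows whose distribution shifted from the previous one.

    `windows` is a list of event lists (hypothetical input format);
    `threshold` is an assumed value, not one taken from the paper.
    Empty windows are skipped rather than compared.
    """
    flagged = []
    previous = None
    for index, events in enumerate(windows):
        if not events:
            continue
        current = distribution(events)
        if previous is not None and js_divergence(previous, current) > threshold:
            flagged.append(index)
        previous = current
    return flagged
```

For example, take three windows of four revisions each: `["alice", "alice", "bob", "carol"]`, `["alice", "bob", "bob", "carol"]`, and `["bot", "bot", "dave", "erin"]`. The first and third windows have the same volume and the same entropy, so a detector based on either quantity alone sees nothing unusual. Under the assumptions above, `flag_changes` flags the third window because the set of active editors has changed completely.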