Grid systems are proving increasingly useful for managing the batch computing jobs of organizations. One well-known example is Intel, whose internally developed NetBatch system manages tens of thousands of machines. However, the size, heterogeneity, and complexity of grid systems make them very difficult to configure. This often results in misconfigured machines, which may adversely affect the entire system.

We investigate a distributed data mining approach for detecting misconfigured machines. Our Grid Monitoring System (GMS) non-intrusively collects data from all sources (log files, system services, etc.) available throughout the grid system. It converts the raw data to semantically meaningful data and stores it on the machine it was obtained from, which limits the incurred overhead and allows the system to scale. Afterwards, when analysis is requested, a distributed outlier detection algorithm is employed to identify misconfigured machines. The algorithm itself is implemented as a recursive workflow of grid jobs. It is especially suited to grid systems, in which machines might be unavailable most of the time and often fail altogether.
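
To make the idea of flagging a misconfigured machine as an outlier concrete, here is a minimal sketch in Python. It is not the distributed algorithm described above, whose details the abstract does not give. Instead it uses a simple, centralized distance-based score (distance to the k-th nearest neighbor) as a stand-in. The machine names, the two numeric features, and the functions `knn_outlier_scores` and `top_outliers` are all hypothetical and chosen only for illustration.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each machine by the distance to its k-th nearest neighbor.

    `points` maps a (hypothetical) machine name to a numeric feature vector.
    A larger score means the machine sits farther from its peers.
    """
    if k < 1 or k >= len(points):
        raise ValueError("k must satisfy 1 <= k < number of machines")
    scores = {}
    for name, p in points.items():
        # Distances from this machine to every other machine, ascending.
        dists = sorted(
            math.dist(p, q) for other, q in points.items() if other != name
        )
        scores[name] = dists[k - 1]
    return scores


def top_outliers(points, k=2, n=1):
    """Return the names of the n machines with the highest outlier scores."""
    scores = knn_outlier_scores(points, k)
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Illustrative data only: (CPU count, memory in GB) per machine.
# These values are assumptions, not measurements from any real grid.
machines = {
    "m1": (4.0, 8.0),
    "m2": (4.0, 8.5),
    "m3": (4.0, 7.5),
    "m4": (4.0, 8.0),
    "m5": (1.0, 64.0),  # deliberately unlike the others
}

suspects = top_outliers(machines, k=2, n=1)
print(suspects)
```

In this toy data set the machine `m5` is far from every other machine, so it receives the highest score. A real deployment along the lines of the abstract would compute such scores without moving all data to one place, since each machine keeps its own converted data locally.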