Distributed systems are becoming increasingly complex. This is driven by the prevalent use of web services, multi-tier architectures, and grid computing, in which dynamic sets of components interact with each other across distributed and heterogeneous computing infrastructures. For these applications to deliver services to end users predictably and efficiently, it is therefore critical to understand and control their runtime behavior. In a datacenter environment, for instance, understanding the end-to-end dynamic behavior of certain IT subsystems is a key prerequisite for improving application response times, providing required levels of performance, or meeting service level agreements (SLAs). That behavior spans the time from when requests are made, through when responses are generated, to when those responses are finally received. The E2EProf toolkit enables the efficient and nonintrusive capture and analysis of end-to-end program behavior for complex enterprise applications. E2EProf permits an enterprise to recognize and analyze performance problems when they occur (online). It also lets the enterprise take corrective actions as soon as possible and wherever necessary along the paths currently taken by user requests (end-to-end), and do so without the need to instrument applications (nonintrusively). Online analysis exploits a novel signal analysis algorithm, termed pathmap. Pathmap dynamically detects the causal paths taken by client requests through application and backend servers. It annotates these paths with end-to-end latencies and with the contributions of different path components to these latencies. Thus, with pathmap, it is possible to dynamically identify the bottlenecks present in selected servers or services and to detect abnormal or unusual performance behaviors indicative of potential problems or overloads.
Pathmap and the E2EProf toolkit successfully detect causal request paths and associated performance bottlenecks in the RUBiS eBay-like multi-tier web application and in one of the datacenters of our industry partner, Delta Air Lines.
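The abstract does not spell out how pathmap turns raw traffic into annotated causal paths. The sketch below is a minimal illustration that assumes a cross-correlation formulation. Under this formulation, message timestamps on each edge are binned into a signal. The lag at which an upstream and a downstream signal correlate most strongly is read as the delay added by the component between them, and a high peak is read as evidence of a causal link. Summing the per-hop delays gives an end-to-end latency broken down by component. This is not claimed to be E2EProf's actual pathmap algorithm. All names are hypothetical and chosen for illustration: `to_density`, `normalized_xcorr`, `estimate_edge_delay`, `CAUSAL_THRESHOLD`, the web/app tiers, the 10 ms bins, and the synthetic traces.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical sketch: NOT E2EProf's actual pathmap implementation.
CAUSAL_THRESHOLD = 0.5  # assumed cutoff on peak normalized correlation


def to_density(timestamps_ms: Sequence[int], bin_ms: int, n_bins: int) -> List[float]:
    """Count messages per time bin, turning a message trace into a signal."""
    series = [0.0] * n_bins
    for t in timestamps_ms:
        idx = t // bin_ms
        if 0 <= idx < n_bins:
            series[idx] += 1.0
    return series


def normalized_xcorr(x: Sequence[float], y: Sequence[float], max_lag: int) -> List[float]:
    """Mean-removed cross-correlation for lags 0..max_lag, comparing x[i] with
    y[i + k] (y lagging x). Assumes len(x) == len(y); values lie in [-1, 1]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    norm = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    if norm == 0.0:
        return [0.0] * (max_lag + 1)
    return [sum(dx[i] * dy[i + k] for i in range(n - k)) / norm
            for k in range(max_lag + 1)]


def estimate_edge_delay(upstream: Sequence[float], downstream: Sequence[float],
                        max_lag: int) -> Tuple[int, float]:
    """Return (lag in bins, peak correlation) for the best-matching lag."""
    corr = normalized_xcorr(upstream, downstream, max_lag)
    best = max(range(len(corr)), key=corr.__getitem__)
    return best, corr[best]


# Synthetic three-tier trace: the web tier adds 50 ms and the app tier 30 ms.
BIN_MS, N_BINS, MAX_LAG = 10, 1010, 20
rng = random.Random(42)
arrivals = [rng.randrange(0, 10_000) for _ in range(300)]
web_in = to_density(arrivals, BIN_MS, N_BINS)
app_in = to_density([t + 50 for t in arrivals], BIN_MS, N_BINS)
db_in = to_density([t + 80 for t in arrivals], BIN_MS, N_BINS)
noise = to_density([rng.randrange(0, 10_000) for _ in range(300)], BIN_MS, N_BINS)

# Keep only hops whose peak correlation suggests a causal link, and record
# each component's contribution to the end-to-end latency.
contributions: Dict[str, int] = {}
for name, up, down in [("web", web_in, app_in), ("app", app_in, db_in)]:
    lag, peak = estimate_edge_delay(up, down, MAX_LAG)
    if peak >= CAUSAL_THRESHOLD:
        contributions[name] = lag * BIN_MS
end_to_end_ms = sum(contributions.values())

_, noise_peak = estimate_edge_delay(web_in, noise, MAX_LAG)  # unrelated traffic
print(contributions, end_to_end_ms)  # expected: {'web': 50, 'app': 30} 80
```

In this sketch, an independent traffic stream (`noise`) yields a low peak correlation and would not be linked into the path. The bin width is a trade-off: coarser bins tolerate timestamp jitter but blur the delays they can resolve.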