Conference (International): Approximate QoS Rule Derivation Based on Root Cause Analysis for Cloud Computing
Satoshi Konno and Xavier Défago
24th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC 2019)
Ensuring proper quality of service (QoS) is essential for cloud service providers and customers alike. To this end, cloud systems must rely as much as possible on automated and efficient methods of monitoring, introspection, and recovery. In particular, automated recovery is essential to ensure long-term reliability and availability, because human intervention is too slow and not every situation can be anticipated. In turn, automated recovery requires both efficient monitoring and accurate identification of root causes, so that the same causes do not lead to failures in the future. Current cloud systems typically use in-memory time-series databases for dynamic analysis or aggregation. When performed at all, root cause analysis serves mainly as a reporting convenience and therefore need not be very accurate. As a result, recent studies provide few details on how to accurately identify root causes from time-series monitoring data. This study proposes a novel event-driven method for inferring monitoring rules, based on dynamic case-based reasoning and shape-based root cause analysis. The method is designed for autonomous recovery in order to guarantee the long-term QoS of cloud systems. The accuracy and performance of the approach are evaluated using realistic monitoring data reflecting more than a decade of experience at a major cloud service provider (Yahoo). The results show that our approach makes effective use of monitoring data to improve overall QoS, and hence opens interesting research directions.
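The abstract does not describe how shape-based root cause analysis and case-based reasoning are combined. The sketch below illustrates one generic way the two ideas can fit together. It is not the authors' method. It assumes a hypothetical case base of `(metric_series, root_cause_label)` pairs from past incidents. It compares time-series shapes using z-normalized Euclidean distance, one common choice of shape-based measure, and returns the root cause of the nearest past case. The function names `znorm`, `shape_distance`, and `retrieve_root_cause` are illustrative only.

```python
import math


def znorm(series):
    """Z-normalize a series so that comparison depends on shape, not on scale or offset."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        # A flat series has no shape; map it to all zeros.
        return [0.0] * n
    return [(x - mean) / std for x in series]


def shape_distance(a, b):
    """Euclidean distance between two z-normalized series of equal length."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    za, zb = znorm(a), znorm(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(za, zb)))


def retrieve_root_cause(observed, case_base):
    """Case-based retrieval: return the root-cause label of the most similar past case.

    case_base is a hypothetical list of (metric_series, root_cause_label) pairs.
    """
    best_series, best_label = min(
        case_base, key=lambda case: shape_distance(observed, case[0])
    )
    return best_label


# Hypothetical past incidents, for illustration only.
cases = [
    ([1, 2, 3, 4, 5], "memory leak"),
    ([5, 5, 5, 50, 5], "traffic spike"),
    ([5, 4, 3, 2, 1], "disk full"),
]
```

Because the series are z-normalized, `retrieve_root_cause([10, 20, 30, 40, 50], cases)` matches the steadily rising "memory leak" case even though its values are ten times larger. A real system would also need to handle series of different lengths or time alignments, for example by resampling or by using dynamic time warping.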