Kali: Optimizing Resource Utilization in Distributed Clusters
Decreasing operational costs is a key objective for organizations that manage compute clusters, such as Amazon, Microsoft, Google, and Alibaba. One way to decrease costs is to improve resource utilization in the cluster [13, 14]. Yet, high resource utilization can negatively affect workload performance and thus user satisfaction. Performance degrades when workloads running on the same machine compete for shared resources. For example, a workload that consumes a large portion of memory delays the execution of other memory-intensive workloads. Such competition for resources is referred to in the literature as resource interference.
Existing work on predicting and avoiding interference relies mainly on (a) stress-testing workloads before scheduling to estimate their constraints, and (b) extracting interference-related constraints by observing real executions. TO BE CONT'D