Show simple item record

dc.contributor.advisor Rao, Jia
dc.creator Zhao, Yong
dc.date.accessioned 2019-08-27T20:23:46Z
dc.date.available 2019-08-27T20:23:46Z
dc.date.created 2019-08
dc.date.issued 2019-08-12
dc.date.submitted August 2019
dc.identifier.uri http://hdl.handle.net/10106/28618
dc.description.abstract As a critical component of resource management in multicore systems, fair schedulers in hypervisors and operating systems (OSes) must uphold a simple invariant: computing resources such as CPU cycles are fairly allocated to each vCPU or thread. As simple as it may seem, we found that this invariant is broken when parallel programs with blocking synchronization are colocated with CPU-intensive programs under hypervisors such as Xen and KVM and under OS schedulers such as Linux CFS. Moreover, schedulers in virtualized environments typically reside in two separate layers: one in the hypervisor, which schedules vCPUs onto pCPUs, and one in the virtual machine, which schedules processes. This design imposes an implicit semantic gap between the two layers, so threads holding or waiting for a lock inside the virtual machine can be inadvertently descheduled by the hypervisor. This behavior causes the well-known lock-holder preemption (LHP) and lock-waiter preemption (LWP) problems, which can seriously degrade the performance of parallel applications. While the cloud is believed to be an ideal platform for hosting parallel applications, its multi-tenant sharing and resource over-commitment often make parallel performance disappointing and unpredictable. Although many studies have identified excessive synchronization delays, such as LHP and LWP caused by multi-tenant interference, as the culprit, a full quantitative understanding of the relationship between changes in synchronization and the overall performance loss is still lacking. Since performance modeling plays a fundamental role in designing traditional parallel systems, a systematic and quantitative study of parallel performance under cloud interference would help improve resource and power management in datacenters.
This dissertation explores two fundamental questions toward solving scheduling unfairness and inefficiency in multicore systems: why do schedulers "unexpectedly" turn out to be unfair, despite the common belief that scheduling algorithms have been stable for many years and are already mature? And why do schedulers that handle parallel applications effectively in physical environments perform poorly in the cloud? The goal of this dissertation is to enable multicore systems to proactively anticipate and defend against scheduling unfairness and inefficiency, rather than reacting to their manifestations and consequences. This dissertation presents three key principles of system design and implementation for rethinking and redesigning scheduling algorithms in multicore systems against unfairness and inefficiency: preemptive multi-queue fair queuing, interference-resilient scheduling, and differential scheduling. It demonstrates that applying these principles can effectively defend against scheduling unfairness and inefficiency in multicore systems, and it also presents the corresponding techniques and tool support for automatically and systematically applying these principles to existing multicore systems. Scheduling algorithms that originated in packet scheduling over single-link networks have been widely adopted in computer systems; however, scheduling unfairness unexpectedly manifests when these algorithms are scaled from single-core to multicore systems, while scheduling inefficiency is usually caused by the implicit semantic gap in virtualized environments. This dissertation therefore modifies the design of single-link scheduling algorithms to make them fair in multi-link settings and applies them to multicore system scheduling to eliminate the unfairness.
Instead of leaving the scheduling activities in the two virtualization layers opaque to each other, this dissertation first characterizes the performance of parallel applications under interference and then proposes methods to bridge the semantic gap in order to remove the scheduling inefficiency.
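To make the lock-holder preemption (LHP) problem described above concrete, the sketch below simulates, in discrete time, two vCPUs sharing one pCPU under round-robin slices: when vCPU 0 is preempted in the middle of its critical section, vCPU 1 burns its entire slice spinning on the lock. This is an illustrative toy model only; the function name and parameters are hypothetical and not part of the dissertation.

```python
# Minimal LHP sketch (hypothetical, for illustration): two vCPUs
# alternate fixed time slices on one pCPU. vCPU 0 repeatedly runs a
# critical section; vCPU 1 spin-waits on the lock whenever it runs
# while vCPU 0 still holds it.

def simulate(slice_len, cs_len, total_ticks):
    """Return the ticks vCPU 1 wastes spinning while the lock holder
    (vCPU 0) is descheduled by the hypervisor."""
    wasted = 0
    lock_held_remaining = 0          # critical-section ticks left on vCPU 0
    for tick in range(total_ticks):
        running = (tick // slice_len) % 2   # which vCPU owns the pCPU now
        if running == 0:
            if lock_held_remaining == 0:
                lock_held_remaining = cs_len  # vCPU 0 re-enters the CS
            lock_held_remaining -= 1          # holder makes progress
        else:
            if lock_held_remaining > 0:
                wasted += 1   # LHP: waiter spins, holder is descheduled
    return wasted

# A critical section longer than a slice straddles the slice boundary,
# so the waiter wastes cycles; a short CS that fits in a slice does not.
print(simulate(slice_len=10, cs_len=15, total_ticks=100))  # → 40
print(simulate(slice_len=10, cs_len=5, total_ticks=100))   # → 0
```

The gap is visible in the contrast: the hypervisor has no notion of guest-level lock state, so lengthening the critical section past a slice turns every waiter slice into wasted spinning, which is exactly the semantic gap the dissertation proposes to bridge.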
dc.format.mimetype application/pdf
dc.language.iso en_US
dc.subject Operating system
dc.subject Computer architecture
dc.subject Virtualization
dc.subject Parallel computing
dc.title Understanding and Optimizing Parallel Performance in Multi-tenant Cloud
dc.type Thesis
dc.degree.department Computer Science and Engineering
dc.degree.name Doctor of Philosophy in Computer Science
dc.date.updated 2019-08-27T20:24:52Z
thesis.degree.department Computer Science and Engineering
thesis.degree.grantor The University of Texas at Arlington
thesis.degree.level Doctoral
thesis.degree.name Doctor of Philosophy in Computer Science
dc.type.material text

