dc.contributor.author: Wu, Xiaofeng
dc.contributor.author: Rao, Jia
dc.contributor.author: Wei, Chen
dc.contributor.author: Huang, Heng
dc.contributor.author: Ding, Chris
dc.contributor.author: Huang, Hang
dc.date.accessioned: 2023-07-25T17:27:02Z
dc.date.available: 2023-07-25T17:27:02Z
dc.date.issued: 2021-12-10
dc.identifier.uri: http://hdl.handle.net/10106/31596
dc.description.abstract [en_US]: Accelerators such as GPUs are a scarce resource in deep learning (DL). Sharing GPUs effectively and efficiently improves hardware utilization as well as the experience of users, who may otherwise wait hours for GPU access until a long training job finishes. Spatial and temporal multitasking on GPUs have been studied in the literature, but popular DL frameworks such as TensorFlow and PyTorch lack support for sharing a GPU among multiple DL models. Such models are typically represented as computation graphs, heavily optimized by the underlying DL libraries, and run on a complex pipeline spanning the CPU and GPU. Our study shows that GPU kernels spawned from computation graphs can barely execute simultaneously on a single GPU, and that time slicing may lead to low GPU utilization. This paper presents SwitchFlow, a scheduling framework for DL multitasking. It centers on two designs. First, instead of scheduling a computation graph as a whole, SwitchFlow schedules its subgraphs and prevents subgraphs from different models from running simultaneously on a GPU. This results in less interference and eliminates out-of-memory errors. Moreover, subgraphs running on different devices can overlap with each other, leading to a more efficient execution pipeline. Second, SwitchFlow maintains multiple versions of each subgraph. This allows subgraphs to be migrated across devices at low cost, thereby enabling low-latency preemption. Results on representative DL models show that SwitchFlow achieves up to an order of magnitude lower tail latency for inference requests collocated with a training job.
dc.language.iso [en_US]: en_US
dc.publisher [en_US]: ACM
dc.subject [en_US]: Deep learning framework, preemption scheduling, systems for machine learning
dc.title [en_US]: SwitchFlow: Preemptive Multitasking for Deep Learning
dc.type [en_US]: Article
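The abstract's two mechanisms, scheduling at subgraph granularity with exclusive GPU access and cheap migration between pre-built per-device versions of each subgraph, can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not SwitchFlow's implementation: the Subgraph and Job classes, the simulate function, the tick costs, and the policy of scheduling only at subgraph boundaries while letting a preemptible job fall back to its CPU version are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Subgraph:
    """Hypothetical unit of scheduling, pre-built once per device."""
    name: str
    cost: Dict[str, int]  # device -> run time in made-up ticks


@dataclass(eq=False)  # compare jobs by identity
class Job:
    name: str
    subgraphs: List[Subgraph]
    priority: int           # larger value = more latency-sensitive
    preemptible: bool       # may fall back to its CPU subgraph versions
    arrival: int = 0
    next_idx: int = 0       # index of the next subgraph to run
    finish: Optional[int] = None

    def done(self) -> bool:
        return self.next_idx >= len(self.subgraphs)


def simulate(jobs: List[Job]) -> List[tuple]:
    """Return (start_tick, device, job, subgraph) for every subgraph run."""
    running: Dict[str, Optional[Job]] = {"gpu": None, "cpu": None}
    busy_until = {"gpu": 0, "cpu": 0}
    trace = []
    t = 0
    while not all(j.done() for j in jobs):
        # Retire subgraphs whose run has ended by tick t.
        for dev, job in list(running.items()):
            if job is not None and busy_until[dev] <= t:
                job.next_idx += 1
                if job.done():
                    job.finish = t
                running[dev] = None
        # Scheduling happens only at subgraph boundaries, most urgent job
        # first; at most one subgraph occupies the GPU at any time.
        waiting = sorted(
            (j for j in jobs
             if j.arrival <= t and not j.done() and j not in running.values()),
            key=lambda j: -j.priority,
        )
        for job in waiting:
            if running["gpu"] is None:
                dev = "gpu"
            elif job.preemptible and running["cpu"] is None:
                dev = "cpu"   # migrate: run the pre-built CPU version
            else:
                continue      # wait for the GPU's next subgraph boundary
            sg = job.subgraphs[job.next_idx]
            running[dev] = job
            busy_until[dev] = t + sg.cost[dev]
            trace.append((t, dev, job.name, sg.name))
        t += 1
    return trace


# Hypothetical workload: a 3-subgraph training job and one inference request.
train = Job("train", [Subgraph(f"T{i}", {"gpu": 2, "cpu": 6}) for i in range(3)],
            priority=0, preemptible=True)
infer = Job("infer", [Subgraph("I0", {"gpu": 1, "cpu": 5})],
            priority=1, preemptible=False, arrival=3)
trace = simulate([train, infer])
for step in trace:
    print(step)
print("inference latency:", infer.finish - infer.arrival)  # 2 ticks
print("training finished at tick", train.finish)           # 10
```

In this run the inference request arrives at tick 3, waits only for the training job's current GPU subgraph to finish, and completes at tick 5, while the training job continues on the CPU version of its last subgraph instead of stalling. Because the toy never interrupts a subgraph mid-run, a single latency-critical job waits at most for the remaining run time of whichever subgraph currently holds the GPU.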
