dc.contributor.advisor: Jiang, Hong
dc.creator: Wu, Xiaofeng
dc.date.accessioned: 2023-06-14T17:04:54Z
dc.date.available: 2023-06-14T17:04:54Z
dc.date.created: 2023-05
dc.date.issued: 2023-05-01
dc.date.submitted: May 2023
dc.identifier.uri: http://hdl.handle.net/10106/31210
dc.description.abstract: This thesis addresses the challenges of utilization, efficiency, and scalability faced by deep learning systems, which are essential for high-performance training and serving of deep learning models. Deep learning systems play a critical role in developing accurate and complex models for applications such as image recognition, natural language understanding, and speech recognition. This research focuses on understanding and developing deep learning systems that encompass data preprocessing, resource management, multi-tenancy, and distributed model training, and it proposes several solutions to improve the performance, scalability, and efficiency of deep learning applications. First, we introduce SwitchFlow, a scheduling framework that addresses the limitations of popular deep learning frameworks in supporting GPU sharing and multi-tasking. Second, we propose Atom, a distributed training framework for large language models that uses decentralized training to reduce communication costs and increase scalability; we discuss the challenges of decentralized training and present the design and implementation of Atom. Finally, we introduce PerFect, a method that pre-trains a model on repetitive (cached) data to improve data preprocessing efficiency and then fine-tunes it to reach the desired accuracy. These approaches significantly improve the performance, scalability, and efficiency of deep learning applications. Specifically, SwitchFlow reduces interference and eliminates out-of-memory errors by scheduling subgraphs rather than whole computation graphs, and it allows subgraphs running on different devices to overlap, yielding a more efficient execution pipeline. Atom achieves high training throughput and fault tolerance in a decentralized environment, enabling the training of massive-scale models on affordable hardware such as consumer-class GPUs and Ethernet. PerFect improves the throughput of the data preprocessing stage and reaches the desired accuracy when reusing cached data, without requiring additional hardware or third-party libraries. The proposed frameworks and solutions are evaluated with representative deep learning models, and the results demonstrate their effectiveness and scalability. Overall, this thesis contributes to the development of deep learning systems and provides practical solutions to the challenges of utilization, efficiency, and scalability, making deep learning applications more accessible and efficient for a wider range of users.
dc.format.mimetype: application/pdf
dc.language.iso: en_US
dc.subject: Optimization
dc.subject: Resource utilization
dc.subject: Efficiency
dc.subject: Scalability
dc.subject: Deep learning systems
dc.title: Optimizing Resource Utilization, Efficiency and Scalability in Deep Learning Systems
dc.type: Thesis
dc.date.updated: 2023-06-14T17:04:54Z
thesis.degree.department: Computer Science and Engineering
thesis.degree.grantor: The University of Texas at Arlington
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy in Computer Science
dc.type.material: text