Dissertations & Theses
http://hdl.handle.net/10106/11752
2024-03-28T13:34:02Z
INVESTIGATING THE EFFECT OF PEEPHOLE OPTIMIZATIONS ON BINARY CODE DIFFERENCES
http://hdl.handle.net/10106/31804
**Please note that the full text is embargoed until 8/1/2025** ABSTRACT: Binary diffing is a technique used to compare executable files and identify their differences or similarities without access to source code. The potential applications of binary diffing in software security tasks such as vulnerability search, code clone detection, and malware analysis have generated a vast body of literature in recent years. A recurring theme in binary diffing research is the evaluation of its resilience against compiler optimization, which is the most common source of syntactic differences in binary code. Although most binary diffing tools claim to be immune to compiler optimization, recent studies have highlighted the need for the research community to revisit this claim, particularly with respect to non-default optimization settings and function inlining.
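To make the idea of a basic-block-centric comparison concrete, the Python sketch below scores two functions by the Jaccard similarity of their normalized basic-block hashes. It is a hypothetical toy, not the method of any tool evaluated in this work: the instruction listings are hand-written strings, and the normalization (lowercasing and replacing numeric immediates with `IMM`) is an assumption chosen for illustration.

```python
import hashlib
import re

# Matches hexadecimal or decimal immediates such as 0x10 or 8 (but not the 8 in r8).
IMMEDIATE = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+)\b")


def block_hash(block: list[str]) -> str:
    """Hash one basic block after a crude, hypothetical normalization."""
    normalized = "\n".join(IMMEDIATE.sub("IMM", insn.strip().lower()) for insn in block)
    return hashlib.sha256(normalized.encode()).hexdigest()


def function_similarity(f1: list[list[str]], f2: list[list[str]]) -> float:
    """Jaccard similarity of the two functions' sets of basic-block hashes."""
    h1 = {block_hash(b) for b in f1}
    h2 = {block_hash(b) for b in f2}
    if not h1 and not h2:
        return 1.0
    return len(h1 & h2) / len(h1 | h2)


if __name__ == "__main__":
    before = [["mov eax, edi", "imul eax, eax, 2", "ret"]]
    after = [["lea eax, [rdi+rdi]", "ret"]]  # same result in eax, different instructions
    print(function_similarity(before, before))  # 1.0
    print(function_similarity(before, after))   # 0.0
```

Because each block is hashed as a whole, a single semantics-preserving rewrite inside a block changes its hash, so a purely syntactic block comparison scores the two versions as unrelated.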
In this study, we investigate the effect of peephole optimization on binary diffing analysis. Peephole optimization is a feature of mainstream compilers that performs local rewriting of the input program: it replaces instruction sequences within a small window (i.e., the peephole) with functionally equivalent sequences that are shorter or faster. Our research reveals that peephole optimization primarily affects binary code differences at the intra-procedural level, which contradicts the assumptions made by basic-block-centric comparison approaches. We conducted systematic experiments using LLVM’s unit test suite. We also customized Alive2, an LLVM translation validation tool, to isolate the impact of peephole optimization from the overall optimization process.
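The local rewriting described above can be sketched in Python on a hypothetical three-address instruction format (tuples such as `("mul", dst, src, imm)`). The three rules are textbook examples chosen for illustration, not LLVM's actual peephole patterns or the passes studied in this work: strength reduction of multiplication by a power of two, deletion of `x = x + 0`, and replacing a load that immediately follows a store to the same address with a register move (a two-instruction window).

```python
from typing import Optional

Insn = tuple  # hypothetical format: (opcode, *operands)


def rewrite_single(insn: Insn) -> Optional[list[Insn]]:
    """Window of one instruction; returns a replacement list, or None if no rule applies."""
    op = insn[0]
    # Strength reduction: dst = src * 2**k  ->  dst = src << k
    if op == "mul" and isinstance(insn[3], int) and insn[3] > 0 and insn[3] & (insn[3] - 1) == 0:
        return [("shl", insn[1], insn[2], insn[3].bit_length() - 1)]
    # Algebraic identity: x = x + 0 (or x - 0) has no effect and can be deleted
    if op in ("add", "sub") and insn[3] == 0 and insn[1] == insn[2]:
        return []
    return None


def peephole(code: list[Insn]) -> list[Insn]:
    """Slide a window of up to two instructions over the code until no rule fires."""
    changed = True
    while changed:
        changed = False
        out: list[Insn] = []
        i = 0
        while i < len(code):
            insn = code[i]
            # Window of two: store [addr] <- src; load dst <- [addr]  =>  keep the store, dst = src
            if i + 1 < len(code):
                nxt = code[i + 1]
                if insn[0] == "store" and nxt[0] == "load" and nxt[2] == insn[1]:
                    out += [insn, ("mov", nxt[1], insn[2])]
                    i += 2
                    changed = True
                    continue
            replacement = rewrite_single(insn)
            if replacement is None:
                out.append(insn)
            else:
                out += replacement
                changed = True
            i += 1
        code = out
    return code


if __name__ == "__main__":
    program = [
        ("mul", "r1", "r0", 8),
        ("add", "r1", "r1", 0),
        ("store", "[sp+4]", "r1"),
        ("load", "r2", "[sp+4]"),
        ("ret",),
    ]
    print(peephole(program))
    # [('shl', 'r1', 'r0', 3), ('store', '[sp+4]', 'r1'), ('mov', 'r2', 'r1'), ('ret',)]
```

Every rule preserves semantics but changes the instruction sequence inside the enclosing block, so a comparison that hashes or matches blocks instruction by instruction sees a different block even though the control-flow graph is unchanged.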
Our investigation determines how pervasive peephole optimization is in the resulting compiled code and explores its effects on current binary diffing techniques. The noticeable decline in the performance of these techniques highlights the importance of considering peephole optimization when analyzing and improving binary diffing methodologies. Our findings therefore suggest that researchers and practitioners should account for the impact of peephole optimization when developing and evaluating binary diffing tools. Further research is needed to address this challenge and improve the effectiveness of binary diffing in software security tasks.
2023-09-01T00:00:00Z
RESOURCE PROVISIONING FOR DATA-INTENSIVE USER-FACING APPLICATIONS
http://hdl.handle.net/10106/31763
**Please note that the full text is embargoed until 08/01/2024** Data-intensive, User-facing Services (DUSes) such as web search, digital marketing, online social networking, and online retailing are critical workloads in clouds and datacenters. Meeting stringent query tail-latency Service Level Objectives (SLOs) for DUS queries is essential for a good user experience and business success. However, achieving these objectives is challenging because of the scale-out nature of DUS workloads and the varying resource demands of queries with different fanouts. Additionally, the design and configuration options for clusters significantly impact query performance.
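A back-of-the-envelope calculation shows why fanout makes tail-latency SLOs hard to meet. Assuming, for illustration only, that a query must wait for all of its `fanout` sub-queries and that each sub-query misses the latency target independently with probability `p_worker`, the query misses with probability 1 - (1 - p_worker)^fanout. The hypothetical helpers below sketch this relationship; they are not the dissertation's model.

```python
def query_miss_probability(p_worker: float, fanout: int) -> float:
    """P(query misses its target) if it waits for `fanout` independent sub-queries."""
    return 1.0 - (1.0 - p_worker) ** fanout


def required_worker_miss_probability(p_query: float, fanout: int) -> float:
    """Per-worker miss probability needed so the whole query misses with p_query."""
    return 1.0 - (1.0 - p_query) ** (1.0 / fanout)


if __name__ == "__main__":
    # A worker that meets its target 99% of the time, fanned out to 100 workers:
    print(round(query_miss_probability(0.01, 100), 3))  # 0.634
    # To keep the query-level miss rate at 1% with fanout 100, each worker may
    # miss only about 0.01% of the time (roughly its 99.99th percentile).
    print(required_worker_miss_probability(0.01, 100))
```

The independence assumption is a simplification, since real sub-query latencies can be correlated, but it shows how each additional fanout branch tightens the per-worker latency target.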
In this dissertation, we present online and offline solutions for optimizing DUS performance. We highlight the importance of reducing query tail latency and its impact on user experience and revenue. We discuss the complexity of meeting tail-latency SLOs given query fanout and the resulting need to allocate resources accordingly. Furthermore, we explore the wide range of cluster design and configuration options and propose model-based approaches to compare them and identify promising configurations.
Using queuing models, we establish the maximum sustainable cluster loads and analyze performance at the worker and cluster levels. We validate our models through extensive simulation and testing, providing insights for DUS design and efficient resource planning. Our work contributes to improving user experience, resource optimization, and resource provisioning plans in cloud-based DUS environments.
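As a minimal illustration of how a queuing model yields a maximum sustainable load, the Python sketch below assumes that every worker is an independent M/M/1 queue with service rate `mu` and per-worker arrival rate `lam`, and that a query completes only when all `fanout` workers respond. Under these assumptions (ours, not necessarily the models used in this dissertation), a worker's response time is exponentially distributed with rate `mu - lam`, so a query meets the target `t_slo` with probability (1 - exp(-(mu - lam) * t_slo)) ** fanout. Requiring this to be at least `q` and solving for `lam` gives lam <= mu + ln(1 - q ** (1 / fanout)) / t_slo.

```python
import math


def mm1_max_load(mu: float, t_slo: float, q: float, fanout: int = 1) -> float:
    """Largest per-worker arrival rate that keeps P(query latency <= t_slo) >= q.

    Assumes independent FCFS M/M/1 workers, whose response time is
    Exponential(mu - lam), and a query that waits for all `fanout` workers.
    """
    if mu <= 0 or t_slo <= 0 or not 0 < q < 1 or fanout < 1:
        raise ValueError("need mu > 0, t_slo > 0, 0 < q < 1, fanout >= 1")
    # Per-worker success probability needed so that the product over fanout is q.
    per_worker = q ** (1.0 / fanout)
    lam = mu + math.log(1.0 - per_worker) / t_slo
    return max(lam, 0.0)  # 0.0 means the SLO cannot be met even with no load


def query_slo_probability(mu: float, lam: float, t_slo: float, fanout: int) -> float:
    """P(query latency <= t_slo) under the same assumptions."""
    return (1.0 - math.exp(-(mu - lam) * t_slo)) ** fanout


if __name__ == "__main__":
    # Hypothetical worker: 1000 requests/s service rate, 50 ms target at the 99th percentile.
    for k in (1, 10, 100):
        lam = mm1_max_load(mu=1000.0, t_slo=0.05, q=0.99, fanout=k)
        print(k, round(lam, 1), round(lam / 1000.0, 3))  # fanout, max load, utilization
```

The sustainable utilization drops as fanout grows, which is the kind of worker-level versus cluster-level trade-off a capacity plan has to account for.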
Overall, our online solution optimizes and guarantees tail latency while improving resource utilization, and our offline model analysis and findings provide guidance to DUS providers, enabling an enhanced user experience and effective resource provisioning.
2023-07-27T00:00:00Z
Fuzz Testing of Zigbee Protocol Implementations
http://hdl.handle.net/10106/31760
In recent years, we have witnessed a growing number of Internet of Things (IoT) devices deployed in many areas, such as home automation, healthcare, manufacturing, and smart vehicles. Among the numerous IoT wireless standards available, Zigbee stands out as one of the most popular choices globally, with major companies such as Amazon, Samsung, IKEA, Huawei, and Xiaomi incorporating it into their products. Notably, Zigbee has even been used in NASA's Mars mission, where it serves as the communication radio between the flying drone and the Perseverance rover.
However, with the rapid growth of Zigbee's global market presence, the incentive for cybercriminal attacks has also escalated. Recent incidents have highlighted severe vulnerabilities in Zigbee protocol implementations, compromising IoT devices from multiple manufacturers. Consequently, security testing of Zigbee protocol implementations has become imperative. Nevertheless, applying existing vulnerability detection techniques such as fuzzing and data flow analysis to Zigbee protocols is nontrivial, especially when dealing with vendor-specific requirements and low-level hardware events. Additionally, many existing protocol fuzzing tools lack an appropriate execution environment for Zigbee, because Zigbee relies on radio communication rather than internet connectivity.
This dissertation aims to address these gaps by proposing comprehensive fuzzing solutions tailored to the security testing of Zigbee protocol implementations. The goal is to help IoT application manufacturers and protocol vendors mitigate security risks during their development process. The dissertation makes the following contributions: (i) Z-Fuzzer, a device-agnostic fuzzing platform that uses code coverage feedback to detect security issues in Zigbee protocol implementations; (ii) TaintBFuzz, an intelligent Zigbee protocol fuzzing solution based on constraint-field dependency inference; and (iii) CT-BFuzz, a fuzzing platform that applies a combinatorial testing approach to Zigbee protocol implementations.
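The core idea of coverage-feedback fuzzing, as used by Z-Fuzzer, can be sketched with a generic loop: mutate an input from the corpus, run the target, and keep the mutant only if it reaches code that no earlier input reached. The Python below is a self-contained toy under heavy assumptions: `toy_frame_parser` is a hypothetical stand-in for an instrumented Zigbee stack that reports covered branch IDs, and the mutators are generic byte-level operations, not Z-Fuzzer's actual design or its device-agnostic execution environment.

```python
import random


def toy_frame_parser(frame: bytes) -> set[str]:
    """Hypothetical stand-in for an instrumented Zigbee stack.

    Returns the IDs of the branches this input exercised (the coverage signal).
    """
    covered = {"entry"}
    if len(frame) < 3:
        covered.add("too_short")
        return covered
    frame_type = frame[0] & 0x03  # hypothetical 2-bit frame-type field
    covered.add(f"type_{frame_type}")
    if frame_type == 0x01 and frame[1] == 0x88:
        covered.add("data_with_magic")
        if frame[2] > 0xF0:
            covered.add("deep_branch")
    return covered


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Generic byte-level mutations: bit flip, byte overwrite, or byte append."""
    buf = bytearray(data)
    choice = rng.randrange(3)
    if choice == 0 and buf:
        i = rng.randrange(len(buf))
        buf[i] ^= 1 << rng.randrange(8)
    elif choice == 1 and buf:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        buf.append(rng.randrange(256))
    return bytes(buf)


def fuzz(seeds: list[bytes], iterations: int = 20_000, seed: int = 0):
    """Coverage-guided loop: keep a mutant only if it reaches a new branch."""
    rng = random.Random(seed)
    corpus = list(seeds)
    global_cov: set[str] = set()
    for s in corpus:
        global_cov |= toy_frame_parser(s)
    for _ in range(iterations):
        child = mutate(rng.choice(corpus), rng)
        cov = toy_frame_parser(child)
        if not cov <= global_cov:  # new coverage: the input is "interesting"
            global_cov |= cov
            corpus.append(child)
    return corpus, global_cov


if __name__ == "__main__":
    corpus, coverage = fuzz([b"\x00\x00\x00"])
    print(len(corpus), "corpus entries;", sorted(coverage))
```

A real harness would also monitor the target for crashes and hangs; this toy tracks only coverage, which is the feedback signal that steers mutation toward deeper parser states.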
This dissertation is presented in a monograph-based format and includes three research articles. The first article introduces Z-Fuzzer, the first device-agnostic fuzzing tool that makes fuzzing applicable to detecting security problems in Zigbee protocol implementations. The second article reports on TaintBFuzz, which uses constraint-field dependency inference to augment test-input mutation when fuzzing Zigbee protocol implementations. The third article presents CT-BFuzz, which optimizes Zigbee protocol fuzzing through combinatorial test generation, producing test cases that efficiently cover combinations of values of important message fields. The first two articles have been accepted at peer-reviewed venues, and the third is currently in press.
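Combinatorial test generation of the kind CT-BFuzz relies on can be illustrated with pairwise (2-way) coverage: instead of enumerating every combination of message-field values, choose a small set of test messages in which every pair of values of every two fields appears at least once. The sketch below uses a simple greedy strategy over the full candidate set, which is practical only for small domains. The field names and value sets are illustrative IEEE 802.15.4 MAC frame-control fields chosen for this example, not the fields or the algorithm used by CT-BFuzz.

```python
from itertools import combinations, product


def pairwise_suite(fields: dict[str, list]) -> list[dict]:
    """Greedy 2-way covering suite: every value pair of every two fields appears."""
    names = list(fields)

    def pairs_of(case: dict) -> set:
        return {((a, case[a]), (b, case[b])) for a, b in combinations(names, 2)}

    # All value pairs that some test case must exercise.
    uncovered = {
        ((a, va), (b, vb))
        for a, b in combinations(names, 2)
        for va in fields[a]
        for vb in fields[b]
    }
    # Exhaustive candidate pool: fine for a handful of small fields only.
    candidates = [dict(zip(names, values)) for values in product(*fields.values())]

    suite = []
    while uncovered:
        # Pick the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda case: len(pairs_of(case) & uncovered))
        suite.append(best)
        uncovered -= pairs_of(best)
    return suite


if __name__ == "__main__":
    frame_control = {
        "frame_type": [0, 1, 2, 3],  # beacon, data, ack, MAC command
        "security_enabled": [0, 1],
        "ack_request": [0, 1],
        "dst_addr_mode": [0, 2, 3],  # none, short, extended
    }
    suite = pairwise_suite(frame_control)
    print(len(suite), "test cases instead of", 4 * 2 * 2 * 3)
    for case in suite:
        print(case)
```

The payoff grows with the number of fields: exhaustive enumeration is multiplicative in the field domains, while a pairwise suite grows roughly with the product of the two largest domains.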
2023-07-12T00:00:00Z
Towards Nuclei Segmentation with Limited Annotations
http://hdl.handle.net/10106/31753
Nuclei segmentation is a fundamental but challenging task in histopathology image analysis. For semantic segmentation of nuclei, Convolutional Neural Network (CNN) and Vision Transformer (VT) models give very promising results. However, successfully training fully supervised CNN and VT models requires a significant amount of annotated data, which is scarce in the biomedical domain. Moreover, collecting a histopathology dataset and then labeling it manually at the pixel level is an expensive, time-consuming, and tedious process. Therefore, we need ways to train nuclei segmentation models with unlabeled datasets. In this thesis, I present my work toward solving this critical problem using Adversarial Learning, Self-Supervised Learning (SSL), and Diffusion Models. My approaches can be summarized in three directions: 1) employing adversarial-learning-based unsupervised and semi-supervised domain adaptation techniques to solve the nuclei segmentation problem for unannotated datasets; 2) proposing SSL-based approaches for pre-training VT models with unannotated image datasets; and 3) introducing a Denoising Diffusion Probabilistic Model (DDPM) based approach for pre-training nuclei segmentation models with a large-scale histology image dataset. In the first approach, I apply Unsupervised Domain Adaptation (UDA) and Semi-Supervised Domain Adaptation (SSDA) with the help of another labeled dataset that may come from other organs or sources. Later, I extend the model with a reconstruction network that incorporates adversarial learning to translate source-domain images to the target domain for further training. Then, in my second approach, I introduce a novel region-level SSL-based framework for pre-training a semantic nuclei segmentation model with a large-scale unannotated histopathology image dataset extracted from Whole Slide Images (WSI).
Additionally, I propose hierarchical, scale, and transformation equivariance losses to reduce disagreements among predictions. Finally, in the third approach, I use DDPM to extract discriminative and powerful features. I then combine a generation module, a discriminator, and a scale loss with DDPM for effective label-efficient SSL-based pre-training. Extensive experiments demonstrate the superiority of the proposed methods over the baseline models.
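For readers unfamiliar with DDPMs, the training signal of the denoising network has a standard closed form: with a noise schedule beta_1, ..., beta_T and alpha_bar_t = prod_{s<=t} (1 - beta_s), a noisy sample is x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps with eps ~ N(0, I), and the network is trained to predict eps from x_t and t. The pure-Python sketch below implements only this standard forward (noising) step, using the linear schedule from the original DDPM paper (beta from 1e-4 to 0.02 over T = 1000 steps). How this thesis extracts features from the trained network and combines it with the generation module, discriminator, and scale loss is not reproduced here.

```python
import math
import random


def linear_beta_schedule(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> list[float]:
    """Noise variances beta_1..beta_T, stored at list indices 0..T-1."""
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]


def cumulative_alphas(betas: list[float]) -> list[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0: list[float], t: int, alpha_bar: list[float], rng: random.Random):
    """Draw x_t ~ q(x_t | x_0) in one step; returns (x_t, eps), eps being the training target."""
    a = alpha_bar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def noise_prediction_loss(eps_pred: list[float], eps: list[float]) -> float:
    """Simplified DDPM objective: mean squared error between predicted and true noise."""
    return sum((p - e) ** 2 for p, e in zip(eps_pred, eps)) / len(eps)


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bar = cumulative_alphas(linear_beta_schedule())
    patch = [0.8, 0.1, -0.5, 0.3]  # a tiny hypothetical "image" (flattened pixels)
    for t in (0, 250, 999):
        xt, eps = q_sample(patch, t, alpha_bar, rng)
        print(t, round(alpha_bar[t], 4), [round(v, 2) for v in xt])
```

At small t the sample is nearly the original patch, while near t = T it is almost pure Gaussian noise; learning to undo this corruption across all t is what forces the network to build the image features that DDPM-based pre-training reuses.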
2023-09-01T00:00:00Z