TESTING ARTIFICIAL INTELLIGENCE-BASED SOFTWARE SYSTEMS
Abstract
Artificial Intelligence (AI)-based software systems are increasingly used in high-stakes and
safety-critical domains, including recidivism prediction, medical diagnosis, and autonomous driving. There is thus an urgent need to ensure the reliability and correctness of AI-based systems. At the core of an AI-based software system is a machine learning (ML) model that is used to perform tasks such as classification and prediction.
Unlike conventional software programs, where a developer explicitly writes the decision logic, ML models learn the decision logic from a large training dataset. Furthermore, many ML models encode the decision logic in the form of mathematical functions that can be quite abstract and complex. Thus, existing software testing techniques cannot be directly applied to test AI-based applications.
The goal of this dissertation is to develop methodologies for testing AI-based software systems. This dissertation makes contributions in the following areas. Test input generation: (1) a combinatorial approach to generating test configurations for testing five classical machine learning algorithms; (2) a combinatorial approach to generating test data (synthetic images) for testing Deep Neural Network (DNN) models used in autonomous driving cars. Test cost reduction: (3) an empirical study that analyzes the effect of using sampled datasets to test supervised learning algorithms. Explainable AI (XAI): (4) a software fault localization-based explainable AI (XAI)
approach that produces counterfactual explanations for decisions made by image classifier models (DNN models).
This dissertation is presented in an article-based format and includes five research papers. The first paper reports our work on applying combinatorial testing to test five classical machine learning algorithms. The second paper reports an extensive empirical evaluation of testing ML algorithms with sampled datasets. The third paper introduces a combinatorial testing-based approach to generating test images for testing pre-trained DNN models used in autonomous driving cars. The fourth paper extends the third, presenting an initial study that evaluates the performance of combinatorial testing in testing DNNs used in autonomous driving cars. The fifth paper presents an explainable AI (XAI) approach that adapts BEN, an existing software fault localization technique, to produce explanations for decisions made by ML models. All five papers have been accepted at peer-reviewed venues. Papers 1, 2, 3, and 5 have been published, while Paper 4 is currently in press.