
dc.contributor.advisor  Elmasri, Ramez
dc.contributor.advisor  Huber, Manfred
dc.creator  Jain, Bhanu Chaturvedi
dc.date.accessioned  2022-08-30T15:09:53Z
dc.date.available  2022-08-30T15:09:53Z
dc.date.created  2021-08
dc.date.issued  2021-08-16
dc.date.submitted  August 2021
dc.identifier.uri  http://hdl.handle.net/10106/30921
dc.description.abstract  Machine learning-based decision support systems bring relief to decision-makers in many domains, such as loan application acceptance, dating, hiring, granting parole, insurance coverage, and medical diagnosis. These support systems facilitate processing tremendous amounts of data to decipher the embedded patterns. However, these decisions can also absorb and amplify bias embedded in the data. The increasing number of applications of machine learning-based decision support systems in a growing number of domains has directed the attention of stakeholders to the accuracy, transparency, interpretability, cost effectiveness, and fairness of the ensuing decisions. In this dissertation, we focus on the fairness and accuracy embodied in such predictions. When making machine learning-based forecasts, there is a series of sub-problems within the overarching problem of addressing bias and accuracy in decisions that we address in this work: 1) detecting bias in the predictions, 2) increasing accuracy in predictions, 3) increasing prediction accuracy without tampering with the class labels and while excluding sensitive attributes that trigger bias, 4) quantifying bias in a model, and finally 5) reducing a model's bias during the training phase. In this dissertation we develop machine learning methods that address the aforementioned problems to improve fairness and prediction accuracy, using three large socially relevant datasets in two different domains: two Department of Justice recidivism datasets and a Census-based adult income dataset. One of the two recidivism datasets, like the income dataset, holds significant demographic information. The second recidivism dataset is more feature rich, with information pertaining to criminal history, substance abuse, and treatments taken during incarceration, and thus provides a rich contrast to the largely demographic datasets when comparing fairness in predicted results. Our approach focuses on data preparation, feature enrichment in activity- and personal-history-based datasets, model design, and the inclusion of loss function regularization alongside the traditional binary cross-entropy loss to increase both fairness and accuracy. We achieve this without tampering with the class labels and without balancing the datasets. To stay squarely focused on fairness, we do not include the sensitive attributes in the input features while training the models. In the experiments we show that we can increase accuracy and fairness in the predictions on all three datasets beyond what has been achieved in the published literature. The results demonstrate that our loss-function-based fairness improvement approach is applicable in different domains with different sensitive attributes and can be applied without manipulating class labels or balancing skewed datasets.
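The abstract's core technique, pairing the standard binary cross-entropy loss with a regularizer that pushes a group-level metric toward parity, can be illustrated with a short sketch. The code below is a hypothetical illustration, not the dissertation's implementation: the function names bias_parity_score and fairness_regularized_loss, the choice of mean positive prediction rate as the parity metric, and the weight lam are all assumptions made for the example.

    # Hypothetical sketch of two ideas named in the abstract:
    # (1) a bias parity score (BPS) comparing a metric across groups, and
    # (2) a BCE loss augmented with a parity regularizer.
    import torch
    import torch.nn.functional as F

    def bias_parity_score(metric_a: float, metric_b: float) -> float:
        """One common parity formulation: ratio of the smaller group
        metric to the larger, so 1.0 indicates perfect parity."""
        lo, hi = sorted((metric_a, metric_b))
        return lo / hi if hi > 0 else 1.0

    def fairness_regularized_loss(logits, labels, group_mask, lam=0.1):
        """Binary cross-entropy plus a penalty on the gap in mean
        positive prediction rate between two demographic groups.

        logits     -- raw model outputs, shape (N,)
        labels     -- binary ground-truth labels, shape (N,)
        group_mask -- boolean tensor, True for the protected group, shape (N,)
        lam        -- regularizer weight (assumed value)
        """
        bce = F.binary_cross_entropy_with_logits(logits, labels.float())
        probs = torch.sigmoid(logits)
        rate_a = probs[group_mask].mean()    # protected group
        rate_b = probs[~group_mask].mean()   # everyone else
        return bce + lam * (rate_a - rate_b).abs()

Note that the group membership mask is used only inside the loss, never as a model input, which is consistent with the abstract's constraint of excluding sensitive attributes from the input features; a bias parity score near 1.0 would indicate the chosen metric is nearly equal across groups.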
dc.format.mimetype  application/pdf
dc.language.iso  en_US
dc.subject  Accuracy
dc.subject  Bias
dc.subject  Fairness
dc.subject  Machine learning
dc.subject  Deep learning
dc.subject  Artificial intelligence
dc.subject  Recidivism
dc.subject  Income prediction
dc.subject  FPR
dc.subject  FNR
dc.subject  TPR
dc.subject  TNR
dc.subject  Loss functions
dc.subject  Prediction
dc.subject  Race-based bias
dc.subject  Gender-based bias
dc.subject  BPS
dc.subject  Bias parity score
dc.title  MACHINE LEARNING METHODS TO IMPROVE FAIRNESS AND PREDICTION ACCURACY ON LARGE SOCIALLY RELEVANT DATASETS
dc.type  Thesis
dc.degree.department  Computer Science and Engineering
dc.degree.name  Doctor of Philosophy in Computer Science
dc.date.updated  2022-08-30T15:09:53Z
thesis.degree.department  Computer Science and Engineering
thesis.degree.grantor  The University of Texas at Arlington
thesis.degree.level  Doctoral
thesis.degree.name  Doctor of Philosophy in Computer Science
dc.type.material  text

