Supervised Sparse Learning with Applications in Bioinformatics
Abstract
In machine learning and mathematical optimization, sparse learning is the use of mathematical norms such as the L1-norm, group norm, and L21-norm to seek a trade-off between a goodness-of-fit measure and the sparsity of the result. Sparsity of the result leads to a parsimonious learning model; in other words, only a few features from the data matrix are required to build the learning model and for further interpretation. The motivations for employing sparse learning in bioinformatics are two-fold: firstly, a parsimonious learning model enhances explanatory power; and secondly, a parsimonious model generally allows better prediction and generalizes better to new data.
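To make the norms named above concrete, the following is a minimal illustrative sketch (not the dissertation's code): it computes the L1 and L2,1 norms of a small weight matrix, where the L2,1 penalty encourages entire rows (features) to be zero, while the L1 penalty encourages entry-wise zeros.

```python
import math

def l21_norm(W):
    """L2,1 norm: sum of the Euclidean (L2) norms of the rows of W.
    Penalizing this drives whole rows (features) to zero, giving
    row-wise (feature-level) sparsity in the learned weight matrix."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)

def l1_norm(W):
    """L1 norm: sum of absolute values of all entries;
    penalizing this encourages entry-wise sparsity."""
    return sum(abs(w) for row in W for w in row)

# Toy weight matrix: each row holds one feature's weights across targets.
W = [[3.0, 4.0],   # active feature (row norm 5)
     [0.0, 0.0],   # feature zeroed out entirely
     [1.0, 0.0]]   # active feature (row norm 1)

print(l21_norm(W))  # 6.0
print(l1_norm(W))   # 8.0
```

In a regularized objective, these norms are added to a loss term (e.g., loss + lambda * l21_norm(W)), so minimizing the sum trades goodness of fit against sparsity.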
This dissertation is a collection of recent advances in sparse learning for bioinformatics, and consists of 1) L21-regularized multi-target support vector regression (L21-MSVR), 2) the application of L21-MSVR to predicting optimal tibial soft-tissue insertion in the human knee, 3) hierarchical sparse group lasso (HSGL), which improves the hierarchical lasso by incorporating an extra group-norm regularization, and 4) the use of HSGL on an electroencephalography (EEG)-based emotion recognition problem. The commonality among these articles is the use of mathematical norms and improvements over existing optimization formulations, in order to learn better models and to allow better interpretation of the selected features.