Robust Self-training Strategy for Various Molecular Biology Prediction Tasks
Date
2022-08-10
Author
Ma, Hehuan
Jiang, Feng
Rong, Yu
Guo, Yuzhi
Huang, Junzhou
Abstract
Molecular biology prediction tasks suffer from limited labeled data,
since labeling a target molecule normally demands a series of professional experiments. Self-training is one of the
semi-supervised learning paradigms that utilizes both labeled and
unlabeled data. It trains a teacher model on labeled data, and uses
it to generate pseudo labels for unlabeled data. The labeled and
pseudo-labeled data are then combined to train a student model.
However, the pseudo labels generated by the teacher model are
not sufficiently accurate. We therefore propose a robust self-training
strategy that explores robust loss functions to handle such noisy
labels; the strategy is model- and task-agnostic and can be easily embedded in any prediction task. We have conducted a series of molecular
biology prediction tasks to progressively evaluate the performance of
the proposed robust self-training strategy. The results demonstrate
that the proposed method consistently boosts prediction performance, especially on molecular regression tasks, which
gained a 41.5% average improvement.
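
The teacher/student procedure described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the models are one-dimensional linear regressors, the robust loss is the Huber loss (the abstract does not name which robust loss is used), and the names `fit_linear`, `self_train`, `mse_grad`, and `huber_grad` are hypothetical.

```python
def mse_grad(residual):
    """Gradient of 0.5 * residual**2 with respect to the prediction."""
    return residual


def huber_grad(residual, delta=1.0):
    """Gradient of the Huber loss: linear near zero, clipped to +/-delta beyond it."""
    if abs(residual) <= delta:
        return residual
    return delta if residual > 0 else -delta


def fit_linear(xs, ys, grad_fn, lr=0.1, epochs=1000):
    """Fit y ~ w*x + b by full-batch gradient descent.

    grad_fn maps a residual (prediction - target) to dLoss/dPrediction.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            g = grad_fn(w * x + b - y)
            grad_w += g * x
            grad_b += g
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def self_train(labeled, unlabeled_xs, student_grad=huber_grad):
    """One round of self-training with a robust student loss (assumed: Huber)."""
    xs, ys = zip(*labeled)
    # Step 1: train the teacher on labeled data only.
    teacher = fit_linear(xs, ys, mse_grad)
    # Step 2: the teacher generates pseudo labels for the unlabeled inputs.
    pseudo = [(x, teacher[0] * x + teacher[1]) for x in unlabeled_xs]
    # Step 3: train the student on labeled + pseudo-labeled data,
    # using the robust loss to limit the influence of inaccurate pseudo labels.
    cx, cy = zip(*(list(labeled) + pseudo))
    student = fit_linear(cx, cy, student_grad)
    return teacher, student


# Toy data (hypothetical): labels follow y = 2x + 1.
labeled = [(x, 2.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
unlabeled_xs = [-0.75, -0.25, 0.25, 0.75]
teacher, student = self_train(labeled, unlabeled_xs)
```

Because the loss enters only through `student_grad`, swapping in a different robust loss requires no change to the training loop, which is one simple way to read the abstract's claim that the strategy is model- and task-agnostic. In this toy setting the pseudo labels lie exactly on the teacher's line, so the sketch shows the data flow rather than any accuracy gain.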