COMPREHENSIVE STUDY OF GENERATIVE METHODS ON DRUG DISCOVERY

Xiu, Siyu

Date

2019-12-09

Author

Xiu, Siyu

0000-0002-1145-9648

Abstract

Deep learning (DL) has recently succeeded in several life-changing application areas, e.g., autonomous driving, image/video search and discovery, and natural language processing, and this success has opened many new opportunities. One of the biggest lies in applying DL to accelerate drug discovery, where millions of human lives could potentially be saved. However, applying DL to the drug discovery task turns out to be non-trivial. The most successful DL methods take as input either fixed-size tensors/matrices, e.g., images, or sequences of tokens, e.g., sentences with varying numbers of words. Neither matches the inputs of drug discovery, i.e., chemical compounds. Because chemical compounds are inherently structured, a graph data structure is often used to represent the atom-level data of a compound. Deep learning techniques on graphs, seen as a great opportunity for improvement, have therefore been actively studied lately. In this paper, we survey the newest academic progress in generative deep learning methods on graphs for drug discovery applications. We narrow our scope to one of the most important deep generative models, namely the Variational AutoEncoder (VAE). We begin with the early stage, when a VAE treated each atom of a molecule completely separately and ignored structural information entirely. This method is quite limited because the structural information is discarded. We then introduce the baseline method, the Grammar Variational AutoEncoder (GVAE), which encodes the grammar of the chemical representation in the model. One improvement upon the GVAE enforces syntax validation in the decoder; this method is named the Syntax-Directed Variational AutoEncoder (SDVAE). Since then, a couple of variants of these methods have emerged.
One of them encodes and decodes molecules in two steps: a junction-tree macrostructure, with chemical sub-components as the minimum unit, and a microstructure, with atoms as the minimum unit. This method is named the Junction Tree Variational AutoEncoder (JTVAE). Finally, we introduce another method, GraphVAE, in which a predefined maximum number of atoms is enforced in the decoder. These methods turn out to be effective in avoiding the generation of invalid molecules. We show the effectiveness of all the methods in extensive experiments. In conclusion, deep learning techniques have brought new hope to the drug discovery area, while many opportunities for growth remain open.
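
The graph representation with a predefined maximum atom number, mentioned above, can be pictured with a small sketch. This is an illustration only and is not taken from the thesis: the function name to_padded_graph, the constant MAX_ATOMS, and the choice of ethanol as the example are all assumptions made here, and a real GraphVAE-style model would operate on tensors rather than Python lists.

```python
# Hypothetical illustration; names and the padding size are assumptions,
# not the thesis's implementation.
MAX_ATOMS = 5  # assumed predefined upper bound on atoms per molecule


def to_padded_graph(atoms, bonds, max_atoms=MAX_ATOMS):
    """Return (node_labels, adjacency), both padded to max_atoms.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j, order) tuples over atom indices
    """
    if len(atoms) > max_atoms:
        raise ValueError("molecule exceeds the predefined maximum atom number")
    # Pad node labels with None so every molecule has the same shape.
    nodes = list(atoms) + [None] * (max_atoms - len(atoms))
    # Symmetric adjacency matrix; entries hold the bond order, 0 means no bond.
    adj = [[0] * max_atoms for _ in range(max_atoms)]
    for i, j, order in bonds:
        adj[i][j] = order
        adj[j][i] = order
    return nodes, adj


# Ethanol with hydrogens omitted: C-C-O, two single bonds.
nodes, adj = to_padded_graph(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
```

Padding every molecule to the same size gives the fixed-size input that, as the abstract notes, most deep learning methods expect, at the cost of rejecting any molecule larger than the chosen bound.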

URI

http://hdl.handle.net/10106/28884