Machine Learning Framework for Nonlinear and Interaction Relationships Involving Categorical and Numerical Features
Abstract
Traditionally, physical scientific experiments have been conducted extensively to
study and understand the behavior of a process or a system. With the advancement
of computing technology in recent years, computer codes and algorithms are used as
simulators to replicate the behavior of a complex system. Such use of computers to study
a system is termed ‘computer experiments.’ The process involves selecting specific
points, or runs, in the design space so as to maximize the information gained about the
system in as few runs as possible. These computer models are high-dimensional and can take
a long time to run. Metamodels (or surrogate models) built using the data collected from
computer model experiments are hence used to approximate the functional relationship
between inputs and outputs.
The contribution of this dissertation lies in the design-point selection and modeling
stages of the above process. First, existing designs for computer experiments with mixed
factors (categorical and numerical) are reviewed, and we then perform a comprehensive study
of these designs to understand their performance under various settings. In the latter
part of the dissertation, we propose a data-mining framework to learn and model interactions
and nonlinearity involving categorical and numerical features.