ApproxML: Efficient Approximate Ad-Hoc ML Models Through Materialization and Reuse

Ghaderi, Faezeh

View/Open

GHADERI-THESIS-2019.pdf (930.9Kb)

Date

2019-12-10

Author

Ghaderi, Faezeh (ORCID: 0000-0002-4275-5247)

Abstract

Machine learning (ML) has become an essential tool for answering complex predictive analytics queries. Model building on large-scale datasets is one of the most time-consuming parts of the data science pipeline, and data scientists are often willing to sacrifice some accuracy to speed up this process during the exploratory phase. In this report, we demonstrate ApproxML, a system that efficiently constructs approximate ML models for new queries from previously constructed ML models, using the concepts of model materialization and reuse. ApproxML supports a wide variety of ML models, such as generalized linear models for supervised learning and K-Means and Gaussian mixture models for unsupervised learning. The implementation is compatible with different datasets and ML algorithms because it is a cost-based optimization framework that identifies the best reuse strategy at query time.
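To make the idea of materialization and reuse concrete, the following is a minimal sketch and not the method described in the thesis. It assumes the simplest possible case, a one-variable least-squares regression, where each data partition can be summarized by a handful of sums that are stored once and later merged to answer a new query without rescanning the raw data. The class `LinRegStats`, the `partitions` data, and the function `answer_query` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinRegStats:
    """Sums needed to fit y = slope * x + intercept by least squares.

    Hypothetical helper for illustration; not part of ApproxML.
    """
    n: int
    sx: float
    sy: float
    sxx: float
    sxy: float

    @staticmethod
    def from_points(points):
        # One pass over the raw data: this is the "materialization" step.
        return LinRegStats(
            n=len(points),
            sx=sum(x for x, _ in points),
            sy=sum(y for _, y in points),
            sxx=sum(x * x for x, _ in points),
            sxy=sum(x * y for x, y in points),
        )

    def merge(self, other):
        # Sums over disjoint partitions simply add up: this is the "reuse" step.
        return LinRegStats(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )

    def fit(self):
        # Closed-form simple linear regression from the stored sums.
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            raise ValueError("x values are all identical; slope is undefined")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept


# Made-up partitions (all points lie on y = 2x + 1), summarized ahead of time.
partitions = {
    "2017": [(1.0, 3.0), (2.0, 5.0)],
    "2018": [(3.0, 7.0), (4.0, 9.0)],
    "2019": [(5.0, 11.0), (6.0, 13.0)],
}
materialized = {key: LinRegStats.from_points(pts) for key, pts in partitions.items()}


def answer_query(keys):
    """Fit a model for the union of the named partitions using stored sums only."""
    stats = [materialized[k] for k in keys]
    merged = stats[0]
    for s in stats[1:]:
        merged = merged.merge(s)
    return merged.fit()


slope, intercept = answer_query(["2018", "2019"])
print(slope, intercept)
```

In this special case the merged model equals the one that would be obtained by refitting on the raw data. For models without such additive summaries, for example K-Means, combining previously built models would in general only approximate a retrained model, which is where a cost-based choice among reuse strategies, as the abstract describes, would come into play.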

URI

http://hdl.handle.net/10106/28915