Data Analytics Over Hidden Databases

Dasgupta, Arjun

dc.contributor.author	Dasgupta, Arjun	en_US
dc.date.accessioned	2010-11-01T21:29:06Z
dc.date.available	2010-11-01T21:29:06Z
dc.date.issued	2010-11-01
dc.date.submitted	January 2010	en_US
dc.identifier.other	DISS-10800	en_US
dc.identifier.uri	http://hdl.handle.net/10106/5170
dc.description.abstract	Web-based access to databases has become a popular method of data delivery. A multitude of websites provide access to their proprietary data through web forms. In order to view this data, customers use the web form interface and pose queries on the underlying database. These queries are executed and a resulting set of tuples (usually the top-k ones) is served to the customer. Top-k output, along with strict limits on querying, are constraints used by database providers to conserve the power of the underlying data distribution. Delivering limited access only to tuples that satisfy a query enables providers to expose only a small snippet of the entire inventory at a time. This method of data delivery prevents analysts from deriving information on the holistic nature of the data. Analytical queries on the data statistics are hence blocked by these access restrictions. The objective of this work is to present detailed approaches for inferring statistical information on such hidden databases, using their publicly available front-end forms. To this end, we first explore the problem of random sampling of tuples from hidden databases. Samples representing the underlying data open up a proprietary database to a plethora of opportunities by giving external parties a glimpse into the holistic aspects of the data. Analysts can use samples to pose aggregate queries and gain information on the nature and quality of the data. In addition to sampling, we also present efficient techniques that directly produce unbiased estimates of various interesting aggregates. These techniques can also be applied to address the more general problem of size estimation of such databases.
In light of techniques for inferring aggregates, we introduce and motivate the problem of privacy preservation in hidden databases from the data provider's perspective, where the objective is to preserve the underlying aggregates while serving legitimate customers with answers to their form-based queries.	en_US
dc.description.sponsorship	Das, Gautam	en_US
dc.language.iso	en	en_US
dc.publisher	Computer Science & Engineering	en_US
dc.title	Data Analytics Over Hidden Databases	en_US
dc.type	Ph.D.	en_US
dc.contributor.committeeChair	Das, Gautam	en_US
dc.degree.department	Computer Science & Engineering	en_US
dc.degree.discipline	Computer Science & Engineering	en_US
dc.degree.grantor	University of Texas at Arlington	en_US
dc.degree.level	doctoral	en_US
dc.degree.name	Ph.D.	en_US

Files in this item

Name: Dasgupta_uta_2502D_10800.pdf
Size: 5.551Mb
Format: PDF
