Personalization and Data Relation Exploration using Predictive Analytics for the Production and Distributed Analysis System (PanDA)

Titov, Mikhail

dc.contributor.advisor	Záruba, Gergely
dc.contributor.advisor	De, Kaushik
dc.creator	Titov, Mikhail
dc.date.accessioned	2016-10-25T19:50:34Z
dc.date.available	2016-10-25T19:50:34Z
dc.date.created	2016-08
dc.date.issued	2016-08-26
dc.date.submitted	August 2016
dc.identifier.uri	http://hdl.handle.net/10106/26138
dc.description.abstract	Efficient data distribution among computing centers is one of the biggest challenges in large-scale scientific distributed computing systems. Such data distribution issues include: i) the rational utilization of storage and computing resources, ii) the minimization of the completion time for data processing (which requires a reduction in redundant data transfers and intelligent allocation of processing tasks), and iii) user experience enhancement, i.e., availability of and fast access to the desired data, and discovery of new relevant data. In the literature and in practice, significant new approaches to improving workflow management have been proposed to address these issues, especially the first two. However, scientific computing systems usually neglect user experience, although significant improvements could be made by exploring the relationships between the involved entities, e.g., inter-user and user-data relationships. Revealing such relationships would not only benefit users but could also improve data distribution strategies. The focus of this dissertation is on the discovery of hidden correlations between users and the corresponding data, and on the interpretation of the reasons for those correlations in terms of quantitative assessment. The scientific computing system on which this research is focused is PanDA (Production and Distributed Analysis), the pilot-job-based workload management system that operates at the ATLAS experiment. The dissertation describes a research effort conducted to detect data usage patterns in PanDA in order to validate the thesis that a recommender system would enhance user experience and also provide important data with which the scheduling of computing tasks could be improved.
Data mining techniques are investigated and applied to estimate the correlation between users’ data needs, and to collect and manage groupings of data (based on data origin and usage patterns) and of users (based on interests and data usage history). This work also presents the design of Data Watcher, a system that creates and maintains user models and thus reveals relationships between users and data. The goal is to analyze, model, and predict user preferences based on estimated ratings and user-provided feedback. The core analytics of Data Watcher are based on various recommender system techniques that help users find interesting data (i.e., data similar to what the user has used previously, or relevant data that similar users have used). More precisely, Data Watcher i) predicts the degree of users’ potential interest in particular data, ii) dynamically forms groups of similar objects (groups of similar users, and data collections), and iii) maintains data popularity metrics based on implicit and explicit ratings.
dc.format.mimetype	application/pdf
dc.language.iso	en_US
dc.subject	Recommender system
dc.subject	Machine learning
dc.title	Personalization and Data Relation Exploration using Predictive Analytics for the Production and Distributed Analysis System (PanDA)
dc.type	Thesis
dc.degree.department	Computer Science and Engineering
dc.degree.name	Doctor of Philosophy in Computer Science
dc.date.updated	2016-10-25T19:51:05Z
thesis.degree.department	Computer Science and Engineering
thesis.degree.grantor	The University of Texas at Arlington
thesis.degree.level	Doctoral
thesis.degree.name	Doctor of Philosophy in Computer Science
dc.type.material	text

Files in this item

Name: TITOV-DISSERTATION-2016.pdf
Size: 2.829 MB
Format: PDF
