
dc.contributor.advisor: Li, Chengkai
dc.contributor.advisor: Elmasri, Ramez
dc.creator: Jayaram, Nandish
dc.date.accessioned: 2016-07-08T19:58:41Z
dc.date.available: 2016-07-08T19:58:41Z
dc.date.created: 2016-05
dc.date.issued: 2016-05-05
dc.date.submitted: May 2016
dc.identifier.uri: http://hdl.handle.net/10106/25764
dc.description.abstract: There is a pressing need to tackle the usability challenges in querying massive, ultra-heterogeneous entity graphs, which use thousands of node and edge types to record millions to billions of entities (persons, products, organizations) and their relationships. Widely known instances of such graphs include Freebase, DBpedia and YAGO. Applications in a variety of domains are tapping into such graphs for richer semantics and better intelligence. Both data workers and application developers are often overwhelmed by the daunting task of understanding and querying these data, due to their sheer size and complexity. To retrieve data from graph databases, the norm is to use structured query languages such as SQL, SPARQL, and the like. However, writing structured queries requires extensive experience with the query language, the data model and the datasets themselves. In this dissertation, as an initial step toward improving the usability of query systems for large graphs, we present two novel and first-of-their-kind systems: Orion and GQBE.
The database community has long recognized the importance of graphical query interfaces to the usability of data management systems. Yet relatively little has been done. Existing visual query builders allow users to build queries by drawing query graphs, but do not offer suggestions regarding which nodes and edges to include. At every step of query formulation, a user may be inundated with hundreds of options or more. We present Orion, a visual query interface that iteratively assists users in query graph construction by making suggestions using machine learning methods. In its active mode, Orion suggests the top-k edges to be added to a query graph, without being triggered by any user action. In its passive mode, the user adds a new edge manually, and Orion suggests a ranked list of labels for the edge.
Orion’s edge ranking algorithm, Random Decision Paths (RDP), makes use of a query log to rank candidate edges by how likely they are to match users’ query intent. Extensive user studies using Freebase demonstrated that Orion users have a 70% success rate in constructing complex query graphs, a significant improvement over the 58% success rate of users of a baseline system that resembles existing visual query builders. Furthermore, using active mode only, the RDP algorithm was compared with several methods adapting other machine learning algorithms, such as random forests and the naive Bayes classifier, as well as recommendation systems based on singular value decomposition and class association rules. On average, RDP required only 40 suggestions to correctly reach a target query graph, while other methods required 1.5 to 4 times as many suggestions.
We also propose to query large graphs by example entity tuples, without requiring users to form complex graph queries. Our system, GQBE (Graph Query By Example), provides a complementary approach to the existing keyword-based methods, facilitating user-friendly graph querying. GQBE automatically discovers a weighted hidden maximum query graph based on input query tuples, to capture a user’s query intent. It then efficiently finds and ranks the top approximate matching answer graphs and answer tuples. GQBE also lets users provide multiple example tuples as input, and efficiently uses them to better capture the user’s query intent. User studies with Freebase demonstrated that GQBE’s ranked answer tuple list has a strong positive correlation with the users’ ranking preferences. Other extensive experiments showed that GQBE has significantly better accuracy than other state-of-the-art systems. GQBE was also faster than NESS (one of the compared systems) for 17 of the 20 queries used in the experiments, and was 3 times faster for 10 of them.
dc.format.mimetype: application/pdf
dc.language.iso: en_US
dc.subject: Query formulation
dc.subject: Query specification
dc.subject: Querying large graphs
dc.subject: Ultra-heterogeneous graphs
dc.title: TOWARDS BETTER USABILITY OF QUERY SYSTEMS FOR MASSIVE ULTRA-HETEROGENEOUS GRAPHS: NOVEL APPROACHES OF QUERY FORMULATION AND QUERY SPECIFICATION
dc.type: Thesis
dc.degree.department: Computer Science and Engineering
dc.degree.name: Doctor of Philosophy in Computer Science
dc.date.updated: 2016-07-08T19:59:12Z
thesis.degree.department: Computer Science and Engineering
thesis.degree.grantor: The University of Texas at Arlington
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy in Computer Science
dc.type.material: text

