Show simple item record

dc.contributor.advisorElmasri, Ramez
dc.creatorAlshemaimri, Bader
dc.date.accessioned2022-06-28T15:10:35Z
dc.date.available2022-06-28T15:10:35Z
dc.date.created2022-05
dc.date.issued2022-04-06
dc.date.submittedMay 2022
dc.identifier.urihttp://hdl.handle.net/10106/30366
dc.description.abstractDatabase code fragments exist in software systems by using SQL as the stan- dard language for relational databases. Traditionally, developers bind databases as backends to software systems for supporting user applications. However, these bind- ings are low-level code and implemented to persist user data, so Object Relational Mapping (ORM) frameworks take place to abstract database access details. These approaches are prone to problematic database code fragments that negatively im- pact the quality of software systems. In the first part of the dissertation, we survey problematic database code fragments in the literature and examine antipatterns that occur in low-level database access code using SQL and high-level counterparts in ORM frameworks. We also study problematic database code fragments in different popular software architectures such as Service Oriented Architecture (SOA), Microservice Ar- chitecture (MA), and Model View Controller (MVC). We create a novel categorization of both SQL schema and query antipatterns in terms of performance, maintainability, portability, and data integrity. In the second part of this dissertation, we create NLP patterns that support data architects when modeling and naming data element definitions. We design and develop rule-based natural language processing (NLP) techniques to automatically extract standardized data element names from data element definitions written in American English. The goal is to study how using NLP techniques can improve the accuracy of extracting standardized data element names in a domain-independent context. It is a challenge to come up with NLP patterns in natural language definitions as opposed to unambiguous code. To achieve automated data element naming, we first identify heuristic patterns that mine noun phrases and relationships from data element definitions. Then, we use these noun phrases and relationships as input to determine components of data element names. The output of the patterns is reviewed by a domain expert. We apply our method to extract the five standard components of a data element name in the Railway and Transportation domains. We first achieved 80% accuracy, then by improving the rules and adding a similarity function using knowledge graphs, we improved the accuracy to 95% in our final experiments. We also introduce our tool entitled as Data Element Naming Automation (DENA) tool. The tool consists of four components: DENA NLP, DENA assem- bly, preprocessing, and duplicate checker. In the last part of the dissertation, we propose how we preprocess data element definitions and evaluate the deduplication detection.
dc.format.mimetypeapplication/pdf
dc.language.isoen_US
dc.subjectData management
dc.subjectArtificial intelligence
dc.titleUsing antipatterns to improve database code fragments, and utilizing knowledge graphs and NLP patterns to extract standardized data element names
dc.typeThesis
dc.degree.departmentComputer Science and Engineering
dc.degree.nameDoctor of Philosophy in Computer Science
dc.date.updated2022-06-28T15:10:35Z
thesis.degree.departmentComputer Science and Engineering
thesis.degree.grantorThe University of Texas at Arlington
thesis.degree.levelDoctoral
thesis.degree.nameDoctor of Philosophy in Computer Science
dc.type.materialtext


Files in this item

Thumbnail


This item appears in the following Collection(s)

Show simple item record