Show simple item record

dc.contributor.advisorLi, Chengkai
dc.creatorAkrami, Farahnaz
dc.date.accessioned2021-09-14T13:54:46Z
dc.date.available2021-09-14T13:54:46Z
dc.date.created2021-08
dc.date.issued2021-08-25
dc.date.submittedAugust 2021
dc.identifier.urihttp://hdl.handle.net/10106/29977
dc.description.abstractIn the active research area of employing embedding models for knowledge graph completion, particularly for the task of link prediction, most prior studies used some specific benchmark datasets to evaluate such models. Most triples in those datasets belong to reverse and duplicate relations, which exhibit high data redundancy due to semantic duplication, correlation, or data incompleteness. This is a case of excessive data leakage—a model is trained using features that otherwise would not be available when the model needs to be applied for real prediction. There are also Cartesian product relations for which every triple formed by the Cartesian product of applicable subjects and objects is a true fact. Link prediction on the aforementioned relations is easy and can be achieved with even better accuracy using straightforward rules instead of sophisticated embedding models. A more fundamental defect of these models is that the link prediction scenario, given such data, is non-existent in the real world. This dissertation thoroughly examines the impact of the aforementioned problems on the performance of different embedding models and provides a systematic evaluation of the true effectiveness of the models when the unrealistic triples are removed. Our experiment results show that the reverse triples led to a substantial overestimation of the accuracy of the embedding models. We argue that the popular benchmark datasets are entirely misleading and should not be used anymore. In this dissertation, we also demonstrate the inadequacy of existing evaluation metrics. The frequently used metrics are based on the closed-world assumption and thus have flaws when a model correctly predicts a triple that does not exist in the benchmark dataset. Models are penalized for generating such correct predictions, which contradicts with the exact goal of link prediction—finding correct triples that do not already exist in the knowledge graph. Another limitation of the current metrics is that they aggregate the predictions’ accuracy of all triples into a single value. This makes it impossible to discern the specific strengths and weaknesses of the models for different predictions tasks. We present the results per relation for various models and the results on relations with varying difficulty levels (1-1 vs. 1-N vs. N-M), in order to provide more insights into embedding models and show performance differences that global metrics would not reveal. Link prediction task, the most generic evaluation protocol, verifies that models prioritize correct answers over wrong ones for a question that is already known to have an answer. This evaluation setup can be misleading as we cannot verify if a model ranks false or nonsensical triples lower than correct triples. Hence, we report the results of an alternative protocol called entity-pair ranking to rank all possible triples for a specific relation. Also, the current evaluation protocol is based on the assumption that the presence of a particular property on an entity is already known. The evaluation focuses on whether a model can derive the correct property values. In reality, though, it remains a challenge to determine whether a property is valid for a given entity in the first place. Therefore, we propose the property prediction task as another evaluation protocol. The performances of the models on these two tasks are unsatisfactory and differ considerably from link prediction results. Another way of evaluation is to find the models' performance on triple classification. This task is the binary classification of triples regarding whether they are true or false facts. We compared the classification results using two sets of negative triples, one that complies with type constraints of Freebase and one that violates them. The results show that when negative triples are type consistent, classification performance degrades considerably. The results of these different evaluation protocols suggest that better knowledge graph embedding models or training strategies are needed.
dc.format.mimetypeapplication/pdf
dc.language.isoen_US
dc.subjectKnowledge graph
dc.subjectEmbedding models
dc.subjectKnowledge graph completion
dc.titleOn the Efficacy of Knowledge Graph Completion Methods, Accuracy Measures and Evaluation Protocols
dc.typeThesis
dc.degree.departmentComputer Science and Engineering
dc.degree.nameDoctor of Philosophy in Computer Science
dc.date.updated2021-09-14T13:54:47Z
thesis.degree.departmentComputer Science and Engineering
thesis.degree.grantorThe University of Texas at Arlington
thesis.degree.levelDoctoral
thesis.degree.nameDoctor of Philosophy in Computer Science
dc.type.materialtext


Files in this item

Thumbnail


This item appears in the following Collection(s)

Show simple item record