dc.contributor.advisor: Li, Chengkai
dc.creator: Patel, Foram Pankajbhai
dc.date.accessioned: 2023-06-14T17:06:14Z
dc.date.available: 2023-06-14T17:06:14Z
dc.date.created: 2023-05
dc.date.issued: 2023-05-10
dc.date.submitted: May 2023
dc.identifier.uri: http://hdl.handle.net/10106/31240
dc.description.abstract: The task of health tweet classification entails identifying whether a given tweet is health-related. While existing research has made significant progress in classifying tweets into specific health sub-domains, such as mental health, COVID-19, or particular diseases, a more comprehensive approach that covers a broader range of health-related topics is still needed. This thesis addresses that need by proposing a diverse, comprehensive dataset that combines several existing health-related datasets, data collected with a keyword-based approach, and manually annotated data. However, health-related keywords are often used in a figurative or non-health context, which poses a significant challenge for classification. To address this challenge, the thesis explores Transformer-based models, including BERT, BERTweet, RoBERTa, and DistilBERT, which can capture the contextual meaning of words, and evaluates their effectiveness at classifying health-related tweets. On the test data, BERT, DistilBERT, and RoBERTa achieved F1-scores of 0.882, 0.870, and 0.872, respectively. The highest F1-score, 0.900, was achieved by adding a BiLSTM layer to the BERTweet model and fine-tuning it on the proposed dataset together with RHMD (Reddit Dataset). An ablation analysis further highlights the contribution of the BiLSTM layer and the RHMD dataset to the BERTweet model's performance on health tweet classification.
dc.format.mimetype: application/pdf
dc.language.iso: en_US
dc.subject: Healthcare, Deep Learning, Twitter Data Analysis, Transformers, Classification
dc.title: Enhancing Health Tweet Classification: An evaluation of Transformer-based models for Comprehensive Analysis
dc.type: Thesis
dc.date.updated: 2023-06-14T17:06:14Z
thesis.degree.department: Computer Science and Engineering
thesis.degree.grantor: The University of Texas at Arlington
thesis.degree.level: Masters
thesis.degree.name: Master of Science in Computer Science
dc.type.material: text
dc.creator.orcid: 0000-0001-5977-3188