Sentiment analysis of dialectical Arabic social media content using a hybrid linguistic-machine learning approach

Alwakid, GN, 2020. Sentiment analysis of dialectical Arabic social media content using a hybrid linguistic-machine learning approach. PhD, Nottingham Trent University.

Ghadah Alwakid 2021.pdf - Published version


Abstract

Despite the enormous increase in the number of Arabic posts on social networks, sentiment analysis research into extracting opinions from these posts lags behind that for the English language. This is largely attributed to the challenges of processing the morphologically complex Arabic language and the scarcity of Arabic NLP tools and resources. The task is made harder still when analysing dialectal Arabic, which does not abide by the formal grammatical structure. Based on semantic modelling of the target domain’s knowledge and multi-factor lexicon-based sentiment analysis, the intent of this research is to use a hybrid approach, integrating linguistic and machine learning methods, for sentiment classification of dialectal Arabic. First, a dataset of dialectal Arabic tweets focusing on the unemployment domain was collected and annotated manually. The tweets cover different Arabic dialects used in Saudi Arabia, for which a comprehensive Arabic sentiment lexicon was constructed. This approach to sentiment analysis also integrated a novel light stemming mechanism aimed at improving the stemming of Saudi dialectal Arabic. Subsequently, a novel multi-factor lexicon-based sentiment analysis algorithm was developed for domain-specific social media posts written in dialectal Arabic. The algorithm considers several factors (emoji, intensifiers, negations, supplications) to improve the accuracy of the classifications. These techniques were applied to a central problem of sentiment analysis in dialectal Arabic in order to assess analytical performance across social media channels, which are vulnerable to semantic and colloquial variation.
Finally, this study presented a new hybrid approach to sentiment analysis in which domain knowledge is utilised in two methods to combine computational linguistics and machine learning: the first method integrates the problem domain semantic knowledge base into the machine learning training feature set, while the second uses the outcome of the lexicon-based sentiment classification in the training of the machine learning methods. By integrating these techniques into a single, hybridised solution, a greater degree of accuracy and consistency was achieved than by applying each approach independently, confirming a pragmatic solution to sentiment classification in dialectal Arabic text.
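
As a rough illustration of the ideas summarised in the abstract, the following minimal Python sketch shows how a multi-factor lexicon score (polarity words, intensifiers, negations, emoji, supplications) might be computed and then appended to a machine learning feature vector, in the spirit of the second hybrid method. It is not the algorithm from the thesis: every lexicon entry, weight, function name and rule here is a hypothetical placeholder, and English tokens stand in for dialectal Arabic ones.

```python
# Illustrative placeholder lexicons; none of these entries or weights
# come from the thesis.
POLARITY = {"good": 1.0, "great": 2.0, "bad": -1.0, "jobless": -1.0}
INTENSIFIERS = {"very": 1.5, "so": 1.3}
NEGATIONS = {"not", "never"}
EMOJI = {"😊": 1.0, "😡": -1.0}
SUPPLICATIONS = {"bless": 0.5, "curse": -0.5}  # stand-ins for supplication cues


def lexicon_score(tokens):
    """Sum signed polarity over tokens, applying pending modifiers.

    Assumed rules: a negation flips the next polarity word, an intensifier
    scales it, and emoji and supplication cues add their own weight directly.
    """
    score, negate, boost = 0.0, False, 1.0
    for tok in tokens:
        if tok in NEGATIONS:
            negate = not negate
        elif tok in INTENSIFIERS:
            boost *= INTENSIFIERS[tok]
        elif tok in POLARITY:
            value = POLARITY[tok] * boost
            score += -value if negate else value
            negate, boost = False, 1.0  # modifiers are consumed by this word
        elif tok in EMOJI:
            score += EMOJI[tok]
        elif tok in SUPPLICATIONS:
            score += SUPPLICATIONS[tok]
    return score


def classify(tokens):
    """Map the lexicon score to a three-way sentiment label."""
    s = lexicon_score(tokens)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"


def hybrid_features(tokens, vocabulary):
    """Bag-of-words counts with the lexicon score appended as one extra feature."""
    counts = [float(tokens.count(word)) for word in vocabulary]
    return counts + [lexicon_score(tokens)]


if __name__ == "__main__":
    tweet = ["not", "good", "😡"]
    print(classify(tweet))
    print(hybrid_features(tweet, vocabulary=["good", "bad"]))
```

The vectors returned by `hybrid_features` could be passed to any standard classifier. The first hybrid method described above would instead add features derived from a domain knowledge base, which this sketch does not attempt to model.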

Item Type: Thesis
Creators: Alwakid, G.N.
Date: January 2020
Rights: This work is the intellectual property of the author. You may copy up to 5% of this work for private study, or personal, non-commercial research. Any re-use of the information contained within this document should be fully referenced, quoting the author, title, university, degree level and pagination. Queries or requests for any other use, or if a more substantial copy is required, should be directed to the owner(s) of the Intellectual Property Rights.
Divisions: Schools > School of Science and Technology
Record created by: Linda Sullivan
Date Added: 10 Mar 2021 15:12
Last Modified: 31 May 2021 15:05
URI: https://irep.ntu.ac.uk/id/eprint/42474
