Author: Aly, Alaa Mansy Mohamed. / Title: User-Generated Content Analysis of Social Networks using Machine Learning Approaches


Title

User-Generated Content Analysis of Social Networks using Machine Learning Approaches

Author

Aly, Alaa Mansy Mohamed.

Preparation committee

Researcher / Alaa Mansy Mohamed Aly

Supervisor / Tarek Fouad Gharib (طارق فؤاد غريب)

Supervisor / Sherine Rady Abdel Ghany (شيرين راضي عبد الغنى)

Publication date

2023.

Number of pages

114 p.

Language

English

Degree

Master's

Specialization

Information Systems

Date of approval

1/1/2023

Place of approval

Ain Shams University - Faculty of Computer and Information Sciences - Information Systems


Only 14 of 114 pages are available for public view.

Abstract

Daily, intensive use of social networks generates a huge volume of content that merits analysis. People communicate with each other and share their lives with family, friends, or even everyone across the social domain of the digital world. Companies target clients through their publicly available pages on social networks, market their products, and showcase their services. They also support clients through human-operated chats or chatbots that respond to user inquiries instantly at any time. User-generated content in social networks, such as posts or chat messages, can be analyzed to extract valuable insights. Twitter is a well-known, active social network with many signed-in users who share their activities, thoughts, and feelings at any time. People post tweets, broadcast live videos, and chat with each other. Companies create and manage many marketing campaigns to promote their products, and they provide many other services as well.
These days, few people can give up online social networks, because using them makes people feel connected all the time. Users can also express their feelings and emotions, whether happiness, sadness, surprise, anticipation, or any other feeling, during their online activities. Many expressions and words in the text we write on the web every day may reflect our feelings. Moreover, they may significantly affect other people, because we believe that every word has an impact. For example, consider a tweet like “I got COVID-19 twice even though I have been vaccinated the vaccine is useless !!”. Such simple words can be lethal to many people affected by that virus: elderly people with chronic diseases may become frustrated and conclude that death is their next step. By analyzing such content, material that may affect many people can be controlled. For example, social networks can deploy an emotion detection (ED) model in their platforms as an option to prevent such discouraging statuses from appearing in their customers’ timelines. In this way, they can limit anxiety, frustration, and similar effects.
Researchers have grouped ED approaches according to how emotions are represented. The best-known family is the Discrete Emotion Model (DEM), exemplified by Ekman’s model [1], which contains six basic emotions: anger, fear, disgust, happiness, sadness, and surprise. The other family is the Dimensional Emotion Model (DiEM), which includes Plutchik’s Emotion Model and Russell’s Circumplex Model [2][3].
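The difference between the two representations can be sketched as data structures. This is a minimal illustration, not the thesis's implementation: the class names are hypothetical, and the [-1, 1] value range for the dimensional point is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class EkmanEmotion(Enum):
    """Discrete model: a text is assigned one of a fixed set of categories."""
    ANGER = "anger"
    FEAR = "fear"
    DISGUST = "disgust"
    HAPPINESS = "happiness"
    SADNESS = "sadness"
    SURPRISE = "surprise"


@dataclass(frozen=True)
class CircumplexPoint:
    """Dimensional model: an emotion is a point in a continuous space.

    Russell's circumplex uses valence (unpleasant to pleasant) and
    arousal (calm to excited). The [-1, 1] range is an assumption.
    """
    valence: float
    arousal: float


# Hypothetical placement of an angry text: unpleasant and highly aroused.
angry_example = CircumplexPoint(valence=-0.7, arousal=0.8)
```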
Consider a user-generated tweet such as “غضب دفين يفقد الاشياء الوانها لتصبح رمادية وتنعدم لذة الحياة” (roughly, “A buried anger drains things of their colors so that they turn gray and the pleasure of life vanishes”). Humans can readily understand this sentence by interpreting each word in light of the words before and after it. They can also infer the implied emotion of the user who posted the tweet (the tweeter), which is sadness or anger. Both tasks are challenging for a machine.
Words in a sentence are linked to each other in a certain sequence to form meaning; understanding that meaning is called “contextual understanding”. Traditional machine learning techniques do not capture context well [4]. Deep learning (DL) sequence models can be used to make machines approximate human understanding. Sequence models such as Recurrent Neural Networks (RNNs) can capture context by memorizing words and learning the relationships between them. However, they suffer from shortcomings known as the vanishing and exploding gradient problems. Accordingly, newer generations of RNNs have been developed to overcome these shortcomings. For example, Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models can handle long-term dependencies and mitigate the problems mentioned above.
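The vanishing and exploding gradient problems can be illustrated with a minimal sketch. It assumes a scalar linear recurrence h_t = w * h_{t-1} + x_t, a simplification chosen for illustration rather than a model from the thesis. Under this assumption the gradient of the final state with respect to the initial state is w raised to the number of steps. The function name and the weights are hypothetical.

```python
def gradient_through_time(w: float, steps: int) -> float:
    """Return d h_T / d h_0 for the scalar recurrence h_t = w * h_{t-1} + x_t.

    Each step contributes a factor d h_t / d h_{t-1} = w, so the chain
    rule multiplies `steps` copies of w together.
    """
    grad = 1.0
    for _ in range(steps):
        grad *= w
    return grad


if __name__ == "__main__":
    # |w| < 1 shrinks the gradient toward zero; |w| > 1 makes it blow up.
    for w in (0.5, 1.0, 1.5):
        print(f"w={w}: gradient after 50 steps = {gradient_through_time(w, 50):.3e}")
```

Gated cells such as LSTM and GRU add learned gates that can keep this per-step factor close to 1 along selected paths, which is one common explanation for why they cope better with long sequences.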
Although Arabic resources and research studies on Arabic contextual understanding are scarce, several Arabic pre-trained language models (PLMs) have been developed to address this gap. The best-known PLM is AraBERT [5], an Arabic pre-trained language model based on Bidirectional Encoder Representations from Transformers (BERT) [6]. AraBERT was trained on more than 20 GB of Arabic text from sources such as Arabic Wikipedia and Arabic news websites. AraBERT is based on Modern Standard Arabic (MSA), and it achieves better results when fine-tuned on MSA datasets. PLMs are a form of Transfer Learning (TL) that supports research in this area and addresses the problem of limited resources. Other models, such as MARBERT, have been trained on both MSA and Arabic dialects; MARBERT is used in this study to form the ensemble approach after being combined with BiGRU and BiLSTM.
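The abstract does not state how the three models are combined. Purely as an illustration, the sketch below assumes soft voting: each model outputs one probability per emotion label, the probabilities are averaged, and every label whose mean reaches a threshold is predicted. The function name, the label subset, the threshold, and the probability values are all hypothetical.

```python
def soft_vote(model_probs, labels, threshold=0.5):
    """Average per-label probabilities across models, then threshold.

    model_probs: one list of probabilities per model, aligned with `labels`.
    Returns every label whose mean probability is at least `threshold`,
    so a text may receive zero, one, or several labels.
    """
    n_models = len(model_probs)
    means = [
        sum(probs[i] for probs in model_probs) / n_models
        for i in range(len(labels))
    ]
    return [label for label, mean in zip(labels, means) if mean >= threshold]


# Invented outputs standing in for three models scoring a single tweet.
labels = ["anger", "joy", "sadness"]
predictions = soft_vote(
    [
        [0.9, 0.1, 0.6],  # hypothetical model A
        [0.8, 0.2, 0.7],  # hypothetical model B
        [0.7, 0.3, 0.8],  # hypothetical model C
    ],
    labels,
)
```

Other combination schemes, such as majority voting or a learned meta-classifier, would fit the same description equally well.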
The Arabic language is spoken by around 447 million people worldwide, 237 million of whom are internet users [7]. Arabic text in social networks is not always written in the standard Arabic script. It can also be written in the Franco-Arabic Typing Style (FATS), also called the Arabic Chat Alphabet (ACA) or Arabizi. Franco-Arabic (FA) is a combination of Latin characters and numerals that conveys meanings and pronunciations similar to those of Arabic. Young people in the Arab world are known to use this informal typing style, especially when communicating online or sending text messages on their cell phones [8]. Many of them find it easier and much quicker to write in FA than in Arabic script, even across different dialects. Understanding the context implicitly represented in FA user-generated text is one of the newest challenges in Arabic NLP.
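How Latin characters and numerals stand in for Arabic letters can be sketched with a naive character-level mapping. The table below is a small, hypothetical subset of common Arabizi conventions (for example, 3 for ع and 7 for ح). Actual usage varies by writer and dialect, and this is not the transliteration procedure used in the thesis.

```python
# Hypothetical, deliberately small mapping from Arabic letters to common
# Franco-Arabic renderings. "g" for ج follows Egyptian usage; other
# dialects often write "j".
ARABIC_TO_FRANCO = {
    "ء": "2", "ا": "a", "ب": "b", "ت": "t", "ج": "g", "ح": "7",
    "خ": "5", "د": "d", "ر": "r", "س": "s", "ش": "sh", "ص": "9",
    "ط": "6", "ع": "3", "ف": "f", "ك": "k", "ل": "l", "م": "m",
    "ن": "n", "ه": "h", "و": "w", "ي": "y",
}


def to_franco(text: str) -> str:
    """Transliterate character by character; unmapped characters pass through."""
    return "".join(ARABIC_TO_FRANCO.get(ch, ch) for ch in text)
```

Because Arabic script usually omits short vowels, this mapping yields consonant-heavy output such as "3rby" where a human writer would more likely type something like "3araby". A realistic transliterator would need vowel restoration and dialect-specific rules.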
To the best of our knowledge, no pre-trained word embedding models are available that can capture context written in FA dialects. As a result, two word embedding models (AraFranco and AraFrancoVec) are proposed to capture the semantics of user-generated FA text. These models are based on two well-known word embedding models, FastText and Word2Vec. For evaluation, simple functions were used to compute word similarities and nearest neighbors. For further evaluation, user-generated textual content from Twitter was analyzed to mine emotional insights in an emotion classification task. The datasets shared in SemEval-2018: Affect in Tweets (emotion classification task) were transliterated (converted into FA), and the proposed Franco-Arabic models were used to obtain the word embeddings. Finally, different deep learning models were trained on these word embeddings to classify the user-generated FA tweets into one or more of the implied emotions, such as anger, disgust, joy, love, optimism, pessimism, fear, and sadness.
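The word-similarity and nearest-neighbor checks mentioned above are commonly based on cosine similarity between embedding vectors. The sketch below assumes that measure and uses invented 3-dimensional vectors for a few hypothetical FA tokens. It does not load AraFranco or AraFrancoVec, and the spellings and values are illustrative only.

```python
import math

# Toy vectors for hypothetical Franco-Arabic tokens. The numbers are
# invented for illustration and do not come from any trained model.
TOY_EMBEDDINGS = {
    "7ob": [0.9, 0.1, 0.0],    # "love"
    "far7a": [0.8, 0.3, 0.1],  # "joy"
    "7ozn": [0.1, 0.9, 0.2],   # "sadness"
    "5of": [0.0, 0.7, 0.6],    # "fear"
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, embeddings, k=2):
    """Return the k other words most similar to `word`, best first."""
    query = embeddings[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

With these toy vectors, the nearest neighbor of "7ob" is "far7a", which is the kind of sanity check one would expect a useful embedding model to pass on real data.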