Title
Deep Neural Network for Virus Mutation Prediction
Author
Takwa Mohamed Sayed Sayed
Thesis Committee
Researcher / Takwa Mohamed Sayed Sayed
Supervisor / Akram Ibrahim Salah
Supervisor / Essam Halim Houssein
Supervisor / Sabah Sayed
Subject
Computer Science
Publication Date
2022
Number of Pages
108 p.
Language
English
Degree
Master's
Specialization
Computer Networks and Communications
Approval Date
1/1/2021
Awarding Institution
Cairo University - Faculty of Computers and Information - Computer Science
Only 14 of the 108 pages are available for public view.

Abstract

Data analysis and machine learning have become an integral part of modern scientific methodology. They offer automated procedures for predicting a phenomenon from past observations, uncovering underlying patterns in data, and providing insights into the problem. Yet machine learning should not be used as a black-box tool. It should instead be treated as a methodology, with a rational thought process that depends entirely on the problem under study. In particular, using an algorithm should ideally require a reasonable understanding of its mechanisms, properties, and limitations, so that its results can be better understood and interpreted.
Viral evolution remains a major obstacle to the effectiveness of antiviral drugs. Anticipating this evolution would support the early detection of drug-resistant strains and could help in planning the most effective antiviral treatment. In recent years, a deep learning model called the sequence-to-sequence (seq2seq) neural network has emerged and has been widely used in natural language processing. In this thesis, we borrow this approach to predict next-generation sequences with a seq2seq LSTM neural network, treating the sequences as text data. We used one-hot vectors to represent the sequences as input to the model, and this representation preserves the positional information of each nucleotide in the sequence. Four RNA virus sequence datasets were used to evaluate the proposed model, which achieved encouraging results. These results illustrate the potential of LSTM neural networks for solving other DNA and RNA sequence-analysis problems in bioinformatics.
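To make the encoder-decoder idea concrete, here is a minimal, untrained sketch in plain Python. It assumes an RNA alphabet `ACGU`, a hidden size of 8, randomly initialised weights, an all-zero start token, and greedy decoding. None of these choices, and none of the names `LSTMCell`, `seq2seq_predict` or `W_out`, come from the thesis. The encoder LSTM reads the one-hot parent sequence. Its final hidden and cell states then initialise the decoder LSTM, which emits one nucleotide per step and feeds each prediction back in as its next input.

```python
import math
import random

ALPHABET = "ACGU"  # assumption: RNA nucleotides; the thesis datasets may use another alphabet

def one_hot(ch):
    # Binary vector with a 1 at the nucleotide's index in ALPHABET
    return [1.0 if ch == a else 0.0 for a in ALPHABET]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Matrix-vector product for a list-of-rows matrix
    return [sum(w * x for w, x in zip(row, v)) for row in W]

class LSTMCell:
    """Hypothetical single LSTM cell with random (untrained) weights."""

    def __init__(self, input_size, hidden_size, rng):
        self.h = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, output, candidate
        self.W = {g: [[rng.uniform(-0.5, 0.5) for _ in range(n)]
                      for _ in range(hidden_size)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden_size for g in "ifoc"}

    def step(self, x, h, c):
        z = x + h  # list concatenation: [x_t; h_{t-1}]
        pre = {g: [a + b for a, b in zip(matvec(self.W[g], z), self.b[g])]
               for g in "ifoc"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["c"]]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

def seq2seq_predict(source, enc, dec, W_out, max_len):
    h, c = [0.0] * enc.h, [0.0] * enc.h
    for ch in source:                      # encoder reads the parent sequence
        h, c = enc.step(one_hot(ch), h, c)
    out = []
    prev = [0.0] * len(ALPHABET)           # assumed start token: all-zero vector
    for _ in range(max_len):               # decoder starts from the encoder's final state
        h, c = dec.step(prev, h, c)
        scores = matvec(W_out, h)          # linear projection to one score per nucleotide
        k = max(range(len(ALPHABET)), key=lambda j: scores[j])  # greedy argmax
        out.append(ALPHABET[k])
        prev = one_hot(ALPHABET[k])        # feed the prediction back in
    return "".join(out)

rng = random.Random(0)
HIDDEN = 8
enc = LSTMCell(len(ALPHABET), HIDDEN, rng)
dec = LSTMCell(len(ALPHABET), HIDDEN, rng)
W_out = [[rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)] for _ in ALPHABET]
pred = seq2seq_predict("AUGGCUAC", enc, dec, W_out, max_len=8)
print(pred)
```

The weights are random, so the output carries no meaning until the model is trained (for example by backpropagation through time in a deep learning framework). The sketch only shows how information flows from the encoder to the decoder. A softmax is omitted before the argmax because it does not change which score is largest.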
The first part of this work studies the induction of sequence-to-sequence models, motivating their design and purpose wherever possible. Our contributions continue with an original complexity analysis of these models, showing their good computational performance and scalability, along with an in-depth discussion of their implementation details, as contributed within Scikit-Learn.
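For reference, the standard seq2seq factorisation and LSTM cell equations are given below, together with the usual per-step cost of an LSTM layer. This is the textbook formulation, assumed here rather than taken from the thesis's own analysis. The input size is d (for example, d = 4 for one-hot RNA nucleotides), the hidden size is h, and v is the encoder's final state after reading x_1, ..., x_T.

```latex
p(y_1,\dots,y_{T'} \mid x_1,\dots,x_T) = \prod_{t=1}^{T'} p\left(y_t \mid v,\, y_1,\dots,y_{t-1}\right),
\qquad v = (h_T, c_T)

\begin{aligned}
z_t &= [x_t;\, h_{t-1}] \in \mathbb{R}^{d+h} \\
i_t &= \sigma(W_i z_t + b_i) \\
f_t &= \sigma(W_f z_t + b_f) \\
o_t &= \sigma(W_o z_t + b_o) \\
\tilde{c}_t &= \tanh(W_c z_t + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}

% Each W_* is h x (d+h), so one step costs about 4h(d+h) multiply-adds,
% a length-T sequence costs O(T h (d+h)), and the layer has 4(h(d+h) + h) parameters.
```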
In the second part of this work, we analyze and discuss biological data from the perspective of deep learning models. The core of our contributions lies in the representation and preprocessing of biological data for use with machine learning algorithms. As a result of this work, our analysis demonstrates that preparing sequencing data in a new way leads to optimal accuracy.
The proposed approach consists of four main phases. In the first phase, the sequences in the datasets are preprocessed. In the second phase, the preprocessed sequences are transformed into a format suitable for training an LSTM network. Specifically, the integer values representing each sequence are one-hot encoded. Each value is represented by a binary vector whose entries are all "0" except at the index of that word, which is set to 1.
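The two-step encoding described above (nucleotide to integer, then integer to binary vector) can be sketched in plain Python as follows. The alphabet `ACGU` and the function names are illustrative assumptions, not the thesis's code. The i-th vector in the output corresponds to the i-th nucleotide, which is how the encoding preserves positional information.

```python
# Hypothetical vocabulary: the four RNA nucleotides. The thesis's exact alphabet
# (e.g. DNA "T" or ambiguity codes such as "N") is not given in this section.
VOCAB = "ACGU"
CHAR_TO_INT = {ch: i for i, ch in enumerate(VOCAB)}

def integer_encode(seq):
    """Map each nucleotide to its integer index (e.g. 'A' -> 0)."""
    return [CHAR_TO_INT[ch] for ch in seq.upper()]

def one_hot_encode(seq):
    """Return one binary vector per nucleotide: all 0 except a 1 at its integer index."""
    vectors = []
    for idx in integer_encode(seq):
        vec = [0] * len(VOCAB)
        vec[idx] = 1
        vectors.append(vec)
    return vectors

def one_hot_decode(vectors):
    """Invert the encoding by reading the position of the 1 in each vector."""
    return "".join(VOCAB[vec.index(1)] for vec in vectors)

encoded = one_hot_encode("AUGC")
print(encoded)  # [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]
```

Decoding by the position of the 1 recovers the original sequence. Predicted vectors from a model's output layer could be mapped back to nucleotides the same way, by taking the index of the largest score rather than an exact 1.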