Title
A Deep-Learning Based Visual Tracking Approach /
Author
Al-Basiouny, Eman Mohammad Reda Ahmad.
Preparation Committee
Researcher / إيمان محمد رضا أحمد البسيوني
Supervisor / حازم محمود عباس
Supervisor / حسام الدين حسن عبد المنعم
Supervisor / عبد الفتاح عطيه هليل
Publication Date
2023.
Number of Pages
130 p.
Language
English
Degree
Doctorate
Specialization
Engineering (Miscellaneous)
Approval Date
1/1/2023
Approval Venue
Ain Shams University - Faculty of Engineering - Computer and Systems Engineering
Index
Only 14 pages (from 130) are available for public view.

Abstract

Deep learning algorithms provide an unprecedented level of visual tracking robustness, but achieving acceptable performance remains difficult due to the natural, continuous changes in the characteristics of foreground and background objects across videos. Among the most influential factors in the robustness of tracking algorithms is the selection of network hyperparameters, especially the network architecture and depth. We constructed two models on an ordinary convolutional neural network (CNN), which consists of a feature-extraction network and a binary classifier network. We integrated a generative adversarial network (GAN) into the CNN to enhance the tracking results through an adversarial learning process performed during the training phase. We used the discriminator as a classifier and the generator as a source that produces feature-level data with different appearances by applying masks to the extracted features.
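The generator's role described above can be sketched as a feature-level masking step. The array sizes and the random mask below are illustrative assumptions, not values from the thesis:

```python
import numpy as np

def apply_adversarial_mask(features, mask):
    """Zero out parts of the extracted feature map to vary its appearance."""
    return features * mask

rng = np.random.default_rng(0)
# hypothetical CNN feature map: 64 channels of 3x3 spatial cells
features = rng.standard_normal((64, 3, 3))
# hypothetical generator output: a binary mask that drops ~30% of the cells
mask = (rng.random(features.shape) > 0.3).astype(features.dtype)

masked = apply_adversarial_mask(features, mask)
# the discriminator (binary classifier) is then trained on `masked`
```

In the adversarial setting, the generator learns to propose masks that are hard for the discriminator, rather than sampling them at random as done here.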
In our first model, we present a competition between convolutional and multilayer perceptron (MLP) generative models of the same depth to determine which architecture is better able to extract the most robust features from the input video frames. The two networks are trained offline via an adversarial learning process; tracking and online fine-tuning then start after the generator is removed. The experiments showed that the MLP generator is better able to extract the main features that persist over a long temporal span, while the convolutional generator is better able to extract the spatial features that occur in individual frames. Both networks proved strongly competitive with state-of-the-art visual trackers.
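One way to see why the two same-depth generators behave differently is to compare the connectivity of a single convolutional layer with a single FC layer on the same feature map. The sizes below are illustrative assumptions, not the thesis configuration:

```python
# Illustrative sizes (not the thesis configuration)
C_in, C_out, H, W = 64, 64, 8, 8

# A 3x3 convolutional layer shares its weights across all spatial
# positions, so each output combines only a local neighborhood.
conv_params = C_out * C_in * 3 * 3

# An FC (MLP) layer connects every unit of the flattened map to every
# output unit, so each output sees the whole feature map at once.
fc_params = (C_in * H * W) * (C_out * H * W)

print(conv_params)  # 36864
print(fc_params)    # 16777216
```

The global connectivity of the FC layer is consistent with the MLP generator capturing features that span the whole frame, while the local, weight-shared convolution favors per-frame spatial detail.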
A robust visual tracking network using a very deep generator (RTDG) is proposed in the second model. In this study, we investigated the effect of increasing the number of fully connected (FC) layers in adversarial generative networks and its impact on robustness. For the first time, we used a very deep FC network with 22 layers as a high-performance generator. This generator is used via adversarial learning to augment the positive samples, reducing the gap between data-hungry deep learning algorithms and the available training data and achieving robust visual tracking. The experiments showed that the proposed framework performs well against state-of-the-art trackers.
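A 22-layer FC generator of the kind described can be sketched as follows. The layer widths, ReLU activations, He initialization, and thresholded sigmoid head are assumptions for illustration, not the thesis's exact design:

```python
import numpy as np

def make_deep_fc_generator(in_dim=512, hidden=512, depth=22, seed=0):
    """Build weight matrices for a depth-layer fully connected generator."""
    rng = np.random.default_rng(seed)
    dims = [in_dim] + [hidden] * depth
    # He initialization keeps activations from vanishing in a deep ReLU stack
    return [rng.standard_normal((dims[i], dims[i + 1])) * np.sqrt(2.0 / dims[i])
            for i in range(depth)]

def generate_mask(weights, z):
    """Forward pass: ReLU hidden layers, sigmoid head, thresholded to a mask."""
    h = z
    for W in weights[:-1]:
        h = np.maximum(h @ W, 0.0)
    logits = np.clip(h @ weights[-1], -60.0, 60.0)  # avoid exp overflow
    return (1.0 / (1.0 + np.exp(-logits)) > 0.5).astype(np.float64)

weights = make_deep_fc_generator()
mask = generate_mask(weights, np.random.default_rng(1).standard_normal(512))
```

In training, such a generator would be optimized adversarially against the discriminator so that the masks it emits produce hard positive samples, augmenting the limited training data.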