Author: Abd ElRazik, Mona Nagy Ahmed / Title: Face and Speech Recognition Using Artificial Intelligence.

Title

Face and Speech Recognition Using Artificial Intelligence.

Publisher

Mona Nagy Ahmed Abd ElRazik,

Author

Abd ElRazik, Mona Nagy Ahmed,

Preparation committee

Researcher / Mona Nagy Ahmed Abd ElRazik

Supervisor / Gamal Mohamed Behery

Supervisor / Reda ElSaid ElBarougy

Subject

Artificial Intelligence. Face and Speech Recognition.

Publication date

2020.

Number of pages

200 p.

Language

English

Degree

Doctorate

Specialization

Mathematics (miscellaneous)

Approval date

1/1/2020

Place of approval

Damietta University - Faculty of Science - Mathematics

Index

Only 14 pages are available for public view, out of 233.

Abstract

Feature extraction (FE) is one of the most important stages of face recognition (FR): it strongly affects accuracy and the ability to cope with nuisance variations in the face. To this end, a new technique for extracting features from the human face, called relative gradient magnitude strength (RGMS), is proposed. RGMS is designed to cope with the variations that occur in unstructured environments. It extracts local feature vectors from the face based on the magnitude of gradients. One advantage of RGMS is that it achieves high accuracy despite these variations, compared with many other well-known techniques.
The second contribution of this thesis is a new system for emotional speech recognition (ESR) that addresses the shortcomings of the Gaussian mixture model (GMM), which is well known and widely used in ESR, and increases recognition accuracy. The proposed weighted distance optimization system (WDOS) is based on two new weighted distance functions. It comprises three steps: initialization, weighted distance, and maximization or minimization, depending on the weighted distance function used. Experimental results are presented to assess the effectiveness of WDOS. WDOS performed well in a comparative study covering all emotional states together and each emotional state individually. These results also favor WDOS over the GMM and k-means classifiers.
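The abstract does not define RGMS itself, so the following is only a minimal sketch of the quantity it builds on: the per-pixel gradient magnitude of a grey-level image. The function name `gradient_magnitude`, the unnormalised central differences, and the replicate handling of border pixels are all assumptions made for illustration, not the thesis's method.

```python
import math

def gradient_magnitude(image):
    """Per-pixel gradient magnitude from unnormalised central differences.

    `image` is a list of equal-length rows of grey levels. At the border the
    nearest valid pixel is reused (replicate padding), which is an assumption.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            left = image[r][max(c - 1, 0)]
            right = image[r][min(c + 1, cols - 1)]
            up = image[max(r - 1, 0)][c]
            down = image[min(r + 1, rows - 1)][c]
            gx = right - left   # horizontal difference
            gy = down - up      # vertical difference
            out[r][c] = math.hypot(gx, gy)
    return out

# A vertical edge: dark left half, bright right half.
patch = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
magnitudes = gradient_magnitude(patch)
```

In this toy patch the magnitude is nonzero only in the two columns next to the edge, which is the kind of local structure a gradient-based descriptor can build on.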
This thesis consists of four chapters:
Chapter one: This chapter presents the motivation for this thesis, the two main problems it addresses, and how the thesis contributes to solving them.
Chapter two: This chapter introduces the background on pattern recognition (PR), the key terminology and definitions used throughout the thesis, and the PR system. It then gives a brief summary of feature extraction, the first stage of the PR system, and some of its methods: MFCC, HOG, and LBP. It also summarizes feature reduction, the second stage, with methods such as PCA, KPCA, LDA, LPP, NPE, SPP, NMF, and LNMF, and feature selection, the third stage, with some of its methods. Feature selection algorithms follow three schemes (filter, wrapper, and embedded) that differ in how they combine the selection algorithm with model building. Finally, it briefly describes the types of learning algorithms, supervised and unsupervised, and some examples of each. Artificial neural networks and deep learning are presented and used throughout the thesis because they are considered the most powerful learning algorithms at present.
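As an illustration of one of the feature extraction methods listed for chapter two, here is a minimal sketch of the basic 3x3 local binary pattern (LBP). The function names and the clockwise bit ordering starting at the top-left neighbour are assumptions chosen for this example; other orderings are common and the thesis may use a different variant.

```python
def lbp_code(image, r, c):
    """Basic 3x3 LBP code for the interior pixel (r, c)."""
    center = image[r][c]
    # Neighbours visited clockwise, starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A neighbour at least as bright as the centre sets its bit.
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist

patch = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 3, 8],
]
code = lbp_code(patch, 1, 1)      # bits 1, 3 and 4 are set
histogram = lbp_histogram(patch)  # one interior pixel, so one count
```

The histogram, rather than the raw codes, is what is typically passed on as the feature vector.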
Chapter three: This chapter presents the background on the speech signal, the importance of speech signal processing, the mechanics of producing and perceiving speech in human beings, and the anatomy and physiology of the speech production process. It then covers the nature of emotion in the speech signal, psychological views of emotion, models of emotion, and emotional features. After that, the emotional speech recognition system is introduced. Finally, two new systems for Japanese emotional speech recognition are proposed, based on two new weighted distance functions. Their main aims are to address the shortcomings of GMM and to increase recognition accuracy. Each proposed system comprises three steps: initialization, weighted distance, and maximization or minimization. The experimental results and discussion show that the proposed systems outperform GMM and k-means.
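The weighted distance functions proposed in the thesis are not given in this abstract, so they are not reproduced here. For orientation only, the following is a minimal sketch of the k-means baseline that the proposed systems are compared against, using plain squared Euclidean distance. The function name `kmeans`, the toy points, and the fixed starting centroids are assumptions made for illustration.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) on lists of feature vectors."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((x - m) ** 2
                                  for x, m in zip(p, centroids[k])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids

# Two well-separated toy groups of 2-D feature vectors.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

The loop alternates an assignment step and an update step until the iteration budget is used up; on this toy data it settles after the first pass.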
Chapter four: This chapter describes the face, its main features, and the information it carries. It also presents the history of face recognition and its obstacles, followed by the face recognition system. The main focus of the chapter is a novel FR system that can overcome obstacles such as illumination, pose, glasses or scarves, and facial expression. The system uses relative gradient magnitude strength (RGMS) to extract the features. Because the resulting features are high-dimensional, deep neural networks (DNNs) are used, with a softmax activation function in the output layer that assigns a likelihood to each label. The experiments were conducted on four widely used datasets: ORL, YALE, AR, and UMIST. The results show that the proposed system is very effective compared with many classifiers for FR under varied conditions.
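The softmax output layer mentioned for chapter four can be sketched as follows. This is a generic illustration of how softmax turns raw output scores into label likelihoods, not the thesis's network; the function name `softmax` and the example scores are assumptions.

```python
import math

def softmax(scores):
    """Convert raw output-layer scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three identity labels.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)

# The predicted label is the one with the highest likelihood.
predicted = max(range(len(probs)), key=probs.__getitem__)
```

Subtracting the maximum score before exponentiating leaves the result unchanged but avoids overflow for large scores.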