Author: Aboali, Mohamed Abdelatif./ Title: Content Based Image Retrieval / Concept-Based Image Indexing.

Title

Content Based Image Retrieval / Concept-Based Image Indexing.

Author

Aboali, Mohamed Abdelatif.

Preparation committee

Researcher / Mohamed Abdelatif Aboali

Supervisor / Hossam El-Din Hassan Abdel Moneim

Supervisor / Islam Ahmed Mahmoud El-Maddah

Publication date

2023.

Number of pages

92 p.

Language

English

Degree

Master's

Specialization

Electrical and Electronic Engineering

Approval date

1/1/2023

Place of approval

Ain Shams University - Faculty of Engineering - Computer and Systems Engineering

Abstract

In this research, three different Content-Based Image Retrieval (CBIR) methodologies are proposed. The query input to each methodology is an image together with a text. For instance, given an image, a user may want to retrieve a similar one with some modification described in text, which we refer to as a text-modifier. The first two methodologies augment the "Text Image Residual Gating" (TIRG) function, a composition function between an image and text, by adding a newly trained module that enhances the relationship between the composed image-text features and the target image features. The two trained modules used in this study are Multiple Linear Regression (MLR) and a Non-Linear Multi-layered Perceptron (NMLP). The performance of the two was closely related. The NMLP models, in general, outperform TIRG on the validation dataset. However, both methods failed to show this enhancement on the testing dataset, which the two approaches indicate is due to the early composition in the TIRG function. This conclusion motivated the third methodology. The proposed architecture, a new feature composition function, uses a set of neural networks that operate in feature space and perform feature composition in a uniform, known domain: the textual feature domain. In this methodology, ResNet is used to extract image features and an LSTM to extract text features, which together form the query inputs. The methodology uses three single-hidden-layer non-linear feedforward networks in a cascading structure, labeled NetA, NetC, and NetB. NetA maps image features to the corresponding textual features. NetC composes the textual features produced by NetA with the text-modifier features to form the target image's textual features. NetB maps the target textual features to target image features, which are used to recall the target image from the image dataset based on cosine similarity. The proposed architecture was tested using ResNet 18, 50, and 152 to examine the effect of image features on the new feature composition function. The testing results are promising and, to our knowledge, can compete with the most recent approaches.
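
The NetA, NetC, NetB cascade and the cosine-similarity recall step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. The feature dimensions, the tanh activation, the absence of bias terms, the random (untrained) weights, and the use of concatenation as NetC's input are all assumptions made for illustration. The helper names `make_net`, `forward`, `cosine`, and `retrieve` are hypothetical. In the thesis, the image and text features would come from ResNet and an LSTM; here they are random vectors.

```python
import math
import random

def make_net(n_in, n_hidden, n_out, rng):
    """Random weights for a single-hidden-layer network (untrained, illustration only)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {"W1": mat(n_hidden, n_in), "W2": mat(n_out, n_hidden)}

def forward(net, x):
    """One tanh hidden layer followed by a linear output layer (no biases, by assumption)."""
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in net["W1"]]
    return [sum(w * h for w, h in zip(row, hidden)) for row in net["W2"]]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(image_feat, modifier_feat, gallery, net_a, net_c, net_b):
    """Return gallery indices ranked by similarity to the predicted target image features."""
    text_of_image = forward(net_a, image_feat)                   # NetA: image -> textual domain
    target_text = forward(net_c, text_of_image + modifier_feat)  # NetC: compose (concatenation assumed)
    target_image = forward(net_b, target_text)                   # NetB: textual -> image domain
    scores = [cosine(target_image, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)

# Toy dimensions and random stand-ins for ResNet / LSTM features (assumptions).
rng = random.Random(0)
IMG_DIM, TXT_DIM, HID = 8, 6, 5
net_a = make_net(IMG_DIM, HID, TXT_DIM, rng)
net_c = make_net(2 * TXT_DIM, HID, TXT_DIM, rng)
net_b = make_net(TXT_DIM, HID, IMG_DIM, rng)
gallery = [[rng.gauss(0, 1) for _ in range(IMG_DIM)] for _ in range(4)]
query_image = [rng.gauss(0, 1) for _ in range(IMG_DIM)]
modifier = [rng.gauss(0, 1) for _ in range(TXT_DIM)]

ranking = retrieve(query_image, modifier, gallery, net_a, net_c, net_b)
print(ranking)  # gallery indices, best match first
```

With trained weights, the first index in `ranking` would be the retrieved target image. With the random weights used here, the order is arbitrary and only demonstrates the data flow.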