Author: Mansour, Ahmed El Sayed Ali Hassan. / Title: Human Activity Recognition Using Machine Learning

Title

Human Activity Recognition Using Machine Learning

Author

Mansour, Ahmed El Sayed Ali Hassan.

Prepared by

Researcher / احمد السيد على حسن منصور

Supervisor / سلوى حسين الرملي

Supervisor / حسين عبد العاطي السيد

Supervisor / عمـار محمد عمــار

Publication date

2023.

Number of pages

162 p.

Language

English

Degree

Doctorate (Ph.D.)

Specialization

Electrical and Electronic Engineering

Approval date

1/1/2023

Awarding institution

Ain Shams University - Faculty of Engineering - Electronics and Electrical Communications Engineering

Abstract

Human activity recognition (HAR) is the task of detecting and recognizing human actions using computers and intelligent systems. HAR systems are categorized according to the sensor type and the target application. In terms of sensor type, HAR systems use either contact-based or remote sensors. Contact-based sensors are body-worn devices, such as accelerometers, smartwatches, and smartphones, that capture the signals generated by an activity; a recognition model then classifies these signals. Remote sensors, such as cameras, WiFi, and radars, detect human activities without direct contact with the person. In terms of application, HAR is categorized into single-human and multiple-human activity recognition. Single-human recognition is concerned with detecting one person's activity in a specific environment, as in gesture recognition and human-machine interfaces. Multiple-human recognition is concerned with detecting the activities of groups of people simultaneously, as in anomaly detection, monitoring applications, and crowd detection.
Human-object interaction (HOI) detection identifies the relationship between a human and an object in still images and videos. Most HOI detection methods rely on appearance features as the primary cue for detecting this relationship. Furthermore, model performance suffers from the many false-positive pairs produced by non-interactive human-object pairs in the image and by human-object misgrouping. Non-interactive pairs, which cause false-positive detections and long inference times, are considered a major problem in HOI detection. Most research approaches focus on improving true-positive pair prediction without addressing the elimination of non-interactive pairs. Moreover, relying on appearance features increases inference time, because the large models that extract them are computationally expensive.
In this thesis, we propose "Spatial-Net", a new HOI detection approach for still images. In the proposed approach, the HOI problem is divided into two main tasks: pair prediction and global rejection. In the pair-prediction task, the spatial relationship is used to predict the human-object interaction for each human-object pair. The spatial features include a spatial map, which is a single-channel image that represents a human-object pair, including the body parts and the object mask. Relative geometry features, such as relative size, relative distance, intersection-over-union between body parts and objects, and weighted distance, serve as a deterministic body-part attention model. The three sub-models of the pair-prediction task are fused using two alternative techniques: multi-modality fusion and ensemble voting fusion. In multi-modality fusion, the outputs of the three sub-models are combined using late fusion, whereas ensemble voting fusion is performed using weighted voting.
In the global-rejection task, an augmented model is employed to reject false-positive pairs. We use Hungarian matching to assign human-object pairs for each action, and a human-centric model to reject non-interactive human-object pairs according to the semantic co-occurrence between the human and the object. Experimental results on the V-COCO and HICO-Det datasets demonstrate that the proposed Spatial-Net outperforms many state-of-the-art HOI models while requiring less inference time.
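To make the idea of a single-channel spatial map concrete, here is a minimal NumPy sketch. It is not the thesis's encoding: it assumes, hypothetically, that body parts and the object are given as axis-aligned boxes (x1, y1, x2, y2) instead of true masks, that the map is a fixed size x size grid covering the whole image, and that each body part gets its own intensity while the object gets 1.0. The function name spatial_map is illustrative only.

```python
import numpy as np


def spatial_map(parts, obj, img_w, img_h, size=64):
    """Rasterize body-part boxes and one object box into a single-channel map.

    Hypothetical encoding: body part k (1-based) gets intensity k / (P + 1),
    where P is the number of parts; the object gets 1.0. Overlaps keep the max.
    """
    m = np.zeros((size, size), dtype=np.float32)
    sx, sy = size / img_w, size / img_h  # scale factors: image pixels -> map cells

    def paint(box, value):
        x1, y1, x2, y2 = box
        c1, r1 = int(x1 * sx), int(y1 * sy)
        c2, r2 = int(np.ceil(x2 * sx)), int(np.ceil(y2 * sy))
        region = m[r1:r2, c1:c2]               # a view into the map
        np.maximum(region, value, out=region)  # write in place, keeping the larger value

    for k, box in enumerate(parts, start=1):
        paint(box, k / (len(parts) + 1))
    paint(obj, 1.0)
    return m


if __name__ == "__main__":
    m = spatial_map([(0, 0, 32, 32)], (32, 32, 64, 64), img_w=64, img_h=64)
    print(float(m[0, 0]), float(m[63, 63]), float(m[0, 63]))  # 0.5 1.0 0.0
```

A real implementation would more likely crop the map to the union of the human and object boxes and use segmentation masks; the box-based version only shows how several regions can share one channel.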
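Three of the relative geometry features named in the abstract (intersection-over-union, relative size, and relative distance between a body part and an object) can be sketched as follows. The exact definitions used in the thesis are not given here, so the choices below are assumptions: boxes are (x1, y1, x2, y2) in pixels, relative size is object area divided by body-part area, and relative distance is the center-to-center distance normalized by the image diagonal. The weighted-distance feature is not reproduced because its definition is not stated.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box: (x1, y1) top-left corner, (x2, y2) bottom-right corner, in pixels."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

    @property
    def center(self) -> Tuple[float, float]:
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes (0.0 when they do not overlap)."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = a.area + b.area - inter
    return inter / union if union > 0 else 0.0


def relative_geometry(part: Box, obj: Box, img_w: float, img_h: float) -> Dict[str, float]:
    """Hypothetical relative-geometry features for one body-part/object pair."""
    (px, py), (ox, oy) = part.center, obj.center
    diag = math.hypot(img_w, img_h)  # assumption: normalize distance by the image diagonal
    return {
        "iou": iou(part, obj),
        "relative_size": obj.area / part.area if part.area > 0 else 0.0,  # object / part area
        "relative_distance": math.hypot(ox - px, oy - py) / diag,
    }
```

For example, a hand box Box(100, 100, 140, 140) and a cup box Box(120, 110, 160, 150) in a 640x480 image give an IoU of 600/2600 (about 0.231), a relative size of 1.0, and a relative distance of about 0.028. Because these features are computed in closed form, they are cheap compared with extracting appearance features.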
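The two fusion strategies mentioned in the abstract can be illustrated on per-action scores from three sub-models. This sketch assumes that each sub-model outputs one score in [0, 1] per action; the weights and the 0.5 threshold are hypothetical. Late fusion here is a weighted mean of the scores, and weighted voting lets each sub-model cast a weighted yes/no vote per action, accepting an action when the weighted share of yes votes exceeds one half.

```python
from typing import Optional, Sequence

import numpy as np


def late_fusion(scores: Sequence[np.ndarray],
                weights: Optional[Sequence[float]] = None) -> np.ndarray:
    """Late fusion: (weighted) mean of per-action scores from several sub-models."""
    stacked = np.stack(scores)  # shape: (num_models, num_actions)
    w = np.ones(len(stacked)) if weights is None else np.asarray(weights, dtype=float)
    return (w[:, None] * stacked).sum(axis=0) / w.sum()


def weighted_vote(scores: Sequence[np.ndarray], weights: Sequence[float],
                  threshold: float = 0.5) -> np.ndarray:
    """Weighted voting: a sub-model votes for an action when its score >= threshold;
    the action is accepted when the weighted share of votes exceeds one half."""
    stacked = np.stack(scores)
    w = np.asarray(weights, dtype=float)
    votes = (stacked >= threshold).astype(float)  # 1.0 means "vote for this action"
    share = (w[:, None] * votes).sum(axis=0) / w.sum()
    return share > 0.5
```

With scores [0.9, 0.2, 0.6], [0.7, 0.4, 0.3], and [0.3, 0.8, 0.4] and weights [0.5, 0.3, 0.2], late fusion yields approximately [0.72, 0.38, 0.47], while weighted voting accepts only the first action. Score averaging keeps a graded confidence, whereas voting produces a hard decision per action.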
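Hungarian matching solves a minimum-cost one-to-one assignment. The sketch below is a plain-Python Hungarian (Kuhn-Munkres) method with row and column potentials, applied, as a hypothetical setup, to a cost matrix for one action in which rows are humans, columns are objects, and each cost is 1 minus an interaction score. How the thesis builds its cost matrix is not stated, so that definition is an assumption; the function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def _assign_rows(cost: Sequence[Sequence[float]]) -> List[int]:
    """Hungarian method with potentials, O(n^2 * m); requires n <= m.

    Returns, for each row, the column assigned to it. Index 0 of u, v, p and
    way is a dummy entry used by the algorithm; real rows/columns are 1-based.
    """
    n, m = len(cost), len(cost[0])
    u = [0.0] * (n + 1)  # row potentials
    v = [0.0] * (m + 1)  # column potentials
    p = [0] * (m + 1)    # p[j] = row matched to column j (0 = free)
    way = [0] * (m + 1)  # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [math.inf] * (m + 1)
        used = [False] * (m + 1)
        while True:  # grow alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # update potentials
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def hungarian_match(cost: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Minimum-cost one-to-one matching; returns (row, col) pairs sorted by row."""
    if len(cost) <= len(cost[0]):
        return list(enumerate(_assign_rows(cost)))
    transposed = [list(col) for col in zip(*cost)]  # more humans than objects
    return sorted((r, c) for c, r in enumerate(_assign_rows(transposed)))
```

For the cost matrix [[0.9, 0.1, 0.8], [0.2, 0.7, 0.6]], hungarian_match returns [(0, 1), (1, 0)], pairing human 0 with object 1 and human 1 with object 0 for a total cost of 0.3. If SciPy is available, scipy.optimize.linear_sum_assignment solves the same assignment problem.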
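Rejecting pairs by semantic co-occurrence can be approximated with a simple prior: count how often each (verb, object category) combination appears in training annotations and drop predictions whose combination is rare or unseen. This is only an assumption about one ingredient of the rejection step; the human-centric model itself is not reproduced, and the Triplet structure, min_count parameter, and function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Triplet(NamedTuple):
    """One predicted <human, verb, object> detection (hypothetical structure)."""
    human_id: int
    object_id: int
    verb: str
    object_category: str
    score: float


def build_cooccurrence(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count (verb, object category) pairs seen in training annotations."""
    return Counter(pairs)


def reject_by_cooccurrence(preds: Iterable[Triplet], cooc: Counter,
                           min_count: int = 1) -> List[Triplet]:
    """Keep only triplets whose (verb, object category) pair occurred at least min_count times."""
    # Counter returns 0 for unseen keys, so never-seen combinations are rejected.
    return [t for t in preds if cooc[(t.verb, t.object_category)] >= min_count]
```

For instance, if the annotations contain ("ride", "bicycle") twice and ("eat", "pizza") once but never ("eat", "bicycle"), a predicted "eat bicycle" triplet is removed while "ride bicycle" and "eat pizza" survive. Raising min_count trades recall on rare interactions for fewer false positives.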