Author: Soliman, Mohammed Essam Abd El Samee./ Title: Hardware Accelerators For Neural Networks /

Title

Hardware Accelerators For Neural Networks /

Author

Soliman, Mohammed Essam Abd El Samee.

Committee

Researcher / محمد عصام عبد السميع سليمان

Supervisor / محمد محمود أحمد طاهر

Examiner / محمد شرف إسماعيل سيد

Examiner / محمد واثق علي كامل الخراشي

Publication date

2023.

Number of pages

125 p.

Language

English

Degree

Master's

Specialization

Systems and Control Engineering

Approval date

1/1/2023

Place of approval

Ain Shams University - Faculty of Engineering - Department of Computer and Systems Engineering

Abstract

An accelerator is a specialized hardware unit that performs defined tasks with higher performance or greater energy efficiency than a general-purpose CPU. Recently, advances in the Internet of Things (IoT), video imaging analytics, autonomous vehicles, and Artificial Intelligence (AI) robotics have created significant demand for developing and improving hardware accelerators. Video Processing Units (VPUs) and GPUs held the largest market share, more than 60%, in 2020. VPUs and GPUs are based on instruction-level accelerators designed for single primitives, such as arithmetic operations. As a result, both academia and industry have proposed numerous methods to optimize arithmetic operations and improve accelerators’ efficiency.
This thesis proposes an optimized Posit Arithmetic Unit (PAU) to be placed at the core of hardware accelerators, especially those for image classification and object detection neural networks (NNs).
Our main contributions can be summarized as follows. First, we model a Register Transfer Level (RTL) implementation of the PAU using a two’s complement algorithm. Second, we model an RTL implementation of a compact PAU using a two’s complement algorithm.
Finally, we model the NNs used in various image classification and object detection tasks using the Qtorch+ framework with different data types, to show that low-bit posits yield better results than 16-bit or 32-bit floating/fixed-point representations for both activation and weight data.
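To make the two’s complement decoding step concrete, the following is a minimal software sketch of how a posit bit pattern can be turned into a real value. It is not the thesis’s RTL design: the function name `decode_posit` and its parameters `n` (word width) and `es` (exponent field width) are assumptions chosen for illustration, and the layout follows the commonly described posit format (sign, regime, exponent, fraction).

```python
def decode_posit(bits: int, n: int = 8, es: int = 1) -> float:
    """Decode an n-bit posit with es exponent bits (illustrative sketch only)."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")  # NaR (Not a Real), represented here as NaN

    sign = bits >> (n - 1)
    if sign:
        # Negative posits are decoded from the two's complement of the word.
        bits = (-bits) & mask

    body = bits & ((1 << (n - 1)) - 1)  # the n-1 bits after the sign
    i = n - 2                           # index of the first regime bit
    first = (body >> i) & 1

    # Regime: a run of identical bits ended by the opposite bit (or the word end).
    run = 0
    while i >= 0 and ((body >> i) & 1) == first:
        run += 1
        i -= 1
    k = run - 1 if first else -run
    i -= 1                              # skip the regime terminator bit

    left = max(i + 1, 0)                # bits remaining for exponent and fraction
    tail = body & ((1 << left) - 1)

    # Exponent: up to es bits; bits cut off by the word end count as zeros.
    exp_bits = min(es, left)
    exp = (tail >> (left - exp_bits)) << (es - exp_bits) if exp_bits else 0

    # Fraction: whatever remains, with an implicit leading 1.
    frac_bits = left - exp_bits
    frac = tail & ((1 << frac_bits) - 1)
    mantissa = 1.0 + (frac / (1 << frac_bits) if frac_bits else 0.0)

    value = mantissa * 2.0 ** (k * (1 << es) + exp)
    return -value if sign else value
```

For example, with `n=8` and `es=1`, `decode_posit(0x40)` gives 1.0 and `decode_posit(0xC0)` gives -1.0, since 0xC0 is the two’s complement of 0x40. A hardware unit would typically replace the regime loop with a leading-zero or leading-one counter, but that is a design choice outside this sketch.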
The thesis is divided into seven chapters as listed below:
Chapter 1: presents the history of hardware accelerators and their performance compared to general-purpose processors. It also covers their applications, research topics, and types, classified by location inside the system and level of abstraction.
Chapter 2: introduces NNs, their definition, and the different types of NNs used at the core of various deep learning tasks.
Chapter 3: explains the motivation behind our optimization methodology and the image classification and object detection models used in our modeling. It discusses the differences between the models and the specifications of each one. It also introduces the posit number system used to build the arithmetic units at the core of the hardware accelerators, along with the posit decoding algorithms.
Chapter 4: summarizes related work on using posits in the arithmetic units at the core of hardware accelerators and evaluates its performance and limitations.
Chapter 5: presents the proposed arithmetic unit at the heart of hardware accelerators for various NNs, along with its results. The compact PAU is also presented.
Chapter 6: describes our contribution in modeling the different image classification and object detection models using the Qtorch+ framework and reports their accuracy results.
Chapter 7: concludes the thesis and proposes future work.