Author: Katamesh, Nany Mohamed Ismail. / Title: Textual documents classification using deep learning techniques

Title

Textual documents classification using deep learning techniques

Author

Katamesh, Nany Mohamed Ismail.

Preparation committee

Researcher / Nany Mohamed Ismail Ali Salem Katamesh

Supervisor / Samir Elmougy

Supervisor / Osama Abu Elnasr

Examiner / Ahmed Zayed Emam

Examiner / Ehab Mohamed Mahmoud Essa

Subject

Data mining. Effective teaching. Computational intelligence. Machine learning. Computer science. Management information systems. COMPUTERS / Software Development & Engineering / Systems Analysis & Design.

Publication date

2021.

Number of pages

Online resource (79 pages)

Language

English

Degree

Master's

Specialization

Computer Science

Date of approval

1/1/2021

Place of approval

Mansoura University - Faculty of Computers and Information - Department of Computer Science

Abstract

Document classification has become an important task for managing data, owing to the availability of a large number of electronic text documents from a range of sources containing unstructured and semi-structured information. This thesis proposes a document classification multimodel for categorizing semi-structured and unstructured textual documents. The goal of this multimodel approach is to make textual documents easier to manage and sort. A stacked-ensemble meta-model technique integrates the outputs of the individual classifiers to obtain better results than any single classifier. At the preprocessing level, tokenization and various text normalization techniques are applied. At the feature level, Term Frequency-Inverse Document Frequency (TF-IDF) and Continuous Bag-of-Words (CBOW) generate the hand-crafted feature vectors. At the classification level, the proposed multimodel uses the stacked ensemble technique to integrate three independent classifiers: Deep Neural Networks (DNN), Recurrent Convolutional Neural Networks (RCNN), and Bidirectional LSTM (Bi-LSTM). The proposed model is validated using a dataset constructed from a variety of spaces and containing a large number of documents in each class. The experimental results show that the proposed model achieves effective results. For PDF document classification, the proposed model achieved accuracy of up to 0.9045 and 0.959 with the TF-IDF and CBOW features, respectively. For JSON document classification, it achieved accuracy of up to 0.914 and 0.956 with the TF-IDF and CBOW features, respectively. For XML document classification, it achieved accuracy of up to 0.92 and 0.959 with the TF-IDF and CBOW features, respectively.
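
To illustrate the TF-IDF feature step named in the abstract, the following is a minimal sketch in plain Python. It is not the thesis implementation: the whitespace tokenizer, the function names `tokenize` and `tf_idf`, the sample documents, and the weighting variant (term count divided by document length, multiplied by the natural log of N over document frequency) are all assumptions chosen for illustration, since the record does not say which variant or library was used.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately simple normalizer)."""
    return text.lower().split()


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed weighting: tf = count / document length, idf = ln(N / df).
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical sample documents, not taken from the thesis dataset.
docs = [
    "invoice total amount due",
    "invoice payment received",
    "meeting agenda and notes",
]
vectors = tf_idf(docs)
```

Under this weighting, a term that appears in several documents (such as "invoice" here) receives a lower weight than a term unique to one document, and a term present in every document receives a weight of zero. Vectors like these could then be passed to each base classifier before their outputs are combined by a meta-model.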