Title
Efficient Hybrid Techniques for Feature Selection of Biomolecules
Author
Alkady, Walaa Samir.
Preparation committee
Researcher / Walaa Samir Abdel Tawab Alkady
Supervisor / Khaled El-Bahnasy
Supervisor / Walaa Khaled
Examiner / Walaa Khaled
Publication date
2022.
Number of pages
164 p.
Language
English
Degree
Master's
Specialization
Computer Science
Date of approval
1/1/2022
Place of approval
Ain Shams University - Faculty of Computers and Information - Computers and Information

Abstract

Cheminformatics [1] is a relatively young discipline of information technology concerned with gathering, storing, analyzing, and manipulating chemical data. The chemical data of interest typically include small-molecule formulae, structures, properties, spectra, and activities. Cheminformatics combines these information resources to transform data into information and information into knowledge, with the aim of making better decisions faster in drug lead identification and optimization [2]. It began as a tool to aid the drug discovery and development process, but it is now used in a wide range of fields in biochemistry and biology [3].
Machine learning is a field of inquiry devoted to understanding and building methods that "learn", that is, methods that leverage data to improve performance on some set of tasks [4]. It is regarded as a part of artificial intelligence. Machine learning algorithms build a model from sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. They are used in a wide variety of applications, including bioinformatics, medicine, drug design, filtering, speech recognition, and computer vision [5].
A subset of machine learning is closely related to computational statistics, which focuses on making predictions using computers, but not all machine learning is statistical learning [6]. The study of mathematical optimization contributes methods, theory, and application domains to the field of machine learning. Data mining is a related field of study that focuses on exploratory data analysis through unsupervised learning [4]. Some implementations of machine learning use data and neural networks in a way that mimics the workings of a biological brain [7].
Machine learning algorithms rest on the premise that inferences that worked well in the past are likely to continue working well in the future [8]. Such inferences can be obvious, for example: "since the sun rose every morning for the last 10,000 days, it will probably rise tomorrow morning as well".
Machine learning involves computers learning from the data they are given so that they can carry out certain tasks. For simple tasks, it is possible to program algorithms that tell the machine how to execute every step required to solve the problem at hand; no learning is needed on the computer's part. For more advanced tasks, it can be challenging for a human to create the needed algorithms manually. In practice, it can turn out to be more effective to help the machine develop its own algorithm than to have human programmers specify every needed step [8].
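The contrast between hand-written rules and rules learned from data can be illustrated with a minimal sketch. The example below is not taken from the thesis; it assumes a nearest-centroid classifier as a stand-in for "a model built from training data", and the descriptor values, labels, and function names (`fit_centroids`, `predict`) are hypothetical.

```python
from statistics import mean


def fit_centroids(samples, labels):
    """Learn one centroid (per-feature mean) for each class label."""
    grouped = {}
    for x, y in zip(samples, labels):
        grouped.setdefault(y, []).append(x)
    # zip(*rows) transposes the rows of one class into feature columns.
    return {y: [mean(col) for col in zip(*rows)] for y, rows in grouped.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical toy data: two made-up numeric descriptors per molecule.
train_X = [[1.0, 0.9], [1.2, 1.1], [3.0, 3.2], [3.1, 2.9]]
train_y = ["inactive", "inactive", "active", "active"]

model = fit_centroids(train_X, train_y)   # the "learning" step
prediction = predict(model, [2.9, 3.0])   # decision for an unseen sample
```

No decision rule is written by hand here: the class boundary follows entirely from the training samples, so changing the data changes the behavior without changing the code.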
When building a machine learning model in practice, it is rare that all the variables in a dataset are useful. Adding redundant variables reduces the model's ability to generalize and may also reduce the overall accuracy of a classifier. Furthermore, each additional variable increases the overall complexity of the model [9]. By the principle of parsimony, the best explanation of a problem is the one that involves the fewest possible assumptions. Feature selection is therefore an indispensable part of building machine learning models. Its goal is to find the best set of features from which useful models of the studied phenomena can be built.
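As an illustration of removing uninformative and redundant variables, the following is a minimal sketch of a simple filter-style selection. It is not the hybrid technique developed in the thesis; the function names, the thresholds `min_variance` and `max_abs_corr`, and the toy data are all assumptions made for the example.

```python
from statistics import mean, pvariance


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return cov / norm if norm else 0.0


def select_features(rows, min_variance=1e-8, max_abs_corr=0.95):
    """Return indices of features that are neither constant nor redundant.

    A feature is skipped if its variance is (near) zero, or if it is highly
    correlated with a feature that has already been kept.
    """
    columns = [list(col) for col in zip(*rows)]
    kept = []
    for idx, col in enumerate(columns):
        if pvariance(col) <= min_variance:
            continue  # constant column: carries no information
        if any(abs(pearson(col, columns[k])) > max_abs_corr for k in kept):
            continue  # near-duplicate of an already selected feature
        kept.append(idx)
    return kept


# Hypothetical toy dataset: feature 1 is exactly twice feature 0,
# feature 2 is constant, feature 3 is only weakly related to feature 0.
rows = [
    [1.0, 2.0, 5.0, 0.3],
    [2.0, 4.0, 5.0, 0.9],
    [3.0, 6.0, 5.0, 0.1],
    [4.0, 8.0, 5.0, 0.7],
]
selected = select_features(rows)
```

On this toy data the sketch keeps features 0 and 3. A filter of this kind ignores the class labels entirely; methods that score features against the prediction target are generally needed to find the most useful subset.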