Abstract
Dialect Arabic (DA) variations introduce many challenges and complexities, and Transformer-based models such as BERT have outperformed other models on the DA identification task. Fine-tuning these models, however, requires a large corpus, and collecting a large number of high-quality labeled examples for some DA classes is challenging and time-consuming. This thesis addresses the DA identification task. Semi-Supervised Generative Adversarial Networks (SS-GAN) are used to extend the Transformer-based models ARBERT and MARBERT with unlabeled data in a generative adversarial setting. The proposed model produces high-quality embeddings for DA examples and helps the classifier generalize better on the downstream classification task when only a few labeled examples are given. The proposed model was evaluated in two setups: (1) GANBERT, in which BERT is extended with the semi-supervised GAN component; and (2) a two-stage setup, in which the GAN-extended model is trained for some epochs and a BERT-based model is then trained in a second stage. Experimental results showed that the proposed model achieves better performance and faster convergence when only a few labeled examples are available.
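
To make the semi-supervised adversarial setting more concrete, the following is a minimal sketch of a per-example SS-GAN objective. It is not taken from the thesis. It assumes the common formulation in which the discriminator outputs k+1 logits (k dialect classes plus one extra "generated" class) and in which the supervised term is computed over the k real classes only. The function names `discriminator_loss` and `generator_loss` are hypothetical, and the encoder (e.g., ARBERT or MARBERT) and the generator network are omitted.

```python
import math

EPS = 1e-12  # guards against log(0)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discriminator_loss(logits, is_generated, label=None):
    """Loss for one example.

    Assumption: `logits` has k+1 entries and the LAST entry is the
    extra "generated" class. `label` is an index in 0..k-1 for a
    labeled real example, or None for an unlabeled real example.
    """
    probs = softmax(logits)
    p_fake = probs[-1]
    if is_generated:
        # Generated examples should be assigned to the extra class.
        return -math.log(p_fake + EPS)
    # Real examples (labeled or not) should NOT be assigned to it.
    loss = -math.log(1.0 - p_fake + EPS)
    if label is not None:
        # Supervised term: cross-entropy over the k real classes,
        # renormalised so the extra class is excluded.
        real = probs[:-1]
        loss += -math.log(real[label] / sum(real) + EPS)
    return loss


def generator_loss(logits):
    """The generator is rewarded when its output is judged real."""
    p_fake = softmax(logits)[-1]
    return -math.log(1.0 - p_fake + EPS)


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]  # k = 2 dialect classes + 1 "generated" class
    print(discriminator_loss(logits, is_generated=False, label=0))
    print(discriminator_loss(logits, is_generated=False))
    print(discriminator_loss(logits, is_generated=True))
    print(generator_loss(logits))
```

Under these assumptions, unlabeled examples contribute only through the real-versus-generated term, which is how unlabeled data can shape the learned representation when labeled examples are scarce.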