
Data Augmentation for Text Classification Using Autoencoders

Research output: Contribution to journal › Article › peer-review

Abstract

Deep learning models have greatly improved performance on a range of natural language processing (NLP) tasks. However, their effectiveness depends on large datasets, which can be difficult to acquire. Data augmentation mitigates this challenge by artificially expanding the training data with synthetic samples. Enriching the dataset in this way improves model generalization, reduces overfitting, and raises model performance. This paper investigates how effectively autoencoders can be used for text data augmentation to improve text classification models. The research compares four types of autoencoder: the Traditional Autoencoder (AE), Adversarial Autoencoder (AAE), Denoising Adversarial Autoencoder (DAAE), and Variational Autoencoder (VAE). Basic text preprocessing (lowercasing, removal of non-alphanumeric characters, and removal of stop words) is applied to all documents. In addition, label-based filtering is applied: autoencoder outputs that contradict the predictions of BERT are eliminated. The experiments use the SST-2 sentiment classification dataset, which consists of 7,791 training instances and 1,821 test instances. To better analyze the impact of the augmentation methods, experiments are also performed on smaller subsets of 100, 200, 400, and 1,000 instances, with data augmentation applied at ratios of 1:1, 1:2, and 1:4 for these subsets. The results demonstrate that AE-based data augmentation methods, particularly at a 1:1 ratio, achieve better accuracy than the baseline models. This underscores the potential of autoencoders to improve text classification outcomes in NLP tasks.
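The preprocessing and label-based filtering steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the function names, the `predict` callable (a stand-in for a fine-tuned BERT classifier), and the reading of a 1:1 ratio as "one synthetic sample per original sample" are all assumptions made for illustration.

```python
import re
from typing import Callable, List, Tuple

Sample = Tuple[str, int]  # (text, label)

# Hypothetical, deliberately tiny stop-word list; the abstract does not
# say which list the authors used.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "and", "to", "in", "it", "this"}


def preprocess(text: str) -> str:
    """Lowercase, drop non-alphanumeric characters, and remove stop words."""
    text = text.lower()
    # Assumes English text (as in SST-2): anything outside a-z, 0-9, and
    # whitespace is replaced by a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return " ".join(tok for tok in text.split() if tok not in STOP_WORDS)


def filter_by_label(candidates: List[Sample], predict: Callable[[str], int]) -> List[Sample]:
    """Keep synthetic samples whose assigned label agrees with the classifier.

    `predict` stands in for the BERT model mentioned in the abstract.
    """
    return [(text, label) for text, label in candidates if predict(text) == label]


def augment(
    originals: List[Sample],
    candidates: List[Sample],
    ratio: int,
    predict: Callable[[str], int],
) -> List[Sample]:
    """Append up to `ratio` filtered synthetic samples per original sample.

    Assumption: a 1:k ratio means k synthetic samples for each original one.
    """
    kept = filter_by_label(candidates, predict)
    return list(originals) + kept[: ratio * len(originals)]


def toy_predict(text: str) -> int:
    """Toy classifier used only to make the sketch runnable."""
    return 1 if "good" in text else 0


if __name__ == "__main__":
    originals = [(preprocess("This movie is GOOD, truly!"), 1)]
    synthetic = [("good film", 1), ("good plot", 0), ("dull film", 0)]
    print(augment(originals, synthetic, ratio=1, predict=toy_predict))
```

In practice the `predict` argument would wrap a trained classifier, and the synthetic candidates would come from decoding samples drawn from an autoencoder's latent space; both are left abstract here because the abstract gives no further detail.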

Original language: English
Pages (from-to): 161594-161604
Number of pages: 11
Journal: IEEE Access
Volume: 13
Publication status: Published - 2025
