Mitigating Data Sparsity in Code-Mixed Text through Back-Translation Augmentation for Aspect-Based Sentiment Analysis in Tokopedia Reviews

Abdul Hakim Prima Yuniarto; Aini Shofi Achsanti; Rizky Agil Singgih Susanto; Ardena Afif Pratama; Fandy Setyo Utomo

doi:10.59934/jaiea.v5i2.2197

Authors

Abdul Hakim Prima Yuniarto, Sekolah Tinggi Teknik Wiworotomo Purwokerto
Aini Shofi Achsanti, Universitas Amikom Purwokerto
Rizky Agil Singgih Susanto, Universitas Amikom Purwokerto
Ardena Afif Pratama, Universitas Amikom Purwokerto
Fandy Setyo Utomo, Universitas Amikom Purwokerto


Keywords:

Aspect-Based Sentiment Analysis, Back-Translation, Code-Mixed, Data Sparsity, IndoBERT, Tokopedia

Abstract

Aspect-Based Sentiment Analysis (ABSA) of Indonesian e-commerce reviews faces two significant challenges: code-mixed language (text that mixes Indonesian and English) and limited labeled data (data sparsity). This study proposes Back-Translation data augmentation to enrich a Tokopedia review dataset written in mixed Indonesian-English, the style popularly called "South Jakarta" language. Using the IndoBERT model, experimental results show a 3% increase in accuracy for both aspect classification and sentiment classification. These findings demonstrate that synthetic data augmentation is effective in addressing data sparsity in informal texts and improves the reliability of macro-level analysis used for strategic platform recommendations.
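
The abstract does not state which translation engine, pivot language, or filtering rules were used, so the following is only a minimal sketch of the general back-translation idea under stated assumptions: each labeled review is translated from Indonesian to an assumed English pivot and back, the round-trip paraphrase inherits the original aspect and sentiment labels, and paraphrases identical to an existing text are dropped. `Review`, `back_translate_augment`, `TABLES`, and `toy_translate` are hypothetical names; the word-lookup translator is a stand-in for a real machine translation model so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass(frozen=True)
class Review:
    """One labeled review: text plus its aspect and sentiment labels."""
    text: str
    aspect: str
    sentiment: str

# A translator maps (text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]

def back_translate_augment(
    reviews: List[Review],
    translate: Translator,
    src: str = "id",
    pivot: str = "en",  # assumed pivot language, not confirmed by the study
) -> List[Review]:
    """Return the originals plus at most one round-trip paraphrase per review.

    Assumption: labels are copied unchanged from the source review.
    Paraphrases matching an existing text (case-insensitive) are dropped.
    """
    augmented = list(reviews)
    seen = {r.text.strip().lower() for r in reviews}
    for r in reviews:
        pivot_text = translate(r.text, src, pivot)      # forward pass
        paraphrase = translate(pivot_text, pivot, src)  # backward pass
        key = paraphrase.strip().lower()
        if key and key not in seen:
            seen.add(key)
            augmented.append(Review(paraphrase, r.aspect, r.sentiment))
    return augmented

# Hypothetical word-level tables standing in for a real MT model.
TABLES: Dict[Tuple[str, str], Dict[str, str]] = {
    ("id", "en"): {"barang": "item", "bagus": "good", "banget": "very",
                   "pengiriman": "delivery", "lambat": "slow"},
    ("en", "id"): {"item": "produk", "good": "bagus", "very": "sangat",
                   "delivery": "pengiriman", "slow": "lambat"},
}

def toy_translate(text: str, src: str, tgt: str) -> str:
    """Hypothetical translator: lowercase word lookup, unknown tokens pass through."""
    table = TABLES[(src, tgt)]
    return " ".join(table.get(w, w) for w in text.lower().split())

reviews = [
    Review("Barang bagus banget worth it", "product", "positive"),
    Review("Pengiriman lambat", "delivery", "negative"),
]
augmented = back_translate_augment(reviews, toy_translate)
for r in augmented:
    print(r)
```

In this toy run the second review round-trips to its original wording and is discarded as a duplicate, so only the first review yields a new labeled example. The toy translator leaves the English tokens untouched; a real MT system may translate them as well, which could reduce the code-mixing present in the augmented text.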

