Mitigating Data Sparsity in Code-Mixed Text through Back-Translation Augmentation for Aspect-Based Sentiment Analysis in Tokopedia Reviews
DOI:
https://doi.org/10.59934/jaiea.v5i2.2197
Keywords:
Aspect-Based Sentiment Analysis, Back-Translation, Code-Mixed, Data Sparsity, IndoBERT, Tokopedia
Abstract
Aspect-Based Sentiment Analysis (ABSA) of Indonesian e-commerce reviews faces two significant challenges: code-mixed language and a shortage of labeled data (data sparsity). This study proposes Back-Translation data augmentation to enrich a dataset of Tokopedia reviews written in mixed Indonesian-English, also known as "South Jakarta language". With the IndoBERT model, experimental results show a 3% increase in accuracy for both aspect classification and sentiment classification. These findings demonstrate that synthetic data augmentation is effective in addressing data sparsity in informal text and improves the reliability of macro-level analysis used for strategic platform recommendations.
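As an illustration of the general idea behind Back-Translation augmentation, the following is a minimal sketch, not the authors' pipeline. It assumes each review carries one aspect label and one sentiment label, that English serves as the pivot language, and that paraphrases identical to an existing review are discarded; none of these details are stated in the abstract. The toy word-map translators, the label names, and the sample reviews are hypothetical stand-ins for a real machine translation model and real Tokopedia data.

```python
from typing import Callable, Dict, List, Tuple

Review = Tuple[str, str, str]  # (text, aspect label, sentiment label)
Translator = Callable[[str], str]


def back_translate(text: str, to_pivot: Translator, from_pivot: Translator) -> str:
    """Translate text into a pivot language and back to obtain a paraphrase."""
    return from_pivot(to_pivot(text))


def augment(reviews: List[Review], to_pivot: Translator,
            from_pivot: Translator) -> List[Review]:
    """Return the original reviews plus back-translated copies.

    Each paraphrase inherits the labels of its source review. Paraphrases
    that match an existing text (case-insensitively) are skipped, which is
    an assumed filtering step rather than one taken from the paper.
    """
    augmented = list(reviews)
    seen = {text.lower() for text, _, _ in reviews}
    for text, aspect, sentiment in reviews:
        paraphrase = back_translate(text, to_pivot, from_pivot)
        key = paraphrase.lower()
        if key not in seen:
            seen.add(key)
            augmented.append((paraphrase, aspect, sentiment))
    return augmented


def _word_map(table: Dict[str, str]) -> Translator:
    """Build a toy word-by-word 'translator' (hypothetical stand-in for an MT model)."""
    return lambda s: " ".join(table.get(w, w) for w in s.split())


# Hypothetical toy translators; unknown words pass through unchanged.
toy_to_english = _word_map({"barangnya": "item", "bagus": "good",
                            "banget": "very", "pengiriman": "shipping",
                            "lama": "slow"})
toy_to_indonesian = _word_map({"item": "produknya", "good": "bagus",
                               "very": "sekali", "shipping": "pengiriman",
                               "slow": "lambat"})

# Hypothetical code-mixed reviews with made-up aspect and sentiment labels.
reviews = [
    ("barangnya bagus banget worth it", "product", "positive"),
    ("pengiriman lama", "delivery", "negative"),
    ("worth it", "price", "positive"),
]

augmented = augment(reviews, toy_to_english, toy_to_indonesian)
for row in augmented:
    print(row)
```

In this sketch the third review passes through both toy translators unchanged, so it yields no new sample. With a real translation model, the same duplicate check would filter out round trips that fail to produce a distinct paraphrase.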
Copyright (c) 2026 Journal of Artificial Intelligence and Engineering Applications (JAIEA)

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.







