Comparison of TF-IDF and Word2Vec Feature Representations for Emotion Classification of Tokopedia E-Commerce Review Using LinearSVC

Authors

  • Fitriyani Azzahra STMIK IKMI Cirebon
  • Bambang Irawan STMIK IKMI Cirebon
  • Ahmad Faqih STMIK IKMI Cirebon
  • Denni Pratama STMIK IKMI Cirebon
  • Dian Ade Kurnia STMIK IKMI Cirebon

DOI:

https://doi.org/10.59934/jaiea.v5i2.2215

Keywords:

TF-IDF, Word2Vec, LinearSVC, Emotion Classification, Tokopedia

Abstract

This study compares TF-IDF and Word2Vec feature representations for emotion classification of Tokopedia e-commerce reviews using the LinearSVC algorithm. The dataset is PRDECT-ID, which consists of 5,400 Indonesian-language reviews labeled with positive and negative emotions. Preprocessing comprises case folding, removal of non-alphabetic characters, slang normalization, stopword removal, stemming with Sastrawi, and emoji handling. Features were extracted with TF-IDF and Word2Vec, and a LinearSVC model was trained on each representation and evaluated with 5-fold cross-validation and holdout testing. TF-IDF performs better, with an accuracy of 0.65, a macro-F1 score of 0.645, and a Cohen’s kappa of 0.294, whereas Word2Vec attains an accuracy of 0.58 and a macro-F1 score of 0.540. These findings indicate that TF-IDF is more effective for the short, informal texts characteristic of Indonesian e-commerce reviews.
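The three evaluation measures named in the abstract (accuracy, macro-F1, and Cohen's kappa) can be illustrated with a minimal sketch in plain Python. This is not the authors' evaluation code: the function names are hypothetical, and the ten toy labels below are invented for illustration and are not drawn from PRDECT-ID.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohen_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)  # observed agreement
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Expected chance agreement from the two marginal label distributions.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0


# Toy example (invented labels, NOT data from the study).
y_true = ["pos"] * 5 + ["neg"] * 5
y_pred = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "pos", "pos"]

print(round(accuracy(y_true, y_pred), 3))
print(round(macro_f1(y_true, y_pred), 3))
print(round(cohen_kappa(y_true, y_pred), 3))
```

When the two reference classes are equally frequent, the chance agreement is 0.5 and kappa reduces to 2 × accuracy − 1. Under that assumption, which the abstract does not state, an accuracy of 0.65 would correspond to a kappa of about 0.30, close to the reported 0.294.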

Published

2026-02-26

How to Cite

Azzahra, F., Irawan, B., Faqih, A., Pratama, D., & Kurnia, D. A. (2026). Comparison of TF-IDF and Word2Vec Feature Representations for Emotion Classification of Tokopedia E-Commerce Review Using LinearSVC. Journal of Artificial Intelligence and Engineering Applications (JAIEA), 5(2), 3466–3471. https://doi.org/10.59934/jaiea.v5i2.2215