Optimization of Machine Learning Models for Sentiment Analysis of TikTok Comment Data on the Progress of the Ibu Kota Nusantara as New Capital City of Indonesia

https://doi.org/10.46336/ijmsc.v3i3.232

Authors

  • Renda Sandi Saputra
  • Muhammad Bintang Eighista Dwiputra, Computer Science Study Program, Faculty of Mathematics and Natural Sciences Education, Universitas Pendidikan Indonesia, Bandung, Indonesia
  • Moch Panji Agung Saputra, Department of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Padjadjaran, Sumedang, Indonesia
  • Muhammad Iqbal Al-Banna Ismail, School of Mathematical Sciences, Sunway University, No. 5, Jalan Universiti, Bandar Sunway, 47500 Subang Jaya, Selangor, Malaysia

Keywords:

Sentiment analysis, machine learning model, TikTok comments, data augmentation, hyperparameter tuning

Abstract

Sentiment analysis plays a crucial role in understanding public opinion on social media platforms, especially in discussions of government policies such as the relocation of Indonesia’s capital to the new city known as Ibu Kota Nusantara (IKN). Machine learning algorithms such as Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression (LR) are widely used for sentiment classification, but previous studies often focus on performance comparisons without addressing data imbalance or systematically tuning model hyperparameters. These gaps can lead to suboptimal classification performance, especially on real-world social media data. This study aims to improve the accuracy and robustness of sentiment classification by applying two enhancement strategies: text data augmentation and hyperparameter tuning. Three models (Naïve Bayes, SVM, and Logistic Regression) were trained and evaluated in three experimental stages: (1) using the original data, (2) after applying augmentation, and (3) after augmentation combined with hyperparameter tuning via GridSearchCV. The evaluation results show progressive improvements across the three stages. In the first stage (original data), Logistic Regression achieved the highest accuracy of 80.41%, while Naïve Bayes and SVM reached 79.73% and 76.98%, respectively. However, all models struggled to classify the minority class (positive sentiment), as reflected in their lower recall and F1-scores for that class. After augmentation, accuracy improved for all models. SVM performed best at this stage with an accuracy of 92.77%, followed by Logistic Regression (86.57%) and Naïve Bayes (86.22%), with a better balance between precision and recall for both sentiment classes. Hyperparameter tuning further improved model performance. Logistic Regression became the best-performing model, achieving an accuracy of 93.80% along with high precision, recall, and F1-scores for both classes. SVM and Naïve Bayes also showed stable improvements, with accuracies of 92.88% and 87.72%, respectively.
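
The abstract does not say which augmentation technique was used, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes random word swapping as the augmentation (an illustrative choice), uses made-up comments and predictions, and does not cover the GridSearchCV tuning stage. The helper names `swap_augment`, `balance_by_augmentation`, and `per_class_scores` are hypothetical. The sketch shows how a minority class can be topped up with augmented copies, and why accuracy alone can hide weak minority-class recall.

```python
import random
from collections import Counter


def swap_augment(text, rng):
    """Return a copy of `text` with two randomly chosen words swapped."""
    words = text.split()
    if len(words) < 2:
        return text
    i, j = rng.sample(range(len(words)), 2)
    words[i], words[j] = words[j], words[i]
    return " ".join(words)


def balance_by_augmentation(texts, labels, seed=0):
    """Add augmented copies of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        pool = [t for t, y in zip(texts, labels) if y == label]
        for _ in range(target - count):
            out_texts.append(swap_augment(rng.choice(pool), rng))
            out_labels.append(label)
    return out_texts, out_labels


def per_class_scores(y_true, y_pred, label):
    """Precision, recall, and F1 for one class, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical, imbalanced toy comments (not data from the study).
texts = [
    "ikn progress is slow",
    "too expensive project",
    "waste of budget",
    "not convinced at all",
    "great new capital",
]
labels = ["negative"] * 4 + ["positive"]
aug_texts, aug_labels = balance_by_augmentation(texts, labels)

# Hypothetical predictions: accuracy is 0.9, yet half the positives are missed.
y_true = ["negative"] * 8 + ["positive"] * 2
y_pred = ["negative"] * 9 + ["positive"]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = per_class_scores(y_true, y_pred, "positive")
```

In a real experiment, augmentation of this kind would normally be applied to the training split only, so that augmented copies of a comment do not leak into the evaluation data.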

Published

2025-07-29

How to Cite

Saputra, R. S., Dwiputra, M. B. E., Saputra, M. P. A., & Ismail, M. I. A.-B. (2025). Optimization of Machine Learning Models for Sentiment Analysis of TikTok Comment Data on the Progress of the Ibu Kota Nusantara as New Capital City of Indonesia. International Journal of Mathematics, Statistics, and Computing, 3(3), 102–112. https://doi.org/10.46336/ijmsc.v3i3.232