Optimization of Machine Learning Models for Sentiment Analysis of TikTok Comment Data on the Progress of the Ibu Kota Nusantara as New Capital City of Indonesia
Keywords:
Sentiment analysis, machine learning model, TikTok comments, data augmentation, hyperparameter tuning
Abstract
Sentiment analysis plays a crucial role in understanding public opinion on social media platforms, especially in discussions of government policies such as the relocation of Indonesia's capital to the new city known as Ibu Kota Nusantara (IKN). Machine learning algorithms such as Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression (LR) are widely used for sentiment classification, but previous studies often focus on performance comparisons without addressing the impact of data imbalance or systematically tuning model hyperparameters. These gaps can lead to suboptimal classification performance, especially on real-world social media data. This study aims to improve the accuracy and robustness of sentiment classification by applying two enhancement strategies: text data augmentation and hyperparameter tuning. Three models (Naïve Bayes, SVM, and Logistic Regression) were trained and evaluated in three experimental stages: (1) on the original data, (2) after applying augmentation, and (3) after augmentation combined with hyperparameter tuning via GridSearchCV. The evaluation results show progressive improvements across the three stages. In the first stage (original data), Logistic Regression achieved the highest accuracy at 80.41%, while Naïve Bayes and SVM reached 79.73% and 76.98%, respectively. However, all models struggled to classify the minority class (positive sentiment), as reflected in their lower recall and F1-scores for that class. After augmentation, performance improved substantially across all models. SVM in particular reached an accuracy of 92.77%, followed by Logistic Regression (86.57%) and Naïve Bayes (86.22%), with a better balance between precision and recall for both sentiment classes. Hyperparameter tuning improved performance further. Logistic Regression became the best-performing model, achieving an accuracy of 93.80% along with high precision, recall, and F1-scores for both classes. SVM and Naïve Bayes also showed stable improvements, with accuracies of 92.88% and 87.72%, respectively.
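The augmentation stage can be illustrated with a minimal sketch. The abstract does not say which augmentation technique was used, so the random word swap below is an assumption chosen for simplicity, and the function names and toy comments are hypothetical rather than taken from the study. The sketch adds perturbed copies of minority-class texts until every class matches the size of the largest one.

```python
import random
from collections import Counter


def random_swap(tokens, rng):
    """Return a copy of tokens with two randomly chosen positions swapped."""
    if len(tokens) < 2:
        return list(tokens)
    i, j = rng.sample(range(len(tokens)), 2)
    out = list(tokens)
    out[i], out[j] = out[j], out[i]
    return out


def balance_by_augmentation(texts, labels, seed=0):
    """Append swapped variants of under-represented classes until all
    classes have as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    new_texts, new_labels = list(texts), list(labels)
    for label, count in counts.items():
        # Only original texts of this class are used as augmentation sources.
        pool = [t for t, y in zip(texts, labels) if y == label]
        for _ in range(target - count):
            source = rng.choice(pool)
            new_texts.append(" ".join(random_swap(source.split(), rng)))
            new_labels.append(label)
    return new_texts, new_labels


# Hypothetical toy comments (assumption: not from the study's dataset).
texts = [
    "the new capital is far too expensive",
    "this project wastes the national budget",
    "the capital does not need to move now",
    "the new capital will boost regional growth",
]
labels = ["negative", "negative", "negative", "positive"]

aug_texts, aug_labels = balance_by_augmentation(texts, labels)
print(Counter(aug_labels))
```

In a setup like the one described, augmentation of this kind would normally be applied to the training split only, so that perturbed copies of a comment do not leak into the evaluation data. The balanced training set would then be vectorized and passed to the classifiers, with the tuning stage searching over each model's hyperparameters.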
Copyright (c) 2025 Renda Sandi Saputra, Muhammad Bintang Eighista Dwiputra, Moch Panji Agung Saputra, Muhammad Iqbal Al-Banna Ismail

This work is licensed under a Creative Commons Attribution 4.0 International License.