COMPARISON OF THE SMOTE AND ADASYN METHODS FOR HANDLING IMBALANCED MULTICLASS DATA

  • Fandi Yulian Pamuji, Universitas Merdeka Malang
  • Sephia Dwi Arma Putri, Universitas Merdeka Malang
Keywords: Data Mining, Imbalanced Data, SMOTE, ADASYN, Multiclass

Abstract

Data mining combines several disciplines, including database systems, statistics, machine learning, and visualization, to analyze large datasets and extract useful characteristics from them. A common obstacle is class imbalance, where the classes of a dataset are unevenly distributed. This study compares two oversampling methods, SMOTE and ADASYN, for balancing the number of instances in the majority (negative) and minority (positive) classes. In the experiments, combining SMOTE with a classification method produced higher MCC and G-mean values than using the classification method alone or combining it with ADASYN. On the multiclass dataset, the best result was obtained with SMOTE + KNN, with an MCC of 0.64 and a G-mean of 0.74. This indicates that handling the imbalanced class distribution at the data preprocessing stage influences the MCC and G-mean values obtained, as seen for SMOTE + KNN.
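The abstract reports MCC and G-mean without saying how they were computed for more than two classes. As a minimal sketch, the Python below assumes two common multiclass definitions: the generalised (Gorodkin) form of MCC, and the geometric mean of per-class recalls for G-mean. The function names and the toy labels are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def multiclass_mcc(y_true, y_pred):
    """Generalised multiclass MCC (assumed definition, not confirmed by the paper)."""
    s = len(y_true)                                   # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly classified samples
    t_k = Counter(y_true)                             # true count per class
    p_k = Counter(y_pred)                             # predicted count per class
    labels = set(t_k) | set(p_k)
    cov_tp = c * s - sum(t_k[k] * p_k[k] for k in labels)
    cov_tt = s * s - sum(t_k[k] ** 2 for k in labels)
    cov_pp = s * s - sum(p_k[k] ** 2 for k in labels)
    denom = math.sqrt(cov_tt * cov_pp)
    # By convention, return 0 when the denominator vanishes
    # (for example, when every sample is predicted as the same class).
    return cov_tp / denom if denom else 0.0


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed multiclass definition)."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[k] / support[k] for k in support]
    return math.prod(recalls) ** (1 / len(recalls))


# Toy imbalanced labels: class 0 is the majority, classes 1 and 2 are minorities.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_majority_only = [0] * len(y_true)   # a classifier that ignores the minorities

print(multiclass_mcc(y_true, y_majority_only), g_mean(y_true, y_majority_only))
```

A classifier that always predicts the majority class is right on half of these toy samples, yet both metrics drop to zero. That sensitivity to minority-class errors is the usual reason for preferring MCC and G-mean over plain accuracy on imbalanced data.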

How to Cite
Pamuji, F. Y., & Putri, S. D. A. (2023). KOMPARASI METODE SMOTE DAN ADASYN UNTUK PENANGANAN DATA TIDAK SEIMBANG MULTICLASS. Jurnal Informatika Polinema, 9(3), 331–338. https://doi.org/10.33795/jip.v9i3.1330