Data Mapping using Combining Clustering Methods and C.45 Classification

International Journal of Electronics and Communication Engineering
© 2023 by SSRG - IJECE Journal
Volume 10 Issue 5
Year of Publication: 2023
Authors : Robbi Rahim, Unik Hanifah Salsabila, Akhmad Anwar Dani, Eka Maya S.S. Ciptaningsih, M. Mohzana
How to Cite?

Robbi Rahim, Unik Hanifah Salsabila, Akhmad Anwar Dani, Eka Maya S.S. Ciptaningsih, M. Mohzana, "Data Mapping using Combining Clustering Methods and C.45 Classification," SSRG International Journal of Electronics and Communication Engineering, vol. 10, no. 5, pp. 96-104, 2023. Crossref, https://doi.org/10.14445/23488549/IJECE-V10I5P109

Abstract:

School participation is measured by the Pure Participation Rate (Angka Partisipasi Murni, APM). This study examines whether data mining can generate new knowledge from APM data. The North Sumatra Central Statistics Agency (BPS-North Sumatra) provided secondary data on APM by city/district for 2011–2019 at the elementary (SD), junior high (SMP), senior high (SMA), and higher education (PT) levels. The data mining approach combines k-means clustering with decision tree (C4.5) classification, and processing was carried out in RapidMiner. Clustering first maps the regions by APM into two clusters: C1, the high APM cluster, and C2, the low APM cluster. The resulting cluster labels are then reused as the class attribute for classification, so the cluster value ranges define the classes. The study found 18 cities/districts in the high cluster (C1) and 15 in the low cluster (C2). Classifying these clustering results with the decision tree shows that SMA and PT are the influential attributes for mapping the regions, yielding three rules, including: if SMA has a percentage of 68.085% and PT has a percentage of 18.730%, then the region falls in the high cluster (C1). Combining clustering and classification has thus yielded new knowledge from the data.
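The two-stage pipeline described above (cluster first, then reuse the cluster labels as classes for a decision tree) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' RapidMiner process: the six regions and their APM values in `REGIONS` are hypothetical toy data, not the BPS-North Sumatra figures; `kmeans` uses a deterministic farthest-first initialisation (an assumption, since the paper's settings are not given here); and `best_split` is a simplified gain-ratio threshold search in the spirit of C4.5 (midpoint thresholds only) that selects just the root split rather than growing a full tree.

```python
import math

# Hypothetical APM values (%) for SD, SMP, SMA, PT; NOT the BPS-North Sumatra data.
ATTRS = ["SD", "SMP", "SMA", "PT"]
REGIONS = {
    "A": [96.0, 76.0, 75.0, 25.0],
    "B": [95.0, 72.0, 72.0, 22.0],
    "C": [97.0, 80.0, 74.0, 30.0],
    "D": [96.5, 74.0, 60.0, 10.0],
    "E": [94.0, 70.0, 58.0, 12.0],
    "F": [95.5, 73.0, 62.0, 9.0],
}


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k=2, max_iter=100):
    """Lloyd's k-means with an assumed deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next initial centroid: the point farthest from all chosen centroids.
        centroids.append(list(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(labels.count(v) / n * math.log2(labels.count(v) / n) for v in set(labels))


def best_split(values, labels):
    """Best binary threshold on one numeric attribute, scored by gain ratio."""
    pairs = sorted(zip(values, labels))
    n, base = len(pairs), entropy(labels)
    best_ratio, best_t = 0.0, None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint candidate (a simplification)
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        split_info = entropy(["L"] * len(left) + ["R"] * len(right))  # > 0: both sides non-empty
        ratio = gain / split_info
        if ratio > best_ratio:
            best_ratio, best_t = ratio, t
    return best_ratio, best_t


# Stage 1: cluster the regions; the centroid with the higher mean APM is named C1 (high).
names = list(REGIONS)
labels, centroids = kmeans([REGIONS[r] for r in names], k=2)
high = max(range(2), key=lambda c: sum(centroids[c]) / len(centroids[c]))
cluster_of = {r: "C1" if lab == high else "C2" for r, lab in zip(names, labels)}

# Stage 2: reuse the cluster labels as the class attribute and pick the root split.
classes = [cluster_of[r] for r in names]
scores = {a: best_split([REGIONS[r][i] for r in names], classes) for i, a in enumerate(ATTRS)}
root = max(scores, key=lambda a: scores[a][0])
print(cluster_of)
print(root, scores[root])  # prints: SMA (1.0, 67.0)
```

On these toy values both SMA and PT separate C1 from C2 perfectly (gain ratio 1.0); the tie is broken by attribute order, so the root split is SMA at 67.0%. The thresholds reported in the abstract come from the authors' RapidMiner tree on the real data and are not reproduced by this sketch.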

Keywords:

Classification, Data mining, North Sumatra, Pure participation.
