Evaluation of K-Means Clustering Using Silhouette Score Method on Customer Segmentation


Baiq Nikum Yulisasih (1); Herman Herman (2*); Sunardi Sunardi (3); Herman Yuliansyah (4)

(1) Universitas Ahmad Dahlan
(2) Universitas Ahmad Dahlan
(3) Universitas Ahmad Dahlan
(4) Universitas Ahmad Dahlan
(*) Corresponding Author

  

Abstract


Customer segmentation is a critical process that helps businesses understand and meet the diverse needs of their customers. This study focused on the challenges of managing large and complex volumes of customer data and of identifying the right segments for personalizing marketing strategies. K-Means Clustering has been widely used for its ability to group multidimensional data, but the method often produces broad clusters that lack detailed insight. Cluster evaluation with the Silhouette Score method is therefore essential to ensure that the resulting groupings are optimal and valid. The purpose of this study was to evaluate the quality of K-Means Clustering on customer segmentation using the Silhouette Score method. The research began with the acquisition of a dataset of 2,000 data points described by 7 attributes: sex, marital status, age, education, income, occupation, and settlement size. The data were then pre-processed by checking for missing values and normalizing the attributes. K-Means Clustering was applied to group the data into clusters based on each point's proximity to its cluster center (centroid), and the resulting clusters were assessed with the Silhouette Score to determine the optimal number of clusters. The results comprise two parts. First, a manual calculation in Microsoft Excel on 27 data points illustrated the logic, steps, and practical foundations of the method before it was applied to the complete dataset. Second, a Python implementation on all 2,000 data points showed that, among k = 2 to k = 7, the optimal number of clusters (the one whose Silhouette Score was closest to 1) was k = 4, with a Silhouette Score of 0.43, which falls in the weak-structure category. Although this value indicates a weak cluster structure, it was the highest score in the test, meaning that dividing the data into four clusters was better than any other number of clusters tested.
However, the quality of these clusters indicates the need for further improvement. Future work should review the attributes used and the data normalization method, or consider other clustering algorithms, to achieve a more robust structure and a more meaningful interpretation.
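The pipeline summarized above (missing-value check, normalization, K-Means, and a Silhouette Score sweep over k = 2 to k = 7) can be sketched as follows. This is a minimal, dependency-free sketch under stated assumptions, not the authors' code: min-max normalization, Euclidean distance, and random initial centroids drawn from the data are assumed (the abstract does not specify them), and the function names and the `TOY_ROWS` data are hypothetical two-attribute illustrations, not the study's dataset. For each point i, the silhouette is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance to the other members of its own cluster and b(i) is the smallest mean distance to the members of any other cluster; the overall score is the mean of s(i), so values near 1 indicate compact, well-separated clusters.

```python
import math
import random


def min_max_normalize(rows):
    """Scale each column to [0, 1]; a constant column is mapped to 0.0 (assumed method)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's K-Means with random initial centroids taken from the data (assumption)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: every point joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members;
        # a cluster that empties keeps its previous centroid.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette_score(points, labels):
    """Mean of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all points."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    total = 0.0
    for i, p in enumerate(points):
        own = [
            math.dist(p, q)
            for j, q in enumerate(points)
            if j != i and labels[j] == labels[i]
        ]
        if not own:  # singleton cluster: s(i) is defined as 0
            continue
        a = sum(own) / len(own)  # mean distance to the rest of its own cluster
        b = min(  # smallest mean distance to the points of any other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def best_k(rows, k_values=range(2, 8), seed=0):
    """Pre-process, cluster for each k, and return (best k, {k: silhouette})."""
    if any(v is None for row in rows for v in row):
        raise ValueError("dataset contains missing values")
    points = min_max_normalize(rows)
    scores = {k: silhouette_score(points, kmeans(points, k, seed=seed)) for k in k_values}
    return max(scores, key=scores.get), scores


# Hypothetical (age, income) rows for illustration only; NOT the study's data.
TOY_ROWS = [
    [25, 3000], [27, 3200], [26, 3100], [24, 2900],
    [45, 9000], [47, 9200], [46, 8800], [44, 9100],
    [65, 5000], [67, 5200], [66, 4900], [64, 5100],
]

if __name__ == "__main__":
    k, scores = best_k(TOY_ROWS)
    for kk, s in sorted(scores.items()):
        print(f"k={kk}: silhouette={s:.2f}")
    print("best k:", k)
```

The distance loops are written out by hand, in the spirit of the study's manual spreadsheet calculation, so that each step is visible; they are quadratic in the number of points. In practice, scikit-learn's `KMeans` (which also supports several restarts through `n_init`) and `sklearn.metrics.silhouette_score` perform the same computation more efficiently.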


Keywords


Cluster Evaluation; Customer Segmentation; K-Means Clustering; Silhouette Score.

  
  

     

Digital Object Identifier

doi: https://doi.org/10.33096/ilkom.v16i3.2325.330-342
  






Copyright (c) 2024 Baiq Nikum Yulisasih

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.