Comparison of Fuzzy C-Means and K-Means Clustering Performance: An Application on Household Budget Survey Data

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

National Household Budget Survey (HBS) data include sociodemographic and financial indicators that inform government public policy actions. Finding the optimal grouping of households in a sufficiently large data set is a challenging task for policymakers. Soft classification techniques such as Fuzzy C-means (FCM) can reveal hidden patterns in the variable set. This study aims to compare the classification performance of FCM and k-means (KM) for grouping households in terms of sociodemographic and out-of-pocket (OOP) health expenditure variables. Health expenditure variables have heavily skewed distributions, and the shape of a variable's distribution has a measurable effect on classifiers. Incorporating Bayesian data generation procedures into the variable transformation process can increase the ability to deal with skewness and improve model performance. However, little is known about how this embedded strategy, which combines the Bayesian data generation approach with unsupervised learning, performs when applied to health expenditures. This study applied the strategy to Turkish HBS data for the year 2015 while comparing FCM and KM classification performance. Normality test results show that the distributions of the logarithmic (KS = 0.006; p > 0.05) and Box-Cox transformed (KS = 0.006; p > 0.05) health expenditure variables, which were generated using lognormal distributions from a Bayesian viewpoint, are close to normal. Moreover, KM clustering (Sil = 0.48) performs better than FCM (Sil = 0.4198) for classifying households. The optimal number of household groups is 20. Further studies will compare the cluster-seeking performance of other unsupervised learning algorithms while incorporating arbitrary health expenditure variables into the study model.
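The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic lognormal values standing in for skewed expenditures, the Bayesian data generation step is omitted, and the helper names (`kmeans`, `fuzzy_cmeans`, `silhouette`) are hypothetical. FCM memberships are hardened with `argmax` before scoring, which is also an assumption.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns hard labels and centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def fuzzy_cmeans(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard FCM updates; returns memberships U (n x c) and centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)  # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

def silhouette(X, labels):
    """Mean silhouette width; assumes at least two non-empty clusters."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton clusters score 0 by convention
        a = D[i, same].sum() / (n_same - 1)  # D[i, i] = 0 is excluded
        b = min(D[i, labels == cl].mean() for cl in clusters if cl != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

# Synthetic stand-in for skewed expenditure data (NOT the HBS data).
rng = np.random.default_rng(42)
spend = np.concatenate([rng.lognormal(2.0, 0.3, 150), rng.lognormal(5.0, 0.3, 150)])
size = np.concatenate([rng.normal(2.0, 0.5, 150), rng.normal(5.0, 0.5, 150)])

# Log-transform the skewed variable, then standardize both features.
X = np.column_stack([np.log(spend), size])
X = (X - X.mean(axis=0)) / X.std(axis=0)

km_labels, _ = kmeans(X, k=2)
U, _ = fuzzy_cmeans(X, c=2)
fcm_labels = U.argmax(axis=1)  # harden memberships for scoring

sil_km = silhouette(X, km_labels)
sil_fcm = silhouette(X, fcm_labels)
print(f"KM silhouette:  {sil_km:.3f}")
print(f"FCM silhouette: {sil_fcm:.3f}")
```

On well-separated synthetic groups the two scores will typically be very close; the gap reported in the abstract reflects the real survey data and the authors' own settings, which this sketch does not reproduce.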

Original language: English
Title of host publication: Intelligent and Fuzzy Techniques
Subtitle of host publication: Smart and Innovative Solutions - Proceedings of the INFUS 2020 Conference
Editors: Cengiz Kahraman, Sezi Cevik Onar, Basar Oztaysi, Irem Ucal Sari, Selcuk Cebi, A. Cagri Tolga
Publisher: Springer
Pages: 54-62
Number of pages: 9
ISBN (Print): 9783030511555
Publication status: Published - 2021
Event: International Conference on Intelligent and Fuzzy Systems, INFUS 2020 - Istanbul, Turkey
Duration: 21 Jul 2020 to 23 Jul 2020

Publication series

Name: Advances in Intelligent Systems and Computing
Volume: 1197 AISC
ISSN (Print): 2194-5357
ISSN (Electronic): 2194-5365

Conference

Conference: International Conference on Intelligent and Fuzzy Systems, INFUS 2020
Country/Territory: Turkey
City: Istanbul
Period: 21/07/20 to 23/07/20

Keywords

  • Bayesian data generation
  • Classification
  • Fuzzy C-means
  • Health expenditure
  • Household budget survey
  • K-means
  • Unsupervised learning