Improving Machine Learning Algorithms with CoClust-Based Feature Selection on Big Data: A Comparative Analysis

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

Adding a feature selection stage when building machine learning algorithms can lead to better outcomes. The dependency structure between the variables is regarded as the most crucial factor in feature selection. The copula-based clustering technique (CoClust), which accounts for non-linear dependence and groups only related variables, is particularly useful for identifying this dependency structure. In this study, we demonstrate that combining Random Forest, AdaBoost, and XGBoost with a CoClust-based feature selection step can yield notable improvements in both CPU time and accuracy. To assess CoClust's contribution to these algorithms, we compare it with K-means and hierarchical clustering on two different big data sets. The results are compared in terms of CPU time, accuracy, and the ROC (receiver operating characteristic) curve.
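As a rough illustration of clustering-based feature selection, the sketch below groups features by their dependence and keeps one representative per group. It is not CoClust, which is copula-based; it swaps in Spearman's rank correlation, which depends only on ranks and therefore detects monotone non-linear dependence, combined with single-linkage grouping. The function names `group_features` and `select_representatives`, the 0.8 threshold, and the rule of keeping the feature most dependent on the target are hypothetical choices made for illustration. The selected columns would then be passed to a classifier such as Random Forest, AdaBoost, or XGBoost, which is not shown here.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def group_features(columns, threshold=0.8):
    """Hypothetical stand-in for CoClust: single-linkage grouping of features.

    Two features are linked when |Spearman rho| >= threshold; groups are the
    connected components (union-find), i.e. a single-linkage dendrogram cut
    at distance 1 - threshold with distance = 1 - |rho|.
    """
    n = len(columns)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    rank_cols = [ranks(c) for c in columns]
    for i, j in combinations(range(n), 2):
        if abs(pearson(rank_cols[i], rank_cols[j])) >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def select_representatives(columns, target, threshold=0.8):
    """Hypothetical rule: from each group keep the feature most dependent on target."""
    selected = []
    for group in group_features(columns, threshold):
        best = max(group, key=lambda i: abs(spearman(columns[i], target)))
        selected.append(best)
    return sorted(selected)


if __name__ == "__main__":
    # Toy data: columns 1 and 2 are monotone (linear and cubic) transforms of column 0.
    cols = [
        [1, 2, 3, 4, 5, 6],
        [2, 4, 6, 8, 10, 12],
        [1, 8, 27, 64, 125, 216],
        [3, 1, 4, 1, 5, 9],
    ]
    y = [0, 0, 0, 1, 1, 1]
    print(group_features(cols))             # → [[0, 1, 2], [3]]
    print(select_representatives(cols, y))  # → [0, 3]
```

Because rank correlation ignores monotone rescaling, the linear and cubic transforms of the first column fall into the same group, so only one of the three is kept; a plain Pearson correlation on the raw values would rate the cubic relationship as weaker.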

Original language: English
Title of host publication: Directional and Multivariate Statistics
Subtitle of host publication: A Volume in Honour of Ashis SenGupta
Publisher: Springer Science+Business Media
Pages: 411-439
Number of pages: 29
ISBN (Electronic): 9789819620043
ISBN (Print): 9789819620036
Publication status: Published, 1 Jan 2025

Keywords

  • AdaBoost
  • CoClust
  • Feature selection
  • Random forest
  • XGBoost
