Abstract
The feature selection stage can improve machine learning algorithms and lead to better outcomes. The dependency structure among the variables is regarded as the most crucial factor at this stage. The Copula-Based Clustering technique (CoClust), which captures non-linear dependence and groups only related variables, is particularly effective at identifying this dependency structure. In this study, we demonstrate that combining the Random Forest, AdaBoost, and XGBoost methods with a CoClust-based feature selection step yields notable improvements in both CPU time and accuracy. On two large data sets, we compare CoClust with the K-means and hierarchical clustering techniques in order to assess its contribution to these algorithms. The results are compared in terms of CPU time, accuracy, and the ROC (receiver operating characteristic) curve.
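To make the pipeline concrete, the sketch below illustrates the general clustering-based feature selection idea the abstract describes: group the variables by their dependence structure, keep one representative per cluster, and train an ensemble classifier on the reduced set. This is a hypothetical illustration, not the chapter's implementation: K-means on correlation profiles stands in for CoClust (whose copula-based grouping captures non-linear dependence), and the data set, cluster count, and helper name `cluster_select` are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def cluster_select(X, n_clusters, random_state=0):
    """Group features by dependence and keep one representative per group.

    K-means stands in here for CoClust; each feature (column) is described
    by its correlation profile against every other feature.
    """
    profiles = np.corrcoef(X.T)  # (n_features, n_features) dependence profiles
    km = KMeans(n_clusters=n_clusters, n_init=10,
                random_state=random_state).fit(profiles)
    selected = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        # Representative: the member feature closest to the cluster centroid.
        dist = np.linalg.norm(profiles[members] - km.cluster_centers_[c], axis=1)
        selected.append(members[np.argmin(dist)])
    return sorted(selected)

# Synthetic data as a stand-in for the chapter's big data sets.
X, y = make_classification(n_samples=500, n_features=30,
                           n_informative=8, random_state=0)
keep = cluster_select(X, n_clusters=8)
X_tr, X_te, y_tr, y_te = train_test_split(X[:, keep], y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

Replacing the K-means step with CoClust (or hierarchical clustering) changes only `cluster_select`; the downstream Random Forest, AdaBoost, or XGBoost classifier is unchanged, which is what makes the CPU-time and accuracy comparison in the study possible.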
| Original language | English |
|---|---|
| Title of host publication | Directional and Multivariate Statistics |
| Subtitle of host publication | A Volume in Honour of Ashis SenGupta |
| Publisher | Springer Science+Business Media |
| Pages | 411-439 |
| Number of pages | 29 |
| ISBN (Electronic) | 9789819620043 |
| ISBN (Print) | 9789819620036 |
| DOIs | |
| Publication status | Published - 1 Jan 2025 |
Keywords
- AdaBoost
- CoClust
- Feature selection
- Random forest
- XGBoost
Chapter title: Improving Machine Learning Algorithms with CoClust-Based Feature Selection on Big Data: A Comparative Analysis