Investigation of term weighting schemes in classification of imbalanced texts

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Citations (Scopus)

Abstract

The class imbalance problem plays a critical role in the use of machine learning methods for text classification, since feature selection methods, like machine learning methods, expect a homogeneous distribution. This study investigates two kinds of feature selection metrics (one-sided and two-sided) as the global component of term weighting schemes (called tffs) in scenarios with different complexities and imbalance ratios. The traditional term weighting approach (tfidf) is employed as a baseline to evaluate the effects of tffs weighting. The study aims to show which kinds of weighting schemes suit which machine learning algorithms in different imbalanced cases. Four classification algorithms are used to show the effects of term weighting schemes on the imbalanced datasets. According to our findings, regardless of tfidf, term weighting methods based on one-sided feature selection metrics are better approaches for the SVM and k-NN algorithms, while term weighting methods based on two-sided metrics are the best choice for MultiNB and C4.5 on imbalanced texts. As a result, term weighting methods based on one-sided feature selection metrics are recommended for SVM, and tfidf is a suitable weighting method for the k-NN algorithm in text classification tasks.
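
To make the tffs idea concrete, the following is a minimal sketch of term frequency multiplied by a global score, with the score taken from idf, from a one-sided metric, or from a two-sided metric. The abstract does not name the metrics used in the study, so the choices here are assumptions for illustration only: the correlation coefficient clipped at zero stands in for a one-sided metric, and chi-square (its square) stands in for a two-sided metric. The toy corpus, the labels and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus: one "sport" document against four "other" documents.
DOCS = [
    ["goal", "match", "team"],
    ["market", "stock", "team"],
    ["market", "bank"],
    ["stock", "bank", "market"],
    ["election", "vote"],
]
LABELS = ["sport", "other", "other", "other", "other"]


def contingency(docs, labels, term, positive):
    """Return (a, b, c, d): document counts by term presence and class."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        is_pos = label == positive
        if has_term and is_pos:
            a += 1  # term present, positive class
        elif has_term:
            b += 1  # term present, other classes
        elif is_pos:
            c += 1  # term absent, positive class
        else:
            d += 1  # term absent, other classes
    return a, b, c, d


def correlation_coefficient(a, b, c, d):
    """Signed association between a term and the positive class."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return math.sqrt(n) * (a * d - c * b) / math.sqrt(denom)


def one_sided(docs, labels, term, positive):
    """Assumed one-sided score: keep only evidence for class membership."""
    return max(0.0, correlation_coefficient(*contingency(docs, labels, term, positive)))


def two_sided(docs, labels, term, positive):
    """Assumed two-sided score: chi-square, which ignores the sign."""
    return correlation_coefficient(*contingency(docs, labels, term, positive)) ** 2


def idf(docs, term):
    """Plain inverse document frequency, log(N / df)."""
    df = sum(1 for tokens in docs if term in tokens)
    return math.log(len(docs) / df) if df else 0.0


def tf_weights(tokens, global_score):
    """Weight each term of a document as tf * global_score(term)."""
    counts = Counter(tokens)
    return {term: tf * global_score(term) for term, tf in counts.items()}


doc = DOCS[0]
tfidf = tf_weights(doc, lambda t: idf(DOCS, t))
tf_one = tf_weights(doc, lambda t: one_sided(DOCS, LABELS, t, "sport"))
tf_two = tf_weights(doc, lambda t: two_sided(DOCS, LABELS, t, "sport"))

for name, weights in [("tfidf", tfidf), ("tf*one-sided", tf_one), ("tf*two-sided", tf_two)]:
    print(name, {t: round(w, 3) for t, w in weights.items()})
```

In this sketch a term such as "market", which appears only outside the "sport" class, receives a zero one-sided score but a positive two-sided score. That difference is the kind of behavior the one-sided versus two-sided comparison is concerned with.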

Original language: English
Title of host publication: Proceedings of the European Conference on Data Mining 2014 and International Conferences on Intelligent Systems and Agents 2014 and Theory and Practice in Modern Computing 2014 - Part of the Multi Conference on Computer Science and Information Systems, MCCSIS 2014
Editors: Jorg Roth, Ajith P. Abraham, Antonio Palma dos Reis
Publisher: IADIS
Pages: 39-46
Number of pages: 8
ISBN (Electronic): 9789898704108
Publication status: Published - 2014
Event: European Conference on Data Mining 2014 and International Conferences on Intelligent Systems and Agents 2014 and Theory and Practice in Modern Computing 2014 - Lisbon, Portugal
Duration: 15 Jul 2014 to 17 Jul 2014

Publication series

Name: Proceedings of the European Conference on Data Mining 2014 and International Conferences on Intelligent Systems and Agents 2014 and Theory and Practice in Modern Computing 2014 - Part of the Multi Conference on Computer Science and Information Systems, MCCSIS 2014

Conference

Conference: European Conference on Data Mining 2014 and International Conferences on Intelligent Systems and Agents 2014 and Theory and Practice in Modern Computing 2014
Country/Territory: Portugal
City: Lisbon
Period: 15/07/14 to 17/07/14

Keywords

  • Class imbalance problem
  • Feature selection
  • Machine learning
  • Term weighting
  • Text classification
