Imbalanced data ensemble classification based on cluster-based under-sampling algorithm

WU Sen; LIU Lu; LU Dan

doi:10.13374/j.issn2095-9389.2017.08.015

WU Sen, LIU Lu, LU Dan. Imbalanced data ensemble classification based on cluster-based under-sampling algorithm[J]. Chinese Journal of Engineering, 2017, 39(8): 1244-1253. DOI: 10.13374/j.issn2095-9389.2017.08.015

Abstract

Most traditional classification algorithms assume that the data set is well balanced and focus on maximizing overall classification accuracy. However, real-world data sets are usually imbalanced, so traditional approaches tend to misclassify minority class samples. There are two main ways to improve classification performance on imbalanced data. The first is to modify the data set through over-sampling, which increases the number of minority class samples, and under-sampling, which decreases the number of majority class samples. The second is to improve the algorithm itself. This paper proposes an approach for classifying imbalanced data that combines cluster-based under-sampling with ensemble classification. First, in the data processing stage, the cluster-based under-sampling method is used to construct a balanced data set; the AdaBoost ensemble algorithm is then trained on this new data set. During ensemble construction, when the error rate of the ensemble learner is calculated, the algorithm uses weights to distinguish minority class data from majority class data. This makes the algorithm focus more on the minority class, thereby improving the classification accuracy of minority class data.
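
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the clustering procedure or the weighting scheme, so the sketch assumes plain k-means with one retained majority sample per cluster, and a hypothetical `minority_cost` factor that scales minority class errors. All function and parameter names are illustrative.

```python
import random


def dist2(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids (assumed clustering step)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist2(p, centroids[j]))
            groups[nearest].append(p)
        # Recompute each centroid as its group's mean; keep it if the group is empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids


def cluster_undersample(majority, n_keep, seed=0):
    """Keep n_keep distinct majority samples, one nearest to each centroid."""
    if n_keep > len(majority):
        raise ValueError("n_keep must not exceed the number of majority samples")
    centroids = kmeans(majority, n_keep, seed=seed)
    pool = list(majority)
    kept = []
    for c in centroids:
        best = min(pool, key=lambda p: dist2(p, c))
        kept.append(best)
        pool.remove(best)  # never pick the same sample twice
    return kept


def weighted_error(y_true, y_pred, weights, minority_label, minority_cost=2.0):
    """Weighted error rate in which minority samples count minority_cost times
    as much as majority samples (hypothetical weighting, for illustration)."""
    scaled = [
        w * (minority_cost if t == minority_label else 1.0)
        for w, t in zip(weights, y_true)
    ]
    wrong = sum(s for s, t, p in zip(scaled, y_true, y_pred) if t != p)
    return wrong / sum(scaled)


# Usage: balance 20 majority samples against 4 minority samples.
majority = [(float(i % 5), float(i // 5)) for i in range(20)]
minority = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)]
balanced_majority = cluster_undersample(majority, len(minority))
```

In a boosting loop, a function like `weighted_error` would stand in for the usual weighted error of each weak learner, so that misclassified minority samples raise the error more than misclassified majority samples.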
