Resampling algorithm of imbalanced data based on neighbor relationship
Abstract: Classifying imbalanced data has become a critical research issue in many data-intensive applications. To improve classification accuracy on imbalanced data sets, a resampling algorithm based on the neighbor relationships (RSNR) of the sample space is proposed. The method first evaluates the security level of each minority sample from its spatial neighbor relationships and oversamples the minority class with the SMOTE algorithm, guided by these security levels. It then calculates the local density of the majority samples from their spatial neighbor relationships and undersamples the majority class in sample-dense regions. Together, these two steps balance the data set while keeping its size under control to prevent overfitting, so that both classes are classified equally well. Training and test sets were generated by 5×10-fold cross-validation. After the training set was resampled, an Extreme Learning Machine (ELM) was trained on it as the classifier and verified on the test set. Experimental results on UCI imbalanced data sets and on measured circuit fault diagnosis data show that the proposed method outperforms other resampling algorithms.
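The two resampling steps described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the definitions below are assumptions made for illustration. The security level is taken to be the fraction of a minority sample's k nearest neighbors that are also minority, seed samples for SMOTE-style interpolation are drawn with probability proportional to that level, local density is approximated by the inverse of the mean distance to the k nearest majority neighbors, and both classes are brought to the midpoint of their original sizes. All function names (`k_nearest`, `security_levels`, `oversample_minority`, `undersample_majority`, `rsnr_sketch`) are hypothetical. The ELM classifier and the cross-validation protocol are not covered.

```python
import math
import random


def k_nearest(query, points, k, skip=None):
    """Indices of the k points closest to `query` (Euclidean), excluding index `skip`."""
    order = sorted(
        (i for i in range(len(points)) if i != skip),
        key=lambda i: math.dist(query, points[i]),
    )
    return order[:k]


def security_levels(X, y, minority, k):
    """Assumed definition: share of a minority sample's k neighbors that are minority."""
    levels = {}
    for i, label in enumerate(y):
        if label == minority:
            nbrs = k_nearest(X[i], X, k, skip=i)
            levels[i] = sum(y[j] == minority for j in nbrs) / max(len(nbrs), 1)
    return levels


def oversample_minority(X, y, minority, n_new, k=3, seed=0):
    """SMOTE-style interpolation; seeds are drawn in proportion to security level (assumption)."""
    rng = random.Random(seed)
    levels = security_levels(X, y, minority, k)
    seeds = [i for i, s in levels.items() if s > 0]  # level 0 is treated as noise
    if not seeds:
        return []
    weights = [levels[i] for i in seeds]
    min_idx = list(levels)
    min_pts = [X[i] for i in min_idx]
    synthetic = []
    for _ in range(max(n_new, 0)):
        i = rng.choices(seeds, weights=weights)[0]
        # Interpolate toward one of the seed's nearest minority neighbors.
        nbrs = k_nearest(X[i], min_pts, k, skip=min_idx.index(i))
        j = min_idx[rng.choice(nbrs)]
        t = rng.random()
        synthetic.append([a + t * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


def undersample_majority(X, y, majority, n_remove, k=3):
    """Return indices to keep after dropping the densest majority samples."""
    maj_idx = [i for i, label in enumerate(y) if label == majority]
    maj_pts = [X[i] for i in maj_idx]

    def mean_knn_dist(pos):
        nbrs = k_nearest(maj_pts[pos], maj_pts, k, skip=pos)
        return sum(math.dist(maj_pts[pos], maj_pts[j]) for j in nbrs) / max(len(nbrs), 1)

    # A small mean neighbor distance means a dense region, so those go first.
    ranked = sorted(range(len(maj_idx)), key=mean_knn_dist)
    dropped = {maj_idx[p] for p in ranked[:max(n_remove, 0)]}
    return [i for i in range(len(X)) if i not in dropped]


def rsnr_sketch(X, y, minority=1, majority=0, k=3, seed=0):
    """Binary-class sketch: oversample minority, undersample majority, meet at the midpoint."""
    n_min = sum(label == minority for label in y)
    n_maj = len(y) - n_min
    target = (n_min + n_maj) // 2  # assumed target size for each class
    synthetic = oversample_minority(X, y, minority, target - n_min, k, seed)
    kept = undersample_majority(X, y, majority, n_maj - target, k)
    X_out = [X[i] for i in kept] + synthetic
    y_out = [y[i] for i in kept] + [minority] * len(synthetic)
    return X_out, y_out
```

Calling `rsnr_sketch(X, y)` on a list of feature vectors `X` and binary labels `y` returns the resampled training set, with the synthetic minority samples appended at the end. The density ranking here is computed once rather than updated after each removal, which keeps the sketch short but can thin a tight cluster more aggressively than an iterative scheme would.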