A rough-granular approach to the imbalanced data classification problem

Katarzyna Borowska, Jarosław Stepaniuk

Abstract

More than two decades ago the imbalanced data problem turned out to be one of the most important and challenging problems. Indeed, missing information about the minority class leads to a significant degradation in classifier performance. Moreover, comprehensive research has shown that certain factors increase the problem’s complexity. These additional difficulties are closely related to the data distribution over decision classes. Despite the numerous methods that have been proposed, the flexibility of existing solutions needs further improvement. Therefore, we offer a novel rough–granular computing approach (RGA) to address these issues. New synthetic examples are generated only in specific regions of the feature space. This selective oversampling approach is applied to reduce the number of misclassified minority class examples. A strategy relevant to a given problem is obtained by forming information granules and analysing their degrees of inclusion in the minority class. Potential inconsistencies are eliminated by applying an editing phase based on a similarity relation. The most significant algorithm parameters are tuned in an iterative process. The set of evaluated parameters includes the number of nearest neighbours, complexity threshold, distance threshold and cardinality redundancy. Each data model is built using different parameter values. The results of an experimental study on different datasets from the UCI repository are presented. They show that the proposed method of inducing the neighbourhoods of examples is crucial to the proper creation of synthetic positive instances. The proposed algorithm outperforms related methods on most of the tested datasets. The set of valid parameters for the RGA technique is established.
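The selective oversampling idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' RGA implementation: it assumes that the granule of a minority example is simply its k nearest neighbours, that the inclusion degree is the fraction of those neighbours belonging to the minority class, and that new examples are interpolated SMOTE-style. The function names, the `low` threshold and the toy data are hypothetical and chosen only for illustration; the editing phase and parameter tuning mentioned in the abstract are not covered.

```python
import math
import random

def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean, excluding i)."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]

def inclusion_degree(labels, neighbour_idx, minority=1):
    """Fraction of a neighbourhood (the assumed 'granule') that is minority class."""
    return sum(labels[j] == minority for j in neighbour_idx) / len(neighbour_idx)

def selective_oversample(points, labels, k=3, low=0.3, n_new=1, minority=1, seed=0):
    """Generate synthetic minority points only around sufficiently 'safe' examples.

    Assumption: examples whose inclusion degree is below `low` are treated as
    likely noise and skipped. This rule is illustrative, not taken from the paper.
    """
    rng = random.Random(seed)
    synthetic = []
    for i, (x, y) in enumerate(zip(points, labels)):
        if y != minority:
            continue
        nbrs = knn_indices(points, i, k)
        minority_nbrs = [j for j in nbrs if labels[j] == minority]
        if inclusion_degree(labels, nbrs, minority) < low or not minority_nbrs:
            continue
        for _ in range(n_new):
            j = rng.choice(minority_nbrs)
            t = rng.random()  # interpolation factor in [0, 1)
            synthetic.append(tuple(a + t * (b - a) for a, b in zip(x, points[j])))
    return synthetic

# Toy data: a small minority cluster, a majority cluster, and one isolated
# minority point lying inside the majority cluster.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5)]
labels = [1, 1, 1, 0, 0, 0, 0, 1]
synthetic = selective_oversample(points, labels, k=3, low=0.3)
```

In this toy run the isolated minority point is surrounded only by majority neighbours, so its inclusion degree is zero and no synthetic examples are created around it; the three clustered minority points each yield one interpolated example.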
Author: Katarzyna Borowska (FCS / DISCN), Department of Information Systems and Computer Networks
Author: Jarosław Stepaniuk (FCS / DISCN), Department of Information Systems and Computer Networks
Journal series: Applied Soft Computing, [Applied Soft Computing Journal], ISSN 1568-4946, e-ISSN 1872-9681, (N/A 200 pts)
Issue year: 2019
Vol: 83
Pages: 1-13
Publication size in sheets: 5280.35
Keywords in English: Data preprocessing, Class imbalance, Granular computing, Information granules, Rough sets, SMOTE
ASJC Classification: 1712 Software
DOI: 10.1016/j.asoc.2019.105607
Internal identifier: ROC 19-20
Language: en (English)
Score (nominal): 200
Score source: journalList
Score: Ministerial score = 200.0, 04-03-2020, ArticleFromJournal
Publication indicators: Scopus SNIP (Source Normalised Impact per Paper): 2018 = 2.369; WoS Impact Factor: 2018 = 4.873 (2) - 2018 = 4.858 (5)