BRAIN. Broad Research in Artificial Intelligence and Neuroscience

Volume: 11 | Issue: 1

Methods of Handling Unbalanced Datasets in Credit Card Fraud Detection

Elena-Adriana Mînăstireanu - PhD student, "Alexandru Ioan Cuza" University of Iași, Doctoral School of Economics and Business Administration, Iași (RO); Gabriela Meșniță - "Alexandru Ioan Cuza" University of Iași, Faculty of Economics and Business Administration, Business Information Systems Department, Iași, Romania (RO)

Abstract

Fraudulent transactions of every type are a major concern in the financial industry because of the total amount of money lost every year. Manually analyzing fraudulent transactions is infeasible given the huge volume of data and the complexity of bank fraud in the digitization era. In this context, fraud detection can be addressed with machine-learning algorithms, owing to their ability to detect small anomalies in very large datasets. The problem that arises is that the datasets are highly unbalanced: non-fraudulent cases heavily outnumber fraudulent ones. In this paper, we present three ways of handling unbalanced datasets: resampling methods (undersampling and oversampling), cost-sensitive training, and classification algorithms (decision tree, random forest and Naïve Bayes). We also emphasize why the Receiver Operating Characteristic (ROC) curve should not be used on this type of dataset when measuring the performance of an algorithm. The experimental test was run on 890,977 banking transactions in order to observe the performance metrics of all three methods.
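
As a rough illustration of the techniques named in the abstract, the following Python sketch shows random undersampling, random oversampling, and inverse-frequency class weights of the kind often used for cost-sensitive training. It is not the authors' implementation: the function names, the toy label counts (990 legitimate, 10 fraudulent) and the confusion-matrix numbers are all made up for illustration. The last helper shows one commonly cited reason the ROC curve can look optimistic on unbalanced data, namely that the false positive rate is diluted by the large number of negatives while precision is not; this may or may not match the argument made in the paper itself.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until every class matches the minority size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    n_min = min(counts.values())
    keep = []
    for cls in counts:
        idx = [i for i, y in enumerate(labels) if y == cls]
        keep.extend(rng.sample(idx, n_min))  # sampling all n_min keeps the minority whole
    keep.sort()
    return [rows[i] for i in keep], [labels[i] for i in keep]


def random_oversample(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until every class matches the majority size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    n_max = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        idx = [i for i, y in enumerate(labels) if y == cls]
        for i in rng.choices(idx, k=n_max - n):  # draws with replacement
            out_rows.append(rows[i])
            out_labels.append(cls)
    return out_rows, out_labels


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n = len(labels)
    return {cls: n / (len(counts) * k) for cls, k in counts.items()}


def fpr_and_precision(tp, fp, tn, fn):
    """False positive rate (an ROC axis) and precision from confusion-matrix counts."""
    return fp / (fp + tn), tp / (tp + fp)


# Toy data (hypothetical): 990 legitimate (0) and 10 fraudulent (1) transactions.
rows = list(range(1000))
labels = [0] * 990 + [1] * 10

_, y_under = random_undersample(rows, labels)
_, y_over = random_oversample(rows, labels)
print(Counter(y_under))                 # both classes reduced to 10 rows
print(Counter(y_over))                  # both classes raised to 990 rows
print(balanced_class_weights(labels))   # fraud class weight is 50.0

# Hypothetical classifier result: 8 of 10 frauds caught, 99 false alarms.
fpr, precision = fpr_and_precision(tp=8, fp=99, tn=891, fn=2)
print(fpr, precision)  # FPR is only 0.1, yet fewer than 8% of alerts are real fraud
```

In this toy case the false positive rate of 0.1 would place the classifier in a favorable region of an ROC plot, while the precision shows that most flagged transactions are legitimate, which is why precision-recall based metrics are often preferred for data this skewed.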


(C) 2010-2025 EduSoft