Improving Classification of Fraudulent Sales

International Journal of Computer Science and Engineering
© 2018 by SSRG - IJCSE Journal
Volume 5 Issue 12
Year of Publication : 2018
Authors : Barry E. King

How to Cite?

Barry E. King, "Improving Classification of Fraudulent Sales," SSRG International Journal of Computer Science and Engineering, vol. 5, no. 12, pp. 16-17, 2018. Crossref, https://doi.org/10.14445/23488387/IJCSE-V5I12P104

Abstract:

This article presents an improved solution to classifying fraudulent sales. An original k-nearest neighbor solution for a dataset of more than fifteen thousand cases, eight percent of which were fraudulent, yielded a misclassification rate of 0.058. An improved solution using a boosted C5.0 algorithm yielded a misclassification rate of 0.038. The solution was then expanded to recognize that false positives (classifying a fraudulent sale as clean) were five times as costly as false negatives (classifying a clean sale as fraudulent). This expanded solution had a misclassification rate of 0.058 but lowered the misclassification cost by twenty-one percent.
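
The trade-off between misclassification rate and misclassification cost can be illustrated with a minimal sketch. It is not the article's code: the function names, the labels "ok" and "fraud", and the ten toy cases are hypothetical and chosen only for illustration. The only figure taken from the abstract is the 5-to-1 cost ratio for classifying a fraudulent sale as clean versus classifying a clean sale as fraudulent.

```python
# Illustrative sketch only: toy labels, not data or results from the article.

def misclassification_rate(y_true, y_pred):
    """Fraction of cases whose predicted label differs from the true label."""
    errors = sum(t != p for t, p in zip(y_true, y_pred))
    return errors / len(y_true)


def misclassification_cost(y_true, y_pred, cost_missed_fraud=5.0, cost_false_alarm=1.0):
    """Total cost when the two kinds of error are weighted differently.

    cost_missed_fraud: a fraudulent sale classified as clean (assumed 5x, per the abstract)
    cost_false_alarm:  a clean sale classified as fraudulent (assumed unit cost)
    """
    cost = 0.0
    for t, p in zip(y_true, y_pred):
        if t == "fraud" and p == "ok":
            cost += cost_missed_fraud
        elif t == "ok" and p == "fraud":
            cost += cost_false_alarm
    return cost


# Ten hypothetical sales: eight clean, two fraudulent.
y_true = ["ok"] * 8 + ["fraud"] * 2

# Classifier A makes one error: it misses one fraudulent sale.
pred_a = ["ok"] * 8 + ["fraud", "ok"]

# Classifier B makes two errors: it flags two clean sales but catches both frauds.
pred_b = ["ok"] * 6 + ["fraud"] * 2 + ["fraud"] * 2

rate_a = misclassification_rate(y_true, pred_a)
cost_a = misclassification_cost(y_true, pred_a)
rate_b = misclassification_rate(y_true, pred_b)
cost_b = misclassification_cost(y_true, pred_b)

print(f"A: rate={rate_a:.2f}, cost={cost_a:.1f}")
print(f"B: rate={rate_b:.2f}, cost={cost_b:.1f}")
```

In this toy setting classifier B has the higher misclassification rate but the lower total cost, which is the same kind of trade-off the abstract reports for the cost-sensitive solution.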

Keywords:

binary classification, machine learning, k-nearest neighbor, C5.0 algorithm

References:

[1] Murillo, J. P. (2016). Predicting fraudulent sales. [Online] https://rpubs.com/jpmurillo/fraudulentsales. 
[2] Lantz, B. (2015). Machine Learning with R, 2nd edition. Birmingham, UK: Packt.