Improving Classification of Fraudulent Sales

Barry E. King

Citation:

Barry E. King, "Improving Classification of Fraudulent Sales," International Journal of Computer Science and Engineering, vol. 5, no. 12, pp. 16-17, 2018. Crossref, https://doi.org/10.14445/23488387/IJCSE-V5I12P104

Abstract

This article presents an improved solution for classifying fraudulent sales. An original k-nearest neighbor solution, applied to a dataset of more than fifteen thousand cases of which eight percent were fraudulent, yielded a misclassification rate of 0.058. An improved solution using a boosted C5.0 algorithm yielded a misclassification rate of 0.038. The solution was then expanded to recognize that false positives (classifying a fraudulent sale as clean) were five times as costly as false negatives (classifying a clean sale as fraudulent). This expanded solution had a misclassification rate of 0.058, the same as the original k-nearest neighbor solution, but lowered the misclassification cost by twenty-one percent.
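
The distinction between misclassification rate and misclassification cost can be illustrated with a minimal Python sketch. It is not the article's implementation. The error counts below are hypothetical and chosen only to show the arithmetic, and the names ErrorCounts, fraud_as_clean, and clean_as_fraud are assumptions introduced here. Only the 5-to-1 cost ratio comes from the abstract. The two error types are named by what went wrong, so the sketch does not depend on which class is labeled positive.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    """Error tallies for one classifier on one test set (hypothetical values)."""
    fraud_as_clean: int   # fraudulent sales classified as clean
    clean_as_fraud: int   # clean sales classified as fraudulent
    total: int            # number of cases scored

    def rate(self) -> float:
        """Misclassification rate: all errors count equally."""
        return (self.fraud_as_clean + self.clean_as_fraud) / self.total

    def cost(self, fraud_as_clean_cost: float = 5.0,
             clean_as_fraud_cost: float = 1.0) -> float:
        """Misclassification cost: each error weighted by its unit cost.

        The default 5:1 ratio follows the abstract; the unit is arbitrary.
        """
        return (self.fraud_as_clean * fraud_as_clean_cost
                + self.clean_as_fraud * clean_as_fraud_cost)


# Hypothetical counts, NOT results from the article.
unweighted = ErrorCounts(fraud_as_clean=30, clean_as_fraud=10, total=1000)
cost_aware = ErrorCounts(fraud_as_clean=12, clean_as_fraud=48, total=1000)

for name, counts in [("unweighted", unweighted), ("cost_aware", cost_aware)]:
    print(f"{name}: rate={counts.rate():.3f} cost={counts.cost():.0f}")
```

With these made-up counts, the cost-aware classifier makes more errors overall but fewer of the expensive kind, so its rate is higher while its cost is lower. This is the same qualitative trade-off the abstract reports.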

Keywords

binary classification, machine learning, k-nearest neighbor, C5.0 algorithm

References

[1] Murillo, J. P. (2016). Predicting fraudulent sales. [Online] https://rpubs.com/jpmurillo/fraudulentsales.
[2] Lantz, B. (2015). Machine Learning with R, 2nd edition. Birmingham, UK: Packt.