Designing A Low Power LeNet Convolutional Neural Network Accelerator for FPGA in IoT Edge Computing

International Journal of Electrical and Electronics Engineering
© 2024 by SSRG - IJEEE Journal
Volume 11 Issue 2
Year of Publication: 2024
Authors: Alshima Alwali, Peter K. Kihato
How to Cite?

Alshima Alwali, Peter K. Kihato, "Designing A Low Power LeNet Convolutional Neural Network Accelerator for FPGA in IoT Edge Computing," SSRG International Journal of Electrical and Electronics Engineering, vol. 11, no. 2, pp. 98-106, 2024. Crossref, https://doi.org/10.14445/23488379/IJEEE-V11I2P111

Abstract:

In this paper, a Convolutional Neural Network (CNN) tailored for FPGA deployment is designed and implemented, targeting the Xilinx Artix XC7A200T FPGA. The method adapts the LeNet-1 model in a hardware description language; this model is chosen for its small size, which makes it suitable for edge computing devices. The architecture, referred to as ‘LeNet,’ uses a parameterized module structure that provides significant flexibility and adaptability. The design emphasizes a modular architecture and a diverse set of Processing Elements (PEs), which are essential for parallel processing of computationally demanding CNN workloads. The convolutional, pooling, and fully connected layers are customized to exploit the FPGA’s capabilities. Multiple filter banks are used for efficient input processing and feature extraction, while the pooling layers reduce feature-map dimensionality, improving robustness to input variation and lowering computational demands. The architecture stands out for its scalability and efficiency, employing five different processing units. The parameterization of the modules and their successful application to the MNIST dataset, a standard Machine Learning benchmark for handwritten digit recognition, further illustrate how the architecture can be adapted to other datasets and applications. The implementation on the Xilinx Artix XC7A200T FPGA achieved a power consumption of 1.775 W at 100 MHz, indicating that the design is energy-efficient and suitable for demanding applications in resource-constrained environments. This paper details the module design, parameterization, and integration methodologies employed in adapting the LeNet-1 model for FPGA.
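To illustrate the parameterized module structure described in the abstract: the paper does not publish its source code or state which hardware description language was used, so the sketch below is a minimal, hypothetical Verilog example of one parameterized multiply-accumulate (MAC) processing element of the general kind such a design would use. The module name, port names, and default widths are illustrative assumptions, not taken from the paper.

    // Hypothetical parameterized MAC processing element (illustrative only;
    // names and widths are assumptions, not the paper's actual modules).
    module mac_pe #(
        parameter DATA_WIDTH = 8,    // width of pixel and weight inputs (assumed)
        parameter ACC_WIDTH  = 24    // accumulator width (assumed)
    ) (
        input  wire                         clk,
        input  wire                         rst,
        input  wire                         en,
        input  wire signed [DATA_WIDTH-1:0] pixel,
        input  wire signed [DATA_WIDTH-1:0] weight,
        output reg  signed [ACC_WIDTH-1:0]  acc
    );
        // Accumulate one pixel*weight product per enabled clock cycle;
        // a synchronous reset clears the accumulator between output pixels.
        always @(posedge clk) begin
            if (rst)
                acc <= {ACC_WIDTH{1'b0}};
            else if (en)
                acc <= acc + pixel * weight;
        end
    endmodule

Instantiating such an element with different DATA_WIDTH and ACC_WIDTH values for the convolutional, pooling, and fully connected layers is one way parameterization of this sort can provide the flexibility and adaptability the paper describes.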

Keywords:

Convolutional Neural Networks, Edge computing, FPGA, LeNet-1, Performance analysis.
