Implementation of a Data Lakehouse for Efficient Recovery and Processing of Massive Data

International Journal of Computer Science and Engineering
© 2023 by SSRG - IJCSE Journal
Volume 10 Issue 11
Year of Publication: 2023
Authors: N'GUESSAN Behou Gérard, ASSIE Brou Ida, WAMBA Samuel Fosso, ACHIEPO Odilon Yapo Melaine

How to Cite?

N'GUESSAN Behou Gérard, ASSIE Brou Ida, WAMBA Samuel Fosso, ACHIEPO Odilon Yapo Melaine, "Implementation of a Data Lakehouse for Efficient Recovery and Processing of Massive Data," SSRG International Journal of Computer Science and Engineering, vol. 10, no. 11, pp. 7-12, 2023. Crossref, https://doi.org/10.14445/23488387/IJCSE-V10I11P102

Abstract:

In Big Data, a recurring problem is processing data quickly enough for its recovery and exploitation, particularly on dashboards. Excessive latency slows decision-making and makes analytical solutions inefficient to use. The objective of this article is to set up a Big Data architecture that accelerates queries and processing on massive data, delivering high performance for real-time or near-real-time applications in particular, regardless of the amount of data available and the rate at which new data is produced. The architecture is based on object storage, data virtualization, and Data Lakehouse technologies. More specifically, it relies on MinIO and Dremio, which provide optimization mechanisms useful for achieving these objectives, particularly Dremio's reflection mechanism. Combining these technologies made it possible to build very low-latency dashboards on global COVID-19 data.
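
The reflection mechanism mentioned above can be pictured as a summary that is precomputed once and then used to answer dashboard queries without rescanning the raw data. The following is a minimal pure-Python sketch of that idea only; it does not use Dremio's or MinIO's APIs, and the rows, function names, and country codes are hypothetical values invented for illustration.

```python
from collections import defaultdict

# Hypothetical raw rows (country, date, new_cases); invented for illustration,
# not taken from the article's COVID-19 dataset.
RAW_ROWS = [
    ("CI", "2021-01-01", 120),
    ("CI", "2021-01-02", 95),
    ("FR", "2021-01-01", 2000),
    ("FR", "2021-01-02", 1800),
]


def total_by_scan(rows, country):
    """Answer the query by scanning every raw row; cost grows with data size."""
    return sum(cases for c, _, cases in rows if c == country)


def build_aggregate(rows):
    """Precompute per-country totals once (the idea behind an aggregated summary)."""
    totals = defaultdict(int)
    for country, _, cases in rows:
        totals[country] += cases
    return dict(totals)


def total_from_aggregate(aggregate, country):
    """Answer the same query with a single lookup in the precomputed summary."""
    return aggregate.get(country, 0)


aggregate = build_aggregate(RAW_ROWS)
```

Both paths return the same totals; the second trades extra storage and a refresh step for lookups whose cost does not depend on the number of raw rows, which is the trade-off a low-latency dashboard relies on.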

Keywords:

Big data, Object storage, Data virtualization, Data lakehouse, Technology.
