Big Data Repositories

Bennett, E. O., and Elliot, S. J.

Received: 10 Jan 2026
Revised: 19 Feb 2026
Accepted: 09 Mar 2026
Published: 29 Mar 2026

Citation:

Bennett, E. O., and Elliot, S. J., "Big Data Repositories," International Journal of Computer Science and Engineering, vol. 13, no. 3, pp. 42-49, 2026. Crossref, https://doi.org/10.14445/23488387/IJCSE-V13I3P103

Abstract

The rapid growth of big data has made it difficult to extract meaningful and accurate information from large unstructured and semi-structured data collections. Classical extraction methods are computation-intensive, slow, and inflexible when applied to heterogeneous data sources. This study presents an optimized extraction framework based on generative models, namely generative adversarial networks (GANs) and variational autoencoders (VAEs), that scales to large datasets while maintaining accuracy and processing speed. The generative models transform latent representations of the inputs into a structured, analyzable format, which improves indexing and retrieval accuracy. The system was implemented in Python, with machine learning and data processing libraries used for data preprocessing, model training, and performance evaluation. Compared with existing MapReduce-based methods, the proposed method improved data extraction accuracy and search and retrieval effectiveness while reducing processing time.
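
The abstract does not give the model architecture or the loss used, so the following is only a minimal sketch of the two pieces that define a standard variational autoencoder: reparameterized sampling of the latent code and the closed-form KL term. It is not the authors' implementation. The function names, the diagonal-Gaussian encoder output (`mu`, `log_var`), and the squared-error reconstruction term are all assumptions made for illustration, and only the Python standard library is used.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), one draw per latent dimension.

    mu and log_var are assumed to be the encoder outputs for a single record.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def squared_error(x, x_hat):
    """Reconstruction term; squared error is an assumed, common choice."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def negative_elbo(x, x_hat, mu, log_var):
    """Per-record training loss: reconstruction error plus KL regularizer."""
    return squared_error(x, x_hat) + kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)             # fixed seed for repeatability
    mu, log_var = [0.3, -1.2], [-0.5, 0.1]
    z = reparameterize(mu, log_var, rng)
    print("latent code:", z)
    print("KL term:", kl_to_standard_normal(mu, log_var))
```

In a retrieval setting such as the one the abstract describes, the latent code (or simply `mu`) for each record would serve as the compact representation that gets indexed; how the framework actually builds its index is not stated in the abstract.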

Keywords

Big Data Repositories, Data Extraction, Generative Algorithms, GANs, VAEs, Optimization.
