Towards Intelligent Data Retention Recommendations in DevOps Using Elasticsearch and ML
| International Journal of Computer Science and Engineering |
| © 2025 by SSRG - IJCSE Journal |
| Volume 12 Issue 8 |
| Year of Publication : 2025 |
| Authors : Govind Singh Rawat |
How to Cite?
Govind Singh Rawat, "Towards Intelligent Data Retention Recommendations in DevOps Using Elasticsearch and ML," SSRG International Journal of Computer Science and Engineering, vol. 12, no. 8, pp. 1-12, 2025. Crossref, https://doi.org/10.14445/23488387/IJCSE-V12I8P101
Abstract:
DevOps teams face an ever-growing challenge in managing log and metrics data: deciding how long to retain data so that operational value is balanced against storage costs and performance constraints. Traditional static retention policies struggle to cope with explosive data growth and evolving compliance requirements. In this work, we propose an intelligent data retention recommendation system that leverages Elasticsearch’s rich monitoring data and Machine Learning (ML) to dynamically suggest optimal retention periods for indices. Our approach collects metrics on query load, storage use, and index lifecycle policies from a live Elasticsearch cluster and trains an ML model to predict the retention duration that minimizes cost while preserving necessary data availability. We present a framework in which the model learns usage patterns and system constraints and recommends when to tier or delete indices. Preliminary evaluations suggest that the ML-driven approach can reduce storage costs and cluster strain by avoiding over-retention of seldom-accessed data, without compromising query performance or compliance. This paper details related work in intelligent log management, the theoretical underpinnings of our approach, the design of our ML-based retention recommender, and experimental results in a DevOps context. We conclude with insights into the benefits of adaptive data retention and discuss future improvements for integrating such systems into automated DevOps pipelines.
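The abstract does not specify the model or its features, so the following is only a minimal illustrative sketch of the general idea, not the author's method. It assumes that query load on an index decays roughly exponentially with index age, fits that decay by least squares, and recommends the age at which predicted access falls below a threshold, clamped to a compliance minimum and a hard maximum. The function names, the threshold, and the decay assumption are all hypothetical.

```python
import math


def fit_access_decay(ages_days, queries_per_day):
    """Least-squares fit of log(q) = a + b * age; returns (a, b).

    Assumption (not from the paper): access decays exponentially with age,
    and every observed query rate is strictly positive.
    """
    ys = [math.log(q) for q in queries_per_day]
    n = len(ages_days)
    mean_x = sum(ages_days) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in ages_days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages_days, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def recommend_retention_days(a, b, min_queries_per_day,
                             compliance_min_days, max_days):
    """Return the age (in days) at which predicted access drops below the
    threshold, clamped to [compliance_min_days, max_days]."""
    if b >= 0:
        # Access is not decaying, so keep data up to the hard maximum.
        return max_days
    # Solve a + b * age = log(threshold) for age.
    age = (math.log(min_queries_per_day) - a) / b
    return int(min(max(math.ceil(age), compliance_min_days), max_days))
```

In a real deployment the inputs would come from the cluster's monitoring data rather than hand-built lists, and the compliance floor would be set by policy; here both are plain arguments so the trade-off between access decay and mandatory retention stays visible.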
Keywords:
Data Retention, DevOps, Elasticsearch, Predictive Analytics, Log Management.
References:
[1] Len Bass, Ingo Weber, and Liming Zhu, DevOps: A Software Architect’s Perspective, 2nd Ed., Addison-Wesley, 2015.
[2] Regulation (EU) 2016/679, General Data Protection Regulation, 2016. [Online]. Available: https://gdpr-info.eu/
[3] Riley Peronto, Four Steps to Reduce Log Data Costs: A Practical Guide, Chronosphere, 2024. [Online]. Available: https://chronosphere.io/learn/steps-to-reduce-log-data-costs/
[4] Elastic, Index Lifecycle Management Policy. [Online]. Available: https://www.elastic.co/docs/manage-data/lifecycle/index-lifecycle-management
[5] Qian Cheng et al., “AI for IT Operations (AIOps) on Cloud Platforms: Reviews, Opportunities and Challenges,” arXiv preprint, pp. 1-34, 2023.
[6] SearchInform, Log Retention: Best Practices and Importance for Compliance. [Online]. Available: https://searchinform.com/articles/cybersecurity/measures/log-management/log-retention/
[7] Elastic, Disk based Shard Allocation. [Online]. Available: https://www.elastic.co/guide/en/elasticsearch/reference/7.12/modules-cluster.html#disk-based-shard-allocation
[8] Swathi Chundru, and Lakshmi Narasimha Raju Mudunuri, “Developing Sustainable Data Retention Policies: A Machine Learning Approach to Intelligent Data Lifecycle Management,” Driving Business Success through Eco-Friendly Strategies, pp. 93-114, 2025.
[9] J. Bamini et al., “Enhancing Employee Retention with AI: Predictive Analytics and Decision Support Systems,” 2025 International Conference on Automation and Computation (AUTOCOM), Dehradun, India, pp. 1581-1585, 2025.
[10] Bharath Thandalam Rajasekaran, and Neeraj Saxena, “Machine Learning Driven Data Management in Hybrid Cloud Storage,” International Journal of Creative Research Thoughts, vol. 13, no. 2, pp. 1-14, 2025.
[11] Valeriy Khakhutskyy, Explaining anomalies detected by Elastic Machine Learning, Elastic Blog, 2023. [Online]. Available: https://www.elastic.co/blog/explaining-anomalies-detected-by-elastic-machine-learning
[12] Renuka Gavli et al., “Log Analysis: Understanding and Enhancing System Monitoring,” International Journal of Advanced Research in Computer and Communication Engineering, vol. 14, no. 6, pp. 236-240, 2025.
[13] J. Li et al., “Managing Data Retention Policies at Scale,” IEEE Transactions on Network and Service Management, vol. 9, no. 4, pp. 393-406, 2012.
[14] Infobelt, Accelerating Archiving and Data Retention with AI, 2025. [Online]. Available: https://infobelt.com/accelerating-archiving-and-data-retention-with-ai
