Research Article | Open Access
Volume 13 | Issue 5 | Year 2026 | Article Id. IJCE-V13I5P109 | DOI : https://doi.org/10.14445/23488352/IJCE-V13I5P109

Machine Learning-Based Surrogate Estimation of Drinking Water Quality Index under Partial Observability: Evidence from Anantapur


Shruthi R, Ganesh Prasanna S, Devanahalli Nagaraj Shilpa, Gauri Patil, B.P Deepthi, Prashant Sunagar, Nilesh Kumar Meshram

Received: 17 Feb 2026 | Revised: 25 Mar 2026 | Accepted: 10 Apr 2026 | Published: 29 May 2026

Citation :

Shruthi R, Ganesh Prasanna S, Devanahalli Nagaraj Shilpa, Gauri Patil, B.P Deepthi, Prashant Sunagar, Nilesh Kumar Meshram, "Machine Learning-Based Surrogate Estimation of Drinking Water Quality Index under Partial Observability: Evidence from Anantapur," International Journal of Civil Engineering, vol. 13, no. 5, pp. 111-123, 2026. Crossref, https://doi.org/10.14445/23488352/IJCE-V13I5P109

Abstract

Estimating water quality in operational settings is essential for the sustainable management of freshwater resources, especially in areas that are hydrochemically degraded by human activity. We present a data-driven approach for estimating the Water Quality Index (WQI) in Anantapur district, Andhra Pradesh, India, using machine learning models trained on hydrochemical data. Although WQI is defined as a weighted arithmetic function of such measurements, computing it in practice is hindered by missing values, delays in laboratory analysis, and irregular sampling frequencies. To overcome this, we used calculated WQI values as the training target for surrogate models that predict WQI from parameters frequently measured at the time of sampling (e.g., pH, total hardness, alkalinity, turbidity, sodium, total dissolved solids, and electrical conductivity). We compared several models, including support vector regression, linear regression, regression trees, Artificial Neural Networks (ANN), and AdaBoost. Each model was trained on an 80% subset of the data using cross-validation and evaluated on a separate, unseen 20% test subset. Support vector and linear models performed well, and AdaBoost yielded the best results, explaining more than 90% of the variance in WQI values. Ensemble models were the most robust under data-limited operating conditions.
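As a rough illustration of why direct WQI calculation breaks down under partial observability, the sketch below computes a weighted arithmetic index in its commonly used form (unit weights inversely proportional to each permissible limit). The abstract does not give the exact formula, limits, or ideal values used in the study, so the `STANDARDS` table, the parameter names, and the helper functions here are all assumptions made for illustration.

```python
from typing import Dict, Mapping, Optional, Tuple

# Hypothetical (permissible limit, ideal value) pairs, for illustration only.
# These are NOT taken from the paper.
STANDARDS: Dict[str, Tuple[float, float]] = {
    "pH": (8.5, 7.0),
    "total_hardness": (300.0, 0.0),  # mg/L
    "alkalinity": (200.0, 0.0),      # mg/L
    "turbidity": (5.0, 0.0),         # NTU
    "tds": (500.0, 0.0),             # mg/L
}


def unit_weights(standards: Mapping[str, Tuple[float, float]]) -> Dict[str, float]:
    """Weights inversely proportional to each limit, scaled to sum to 1."""
    k = 1.0 / sum(1.0 / limit for limit, _ in standards.values())
    return {name: k / limit for name, (limit, _) in standards.items()}


def wqi(
    sample: Mapping[str, Optional[float]],
    standards: Mapping[str, Tuple[float, float]] = STANDARDS,
) -> Optional[float]:
    """Weighted arithmetic WQI, or None if any required parameter is missing."""
    weights = unit_weights(standards)
    weighted_sum = 0.0
    for name, (limit, ideal) in standards.items():
        value = sample.get(name)
        if value is None:
            # One missing measurement makes the direct index undefined.
            return None
        # Quality rating: 0 at the ideal value, 100 at the permissible limit.
        rating = 100.0 * (value - ideal) / (limit - ideal)
        weighted_sum += weights[name] * rating
    return weighted_sum / sum(weights.values())


# A sample with every parameter at its limit scores approximately 100.
at_limit = {name: limit for name, (limit, _) in STANDARDS.items()}
complete_score = wqi(at_limit)

# Dropping a single parameter leaves the direct calculation undefined.
partial = dict(at_limit)
partial["turbidity"] = None
partial_score = wqi(partial)
```

In this sketch a single missing measurement makes the direct calculation return `None`. That is the gap a surrogate model trained on complete historical records is meant to fill, by predicting WQI from whichever parameters are available at sampling time.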

Keywords

Water Quality Index (WQI), Machine Learning, Ensemble Learning (AdaBoost), Hydrochemical Parameters, Groundwater Quality, Sustainable Water Resource Management.
