Daily performance comparison of Regression-Based and AI models

The corrected values produced by the regression-based model are visually compared with the actual rainfall data in Appendix C, Fig. C.1. The blue line shows the actual recorded rainfall, while the red line shows the model’s corrected readings. The comparison shows that the regression-based method reasonably represents the overall trend of rainfall fluctuations over time. However, significant discrepancies are noticeable during periods of intense rainfall, when the model tends to either overestimate or underestimate peak values. These deviations suggest that the regression-based approach struggles to capture the highly dynamic and nonlinear nature of rainfall patterns, which are influenced by a variety of meteorological and environmental factors, and they underscore the model’s performance limitations. Typical regression techniques may therefore be insufficient for estimating daily rainfall fluctuations, especially in regions with high rainfall variability.

The differences between the regression-based model predictions and actual rainfall values are shown in Appendix C, Fig. C.2. The red line shows the difference between the observed rainfall and the rainfall corrected by the regression model, while the blue line shows the initial difference between the observed rainfall and the baseline estimate. The figure shows marked variations, particularly during periods of intense rainfall, when the model alternates between overpredicting and underpredicting. Negative residuals indicate underprediction, and large positive residuals indicate that the model overestimates rainfall. This pattern implies that the accuracy of the regression-based model varies with rainfall intensity: it performs better during moderate rainfall and shows higher error under extreme conditions. The shifting sign of the errors highlights the difficulty of relying solely on regression-based methods for reliable daily rainfall forecasts, because the method does not fully account for complex rainfall dynamics and therefore produces inconsistent predictions.
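The sign convention described above can be made concrete with a small sketch. It assumes the residual is defined as corrected minus observed rainfall, so that positive values mean overestimation and negative values mean underestimation. The function name and the sample values are hypothetical and are not taken from the study's data.

```python
def residuals(observed, corrected):
    """Return corrected - observed for each day (mm).

    Assumed convention: positive = overestimate, negative = underestimate.
    """
    if len(observed) != len(corrected):
        raise ValueError("series must have the same length")
    return [c - o for o, c in zip(observed, corrected)]


# Hypothetical daily rainfall values in mm, for illustration only.
observed = [0.0, 12.0, 85.0, 40.0]
corrected = [1.5, 10.0, 60.0, 52.0]

res = residuals(observed, corrected)
labels = ["over" if r > 0 else "under" if r < 0 else "exact" for r in res]
print(list(zip(res, labels)))
```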

A scatter plot (Fig. 6) compares recorded rainfall measurements with the adjusted values from the regression model. The distribution of points shows the relationship between corrected and actual values, while a dotted trend line illustrates overall model performance. An R-squared value of 0.7432 indicates a moderate correlation, suggesting that the model captures general trends but lacks high accuracy. The dispersion of points at higher rainfall levels reveals the model’s limits in predicting extreme events, with deviations and outliers showing reduced accuracy as rainfall intensity increases. These results suggest that more advanced techniques, such as deep learning models, are needed for better daily rainfall predictions.
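As a rough illustration of what a regression-based correction involves, the sketch below fits an ordinary least squares line that maps a baseline estimate to observed rainfall and then applies it to produce corrected values. It assumes a simple one-variable linear form, which the text does not confirm; the function name and all numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical values in mm/day, for illustration only.
baseline = [2.0, 10.0, 30.0, 55.0, 90.0]   # uncorrected estimates
observed = [1.0, 14.0, 28.0, 70.0, 95.0]   # gauge observations

intercept, slope = fit_line(baseline, observed)

# Apply the fitted line to the baseline to obtain corrected rainfall.
corrected = [intercept + slope * b for b in baseline]
print(round(intercept, 3), round(slope, 3))
```

Because the fitted relationship is a single straight line, the same slope is applied to light and heavy rainfall alike, which is one way to see why such a correction can miss peak values.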

Fig. 6 Scatter plot between actual rainfall and corrected rainfall using the regression-based model with a daily time increment.

The performance of regression-based correction algorithms deteriorates under extreme conditions because of their poor capacity to reflect non-linear and dynamic patterns, despite their considerable ability to capture overall rainfall trends. This limitation is visible in the large variances recorded during high-intensity rainfall events, as shown in Appendix C, Figs. C.1 and C.2. Conventional techniques like linear regression frequently depend on static relationships, which makes them less sensitive to sudden variations in rainfall patterns, especially in areas with significant variability. On the other hand, the machine learning framework used, especially deep learning models like LSTM, provides definite benefits when it comes to managing intricate temporal dependencies, non-linear interactions, and abrupt variations in rainfall intensity. The AI-driven model, in contrast to static regression approaches, learns complex correlations over time and dynamically adjusts to changing data patterns. Prediction accuracy and robustness are improved across a variety of rainfall conditions by this flexibility. The enhanced capacity to record both moderate and intense rainfall events, as illustrated in Fig. 6, highlights the advantages of the AI-based strategy over traditional statistical downscaling and bias correction techniques.

The Long Short-Term Memory (LSTM) model provided the most accurate results, as shown in the highlighted plot in Appendix C, Fig. C.3(a). The corrected rainfall predictions produced by the model closely align with the observed rainfall, showcasing its ability to consistently minimize errors across various rainfall intensities. This alignment is particularly evident during periods of high-intensity rainfall, where the LSTM model effectively captures the peaks and fluctuations in rainfall. The LSTM model architecture, specifically designed to learn patterns from sequential data, enables it to account for temporal dependencies such as daily and seasonal variations. Additionally, its ability to adapt to the non-linear nature of rainfall data allows it to accurately represent complex rainfall patterns. These characteristics make the LSTM model the most reliable and robust option for daily rainfall forecasting.
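Learning from sequential data usually means presenting the model with a window of past values and asking it to predict the next one. The sketch below shows only that data preparation step, not the network itself. The study does not state its window length, so `lookback` and the sample series are hypothetical.

```python
def make_windows(series, lookback):
    """Split a series into (input window, next-value target) pairs."""
    if lookback < 1 or lookback >= len(series):
        raise ValueError("lookback must be in [1, len(series) - 1]")
    inputs, targets = [], []
    for start in range(len(series) - lookback):
        # The window holds `lookback` consecutive days ...
        inputs.append(series[start:start + lookback])
        # ... and the target is the day immediately after the window.
        targets.append(series[start + lookback])
    return inputs, targets


# Hypothetical daily rainfall in mm, for illustration only.
daily_rain = [0.0, 3.2, 18.5, 7.1, 0.0, 42.0]
X, y = make_windows(daily_rain, lookback=3)
print(X, y)
```

Each row of `X` would be fed to a sequence model as one training sample, with the matching entry of `y` as its target.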

The Exponential Gaussian Process Regression (Exponential GPR) model demonstrated moderate success in forecasting daily rainfall. Its corrected rainfall predictions align well with the observed data during periods of moderate rainfall, indicating its ability to capture general rainfall trends (Appendix C, Fig. C.3(b)). However, the model faces challenges in predicting extreme rainfall events, leading to noticeable deviations between the predicted and actual rainfall during these periods. This is particularly evident in high-intensity rainfall scenarios, where the model underestimates the magnitude of the rainfall. While the Exponential GPR model is well-suited for identifying broader rainfall patterns, its inability to fully model the complexities and non-linear characteristics of rainfall data reduces its overall accuracy and reliability when compared to the LSTM model.
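For readers unfamiliar with the term, the sketch below assumes that "Exponential GPR" refers to Gaussian process regression with the exponential covariance function, in which covariance decays exponentially with the distance between inputs. The parameter names `sigma_f` and `length_scale` and their default values are illustrative assumptions, not settings reported in the study.

```python
import math


def exponential_kernel(x1, x2, sigma_f=1.0, length_scale=1.0):
    """k(x1, x2) = sigma_f^2 * exp(-|x1 - x2| / length_scale), scalar inputs."""
    return sigma_f ** 2 * math.exp(-abs(x1 - x2) / length_scale)


# Covariance matrix for three consecutive days (hypothetical time index).
days = [0.0, 1.0, 2.0]
cov = [[exponential_kernel(a, b, sigma_f=2.0, length_scale=1.5) for b in days]
       for a in days]

for row in cov:
    print([round(v, 3) for v in row])
```

The diagonal equals `sigma_f ** 2`, and off-diagonal entries shrink as days move further apart, so nearby days are treated as more strongly related than distant ones.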

The Efficient Linear Support Vector Machine (ELSVM) model exhibited the lowest performance among the three models. Its corrected rainfall predictions show significant discrepancies from the observed rainfall, particularly during high-intensity events. Although the model captures some general trends, its linear nature limits its ability to represent the non-linear and highly variable behavior of rainfall. This shortcoming results in the model’s inability to forecast extreme rainfall with precision, leading to higher error margins. Furthermore, the ELSVM model struggles to maintain consistency in capturing daily rainfall variability, making it less effective for handling the complexities of rainfall dynamics. As a result, its reliability for daily rainfall forecasting is notably lower than that of the other models, as shown in Appendix C, Fig. C.3(c).

The analysis reveals that each model (LSTM, Exponential GPR, and ELSVM) has unique strengths and weaknesses when applied to daily rainfall correction. All three models perform well at capturing moderate rainfall events, effectively aligning corrected rainfall with observed values under typical daily conditions. However, when it comes to extreme rainfall, each model encounters specific challenges. The LSTM model provides consistent performance across regular rainfall levels but exhibits occasional deviations for high-intensity events, suggesting that it may require further calibration to capture such extremes accurately. The Exponential GPR model, while adept at general trends, shows greater variability in extreme cases, indicating a need for enhanced robustness to handle sharp peaks. The ELSVM model, though reliable for moderate rainfall, also faces limitations in handling the nonlinear nature of intense rainfall events.

Although each model is competent in correcting moderate daily rainfall data, the LSTM model appears to offer the most balanced performance overall, making it the most suitable choice for daily rainfall forecasting in this scenario. Its ability to closely track observed data under typical conditions, coupled with relatively fewer deviations compared to the other models, suggests that the LSTM model is the most reliable for capturing daily rainfall patterns. However, for applications requiring high accuracy in extreme rainfall forecasting, additional model adjustments or the incorporation of ensemble techniques may further enhance performance across all models.

For daily rainfall forecasting, incorporating data fusion techniques within LSTM networks has proven to be more effective than GPR and ELSVM. LSTM efficiently captures temporal relationships in rainfall data, achieving a remarkably low MSE of 0.01, which is substantially lower than Exponential GPR (240.46) and ELSVM (328.74). It also records the lowest RMSE of 0.09, highlighting its precision and robustness in prediction.

R², MAE, MSE, and RMSE are the primary metrics used to evaluate the performance of the regression-based model and the AI models. The regression-based model demonstrates substantial correlation but higher error values (MAE = 8.98, MSE = 80.61, RMSE = 8.98) and a lower R² of 0.74, indicating lower predictive accuracy (Table 1).
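The four metrics can be computed from paired actual and predicted values as in the minimal sketch below. It assumes R² is the coefficient of determination (1 minus the ratio of residual to total sum of squares); the study may instead report the squared correlation of a trend line. The function name and sample values are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and R^2 for two equal-length series."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    ss_res = sum(e * e for e in errors)
    # Assumed definition: coefficient of determination.
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical rainfall values in mm, for illustration only.
actual = [0.0, 10.0, 20.0, 30.0]
predicted = [2.0, 8.0, 22.0, 28.0]

metrics = regression_metrics(actual, predicted)
print(metrics)
```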

Although ELSVM achieves a slightly lower MAE of 7.74 compared to Exponential GPR (9.03), its higher MSE and RMSE values indicate a greater susceptibility to extreme deviations, which affects its reliability. LSTM attains the highest R² at 0.99, outperforming Exponential GPR (0.79) and ELSVM (0.78). This near-perfect R² value indicates that LSTM explains nearly all of the variation in the observed rainfall data, further demonstrating its strong predictive capabilities.
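A model can have a lower MAE yet a higher RMSE than another because squaring weights large errors more heavily. The sketch below illustrates this with two hypothetical error series: one with steady moderate errors and one with mostly small errors plus a single large miss. The values are invented for illustration and do not come from the study.

```python
import math


def mae(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root mean squared error; penalizes large errors more than MAE."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical prediction errors in mm.
steady = [8.0, -8.0, 8.0, -8.0]   # consistently moderate errors
spiky = [2.0, -2.0, 2.0, 22.0]    # small errors plus one large miss

for name, errs in (("steady", steady), ("spiky", spiky)):
    print(name, mae(errs), round(rmse(errs), 2))
```

Here the spiky series has the lower MAE but the higher RMSE, the same pattern the text describes for a model that is more susceptible to extreme deviations.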

Integrating data fusion within the LSTM framework enables seamless incorporation of past rainfall patterns, enhancing its ability to recognize complex trends and improve forecast accuracy. By minimizing errors while maintaining high explanatory power, LSTM stands out as the most reliable model for daily rainfall forecasting (Table 1).

Table 1 Daily scenario performance metrics.

Compared to regression-based corrections, the LSTM-based methodology in this work has substantial advantages. Traditional approaches frequently presume fixed statistical correlations and find it difficult to account for the intricacies of rainfall behavior, which is intrinsically dynamic and non-linear, particularly in the context of changing climate conditions. In comparison, the LSTM model is better able to represent complex rainfall behavior since it records long-term dependencies and is naturally adapted for sequential data. This comparison demonstrates how deep learning and data fusion techniques provide better prediction accuracy and robustness, even though conventional bias-correction techniques remain helpful baselines. The LSTM framework not only corrects for biases but also learns and evolves in response to data, making it a viable and adaptable solution for daily rainfall forecasting in complex and changing meteorological conditions.

3-day scenario with model performance

The Long Short-Term Memory (LSTM) model remains a powerful tool for rainfall forecasting, showcasing strong capabilities in capturing both short-term variations and long-term trends (Appendix C, Fig. C.4(a)). Its recurrent neural network structure enables it to learn temporal dependencies, making it well-suited for sequential time-series data. By incorporating data fusion, the model effectively integrates multiple data sources, allowing it to adapt to different rainfall intensities, including extreme events that are often difficult to predict. This adaptability makes LSTM valuable for practical applications like flood risk management, where precise and timely predictions are crucial. However, while LSTM offers several advantages, the results indicate that Exponential GPR delivered better overall performance, particularly in terms of aligning corrected rainfall estimates with actual observations.

The Exponential GPR model demonstrated higher accuracy than LSTM, particularly in handling both stable rainfall conditions and extreme events with greater precision. Its probabilistic approach provided more consistent predictions across various rainfall levels, reducing errors in corrected forecasts, as shown in Appendix C, Fig. C.4(b). Although LSTM is effective due to its capability to process sequential data and identify patterns over time, Exponential GPR showed superior numerical accuracy, indicating a more robust forecasting model. However, LSTM still has strengths in adapting to complex temporal relationships, making it a viable option despite Exponential GPR’s stronger numerical performance.

The Efficient Linear SVM (ELSVM) model demonstrated the weakest performance, particularly in forecasting moderate to heavy rainfall, as shown in Appendix C, Fig. C.4(c). Its linear methodology restricted its capacity to capture the non-linear characteristics of rainfall data, leading to predictions that frequently diverged from observed values. Despite incorporating data fusion techniques, the model encountered difficulties in accurately representing abrupt changes in rainfall patterns, rendering it less effective for precise forecasting. While Exponential GPR achieved the highest accuracy, the LSTM model remains a highly effective approach due to its capability to identify sequential dependencies and adapt to changing rainfall patterns, underscoring its significance in advanced forecasting applications.

The Exponential GPR model demonstrates superior predictive accuracy compared to the LSTM and Efficient Linear Support Vector Machine (ELSVM) models, particularly in minimizing overall error. The Exponential GPR model achieves the lowest MSE of 137.11 and the lowest RMSE of 11.71, indicating its precision and stability in predictions. Additionally, it records the highest R² at 0.89, explaining 89% of the variance in actual rainfall data. These findings underscore its effectiveness in modelling non-linear relationships and adapting to varying rainfall patterns, establishing it as the most reliable method for reducing prediction errors (Table 2).

While Exponential GPR demonstrates superior overall performance, the LSTM model remains highly competitive due to its ability to capture complex temporal dependencies in rainfall data. The LSTM model achieves a relatively low MSE of 179.06 and an RMSE of 13.38. Although these values are higher than those of Exponential GPR, they still indicate strong predictive capabilities. Furthermore, the LSTM model records the lowest MAE at 5.20, significantly outperforming the MAE of 9.46 observed for Exponential GPR. This lower MAE value suggests that the LSTM produces predictions with smaller average deviations from the actual observed values, which is advantageous in scenarios where minimizing small-scale errors is crucial. However, the lower R² value of 0.73 for the LSTM model indicates that it explains a smaller proportion of variance in the observed rainfall data compared to Exponential GPR.

The ELSVM model, while performing adequately, does not achieve the same level of accuracy as the Exponential GPR or LSTM models. With an MSE of 179.58 and an RMSE of 13.40, it demonstrates larger errors, especially in predicting higher rainfall intensities. Although it achieves an R² value of 0.79, which is relatively close to the other models, its higher RMSE indicates difficulty in handling extreme fluctuations in rainfall, resulting in less precise forecasts. Despite having a lower MAE of 6.59 compared to Exponential GPR, its higher overall error values suggest that it is less reliable for accurate long-term predictions.
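Since RMSE is the square root of MSE, the reported 3-day values can be cross-checked with a few lines of arithmetic. The sketch below uses only the MSE and RMSE figures quoted in the text; the helper name and the tolerance (half a unit in the second decimal place, to allow for rounding) are illustrative choices.

```python
import math

# (MSE, RMSE) pairs as quoted in the text for the 3-day scenario.
reported = {
    "Exponential GPR": (137.11, 11.71),
    "LSTM": (179.06, 13.38),
    "ELSVM": (179.58, 13.40),
}


def rmse_matches(mse, rmse, tol=0.005):
    """True if sqrt(mse) agrees with rmse to within rounding tolerance."""
    return abs(math.sqrt(mse) - rmse) <= tol


checks = {name: rmse_matches(mse, rmse) for name, (mse, rmse) in reported.items()}
print(checks)
```

All three pairs are consistent with RMSE being the square root of MSE at two-decimal precision.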

Table 2 3-day scenario performance metrics.

Weekly scenario with model performance

The weekly rainfall forecasting results demonstrate the effectiveness of the Exponential Gaussian Process Regression (GPR) model, as shown in Appendix C, Fig. C.5(b). The corrected rainfall predictions closely match the actual observed values, indicating the model’s ability to capture weekly rainfall patterns accurately. The GPR model’s success is due to its probabilistic framework, which allows it to handle uncertainties in rainfall data effectively. By incorporating data fusion techniques, the model adapts to varying rainfall intensities. Although minor discrepancies are observed during extreme rainfall events, the overall performance of the GPR model is more consistent and precise compared to the other two models, making it a suitable choice for weekly rainfall forecasting.

The LSTM model, depicted in Appendix C, Fig. C.5(a), demonstrates a robust capability to identify trends in sequential rainfall data. As a recurrent neural network-based model, LSTM utilizes its memory mechanisms to capture temporal dependencies within the dataset. While the model performs effectively in predicting general rainfall trends, minor inconsistencies emerge during abrupt transitions in rainfall intensity. The integration of data fusion techniques enhances the model’s accuracy; however, LSTM exhibits greater variability in its forecasts compared to the GPR model. Although it provides reasonable predictions for moderate rainfall conditions, its performance slightly diminishes when handling extreme precipitation, rendering it less reliable than GPR for weekly forecasting.

The Efficient Linear Support Vector Machine (ELSVM) model, shown in Appendix C, Fig. C.5(c), demonstrates the lowest accuracy among the three models analyzed. The corrected rainfall estimates often deviate from the observed values, especially during periods of high rainfall intensity. The linear nature of the model restricts its capability to capture the non-linear characteristics inherent in rainfall, leading to less accurate predictions. Despite the application of data fusion techniques to enhance performance, ELSVM faces challenges in adapting to complex and variable weather patterns. While the model can still offer valuable insights for general forecasting purposes, its limitations render it less effective compared to the GPR and LSTM models, particularly in situations requiring high adaptability and precision.

The ELSVM model achieves an MSE of 77.83 and an RMSE of 8.82, both lower than those of the GPR model, which records an MSE of 95.70 and an RMSE of 9.78. The LSTM model shows the lowest MSE (60.19) and the lowest RMSE (7.76) of the three. This suggests that while LSTM has the lowest overall RMSE, ELSVM provides more consistent performance across different rainfall intensities.

ELSVM also achieves an R² value of 0.90, indicating it explains 90% of the variance in weekly rainfall. The GPR model follows with an R² value of 0.88, while LSTM records an R² of 0.78, suggesting that all three models demonstrate strong predictive capabilities, with ELSVM offering a comprehensive fit to the actual data.

In terms of MAE, LSTM records a value of 3.71, suggesting its predictions have the smallest average deviation from actual rainfall measurements. ELSVM follows with an MAE of 7.38, which is lower than that of the GPR model (8.09), reinforcing its accuracy in handling varying rainfall conditions.
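The weekly figures quoted above can be compared metric by metric with a short sketch. The values are those stated in the text; the dictionary layout and the helper name `best_model` are illustrative.

```python
# Weekly metrics as quoted in the text.
weekly = {
    "LSTM": {"MSE": 60.19, "RMSE": 7.76, "MAE": 3.71, "R2": 0.78},
    "Exponential GPR": {"MSE": 95.70, "RMSE": 9.78, "MAE": 8.09, "R2": 0.88},
    "ELSVM": {"MSE": 77.83, "RMSE": 8.82, "MAE": 7.38, "R2": 0.90},
}


def best_model(table, metric, higher_is_better=False):
    """Return the model name with the best value for one metric."""
    pick = max if higher_is_better else min
    return pick(table, key=lambda name: table[name][metric])


# Error metrics are better when lower; R2 is better when higher.
best = {m: best_model(weekly, m) for m in ("MSE", "RMSE", "MAE")}
best["R2"] = best_model(weekly, "R2", higher_is_better=True)
print(best)
```

On these values, LSTM has the smallest error metrics while ELSVM has the highest R², so which model ranks first depends on the metric given priority.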

The performance of ELSVM, summarized in Table 3, can be attributed to its ability to capture nonlinear rainfall patterns while maintaining computational efficiency. The model’s optimization of hyperplanes ensures it minimizes prediction errors while maintaining high generalization across different forecasting periods. Given its high R² and its lower error metrics relative to GPR, ELSVM is a reliable model for weekly rainfall forecasting, outperforming both LSTM and GPR in terms of R².

Table 3 Weekly scenario performance metrics.

Daily scenario scatter plot

Among the three models, the LSTM model presents the highest accuracy with an R² value of 0.99, as shown in Fig. 7. The data points in the scatter plot align closely with the trend line, particularly for low to moderate rainfall values (0–100 mm), indicating that the LSTM model tracks day-to-day rainfall trends effectively. Although some divergence is observed at higher rainfall intensities, the model maintains consistent performance across various rainfall levels. The LSTM model’s ability to process sequential data allows it to recognize rainfall patterns, making it suitable for short-term forecasting. Despite a slight smoothing effect that may reduce sensitivity to extreme rainfall, the LSTM model aligns well with actual observations for daily rainfall forecasting.

The Exponential GPR model, with an R² value of 0.79, shows slightly lower predictive capability, as depicted in Fig. 8. The model performs adequately for moderate rainfall values, with many points clustering around the trend line. However, as rainfall intensity increases, the data points become more dispersed, indicating reduced accuracy in predicting extreme rainfall events. This dispersion suggests that the Exponential GPR model, which relies on probabilistic distributions, tends to smooth out fluctuations, making it less responsive to sharp rainfall variations.

The ELSVM model, with an R² value of 0.78, exhibits the lowest performance among the three models, as illustrated in Fig. 9. While it provides reliable predictions for lower rainfall values, its accuracy decreases for higher rainfall intensities, where the scatter plot reveals a wider spread of data points. This indicates that ELSVM, despite its efficiency in handling linear relationships, may struggle to capture the nonlinear complexities of daily rainfall variations. Its predictive capability remains strong but is less effective in modelling the intricate fluctuations associated with extreme rainfall. Overall, the LSTM model appears to be the most suitable choice for daily rainfall forecasting, offering the highest predictive accuracy. Its ability to capture both short-term rainfall trends and moderate-intensity variations ensures reliable forecasting, making it a highly effective model for daily rainfall predictions.

Fig. 7 Daily comparison between corrected rainfall and actual rainfall: LSTM model scatter plot.

Fig. 8 Daily comparison between corrected rainfall and actual rainfall: Exponential GPR model scatter plot.

Fig. 9 Daily comparison between corrected rainfall and actual rainfall: Efficient Linear SVM model scatter plot.

3-day scenario scatter plot

The Exponential GPR model demonstrates the strongest performance among the three models, achieving an R² value of 0.90, as shown in Fig. 11. The scatter plot reveals a consistent alignment between predicted and actual rainfall values, particularly for moderate rainfall levels. The model captures rainfall trends while maintaining consistency across different rainfall intensities. Some deviations appear at higher rainfall values, likely due to the model’s reliance on statistical smoothing, which may reduce sensitivity to extreme fluctuations. However, its ability to generalize well across varying conditions makes it a reliable choice for 3-day rainfall forecasting.

The LSTM model also shows robust performance, with an R² value of 0.73, as shown in Fig. 10. The data points are densely clustered around the trend line, especially in the lower rainfall range (0–100 mm), indicating strong accuracy for moderate rainfall levels. While slight deviations occur at higher rainfall amounts, this can be attributed to the LSTM model’s sequential learning approach, which sometimes smooths out extreme values. Despite this, the model remains effective and has the potential to improve further, particularly with adjustments to better capture high-intensity rainfall events.

The ELSVM model, with an R² value of 0.80, shows lower performance than the Exponential GPR model, as illustrated in Fig. 12. The scatter plot indicates a correlation between predicted and actual values, but the data points exhibit greater variability, especially at higher rainfall levels. This suggests that while ELSVM is effective for capturing general trends, it struggles to account for more complex, nonlinear rainfall patterns over a 3-day period.

Overall, the Exponential GPR model appears to be the most suitable choice for 3-day rainfall forecasting due to its accuracy and alignment with observed data. However, the LSTM model remains a viable alternative, with the potential to enhance its performance through refinements in handling extreme rainfall events.

Fig. 10 3-day comparison between corrected rainfall and actual rainfall: LSTM model scatter plot.

Fig. 11 3-day comparison between corrected rainfall and actual rainfall: Exponential GPR model scatter plot.

Fig. 12 3-day comparison between corrected rainfall and actual rainfall: ELSVM model scatter plot.

Weekly scenario scatter plot

Among the three models analyzed, the Efficient Linear SVM (ELSVM) model, as depicted in Fig. 15, demonstrates the highest performance with an R² value of 0.90. The data points closely align along the trend line, indicating that ELSVM effectively captures variations in actual rainfall. Even at elevated rainfall levels, the model maintains strong predictive accuracy with minimal deviation, making it the most suitable option for weekly rainfall forecasting. Its robust generalization across different rainfall intensities ensures stable and reliable predictions, particularly in capturing both moderate and extreme rainfall values.

The Exponential Gaussian Process Regression (GPR) model, shown in Fig. 14, also exhibits commendable performance with an R² value of 0.88. The scatter plot reveals a strong alignment between predicted and actual values, especially within the moderate rainfall range. However, slight deviations occur at higher rainfall levels, potentially due to the model’s statistical smoothing effect, which may limit its responsiveness to extreme fluctuations. While the Exponential GPR remains a competitive alternative with high accuracy, its tendency to smooth data might reduce precision for extreme rainfall events.

The Long Short-Term Memory (LSTM) model, illustrated in Fig. 13, achieves the lowest R² value at 0.78. The scatter plot indicates a positive correlation between predicted and actual rainfall; however, the broader distribution of points around the trend line suggests some limitations in capturing complex rainfall patterns. The model performs adequately in moderate rainfall scenarios but struggles with higher values, likely due to the sequential learning approach that may smooth out extreme variations. Nevertheless, with further optimization, LSTM could enhance its ability to capture more complex rainfall patterns over extended periods.

In conclusion, the Efficient Linear SVM model emerges as the best performer for weekly rainfall forecasting, demonstrating the highest accuracy and strongest alignment with observed data. Although the Exponential GPR provides reliable results, the ELSVM offers superior predictive capability, making it the most suitable choice for applications requiring precise weekly rainfall forecasts.

Fig. 13 Weekly comparison between corrected rainfall and actual rainfall: LSTM model scatter plot.

Fig. 14 Weekly comparison between corrected rainfall and actual rainfall: Exponential GPR model scatter plot.

Fig. 15 Weekly comparison between corrected rainfall and actual rainfall: ELSVM model scatter plot.

Although the AI data fusion techniques used in the proposed methodology to forecast rainfall over various timescales showed a high level of accuracy, several limitations should be highlighted. The models, particularly LSTM and Exponential GPR, perform adequately in moderate rainfall conditions but face challenges during heavy rainfall events due to high unpredictability. Additionally, dependence on historical rainfall data and climate change projections introduces uncertainty, as missing or inconsistent data can cause biases. Climate projections rely on assumptions that may not always align with actual environmental changes, which affects prediction accuracy. The variability of local weather patterns and topography makes it difficult to generalize the models across different regions. Computational complexity is another challenge, as LSTM requires significant computing power, complicating real-time applications. Discrepancies between observed and predicted climatic data introduce uncertainties in data fusion, necessitating further refinement to ensure reliability. Addressing these limitations with improved model calibration and adaptive learning methodologies can enhance prediction accuracy.


