Aini, Q. et al. Factors that contribute to air pollution in Malaysia. Malaysian J. Bus. Econ. 8, 43–58 (2023).
Gulati, S. et al. Estimating PM2.5 utilizing multiple linear regression and ANN techniques. Sci Rep. 13, 22578 (2023).
Rani, N. L. A., Azid, A., Khalit, S. I., Juahir, H. & Samsudin, M. S. Air pollution index trend analysis in Malaysia, 2010-15. Pol. J. Environ. Stud. 27, 801–808 (2018).
Muhammad, M., Ul-Saufie, Z. & Radi, A. A. Evaluating the Performance of Tree-Based Model in Predicting Haze Events in Malaysia. International Journal of Advanced Computer Science and Applications. 16, 1127-1135 (2025).
He, Z., Guo, Q., Wang, Z. & Li, X. A. Hybrid Wavelet-Based Deep Learning Model for Accurate Prediction of Daily Surface PM2.5 Concentrations in Guangzhou City. Toxics. 13, (2025).
Gupta, N. S. et al. Prediction of air quality index using machine learning techniques: A comparative analysis. J. Environ. Public. Health. 2023, 1–26 (2023).
Martins, L. D. et al. Extreme value analysis of air pollution data and their comparison between two large urban regions of South America. Weather Clim. Extrem. 18, 44–54 (2017).
Ribeiro, R. P. & Moniz, N. Imbalanced regression and extreme value prediction. Mach. Learn. 109, 1803–1835 (2020).
Jafarigol, E. & Trafalis, T. B. A Review of Machine Learning Techniques in Imbalanced Data and Future Trends. (2023).
Jaffe, D. A. et al. Wildfire and prescribed burning impacts on air quality in the United States. Journal of the Air and Waste Management Association 70, 583–615 https://doi.org/10.1080/10962247.2020.1749731Preprint at (2020).
Noor, N. M., Deak, G., Ul-Saufie, A. Z., Mohd, Z. & Rozainy, R. Modeling of Particulate Matter (PM10) during High Particulate Event (HPE) in Klang Valley, Malaysia. www.ijcs.ro (2022).
Branco, P., Ribeiro, R. P., Torgo, L., Krawczyk, B. & Moniz, N. SMOGN: a Pre-processing Approach for Imbalanced Regression. in Proceedings of Machine Learning Research. 74, 36–50 (2017).
Torgo, L., Ribeiro, R. P., Pfahringer, B. & Branco, P. SMOTE for regression. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8154 LNAI 378–389 (2013).
Avelino, J. G., Cavalcanti, G. D. C. & Cruz, R. M. O. Resampling strategies for imbalanced regression: a survey and empirical analysis. Artif Intell. Rev. 57, 1-42 (2024).
Wang, C., Deng, C., Wang, S. & Imbalance-XGBoost Leveraging Weighted and Focal Losses for Binary Label-Imbalanced Classification with XGBoost. http://arxiv.org/abs/1908.01672 (2019).
Liu, X. & Tian, H. Research on Imbalanced Data Regression Based on Confrontation. Processes. 12, (2024).
Branco, P., Torgo, L. & Ribeiro, R. P. Pre-processing approaches for imbalanced distributions in regression. Neurocomputing 343, 76–99 (2019).
Branco, P. et al. REBAGG: Resampled Bagging for Imbalanced Regression. in Proceedings of Machine Learning Research. 94 67–81 (2018).
Moniz, N., Ribeiro, R., Cerqueira, V. & Chawla, N. SMOTEBoost for regression: Improving the prediction of extreme values. in Proceedings – 2018 IEEE 5th International Conference on Data Science and Advanced Analytics, DSAA 150–159 (Institute of Electrical and Electronics Engineers Inc., 2018). 150–159 (Institute of Electrical and Electronics Engineers Inc., 2018) https://doi.org/10.1109/DSAA.2018.00025 (2018).
Silva, A., Ribeiro, R. P. & Moniz, N. Model Optimization in Imbalanced Regression. in International Conference on Discovery Science https://doi.org/10.48550/arXiv.2206.09991 (2022).
Felix, E. A. & Lee, S. P. Systematic literature review of preprocessing techniques for imbalanced data. IET Software 13, 479–496 https://doi.org/10.1049/iet-sen.2018.5193 Preprint at (2019).
Torgo, L. & Ribeiro, R. Predicting rare extreme values. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 3918 LNAI 816–820 (2006).
Fonseca, J. & Bacao, F. Geometric SMOTE for imbalanced datasets with nominal and continuous features. Expert Syst. Appl. 234, 121053 (2023).
Yang, Y., Zha, K., Chen, Y. C., Wang, H. & Katabi, D. Delving into Deep Imbalanced Regression. inInternational Conference on Machine Learning. https://doi.org/10.48550/arXiv.2102.09554 (2021).
Kuffner, T. A., Lee, S. M. S. & Young, G. A. Block bootstrap optimality and empirical block selection for sample quantiles with dependent data. Biometrika Trust. 103, 1–18 (2018).
Radovanov, B. & Marcikic, A. A comparison of four different block bootstrap methods. Croatian Oper. Res. Rev. 5, 189–202 (2014).
Burhanuddin, D. Shaadan. Controlled sampling approach in improving multiple imputation for missing seasonal rainfall data. Res. Sq. https://doi.org/10.21203/rs.3.rs-679692/v1 (2021).
Mader, M., Mader, W., Sommerlade, L., Timmer, J. & Schelter, B. Block-bootstrapping for noisy data. J. Neurosci. Methods. 219, 285–291 (2013).
Ebtehaj, M., Moradkhani, H. & Gupta, H. V. Improving robustness of hydrologic parameter Estimation by the use of moving block bootstrap resampling. Water Resour. Res 46, W07515 (2010).
Vogel, R. M. The Moving Blocks Bootstrap versus Parametric Time Series Models (1996).
Chen, X., Gupta, L. & Tragoudas, S. Improving the forecasting and classification of extreme events in imbalanced time series through block resampling in the joint Predictor-Forecast space. IEEE Access. 10, 121048–121079 (2022).
Torgo, L. & Ribeiro, R. LNAI 4702 – Utility-Based Regression (in (Springer, 2007). https://doi.org/10.1007/978-3-540-74976-9_63
Ayus, I., Natarajan, N. & Gupta, D. Comparison of machine learning and deep learning techniques for the prediction of air pollution: a case study from China. Asian J. Atmospheric Environment. 17, 48732–48745 (2023).
Dao, T. H. et al. PTI,. Analysis and Prediction for Air Quality Using Various Machine Learning Models. in Proceedings of the Seventh International Conference on Research in Intelligent and Computing in Engineering. 33 89–94 (2023).
Verma, A., Ranga, V. & Vishwakarma, D. K. Combating Respiratory Health Issues with Intelligent NO2 Level Prediction from Sentinel 5P Satellite. in IEEE 20th India Council International Conference, INDICON 2023 882–886 (Institute of Electrical and Electronics Engineers Inc., 2023). 882–886 (Institute of Electrical and Electronics Engineers Inc., 2023) https://doi.org/10.1109/INDICON59947.2023.10440910 (2023).
Azid, A. et al. Prediction of the level of air pollution using principal component analysis and artificial neural network techniques: A case study in Malaysia. Water Air Soil. Pollut. 225, 1-14 (2014).
Guo, Q., He, Z. & Wang, Z. The characteristics of air quality changes in Hohhot City in China and their relationship with meteorological and Socio-economic factors. Aerosol Air Qual. Res. 24, 230274 (2024).
Kumar, A. S. et al. wiley,. Recent Developments of Bioethanol Production. in Bioenergy Research: Evaluating Strategies for Commercialization and Sustainability 175–208 (2021). https://doi.org/10.1002/9781119772125.ch9
Syed, A. et al. Spatial and Temporal air quality pattern recognition using environmetric techniques: A case study in Malaysia. Environ. Sciences: Processes Impacts. 15, 1717–1728 (2013).
Latif, M. T. et al. Long term assessment of air quality from a background station on the Malaysian Peninsula. Sci. Total Environ. 482–483, 336–348 (2014).
Khadijah Arafin, S., Ul-Saufie, Z., Azura Md Ghani, A., Ibrahim, N. & Alam, S. N. Feature selection methods using RBFNN based on enhance air quality prediction: insights from Shah Alam. IJACSA) Int. J. Adv. Comput. Sci. Applications. 15, 509-514 (2024).
Noor, N. M., Bakri Abdullah, A., Yahaya, M. M. & Ramli, N. A. A. S.Trans Tech Publications Ltd,. Comparison of linear interpolation method and mean method to replace the missing values in environmental data set. in Materials Science Forum. 803, 278–281 (2015).
Libasin, Z., Ul-Saufie, Z. & Hasfazilah, A. A. identifying missing data mechanisms among incomplete air pollution datasets in Malaysia. https://doi.org/https://doi.org/10.1007/978-3-031-43922-3_18 doi:https://doi.org/10.1007/978-3-031-43922-3_18. (2024).
Malaysian Meteorological Department. Malaysia’s climate. Malaysian Meteorological Department (2025).
Srivastava, C., Singh, S. & Singh, A. P. Estimation of air pollution in Delhi using machine learning techniques. in. International Conference on Computing, Power and Communication Technologies, GUCON 2018 304–309 (Institute of Electrical and Electronics Engineers Inc., 2019) https://doi.org/10.1109/GUCON.2018.8675022 (2018).
Kumar, A. & Goyal, P. Forecasting of air quality index in Delhi using neural network based on principal component analysis. Pure Appl. Geophys. 170, 711–722 (2013).
Ditrich, J. Data representativeness problem in credit scoring. ACTA OECONOMICA PRAGENSIA. 23, 1-17 (2015).
Verma, A., Ranga, V. & Vishwakarma, D. K. Forecasting of Satellite Based Carbon-Monoxide Time-Series Data Using a Deep Learning Approach. in International Conference on Innovative Trends in Information Technology, ICITIIT 2023 (Institute of Electrical and Electronics Engineers Inc., 2023). (Institute of Electrical and Electronics Engineers Inc., 2023) https://doi.org/10.1109/ICITIIT57246.2023.10068609 2023).
Crone, S. F., Lessmann, S. & Stahlbock, R. Utility based data mining for time series analysis – Cost-sensitive learning for neural network predictors. in Proceedings of the 1st International Workshop on Utility-Based Data Mining, UBDM ’05 59–68 https://doi.org/10.1145/1089827.1089835 (2005).
Moniz, N., Branco, P., Torgo, L. & Krawczyk, B. Evaluation of Ensemble Methods in Imbalanced Regression Tasks. Proceedings of Machine Learning Research 74 http://www.kdd.org/kdd-cup (2017).
Mignani, S. & Rosa, R. The moving block bootstrap to assess the accuracy of statistical estimates in Ising model simulations. Computer Phys. Communications. 92, 203-213 (1995).
Sroka, Ł. Applying block bootstrap methods in silver prices forecasting. Econometrics 26, 15–29 (2022).
Radovanov, B. & Marcikic, A. Testing the performance of the investment portfolio using block bootstrap method. Economic Themes. 52, 166–183 (2014).
Martínez-Munoz, G., Bentejac, C. & Csorg, O. B. Gonzalo Martínez-Munoz, A. A Comparative Analysis of XGBoost. https://www.researchgate.net/publication/337048557 (2019).
Shahani, N. M., Zheng, X., Liu, C., Hassan, F. U. & Li, P. Developing an XGBoost regression model for predicting young’s modulus of intact sedimentary rocks for the stability of surface and subsurface structures. Front Earth Sci. (Lausanne) 9, 761990 (2021).
Jing, H. & Wang, Y. Research on Urban Air Quality Prediction Based on Ensemble Learning of XGBoost. in E3S Web of Conferences 165EDP Sciences, (2020).
Kumar, K. & Pande, B. P. Air pollution prediction with machine learning: a case study of Indian cities. Int. J. Environ. Sci. Technol. 20, 5333–5348 (2023).
Nguyen, A. T., Pham, D. H., Oo, B. L., Ahn, Y. & Lim, B. T. H. Predicting air quality index using attention hybrid deep learning and quantum-inspired particle swarm optimization. J Big Data. 11, 49-58(2024).
Chen, T., Guestrin, C. & XGBoost: A Scalable Tree Boosting System. https://doi.org/10.1145/2939672.2939785 doi:10.1145/2939672.2939785 (2016).
Mienye, I. D., Sun, Y. A. & Survey of Ensemble Learning: Concepts, Algorithms, Applications, and Prospects. IEEE Access 10, 99129–99149 https://doi.org/10.1109/ACCESS.2022.3207287 Preprint at (2022).
Pan, B. Institute of Physics Publishing,. Application of XGBoost algorithm in hourly PM2.5 concentration prediction. in IOP Conference Series: Earth and Environmental Science. 113 (2018).
Abdullah, S., Ismail, M., Ahmed, A. N. & Abdullah, A. M. Forecasting particulate matter concentration using linear and non-linear approaches for air quality decision support. Atmosphere (Basel). 10, 1-24 (2019).
Shaziayani, W. N., Ul-Saufie, Z., Ahmat, H. & Al-Jumeily, D. Ahmad, Coupling of quantile regression into boosted regression trees (BRT) technique in forecasting emission model of PM 10 concentration. https://doi.org/10.1007/s11869-021-01045-3/Published (2021).
DOE. Department of Environment:Malaysia Quality Report 2016. Kuala Lumpur: Ministry of Energy, Science, Technology, Environment and Climate Change, Malaysia. (2016).
Mohd Shafie, S. H. et al. Influence of urban air pollution on the population in the Klang Valley, malaysia: a Spatial approach. Ecol Process. 11, 1-16 (2022).
DOE. Department of Environment:Malaysia Environmental Quality Report. Kuala Lumpur: Ministry of Energy, Science, Technology, Environment and Climate Change, Malaysia. (2021). (2021).
Rahim, N. A. A. A. et al. Institute of Physics,. Predicting Particulate Matter (PM10) during High Particulate Event (HPE) using Quantile Regression in Klang Valley, Malaysia. in IOP Conference Series: Earth and Environmental Science. 1216 (2023).
Ren, J., Zhang, M., Yu, C. & Liu, Z. Balanced MSE for Imbalanced Visual Regression. in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition vols -June 2022 7916–7925 (IEEE Computer Society, 2022).
