Driven by the continuous advance of informatization and digitization in China, intelligent construction and smart tunneling are inevitably becoming the mainstream development direction in the tunnel engineering field.
Currently, AI technologies based on big data are increasingly being introduced into tunnel engineering to assist TBM construction. Key research focuses include TBM efficiency optimization, intelligent perception of surrounding rock grades, tunneling performance prediction, and adverse geological condition forecasting.
In summary, this study is based on the Luotian Reservoir-Tiegang Reservoir Water Diversion Tunnel Project (hereafter referred to as the Luotie Project), utilizing comprehensive geological data and tunneling parameters from its TBM1. First, feature parameters were selected by integrating data-driven and knowledge-driven criteria according to data characteristics and prediction targets. The raw data were then cleaned using box plots and partitioned into training (70%), testing (20%), and validation (10%) sets. Subsequently, three data imbalance mitigation strategies were applied to construct surrounding rock classification models using the XGBoost, RF, CatBoost, and LightGBM algorithms. This research enables rapid, efficient, and intelligent perception of surrounding rock grades, guiding shield drivers in adjusting tunneling parameters and thereby ensuring safer and more efficient tunneling.
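As a minimal sketch of the cleaning and partitioning steps (assuming the records are held in a pandas DataFrame with the surrounding rock grade in a column named `SR`; the file and column names are illustrative, not the project's actual schema):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical file and column names; the project's schema may differ.
data = pd.read_csv("tbm_tunneling_data.csv")
X = data.drop(columns=["SR"])  # tunneling feature parameters
y = data["SR"]                 # surrounding rock grade labels

# Box-plot (IQR) cleaning: drop rows lying outside 1.5*IQR on any feature.
q1, q3 = X.quantile(0.25), X.quantile(0.75)
iqr = q3 - q1
mask = ((X >= q1 - 1.5 * iqr) & (X <= q3 + 1.5 * iqr)).all(axis=1)
X, y = X[mask], y[mask]

# 70% training, then split the remaining 30% into 20% testing and
# 10% validation (a 2:1 ratio of the remainder); stratify by grade.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.7, stratify=y, random_state=42
)
X_test, X_val, y_test, y_val = train_test_split(
    X_rest, y_rest, train_size=2 / 3, stratify=y_rest, random_state=42
)
```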
Invalid data were removed from the collected tunneling parameters by retaining only non-zero entries, i.e., any data row containing a zero value in any parameter was deleted according to the criterion defined in
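A minimal sketch of this non-zero retention criterion, assuming the raw records sit in a pandas DataFrame (the file name is a placeholder):

```python
import pandas as pd

raw_data = pd.read_csv("raw_tunneling_parameters.csv")  # hypothetical file

# Retain only rows in which every parameter is non-zero; any row that
# contains a zero value in any column is deleted.
cleaned = raw_data[(raw_data != 0).all(axis=1)]
```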
Kernel density curves of different feature parameters under the corresponding surrounding rock grades:
Pearson correlation analysis was first conducted to assess the linear relationships between all tunneling parameters and surrounding rock classification. The objective of this analysis was to quantify the strength of correlation between each parameter and the surrounding rock grade, facilitating the identification of the most relevant features. Based on the results of this analysis, and employing a hybrid “knowledge-driven” and “data-driven” approach, eight critical feature parameters were selected: Thrust, Rotation Speed (RS), Torque, Penetration Rate (PR), 1# Middle Shield Retracting Gripper Shoe Pressure (1#SSP), 2# Middle Shield Retracting Gripper Shoe Pressure (2#SSP), Foam Pressure (1#FP), and Surrounding Rock Classification (SR). The Pearson correlation coefficients for these parameters are presented in
Pearson correlation analysis results of tunneling parameters.
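A minimal sketch of this correlation screening, assuming the candidate parameters are columns of a pandas DataFrame (column names follow the abbreviations above and are illustrative):

```python
import pandas as pd

# Hypothetical DataFrame containing candidate tunneling parameters plus SR.
data = pd.read_csv("tbm_tunneling_data.csv")

# Pairwise Pearson correlation matrix across all parameters.
corr = data.corr(method="pearson")

# Correlation of each parameter with the surrounding rock grade (SR),
# ranked by absolute strength to shortlist candidate features.
print(corr["SR"].drop("SR").abs().sort_values(ascending=False))
```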
Based on the selected feature parameters and processed data, modeling was performed using the following three methods:
Method (1): using the raw data with default hyperparameters and no additional processing;
Method (2): applying hyperparameter optimization;
Method (3): combining hyperparameter optimization with SMOTE oversampling, as sketched below.
The SMOTE oversampling method was specifically employed to address the severe data imbalance across the Grade III, IV, and V surrounding rocks.
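A minimal sketch of the oversampling step in Method (3), using the SMOTE implementation from the imbalanced-learn package and applied to the training set only (variable names continue the split sketched earlier):

```python
from imblearn.over_sampling import SMOTE

# Synthesize minority-grade samples so Grades III, IV, and V are balanced
# in the training set; the testing and validation sets stay untouched.
smote = SMOTE(random_state=42)
X_train_bal, y_train_bal = smote.fit_resample(X_train, y_train)
```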
Surrounding rock grade classification models were developed using XGBoost, CatBoost, RF, and LightGBM algorithms based on these three modeling methods. Model performance was evaluated using key metrics—Precision, Recall, and F1_score—to assess the effectiveness of each strategy. The formulas for these metrics are defined in
In the equations, TP denotes the number of true positive samples (correctly predicted positive instances), FP represents false positive samples (negative instances incorrectly predicted as positive), and FN indicates false negative samples (positive instances incorrectly predicted as negative).
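For reference, the standard formulas consistent with these definitions are:

\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F1\_score = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
\]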
The Extreme Gradient Boosting (XGBoost) algorithm, proposed by Chen and Guestrin, is a scalable implementation of gradient-boosted decision trees that introduces a regularized learning objective and a second-order approximation of the loss to improve accuracy and control model complexity.
Due to its efficiency in processing large-scale data and complex models, as well as its robustness against overfitting, XGBoost has gained widespread attention and application since its inception.
Random Forest (RF), proposed by Breiman in 2001, is a bagging-based ensemble method that trains many decision trees on bootstrap samples with randomly selected feature subsets and aggregates their votes, making it robust to noise and resistant to overfitting.
The CatBoost algorithm, proposed by Prokhorenkova et al., is a gradient boosting framework built on oblivious decision trees that uses ordered boosting and ordered target statistics to reduce prediction shift and to handle categorical features natively.
Compared to XGBoost, CatBoost exhibits the following advantages:
1. Higher model accuracy: CatBoost often achieves high precision without requiring extensive hyperparameter tuning.
2. Faster training speed: it outperforms XGBoost in training efficiency.
3. Superior prediction speed: it delivers significantly faster inference times than XGBoost.
4. Lower memory consumption: it requires less memory on computational hardware.
5. Native categorical feature support: unlike XGBoost, which relies on one-hot encoding for categorical features, CatBoost directly handles string-type categorical features without preprocessing (a brief sketch follows the equation definitions below).
In the equations, \(\eta_t\) represents the learning rate at the \(t\)-th iteration, and \(\bar{\eta}_{t-1}\) is the average learning rate from the previous iteration.
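As a brief sketch of advantage 5 above (native categorical support), CatBoost accepts string-typed columns directly through the `cat_features` argument; the `lithology` column here is purely hypothetical, since the features selected in this study are numeric:

```python
from catboost import CatBoostClassifier

# Hypothetical mixed-type feature table: numeric tunneling parameters
# plus one string-typed categorical column (e.g., a lithology label).
model = CatBoostClassifier(iterations=500, learning_rate=0.1, verbose=False)
model.fit(
    X_train, y_train,
    cat_features=["lithology"],  # referenced by name; no one-hot encoding needed
)
```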
Light Gradient Boosting Machine (LightGBM), proposed by Ke et al., is a histogram-based gradient boosting framework that employs gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB) to substantially accelerate training on large-scale data while preserving accuracy.
Surrounding rock classification models were developed using XGBoost, CatBoost, RF, and LightGBM algorithms. First, eight TBM tunneling parameters—Thrust, Rotation Speed (RS), Torque, Penetration Rate (PR), 1#SSP, 2#SSP, 1#FP, and SR—were selected through a hybrid “data-driven” and “knowledge-driven” approach. The data were cleaned according to
Before model training, the training set was standardized using the z-score transformation:

\[ x^{*} = \frac{x - \mu}{\sigma} \]

In the equation, \(x^{*}\) represents the standardized data, \(x\) is the raw feature value, \(\mu\) is the mean of the feature in the training set, and \(\sigma\) is its standard deviation.
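A minimal sketch of this step, fitting the scaler statistics on the training set only and reusing them for the testing and validation sets to avoid information leakage (variable names continue the earlier split):

```python
from sklearn.preprocessing import StandardScaler

# Fit mu and sigma on the training set only, then apply the same
# transformation to the testing and validation sets.
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)
X_val_std = scaler.transform(X_val)
```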
Surrounding rock classification models under three data processing methods were established using XGBoost, CatBoost, RF, and LightGBM ensemble learning algorithms. The prediction results of these models were evaluated using three key metrics: Precision, Recall, and F1_score, as shown in
Evaluation metrics for intelligent diagnostic models:
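A minimal sketch of computing these metrics with scikit-learn (macro averaging over the three grades is an assumption; the study may instead report per-class or weighted values):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# 'model' stands for any of the four trained classifiers.
y_pred = model.predict(X_val_std)

precision = precision_score(y_val, y_pred, average="macro")
recall = recall_score(y_val, y_pred, average="macro")
f1 = f1_score(y_val, y_pred, average="macro")
print(f"Precision={precision:.3f}, Recall={recall:.3f}, F1_score={f1:.3f}")
```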
The hyperparameter optimization results of the intelligent diagnostic model.
| Model type | XGBoost | RF | CatBoost | LightGBM |
|---|---|---|---|---|
| Unbalanced data handling | | | | |
| Unbalanced data handling | | | | |
Performance analysis revealed varying degrees of classification confusion among all models using raw data, as demonstrated by the confusion matrix of the validation set in
Classification prediction results with default raw parameters:
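A minimal sketch of generating such a validation-set confusion matrix (the grade labels and variable names are illustrative):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

labels = ["III", "IV", "V"]  # surrounding rock grades
cm = confusion_matrix(y_val, y_pred, labels=labels)
ConfusionMatrixDisplay(cm, display_labels=labels).plot(cmap="Blues")
plt.show()
```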
After hyperparameter optimization, the confusion matrices of different models on the validation set are shown in
Classification prediction results with hyperparameter optimization:
When SMOTE is combined with hyperparameter optimization, the confusion matrices of the classification predictions on the validation set are shown in
Classification prediction results with SMOTE and hyperparameter optimization:
Therefore, through comprehensive consideration of prediction accuracy, the CatBoost model integrated with SMOTE and hyperparameter optimization was selected as the optimal intelligent diagnostic model in this study. To further verify the performance of the CatBoost model, its convergence curve was plotted, as shown in
Convergence curve of CatBoost model.