Data preparation

The data used in this paper were collected from subgrade settlement monitoring of the No.1 Subway in Xi’an, between Qin Du Station and Bao Quan Road Station. This section of the subway tunnels underneath two railway lines: the Xu Lan High-Speed Railway and the Long Hai Railway. We studied the settlement data of two real datasets, one for the Xu Lan High-Speed Railway and one for the Long Hai Railway, covering July 2022 to March 2025. The monitoring period is 1005 days with an acquisition cycle of one hour, totaling 24120 data points (1005 days × 24 readings per day). The training set consists of 14472 data points (60%), the test set of 4824 data points (20%), and the validation set of 4824 data points (20%).
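The dataset sizes above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes a chronological (unshuffled) 60/20/20 split, which the text does not state, and the function names are hypothetical.

```python
DAYS = 1005            # monitoring period stated in the text
SAMPLES_PER_DAY = 24   # one reading per hour
TOTAL = DAYS * SAMPLES_PER_DAY


def split_sizes(n, percents=(60, 20, 20)):
    """Return (train, validation, test) sizes using integer arithmetic.

    Any remainder from integer division is assigned to the last part.
    """
    train = n * percents[0] // 100
    val = n * percents[1] // 100
    return train, val, n - train - val


def chronological_split(series, percents=(60, 20, 20)):
    """Split a time-ordered sequence without shuffling (an assumption)."""
    train, val, _ = split_sizes(len(series), percents)
    return series[:train], series[train:train + val], series[train + val:]


if __name__ == "__main__":
    print(TOTAL, split_sizes(TOTAL))
```

Keeping the split chronological avoids leaking future readings into the training set, which is the usual precaution for time series; whether the authors did this is not stated.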

Monitoring data were collected through IoT smart remote control terminals and transmitted wirelessly in real time to the monitoring data platform. Initial deformation values were transmitted to the platform via acquisition software, and subsequent monitoring values were collected in real time. Because the dataset is strongly continuous, polynomial interpolation is used to infer reasonable values for missing samples from the existing data, thereby reducing information loss. Table 1 presents the settlement warning levels for high-speed railway subgrades, which are based on settlement distance and settlement velocity.
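The gap-filling step described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the polynomial order, the choice of the nearest known samples as support points, and the function names are all assumptions, since the text only says that polynomial interpolation is used.

```python
def lagrange(xs, ys, x):
    """Evaluate the polynomial passing through (xs, ys) at x (Lagrange form)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fill_missing(values, order=2):
    """Fill None entries with a local polynomial of the given order.

    Each gap is estimated from the order + 1 known samples nearest to it
    in time (an assumed strategy, not taken from the paper).
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if len(known) < order + 1:
        raise ValueError("not enough known samples for the requested order")
    filled = list(values)
    for i, v in enumerate(values):
        if v is None:
            nearest = sorted(sorted(known, key=lambda k: abs(k - i))[: order + 1])
            filled[i] = lagrange(nearest, [values[k] for k in nearest], i)
    return filled


if __name__ == "__main__":
    # Hypothetical hourly settlement readings with one missing value.
    print(fill_missing([-1.20, -1.26, None, -1.38, -1.44]))
```

A low order with local support keeps the fit from oscillating across long records, which is why the sketch does not fit a single high-degree polynomial to the whole series.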

Table 1 High-speed railway subgrades settlement warning levels.

Based on long-term industry practice, the settlement warning level for high-speed railway subgrades is determined by two main factors: settlement distance and settlement velocity. Table 2 presents part of the settlement data for the Xu Lan High-Speed Railway, which include settlement distance, distance warning level, settlement velocity, velocity warning level, and the comprehensive warning level. The comprehensive warning level is the final result of settlement warning for high-speed railway subgrades. In Table 2, the settlement distance and settlement velocity are measured by the IoT smart remote control terminals. The distance warning level and velocity warning level are obtained from the thresholds in Table 1. The comprehensive warning level is labeled by engineers in actual scenarios.

Table 2 Settlement data in the Xu Lan high-speed railway.
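Deriving the per-indicator levels from Table 1 amounts to a threshold lookup. The sketch below shows one way to do it; the threshold values, their units, and the number of levels are placeholders invented for illustration, because the contents of Table 1 are not reproduced in this section.

```python
from bisect import bisect_right

# PLACEHOLDER thresholds (hypothetical): replace with the limits from Table 1.
DISTANCE_LIMITS = [2.0, 4.0, 6.0]   # assumed unit: mm
VELOCITY_LIMITS = [0.5, 1.0, 1.5]   # assumed unit: mm/day


def warning_level(value, limits):
    """Map a measurement to a level index.

    Returns 0 when the magnitude is below the first limit and len(limits)
    when it is at or above the last one. Limits must be sorted ascending.
    """
    return bisect_right(limits, abs(value))


if __name__ == "__main__":
    print(warning_level(-3.1, DISTANCE_LIMITS), warning_level(0.2, VELOCITY_LIMITS))
```

Using the magnitude (`abs`) treats settlement and uplift symmetrically, which is itself an assumption; a real rule set may distinguish the two.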

When the TD Transformer model is used for the warning task, the data in Table 2 are used as the training set. The input features consist primarily of settlement distance, settlement velocity, distance warning level, velocity warning level, and comprehensive warning level. The model outputs include the predicted settlement distance, settlement velocity, and the associated comprehensive warning level.

The evaluation metrics are Accuracy, Precision, Recall, and F1-Score. These metrics assess the model’s performance from different perspectives:

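Accuracy: Represents the proportion of correctly predicted samples out of the total number of samples. High accuracy indicates that the model performs well overall; however, it can be misleading on datasets with class imbalance. Here TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, and N = TP + TN + FP + FN is the total number of samples.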

$$\begin{aligned} \text{ Accuracy } =\frac{T P+T N}{N} \end{aligned}$$

(4)

Precision: Measures the accuracy of the model in predicting positive samples, i.e., the proportion of true positive samples among all samples predicted as positive.

$$\begin{aligned} \text{ Precision } =\frac{T P}{T P+F P} \end{aligned}$$

(5)

Recall: Measures the model’s ability to identify positive samples, i.e., the proportion of true positive samples correctly predicted among all actual positive samples. High recall indicates that the model has few false negatives.

$$\begin{aligned} \text{ Recall } =\frac{T P}{T P+F N} \end{aligned}$$

(6)

F1-Score: The harmonic mean of Precision and Recall, which accounts for both false positives and false negatives. A high F1-Score indicates that the model performs more reliably when dealing with class imbalance.

$$\begin{aligned} F 1=\frac{2 \times \text{ Precision } \times \text{ Recall } }{ \text{ Precision } + \text{ Recall } } \end{aligned}$$

(7)
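The four metrics can be computed directly from the confusion counts. The following is a minimal sketch for a single positive class, following the definitions above (precision as true positives over all predicted positives); the function names and the zero-division convention are assumptions, and for multi-level warnings the same counts would be taken per class and averaged.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN for one class treated as positive."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1; returns 0.0 where a ratio is undefined."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n if n else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    print(metrics(*confusion_counts(y_true, y_pred, positive=1)))
```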

Ablation studies

Result of TSEA

To validate the effectiveness of the proposed method, we conducted ablation experiments. Comparing TSEA and DGTA with other attention mechanisms yields results that further demonstrate the advantage of our attention mechanisms. As shown in Table 3, we compared TSEA with other CNN-based attention mechanisms. The results indicate that TSEA effectively extracts temporal and spatial features from ground subsidence data, enhancing the model’s ability to capture subsidence changes.

Table 3 Results of ablation experiments on the Xu Lan High Speed Railway dataset.

Fig. 3 Comparison of model training accuracy and loss.

Figure 3 demonstrates that all enhanced Transformer models achieve better training performance compared to the baseline. The Transformer + TSEA model stands out with the fastest convergence and the highest training accuracy, suggesting that the integration of Temporal-Spatial Enhanced Attention significantly improves the model’s learning efficiency. Although the Transformer + Channel Attention and Transformer + TCN models also show substantial reductions in training loss, their accuracy improvements are more gradual. The baseline Transformer, in contrast, converges more slowly and exhibits greater fluctuation in loss. Throughout training, none of the models show obvious signs of overfitting, as indicated by the steady increase in accuracy and consistent decline in loss. Among them, the TSEA-based model maintains strong performance and stability, indicating promising generalization capability.

Result of DGTA

In the experimental results shown in Table 4, we compared DGTA with other self-attention-based mechanisms such as Cross-Attention and Masked Multi-head Self-Attention (MMSA), verifying the performance improvement of the model with the introduction of DGTA. DGTA uses a dynamically weighted average feature representation that effectively captures the long-term dependencies in time series data, enhancing the model’s stability and accuracy in handling complex subsidence patterns. By dynamically adjusting the attention scores through learned parameters, the model achieves greater accuracy in predicting subsidence warning levels.

Table 4 Results of ablation experiments on the Xu Lan High Speed Railway dataset.

The results in Table 4 support the effectiveness of the proposed TD Transformer-based subsidence early warning method for high-speed railway embankments. The model performs well on all metrics, indicating high predictive accuracy and stability in processing and analyzing ground subsidence data. Figure 4 presents the training accuracy and loss of the baseline Transformer and its improved variants. All enhanced models converge faster than the baseline, with the Transformer + DGTA achieving the highest accuracy and the lowest, most stable loss throughout training. This suggests that the Dynamic Global Temporal Attention (DGTA) mechanism not only accelerates learning but also enhances model robustness. The Transformer + Cross-Attention and Transformer + MMSA models also contribute to performance gains, showing clear reductions in training loss, though their accuracy improves more gradually. None of the models exhibits obvious signs of overfitting: the loss declines steadily and no sharp fluctuations appear in later epochs, which is consistent with good generalization behavior.

Fig. 4 Comparison of model training accuracy and loss.

Comprehensive experiment

Table 5 shows that the TSEA mechanism effectively resolves feature blurring in high-speed railway subsidence data extraction. The Transformer + TSEA model achieves 93.10% accuracy, 92.85% precision, 93.05% recall, and 92.97% F1-score, with 87.1M parameters and 720 samples/s inference speed. Compared to the baseline Transformer, TSEA improves performance with only a 0.8% parameter increase and a minimal 7% speed reduction. The DGTA module further enhances long-term dependency modeling, boosting accuracy to 93.20%, precision to 93.00%, recall to 93.10%, and F1-score to 93.05%, while using 89.3M parameters and maintaining 650 samples/s throughput. This demonstrates that DGTA’s added complexity is justified by its performance gains. Our TD Transformer, combining both mechanisms, achieves the best results: 93.39% accuracy, 93.10% precision, 93.40% recall, and 93.24% F1-score. With 90.2M parameters and 520 samples/s speed, it balances computational cost and accuracy, improving over the baseline by 1.24 percentage points while maintaining efficient inference. These results confirm that TSEA and DGTA synergistically enhance the Transformer for subsidence early warning, with accuracy gains outweighing computational overhead. The TD Transformer’s optimal performance makes it suitable for real-world deployment in railway monitoring systems.
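As a quick consistency check on the figures quoted above, substituting the TD Transformer precision (93.10%) and recall (93.40%) into the F1 definition in Eq. (7) gives

$$\begin{aligned} F 1=\frac{2 \times 93.10 \times 93.40}{93.10+93.40}=\frac{17391.08}{186.50} \approx 93.25 \end{aligned}$$

which agrees with the reported 93.24% to within the rounding of the inputs. Likewise, the stated accuracy gain of 1.24 percentage points implies a baseline accuracy of 93.39 − 1.24 = 92.15%.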

Table 5 Experimental results of ablation on the Xu Lan High Speed Railway dataset.

Table 6 Experimental results of ablation on the Long Hai High Speed Railway dataset.

Table 6 presents the ablation results on the Long Hai High-Speed Railway dataset, demonstrating the effectiveness of the proposed TSEA and DGTA modules. The baseline Transformer achieves 91.67% accuracy, 91.20% precision, 91.50% recall, and a 91.35% F1-score with 86.4M parameters, processing 810 samples per second. Integrating the TSEA mechanism improves performance to 92.48% accuracy, 92.10% precision, 92.30% recall, and a 92.20% F1-score using 87.1M parameters at 690 samples per second, supporting its ability to refine local spatial and temporal representations in noisy railway subsidence data. The DGTA module raises the results to 92.95% accuracy, 92.60% precision, 92.80% recall, and a 92.70% F1-score with 89.3M parameters at 620 samples per second, reflecting its capability to capture long-term temporal dependencies for subsidence trend analysis. The complete TD Transformer, combining both modules, achieves the best performance among the variants: 93.35% accuracy, 93.10% precision, 93.20% recall, and a 93.1% F1-score using 90.2M parameters at 570 samples per second. It thus offers a reasonable balance between feature refinement, temporal modeling, and computational cost for railway subsidence prediction.
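To make the accuracy/cost trade-off concrete, the figures quoted above for the Long Hai dataset can be expressed relative to the baseline. The sketch below only rearranges numbers from the text; the row labels (in particular "Transformer + DGTA") and the function name are assumptions.

```python
# Accuracy (%), parameters (millions) and throughput (samples/s) as quoted in the text.
RESULTS = {
    "Transformer":        {"acc": 91.67, "params_m": 86.4, "speed": 810},
    "Transformer + TSEA": {"acc": 92.48, "params_m": 87.1, "speed": 690},
    "Transformer + DGTA": {"acc": 92.95, "params_m": 89.3, "speed": 620},
    "TD Transformer":     {"acc": 93.35, "params_m": 90.2, "speed": 570},
}


def overhead(results, baseline="Transformer"):
    """Accuracy gain (percentage points) and relative cost versus the baseline."""
    base = results[baseline]
    rows = {}
    for name, r in results.items():
        if name == baseline:
            continue
        rows[name] = {
            "acc_gain_pp": round(r["acc"] - base["acc"], 2),
            "param_increase_pct": round(
                100 * (r["params_m"] - base["params_m"]) / base["params_m"], 2),
            "speed_drop_pct": round(
                100 * (base["speed"] - r["speed"]) / base["speed"], 2),
        }
    return rows


if __name__ == "__main__":
    for name, row in overhead(RESULTS).items():
        print(name, row)
```

Reporting the gain in percentage points and the costs as relative percentages keeps the two kinds of quantity from being confused when variants are compared.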

Sequential experiment

As shown in Table 7, applying TSEA before DGTA yields better evaluation metrics than the reverse order. Starting with TSEA may better capture dependencies and structural information, providing more detailed features, which in turn enhances the effectiveness of DGTA in dynamically capturing the characteristics of subsidence data. This sequential synergy improves the overall model performance.

Table 7 Results of sequential experiments on the Xu Lan High Speed Railway dataset.

Experimental result

This study establishes a high-speed railway subsidence early warning model using two indicators: settlement distance and settlement velocity. These indicators are jointly used to determine the subsidence warning level. The warning level for each indicator is calculated from its warning range, and the overall warning level is computed by combining the weights and warning levels of the indicators. The experimental results of the TD Transformer algorithm are shown in Table 8.

Table 8 TD transformer experimental results.
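The combination of per-indicator levels into an overall level, described before Table 8, can be illustrated with a weighted average. The weights, the rounding rule, and the function name below are hypothetical; the text does not specify them.

```python
def comprehensive_level(distance_level, velocity_level,
                        w_distance=0.5, w_velocity=0.5):
    """Combine two non-negative warning levels into one overall level.

    The weighted score is rounded half up to the nearest integer level.
    Equal weights are a placeholder, not a value from the paper.
    """
    if abs(w_distance + w_velocity - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    score = w_distance * distance_level + w_velocity * velocity_level
    return int(score + 0.5)  # round half up; valid because score >= 0


if __name__ == "__main__":
    print(comprehensive_level(2, 3))
    print(comprehensive_level(3, 0, w_distance=0.7, w_velocity=0.3))
```

Rounding half up errs toward the more severe level on ties, which is the conservative choice for a warning system; a stricter alternative would be to take the maximum of the two levels.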

Table 8 reveals significant uncertainty in high-speed railway subsidence and its rate of change. The data indicate that regions with larger settlement values also correspond to higher settlement velocities, suggesting more pronounced subsidence changes in these areas. This is fully reflected in the subsidence early warning mechanism. The TD Transformer model not only effectively captures both short-term and long-term subsidence changes but also weights features based on the importance of different sensor data, thereby enhancing its ability to warn about subsidence conditions. Table 9 presents the experimental results of SVM, XGBoost, RF, Transformer, and TD Transformer across four evaluation metrics: Accuracy, Precision, Recall, and F1-Score.

Table 9 Experimental results with different models on the Xu Lan High Speed Railway dataset.

As observed from Table 9 and Fig. 5, the evaluation metrics of the TD Transformer model surpass those of the traditional Transformer model. The TD Transformer achieves an accuracy of 93.39%, an improvement of 1.24 percentage points over the Transformer. Its precision is 93.10%, an increase of 1.3 percentage points; its recall is 93.40%, also an increase of 1.3 percentage points; and its F1 score is 93.24%, an improvement of 1.27 percentage points. These results indicate that the TD Transformer improves on every metric, particularly precision and F1 score. The findings highlight the advantage of the TD Transformer over the other methods in the high-speed railway subsidence early warning task.

Fig. 5 Experimental results of SVM, XGBoost, RF, Transformer and TD Transformer.


