Study objects and data sources
As a World Heritage Site (WHS), the Grand Canal comprises a total of 85 heritage sites and 31 river sections along its entire route. The Jiangsu section alone includes 36 heritage sites and 14 canal segments. Historically, Jiangsu has been characterized by a dense network of waterways and a well-developed grain transport system, which made significant contributions to north-south connectivity in China22. Today, sections of the canal in Jiangsu remain navigable. Centuries of north-south interaction have enabled cities along the Jiangsu section of the Grand Canal to accumulate a wealth of heritage23. These heritage sites not only bear witness to the historical evolution of the Grand Canal but also play a profound role in shaping the development of cities along its route.
This study selects 205 National Key Cultural Relics Protection Units (NKU) and 36 World Heritage Sites (WHS) located along the Jiangsu section of the Beijing-Hangzhou Grand Canal as the research objects (Fig. 1). Twelve of these sites are dual-listed (DUAL), giving 229 unique heritage sites (205 + 36 - 12 = 229). Special cases such as Slender West Lake in Yangzhou are also considered: although Slender West Lake is recognized as a WHS for its cultural landscape, it encompasses multiple distinct heritage sites such as Lotus Bridge, White Pagoda, and Xu Garden. Therefore, both the lake as a whole and its constituent heritage elements are analyzed separately.
CTA refers to the capacity of a heritage site to disseminate the culture it represents. In the past, evaluating this capacity was difficult because relevant data were scarce and of questionable authenticity. In the digital era, however, the rapid development of the Chinese internet has created new opportunities for cultural dissemination24. Social media data make it easier to assess the CTA of heritage sites and are more authentic than earlier data sources. Because short video platforms have become popular social tools among younger generations, this study draws not only on traditional text-and-image media such as Sina Weibo but also on Douyin, the most widely used short video platform in China. All data were collected for the full calendar year 2023. Using the official name of each cultural heritage site as the keyword, we conducted comprehensive searches on both Douyin and Sina Weibo, retrieving a total of 6,369 Weibo posts and 10,751 Douyin videos.
For Douyin, the collected indicators included the number of videos, comments, likes, shares, and saves. For Weibo, we collected the number of posts, replies, likes, and shares. To ensure data quality, we applied a manual screening process to remove entries that: (1) lacked cultural or historical relevance; (2) were associated with commercial entities bearing similar names; or (3) were duplicates or low-quality content.
The final dataset includes data on all 229 heritage sites. The five sites shown in Fig. 2 serve solely as illustrative examples of how the CTA indicators were derived.
Entropy weight-TOPSIS method
The entropy weight method combined with the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS) is a widely used approach for multi-indicator evaluation. Peng et al.25 applied this integrated method to quantify the vulnerability of earthen sites, and it has since been adopted in fields such as wall weathering assessment26, disease diagnosis of earthen ruins27, and green development evaluation systems28, demonstrating its broad applicability and scientific rigor. In this study, the entropy weight-TOPSIS method is applied to the Douyin and Weibo statistics to calculate a comprehensive cultural transmission score for each heritage site. The technical route is illustrated in Fig. 3.
As entropy theory was promoted, applied, and studied in depth across various disciplines, the concept of entropy was further developed throughout the mid-20th century29. In 1948, Shannon proposed a mathematical formulation of entropy that quantifies the uncertainty of data30. In recent years, many studies have applied information entropy to multi-criteria comprehensive evaluation, interpreting this uncertainty as the degree of variation in an indicator31. The underlying idea is that the smaller the entropy of an indicator, the greater the variation in its data, the more information it carries, and the more significant its role in the comprehensive evaluation; therefore, it is assigned a higher weight. Consequently, the "degree of uncertainty" or "degree of variability" of the data can serve as the basis for weighting indicators in the entropy weight method. Assuming that there are m evaluation objects, each assessed by n indicators, the calculation proceeds as follows32:
Step 1: Construct the Evaluation Matrix \({X}_{{mn}}\)
$${X}_{mn}=\left[\begin{array}{cccc}{x}_{11} & {x}_{12} & \cdots & {x}_{1n}\\ {x}_{21} & {x}_{22} & \cdots & {x}_{2n}\\ \vdots & \vdots & \ddots & \vdots \\ {x}_{m1} & {x}_{m2} & \cdots & {x}_{mn}\end{array}\right]$$
(1)
Step 2: Data Normalization
Since the original data for each indicator may have different dimensions and units, direct comparison and analysis can be challenging. Therefore, it is necessary to normalize the data, as shown in Eq. (2).
$${y}_{ij}=\left\{\begin{array}{ll}\frac{{x}_{ij}-{x}_{\min }}{{x}_{\max }-{x}_{\min }}, & \text{if indicator } j \text{ is positive}\\ \frac{{x}_{\max }-{x}_{ij}}{{x}_{\max }-{x}_{\min }}, & \text{if indicator } j \text{ is negative}\end{array}\right.\quad i=1,2,\cdots ,m;\ j=1,2,\cdots ,n$$
(2)
\({x}_{ij}\) denotes the value in the ith row and jth column of the matrix, i.e., the value of the jth evaluation indicator for the ith object; \({x}_{\max }\) and \({x}_{\min }\) are the maximum and minimum values of the jth indicator column, respectively.
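A minimal NumPy sketch of the min-max normalization in Eq. (2) follows. The function name and the `positive` flag array are illustrative assumptions, and the guard for constant columns (where \({x}_{\max }={x}_{\min }\) would cause division by zero) is an added safeguard not described in the text. The Douyin and Weibo counts used here would presumably all be positive indicators, so the negative branch is included only for completeness.

```python
import numpy as np

def minmax_normalize(X, positive):
    """Min-max normalize each column of X following Eq. (2).

    X        : (m, n) array; rows = heritage sites, columns = indicators
    positive : length-n sequence of bools; True for positive (benefit)
               indicators, False for negative (cost) indicators
    """
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    span = col_max - col_min
    # Assumed safeguard: a constant column would otherwise divide by zero
    span[span == 0] = 1.0
    pos = (X - col_min) / span   # positive indicators: larger is better
    neg = (col_max - X) / span   # negative indicators: direction reversed
    # Choose the branch column by column
    return np.where(np.asarray(positive, dtype=bool), pos, neg)
```

For example, `minmax_normalize([[10, 200], [30, 100], [20, 300]], [True, True])` maps the first column to 0, 1, 0.5 and the second to 0.5, 0, 1.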
Eq. (3) is then used to obtain \({p}_{ij}\), the feature weight (proportion) of the ith evaluation object under the jth indicator:
$${p}_{{ij}}=\frac{{y}_{{ij}}}{{\sum }_{i=1}^{m}{y}_{{ij}}}{;i}=1,2,\cdots ,{m;j}=1,2,\cdots ,n$$
(3)
Step 3: Calculate the Information Entropy
After obtaining the feature weights, Eq. (4) is used to calculate the information entropy \({E}_{j}\) of each indicator, with the convention that \({p}_{ij}\ln ({p}_{ij})=0\) when \({p}_{ij}=0\).
$${E}_{j}=-\frac{1}{\ln \left(m\right)}\mathop{\sum }\limits_{i=1}^{m}{p}_{ij}\ln \left({p}_{ij}\right);\quad j=1,2,\cdots ,n$$
(4)
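Eqs. (3) and (4) can be sketched as below, assuming `Y` is the normalized matrix from Eq. (2) with sites as rows and indicators as columns. The \({p}_{ij}\ln {p}_{ij}=0\) convention is handled explicitly. Assigning \({E}_{j}=1\) to an all-zero column (which carries no information) is an assumption of this sketch, not a step stated in the source; the function name is hypothetical.

```python
import numpy as np

def entropy_per_indicator(Y):
    """Information entropy E_j of each column of a normalized matrix Y (Eqs. 3-4)."""
    Y = np.asarray(Y, dtype=float)
    m = Y.shape[0]
    col_sum = Y.sum(axis=0)
    safe_sum = np.where(col_sum > 0, col_sum, 1.0)
    P = Y / safe_sum  # Eq. (3): share p_ij of object i under indicator j
    # Convention from the text: p * ln(p) = 0 when p = 0
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(P > 0, P * np.log(P), 0.0)
    E = -plogp.sum(axis=0) / np.log(m)  # Eq. (4)
    # Assumption: an all-zero column carries no information, so E_j = 1
    E[col_sum == 0] = 1.0
    return E
```

A column in which every site has the same share yields \({E}_{j}=1\), while a column concentrated on a single site yields \({E}_{j}=0\), matching the idea that lower entropy signals greater variation.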
Finally, the weight vector \(W=\left[\begin{array}{cccc}{w}_{1} & {w}_{2} & \cdots & {w}_{n}\end{array}\right]\) is calculated from the entropy values of the indicators, where \({w}_{j}\) is the weight of the jth indicator, as shown in Eq. (5):
$${w}_{j}=\frac{1-{E}_{j}}{\mathop{\sum }\limits_{k=1}^{n}\left(1-{E}_{k}\right)};\quad j=1,2,\cdots ,n$$
(5)
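Eq. (5) reduces to normalizing the divergence \(1-{E}_{j}\) so that the weights sum to 1. The short sketch below (function name hypothetical) shows that indicators with lower entropy receive larger weights.

```python
import numpy as np

def entropy_weights(E):
    """Eq. (5): w_j = (1 - E_j) / sum_k (1 - E_k)."""
    d = 1.0 - np.asarray(E, dtype=float)  # divergence of each indicator
    return d / d.sum()                    # normalize so the weights sum to 1
```

With entropies of 0.9, 0.8, and 0.7, the divergences are 0.1, 0.2, and 0.3, so the weights are 1/6, 1/3, and 1/2.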
The TOPSIS method is a classic multi-criteria decision-making method first introduced by Hwang and Yoon in 198133. Its basic principle is to identify the best (ideal) and worst (negative ideal) solutions among a finite set of alternatives from the normalized original matrix. The distance of each alternative to these two solutions is then calculated, and the resulting relative closeness serves as a comprehensive measure of how well or poorly each research target performs. Because it works directly on the original data, TOPSIS can be applied to both small and large samples and reflects the relative differences between evaluation objects. The calculation proceeds as follows34.
Step 1: Constructing the Weighted Normalized Matrix
Multiplying the normalized matrix by the weight of each indicator yields the weighted normalized decision matrix U, whose element in the ith row and jth column is denoted \({u}_{ij}\):
$${u}_{{ij}}={r}_{{ij}}\times {w}_{j}$$
(6)
\({w}_{j}\) represents the weight of the jth indicator, which is the entropy weight as shown in the previous section. \({r}_{{ij}}\) refers to the element at the corresponding position in the normalized matrix.
Step 2: Constructing the Ideal Solution Vectors
The ideal solution \({A}^{+}\) and the negative ideal solution \({A}^{-}\) are defined as:
$$\begin{array}{c}{A}^{+}=\left({r}_{1}^{+},{r}_{2}^{+},\cdots ,{r}_{n}^{+}\right),\quad {r}_{j}^{+}=\mathop{\max }\limits_{1\le i\le m}{u}_{ij}\\ {A}^{-}=\left({r}_{1}^{-},{r}_{2}^{-},\cdots ,{r}_{n}^{-}\right),\quad {r}_{j}^{-}=\mathop{\min }\limits_{1\le i\le m}{u}_{ij}\end{array}\quad j=1,2,\cdots ,n$$
(7)
Here:
\({r}_{j}^{+}\) is the value of the jth indicator in the ideal solution;
\({r}_{j}^{-}\) is the value of the jth indicator in the negative ideal solution.
Step 3: Calculating the Distances to the Ideal and Negative Ideal Solutions
$$\left\{\begin{array}{c}{s}_{i}^{+}=\sqrt{{\sum }_{j=1}^{n}{\left({r}_{j}^{+}-{u}_{ij}\right)}^{2}}\\ {s}_{i}^{-}=\sqrt{{\sum }_{j=1}^{n}{\left({r}_{j}^{-}-{u}_{ij}\right)}^{2}}\end{array}\right.\quad i=1,2,\cdots ,m$$
(8)
Here: \({s}_{i}^{+}\) represents the distance between the ith alternative and the ideal solution; \({s}_{i}^{-}\) represents the distance between the ith alternative and the negative ideal solution.
Step 4: Calculating the Relative Closeness \({C}_{i}^{+}\):
$${C}_{i}^{+}=\frac{{s}_{i}^{-}}{{s}_{i}^{+}+{s}_{i}^{-}}$$
(9)
The closer the relative closeness \({C}_{i}^{+}\) is to 1, the closer the alternative is to the ideal solution and the better its performance.
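The following NumPy sketch strings Eqs. (6) to (9) together. It assumes that every column of `R` has already been oriented so that larger is better (as Eq. (2) does by reversing negative indicators), so the column maximum of the weighted matrix is the ideal value. It also assumes plain Euclidean distances on the weighted matrix \(U\), which applies each weight exactly once. The function name and the guard for identical rows are illustrative.

```python
import numpy as np

def topsis_closeness(R, w):
    """Relative closeness C_i^+ for each row of a normalized matrix R (Eqs. 6-9)."""
    R = np.asarray(R, dtype=float)
    w = np.asarray(w, dtype=float)
    U = R * w                                          # Eq. (6): u_ij = r_ij * w_j
    best = U.max(axis=0)                               # Eq. (7): ideal solution A+
    worst = U.min(axis=0)                              # Eq. (7): negative ideal A-
    s_plus = np.sqrt(((best - U) ** 2).sum(axis=1))    # Eq. (8): distance to A+
    s_minus = np.sqrt(((worst - U) ** 2).sum(axis=1))  # Eq. (8): distance to A-
    denom = s_plus + s_minus
    # Hypothetical guard: if all rows are identical, both distances are 0
    return np.divide(s_minus, denom, out=np.zeros_like(denom), where=denom > 0)
```

For a three-site example with normalized rows (1, 1), (0, 0), and (0.5, 0.5) and equal weights, the closeness values are 1, 0, and 0.5, so the first site ranks highest.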
K-means clustering analysis
Clustering analysis is an unsupervised learning method35. In this study, we adopt K-means clustering, a method first proposed by MacQueen in 196736. The core idea of K-means is to partition a dataset into K clusters such that data points within the same cluster are as similar as possible, while those in different clusters are as distinct as possible. K-means clustering is valued for the interpretability of its results37.
The specific steps of K-means clustering are illustrated in Fig. 4. First, the data are preprocessed, and the elbow rule is used to determine the hyperparameter K, i.e., the number of clusters; K is chosen where the within-cluster sum of squared errors stops decreasing sharply as K increases. Next, initial cluster centers are selected, and each data point is assigned to the nearest cluster center based on Euclidean distance. A new centroid is then computed for each cluster. If the centroids remain unchanged, the algorithm terminates and the clustering result is obtained; otherwise, the assignment and centroid-update steps are repeated until convergence. The final clustering results group heritage sites with similar CTA into the same category. Furthermore, because the algorithm is driven entirely by objective data, it can reflect, to a certain extent, the shared characteristics of heritage sites with similar levels of CTA.
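A compact NumPy version of the loop described above (assign each point to its nearest centroid, recompute centroids, stop when they no longer change), together with the within-cluster sum of squared errors (SSE) used by the elbow rule, might look like this. The random choice of initial centers, the iteration cap, and the function names are assumptions; the source does not specify the initialization or the software used. In practice a library implementation such as scikit-learn's `KMeans`, run with several initializations, would typically be used instead.

```python
import numpy as np

def nearest_centroid(X, centroids):
    """Index of the nearest centroid (Euclidean distance) for every row of X."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, max_iter=100, seed=0):
    """Lloyd's K-means; returns (labels, centroids, sse)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumed initialization: k distinct data points drawn at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        labels = nearest_centroid(X, centroids)
        # Recompute each centroid; keep the old one if its cluster is empty
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                        else centroids[c] for c in range(k)])
        if np.allclose(new, centroids):
            break  # centroids unchanged: converged
        centroids = new
    labels = nearest_centroid(X, centroids)
    sse = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, sse

def elbow_curve(X, k_max):
    """Within-cluster SSE for K = 1..k_max; the bend ("elbow") suggests K."""
    return [kmeans(X, k)[2] for k in range(1, k_max + 1)]
```

Plotting `elbow_curve(X, 10)` against K, with `X` holding one CTA feature vector per site, and picking the point where the curve bends gives a candidate number of clusters.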
Kernel density analysis
Kernel density analysis is a statistical tool commonly used to study the spatial distribution characteristics of target points within a specific geographic area38. This method estimates the influence of each sample point on its surrounding area by defining a kernel function around it, and then aggregates the influence of all points to calculate the density value at each location. The resulting density map, which resembles a contour map, provides an intuitive visualization of the spatial distribution of heritage sites within the study area. The specific formula is shown in Eq. (10).
$$f\left(x\right)=\frac{1}{nh}\mathop{\sum }\limits_{i=1}^{n}k\left(\frac{x-{x}_{i}}{h}\right)$$
(10)
Here: \(f\left(x\right)\) represents the kernel density at location \(x\); \(h\) is the bandwidth of the kernel function; \(n\) denotes the number of sample points (heritage sites); \(k\left(\cdot \right)\) is the kernel function, in which \(x-{x}_{i}\) represents the distance between point \(x\) and point \({x}_{i}\).
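Eq. (10) is written in one-dimensional form. For point data on a map, a common two-dimensional analogue divides by \(n{h}^{2}\) and applies the kernel to the distance between a grid location and each site. The sketch below assumes the quartic kernel \(K(u)=\frac{3}{\pi }{(1-{u}^{2})}^{2}\) for \(u<1\) (a common choice in GIS software) and normalizes by the total number of sites; the kernel, bandwidth, and function name are assumptions, not the settings used in this study.

```python
import numpy as np

def quartic_kde(grid_xy, points_xy, h):
    """2-D kernel density with a quartic kernel:
    f(x) = 1 / (n h^2) * sum_i K(d_i / h),  K(u) = 3/pi * (1 - u^2)^2 for u < 1, else 0.
    """
    grid_xy = np.atleast_2d(np.asarray(grid_xy, dtype=float))
    points_xy = np.atleast_2d(np.asarray(points_xy, dtype=float))
    n = len(points_xy)
    # Distance from every grid location to every heritage site
    d = np.linalg.norm(grid_xy[:, None, :] - points_xy[None, :, :], axis=2)
    u = d / h
    # Sites farther than the bandwidth contribute nothing
    k = np.where(u < 1.0, 3.0 / np.pi * (1.0 - u ** 2) ** 2, 0.0)
    return k.sum(axis=1) / (n * h ** 2)
```

Evaluating the function on a regular grid over each city and contouring the result produces a density surface of the kind described above; with this normalization the surface integrates to 1.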
Average nearest neighbor analysis
Average Nearest Neighbor (ANN) analysis is a classical method used to measure the spatial distribution patterns of point data. It is commonly applied to determine whether the spatial arrangement of data points is clustered, uniformly distributed, or random. The core concept involves calculating the average distance between each point and its nearest neighbor to assess the spatial characteristics of the dataset39. This observed average distance is then compared with the expected average distance under a theoretical random distribution to identify the distribution pattern40. The specific formula is as follows.
$$\overline{r0}=\mathop{\sum }\limits_{i=1}^{n}\frac{\min {d}_{i}}{n}$$
(11)
$$\overline{{rE}}=\frac{1}{2}\sqrt{\frac{A}{n}}$$
(12)
$$R=\frac{\overline{r0}}{\overline{{rE}}}$$
(13)
Here: \({d}_{i}\) is the distance from point \(i\) to its nearest neighboring point; \(n\) is the number of heritage sites within the study area; \(A\) is the area of the study region; \(\overline{r0}\) is the observed average nearest neighbor distance; \(\overline{{rE}}\) is the expected average nearest neighbor distance under a random distribution; and \(R\) is the Nearest Neighbor Ratio.
By calculating R for each city, the degree of clustering of heritage sites can be assessed, with a value of 1 serving as the threshold:
When \(R > 1\), the heritage sites in the city tend to be uniformly distributed;
When \(R=1\), the distribution tends to be random;
When \(R < 1\), the sites tend to be clustered.
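Eqs. (11) to (13) translate directly into a few lines of NumPy. The sketch assumes projected coordinates (so Euclidean distances are meaningful) and a study-area size `area` in the same squared units; it computes all pairwise distances, which is adequate for a few hundred sites. GIS implementations typically also report a z-score for significance testing, which is omitted here. The function name is hypothetical.

```python
import numpy as np

def average_nearest_neighbor(points_xy, area):
    """Nearest Neighbor Ratio R from Eqs. (11)-(13)."""
    P = np.asarray(points_xy, dtype=float)
    n = len(P)
    d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)        # ignore each point's distance to itself
    r_obs = d.min(axis=1).mean()       # Eq. (11): observed mean NN distance
    r_exp = 0.5 * np.sqrt(area / n)    # Eq. (12): expected under randomness
    return r_obs / r_exp               # Eq. (13)
```

For a 10 x 10 lattice of points spaced one unit apart in a 100-square-unit area, the observed mean distance is 1 and the expected distance is 0.5, so R = 2, indicating a uniform pattern; points packed along a short line in the same area give R well below 1.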
Questionnaire method
The questionnaire survey is one of the key methods in social research. In this study, a questionnaire was used to verify the validity of the proposed method for evaluating the CTA of heritage sites. The questionnaire focused primarily on respondents' awareness and recognition of heritage sites in selected cities, with the aim of assessing how well the proposed method reflects public awareness and cultural communication outcomes.
The survey was conducted in August 2024 using Tencent Questionnaire, an online survey platform. The questionnaire included three key dimensions: (1) the respondent’s current city of residence, (2) their familiarity with cities along the Jiangsu section of the Beijing-Hangzhou Grand Canal, and (3) their recognition of the most well-known cultural heritage sites in designated cities (including Suzhou, Wuxi, Yangzhou, Huaian, Xuzhou, Zhenjiang, Changzhou, and Suqian). A nominal scale (i.e., selection-based questions) was primarily used, and the content was developed in consultation with experts to ensure content validity.
The questionnaire was distributed online. A stratified random sampling method was used to ensure the diversity of demographic characteristics, including age, occupation, and geographic location. Participants were selected from among residents living near heritage sites, cultural event participants, and professionals working in related fields.
A total of 121 questionnaires were collected; after applying the inclusion and exclusion criteria, 86 valid responses were retained for analysis (35 were removed). Inclusion criteria were being 18 years or older and completing the full questionnaire. Exclusion criteria were a completion time under 1 min, logically inconsistent answers, selecting the same option throughout, and duplicate submissions from the same IP address or device.
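The screening rules can be expressed as a simple filter, sketched below in plain Python. The record field names are hypothetical, and "logically inconsistent answers" is represented as a flag set during manual review, because that check depends on the questionnaire content. Keeping the first accepted submission from a repeated IP address or device is also an assumption.

```python
def screen_responses(responses, min_seconds=60):
    """Apply the inclusion/exclusion criteria to a list of response dicts.

    Hypothetical fields: 'age', 'complete', 'seconds', 'answers' (list of
    chosen options), 'ip', 'device', and 'inconsistent' (manual-review flag).
    """
    kept, seen_ip, seen_device = [], set(), set()
    for r in responses:
        if r["age"] < 18 or not r["complete"]:
            continue  # inclusion criteria not met
        if r["seconds"] < min_seconds:
            continue  # completed in under 1 min
        if r["inconsistent"]:
            continue  # logically inconsistent answers
        if len(set(r["answers"])) == 1:
            continue  # same option selected throughout
        if r["ip"] in seen_ip or r["device"] in seen_device:
            continue  # duplicate IP or device: keep only the first accepted one
        seen_ip.add(r["ip"])
        seen_device.add(r["device"])
        kept.append(r)
    return kept
```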
The survey was primarily administered in cities along the Jiangsu section of the Beijing-Hangzhou Grand Canal, namely Suzhou, Wuxi, Yangzhou, Huaian, Xuzhou, Zhenjiang, Changzhou, and Suqian, with additional responses collected from individuals in other areas of Jiangsu Province.