# Developing spatio-temporal approach to predict economic dynamics based on online news

### Spatio-temporal distribution of economic output value

Figure 1 shows the spatial distribution of the annual economic output value and the number of online news items from 2018 to 2021. In 2018, the economic output value ranged from 1,032 to 19,845,026, with a mean of 155,378. In 2019, it ranged from 13,684 to 20,352,154, with a mean of 173,630. In 2020, it ranged from 10,928 to 12,972,880, with a mean of 162,638. In 2021, it ranged from 10,595 to 18,565,926, with a mean of 164,944. The economic output value was also clearly agglomerated in space throughout the study period: regions with relatively high economic output value were concentrated in the northwest and southeast parts of Yinzhou.

### The variation of economic dynamics in space and time

Between 2018 and 2019, the mean difference in yearly economic output value in Yinzhou was 18,214 (range: −1,277,981 to 1,812,459) (Fig. 2). The largest increase in economic output value was found in the southeast area. For 2019–2020, the annual value rose in the majority of regions, except in the west and southeast areas, with a mean difference of −10,992 (range: −7,379,274 to 458,454). From 2020 to 2021, the mean difference in yearly value was 2,307 (range: −2,406,953 to 2,567,025), with an obvious decrease in the southeast region.
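The mean difference and range reported for each pair of years can be reproduced with a per-region subtraction. The following is a minimal sketch, assuming one output value per region per year; the function name and the four sample values are hypothetical and are not taken from the study data.

```python
import numpy as np

def summarize_change(values_prev, values_next):
    """Return mean, min and max of the per-region year-over-year difference."""
    prev = np.asarray(values_prev, dtype=float)
    nxt = np.asarray(values_next, dtype=float)
    diff = nxt - prev  # positive where output rose, negative where it fell
    return {"mean": float(diff.mean()),
            "min": float(diff.min()),
            "max": float(diff.max())}

# Hypothetical output values for four regions in two consecutive years.
output_prev = [100.0, 250.0, 400.0, 80.0]
output_next = [130.0, 240.0, 460.0, 90.0]

summary = summarize_change(output_prev, output_next)
print(summary)
```

A positive mean can coexist with a strongly negative minimum, which is the pattern the ranges above describe.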

### Spatial cluster analysis over time

Before developing the GWR model, spatial autocorrelation analysis was performed for all independent variables and for the economic dynamics over the study period. Moran’s I was used as the spatial cluster analysis tool. The results are shown in Table 2.

The spatial autocorrelation analysis showed that Moran’s I was positive (range: 0–1) and statistically significant for all four variables: the annual economic output value, the change in the value between years, the yearly number of positive online news and the percentage of positive online news. The results indicated that all the variables were spatially clustered. This provided the foundation for developing the GWR model and a prerequisite for its validation.
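For readers unfamiliar with the statistic, global Moran’s I can be written in a few lines. The sketch below assumes a binary adjacency weight matrix and four hypothetical regions arranged in a line; the weighting scheme actually used in the study is not stated here, and significance testing (typically by permutation or a normal approximation) is omitted.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for one variable and a spatial weight matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()                 # deviations from the mean
    s0 = w.sum()                     # sum of all spatial weights
    return float((x.size / s0) * (z @ w @ z) / (z @ z))

# Hypothetical chain of four regions: 0-1, 1-2 and 2-3 are neighbours.
chain_weights = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

clustered = morans_i([1, 2, 3, 4], chain_weights)    # similar values adjacent
alternating = morans_i([1, 4, 1, 4], chain_weights)  # dissimilar values adjacent
print(clustered, alternating)
```

Positive values, as reported in Table 2, arise when neighbouring regions carry similar values; the alternating pattern gives a negative value instead.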

### Modeling online news with economic dynamics by GLM

The GLMs combining the yearly number of positive online news and the percentage of positive online news performed well in predicting the economic dynamics when the effect of industry policy was considered. The goodness-of-fit of the models is shown in Table 3. The results indicated that both the annual number of positive online news and the percentage of positive online news contributed positively to the economic dynamics, with statistical significance (P < 0.05) (Table 3).

In the GLM for predicting economic output value, each one-unit increase in the annual number of positive online news was associated with a 50.86-unit increase in the annual economic output value, and each one-unit increase in the percentage of positive online news with a 41.35-unit increase. In the GLM for forecasting the change in economic output value, the change in annual economic output value rose by 38.49 units for each one-unit increase in the annual number of positive online news and by 46.61 units for each one-unit increase in the percentage of positive online news.
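The per-unit interpretation of these coefficients can be illustrated with a small fit. The sketch below assumes a Gaussian GLM with identity link, which is equivalent to ordinary least squares; the family and link used in the study are not stated here. The data are synthetic and noise-free, generated from the two reported coefficients, while the intercept and variable names are hypothetical and the industry policy covariate is left out for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n_regions = 50

# Synthetic predictors (hypothetical, not the study data).
news_count = rng.integers(0, 200, size=n_regions).astype(float)
news_share = rng.uniform(0.0, 100.0, size=n_regions)

# Response built from the coefficients reported in the text;
# the intercept of 1000 is an arbitrary illustrative choice.
output_value = 1000.0 + 50.86 * news_count + 41.35 * news_share

# Gaussian GLM with identity link == ordinary least squares.
X = np.column_stack([np.ones(n_regions), news_count, news_share])
coef, *_ = np.linalg.lstsq(X, output_value, rcond=None)

# Effect of one additional positive news item, holding the share fixed.
base = np.array([1.0, 100.0, 50.0]) @ coef
plus_one = np.array([1.0, 101.0, 50.0]) @ coef
print(coef, plus_one - base)
```

The difference between the two predictions equals the fitted coefficient on the news count, which is what "increased by 50.86 units" expresses.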

### Estimating economic dynamics using online news by GWR

First, we determined the optimal bandwidth for each year's model by the smallest AIC value and standard residual. The resulting optimal bandwidths ranged from 115.67 to 121.32 over the study period. The GWR models showed that the local coefficients of each predictor varied from area to area, with significant differences among the minimum, maximum and mean coefficients (Table 4).
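Bandwidth selection and local coefficient estimation can be sketched as follows. This is a simplified illustration, not the study's implementation: it assumes a fixed Gaussian kernel, a single predictor, and the corrected AIC (AICc) as the only selection criterion, whereas the text also mentions the standard residual. The coordinates, data and candidate bandwidths are synthetic and hypothetical.

```python
import numpy as np

def gwr_fit(coords, X, y, bandwidth):
    """Fit a basic GWR; return local coefficients, fitted values and AICc."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])          # add intercept column
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    betas = np.empty((n, Xd.shape[1]))
    fitted = np.empty(n)
    trace_s = 0.0                                   # trace of the hat matrix
    for i in range(n):
        w = np.exp(-0.5 * (dist[i] / bandwidth) ** 2)   # Gaussian kernel
        xtw = Xd.T * w
        proj = np.linalg.solve(xtw @ Xd, xtw)       # (X'WX)^-1 X'W
        betas[i] = proj @ y
        fitted[i] = Xd[i] @ betas[i]
        trace_s += Xd[i] @ proj[:, i]
    rss = float(np.sum((y - fitted) ** 2))
    aicc = (n * np.log(rss / n) + n * np.log(2 * np.pi)
            + n * (n + trace_s) / (n - 2 - trace_s))
    return betas, fitted, aicc

def select_bandwidth(coords, X, y, candidates):
    """Return the candidate bandwidth with the smallest AICc."""
    scores = {h: gwr_fit(coords, X, y, h)[2] for h in candidates}
    return min(scores, key=scores.get), scores

# Synthetic example: the slope increases from west to east.
rng = np.random.default_rng(42)
n = 150
coords = rng.uniform(0.0, 100.0, size=(n, 2))
x = rng.uniform(0.0, 10.0, size=n)
y = 2.0 + (1.0 + coords[:, 0] / 50.0) * x + rng.normal(0.0, 0.5, size=n)

candidates = [15.0, 25.0, 40.0, 80.0, 160.0]
best_h, scores = select_bandwidth(coords, x, y, candidates)
betas, fitted, aicc = gwr_fit(coords, x, y, best_h)
print(best_h, betas[:, 1].min(), betas[:, 1].mean(), betas[:, 1].max())
```

The minimum, mean and maximum of the local slopes printed at the end correspond to the kind of summary reported per predictor in Table 4.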

For the model training period (2018–2019), the goodness-of-fit of the developed GWR models was assessed by the R^{2} and AIC values. For instance, the R^{2} value of the model for predicting economic output value in 2018 was 0.82, indicating that the model could explain 82% of the variation in the yearly economic output value (Table 4).
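The reading of R^{2} as a share of explained variation follows from its definition, one minus the ratio of the residual to the total sum of squares. A minimal sketch with hypothetical observed and fitted values:

```python
import numpy as np

def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs = np.asarray(observed, dtype=float)
    fit = np.asarray(fitted, dtype=float)
    ss_res = np.sum((obs - fit) ** 2)          # unexplained variation
    ss_tot = np.sum((obs - obs.mean()) ** 2)   # total variation
    return float(1.0 - ss_res / ss_tot)

# Hypothetical values for four regions.
observed = [10.0, 20.0, 30.0, 40.0]
fitted_values = [12.0, 18.0, 33.0, 37.0]

r2 = r_squared(observed, fitted_values)
print(round(r2, 3))
```

Here the residual sum of squares is 26 and the total is 500, so about 94.8% of the variation is explained; a value of 0.82 is read in the same way.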

According to the R^{2} and AIC values, the goodness-of-fit of the GWR model was generally better than that of the GLM, so GWR has a greater capacity to estimate the spatio-temporal patterns of economic dynamics. This reflects the advantage of the GWR model, which estimates a local regression coefficient for each spatial unit and can therefore fit the contribution of each predictor for each area separately^{22}.

Additionally, the annual number of positive online news appeared to be the better predictor of the economic output value in the GWR model, with a relatively higher regression coefficient in each year. Similarly, the percentage of positive online news had greater potential for forecasting the change in economic output value in each year. Figure 3 shows the spatial distribution of the annual economic output value estimated from the yearly number of positive online news by the trained GWR model for 2018–2019. Fig. 4 shows the annual change in economic output value estimated from the percentage of positive online news by the trained GWR model for the same training period (2018–2019).

We then predicted the yearly economic output value and the annual change in economic output value for the model predictive period (2020–2021), using the yearly number of positive online news and the percentage of positive online news, respectively. Figure 5 shows the spatial distribution of the predicted annual economic output value based on the yearly number of positive online news from 2020 to 2021. Fig. 6 shows the spatio-temporal distribution of the predicted yearly change in economic output value based on the percentage of yearly positive online news for the same period.

The forecasting performance of the predictive models is presented in Table 5. The developed GWR models predicted well, with high Pearson correlations (yearly economic output value in 2020: 0.96; yearly economic output value in 2021: 0.94; annual change in economic output value between 2020 and 2021: 0.97; p < 0.01). The predictive GWR models were also robust, as shown by the low MAPE values in economic dynamics forecasting (yearly economic output value in 2020: 1.26; yearly economic output value in 2021: 1.94; annual change in economic output value between 2020 and 2021: 1.77). MAPE measures the discrepancy between the model-predicted economic dynamics and the observed values for the model predictive period (2020–2021).
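Both evaluation measures are straightforward to compute from paired observed and predicted values. The sketch below assumes MAPE is expressed as a percentage, which the text does not state explicitly, and uses three hypothetical regions; MAPE is undefined when an observed value is zero.

```python
import numpy as np

def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(observed, predicted)[0, 1])

def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed must be non-zero)."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((obs - pred) / obs)) * 100.0)

# Hypothetical observed and predicted output values for three regions.
observed_values = [100.0, 200.0, 400.0]
predicted_values = [110.0, 190.0, 400.0]

r = pearson_r(observed_values, predicted_values)
error_pct = mape(observed_values, predicted_values)
print(r, error_pct)
```

The absolute percentage errors here are 10%, 5% and 0%, giving a MAPE of about 5%. The two measures are complementary: correlation captures how well the spatial pattern is reproduced, while MAPE captures the size of the errors relative to the observed values.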