In this paper, we propose a deep spatio-temporal forecasting model (DeepSTF) for multi-site weather prediction post-processing that uses both temporal and spatial information. In our proposed framework, the spatio-temporal information is modeled by a convolutional neural network (CNN) module and an encoder-decoder structure with an attention mechanism. The novelty of our work is that the model takes full account of temporal and spatial characteristics and obtains forecasts for multiple meteorological stations simultaneously within a single framework. We apply the DeepSTF model to short-term weather prediction at 226 meteorological stations in Beijing. The model significantly improves short-term forecasts compared with other widely used benchmark models, including the Model Output Statistics method. To evaluate the uncertainty of the model parameters, we estimate confidence intervals by bootstrapping. The results show that the prediction accuracy of the DeepSTF model is highly stable. Finally, we evaluate the impact of seasonal changes and topographical differences on the accuracy of the model predictions. The results indicate that the proposed model achieves high prediction accuracy.
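
As a rough illustration of how bootstrapping can yield confidence intervals, the following Python sketch computes a percentile bootstrap interval for the root-mean-square error (RMSE) of a set of forecast errors. The function names, the synthetic error data, the choice of RMSE as the statistic, and the decision to resample forecast errors rather than refit model parameters are all assumptions made for illustration; the procedure used for DeepSTF may differ.

```python
import math
import random


def rmse(errors):
    """Root-mean-square of a sequence of forecast errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def bootstrap_rmse_ci(errors, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the RMSE.

    Hypothetical helper: resamples the errors with replacement,
    recomputes the RMSE each time, and returns the empirical
    alpha/2 and 1 - alpha/2 quantiles of those values.
    """
    rng = random.Random(seed)
    n = len(errors)
    stats = []
    for _ in range(n_resamples):
        sample = [errors[rng.randrange(n)] for _ in range(n)]
        stats.append(rmse(sample))
    stats.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return stats[lo_idx], stats[hi_idx]


# Synthetic forecast errors (assumed data, not taken from the paper).
data_rng = random.Random(42)
errors = [data_rng.gauss(0.0, 1.5) for _ in range(500)]

point = rmse(errors)
low, high = bootstrap_rmse_ci(errors)
print(f"RMSE = {point:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

A narrow interval around the point estimate suggests that the accuracy metric is stable under resampling of the evaluation data, which is the kind of stability the bootstrap analysis is meant to assess.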