10 Partial Autocorrelation of Stationary Sequences and the Levinson Recursion

10.1 Best Linear Prediction

10.1.1 Best Linear Prediction from Finitely Many Random Variables

Let \(X_1, X_2, \dots, X_n, Y\) be random variables and consider the estimation problem \[\begin{aligned} L(Y | X_1,\dots, X_n) \stackrel{\triangle}{=} \mathop{\mathrm{argmin}}_{\hat Y = a_0 + a_1 X_1 + \dots + a_n X_n} E(Y - \hat Y)^2 \end{aligned}\] We call \(L(Y | X_1, \dots, X_n)\) the best linear prediction (or best linear estimate) of \(Y\) based on \(X_1, \dots, X_n\), and \(E(Y - \hat Y)^2 = \| Y - \hat Y \|^2\) the mean squared prediction error. Thus \(L(Y | X_1, \dots, X_n)\) is the prediction of \(Y\) by a linear combination of \(X_1, \dots, X_n\) (including an intercept term) that has the smallest mean squared error.

Here \(\text{sp}(1, X_1, \dots, X_n)\) denotes the subspace of \(L^2\) consisting of all linear combinations of \(1, X_1, \dots, X_n\). By the projection theory of Hilbert spaces, \(L(Y | X_1, \dots, X_n)\) is the projection of \(Y\) onto \(\text{sp}(1, X_1, \dots, X_n)\).

Below we derive a formula for \(L(Y | X_1, \dots, X_n)\) by minimizing a function of several variables.

\(\boldsymbol{X} = (X_1,\dots,X_n)^T\), 令 \(\boldsymbol{\xi} = \boldsymbol{X} - E\boldsymbol{X}\), \(\eta = Y - E Y\)。设 \(\Sigma_{\boldsymbol{X}} \stackrel{\triangle}{=} \text{Var}(\boldsymbol{X}) = \text{Var}(\boldsymbol{\xi})\)正定。 记\(\Sigma_{\boldsymbol{X},Y} = \text{Cov}(\boldsymbol{X}, Y)\). 对\(a_0, a_1, \dots, a_n \in R\), 记\(\boldsymbol{a} = (a_1, \dots, a_n)^T\), 有 \[\begin{aligned} & E \left( Y - (a_0 + a_1 X_1 + \dots + a_n X_n) \right)^2 \\ =& E(\eta - (a_1 \xi_1 + \dots a_n \xi_n))^2 + (E Y - a_0 - \boldsymbol{a}^T E \boldsymbol{X})^2 \end{aligned}\] 已知\(a_1, \dots, a_n\)后取\(a_0 = E Y - \boldsymbol{a}^T E\boldsymbol{X}\) 就可以使上式后一项为零,所以不妨设\(E X=0\), \(E Y=0\)

In that case \[\begin{aligned} g(\boldsymbol{a}) =& E(Y - (a_1 X_1 + \dots + a_n X_n))^2 = E(Y - \boldsymbol{a}^T \boldsymbol{X})^2\\ =& \text{Var}(Y) + \boldsymbol{a}^T \Sigma_{\boldsymbol{X}} \boldsymbol{a} - 2 \boldsymbol{a}^T \Sigma_{\boldsymbol{X},Y} \\ \frac{\partial g(\boldsymbol{a})}{\partial \boldsymbol{a}} =& 2 \Sigma_{\boldsymbol{X}} \boldsymbol{a} - 2 \Sigma_{\boldsymbol{X},Y} \\ \frac{\partial^2 g(\boldsymbol{a})}{\partial \boldsymbol{a} \partial \boldsymbol{a}^T} =& 2 \Sigma_{\boldsymbol{X}} > 0 \end{aligned}\] Setting \(\frac{\partial g(\boldsymbol{a})}{\partial \boldsymbol{a}} = 0\) gives \[\begin{aligned} \boldsymbol{a} = \Sigma_{\boldsymbol{X}}^{-1} \Sigma_{\boldsymbol{X}, Y} \end{aligned}\] Since the Hessian \(\frac{\partial^2 g(\boldsymbol{a})}{\partial \boldsymbol{a} \partial \boldsymbol{a}^T}\) is positive definite, this is the unique strict minimizer of \(g(\boldsymbol{a})\).

Therefore, \[\begin{align} L(Y | X_1, X_2, \dots, X_n) = E Y + \Sigma_{Y, \boldsymbol{X}} \Sigma_{\boldsymbol{X}}^{-1}(\boldsymbol{X} - E \boldsymbol{X}) \tag{10.1} \end{align}\]

The minimal mean squared prediction error is \[\begin{align} E\left(Y - L(Y | X_1, \dots, X_n)\right)^2 = \text{Var}(Y) - \Sigma_{\boldsymbol{X},Y}^T \Sigma_{\boldsymbol{X}}^{-1} \Sigma_{\boldsymbol{X},Y} \tag{10.2} \end{align}\]

\(|\Gamma_n|=0\)时(协方差阵不满秩时), 最优线性估计也存在,但有无穷多个(详见第5章)。

10.1.2 Best Linear Prediction for Stationary Sequences

\(\{X_t\}\)为零均值平稳列。考虑用\(X_1,\ldots,X_n\)的线性组合预测\(X_{n+1}\)。 设\(\Gamma_n>0\),则 \[\begin{aligned} & L(X_{n+1} | X_n, X_{n-1}, \dots, X_1) \\ =& \left[ \text{Var}( \left( \begin{array}{c} X_n\\ \vdots \\ X_1 \end{array} \right) )^{-1} \text{Cov}( \left( \begin{array}{c} X_n\\ \vdots \\ X_1 \end{array} \right), X_{n+1}) \right]^T \left( \begin{array}{c} X_n\\ \vdots \\ X_1 \end{array} \right) \\ =& (\Gamma_n^{-1} \left( \begin{array}{c} \gamma_1\\ \gamma_2 \\ \dots\\ \gamma_n \end{array} \right) )^T \left( \begin{array}{c} X_n \\ \dots \\ X_1 \end{array} \right) \\ \stackrel{\triangle}{=}& a_{n1} X_{n} + a_{n2} X_{n-1} + \dots + a_{nn} X_1 \\ \stackrel{\triangle}{=}& \boldsymbol{a}_n^T (X_n, X_{n-1}, \dots, X_1)^T \end{aligned}\]

The coefficients satisfy the Yule-Walker equations: \[\begin{aligned} \Gamma_n \left(\begin{array}{c} a_{n1} \\ \vdots \\ a_{nn} \end{array} \right) = \left( \begin{array}{c} \gamma_1\\ \vdots\\ \gamma_n \end{array} \right) \end{aligned}\] abbreviated as \[\begin{aligned} \Gamma_n \boldsymbol a_n = \boldsymbol \gamma_n \end{aligned}\] The solution \(\boldsymbol{a}_n\) of this system is called the order-\(n\) Yule-Walker coefficient vector of \(\{ X_t \}\) or of \(\{\gamma_k\}\).

\[\begin{aligned} L(X_{n+1} | X_n, \dots, X_1) = \boldsymbol{a}_n^T (X_n, \dots, X_1)^T \end{aligned}\]

By stationarity, \[\begin{aligned} L(X_{t} | X_{t-1}, \dots, X_{t-n}) = \boldsymbol{a}_n^T (X_{t-1}, \dots, X_{t-n})^T \end{aligned}\]

The minimal mean squared prediction error is \[\begin{aligned} \sigma_n^2 \stackrel{\triangle}{=} & E(X_{n+1} - (a_{n1} X_{n} + \dots + a_{nn} X_1))^2 \\ =& \text{Var}(X_{n+1}) - \boldsymbol{\gamma}_n^T \Gamma_n^{-1} \boldsymbol{\gamma}_n \\ =& \gamma_0 - \boldsymbol{\gamma}_n^T \boldsymbol{a}_n \\ =& \gamma_0 - a_{n1}\gamma_1 - \dots - a_{nn}\gamma_n \end{aligned}\] and by stationarity \[\begin{aligned} E(X_t - (a_{n1} X_{t-1} + \dots + a_{nn} X_{t-n}))^2 = \sigma_n^2 \end{aligned}\]
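These formulas are easy to evaluate numerically. The following Python/numpy sketch uses invented autocovariances \(\gamma_0,\dots,\gamma_3\) and invented observed values, purely for illustration.

```python
import numpy as np

# Assumed autocovariances gamma_0..gamma_3 of a zero-mean stationary sequence.
gamma = np.array([2.0, 1.2, 0.5, 0.1])
n = 3

# Yule-Walker equations: Gamma_n a_n = (gamma_1, ..., gamma_n)^T
Gamma_n = np.array([[gamma[abs(i - j)] for j in range(n)] for i in range(n)])
a_n = np.linalg.solve(Gamma_n, gamma[1:n + 1])

# sigma_n^2 = gamma_0 - a_{n,1} gamma_1 - ... - a_{n,n} gamma_n
sigma2_n = gamma[0] - a_n @ gamma[1:n + 1]

# One-step prediction L(X_{n+1} | X_n, ..., X_1) = a_{n,1} X_n + ... + a_{n,n} X_1
x = np.array([0.3, -1.1, 0.7])          # observed (X_3, X_2, X_1), most recent first
x_pred = a_n @ x

print(a_n, sigma2_n, x_pred)
```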

10.2 The Minimum Phase Property

If \(\{\gamma_k\}\) is the autocovariance function of some AR(\(p\)) sequence, then the Yule-Walker coefficients obtained from the order-\(p\) Yule-Walker equations are exactly the autoregressive coefficients of the AR model, so they satisfy the following minimum phase property: \[\begin{aligned} A(z) = 1 - \sum_{j=1}^p a_j z^j \neq 0, \quad \text{for } |z|\leq 1 \end{aligned}\]

For a general stationary sequence we have the following theorem.

Theorem 10.1 (Minimum phase property of the Y-W coefficients) If the real numbers \(\gamma_k, k=0, 1, \dots, n\) are such that \[\begin{aligned} \Gamma_{n+1} \stackrel{\triangle}{=} \left( \begin{array}{cccc} \gamma_0 & \gamma_1 & \cdots & \gamma_n \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{n-1} \\ \vdots & \vdots & & \vdots \\ \gamma_n & \gamma_{n-1} & \cdots & \gamma_0 \end{array} \right) > 0 \end{aligned}\] then the order-\(n\) Yule-Walker coefficients \(\boldsymbol{a}_n\) obtained from them satisfy the minimum phase condition \[\begin{aligned} 1 - \sum_{j=1}^n a_{nj} z^j \neq 0, \quad |z| \leq 1. \end{aligned}\]

The minimum phase property is exactly the necessary and sufficient condition for the AR(\(n\)) model with coefficients \(\boldsymbol{a}_n\) to be representable as a causal linear stationary sequence.

The autocovariance sequence of a general linear stationary sequence is positive definite, so its Yule-Walker coefficients of every order \(n\) satisfy the minimum phase condition.

10.3 The Levinson Recursion

Theorem 10.2 (Levinson recursion) If \(\Gamma_{n+1}\) is positive definite, then the Yule-Walker coefficients \(\{a_{ij},\ i=1,\dots,n,\ j=1, \dots, i \}\) of orders \(1,2,\dots,n\) determined by \(\gamma_k, k=0,1, \dots, n\), together with the mean squared errors \(\sigma_k^2\), can be computed recursively as follows: \[\begin{align} \sigma_0^2 =& \gamma_0 \\ a_{1,1} =& \gamma_1 / \gamma_0 \\ \sigma_k^2 =& \sigma_{k-1}^2 (1 - a_{k,k}^2) \\ a_{k+1,k+1} =& \frac{\gamma_{k+1} - a_{k,1} \gamma_k - a_{k,2} \gamma_{k-1} - \dots - a_{k,k} \gamma_1}{ \gamma_0 - a_{k,1} \gamma_1 - a_{k,2} \gamma_2 - \dots - a_{k,k} \gamma_k } \\ a_{k+1,j} =& a_{k,j} - a_{k+1,k+1} a_{k,k+1-j}, \quad 1 \leq j \leq k \tag{10.3} \end{align}\] where \[\begin{align} \sigma_k^2 = E (X_{k+1} - (a_{k,1} X_{k} + \dots + a_{k,k} X_1))^2 \tag{10.4} \end{align}\] is the mean squared error of predicting \(X_{k+1}\) from \(X_k, X_{k-1}, \dots, X_1\).
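The recursion of Theorem 10.2 translates directly into code. The following Python sketch is one possible implementation; the example autocovariances \(\gamma_k = \sigma^2\cos(k\omega)\) at the end anticipate the discrete spectrum example of Section 10.9 (the value \(\omega = 0.8\) is an arbitrary choice).

```python
import numpy as np

def levinson(gamma):
    """Levinson recursion (a sketch of Theorem 10.2): from gamma_0..gamma_n with
    Gamma_{n+1} > 0, compute the Yule-Walker coefficient rows a[k] = (a_{k,1},...,a_{k,k})
    and the one-step prediction variances sigma2[k] = sigma_k^2, k = 0, 1, ..., n."""
    gamma = np.asarray(gamma, dtype=float)
    n = len(gamma) - 1
    a = [np.empty(0)]                 # a[0] is empty: no predictors
    sigma2 = [gamma[0]]               # sigma_0^2 = gamma_0
    for k in range(n):
        prev = a[k]
        # a_{k+1,k+1} = (gamma_{k+1} - a_{k,1} gamma_k - ... - a_{k,k} gamma_1) / sigma_k^2
        akk = (gamma[k + 1] - prev @ gamma[k:0:-1]) / sigma2[k]
        # a_{k+1,j} = a_{k,j} - a_{k+1,k+1} a_{k,k+1-j},  1 <= j <= k
        a.append(np.concatenate([prev - akk * prev[::-1], [akk]]))
        sigma2.append(sigma2[k] * (1.0 - akk ** 2))     # sigma_{k+1}^2
    return a, sigma2

# Example: gamma_k = sigma^2 cos(k omega), as in Section 10.9.
omega, s2 = 0.8, 1.0
a, sig2 = levinson([s2 * np.cos(k * omega) for k in range(3)])
print(a[2], sig2)   # a_{2,1} = 2 cos(omega), a_{2,2} = -1, sigma_2^2 = 0
```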

10.3.1 How to Remember the Levinson Formulas

Recall (9.8) and (9.9) from §9.2: \[\begin{align} & \gamma_k - \sum_{j=1}^p a_j \gamma_{k-j} = 0, \quad k \geq 1 \tag{10.5}\\ & \sigma^2 = \gamma_0 - \sum_{j=1}^p a_j \gamma_{j} \tag{10.6} \end{align}\] Replacing \(k\) by \(k+1\) in (10.5) gives \[ \gamma_{k+1} - \sum_{j=1}^p a_j \gamma_{k+1-j} = 0 \] In the recursion for \(a_{k+1,k+1}\), the numerator can be viewed as the left-hand side of this equation with \(a_j = a_{k,j}\) and \(p=k\), and the denominator as (10.6) with \(p=k\) and \(a_j=a_{k,j}\).

The formula for \(a_{k+1,j}, 1\leq j \leq k\), can be written in matrix form as \[\begin{aligned} \left(\begin{array}{c} a_{k+1,1} \\ \vdots \\ a_{k+1,k} \end{array}\right) = \left(\begin{array}{c} a_{k,1} \\ \vdots \\ a_{k,k} \end{array}\right) - a_{k+1,k+1} \left(\begin{array}{c} a_{k,k} \\ \vdots \\ a_{k,1} \end{array}\right) \end{aligned}\]

As for \(\sigma_k^2\):

Writing \(\boldsymbol X_k = (X_k, X_{k-1}, \dots, X_1)^T\), \[\begin{aligned} \sigma_k^2 =& E[X_{k+1} - \boldsymbol a_k^T \boldsymbol X_k]^2 \\ =& E[(X_{k+1} - \boldsymbol a_k^T \boldsymbol X_k) X_{k+1}] - E[(X_{k+1} - \boldsymbol a_k^T \boldsymbol X_k) \boldsymbol a_k^T \boldsymbol X_k] \\ =& \gamma_0 - \boldsymbol a_k^T (\gamma_1, \ldots, \gamma_k)^T - 0\\ =& \gamma_0 - a_{k,1}\gamma_1 - \dots - a_{k,k} \gamma_k \end{aligned}\] where the second term vanishes because the prediction error is orthogonal to \(\boldsymbol X_k\). This is exactly the denominator in the recursion for \(a_{k+1,k+1}\), so that recursion can also be written as \[\begin{aligned} a_{k+1,k+1} = \frac{\gamma_{k+1} - a_{k,1} \gamma_k - a_{k,2} \gamma_{k-1} - \dots - a_{k,k} \gamma_1}{\sigma_k^2} \end{aligned}\] Note that \(\sigma_k^2\) is the mean squared error of predicting the \((k+1)\)-th value from the preceding \(k\) values.

10.3.2 Order of Computation in the Levinson Recursion

When the Levinson recursion is used to compute the Yule-Walker coefficients of each order together with \[\begin{aligned} \sigma_k^2 =& E[X_{k+1} - a_{k,1} X_k - a_{k,2} X_{k-1} - \dots - a_{k,k} X_1]^2 \end{aligned}\] the computation should proceed in the following order:

  • Initial value (predicting \(X_1\) with no history): \[\begin{aligned} \sigma_0^2 =& E[X_1 - 0]^2 = \gamma_0 \end{aligned}\]
  • \(k+1=1\) (predicting \(X_2\) from \(X_1\)): \[\begin{aligned} a_{1,1} =& \gamma_1 / \gamma_0 \\ \sigma_1^2 =& E[X_2 - a_{1,1} X_1]^2 \\ =& \sigma_0^2 (1 - a_{1,1}^2) \end{aligned}\]
  • \(k+1=2\) (predicting \(X_3\) from \(X_1, X_2\)): \[\begin{aligned} a_{2,2} =& \frac{\gamma_2 - a_{1,1} \gamma_1}{\sigma_1^2} \\ a_{2,1} =& a_{1,1} - a_{2,2} a_{1,1} \\ \sigma_2^2 =& E[X_3 - a_{2,1} X_2 - a_{2,2} X_1]^2 \\ =& \sigma_1^2 ( 1 - a_{2,2}^2) \end{aligned}\]
  • \(k+1=3\) (predicting \(X_4\) from \(X_1, X_2, X_3\)): \[\begin{aligned} a_{3,3} =& \frac{\gamma_3 - a_{2,1} \gamma_2 - a_{2,2} \gamma_1}{\sigma_2^2} \\ a_{3,1} =& a_{2,1} - a_{3,3} a_{2,2} \\ a_{3,2} =& a_{2,2} - a_{3,3} a_{2,1} \\ \sigma_3^2 =& E[ X_4 - a_{3,1}X_3 - a_{3,2} X_2 - a_{3,3} X_1]^2 \\ =& \sigma_2^2 ( 1- a_{3,3}^2) \end{aligned}\]
  • and so on.

The computation thus fills in the following table row by row: \[ \begin{array}{c|ccccc|c} \hline k & a_{k,j} & & & & & \sigma_{k}^2 \\ \hline 0 & & & & & & \sigma_0^2 \\ 1 & a_{1,1} & & & & & \sigma_1^2 \\ 2 & a_{2,2} & a_{2,1} & & & & \sigma_2^2 \\ 3 & a_{3,3} & a_{3,1} & a_{3,2} & & & \sigma_3^2 \\ 4 & a_{4,4} & a_{4,1} & a_{4,2} & a_{4,3} & & \sigma_4^2 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \end{array} \]

10.4 Partial Autocorrelation

Definition 10.1 (Partial autocorrelation) If \(\Gamma_n\) is positive definite, \(a_{n,n}\) is called the order-\(n\) partial (auto)correlation of \(\{X_t\}\) or of \(\{\gamma_k\}\). The sequence \(\{a_{n,n}, n=1,2,\dots\}\) is called the partial (auto)correlation function of \(\{X_t\}\) or of \(\{\gamma_k\}\).

The partial autocorrelation \(a_{n,n}\) is the partial correlation between \(X_1\) and \(X_{n+1}\) in the following sense: \[\begin{aligned} a_{n,n} = \text{Corr}[&X_1 - L(X_1 | X_2, \dots, X_n), \\ & X_{n+1} - L(X_{n+1} | X_2, \dots, X_n) ] \end{aligned}\] That is, \(a_{n,n}\) is the correlation between \(X_1\) and \(X_{n+1}\) after the linear influence of \(X_2,\dots,X_n\) has been removed.
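This identity is easy to verify numerically. In the sketch below the autocovariances are those of a hypothetical MA(1)-type sequence, used only as an example; \(a_{2,2}\) from the Yule-Walker equations is compared with the partial correlation of \(X_1\) and \(X_3\) given \(X_2\) computed directly from the covariances.

```python
import numpy as np

# Assumed MA(1)-type autocovariances gamma_0, gamma_1, gamma_2 (illustration only).
theta = 0.6
g = np.array([1 + theta ** 2, theta, 0.0])

# a_{2,2}: last component of the solution of Gamma_2 a_2 = (gamma_1, gamma_2)^T
Gamma2 = np.array([[g[0], g[1]],
                   [g[1], g[0]]])
a22 = np.linalg.solve(Gamma2, g[1:])[-1]

# Partial correlation of X_1 and X_3 given X_2: correlation of the residuals
# X_1 - L(X_1 | X_2) and X_3 - L(X_3 | X_2).
resid_cov = g[2] - g[1] * g[1] / g[0]   # covariance of the two residuals
resid_var = g[0] - g[1] * g[1] / g[0]   # common residual variance (by stationarity)
partial_corr = resid_cov / resid_var

print(a22, partial_corr)                # the two values coincide
```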

\(\{X_t\}\)是AR(\(p\))序列。其自协方差函数正定。 由Yule-Walker方程(9.7)知其\(n\)阶(\(n\geq p\))Y-W系数为 \[\begin{align} \boldsymbol{a}_n =& (a_1, \dots, a_p, 0, \dots, 0)^T \\ =& (a_{n,1}, a_{n,2}, \dots, a_{n,n})^T, \quad n \geq p \tag{10.7} \end{align}\] 其偏相关系数满足 \[\begin{align} a_{n,n} = \begin{cases} a_p \neq 0, \quad & n=p \\ 0, & n > p \end{cases} \tag{10.8} \end{align}\] 称此性质为AR(\(p\))序列的相关系数\(p\)后截尾。

Conversely, if the partial autocorrelations of a zero-mean stationary sequence cut off after lag \(p\), then it must be an AR(\(p\)) sequence (see the theorem below).

The cutoff condition on the partial autocorrelations implicitly requires the autocovariance sequence to be positive definite.

Theorem 10.3 (Partial autocorrelation criterion for AR sequences) Let the autocovariance function \(\{\gamma_k\}\) of a zero-mean stationary sequence \(\{X_t\}\) be a positive definite sequence. Then \(\{X_t\}\) is an AR(\(p\)) sequence if and only if its partial autocorrelation function \(\{a_{n,n}\}\) cuts off after lag \(p\).

Proof:

Only sufficiency needs to be proved. Write \((a_{p,1}, \dots, a_{p,p})=(a_1,\dots,a_p)\) and let \(\varepsilon_t = X_t - \sum_{j=1}^p a_j X_{t-j}\); it suffices to show that \(\{\varepsilon_t\}\) is white noise. The minimum phase property is supplied by Theorem 10.1.

\(\boldsymbol{a}_p = (a_{p,1}, \dots, a_{p,p})=(a_1,\dots,a_p)\), 由Levinson公式和\(a_{p+k,p+k}=0\)(\(k>0\))得 \[\begin{aligned} a_{p+1,j} =& a_{p,j} - a_{p+1,p+1} a_{p,p+1-j} = a_j, \quad & 1\leq j \leq p \\ a_{p+k,j} =& a_{p+k-1,j} = \dots = a_{p,j} = a_j, & k \geq 2, 1 \leq j \leq p \\ a_{p+k,j} =& a_{p+k-1, j} = 0 & p < j \leq p+k \end{aligned}\]\(n\geq p\)\[\begin{aligned} \boldsymbol{a}_n = (a_{n,1}, a_{n,2}, \dots, a_{n,n}) = (a_1, a_2, \dots, a_p, 0, \dots, 0) \end{aligned}\]

Note that \(\boldsymbol{a}_n\) solves the order-\(n\) Y-W equations, i.e. \[\begin{aligned} \left(\begin{array}{cccc} \gamma_0 & \gamma_1 & \cdots & \gamma_{n-1} \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{n-2} \\ \vdots & \vdots & & \vdots \\ \gamma_{n-1} & \gamma_{n-2} & \cdots & \gamma_0 \end{array} \right) \left(\begin{array}{c} a_1 \\ a_2 \\ \vdots \\ a_p \\ 0 \\ \vdots \\ 0 \end{array} \right) = \left(\begin{array}{c} \gamma_1 \\ \gamma_2 \\ \vdots \\ \gamma_n \end{array} \right) \end{aligned}\] which says that \(\gamma_k = \sum_{j=1}^p a_j \gamma_{k-j}\) for \(1 \le k \le n\). Since \(n \geq p\) is arbitrary, \[\begin{aligned} \gamma_k =& a_1 \gamma_{k-1} + a_2 \gamma_{k-2} + \dots + a_p \gamma_{k-p} \\ =& \sum_{j=1}^p a_j \gamma_{k-j}, \quad k\geq 1 \end{aligned} \tag{*} \]

由定理10.1\(A(z)=1 - \sum_{j=1}^p a_j z^j\)满足最小相位条件。

\[\begin{aligned} \varepsilon_t = X_t - \sum_{j=1}^p a_j X_{t-j}, \quad t \in \mathbb Z \end{aligned}\]\(\{\varepsilon_t\}\)是平稳序列,满足\(E\varepsilon_t=0\), \(\text{Var}(\varepsilon_t)=\sigma_p^2 > 0\) (因为\(\{\gamma_k\}\)为正定序列所以\(\{X_t\}\)不是可完全线性预测的)。

It remains to show that \(\{\varepsilon_t\}\) is white noise. For all \(t > s\), \[\begin{aligned} E(\varepsilon_t X_s) =& E\left[ \left( X_t - \sum_{j=1}^p a_j X_{t-j} \right) X_s \right] \\ =& \gamma_{t-s} - \sum_{j=1}^p a_j \gamma_{t-s-j} \\ =& 0 \qquad\text{(by (*))} \end{aligned}\] so for \(t>s\), \[\begin{aligned} E(\varepsilon_t \varepsilon_s) = E \left[ \varepsilon_t \left(X_s - \sum_{j=1}^p a_j X_{s-j} \right) \right] = 0 \end{aligned}\] Hence \(\{\varepsilon_t\}\) is \(\text{WN}(0,\sigma_p^2)\) and \(a_1, a_2, \dots, a_p\) satisfy the minimum phase condition. This completes the proof.

○○○○○○

10.5 Practical Relevance of This Section

  • From an observed sample \(x_1, x_2, \dots, x_N\) one can estimate the sample autocovariance function: \[\begin{aligned} \hat\gamma_k = \frac{1}{N} \sum_{t=1}^{N-k} (x_t - \bar x)(x_{t+k} - \bar x) \end{aligned}\]
  • From \(\{\hat\gamma_k\}\) one can compute the estimated partial autocorrelations \(\{\hat a_{k,k}\}\) of each order (see the sketch after this list).
  • If the sample partial autocorrelations appear to cut off after some lag, an AR model can be fitted.
  • Theorem 10.1 guarantees that when \(\hat\Gamma_{p+1}\) is positive definite, the fitted model coefficients satisfy the minimum phase condition.
  • The minimum phase condition guarantees that the system is stable, so that prediction is meaningful.
  • When the true model is AR(\(p\)), \(\hat\Gamma_{p+1}\) is a.s. positive definite.
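A minimal Python sketch of this workflow; the AR(2) coefficients and the sample size are assumptions made for the demonstration, and the estimates are of course subject to sampling error.

```python
import numpy as np

def sample_acvf(x, max_lag):
    """Sample autocovariances hat gamma_0 .. hat gamma_{max_lag} (divisor N, as in the text)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    xc = x - x.mean()
    return np.array([xc[:N - k] @ xc[k:] / N for k in range(max_lag + 1)])

def sample_pacf(x, max_lag):
    """Estimated partial autocorrelations hat a_{k,k}, k = 1..max_lag, via Yule-Walker."""
    g = sample_acvf(x, max_lag)
    pacf = []
    for k in range(1, max_lag + 1):
        Gk = np.array([[g[abs(i - j)] for j in range(k)] for i in range(k)])
        pacf.append(np.linalg.solve(Gk, g[1:k + 1])[-1])
    return np.array(pacf)

# Simulate an AR(2) path and look for a cutoff of the sample PACF after lag 2.
rng = np.random.default_rng(0)
N, a1, a2 = 2000, 0.5, -0.3
x = np.zeros(N)
eps = rng.standard_normal(N)
for t in range(2, N):
    x[t] = a1 * x[t - 1] + a2 * x[t - 2] + eps[t]
print(np.round(sample_pacf(x, 6), 3))   # roughly (a1/(1-a2), a2, ~0, ~0, ...)
```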

10.6 Appendix: Best Linear Prediction as a Projection in Hilbert Space

\(X_1, X_2, \dots, X_n, Y\)为随机变量。 考虑估计问题 \[\begin{aligned} L(Y | X_1,\dots, X_n) \stackrel{\triangle}{=} \mathop{\mathrm{argmin}}_{\hat Y = a_0 + a_1 X_1 + \dots + a_n X_n} E(Y - \hat Y)^2 \end{aligned}\]\(L(Y | X_1, \dots, X_n)\)\(Y\)关于\(X_1, \dots, X_n\)最优线性估计

This best linear prediction problem is equivalent to finding, in the closed subspace \(\text{sp}(1, X_1, \dots, X_n)\) of \(L^2\), the element closest to \(Y\). By the properties of projections in Hilbert space, \(L(Y | X_1, \dots, X_n)\) is the projection of \(Y\) onto \(\text{sp}(1, X_1, \dots, X_n)\), characterized by \[ Y - L(Y | X_1, \dots, X_n) \perp \text{sp}(1, X_1, \dots, X_n) \] Writing \(L(Y | X_1, \dots, X_n) = a_0 + a_1 X_1 + \dots + a_n X_n\), this means \[ Y - (a_0 + a_1 X_1 + \dots + a_n X_n) \perp 1, X_1, \dots, X_n \] that is, \[\begin{aligned} & E (Y - a_0 - a_1 X_1 - \dots - a_n X_n) = 0 \\ & E [X_j (Y - a_0 - a_1 X_1 - \dots - a_n X_n)] = 0, \quad j=1,\dots,n \end{aligned}\] Write \(\boldsymbol a = (a_1, \dots, a_n)^T\), \(\boldsymbol X = (X_1, \dots, X_n)^T\), \(\Sigma_{XX} = \text{Var}(\boldsymbol X)\), \(\Sigma_{XY} = \text{Cov}(\boldsymbol X, Y)\). The estimation problem can be written as \[ L(Y | \boldsymbol X) \stackrel{\triangle}{=} \mathop{\mathrm{argmin}}_{\hat Y = a_0 + \boldsymbol a^T \boldsymbol X} E(Y - \hat Y)^2 \] and the necessary and sufficient conditions on \(a_0\) and \(\boldsymbol a\) are \[\begin{aligned} & a_0 = EY - \boldsymbol a^T E \boldsymbol X \\ & E [(Y - a_0 - \boldsymbol a^T \boldsymbol X) \boldsymbol X^T] = 0 \end{aligned}\]

Substituting the first equation into the second gives \[\begin{aligned} E [(Y - EY - \boldsymbol a^T (\boldsymbol X - E\boldsymbol X)) \boldsymbol X^T] = 0 \end{aligned}\] that is, \[\begin{aligned} \text{Cov}(Y, \boldsymbol X) =& \boldsymbol a^T \text{Var}(\boldsymbol X) \\ \text{Var}(\boldsymbol X) \boldsymbol a =& \text{Cov}(\boldsymbol X, Y) \\ \Sigma_{XX} \boldsymbol a =& \Sigma_{XY} \end{aligned}\] By the existence of the projection, this system for \(\boldsymbol a\) always has a solution; when \(\Sigma_{XX}>0\), \[ \boldsymbol a = \Sigma_{XX}^{-1} \Sigma_{XY} \] When \(|\Sigma_{XX}|=0\), \(\boldsymbol a\) has infinitely many solutions, but all of them yield the same best linear prediction.
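The last remark can be seen in a small numerical sketch. The covariances below are constructed so that \(X_3 = X_1 + X_2\) in mean square, making \(\text{Var}(\boldsymbol X)\) singular (an assumption for illustration); two different solutions of the normal equations then give the same minimal mean squared error, i.e. the same best linear prediction.

```python
import numpy as np

# Singular predictor covariance: X_3 = X_1 + X_2 in L^2, so Var(X) has rank 2.
Sigma_XX = np.array([[1.0, 0.0, 1.0],
                     [0.0, 1.0, 1.0],
                     [1.0, 1.0, 2.0]])
Sigma_XY = np.array([2.0, -1.0, 1.0])    # chosen inside the column space of Sigma_XX
var_Y = 6.0                              # an assumed Var(Y), large enough to be valid

def mse(a):
    # E(Y - a^T X)^2 for zero-mean variables
    return var_Y - 2 * a @ Sigma_XY + a @ Sigma_XX @ a

a_min = np.linalg.pinv(Sigma_XX) @ Sigma_XY      # one solution of Sigma_XX a = Sigma_XY
null = np.array([1.0, 1.0, -1.0])                # Sigma_XX @ null = 0
a_alt = a_min + 0.7 * null                       # a different solution

print(np.allclose(Sigma_XX @ a_alt, Sigma_XY))   # True: still solves the equations
print(mse(a_min), mse(a_alt))                    # identical minimal mean squared errors
```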

10.7 Appendix: Proof of the Minimum Phase Theorem

We now prove the minimum phase property of the Y-W coefficients stated in Theorem 10.1.

\(I_n\)表示\(n\)阶单位阵, 用\(\boldsymbol 0_n\)表示元素都是0的\(n\)阶方阵。 由平稳性, \((X_{m}, X_{m-1}, \dots, X_{m-n})\)\((X_{n+1}, X_n, \dots, X_1\)有相同的协方差阵\(\Gamma_{n+1}\), 所以用\(X_{m-1}, \dots, X_{m-n}\)\(X_m\)作最优线性预测的公式为 \[\begin{align} \hat X_m \stackrel{\triangle}{=} L(X_m | X_{m-1}, \dots, X_{m-n}) = \sum_{j=1}^n a_{nj} X_{m-j} \tag{10.9} \end{align}\] 定义 \[\begin{align} V_m = X_m - \hat X_m = X_m - \sum_{j=1}^n a_{nj} X_{m-j} \tag{10.10} \end{align}\] 则由\(\Gamma_{n+1}>0\)可知 \(E V_m^2 \stackrel{\triangle}{=} \sigma_n^2 > 0\)

By the properties of best linear prediction (equivalently, of the projection operator in \(L^2\)), \[ E(V_m X_{m-j}) = 0, \quad j=1,2,\dots,n \]

Introduce \[ \boldsymbol Y_m =\left(\begin{array}{c} X_m \\ X_{m-1} \\ \vdots \\ X_{m-n+1} \end{array}\right) \quad \boldsymbol V_m = \left(\begin{array}{c} V_m \\ 0 \\ \vdots \\ 0 \end{array}\right) \] \[ A = \left(\begin{array}{cc} (a_{n1},\ a_{n2},\ \cdots,\ a_{n,n-1}) & a_{nn} \\ I_{n-1} & \boldsymbol 0 \end{array}\right) \] that is, the first row of \(A\) is \((a_{n1}, \dots, a_{nn})\) and the remaining rows form the block \((I_{n-1},\ \boldsymbol 0)\). Then \[\begin{align} \boldsymbol Y_m - A \boldsymbol Y_{m-1} = \boldsymbol V_m \tag{10.11} \end{align}\] This is called the Markov extension of the AR model. Consequently, \[\begin{align} &\left(\begin{array}{cc} \sigma_n^2 & 0 \nonumber \\ 0 & \boldsymbol 0_{n-1} \end{array}\right) = E (\boldsymbol V_m \boldsymbol V_m^T) \nonumber \\ =& E[ \boldsymbol V_m (\boldsymbol Y_m - A \boldsymbol Y_{m-1})^T] \nonumber \\ =& E \left[ \boldsymbol V_m \boldsymbol Y_m^T \right] - E[ \boldsymbol V_m \boldsymbol Y_{m-1}^T ] A^T \nonumber \\ =& E \left[ \boldsymbol V_m \boldsymbol Y_m^T \right] \quad (\text{using } E(V_m X_{m-j}) = 0) \nonumber \\ =& E \left[ (\boldsymbol Y_m - A \boldsymbol Y_{m-1}) \boldsymbol Y_m^T \right] \nonumber \\ =& E(\boldsymbol Y_m \boldsymbol Y_m^T) - A E(\boldsymbol Y_{m-1} \boldsymbol Y_m^T) \nonumber \\ =& \Gamma_n - A E \left[ \boldsymbol Y_{m-1} (A \boldsymbol Y_{m-1} + \boldsymbol V_m)^T \right] \nonumber \\ =& \Gamma_n - A E(\boldsymbol Y_{m-1} \boldsymbol Y_{m-1}^T) A^T - A E( \boldsymbol Y_{m-1} \boldsymbol V_{m}^T) \nonumber \\ =& \Gamma_n - A \Gamma_n A^T \quad (\text{using } E(V_m X_{m-j}) = 0 \text{ again}) \tag{10.12} \end{align}\]

Suppose the complex number \(z_0\) satisfies \(\text{det}(I_n - z_0 A)=0\), i.e. \[\begin{aligned} & \text{det}\left(\begin{array}{cccccc} 1 - a_{n1} z_0 & - a_{n2} z_0 & - a_{n3} z_0 & \cdots & -a_{n,n-1} z_0 & - a_{nn} z_0 \\ -z_0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & -z_0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & -z_0 & 1 \end{array}\right) \\ =& 1 - a_{n1} z_0 - a_{n2} z_0^2 - \dots - a_{nn} z_0^n = 0 \end{aligned}\] At \(z_0 = 0\) the determinant equals 1, so any root \(z_0\) of this equation is nonzero. To prove the minimum phase property it suffices to show that every such \(z_0\) satisfies \(|z_0| > 1\).

Since \(\text{det}(I_n - z_0 A)=0\), there exists a nonzero complex vector \(\boldsymbol \alpha^* = (\alpha_1, \alpha_2, \dots, \alpha_n)\) such that \[ \boldsymbol \alpha^* (I_n - z_0 A) = 0 \] i.e. \[ \boldsymbol \alpha^* A = z_0^{-1} \boldsymbol \alpha^* \] By the structure of the matrix \(A\), this can be written as \[\begin{equation} \begin{cases} a_{n1} \alpha_1 + \alpha_2 = z_0^{-1} \alpha_1 \\ a_{n2} \alpha_1 + \alpha_3 = z_0^{-1} \alpha_2 \\ \vdots \\ a_{n,n-1} \alpha_1 + \alpha_{n} = z_0^{-1} \alpha_{n-1} \\ a_{nn} \alpha_1 = z_0^{-1} \alpha_{n} \end{cases} \tag{10.13} \end{equation}\] It follows that \(\alpha_1 \neq 0\): otherwise, recursing through (10.13) would give \(\alpha_1 = \alpha_2 = \dots = \alpha_n = 0\), contradicting \(\boldsymbol\alpha^* \neq 0\).

Using \(\sigma_n^2 = E V_m^2 > 0\), (10.12), and \(\boldsymbol\alpha^* A = z_0^{-1} \boldsymbol\alpha^*\), we get \[\begin{aligned} 0 <& \sigma_n^2 |\alpha_1|^2 \\ =& \boldsymbol\alpha^* \left(\begin{array}{cc} \sigma_n^2 & 0 \\ 0 & \boldsymbol 0_{n-1} \end{array}\right) \boldsymbol\alpha \\ =& \boldsymbol\alpha^* \Gamma_n \boldsymbol\alpha - \boldsymbol\alpha^* A \Gamma_n A^T \boldsymbol\alpha \\ =& \boldsymbol\alpha^* \Gamma_n \boldsymbol\alpha - |z_0|^{-2} \boldsymbol\alpha^* \Gamma_n \boldsymbol\alpha \\ =& (1 - |z_0|^{-2}) \boldsymbol\alpha^* \Gamma_n \boldsymbol\alpha \end{aligned}\] Since \(\Gamma_n\) is positive definite, \(\boldsymbol\alpha^* \Gamma_n \boldsymbol\alpha > 0\), so \(1 - |z_0|^{-2} > 0\), i.e. \(|z_0| > 1\). This completes the proof.

○○○○○○
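Theorem 10.1 is easy to check numerically: for Yule-Walker coefficients computed from a positive definite \(\Gamma_{n+1}\), all roots of \(1 - \sum_{j} a_{nj} z^j\) lie outside the unit circle, and (as long as \(a_{nn} \neq 0\)) the eigenvalues of the companion matrix \(A\) used in the proof above are their reciprocals, hence lie strictly inside it. The autocovariances below are invented example values.

```python
import numpy as np

gamma = np.array([2.0, 1.2, 0.5, 0.1])   # assumed autocovariances with Gamma_4 > 0
n = 3
Gamma_n = np.array([[gamma[abs(i - j)] for j in range(n)] for i in range(n)])
a_n = np.linalg.solve(Gamma_n, gamma[1:n + 1])

# Roots of A(z) = 1 - a_{n,1} z - ... - a_{n,n} z^n (coefficients from degree n down to 0)
roots = np.roots(np.concatenate([-a_n[::-1], [1.0]]))

# Companion matrix A from the proof: first row (a_{n,1}, ..., a_{n,n}), then (I_{n-1}, 0)
A = np.zeros((n, n))
A[0, :] = a_n
A[1:, :-1] = np.eye(n - 1)

print(np.abs(roots))                     # all moduli > 1: minimum phase
print(np.abs(np.linalg.eigvals(A)))      # all moduli < 1: reciprocals of the roots
```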

10.8 Appendix: An Equivalent Definition of AR Sequences

Theorem 10.4 (Equivalent definition of an AR sequence) Let \(a_1, \dots, a_p\) be real numbers with \(a_p \neq 0\), let \(A(z) = 1 - a_1 z - \dots - a_p z^p\), let \(\{ \varepsilon_t \}\) be WN(0, \(\sigma^2\)), and let the zero-mean stationary sequence \(\{ X_t, t \in \mathbb Z \}\) satisfy \[ A(\mathscr B) X_t = \varepsilon_t, \ t \in \mathbb Z \] If \(\varepsilon_{t+k}, k=1,2,\dots\), are uncorrelated with \(\{ X_s, s \leq t \}\), then all roots of \(A(z)\) lie outside the unit circle, and hence \(\{X_t \}\) is an AR(\(p\)) sequence.

Proof: \(L(X_t | X_{t-1}, \dots, X_{t-p})\) is the projection of \(X_t\) onto \(\text{sp}(X_{t-1}, \dots, X_{t-p})\). By the linearity of the projection and the orthogonality of \(\varepsilon_t\) to \(X_{t-1}, \dots, X_{t-p}\), \[ L(X_t | X_{t-1}, \dots, X_{t-p}) = a_1 X_{t-1} + \dots + a_p X_{t-p} + 0 \] so the Y-W coefficients for predicting \(X_t\) from \(X_{t-1}, \dots, X_{t-p}\) are \((a_1, \dots, a_p)\). Moreover, \[ E(X_t - L(X_t | X_{t-1}, \dots, X_{t-p}))^2 = E \varepsilon_t^2 = \sigma^2 > 0 \] so \(X_t, X_{t-1}, \dots, X_{t-p}\) are linearly independent and \(\Gamma_{p+1} > 0\). By Theorem 10.1, \(A(z)\) satisfies the minimum phase condition, and therefore \(\{ X_t \}\) is an AR(\(p\)) sequence.

○○○○○○

10.9 Appendix: Prediction of a Discrete Spectrum Sequence

Consider the discrete spectrum sequence \[ X_t = A \cos(\omega t) + B \sin(\omega t), \quad t \in \mathbb Z, \] where \(0<\omega<\pi\) and \(A, B\) are uncorrelated zero-mean random variables with \(\text{Var}(A)=\text{Var}(B)=\sigma^2\). This is a zero-mean stationary sequence with autocovariance function \[ \gamma_k = \sigma^2 \cos(k \omega), \quad k=0,1,2, \dots \] We consider its best linear prediction problem.

10.9.1 Solving the Y-W Equations Directly

To predict \(X_2\) from \(X_1\) we only need to solve \[ \gamma_0 a_{11} = \gamma_1 \] which gives \(a_{11} = \cos\omega\) and \[ L(X_2|X_1) = a_{11} X_1 = \cos\omega \cdot X_1 \] The mean squared prediction error is \[\begin{aligned} \sigma_1^2 =& E(X_2 - \cos\omega \cdot X_1)^2 \\ =& \sigma^2 (\cos 2\omega - \cos\omega \cos\omega)^2 + \sigma^2 (\sin 2\omega - \cos\omega \sin\omega)^2 \\ =& \sigma^2 ( \sin^4 \omega + \cos^2 \omega \sin^2 \omega) \\ =& \sigma^2 \sin^2 \omega > 0 \end{aligned}\]

Now consider predicting \(X_3\) from \(X_2, X_1\). Here \[ \Gamma_2 = \sigma^2 \left(\begin{array}{cc} 1 & \cos\omega \\ \cos\omega & 1 \end{array}\right) \] and \(|\Gamma_2| = \sigma^4(1 - \cos^2 \omega) = \sigma^4 \sin^2\omega > 0\), so \(\Gamma_2 > 0\). Dividing the Yule-Walker equations by \(\sigma^2\), we solve

\[ \left(\begin{array}{cc} 1 & \cos\omega \\ \cos\omega & 1 \end{array}\right) \left(\begin{array}{c} a_{21} \\ a_{22} \end{array}\right) = \left(\begin{array}{c} \cos\omega \\ \cos 2\omega \end{array}\right) \]

\[\begin{aligned} & a_{21} + \cos\omega \cdot a_{22} = \cos\omega \\ & \cos\omega \cdot a_{21} + a_{22} = \cos 2\omega \\ & a_{22} = \cos 2\omega - \cos\omega \cdot a_{21} \\ & a_{21} + \cos\omega(\cos 2\omega - \cos\omega \cdot a_{21}) = \cos\omega \\ & a_{21}+ \cos\omega \cos 2\omega - \cos^2 \omega \cdot a_{21} = \cos\omega \\ & \sin^2 \omega \cdot a_{21} = \cos\omega(1 - \cos 2\omega) = 2 \cos\omega \sin^2 \omega \\ & a_{21} = 2 \cos\omega \\ & a_{22} = \cos 2\omega - \cos\omega \cdot a_{21} = \cos 2\omega - \cos\omega \cdot 2 \cos\omega = -1 \end{aligned}\] 其中用到三角函数公式\(\cos 2\omega = 2 \cos^2 \omega - 1 = 1 - 2 \sin^2 \omega\)

Thus the best linear prediction of \(X_3\) is \[ \hat X_3 = a_{21} X_2 + a_{22} X_1 = 2 \cos\omega \cdot X_2 - X_1 \]

The mean squared prediction error is

\[\begin{aligned} \sigma_2^2 =& E(X_3 - 2 \cos\omega \cdot X_2 + X_1)^2 \\ =& \sigma^2 \left( \cos 3\omega - 2 \cos\omega \cos 2\omega + \cos\omega \right)^2 \\ & + \sigma^2 \left( \sin 3\omega - 2 \cos\omega \sin 2\omega + \sin\omega \right)^2 \\ =&0 \end{aligned}\] Therefore the discrete spectrum sequence is perfectly linearly predictable. For any \(t\), when predicting \(X_t\) from \(X_{t-1}, \dots, X_{t-p}\) with \(p \geq 2\), \[ L(X_t | X_{t-1}, \dots, X_{t-p}) = 2 \cos\omega \cdot X_{t-1} - X_{t-2} \] and \(L(X_t | X_{t-1}, \dots, X_{t-p}) = X_t\), so the prediction error is zero.

事实上,\(\Gamma_3\)\[ \left(\begin{array}{ccc} 1 & \cos\omega & \cos 2\omega \\ \cos\omega & 1 & \cos\omega \\ \cos 2\omega & \cos\omega & 1 \end{array}\right) \]\(b = \cos\omega\), 则\(\cos 2\omega = 2 b^2 - 1\), 行列式为

\[\begin{aligned} & \left|\begin{array}{ccc} 1 & b & 2b^2 - 1 \\ b & 1 & b \\ 2 b^2 - 1 & b & 1 \end{array}\right| \\ =& \left|\begin{array}{ccc} 1 & b & 2b^2 - 1 \\ 0 & 1 - b^2 & 2b - 2b^3 \\ 0 & 2b - 2b^3 & 4b^2 - 4b^4 \end{array}\right| \\ =& (1-b^2)^2 \left|\begin{array}{cc} 1 & 2b \\ 2b & 4b^2 \end{array}\right| \\ =& 0 \end{aligned}\]
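These calculations are easy to confirm numerically; the value of \(\omega\) below is an arbitrary choice.

```python
import numpy as np

# Numerical check of Section 10.9: gamma_k = sigma^2 cos(k omega).
omega, s2 = 0.8, 1.0
gamma = lambda k: s2 * np.cos(k * omega)

# Predict X_3 from X_2, X_1: solve Gamma_2 (a21, a22)^T = (gamma_1, gamma_2)^T.
G2 = np.array([[gamma(0), gamma(1)],
               [gamma(1), gamma(0)]])
a21, a22 = np.linalg.solve(G2, [gamma(1), gamma(2)])
print(a21, 2 * np.cos(omega))    # a_{2,1} = 2 cos(omega)
print(a22)                       # a_{2,2} = -1

# Gamma_3 is singular, and the prediction of X_3 is exact (sigma_2^2 = 0).
G3 = np.array([[gamma(abs(i - j)) for j in range(3)] for i in range(3)])
print(np.linalg.det(G3))                                       # ~ 0
print(gamma(0) - np.array([a21, a22]) @ [gamma(1), gamma(2)])  # sigma_2^2 ~ 0
```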

10.9.2 Solving the Y-W Equations by the Levinson Recursion

\(\gamma_0 = \sigma^2\), \(\gamma_1 = \sigma^2 \cos\omega\), \(\gamma_2 = \sigma^2 \cos 2\omega\)

Following the order of the Levinson recursion, we compute in turn: \[\begin{aligned} \text{Initial value}: & \\ \sigma_0^2 =& \gamma_0 = \sigma^2 \\ k+1=1: & \\ a_{11} =& \frac{\gamma_1}{\gamma_0} = \cos\omega \\ \sigma_1^2 =& E[ X_2 - L(X_2 | X_1) ]^2 \\ =& \sigma_0^2 [ 1 - a_{11}^2 ] = \sigma^2 \sin^2 \omega \\ k+1=2: & \\ a_{22} =& \frac{\gamma_2 - a_{11} \gamma_1}{\sigma_1^2} \\ =& \frac{\cos 2\omega - \cos\omega \cos\omega}{\sin^2 \omega} \\ =& -1 \\ a_{21} =& a_{11} - a_{22} a_{11} \\ =& \cos\omega [ 1 - (-1) ] = 2 \cos\omega \\ \sigma_2^2 =& E[ X_3 - L(X_3 | X_2, X_1) ]^2 \\ =& \sigma_1^2 [ 1 - a_{22}^2 ] \\ =& \sigma^2 \sin^2 \omega [1 - (-1)^2] \\ =& 0 \end{aligned}\]

10.10 Appendix: A Discussion of Inverting the Y-W Equations

In the Y-W equations, if \(\Gamma_{p+1}>0\), then \(a_1, \dots, a_p, \sigma^2\) are uniquely determined, where \(a_1, \dots, a_p\) satisfy the minimum phase condition (see Theorem 10.1) and \(\sigma^2>0\).

Conversely, if \(a_1, \dots, a_p\) satisfy the minimum phase condition and \(\sigma^2>0\), are the values \(\gamma_0, \gamma_1,\dots, \gamma_p\) in the Y-W equations uniquely determined?

Reference: Xie Zhongjie, Time Series Analysis, p. 189, Theorem 4.4. That theorem states that, given the first \(p+1\) autocovariances \(\gamma_0, \gamma_1,\dots, \gamma_p\) of some stationary sequence, there exists an AR(\(p\)) sequence whose first \(p+1\) autocovariances equal these \(p+1\) values, with the model parameters obtained from the Y-W equations; see Exercise 6.1.2. Theorem 4.5 of the same reference states that, among all stationary sequences whose first \(p+1\) autocovariances equal the given \(p+1\) values, the AR(\(p\)) model attains the largest one-step prediction error, and hence the largest information content.

More generally, for the autocovariance sequence \(\{\gamma_k \}\) of a stationary sequence \(\{X_t\}\) that is not perfectly linearly predictable, the Y-W equations of each order read \[\begin{align*} \Gamma_n \boldsymbol a_n =& \boldsymbol\gamma_n \\ \gamma_0 - \boldsymbol a_n^T \boldsymbol\gamma_n =& \sigma_n^2 \end{align*}\] where \(\boldsymbol\gamma_n = (\gamma_1, \dots, \gamma_n)^T\). Suppose \(\boldsymbol a_n = (a_{n1}, a_{n2}, \dots, a_{nn})^T\) and \(\sigma_n^2>0\) are given and \(\boldsymbol a_n\) satisfies the minimum phase condition. Are the values \(\gamma_0, \gamma_1, \dots, \gamma_n\) satisfying these Y-W equations uniquely determined?

\(n=1\), 显然 \[\begin{align*} \gamma_0 =& \frac{\sigma^2}{1-a_1^2} \\ \gamma_1 =& a_1 \gamma_0 \end{align*}\] 唯一。

\(n=2\),方程为 \[\begin{align*} \gamma_0 a_1 + \gamma_1 a_2 =& \gamma_1 \\ \gamma_1 a_1 + \gamma_0 a_2 =& \gamma_2 \\ \gamma_0 - a_1 a_1 \gamma_1 - a_2 \gamma_2 =& \sigma^2 \end{align*}\] 消元得 \[\begin{align*} \gamma_0 =& \sigma^2 / \left( 1 - a_2^2 - \frac{(1+a_2) a_1^2}{1-a_2} \right) \\ \gamma_1 =& \frac{a_1}{1-a_2} \gamma_0 \\ \gamma_2 =& a_1 \gamma_1 + a_2 \gamma_0 \end{align*}\]

\(n=3\),因为\(\gamma_0>0\)\(\gamma_k = \rho_k \gamma_0\), 所以如果能由\(a_1, a_2, a_3\)决定\(\rho_1, \rho_2, \rho_3\)则 可由 \[\begin{align*} \sigma_3^2 = \gamma_0 - a_1 \gamma_1 - a_2 \gamma_2 - - a_3 \gamma_3 = \gamma_0 (1 - a_1 \rho_1 - a_2 \rho_2 - a_3 \rho_3) \end{align*}\] 解出\(\gamma_0\)。 把 \[\begin{align*} \left(\begin{array}{ccc} \gamma_0 & \gamma_1 & \gamma_2 \\ \gamma_1 & \gamma_0 & \gamma_1 \\ \gamma_2 & \gamma_1 & \gamma_0 \end{array}\right) \left(\begin{array}{c} a_1 \\ a_2 \\ a_3 \end{array}\right) = \left(\begin{array}{c} \gamma_1 \\ \gamma_2 \\ \gamma_3 \end{array}\right) \end{align*}\] 两边除以\(\gamma_0\)并写成关于\(\rho_1, \rho_2, \rho_3\)的方程, 得 \[\begin{align*} \left(\begin{array}{ccc} a_2 - 1 & 0 & a_3 \\ a_1 + a_3 & -1 & 0 \\ a_2 & a_1 & -1 \end{array}\right) \left(\begin{array}{c} \rho_1 \\ \rho_2 \\ \rho_3 \end{array}\right) = -\left(\begin{array}{c} a_1 \\ a_2 \\ a_3 \end{array}\right) \end{align*}\] 很难判别此三元一次方程组的系数矩阵是否满秩。 可以计算其行列式为 \[\begin{align*} a_1^2 a_3 + a_1 a_3^2 + a_2 a_3 + a_2 - 1 \end{align*}\] 这个矩阵可能有不满秩的情况,例如当 \[\begin{align*} A(z) = 1 - 1.8z + 1.775789z^2 - 0.9z^3 \end{align*}\] 时,此矩阵行列式为零,且\(A(z)\)满足最小相位条件。