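3 Normal Time Series and Convergence of Random Variables

3.1 Expectation and Variance of Random Vectors

Let \({\boldsymbol M} = (M_{i,j})_{m\times n}\) be a random matrix.

Its expectation is taken entry by entry: \[\begin{aligned} E ({\boldsymbol M}) = (E M_{i,j})_{m\times n} = (\mu_{i,j})_{m\times n}. \end{aligned}\]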

Let \(A, B, C\) be constant matrices such that \(C + A {\boldsymbol M} B\) is well defined. Then \[\begin{aligned} E (C + A {\boldsymbol M} B) = C + A \cdot E ({\boldsymbol M}) \cdot B . \end{aligned}\]

Let \({\boldsymbol X} = (X_1, X_2, \ldots, X_n)^T\) be a random vector with mean vector \(\mu = E {\boldsymbol X}\). Its covariance matrix is \[ \Sigma_X = \text{Var}({\boldsymbol X}) = E[({\boldsymbol X} - \mu) ({\boldsymbol X} - \mu)^T] . \]

\(\Sigma_X\) is symmetric and nonnegative definite (positive semidefinite), and \[ \Sigma_X = E({\boldsymbol X} {\boldsymbol X}^T) - E({\boldsymbol X}) E({\boldsymbol X})^T . \]

Let \[\begin{align} {\boldsymbol Y} = {\boldsymbol a} + B {\boldsymbol X} \tag{3.1} \end{align}\] where \({\boldsymbol a}\) is a constant vector and \(B\) is a constant matrix. Then \[\begin{align} E {\boldsymbol Y} = {\boldsymbol a} + B E{\boldsymbol X}, \qquad \text{Var}({\boldsymbol Y}) = B \Sigma_X B^T. \tag{3.2} \end{align}\]
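As a small numerical illustration of (3.2), the sketch below pushes a mean vector and a covariance matrix through an affine map using plain Python lists. The helper names (`matmul`, `transpose`, `affine_moments`) and the numbers are illustrative assumptions, not part of the notes.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*P)]

def affine_moments(a, B, mean_x, cov_x):
    """Mean and covariance of Y = a + B X, following (3.2)."""
    mean_y = [a[i] + sum(B[i][k] * mean_x[k] for k in range(len(mean_x)))
              for i in range(len(a))]
    cov_y = matmul(matmul(B, cov_x), transpose(B))
    return mean_y, cov_y

# Hypothetical example: X has mean (1, 2) and covariance [[2, 1], [1, 3]].
a = [0.0, 1.0]
B = [[1.0, 1.0], [1.0, -1.0]]
mean_y, cov_y = affine_moments(a, B, [1.0, 2.0], [[2.0, 1.0], [1.0, 3.0]])
print(mean_y)  # [3.0, 0.0]
print(cov_y)   # [[7.0, -1.0], [-1.0, 3.0]]
```

The diagonal entries agree with the direct computation \(\text{Var}(X_1 + X_2) = 2 + 3 + 2 = 7\) and \(\text{Var}(X_1 - X_2) = 2 + 3 - 2 = 3\).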

By (3.2), for \(Y = {\boldsymbol \alpha}^T {\boldsymbol X}\) with any constant vector \({\boldsymbol \alpha} \in \mathbb R^n\), \[\begin{aligned} 0 \leq \text{Var}(Y) = \text{Var}({\boldsymbol \alpha}^T {\boldsymbol X}) = {\boldsymbol \alpha}^T \text{Var}({\boldsymbol X}) {\boldsymbol \alpha} . \end{aligned}\] This proves that the covariance matrix of a random vector is nonnegative definite (positive semidefinite).

Let \(\boldsymbol X=(X_1, X_2, \dots, X_n)^T\) and \(\boldsymbol Y=(Y_1, Y_2, \dots, Y_m)^T\) be random vectors. The covariance matrix of the two random vectors is \[ \text{Cov}(\boldsymbol X, \boldsymbol Y) = E \big[ (\boldsymbol X - E \boldsymbol X) (\boldsymbol Y - E \boldsymbol Y)^T \big] . \] This is an \(n \times m\) matrix whose \((i,j)\) entry is \(\text{Cov}(X_i, Y_j)\). It satisfies the identity \[ \text{Cov}(\boldsymbol X, \boldsymbol Y) = E \big( \boldsymbol X \boldsymbol Y^T \big) - (E \boldsymbol X) (E \boldsymbol Y)^T . \]

If \(\boldsymbol \mu\), \(\boldsymbol \nu\) are nonrandom vectors and \(A\), \(B\) are nonrandom matrices of compatible dimensions, then \[ \text{Cov}(\boldsymbol\mu + A \boldsymbol X, \boldsymbol\nu + B \boldsymbol Y) = A \text{Cov}(\boldsymbol X, \boldsymbol Y) B^T . \]

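3.2 The Multivariate Normal Distribution

A random vector \({\boldsymbol Y} =(Y_1,Y_2,\cdots,Y_m)^T\) is said to follow an \(m\)-variate (or multivariate) normal distribution if there exist an \(m\)-dimensional constant column vector \({\boldsymbol\mu}\), an \(m \times n\) constant matrix \(B\), and iid standard normal random variables \(X_1,X_2,\ldots,X_n\), collected in \({\boldsymbol X} = (X_1, X_2, \ldots, X_n)^T\), such that \[ {\boldsymbol Y}={\boldsymbol \mu} + B {\boldsymbol X} . \] This is also called the multidimensional normal distribution. In this case \(E{\boldsymbol Y} = {\boldsymbol \mu}\) and \(\Sigma = \text{Var}({\boldsymbol Y}) = B B^T\).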
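The constructive definition doubles as a sampling recipe. The following is a minimal sketch using only the standard library; the function names `sample_mvn` and `sample_cov`, the seed, and the particular \(\boldsymbol\mu\) and \(B\) are assumptions chosen for illustration.

```python
import random

def sample_mvn(mu, B, rng):
    """One draw of Y = mu + B X with X a vector of iid N(0, 1) variables."""
    x = [rng.gauss(0.0, 1.0) for _ in range(len(B[0]))]
    return [mu[i] + sum(B[i][j] * x[j] for j in range(len(x)))
            for i in range(len(mu))]

def sample_cov(draws):
    """Sample covariance matrix (divisor N - 1) of a list of vectors."""
    n, d = len(draws), len(draws[0])
    mean = [sum(y[i] for y in draws) / n for i in range(d)]
    return [[sum((y[i] - mean[i]) * (y[j] - mean[j]) for y in draws) / (n - 1)
             for j in range(d)] for i in range(d)]

rng = random.Random(0)          # fixed seed so the run is reproducible
mu = [1.0, -1.0]
B = [[1.0, 0.0, 1.0],           # m = 2, n = 3 (hypothetical values)
     [0.0, 2.0, 1.0]]
draws = [sample_mvn(mu, B, rng) for _ in range(20000)]
sigma = [[sum(B[i][k] * B[j][k] for k in range(3)) for j in range(2)]
         for i in range(2)]     # B B^T = [[2, 1], [1, 5]]
print(sigma)
print(sample_cov(draws))        # should be close to B B^T
```

Note that \(n > m\) here: the definition does not require \(B\) to be square, only that \(\Sigma = B B^T\).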

The characteristic function of each \(X_j\) is \[\begin{aligned} E\left( e^{it X_j} \right) = e^{-t^2/2} . \end{aligned}\] The characteristic function of the random vector \(\boldsymbol X = (X_1, X_2, \dots, X_n)^T\) is \[\begin{aligned} \phi_{\boldsymbol X}(\boldsymbol t) =& E e^{i \boldsymbol t^T \boldsymbol X} = E \prod_{j=1}^n e^{i t_j X_j} \\ =& \prod_{j=1}^n E e^{i t_j X_j} = \prod_{j=1}^n e^{-t_j^2/2} = e^{-\boldsymbol t^T \boldsymbol t / 2} , \end{aligned}\] where \(\boldsymbol t = (t_1, t_2, \dots, t_n)^T\) and the second line uses the independence of the \(X_j\).

Hence, for \(\boldsymbol t \in \mathbb R^m\), the characteristic function of \({\boldsymbol Y}\) is \[\begin{aligned} \phi_{\boldsymbol Y}(\boldsymbol t) =& E e^{i \boldsymbol t^T \boldsymbol Y} \\ =& E e^{i(\boldsymbol t^T \boldsymbol \mu + \boldsymbol t^T B \boldsymbol X)} \\ =& e^{i \boldsymbol t^T \boldsymbol \mu} E e^{i (\boldsymbol t^T B) \boldsymbol X} \\ =& e^{i \boldsymbol t^T \boldsymbol \mu} e^{- (\boldsymbol t^T B) (\boldsymbol t^T B)^T / 2} \\ & (\text{note that } E e^{i \boldsymbol s^T \boldsymbol X} = e^{-\boldsymbol s^T \boldsymbol s / 2} \text{ with } \boldsymbol s^T = \boldsymbol t^T B) \\ =& \exp\left[ i {\boldsymbol t}^T {\boldsymbol \mu} - \frac{1}{2} {\boldsymbol t}^T B B^T {\boldsymbol t} \right] \\ =& \exp\left[ i {\boldsymbol t}^T {\boldsymbol \mu} - \frac{1}{2} {\boldsymbol t}^T \Sigma {\boldsymbol t} \right]. \end{aligned}\] This gives an equivalent definition of the multivariate normal distribution, namely that the characteristic function is \[\begin{align} \phi_{\boldsymbol Y}(\boldsymbol t) =& E e^{i \boldsymbol t^T \boldsymbol Y} = \exp\left[ i {\boldsymbol t}^T {\boldsymbol \mu} - \frac{1}{2} {\boldsymbol t}^T \Sigma {\boldsymbol t} \right]. \tag{3.3} \end{align}\]

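The multivariate normal distribution is denoted \(\boldsymbol Y \sim \text{N}({\boldsymbol \mu}, \Sigma)\). The distribution of \(\boldsymbol Y\) is completely determined by \(\boldsymbol\mu\) and \(\Sigma\).

When \(\Sigma>0\) (positive definite), the \(m\)-dimensional vector \(\boldsymbol Y\) has the density \[ p(\boldsymbol y) = (2\pi)^{-\frac{m}{2}} |\Sigma|^{-\frac{1}{2}} \exp\left\{ -\frac12 (\boldsymbol y - \boldsymbol\mu)^T \Sigma^{-1} (\boldsymbol y - \boldsymbol\mu) \right\} . \]

If \(|\Sigma|=0\), the components of \(\boldsymbol Y\) split into two parts \(\boldsymbol Y_1\) and \(\boldsymbol Y_2\), where \(\text{Var} (\boldsymbol Y_1)>0\) and each component of \(\boldsymbol Y_2\) is, almost surely, a linear combination of the components of \(\boldsymbol Y_1\) plus a constant. (This can be proved recursively.)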

Theorem 3.1 \({\boldsymbol\xi} = (\xi_1, \xi_2, \ldots, \xi_n)^T \sim \text{N}({\boldsymbol\mu}, \Sigma)\) if and only if:

for every \({\boldsymbol a}=(a_1,a_2,\cdots,a_n)^T \in {\mathbb R}^n\), \[\begin{align} Y = {\boldsymbol a}^T \boldsymbol\xi \ \sim \text{N} ({\boldsymbol a}^T {\boldsymbol\mu}, {\boldsymbol a}^T \Sigma {\boldsymbol a}). \tag{3.4} \end{align}\]

Theorem 3.1 says that every linear combination of the components of a multivariate normal vector is univariate normal. Here, however, the univariate normal distribution is the generalized \(\text{N}(\mu, \sigma^2)\), which allows \(\sigma^2=0\) (a point mass at \(\mu\)).

Proof:

Necessity: By (3.3), the characteristic function of \(Y\) is \[\begin{align} \phi(t) =& E \exp(itY) \nonumber \\ =& E \exp[it {\boldsymbol a}^T {\boldsymbol \xi}] \nonumber \\ =& E \exp[i(t {\boldsymbol a}^T) {\boldsymbol \xi}] \nonumber \\ =& \exp\left[ it {\boldsymbol a}^T {\boldsymbol\mu} - \frac12 t^2 {\boldsymbol a}^T \Sigma {\boldsymbol a} \right] \tag{3.5} \end{align}\] This is the characteristic function of a univariate normal distribution, so \(Y \sim \text{N}({\boldsymbol a}^T {\boldsymbol\mu}, {\boldsymbol a}^T \Sigma {\boldsymbol a})\).

Sufficiency:

If (3.4) holds, then (3.5) holds. Taking \(t=1\), for every \(\boldsymbol a\) we have \[\begin{aligned} E \exp(i {\boldsymbol a}^T {\boldsymbol\xi}) =& \exp\left( i {\boldsymbol a}^T {\boldsymbol\mu} - \frac12 {\boldsymbol a}^T \Sigma {\boldsymbol a} \right). \end{aligned}\] That is, the characteristic function of \(\boldsymbol\xi\) is (3.3), so \(\boldsymbol\xi\) follows a multivariate normal distribution.


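3.3 Normal Stationary Sequences

Definition 3.1 A time series \(\{X_t\}\) is called a normal time series if, for every \(n \geq 1\) and \(t_1, t_2, \ldots, t_n \in \mathbb Z\), the vector \((X_{t_1}, X_{t_2},\ldots,X_{t_n})\) follows a multivariate normal distribution. If \(\{X_t\}\) is in addition stationary, it is called a normal stationary sequence.

\(\{X_t: t \in \mathbb N_+ \}\) is a normal time series \(\Longleftrightarrow\) for every positive integer \(m\), \((X_1, X_2, \ldots, X_m)\) follows an \(m\)-dimensional normal distribution.

\(\{X_t: t \in {\mathbb Z} \}\) is a normal time series \(\Longleftrightarrow\) for every positive integer \(m\), \((X_{-m}, X_{-m+1}, \ldots, X_m)\) follows a \((2m+1)\)-dimensional normal distribution.

The closure of the normal family under linear operations makes its theory convenient to develop. In addition, there is an intrinsic connection between the normal distribution and linear models.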

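3.4 Limits in Probability

Let \(\xi_n \sim F_n(x)\) and \(\xi \sim F(x)\). If \(F_n(x) \to F(x)\) at every continuity point \(x\) of \(F\), then \(\xi_n\) is said to converge in distribution to \(\xi\), written \(\xi_n \stackrel{d}{\to} \xi\).

If \(P(|\xi_n-\xi|\geq \epsilon) \to 0\) for every \(\epsilon>0\), then \(\xi_n\) is said to converge in probability to \(\xi\), or to be consistent for \(\xi\) (some authors also say that \(\xi_n\) converges weakly to \(\xi\), as in the weak law of large numbers), written \(\xi_n \stackrel{\text{P}}{\to} \xi\).

If \(E|\xi_n -\xi| \to 0\), then \(\xi_n\) is said to converge to \(\xi\) in \(L^1\) (rarely used).

If \(E|\xi_n -\xi|^2 \to 0\), then \(\xi_n\) is said to converge to \(\xi\) in \(L^2\), or in mean square, written \(\xi_n \to \xi\ (L^2)\).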

Let \(p>0\). If \(E|\xi_n|^p\) and \(E|\xi|^p\) are finite and \(E|\xi_n - \xi|^p \to 0\), then \(\xi_n\) is said to converge to \(\xi\) in \(L^p\). Since \(E |X|^p \leq 1 + E |X|^q\) whenever \(0 < p < q\), \(L^q\) convergence implies \(L^p\) convergence.
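The step from the moment inequality to the convergence statement can be written out as follows. This is a standard argument supplied for convenience; the scaling constant \(c\) is introduced only for this sketch. For real \(x\) and \(0 < p < q\), \(|x|^p \leq 1\) when \(|x| \leq 1\) and \(|x|^p \leq |x|^q\) when \(|x| > 1\), so \[ |x|^p \leq 1 + |x|^q . \] Applying this to \(x/c\) with any \(c > 0\) and multiplying by \(c^p\) gives \(|x|^p \leq c^p + c^{p-q} |x|^q\), hence \[ E|\xi_n - \xi|^p \leq c^p + c^{p-q} E|\xi_n - \xi|^q . \] If \(E|\xi_n - \xi|^q \to 0\), then \(\limsup_{n\to\infty} E|\xi_n - \xi|^p \leq c^p\) for every \(c > 0\), and letting \(c \to 0\) shows \(E|\xi_n - \xi|^p \to 0\).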

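If \[\begin{aligned} P(\lim_{n\to\infty} \xi_n = \xi) = 1 , \end{aligned}\] then \(\xi_n\) is said to converge almost surely (a.s.) to \(\xi\).

Theorem 3.2 \(L^2\) convergence \(\Rightarrow\) \(L^1\) convergence \(\Rightarrow\) convergence in probability \(\Rightarrow\) convergence in distribution.

Proof omitted.

Theorem 3.3 a.s. convergence \(\Rightarrow\) convergence in probability \(\Rightarrow\) convergence in distribution.

Proof omitted.

Theorem 3.4 If \(\xi_n\) converges in probability to \(\xi\), then there exists a subsequence \(\{ n_k \}\) such that \(\xi_{n_k}\) converges a.s. to \(\xi\).

Proof omitted.

Theorem 3.5 If a limit in probability exists, it is a.s. unique.

Proof: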

Suppose the sequence of random variables \(\{ \xi_n \}\) converges in probability to both \(\xi\) and \(\eta\). Then for every \(\epsilon > 0\), \[ \begin{aligned} \lim_{n\to\infty} P(|\xi_n - \xi| > \epsilon) =& 0 \\ \lim_{n\to\infty} P(|\xi_n - \eta| > \epsilon) =& 0 \end{aligned} \] Therefore \[ \begin{aligned} & P(|\xi - \eta| > 2\epsilon) \\ =& P(|(\xi_n - \eta) - (\xi_n - \xi)| > 2\epsilon) \\ \leq& P(|\xi_n - \eta| + |\xi_n - \xi| > 2\epsilon) \\ \leq& P(|\xi_n - \eta| > \epsilon \text{ or } |\xi_n - \xi| > \epsilon) \\ \leq& P(|\xi_n - \eta| > \epsilon) + P(|\xi_n - \xi| > \epsilon) \\ \to& 0 \ (n \to \infty) \end{aligned} \] Since the left-hand side does not depend on \(n\), \(P(|\xi - \eta| > 2\epsilon) = 0\) for every \(\epsilon > 0\). Therefore \[ \begin{aligned} P(\xi \neq \eta) =& P(|\xi - \eta| > 0) \\ =& P\left( \bigcup_{n=1}^\infty \left\{ |\xi - \eta| > \frac{1}{n} \right\} \right) \\ \leq& \sum_{n=1}^\infty P\left( |\xi - \eta| > \frac{1}{n} \right) \\ =& 0 \end{aligned} \] That is, \(\xi = \eta\) a.s., which completes the proof.

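Corollary The almost sure limit, the \(L^2\) limit, the \(L^1\) limit, and the limit in probability are each a.s. unique when they exist; if several of these limits exist simultaneously, they are a.s. equal.

Theorem 3.6 \(\xi_n\) converges in distribution to \(\xi\) if and only if for every bounded continuous real-valued function \(f(\cdot)\) on \(\mathbb R\), \[ E f(\xi_n) \to E f(\xi), \ n \to \infty . \]

For this reason, convergence in distribution is also called weak convergence. Proof omitted.

Theorem 3.7 \(\xi_n\) converges in distribution to \(\xi\) if and only if for every \(t \in \mathbb R\), \[ E e^{it\xi_n} \to E e^{it \xi}, \ n \to\infty . \]

That is, convergence in distribution is equivalent to pointwise convergence of the characteristic functions. Proof omitted.

Theorem 3.8 \(\xi_n\) converges in distribution to a constant \(c\) if and only if \(\xi_n\) converges in probability to \(c\).

Proof omitted.

Theorem 3.9 A sequence of random vectors \(\boldsymbol{\xi}_n\) converges a.s. (or in \(L^p\), or in probability) to a random vector \(\boldsymbol{\xi}\) if and only if each component converges a.s. (or in \(L^p\), or in probability) to the corresponding component of \(\boldsymbol{\xi}\).

Proof omitted.

Theorem 3.10 If a sequence of normal random variables \(\xi_n \sim \text{N}(\mu_n, \sigma_n^2), n \in \mathbb N\), converges in distribution to a random variable \(\xi\), then the limits \[ \lim_{n\to\infty} \mu_n = \mu, \ \lim_{n\to\infty} \sigma_n^2 = \sigma^2 \] exist, and \(\xi \sim \text{N}(\mu, \sigma^2)\).

For the proof, see Wang Zikun (王梓坤), 《随机过程论》 (Theory of Stochastic Processes), p. 18.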

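Theorem 3.11 Let \(\{\varepsilon_t\}\) be a normal \(\text{WN}(0,\sigma^2)\) sequence and let the real sequence \(\{a_j\}\) be absolutely summable. Then the linear sequence \[ X_t = \sum_{j=-\infty}^\infty a_j \varepsilon_{t-j} \] is a zero-mean normal stationary sequence with autocovariance function \[\begin{align} \gamma_k = \sigma^2 \sum_{j=-\infty}^\infty a_j a_{j+k}, \ k \in \mathbb Z \tag{3.6} \end{align}\]

The conclusion still holds when \(\{a_j\} \in l_2\) (square summable).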
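For finitely many nonzero coefficients, formula (3.6) is a finite sum and can be evaluated directly. The sketch below does so; the function name `linear_acvf`, the dictionary representation of \(\{a_j\}\), and the example coefficients are assumptions made for illustration.

```python
def linear_acvf(a, sigma2, k):
    """gamma_k = sigma^2 * sum_j a_j a_{j+k} for finitely many nonzero a_j.

    `a` maps the integer index j to the coefficient a_j; indices that are
    missing from the dictionary are treated as zero.
    """
    return sigma2 * sum(aj * a.get(j + k, 0.0) for j, aj in a.items())

# Hypothetical example: X_t = eps_t + 0.5 * eps_{t-1} with sigma^2 = 4.
a = {0: 1.0, 1: 0.5}
print([linear_acvf(a, 4.0, k) for k in (-2, -1, 0, 1, 2)])
# [0.0, 2.0, 5.0, 2.0, 0.0]
```

The output is symmetric in \(k\), as an autocovariance function must be, and vanishes for \(|k| > 1\) because only two coefficients are nonzero.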

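Proof:

By §2.2, \(\{ X_t \}\) is a zero-mean stationary sequence with autocovariance function (3.6).

It remains to show that \(\{ X_t \}\) is a normal sequence, and for this it suffices to show that for every \(m \in \mathbb N_+\), \(\boldsymbol X = (X_{-m}, \dots, X_0, \dots, X_m)^T\) follows a multivariate normal distribution. We use Theorem 3.1 (the relation between multivariate and univariate normality) and Theorem 3.10 (a limit in distribution of univariate normal variables is again normal).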

Let \(\Sigma = (\gamma_{|i-j|})_{i,j=-m, \dots, m}\); we show that \(\boldsymbol X \sim \text{N}(0, \Sigma)\). Write \[\begin{aligned} \eta_t(n) = \sum_{j=-n}^n a_j \varepsilon_{t-j}, \ t = -m, \dots, m \end{aligned}\] for the partial sums of \(X_t\). By the dominated convergence theorem, and using \(E|\varepsilon_t| \leq (E \varepsilon_t^2)^{1/2} = \sigma\), \[\begin{aligned} E|\eta_t(n) - X_t| \leq& \sum_{|j|>n} |a_j| \sigma \to 0 \ (n \to \infty) . \end{aligned}\]

For every \(\boldsymbol b = (b_{-m}, \dots, b_0, \dots, b_m)^T \in \mathbb R^{2m+1}\), write \[\begin{aligned} Y =& \boldsymbol b^T \boldsymbol X = \sum_{t=-m}^m b_t X_t \\ \eta(n) =& \sum_{t=-m}^m b_t \eta_t(n) \end{aligned}\] Then, as \(n\to\infty\), \[\begin{aligned} E|\eta(n) - Y| \leq \sum_{t=-m}^m |b_t| \cdot E|\eta_t(n) - X_t| \to 0 . \end{aligned}\] That is, \(\eta(n) \stackrel{L^1}{\to} Y\), and hence \(\eta(n) \stackrel{d}{\to} Y\). By Theorem 3.1, \(\eta(n)\) is normally distributed, being a finite linear combination of the jointly normal \(\varepsilon_t\); by Theorem 3.10, \(Y\) is normally distributed. Clearly \(EY=0\) and \[\begin{aligned} \text{Var}(Y) = \text{Var}\Big(\sum_{t=-m}^m b_t X_t\Big) = \boldsymbol b^T \Sigma \boldsymbol b , \end{aligned}\] so \(Y \sim \text{N}(0, \boldsymbol b^T \Sigma \boldsymbol b)\). By Theorem 3.1, \(\boldsymbol X\) follows a multivariate normal distribution, and therefore \(\{ X_t \}\) is a normal sequence.


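3.4.1 Some Counterexamples

Example 3.1 Almost sure convergence implies convergence in probability, but the converse fails. Here is a counterexample.

Let \(\Omega=[0,1]\) and let \(P(\cdot)\) be Lebesgue measure on \([0,1]\). Define \(f_{mk} = I_{[\frac{k-1}{m}, \frac{k}{m}]}\), \(k=1,2,\dots,m\), \(m=1,2,\dots\). Arrange \(\{ f_{mk} \}\) in the order \(f_{11}\), \(f_{21}\), \(f_{22}\), \(f_{31}\), \(f_{32}\), \(f_{33}\), \(\ldots\), and denote this sequence of random variables by \(X_n(\omega)\) (\(\omega \in [0,1]\)). If \(X_n = f_{mk}\), then \(P(X_n = 1) = \frac{1}{m}\), so for every \(\epsilon \in (0,1)\), \[ P(|X_n(\omega)| > \epsilon) = P(X_n(\omega) = 1) \to 0 \ (n\to\infty) . \] That is, \(X_n\) converges in probability to 0. However, for every \(\omega \in [0,1]\) there are infinitely many \(n\) with \(X_n(\omega)=1\), so \(X_n\) does not converge a.s. to 0.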
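The behaviour of this sequence at a fixed \(\omega\) can be traced with exact rational arithmetic. In the sketch below, the function names `typewriter` and `x_n` and the choice \(\omega = 1/3\) are assumptions made for illustration.

```python
from fractions import Fraction

def typewriter(n):
    """Return (m, k) with X_n = f_{mk} in the ordering f_11, f_21, f_22, ..."""
    m = 1
    while n > m:        # block m holds the m functions f_{m1}, ..., f_{mm}
        n -= m
        m += 1
    return m, n

def x_n(n, omega):
    """X_n(omega): indicator of the closed interval [(k-1)/m, k/m]."""
    m, k = typewriter(n)
    return 1 if Fraction(k - 1, m) <= omega <= Fraction(k, m) else 0

omega = Fraction(1, 3)   # any fixed point of [0, 1] behaves the same way
hits = [n for n in range(1, 56) if x_n(n, omega) == 1]
print(hits)              # at least one hit in every block m = 1, ..., 10
print([Fraction(1, typewriter(n)[0]) for n in (1, 3, 6, 10, 55)])  # P(X_n = 1)
```

Every block \(m\) contributes at least one index with \(X_n(\omega) = 1\), while the probabilities \(1/m\) shrink to zero, which is exactly the gap between the two modes of convergence.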


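Example 3.2 Both \(L^1\) convergence and \(L^2\) convergence imply convergence in probability, but the converse fails. Here is a counterexample.

Let \(\Omega=[0,1]\) and let \(P(\cdot)\) be Lebesgue measure on \([0,1]\). Define \(X_n(\omega) = n^2 I_{[0, \frac{1}{n}]}(\omega)\). Then for every \(\epsilon \in (0,1)\), \[ P(|X_n - 0| > \epsilon) = P(X_n \neq 0) = \frac{1}{n} \to 0, \ n \to \infty . \] That is, \(X_n\) converges in probability to 0. However, \[ E|X_n - 0| = E X_n = \int_0^1 X_n(\omega) \, d\omega = n^2 \cdot \frac{1}{n} = n \to \infty \ (n\to\infty) , \] so \(X_n\) does not converge to 0 in \(L^1\), and therefore not in \(L^2\) either, since \(L^2\) convergence would imply \(L^1\) convergence.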


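Example 3.3 Convergence in probability implies convergence in distribution, but the converse fails. Here is a counterexample.

Let \(X, X_1, X_2, \dots\) be independent with common distribution N(0,1). Then the distribution function \(F_n(x)\) of \(X_n\) equals the distribution function \(F(x)\) of \(X\) everywhere, so trivially \(\lim_{n\to\infty} F_n(x) = F(x)\) for every \(x \in (-\infty, \infty)\); that is, \(X_n\) converges in distribution to \(X\). However, for every \(\epsilon>0\), since \(X_n - X \sim \text{N}(0, 2)\), \[ P(|X_n - X| > \epsilon) = 2(1 - \Phi(\epsilon/\sqrt{2})) \] is a positive constant that does not depend on \(n\), so \(X_n\) cannot converge in probability to \(X\). Here \(\Phi(\cdot)\) denotes the standard normal distribution function.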
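The constant \(2(1 - \Phi(\epsilon/\sqrt{2}))\) is easy to evaluate and to compare against a simulation. The following sketch uses only the standard library; the function names, the seed, and the choice \(\epsilon = 0.5\) are assumptions made for illustration.

```python
import math
import random

def normal_cdf(x):
    """Standard normal distribution function Phi(x), via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gap_probability(eps):
    """P(|X_n - X| > eps) when X_n - X ~ N(0, 2); the same for every n."""
    return 2.0 * (1.0 - normal_cdf(eps / math.sqrt(2.0)))

rng = random.Random(1)   # fixed seed for a reproducible Monte Carlo check
eps, trials = 0.5, 50000
freq = sum(abs(rng.gauss(0, 1) - rng.gauss(0, 1)) > eps
           for _ in range(trials)) / trials
print(gap_probability(eps), freq)   # the two values should roughly agree
```

Because the probability is the same for every \(n\), no amount of waiting makes \(X_n\) close to \(X\), even though the two always share the same distribution.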


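3.5 Supplement

3.5.1 Joint Density

Property: If the \(n\)-dimensional random vector \(\boldsymbol Y\) follows the multivariate normal distribution \(\text{N}(\boldsymbol\mu, \Sigma)\) with \(\Sigma\) positive definite, then \(\boldsymbol Y\) has the joint density \[ p(\boldsymbol y) = (2\pi)^{-\frac{n}{2}} |\Sigma|^{-\frac{1}{2}} \exp\left\{ -\frac12 (\boldsymbol y - \boldsymbol\mu)^T \Sigma^{-1} (\boldsymbol y - \boldsymbol\mu) \right\} . \]

Proof: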

When \(\Sigma\) is symmetric positive definite, linear algebra gives the eigenvalue decomposition \(\Sigma = U \Lambda U^T\), where \(\Lambda = \text{diag}(\lambda_1, \dots, \lambda_n)\) with \(\lambda_j > 0, j=1,2,\dots,n\), and \(U\) is an orthogonal matrix, \(U^T U = I_n\). Let \(\Lambda^{-1/2} = \text{diag}(\lambda_1^{-1/2}, \dots, \lambda_n^{-1/2})\) and \(\Sigma^{-1/2} = U \Lambda^{-1/2} U^T\), and define \(\Sigma^{1/2} = U \Lambda^{1/2} U^T\) analogously. Let \(\boldsymbol X = \Sigma^{-1/2} (\boldsymbol Y - \boldsymbol\mu)\), so that \(\boldsymbol Y = \boldsymbol\mu + \Sigma^{1/2} \boldsymbol X\). The characteristic function of \(\boldsymbol X\) is \[\begin{aligned} \phi(\boldsymbol t) =& E \exp\left\{ i \boldsymbol t^T \boldsymbol X \right\} \\ =& E \exp\left\{ i \boldsymbol t^T \Sigma^{-1/2} \boldsymbol Y \right\} \cdot \exp\left\{ -i \boldsymbol t^T \Sigma^{-1/2} \boldsymbol\mu \right\} \\ =& \exp\left\{ i \boldsymbol t^T \Sigma^{-1/2} \boldsymbol\mu - \frac12 \boldsymbol t^T \Sigma^{-1/2} \Sigma \Sigma^{-1/2} \boldsymbol t \right\} \cdot \exp\left\{ -i \boldsymbol t^T \Sigma^{-1/2} \boldsymbol\mu \right\} \\ =& \exp\left\{ -\frac12 \boldsymbol t^T \boldsymbol t \right\} \end{aligned}\] This shows that \(\boldsymbol X\) is an \(n\)-dimensional standard normal random vector, so its joint density is \(p_{\boldsymbol X}(\boldsymbol x) = (2\pi)^{-n/2} \exp\{ -\frac12 \boldsymbol x^T \boldsymbol x \}\). The inverse of the transformation from \(\boldsymbol X\) to \(\boldsymbol Y\) is \(\boldsymbol X = \Sigma^{-1/2} (\boldsymbol Y - \boldsymbol\mu)\), a one-to-one map on \(\mathbb R^n\), and the Jacobian determinant of this inverse map is \(|\Sigma^{-1/2}|=|\Sigma|^{-1/2}\). By the change-of-variables formula for densities of random vectors, the density of \(\boldsymbol Y\) is \[\begin{aligned} p_{\boldsymbol Y}(\boldsymbol y) =& p_{\boldsymbol X}(\Sigma^{-1/2}(\boldsymbol y - \boldsymbol\mu)) \cdot |\Sigma|^{-1/2} \\ =& (2\pi)^{-n/2} |\Sigma|^{-1/2} \exp\left\{ -\frac12 (\boldsymbol y - \boldsymbol\mu)^T \Sigma^{-1} (\boldsymbol y - \boldsymbol\mu) \right\} \end{aligned}\]


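Property: Let \(\boldsymbol\mu \in \mathbb R^n\) and let \(\Sigma\) be a symmetric nonnegative definite matrix of order \(n\) with rank \(m \leq n\). Then there exist a matrix \(B_{n\times m}\) of full column rank and an \(m\)-variate standard normal random vector \(\boldsymbol X\) such that \(\boldsymbol Y = \boldsymbol\mu + B \boldsymbol X\) follows the multivariate normal distribution \(\text{N}(\boldsymbol\mu, \Sigma)\).

Proof: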

By linear algebra, \(\Sigma\) has the eigenvalue decomposition \[ \Sigma = U \text{diag}(\lambda_1, \dots, \lambda_m, 0, \dots, 0) U^T, \] where \(\lambda_1 \geq \dots \geq \lambda_m > 0\) are the positive eigenvalues of \(\Sigma\) and \(U\) is an orthogonal matrix of order \(n\). Write \(U = (U_1\ U_2)\), where \(U_1\) consists of the first \(m\) columns of \(U\), and let \(\Lambda_1 = \text{diag}(\lambda_1, \dots, \lambda_m)\), \(\Lambda_1^{1/2} = \text{diag}(\lambda_1^{1/2}, \dots, \lambda_m^{1/2})\). Then \[ \Sigma = (U_1\ U_2) \left(\begin{array}{cc} \Lambda_1 & 0 \\ 0 & 0 \end{array}\right) \left(\begin{array}{c} U_1^T \\ U_2^T \end{array}\right) = U_1 \Lambda_1 U_1^T . \] Let \(B = U_1 \Lambda_1^{1/2}\). Then \(B B^T = \Sigma\), and \(B\) has full column rank because \(U_1\) has orthonormal columns and \(\Lambda_1^{1/2}\) is invertible. Hence, if \(\boldsymbol X\) follows the \(m\)-variate standard normal distribution, then \(\boldsymbol Y = \boldsymbol\mu + B \boldsymbol X\) follows the multivariate normal distribution \(\text{N}(\boldsymbol\mu, B B^T)\), that is, \(\text{N}(\boldsymbol\mu, \Sigma)\).
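For a concrete rank-deficient case (an illustrative choice, not taken from the notes), take \(n = 2\) and \[ \Sigma = \left(\begin{array}{cc} 1 & 1 \\ 1 & 1 \end{array}\right) = U \, \text{diag}(2, 0) \, U^T, \qquad U_1 = \frac{1}{\sqrt 2} \left(\begin{array}{c} 1 \\ 1 \end{array}\right) . \] Here \(m = 1\) and \(\Lambda_1 = (2)\), so \[ B = U_1 \Lambda_1^{1/2} = \left(\begin{array}{c} 1 \\ 1 \end{array}\right), \qquad B B^T = \left(\begin{array}{cc} 1 & 1 \\ 1 & 1 \end{array}\right) = \Sigma . \] With a single \(X \sim \text{N}(0,1)\), the vector \(\boldsymbol Y = \boldsymbol\mu + B X = (\mu_1 + X, \mu_2 + X)^T\) has both components driven by the same \(X\). Since \(|\Sigma| = 0\), this \(\boldsymbol Y\) has no density on \(\mathbb R^2\).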


3.5.2 The Bivariate Normal Distribution

The covariance matrix of a bivariate normal distribution is \[ \Sigma = \left(\begin{array}{cc} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{array}\right) \] with determinant \(|\Sigma| = \sigma_1^2 \sigma_2^2 (1 - \rho^2)\). Assuming \(\sigma_1, \sigma_2 > 0\), \(|\Sigma| = 0\) if and only if \(\rho = \pm 1\). When \(|\rho| < 1\), the joint density is \[\begin{aligned} p_{\boldsymbol Y}(\boldsymbol y) =& \frac{1}{2 \pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\left\{ - \frac{1}{2(1-\rho^2)} \left[ \left( \frac{y_1 - \mu_1}{\sigma_1} \right)^2 + \left( \frac{y_2 - \mu_2}{\sigma_2} \right)^2 \right. \right. \\ & \left. \left. - 2 \rho \left( \frac{y_1 - \mu_1}{\sigma_1} \right) \left( \frac{y_2 - \mu_2}{\sigma_2} \right) \right] \right\} \end{aligned}\]
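The bivariate density translates directly into code. The sketch below is a plain transcription of the formula; the function names and the test point are assumptions made for illustration, and the check uses the fact that the joint density factors into the marginals when \(\rho = 0\).

```python
import math

def bivariate_normal_pdf(y1, y2, mu1, mu2, s1, s2, rho):
    """Bivariate normal density for s1, s2 > 0 and |rho| < 1."""
    z1 = (y1 - mu1) / s1
    z2 = (y2 - mu2) / s2
    quad = (z1 * z1 + z2 * z2 - 2.0 * rho * z1 * z2) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * quad) / norm

def normal_pdf(y, mu, s):
    """Univariate N(mu, s^2) density."""
    return math.exp(-0.5 * ((y - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

# With rho = 0 the joint density factors into the two marginal densities.
joint = bivariate_normal_pdf(0.3, -1.2, 0.0, 1.0, 2.0, 0.5, 0.0)
product = normal_pdf(0.3, 0.0, 2.0) * normal_pdf(-1.2, 1.0, 0.5)
print(joint, product)
```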

3.5.3 Conditional Distributions of the Normal

Let \[\begin{aligned} \boldsymbol X = \left(\begin{array}{c} \boldsymbol X_1 \\ \boldsymbol X_2 \end{array}\right) \sim \text{N}(\boldsymbol\mu, \Sigma), \ \boldsymbol \mu = \left(\begin{array}{c} \boldsymbol \mu_1 \\ \boldsymbol \mu_2 \end{array}\right), \ \Sigma = \left(\begin{array}{cc} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{array}\right) , \end{aligned}\] where \(\Sigma_{22}\) is assumed invertible. Then the conditional distribution of \(\boldsymbol X_1\) given \(\boldsymbol X_2 = \boldsymbol x_2\) is \[\begin{aligned} \text{N}\big(\boldsymbol\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(\boldsymbol x_2 - \boldsymbol\mu_2), \; \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \big). \end{aligned}\] The conditional variance does not depend on the value of \(\boldsymbol x_2\).

Let \[\begin{aligned} \boldsymbol X_{1\cdot 2} =& \boldsymbol X_1 - E(\boldsymbol X_1 | \boldsymbol X_2) = \boldsymbol X_1 - \boldsymbol\mu_1 - \Sigma_{12}\Sigma_{22}^{-1}(\boldsymbol X_2 - \boldsymbol\mu_2), \\ \Sigma_{11\cdot 2} =& \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} . \end{aligned}\] Then \(\boldsymbol X_2\) and \(\boldsymbol X_{1\cdot 2}\) are independent, and \(\boldsymbol X_{1\cdot 2} \sim \text{N}(\boldsymbol 0, \Sigma_{11\cdot 2})\).
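As a worked instance, consider the bivariate covariance matrix of Section 3.5.2 with scalar blocks \(\Sigma_{11} = \sigma_1^2\), \(\Sigma_{12} = \Sigma_{21} = \rho \sigma_1 \sigma_2\), \(\Sigma_{22} = \sigma_2^2\), and \(\sigma_2 > 0\). Substituting into the conditional distribution formula gives \[\begin{aligned} \Sigma_{12} \Sigma_{22}^{-1} =& \frac{\rho \sigma_1 \sigma_2}{\sigma_2^2} = \rho \frac{\sigma_1}{\sigma_2}, \\ \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} =& \sigma_1^2 - \frac{\rho^2 \sigma_1^2 \sigma_2^2}{\sigma_2^2} = \sigma_1^2 (1 - \rho^2), \end{aligned}\] so that \[ X_1 \mid X_2 = x_2 \ \sim \text{N}\Big( \mu_1 + \rho \frac{\sigma_1}{\sigma_2} (x_2 - \mu_2), \; \sigma_1^2 (1 - \rho^2) \Big) . \] The conditional mean is linear in \(x_2\), and the conditional variance shrinks from \(\sigma_1^2\) toward zero as \(|\rho| \to 1\).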

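3.5.4 Proof of the Equivalent Definition of the Multivariate Normal Distribution

If a random vector \(\boldsymbol Z\) has the characteristic function \[\begin{aligned} \phi(\boldsymbol t) = \exp(i \boldsymbol t^T \boldsymbol\mu - \frac12 \boldsymbol t^T \Sigma \boldsymbol t), \end{aligned}\] where \(\Sigma\) is a nonnegative definite matrix of order \(n\), then there exist a random vector \(\boldsymbol\varepsilon\) whose components are iid standard normal and a constant matrix \(B\) such that \(\boldsymbol Z = \boldsymbol\mu + B \boldsymbol\varepsilon\).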

Proof: Let \[\begin{aligned} \boldsymbol Y = \boldsymbol Z - \boldsymbol\mu . \end{aligned}\] Then the characteristic function of \(\boldsymbol Y\) is \[\begin{aligned} \phi_{\boldsymbol Y}(\boldsymbol t) =& E \exp\left[ i\boldsymbol t^T \boldsymbol Y \right] = \exp \left[ -\frac12 \boldsymbol t^T \Sigma \boldsymbol t \right] . \end{aligned}\] Let \(\text{rank}(\Sigma)=m\leq n\) and take the eigenvalue decomposition \[\begin{aligned} & \Sigma = P^T \Lambda P, \ P^T P = I_n, \\ & \Lambda = \text{diag}(\lambda_1, \lambda_2, \dots, \lambda_m, 0, \dots, 0) \end{aligned}\] (where \(\lambda_j>0, j=1,2,\dots,m\)). Let \[\begin{aligned} A =& \text{diag}(\lambda_1^{-\frac{1}{2}}, \lambda_2^{-\frac{1}{2}}, \dots, \lambda_m^{-\frac{1}{2}}, 1, \dots, 1) \\ \boldsymbol W =& A P \boldsymbol Y . \end{aligned}\] Then \[\begin{aligned} \boldsymbol Y = P^T A^{-1} \boldsymbol W \stackrel{\triangle}{=} D \boldsymbol W, \end{aligned}\] where \[\begin{aligned} \text{Var}(\boldsymbol W) = \text{Var}(A P \boldsymbol Y) = A P \Sigma P^T A = A \Lambda A = \text{diag}(1,1,\dots, 1, 0, \dots, 0) . \end{aligned}\] Since \(E \boldsymbol W = \boldsymbol 0\), the last \(n-m\) components of \(\boldsymbol W\) have mean zero and variance zero, and hence vanish a.s., so \[\begin{aligned} \boldsymbol W = \left(\begin{array}{c} \boldsymbol \varepsilon \\ \boldsymbol 0 \end{array}\right) \end{aligned}\] where \(\boldsymbol \varepsilon\) is \(m\)-dimensional. Let \[\begin{aligned} G = \left( I_m \ \ \boldsymbol 0 \right)_{m\times n} . \end{aligned}\] Then \(\boldsymbol \varepsilon = G \boldsymbol W = G A P \boldsymbol Y\), and the characteristic function of \(\boldsymbol \varepsilon\) is \[\begin{aligned} \phi_{\boldsymbol \varepsilon}(\boldsymbol t) =& E \exp\left[ i \boldsymbol t^T G A P \boldsymbol Y \right] = \phi_{\boldsymbol Y}(P^T A G^T \boldsymbol t) \\ =& \exp\left[ -\frac12 \boldsymbol t^T GAP\Sigma P^T A G^T \boldsymbol t \right] = \exp\left[ -\frac12 \boldsymbol t^T \boldsymbol t \right] , \end{aligned}\] that is, \(\boldsymbol \varepsilon \sim \text{N}(0, I_m)\). Writing \(D = (D_1 \ D_2)\) with \(D_1\) the first \(m\) columns of \(D\), we obtain \[\begin{aligned} \boldsymbol Y =& D \boldsymbol W = \left(D_1 \ D_2 \right) \left( \begin{array}{c} \boldsymbol \varepsilon \\ 0 \end{array}\right) \\ =& D_1 \boldsymbol \varepsilon \stackrel{\triangle}{=} B \boldsymbol \varepsilon \\
\boldsymbol Z =& \boldsymbol \mu + B \boldsymbol \varepsilon . \end{aligned}\] This completes the proof.