ARCH, GARCH模型与广义矩量法（GMM）¶

本文将介绍时间序列模型中的ARCH模型和GARCH模型，它们常用于波动率的预测。此外我们还会讨论使用最大似然估计（MLE）和广义矩量法（GMM）来估计以上模型的参数。广义矩量法是一种非常酷的技术，它使用基于残差统计矩的目标函数来执行参数优化，特别是当我们的模型没有易于推导的最大似然估计时时非常有用。

ARCH模型的全称为AutoRegressive Conditionally Heteroskedastic。顾名思义，这也是一种时间序列自相关的模型。在金融市场中，ARCH模型非常适用于刻画股票的波动率序列。GARCH模型的英文为Generalized ARCH，即广义ARCH模型，是ARCH的推广。当GARCH模型的两个参数取1,1时，GARCH(1,1)即是ARCH模型的滞后项p取无穷时的情形。

import cvxopt
from functools import partial
import math
import numpy as np
import scipy
from scipy import stats
import statsmodels.api as sm
from statsmodels.stats.stattools import jarque_bera
import matplotlib.pyplot as plt

模拟一个 GARCH(1, 1) 案例¶

我们将使用蒙特卡洛方法来模拟一个 GARCH(1, 1)过程（即ARCH过程），模型形式如下

$$\sigma_1 = \sqrt{\frac{a_0}{1-a_1-b_1}} \\ \sigma_t^2 = a_0 + a_1 x_{t-1}^2+b_1 \sigma_{t-1}^2 \\ x_t = \sigma_t \epsilon_t \\ \epsilon \sim \mathcal{N}(0, 1)$$

我们的参数取 $a_0 = 1$, $a_1=0.1$, $b_1=0.8$。

# 定义参数
a0 = 1.0
a1 = 0.1
b1 = 0.8
sigma1 = math.sqrt(a0 / (1 - a1 - b1))

def simulate_GARCH(T, a0, a1, b1, sigma1):
    
    # 初始化
    X = np.ndarray(T)
    sigma = np.ndarray(T)
    sigma[0] = sigma1
    
    for t in range(1, T):
        # 求下一个 x_t
        X[t - 1] = sigma[t - 1] * np.random.normal(0, 1)
        # D求下一个 sigma_t
        sigma[t] = math.sqrt(a0 + b1 * sigma[t - 1]**2 + a1 * X[t - 1]**2)
        
    X[T - 1] = sigma[T - 1] * np.random.normal(0, 1)    
    
    return X, sigma

现在我们将GARCH（1,1）过程的尾部与正态分布进行对比。我们预计能看到更厚的尾部，因为GARCH（1,1）过程将容易取到极端值。

X, _ = simulate_GARCH(10000, a0, a1, b1, sigma1)
X = X[1000:] 
X = X / np.std(X) # 标准化

def compare_tails_to_normal(X):
    # Define matrix to store comparisons
    A = np.zeros((2,4))
    for k in range(4):
        A[0, k] = len(X[X > (k + 1)]) / float(len(X)) # 估计尾部
        A[1, k] = 1 - stats.norm.cdf(k + 1) # 与高斯过程对比
    return A

compare_tails_to_normal(X)

array([[1.54444444e-01, 2.21111111e-02, 2.77777778e-03, 6.66666667e-04],
       [1.58655254e-01, 2.27501319e-02, 1.34989803e-03, 3.16712418e-05]])

果然，GARCH（1,1）过程的尾巴更加肥厚。我们也可以用图像来展示（虽然这次图像看起来似乎不太直观）。

plt.figure(figsize=(16,9))
plt.hist(X, bins=50)
plt.xlabel('sigma')
plt.ylabel('observations');

下面我们用更直观的方法来展示。先从正态分布中抽样，然后再与GARCH（1,1）过程的值放在同一幅图中对比。

X2 = np.random.normal(0, 1, 9000)
both = np.matrix([X, X2])

plt.figure(figsize=(16,9))
plt.plot(both.T, alpha=.7);
plt.axhline(X2.std(), color='yellow', linestyle='--')
plt.axhline(-X2.std(), color='yellow', linestyle='--')
plt.axhline(3*X2.std(), color='red', linestyle='--')
plt.axhline(-3*X2.std(), color='red', linestyle='--')
plt.xlabel('time')
plt.ylabel('sigma');

橙色折线是正态分布，蓝色折线是GARCH（1,1）过程。黄色虚线是一倍标准差，红色虚线是三倍标准差。显然，蓝色折线更频繁地突破红线，说明GARCH（1,1）过程相比正态分布更容易得到极端值。

序列的ARCH检验¶

对时间序列做ARCH条件检验，即是用下面的模型来做回归拟合。

$$x_t^2 = a_0 + a_1 x_{t-1}^2 + \dots + a_p x_{t-p}^2$$

我们用OLS法来估测参数 $\hat\theta = (\hat a_0, \hat a_1, \dots, \hat a_p)$ 和协方差矩阵 $\hat\Omega$。然后，我们可以据此算出检验统计量：

$$F = \hat\theta \hat\Omega^{-1} \hat\theta'$$

如果$F$在$\mathcal(X)^2(p)$自由度下，大于95%的置信水平对应的阈值，我们就拒绝原假设，即认为这个序列满足ARCH条件。

下面，我们取20阶滞后项，用$p=20$来做ARCH检验。我们用于检验的序列是上面的GARCH(1,1)模型生成的过程。

X, _ = simulate_GARCH(1100, a0, a1, b1, sigma1)
X = X[100:] 

p = 20

Y2 = (X**2)[p:]
X2 = np.ndarray((980, p))
for i in range(p, 1000):
    X2[i - p, :] = np.asarray((X**2)[i-p:i])[::-1]

model = sm.OLS(Y2, X2)
model = model.fit()
theta = np.matrix(model.params)
omega = np.matrix(model.cov_HC0)
F = np.asscalar(theta * np.linalg.inv(omega) * theta.T)

print (np.asarray(theta.T).shape)

plt.figure(figsize=(16,9))
plt.plot(range(20), np.asarray(theta.T))
plt.xlabel('Lag Amount')
plt.ylabel('Estimated Coefficient for Lagged Datapoint')

print ('F = ' + str(F))

chi2dist = scipy.stats.chi2(p)
pvalue = 1-chi2dist.cdf(F)
print ('p-value = ' + str(pvalue))

print (theta/np.diag(omega))

(20, 1)
F = 381.2240140043612
p-value = 0.0
[[ 43.08500931  31.36000881  34.77898353  30.60777617  24.81555387
   18.50700298   1.86619722  55.13447654 -10.06165885  29.78249772
   29.13756465  40.8829858   16.6182405    6.81943106  25.30936091
   26.29932614  12.80433726  31.40710715  66.00553956 -35.18397387]]

p值小于0.05，说明这是一个ARCH序列（这是当然的，毕竟我们是用GARCH(1,1)模型生成的序列）。总之，以上就是序列是否满足ARCH条件的检验流程。

用极大似然估计法拟合 GARCH(1, 1)¶

为了使用极大似然估计来拟合，我们需要使用对数似然函数来估计GARCH(1, 1)的参数。

$$\mathcal{L}(\theta) = \sum_{t=1}^T - \ln \sqrt{2\pi} - \frac{x_t^2}{2\sigma_t^2} - \frac{1}{2}\ln(\sigma_t^2)$$

显然我们需要知道 $x_t$ 和 $\sigma_t$的值，$1 \leq t \leq T$。我们已经有了 $x_t$的值，但我们还需要计算 $\sigma_t$。其实我们就剩要对 $\sigma_1$进行一个“猜测”。我们猜测 $\sigma_1^2 = \hat E[x_t^2]$。一旦我们猜了第一个值，其他项的值就可以根据下列递推式依次求得。

$$\sigma_t^2 = a_0 + a_1 x_{t-1}^2 + b_1\sigma_{t-1}^2$$

def compute_squared_sigmas(X, initial_sigma, theta):
    
    a0 = theta[0]
    a1 = theta[1]
    b1 = theta[2]
    
    T = len(X)
    sigma2 = np.ndarray(T)
    
    sigma2[0] = initial_sigma ** 2
    
    for t in range(1, T):
        # Here's where we apply the equation
        sigma2[t] = a0 + a1 * X[t-1]**2 + b1 * sigma2[t-1]
    
    return sigma2

X, _ = simulate_GARCH(10000, a0, a1, b1, sigma1)
X = X[1000:]

plt.figure(figsize=(16,9))
plt.plot(range(len(X)), compute_squared_sigmas(X, np.sqrt(np.mean(X**2)), (1, 0.5, 0.5)))
plt.xlabel('Time')
plt.ylabel('Sigma');

上面的图就是我们求得的 $\sigma$序列。接下来我们就要根据求得的$\sigma$ 来通过数值优化求得模型的参数。

$$\hat\theta = \arg \max_{(a_0, a_1, b_1)}\mathcal{L}(\theta) = \arg \min_{(a_0, a_1, b_1)}-\mathcal{L}(\theta)$$

约束条件为： $$a_1 \geq 0, b_1 \geq 0, a_1+b_1 < 1$$

def negative_log_likelihood(X, theta):
    
    T = len(X)
    
    # Estimate initial sigma squared
    initial_sigma = np.sqrt(np.mean(X ** 2))
    
    # Generate the squared sigma values
    sigma2 = compute_squared_sigmas(X, initial_sigma, theta)
    
    # Now actually compute
    return -sum(
        [-np.log(np.sqrt(2.0 * np.pi)) -
        (X[t] ** 2) / (2.0 * sigma2[t]) -
        0.5 * np.log(sigma2[t]) for
         t in range(T)]
    )

objective = partial(negative_log_likelihood, X)

def constraint1(theta):
    return np.array([1 - (theta[1] + theta[2])])

def constraint2(theta):
    return np.array([theta[1]])

def constraint3(theta):
    return np.array([theta[2]])

cons = ({'type': 'ineq', 'fun': constraint1},
        {'type': 'ineq', 'fun': constraint2},
        {'type': 'ineq', 'fun': constraint3})

result = scipy.optimize.minimize(objective, (1, 0.5, 0.5),
                        method='SLSQP',
                        constraints = cons)
theta_mle = result.x
print ('theta MLE: ' + str(theta_mle))

/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:16: RuntimeWarning: invalid value encountered in log
  app.launch_new_instance()

theta MLE: [1.05421978 0.11844767 0.78050382]

我们已经得到了GARCH模型的MLE估计参数，那么如何来检验呢？我们通常关注下面两点：

残差的尾部有多厚
在Jarque-Bera正态性检测中，残差序列的表现如何？

我们可以使用下面的 check_theta_estimate 函数来很方便地完成这两点检验。

def check_theta_estimate(X, theta_estimate):
    initial_sigma = np.sqrt(np.mean(X ** 2))
    sigma = np.sqrt(compute_squared_sigmas(X, initial_sigma, theta_estimate))
    epsilon = X / sigma
    print ('Tails table')
    print (compare_tails_to_normal(epsilon / np.std(epsilon)))
    print ('')
    
    _, pvalue, _, _ = jarque_bera(epsilon)
    print ('Jarque-Bera probability normal: ' + str(pvalue))
    
check_theta_estimate(X, theta_mle)

Tails table
[[1.57222222e-01 2.36666667e-02 1.22222222e-03 1.11111111e-04]
 [1.58655254e-01 2.27501319e-02 1.34989803e-03 3.16712418e-05]]

Jarque-Bera probability normal: 0.30717890644147866

Jarque-Bera统计量的p值远远大于0.05（或者你取别的标准也行），拒绝原假设（H0：序列服从正态分布），即说明残差是不服从正态分布的。

量化交易吧 / 量化平台 帖子：3369864 新帖：43

【QLS-48】ARCH, GARCH模型和广义矩量法

大师做交易发表于：5 月 10 日 03：16回复(1)

ARCH, GARCH模型与广义矩量法（GMM）¶

模拟一个 GARCH(1, 1) 案例¶

序列的ARCH检验¶

用极大似然估计法拟合 GARCH(1, 1)¶

全部回复

0/140

粉丝:914

帖子数:0

粉丝:734

帖子数:0

粉丝:555

帖子数:3

量化课程

热门标签

删除回复

确认要删除这篇文章么？

举报用户

信息提示

该文章已删除

设置置顶

完成设置【置顶】！

设置置顶

已取消设置【置顶】！

设置精华

完成设置【精华】！

设置精华

已取消设置【精华】！

审核信息

该文章已审核通过

审核信息

您已设置该文章审核不通过

举报成功

您已举报成功

用户登录

移动帖子

创建私信

屏蔽提示

确认要屏蔽该用户么？

屏蔽回复

您已对该用户实现屏蔽

信息回复

已发送成功

量化交易吧 / 量化平台帖子：3369864 新帖：43