import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

Simple linear regression

nsample = 100
x = np.linspace(0, 10, nsample)
X = sm.add_constant(x)
beta = np.array([1, 10])
e = np.random.normal(size=nsample)
y = np.dot(X, beta) + e

model = sm.OLS(y,X)
results = model.fit()

print(results.params)

[ 1.15699288  9.98710951]

print(results.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.999
Model:                            OLS   Adj. R-squared:                  0.999
Method:                 Least Squares   F-statistic:                 7.396e+04
Date:                Wed, 13 Jul 2016   Prob (F-statistic):          7.35e-143
Time:                        14:45:48   Log-Likelihood:                -147.72
No. Observations:                 100   AIC:                             299.4
Df Residuals:                      98   BIC:                             304.7
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          1.1570      0.213      5.443      0.000         0.735     1.579
x1             9.9871      0.037    271.957      0.000         9.914    10.060
==============================================================================
Omnibus:                        0.573   Durbin-Watson:                   2.369
Prob(Omnibus):                  0.751   Jarque-Bera (JB):                0.717
Skew:                           0.137   Prob(JB):                        0.699
Kurtosis:                       2.688   Cond. No.                         11.7
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
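As a cross-check on the coefficients and R-squared reported above, the same quantities can be computed with plain NumPy. This is a minimal sketch rather than what statsmodels does internally: it rebuilds the simulated data with a seeded generator (the seed is an assumption added here, so the numbers will not match the printed output exactly) and solves the least-squares problem with `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)               # assumed seed, only for repeatability
nsample = 100
x = np.linspace(0, 10, nsample)
X = np.column_stack((np.ones(nsample), x))   # constant first, like sm.add_constant(x)
beta = np.array([1, 10])
y = X @ beta + rng.normal(size=nsample)

# Least-squares solution of X @ b ~ y
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# R-squared = 1 - residual sum of squares / total sum of squares
resid = y - X @ beta_hat
ss_res = np.sum(resid ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(beta_hat)
print(r_squared)
```

The estimates should land close to the true values 1 and 10, and R-squared close to 1, in line with the summary table.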

y_fitted = results.fittedvalues
fig, ax = plt.subplots(figsize=(8,6))
ax.plot(x, y, 'o', label='data')
ax.plot(x, y_fitted, 'r--.',label='OLS')
ax.legend(loc='best')

<matplotlib.legend.Legend at 0x7fc4814a9438>

y_fitted = results.fittedvalues
fig, ax = plt.subplots(figsize=(8,6))
ax.plot(x, y, 'o', label='data')
ax.plot(x, y_fitted, 'r--.',label='OLS')
ax.legend(loc='best')
ax.axis((-0.05, 2, -1, 25))

(-0.05, 2, -1, 25)

Regression with higher-order terms

nsample = 100
x = np.linspace(0, 10, nsample)
X = np.column_stack((x, x**2))
X = sm.add_constant(X)
beta = np.array([1, 0.1, 10])
e = np.random.normal(size=nsample)
y = np.dot(X, beta) + e

model = sm.OLS(y,X)
results = model.fit()

print(results.params)

[ 0.6864524   0.20010325  9.9915112 ]

print(results.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 4.614e+06
Date:                Wed, 13 Jul 2016   Prob (F-statistic):          3.57e-242
Time:                        14:47:19   Log-Likelihood:                -139.67
No. Observations:                 100   AIC:                             285.3
Df Residuals:                      97   BIC:                             293.2
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          0.6865      0.292      2.351      0.021         0.107     1.266
x1             0.2001      0.135      1.483      0.141        -0.068     0.468
x2             9.9915      0.013    765.007      0.000         9.966    10.017
==============================================================================
Omnibus:                        3.918   Durbin-Watson:                   2.046
Prob(Omnibus):                  0.141   Jarque-Bera (JB):                2.354
Skew:                          -0.145   Prob(JB):                        0.308
Kurtosis:                       2.306   Cond. No.                         144.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
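In the summary above the coefficient on x1 (true value 0.1) has a wide confidence interval, while the coefficient on x2 is pinned down tightly. The sketch below suggests why, using only the design matrix. It assumes the noise standard deviation is known to be 1 (the value used in the simulation), whereas the summary uses the estimated residual variance, so the figures should be close but not identical.

```python
import numpy as np

nsample = 100
x = np.linspace(0, 10, nsample)
X = np.column_stack((np.ones(nsample), x, x ** 2))   # columns: const, x, x^2

# 2-norm condition number of the design matrix; this should be close to
# the "Cond. No." shown in the summary
cond_number = np.linalg.cond(X)

# Standard errors implied by the design: sqrt(diag(sigma^2 * (X'X)^-1))
sigma = 1.0                                          # assumed known noise sd
cov = sigma ** 2 * np.linalg.inv(X.T @ X)
std_err = np.sqrt(np.diag(cov))

print(cond_number)
print(std_err)
```

Because x and x**2 are strongly correlated on [0, 10], the linear term carries a standard error that is large relative to its true value of 0.1, which is consistent with the p-value of 0.141 in the table.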

y_fitted = results.fittedvalues
fig, ax = plt.subplots(figsize=(8,6))
ax.plot(x, y, 'o', label='data')
ax.plot(x, y_fitted, 'r--.',label='OLS')
ax.legend(loc='best')

<matplotlib.legend.Legend at 0x7fc481325208>

y_fitted = results.fittedvalues
fig, ax = plt.subplots(figsize=(8,6))
ax.plot(x, y, 'o', label='data')
ax.plot(x, y_fitted, 'r--.',label='OLS')
ax.legend(loc='best')
ax.axis((-0.05, 2, -1, 50))

(-0.05, 2, -1, 50)

Dummy variables

nsample = 50
groups = np.zeros(nsample, int)
groups[20:40] = 1
groups[40:] = 2
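# sm.categorical is deprecated in newer statsmodels releases;
# build the same three 0/1 indicator columns (one per group) with NumPy
dummy = (groups[:, None] == np.unique(groups)).astype(float)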
x = np.linspace(0, 20, nsample)
X = np.column_stack((x, dummy))
X = sm.add_constant(X)
beta = [10, 1, 1, 3, 8]
e = np.random.normal(size=nsample)
y = np.dot(X, beta) + e

result = sm.OLS(y,X).fit()
print(result.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.987
Model:                            OLS   Adj. R-squared:                  0.986
Method:                 Least Squares   F-statistic:                     1180.
Date:                Wed, 13 Jul 2016   Prob (F-statistic):           1.68e-43
Time:                        14:53:16   Log-Likelihood:                -67.128
No. Observations:                  50   AIC:                             142.3
Df Residuals:                      46   BIC:                             149.9
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const         10.3005      0.547     18.822      0.000         9.199    11.402
x1             1.0102      0.063     16.043      0.000         0.883     1.137
x2             0.4935      0.347      1.422      0.162        -0.205     1.192
x3             2.8904      0.290      9.966      0.000         2.307     3.474
x4             6.9166      0.653     10.585      0.000         5.601     8.232
==============================================================================
Omnibus:                        0.377   Durbin-Watson:                   2.017
Prob(Omnibus):                  0.828   Jarque-Bera (JB):                0.031
Skew:                           0.000   Prob(JB):                        0.985
Kurtosis:                       3.122   Cond. No.                     2.12e+17
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 1.51e-31. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
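Warning [2] and the very large condition number come from including a constant together with an indicator column for every group: the three indicators sum to the constant column, so the design matrix is rank deficient. The sketch below checks this with NumPy and shows one common remedy, dropping the indicator of a baseline group. Treating group 0 as the baseline is a choice made here for illustration, not something the original post does.

```python
import numpy as np

nsample = 50
groups = np.zeros(nsample, int)
groups[20:40] = 1
groups[40:] = 2
dummy = (groups[:, None] == np.arange(3)).astype(float)   # one 0/1 column per group
x = np.linspace(0, 20, nsample)

# Constant + x + all three indicators: 5 columns, but the indicators add up
# to the constant column, so only 4 columns are linearly independent
X_full = np.column_stack((np.ones(nsample), x, dummy))
rank_full = np.linalg.matrix_rank(X_full)

# Drop the group-0 indicator: 4 columns, full column rank
X_drop = np.column_stack((np.ones(nsample), x, dummy[:, 1:]))
rank_drop = np.linalg.matrix_rank(X_drop)

print(X_full.shape[1], rank_full)
print(X_drop.shape[1], rank_drop)
```

With the full set of indicators, only combinations such as the constant plus a group coefficient are identified, so the individual values of const, x2, x3 and x4 in the table above should be read with care. After dropping one indicator, each remaining dummy coefficient measures the difference from the baseline group.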

fig, ax = plt.subplots(figsize=(8,6))
ax.plot(x, y, 'o', label="data")
ax.plot(x, result.fittedvalues, 'r--.', label="OLS")
ax.legend(loc='best')

<matplotlib.legend.Legend at 0x7fc4810369e8>

A simple application

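# get_price is not defined in this post; it is assumed to be supplied by the
# research platform the notebook ran on. Here it fetches daily closing prices
# for the two index codes over 2015.
data = get_price(['000001.XSHG', '399001.XSHE'], start_date='2015-01-01', end_date='2016-01-01', frequency='daily', fields=['close'])['close']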
x_price = data['000001.XSHG'].values
y_price = data['399001.XSHE'].values

# Daily simple returns: p[t] / p[t-1] - 1
x_pct = x_price[1:] / x_price[:-1] - 1
y_pct = y_price[1:] / y_price[:-1] - 1

x = x_pct
X = sm.add_constant(x)
y = y_pct

results = sm.OLS(y, X).fit()
print(results.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.846
Model:                            OLS   Adj. R-squared:                  0.845
Method:                 Least Squares   F-statistic:                     1325.
Date:                Wed, 13 Jul 2016   Prob (F-statistic):          6.56e-100
Time:                        14:54:37   Log-Likelihood:                 765.13
No. Observations:                 243   AIC:                            -1526.
Df Residuals:                     241   BIC:                            -1519.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          0.0002      0.001      0.327      0.744        -0.001     0.002
x1             0.9991      0.027     36.396      0.000         0.945     1.053
==============================================================================
Omnibus:                       41.392   Durbin-Watson:                   2.013
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               95.086
Skew:                          -0.802   Prob(JB):                     2.25e-21
Kurtosis:                       5.611   Cond. No.                         41.0
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
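Since `get_price` is only available on the original platform, the following sketch reproduces the same steps on synthetic data: build two price series, convert them to daily returns, and regress one on the other. All numbers here (starting levels, volatilities, the seed, and `true_beta`) are hypothetical values chosen for illustration and are not market data.

```python
import numpy as np

rng = np.random.default_rng(1)      # assumed seed, only for repeatability
n_days = 244                        # hypothetical number of trading days
true_beta = 1.0                     # hypothetical sensitivity of y to x

# Hypothetical daily returns for the two series
x_ret = rng.normal(0.0, 0.02, n_days - 1)
y_ret = true_beta * x_ret + rng.normal(0.0, 0.008, n_days - 1)

# Turn returns into price paths, starting from arbitrary levels
x_price = 3000.0 * np.concatenate(([1.0], np.cumprod(1.0 + x_ret)))
y_price = 10000.0 * np.concatenate(([1.0], np.cumprod(1.0 + y_ret)))

# Same transformation as in the post: p[t] / p[t-1] - 1
x_pct = x_price[1:] / x_price[:-1] - 1
y_pct = y_price[1:] / y_price[:-1] - 1

# Regress y returns on x returns with an intercept
X = np.column_stack((np.ones_like(x_pct), x_pct))
(intercept, slope), *_ = np.linalg.lstsq(X, y_pct, rcond=None)

print(intercept, slope)
```

A year of 244 prices yields 243 returns, which matches the number of observations in the summary above. The slope plays the role of the x1 coefficient: it estimates how much the second series moves, on average, per unit move in the first.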

fig, ax = plt.subplots(figsize=(8,6))
ax.plot(x, y, 'o', label="data")
ax.plot(x, results.fittedvalues, 'r--', label="OLS")
ax.legend(loc='best')

<matplotlib.legend.Legend at 0x7fc481303c50>

[Quant Classroom] OLS Regression with the Statsmodels Statistics Package

Posted by 汇市风云榜 on May 10 at 07:51 · Replies (1)
