因子正交与系统性风险*¶
在文中,作者提出了一种新颖的因子正交方法,该方法相对于传统的主成分分宜以及施密特政教方法具有显著的有点,一是对所有的因子平等对待,而是该方法政教后得到的因子与原始因子保持最大的相似性。基于该方法正交后的因子,我们可以方便地将系统性风险*到各因子上,并衡量各因子对于资产系统性风险的相对贡献。
下面我们开始编写新的正交方法¶
假设一个风险资产j的收益可以被K个因子 $f^k$ 线性解释,例如市场(RM),规模 (SMB),价值(HML),动量(Mom),长期反转(Rev),线性因子模型表达式如下: $$ 𝑟^𝑡_j= 𝛼_𝑗 + 𝛴^K_{𝑘=1}𝛽_{𝑘_𝑗}𝑓_𝑡^k+ 𝜀_𝑡^j $$ 其中$ 𝑓_𝑘$ 和残差项 $𝜀_𝑗 $不相关,但是因子之间可能相关。 资产 𝑗 的收益方差可以表示为 $$ 𝜎_{𝑠𝑗}^2 = 𝛴^K_{k=1}𝛽_{𝑘_𝑗}𝛽_{𝑙_𝑗}𝐶𝑜𝑣(𝑓^𝑘 , 𝑓^𝑙 ) $$
我们通过正交的方式获得正交后的系数𝛽T,则收益方差 $$ 𝜎̂_{𝑠_𝑗}^2 = 𝛴^K_{k=1} (𝛽T_{k_j}𝜎̂T_{f^k})^2=𝜎̂^2_j-𝜎̂^2_{𝜀_j} $$
以下为实现步骤:¶
1.输入A,A的维数为T*K,K表示因子个数,T表示时间节点
2.用A的各因子列减去因子的均值得到$Fmean$
3.计算$Fmean$的协方差阵$𝛴_{K \times K}$,则重叠矩阵为$M_{K \times K}=(T-1)𝛴_{K \times K}$
4.为了获得Fmean正交矩阵FmeanT: $$ FmeanT' \times FmeanT=(FmeanT \times S_{K \times K})' FmeanS_{K \times K}=S_{K \times K}'(FmeanT'Fmean)S_{K \times K}=S_{K \times K}'M_{K \times K}S_{K \times K}=I_{K \times K} $$ $$ S_{K \times K}S_{K \times K}'=M_{K \times K}^{-1} $$
5.这里我们通过对$M_{K \times K}$特征*得到特征向量$O_{K \times K}$和特征根 对角阵$D_{K \times K}$,满足$M_{K \times K}=O_{K \times K}D_{K \times K}O_{K \times K}'$ 可以求出$S_{K \times K}=O_{K \times K}D_{K \times K}^{-1/2}O_{K \times K}'$
6.进行缩放 $S_{K \times K}->S_{K \times K}\sqrt{T-1}L_{K \times K}$ 其中$L_{K \times K}$为K个因子标准差构成的的对角阵
7.最后A的正交阵$AT=AS_{K \times K}$
更具体的过程请参考:Klein, Rudolf F., and V. K. Chow. "Orthogonalized factors and systematic risk decomposition." Quarterly Review of Economics & Finance 53.2(2013):175-187.
import pandas as pdimport numpy as npimport scipy.stats as scsfrom sklearn.decomposition import PCA from statsmodels.formula.api import olsfrom itertools import combinations
###新的正交方法###def new_zj(A):T=len(A[:,0])K=len(A[0,:])fmean=np.zeros_like(A)FF=np.zeros_like(A)tzg=np.zeros((K,K))for i in arange(K):fmean[:,i]=A[:,i]-mean(A[:,i])fmean=mat(fmean)M=(T-1)*numpy.cov(fmean.T) #获得重叠矩阵M=mat(M)u,v=np.linalg.eig(M) #u是特征值,v是特征向量sk1=np.dot( np.dot(v ,np.linalg.inv(np.diag(u**0.5)) ), v.T)sk2=np.dot(sk1*((T-1)**0.5),np.diag(A.var(axis=0)**0.5))####F=np.dot(A,sk2)for i in arange(K):FF[:,i]=[float(j) for j in F[:,i]]return FF
####pca正交化###def pca2(A):T,K=shape(A)fmean=np.zeros_like(A)#FF=np.zeros_like(A)#tzg=np.zeros((K,K))for i in arange(K):fmean[:,i]=A[:,i]-mean(A[:,i])getcov=np.cov(fmean.T)u,v=np.linalg.eig(getcov) #u是特征值,v是特征向量b = np.argsort(u)v2=v.T[b,:]result=np.dot(v2,A.T)result=result*(np.sign(A[0,0])/np.sign(result[0,0]))##做一个符号统一化return result.T
####pca 正交###def pca(A):pca=PCA(n_components=2)pca.fit(A)result=pca.fit_transform(A)# result=result*(sign(A[0,0]*sign(result[0,0])))return result
###函数,gram-schmidt正交化def gram_schmidt(A):T,K=shape(A)Q=np.zeros_like(A)Q[:,0]=A[:,0]for i in arange(1,K):for j in arange(i):uu=np.dot(np.dot(Q[:,j].T,A[:,i]),Q[:,j])/np.dot(Q[:,j],Q[:,j].T)hh=A[:,i]-uuQ[:,i]=hhreturn Q
###函数,获得三种正交化后变量的相关系数def three_orth(name1):A=factors[name1].as_matrix()index=[]index2=[]####PCA主成分回归,输出为第一个因子与第二个因子正交前后的相关系数result=pca(A)index.append(np.corrcoef(list(factors[name1[0]]),list(result[:,0]))[0][1])index.append(np.corrcoef(list(factors[name1[1]]),list(result[:,1]))[0][1])index2.append(np.linalg.norm(result))###Gram-Schmidt正交Q=gram_schmidt(A)index.append(np.corrcoef(list(factors[name1[0]]),Q[:,0])[0][1])index.append(np.corrcoef(list(factors[name1[1]]),Q[:,1])[0][1])index2.append(np.linalg.norm(Q))###新的正交new=new_zj(A)index.append(np.corrcoef(new[:,0],list(factors[name1[0]]))[0][1])index.append(np.corrcoef(new[:,1],list(factors[name1[1]]))[0][1])index2.append(np.linalg.norm(new))return index,index2
###输入数据factors=pd.read_csv('Five Factor.CSV')factors.head()factors.columns=['time','RM-RF','SMB','HML','RF','Mom','Rev']RM_RF=list(factors['RM-RF'])RF=list(factors['RF'])RM=[RM_RF[i]+RF[i] for i in arange(len(RF))]factors['RM']=RM
###输出相关系数test_data = ['RM','Mom','Rev','HML','SMB']print 'Note: corrcoef of the original factor and orthogonal factor'print 120*'-'print "%5s %30s %30s %30s" %('','PCA','GS','New')print "%14s %30s %30s %30s" % ('','-','-','-')print "%14s %15s %15s %15s %15s %15s %15s" % ('','(1)','(2)','(1)','(2)','(1)','(2)')for i in combinations(test_data, 2):i=list(i)corrindex=three_orth(i)[0]print "%14s %15.5f %15.5f %15.5f %15.5f %15.5f %15.5f" % (i[0]+'&'+i[1],corrindex[0],corrindex[1],corrindex[2],corrindex[3],corrindex[4],corrindex[5])
Note: corrcoef of the original factor and orthogonal factor PCA GS New - - - (1) (2) (1) (2) (1) (2) RM&Mom 0.89315 -0.68569 1.00000 0.95121 0.98645 0.98313 RM&Rev 0.98675 0.91315 1.00000 0.96517 0.99484 0.98797 RM&HML 0.98505 0.90869 1.00000 0.96374 0.99469 0.98810 RM&SMB 0.98789 0.88404 1.00000 0.94446 0.99222 0.97890 Mom&Rev -0.97695 0.90147 1.00000 0.97644 0.99519 0.99098 Mom&HML -0.95773 0.75221 1.00000 0.91919 0.98345 0.97003 Mom&SMB -0.99423 0.97119 1.00000 0.99271 0.99855 0.99686 Rev&HML 0.89920 -0.41602 1.00000 0.76766 0.94008 0.94245 Rev&SMB 0.87917 -0.61790 1.00000 0.91611 0.98078 0.97768 HML&SMB 0.91833 0.86050 1.00000 0.99086 0.99815 0.99777
#####输出范数#####print 'Note:Frobenius norm values'print '-'*70print "%14s %15s %15s %15s" %('','PCA','GS','New')for i in combinations(test_data, 2):i=list(i)corrindex=three_orth(i)[1]print "%14s %15.5f %15.5f %15.5f" % (i[0]+'&'+i[1],corrindex[0],corrindex[1],corrindex[2])
Note:Frobenius norm values PCA GS New RM&Mom 229.28022 227.13690 233.40926 RM&Rev 204.09788 204.52081 206.31689 RM&HML 205.32148 205.74586 207.59102 RM&SMB 199.82695 199.39138 202.05923 Mom&Rev 189.66259 189.34010 191.11510 Mom&HML 190.97870 186.98608 193.24317 Mom&SMB 185.05881 185.82403 186.30073 Rev&HML 159.87305 142.96471 160.35140 Rev&SMB 152.75239 147.49523 153.08458 HML&SMB 154.38348 154.53620 155.02464
从范数来看,新方法的范数更大¶
##函数,用来打印统计特征##def print_stat(dt):sta=scs.describe(dt,axis=0)print (100*"-")print "%14s %15s %15s %15s %15s %15s" % ('statistics',dt.columns[0],dt.columns[1],dt.columns[2],dt.columns[3],dt.columns[4])print "%14s %15.5f %15.5f %15.5f %15.5f% 15.5f" % ('min',sta[1][0][0],sta[1][0][1],sta[1][0][2],sta[1][0][3],sta[1][0][4])print "%14s %15.5f %15.5f %15.5f %15.5f% 15.5f" % ('max',sta[1][1][0],sta[1][1][1],sta[1][1][2],sta[1][1][3],sta[1][1][4])print "%14s %15.5f %15.5f %15.5f %15.5f% 15.5f" % ('mean',sta[2][0],sta[2][1],sta[2][2],sta[2][3],sta[2][4])print "%14s %15.5f %15.5f %15.5f %15.5f% 15.5f" % ('std',sta[3][0],sta[3][1],sta[3][2],sta[3][3],sta[3][4])print "%14s %15.5f %15.5f %15.5f %15.5f% 15.5f" % ('skew',sta[4][0],sta[4][1],sta[4][2],sta[4][3],sta[4][4])print "%14s %15.5f %15.5f %15.5f %15.5f% 15.5f" % ('kurtosis',sta[5][0],sta[5][1],sta[5][2],sta[5][3],sta[5][4])
##数据的处理test=factors[['RM','Mom','Rev','HML','SMB']].as_matrix()before=pd.DataFrame(test)before.columns=['RM','Mom','Rev','HML','SMB']after=pd.DataFrame(new_zj(test))after.columns=['RMT','MomT','RevT','HMLT','SMBT']
##打印统计特征##print 'Original'print_stat(before)print print print 'Orthogonal'print_stat(after)
Original statistics RM Mom Rev HML SMB min -29.10000 -52.26000 -13.95000 -13.28000 -17.28000 max 38.95000 18.33000 36.36000 35.46000 36.70000 mean 0.95748 0.59543 0.30870 0.40792 0.26307 std 27.87247 22.43319 11.98968 12.46907 10.33883 skew 0.26774 -3.13415 2.94240 2.18544 2.01414 kurtosis 8.37924 28.14925 25.05201 19.15865 19.98738 Orthogonal statistics RMT MomT RevT HMLT SMBT min -29.07851 -43.45487 -18.41538 -14.30052 -18.09761 max 35.23136 18.50452 37.40221 26.74093 32.00083 mean 1.04070 0.91282 0.10737 0.48244 0.15287 std 27.84582 22.41174 11.97822 12.45715 10.32895 skew -0.25533 -2.20092 2.05794 0.73530 1.29000 kurtosis 4.89541 16.10147 18.54849 6.10281 12.93129
print before.corr().round(2)print after.corr().round(2)
RM Mom Rev HML SMB RM 1.00 -0.34 0.25 0.25 0.32 Mom -0.34 1.00 -0.23 -0.41 -0.13 Rev 0.25 -0.23 1.00 0.64 0.40 HML 0.25 -0.41 0.64 1.00 0.13 SMB 0.32 -0.13 0.40 0.13 1.00 RMT MomT RevT HMLT SMBT RMT 1.0 0.0 -0.0 -0.0 0.0 MomT 0.0 1.0 0.0 0.0 0.0 RevT -0.0 0.0 1.0 -0.0 -0.0 HMLT -0.0 0.0 -0.0 1.0 0.0 SMBT 0.0 0.0 -0.0 0.0 1.0
上面是正交前后五因子的协方差矩阵¶
###读取资产组合数据Portfolios=pd.read_csv('Fama_Portfolio.csv')Portfolios.tail()Portfolios=Portfolios.rename(columns={Portfolios.columns.values[0]:'time'})newdata1=pd.merge(Portfolios,factors)after['time']=newdata1[['time']]newdata=pd.merge(newdata1,after)newdata.columns
Index([u'time', u'SmallGrowth', u'SmallNeutral', u'SmallValue', u'BigGrowth', u'BigNeutral', u'BigValue', u'SMALLDown', u'SmallMedium', u'SmallUp', u'BigDown', u'BigMedium', u'BigUp', u'SmallLowRev', u'ME1 PRIOR2', u'SmallHiRev', u'BigLowRev', u'ME2 PRIOR2', u'BigHiRv', u'RM-RF', u'SMB', u'HML', u'RF', u'Mom', u'Rev', u'RM', u'RMT', u'MomT', u'RevT', u'HMLT', u'SMBT'], dtype='object')
inputnames=['SmallValue','SmallGrowth','SMALLDown','SmallLowRev','BigValue','BigGrowth','BigDown','BigLowRev']print "%14s %15s %15s %15s %15s %15s" % ('Original factors','RM','Mom','Rev','HML','SMB')print '-'*110for i in inputnames:reg=ols(i+'~ RM+Mom+Rev+HML+SMB',data=newdata).fit()reg.params=reg.params[1:]print "%14s %15.3f %15.3f %15.3f %15.3f %15.3f" % (i, reg.params[0],reg.params[1],reg.params[2],reg.params[3],reg.params[4])print "%14s (%6.3f) (%6.3f) (%6.3f) (%6.3f) (%6.3f)" % ('(tvalue)', reg.tvalues[0],reg.tvalues[1],reg.tvalues[2],reg.tvalues[3],reg.tvalues[4])print printprint "%14s%15s %15s %15s %15s %15s" % ('Orthogonal factors','RMT','MomT','RevT','HMLT','SMBT')print '-'*110for i in inputnames:reg=ols(i+'~ RMT+MomT+RevT+HMLT+SMBT',data=newdata).fit()reg.params=reg.params[1:]print "%14s %15.3f %15.3f %15.3f %15.3f %15.3f" % (i, reg.params[0],reg.params[1],reg.params[2],reg.params[3],reg.params[4])print "%14s (%6.3f) (%6.3f) (%6.3f) (%6.3f) (%6.3f)" % ('(tvalue)', reg.tvalues[0],reg.tvalues[1],reg.tvalues[2],reg.tvalues[3],reg.tvalues[4])
Original factors RM Mom Rev HML SMB SmallValue 1.012 -0.028 0.029 0.761 0.918 (tvalue) ( 1.801) (219.495) (-5.284) ( 3.246) (87.182) SmallGrowth 1.090 -0.048 0.011 -0.227 1.033 (tvalue) (-3.699) (145.284) (-5.634) ( 0.757) (-15.956) SMALLDown 1.072 -0.587 0.013 0.239 0.992 (tvalue) (-2.307) (122.253) (-58.905) ( 0.760) (14.405) SmallLowRev 1.068 -0.102 0.389 0.382 1.051 (tvalue) ( 0.633) (131.973) (-11.132) (24.807) (24.970) BigValue 1.092 -0.034 -0.010 0.778 0.020 (tvalue) (-2.537) (135.808) (-3.669) (-0.646) (51.094) BigGrowth 1.014 -0.013 0.008 -0.235 -0.096 (tvalue) ( 3.365) (217.349) (-2.485) ( 0.873) (-26.586) BigDown 1.077 -0.673 -0.008 0.038 -0.062 (tvalue) ( 3.216) (132.585) (-72.766) (-0.509) ( 2.441) BigLowRev 1.058 -0.039 0.806 0.038 -0.127 (tvalue) (-0.529) (117.992) (-3.850) (46.347) ( 2.225) Orthogonal factors RMT MomT RevT HMLT SMBT SmallValue 1.163 -0.364 0.577 0.844 1.094 (tvalue) ( 1.801) (282.919) (-79.558) (92.022) (137.393) SmallGrowth 1.174 -0.234 0.266 -0.045 1.202 (tvalue) (-3.699) (175.525) (-31.408) (26.059) (-4.549) SMALLDown 1.273 -0.830 0.460 0.498 1.194 (tvalue) (-2.307) (162.899) (-95.341) (38.647) (42.664) SmallLowRev 1.242 -0.411 0.818 0.642 1.303 (tvalue) ( 0.633) (172.255) (-51.138) (74.438) (59.553) BigValue 1.134 -0.353 0.388 0.845 0.240 (tvalue) (-2.537) (158.328) (-44.143) (35.534) (78.910) BigGrowth 0.961 -0.146 0.036 -0.086 0.102 (tvalue) ( 3.365) (231.151) (-31.480) ( 5.679) (-13.905) BigDown 1.150 -0.841 0.185 0.314 0.182 (tvalue) ( 3.216) (158.816) (-104.236) (16.777) (28.967) BigLowRev 1.090 -0.276 0.857 0.432 0.247 (tvalue) (-0.529) (136.433) (-31.000) (70.334) (36.169)
上表为8个投资组合分别关于正交前以及正交后的因子回归得到的beta结果,对比t值我们可以发现,用正交后的因子做回归得到的beta的t值绝对值更大,因此回归系数更加显著,即beta更为稳定。¶
print 'Note: value= β^2 * σ^2'print "%14s %15s %15s %15s %15s %15s %15s %15s" % ('Original factors','RM','Mom','Rev','HML','SMB','Sum','σ^2(sj)')print '-'*130names=['RM','Mom','Rev','HML','SMB']for i in inputnames:reg=ols(i+'~ RM+Mom+Rev+HML+SMB',data=newdata).fit()reg.params=reg.params[1:]sumindex=(reg.params[0]**2)*var(newdata[names[0]])+(reg.params[1]**2)*var(newdata[names[1]])+(reg.params[2]**2)*var(newdata[names[2]])+(reg.params[3]**2)*var(newdata[names[3]])+(reg.params[4]**2)*var(newdata[names[4]])print "%14s %15.3f %15.3f %15.3f %15.3f %15.3f %15.3f %15.3f" % (i, (reg.params[0]**2)*var(newdata[names[0]]),(reg.params[1]**2)*var(newdata[names[1]]),(reg.params[2]**2)*var(newdata[names[2]]),(reg.params[3]**2)*var(newdata[names[3]]),(reg.params[4]**2)*var(newdata[names[4]]),sumindex,var(newdata[[i]]))print printprint "%14s%15s %15s %15s %15s %15s %15s %15s" % ('Orthogonal factors','RMT','MomT','RevT','HMLT','SMBT','Sum','σ^2(sj)')print '-'*130names=['RMT','MomT','RevT','HMLT','SMBT']for i in inputnames:reg=ols(i+'~ RMT+MomT+RevT+HMLT+SMBT',data=newdata).fit()reg.params=reg.params[1:]sumindex=(reg.params[0]**2)*var(newdata[names[0]])+(reg.params[1]**2)*var(newdata[names[1]])+(reg.params[2]**2)*var(newdata[names[2]])+(reg.params[3]**2)*var(newdata[names[3]])+(reg.params[4]**2)*var(newdata[names[4]])print "%14s %15.3f %15.3f %15.3f %15.3f %15.3f %15.3f %15.3f" % (i, (reg.params[0]**2)*var(newdata[names[0]]),(reg.params[1]**2)*var(newdata[names[1]]),(reg.params[2]**2)*var(newdata[names[2]]),(reg.params[3]**2)*var(newdata[names[3]]),(reg.params[4]**2)*var(newdata[names[4]]),sumindex,var(newdata[[i]]))
Note: value= β^2 * σ^2 Original factors RM Mom Rev HML SMB Sum σ^2(sj) SmallValue 28.541 0.017 0.010 7.215 8.700 44.483 66.258 SmallGrowth 33.096 0.052 0.001 0.640 11.024 44.812 56.608 SMALLDown 31.991 7.735 0.002 0.712 10.158 50.598 82.579 SmallLowRev 31.771 0.235 1.817 1.823 11.400 47.046 78.876 BigValue 33.209 0.025 0.001 7.532 0.004 40.772 51.354 BigGrowth 28.646 0.004 0.001 0.687 0.095 29.432 26.888 BigDown 32.316 10.138 0.001 0.018 0.039 42.511 56.106 BigLowRev 31.198 0.035 7.790 0.018 0.166 39.207 48.386 Orthogonal factors RMT MomT RevT HMLT SMBT Sum σ^2(sj) SmallValue 37.606 2.974 3.978 8.869 12.342 65.770 66.258 SmallGrowth 38.312 1.227 0.844 0.026 14.906 55.315 56.608 SMALLDown 45.048 15.431 2.535 3.090 14.709 80.813 82.579 SmallLowRev 42.927 3.783 8.016 5.131 17.514 77.371 78.876 BigValue 35.797 2.783 1.803 8.892 0.594 49.869 51.354 BigGrowth 25.696 0.477 0.016 0.093 0.107 26.388 26.888 BigDown 36.774 15.841 0.410 1.223 0.341 54.590 56.106 BigLowRev 33.082 1.708 8.792 2.325 0.631 46.537 48.386