
Building a fast multi-factor testing framework (full implementation code included)

Posted by 只求稳定 on May 10, 06:29

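A few days ago the editor nudged me to write a multi-factor strategy too. Unfortunately I haven't reached the strategy stage yet, and I don't have much theory to show off, which is a bit embarrassing, haha. Since I write a summary post at roughly every stage anyway, here is this one. Its goal is a fast multi-factor testing framework: within it, a multi-factor backtest over JoinQuant's pool of 100-plus factors takes no more than 30 seconds. So let your imagination run: you can do thousands of test runs a day, as long as you have ideas!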

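Mathematical theory is not my strong suit, so I won't say much about the algorithms and will focus on application and implementation. This post implements a factor-combination test using a simple scoring method, and you can keep adding other algorithms within the framework. As many people know, the devil is in the details: I simply took the first 10 factors from the factor library, so the backtest results are not great, and the only apparent improvement is somewhat better monotonicity. I also didn't want to spend too much effort on a demo post, so it was written in a hurry; tuning the details is up to you.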

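Some of the algorithms here use only the simplest possible approach, because implementing complex ones in a demo post is hard: they would need several extra algorithm libraries, or far too much code if written generically. Besides, many of these algorithms are already on the forum, so you can sort that out yourselves.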

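1 Correlation: the post uses a simple rank correlation; a weighted correlation would be better.
2 Factor contribution: the post uses only the return spread between group 1 and the last group; a better measure is (group 1 - last group return) / (market group 1 - last group return).
3 Factor direction: the post defines it by the direction of each factor's cumulative excess return; you could instead set it by hand from economic meaning.
4 Factor screening and ranking uses only factor contribution and factor IR; you can add other parameters such as drawdown, IC p-value, other test statistics, Sharpe ratio and so on. There are dozens to choose from.
5 Factor rank scoring uses only an equal-weight sum here; other weighting schemes are possible.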
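
Item 1 mentions a weighted correlation. As a rough sketch of one possible form (not the method used in this post), here is a weighted Pearson correlation for a single cross-section. The name `weighted_corr` and the choice of weights are assumptions for illustration; feeding it ranks instead of raw values would give a weighted rank correlation.

```python
import numpy as np

def weighted_corr(x, y, w):
    """Weighted Pearson correlation of two 1-D arrays (hypothetical helper)."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    ok = np.isfinite(x) & np.isfinite(y) & np.isfinite(w)   # drop invalid entries
    x, y, w = x[ok], y[ok], w[ok]
    w = w / w.sum()                                         # normalize the weights
    dx = x - (w * x).sum()                                  # weighted demeaning
    dy = y - (w * y).sum()
    return (w * dx * dy).sum() / np.sqrt((w * dx**2).sum() * (w * dy**2).sum())

# With equal weights this reduces to the ordinary Pearson correlation.
x = np.array([0.1, 0.4, 0.35, 0.8, np.nan])
y = np.array([0.0, 0.2, 0.30, 0.5, 0.1])
print(weighted_corr(x, y, np.ones(5)))
```

Weights could, for example, decay with the age of the observation or grow with market cap; which one suits a given factor is something to test.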

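In short, a multi-factor strategy has a great many details. Just reading through the list of topics takes half a day, never mind writing them all up, so you have to test and experiment with them one by one to get a feel for them.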

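You will need to sort out the data for this post yourselves, since I don't know where I could upload it. The data itself is simple and nothing special anyway.
factors is a list.
factors = [prices, factor 1, factor 2, factor 3, ...]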
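
Since the data file is not provided, a stand-in with the same layout can be built from random numbers for trying out the functions. This is only a sketch: the stock codes, dates, sizes and values below are all made up, and results on such data mean nothing.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
dates = pd.date_range("2009-01-09", periods=60, freq="W-FRI")   # weekly dates
codes = ["S%03d" % i for i in range(20)]                         # made-up stock codes

# Random-walk prices as a DataFrame (rows: dates, columns: stocks)
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.03, (60, 20)), axis=0)),
    index=dates, columns=codes)

# Three random factor arrays with the same shape as the price table
factor_arrays = [rng.normal(size=prices.shape).astype(np.float32) for _ in range(3)]

factors = [prices] + factor_arrays   # [prices, factor 1, factor 2, factor 3]
print(len(factors), factors[0].shape, factors[1].shape)
```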

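The full set of vectorized computation code this post needs is provided at the end. It basically covers what factor computation requires, but because it is written numpy-style it may be hard to follow.
If it is too much of a struggle, feel free to replace these functions with an implementation in whatever style you are used to.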

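import time
import datetime
import pickle
import warnings
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import scipy.stats as st
import jqdata
from jqfactor import Factor,calc_factors,winsorize,standardlize
# Load the prepared data: the price table followed by the factor arrays
pkl_file = open('My10.pkl', 'rb')
factors = pickle.load(pkl_file)
pkl_file.close()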

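A quick note on the data

Data used in this demo:

factors[0] : price data for the CSI 300 constituents, weekly closing prices starting January 4, 2009 (standard DataFrame format)

factors[1:11] : factor data, the first 10 factors from the JoinQuant factor library, same period as above (standard array format)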


factors[0] 
000001.XSHE 000002.XSHE 000060.XSHE 000063.XSHE 000069.XSHE 000100.XSHE 000157.XSHE 000166.XSHE 000333.XSHE 000338.XSHE ... 601992.XSHG 601997.XSHG 601998.XSHG 603160.XSHG 603260.XSHG 603288.XSHG 603799.XSHG 603833.XSHG 603858.XSHG 603993.XSHG
2009-01-09 385.000000 706.559998 109.669998 152.389999 137.009995 3.54 194.220001 NaN NaN 33.000000 ... NaN NaN 3.93 NaN NaN NaN NaN NaN NaN NaN
2009-01-16 415.089996 709.640015 119.110001 156.259995 133.110001 3.66 207.910004 NaN NaN 36.880001 ... NaN NaN 4.04 NaN NaN NaN NaN NaN NaN NaN
2009-01-23 454.959991 721.940002 119.900002 163.119995 141.369995 3.67 219.089996 NaN NaN 41.869999 ... NaN NaN 4.06 NaN NaN NaN NaN NaN NaN NaN
2009-02-06 515.539978 801.929993 141.149994 167.509995 156.160004 3.86 236.009995 NaN NaN 42.080002 ... NaN NaN 4.39 NaN NaN NaN NaN NaN NaN NaN
2009-02-13 531.960022 876.789978 145.929993 187.690002 174.369995 4.83 263.970001 NaN NaN 45.549999 ... NaN NaN 4.63 NaN NaN NaN NaN NaN NaN NaN
2009-02-20 532.739990 793.729980 145.809998 183.809998 165.029999 4.84 303.839996 NaN NaN 41.980000 ... NaN NaN 4.51 NaN NaN NaN NaN NaN NaN NaN
2009-02-27 539.390015 733.229980 126.379997 163.289993 178.729996 4.21 275.149994 NaN NaN 39.630001 ... NaN NaN 4.35 NaN NaN NaN NaN NaN NaN NaN
2009-03-06 585.900024 828.599976 142.179993 178.020004 201.619995 4.49 301.489990 NaN NaN 45.060001 ... NaN NaN 4.57 NaN NaN NaN NaN NaN NaN NaN
2009-03-13 579.250000 790.650024 139.789993 187.740005 178.580002 4.31 286.190002 NaN NaN 44.509998 ... NaN NaN 4.59 NaN NaN NaN NaN NaN NaN NaN
2009-03-20 599.190002 833.719971 182.979996 198.419998 197.100006 4.52 299.429993 NaN NaN 47.500000 ... NaN NaN 4.76 NaN NaN NaN NaN NaN NaN NaN
2009-03-27 639.840027 854.229980 195.360001 188.130005 207.380005 4.67 311.790009 NaN NaN 50.529999 ... NaN NaN 4.87 NaN NaN NaN NaN NaN NaN NaN
2009-04-03 654.690002 889.099976 185.820007 193.919998 212.520004 4.65 323.559998 NaN NaN 52.680000 ... NaN NaN 5.07 NaN NaN NaN NaN NaN NaN NaN
2009-04-10 649.219971 860.390015 192.750000 199.100006 208.619995 4.67 317.820007 NaN NaN 51.660000 ... NaN NaN 4.90 NaN NaN NaN NaN NaN NaN NaN
2009-04-17 620.690002 864.489990 203.429993 211.630005 216.100006 4.98 336.660004 NaN NaN 53.610001 ... NaN NaN 4.95 NaN NaN NaN NaN NaN NaN NaN
2009-04-24 599.190002 828.599976 175.929993 209.940002 208.470001 5.28 295.019989 NaN NaN 51.549999 ... NaN NaN 4.79 NaN NaN NaN NaN NaN NaN NaN
2009-04-30 637.880005 869.619995 177.070007 214.720001 223.880005 4.83 313.559998 NaN NaN 53.070000 ... NaN NaN 4.90 NaN NaN NaN NaN NaN NaN NaN
2009-05-08 710.580017 1009.080017 190.479996 216.410004 269.410004 4.93 301.489990 NaN NaN 56.009998 ... NaN NaN 5.22 NaN NaN NaN NaN NaN NaN NaN
2009-05-15 684.000000 1067.540039 186.270004 205.110001 260.309998 5.01 302.670013 NaN NaN 53.799999 ... NaN NaN 5.11 NaN NaN NaN NaN NaN NaN NaN
2009-05-22 668.369995 989.599976 199.110001 198.869995 255.910004 4.88 297.220001 NaN NaN 58.209999 ... NaN NaN 5.05 NaN NaN NaN NaN NaN NaN NaN
2009-05-27 697.289978 998.830017 217.300003 199.660004 255.910004 4.85 292.369995 NaN NaN 56.009998 ... NaN NaN 5.13 NaN NaN NaN NaN NaN NaN NaN
2009-06-05 781.719971 1096.250000 242.419998 193.929993 255.910004 5.05 293.250000 NaN NaN 54.639999 ... NaN NaN 5.21 NaN NaN NaN NaN NaN NaN NaN
2009-06-12 781.719971 1128.979980 234.119995 202.779999 284.480011 4.91 279.709991 NaN NaN 53.220001 ... NaN NaN 5.48 NaN NaN NaN NaN NaN NaN NaN
2009-06-19 862.630005 1276.280029 229.460007 209.779999 314.000000 5.02 316.790009 NaN NaN 56.349998 ... NaN NaN 5.88 NaN NaN NaN NaN NaN NaN NaN
2009-06-26 844.260010 1297.910034 235.259995 208.970001 296.420013 5.01 344.309998 NaN NaN 57.290001 ... NaN NaN 6.09 NaN NaN NaN NaN NaN NaN NaN
2009-07-03 922.429993 1470.969971 244.350006 212.360001 390.929993 5.08 343.279999 NaN NaN 57.950001 ... NaN NaN 6.06 NaN NaN NaN NaN NaN NaN NaN
2009-07-10 890.380005 1469.939941 237.190002 234.770004 374.290009 5.23 387.489990 NaN NaN 64.839996 ... NaN NaN 6.35 NaN NaN NaN NaN NaN NaN NaN
2009-07-17 917.349976 1449.339966 276.290009 234.470001 360.940002 5.19 392.850006 NaN NaN 68.519997 ... NaN NaN 6.44 NaN NaN NaN NaN NaN NaN NaN
2009-07-24 887.250000 1464.790039 331.320007 254.229996 389.200012 5.26 395.940002 NaN NaN 69.089996 ... NaN NaN 6.49 NaN NaN NaN NaN NaN NaN NaN
2009-07-31 1023.270020 1376.199951 344.790009 251.720001 343.040009 5.56 386.519989 NaN NaN 68.809998 ... NaN NaN 6.69 NaN NaN NaN NaN NaN NaN NaN
2009-08-07 919.690002 1313.359985 304.380005 247.149994 323.730011 5.35 388.299988 NaN NaN 68.370003 ... NaN NaN 6.42 NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2018-02-23 1459.180054 4429.020020 241.800003 532.500000 348.500000 10.03 278.040009 6.99 247.869995 151.779999 ... 11.29 15.69 9.70 79.750000 55.939999 211.250000 114.599998 149.899994 51.009998 25.940001
2018-03-02 1382.810059 4117.770020 238.009995 535.900024 332.609985 10.64 276.089996 6.98 242.139999 148.389999 ... 11.14 15.37 9.13 86.650002 56.700001 213.770004 123.800003 139.729996 51.619999 27.040001
2018-03-09 1399.010010 4268.290039 237.300003 571.609985 352.480011 11.17 281.290009 7.09 258.029999 147.679993 ... 11.44 15.51 9.13 89.300003 61.779999 225.190002 120.419998 142.729996 53.619999 27.260000
2018-03-16 1346.930054 4162.410156 234.690002 544.229980 342.540009 11.14 277.390015 6.98 257.079987 149.279999 ... 10.97 14.84 8.96 88.639999 59.919998 226.750000 129.619995 145.720001 53.090000 29.850000
2018-03-23 1312.219971 3963.409912 215.250000 493.230011 323.470001 10.20 274.140015 6.92 242.229996 141.800003 ... 9.82 14.71 9.13 82.430000 61.080002 221.440002 116.870003 133.429993 50.689999 26.910000
2018-03-30 1261.300049 4246.609863 222.130005 512.609985 327.440002 10.15 273.489990 6.95 234.860001 147.139999 ... 10.61 14.61 8.76 92.330002 65.739998 216.710007 119.250000 140.660004 53.009998 27.420000
2018-04-04 1257.829956 4184.100098 224.259995 509.890015 314.329987 10.06 272.839996 6.85 228.309998 152.669998 ... 10.16 14.81 8.58 92.400002 66.339996 226.710007 112.459999 136.119995 52.860001 25.740000
2018-04-13 1338.829956 3986.379883 224.020004 530.460022 326.250000 10.09 270.890015 6.79 221.039993 155.160004 ... 9.95 14.98 8.80 90.709999 68.129997 230.649994 112.169998 136.389999 50.849998 26.709999
2018-04-20 1313.380005 3846.050049 223.550003 532.330017 311.149994 9.42 268.940002 6.57 221.380005 145.179993 ... 10.19 14.50 8.66 103.400002 62.430000 229.610001 103.000000 135.509995 48.770000 24.799999
2018-04-27 1255.520020 3622.820068 224.020004 532.330017 307.570007 9.50 272.839996 6.49 222.589996 147.320007 ... 8.89 14.68 8.68 95.809998 69.040001 239.820007 109.639999 136.589996 51.080002 24.510000
2018-05-04 1235.849976 3432.750000 219.520004 532.330017 311.149994 9.68 274.790009 6.85 225.300003 144.289993 ... 8.95 13.87 8.87 78.089996 71.070000 245.779999 100.419998 136.470001 49.919998 24.260000
2018-05-11 1274.030029 3564.139893 228.529999 532.330017 323.859985 9.65 275.440002 6.85 236.979996 147.679993 ... 8.95 13.83 9.02 74.360001 73.879997 262.730011 111.160004 141.899994 51.130001 26.360001
2018-05-18 1268.250000 3463.360107 228.240005 532.330017 322.670013 9.50 278.690002 6.77 237.600006 153.020004 ... 8.87 14.08 9.15 73.849998 74.910004 265.910004 115.900002 147.839996 54.779999 26.520000
2018-05-25 1225.430054 3366.409912 221.300003 532.330017 312.339996 9.42 275.440002 6.60 223.050003 152.669998 ... 9.00 13.76 9.00 71.769997 78.540001 268.459991 107.279999 145.750000 56.240002 24.350000
2018-06-01 1179.150024 3316.659912 215.820007 532.330017 311.940002 9.35 273.489990 6.51 229.929993 154.449997 ... 8.38 13.48 8.61 71.900002 74.330002 286.459991 104.599998 138.399994 53.349998 23.190001
2018-06-08 1171.050049 3407.229980 212.529999 532.330017 310.350006 9.44 268.940002 6.50 241.000000 157.649994 ... 7.83 13.33 8.43 72.669998 73.489998 311.070007 101.919998 143.259995 52.490002 22.959999
2018-06-15 1176.829956 3570.520020 200.850006 387.989990 321.079987 9.20 270.239990 6.54 248.139999 169.589996 ... 7.55 13.26 8.51 69.489998 69.389999 301.119995 98.709999 142.000000 49.250000 22.090000
2018-06-22 1139.800049 3584.550049 173.100006 254.690002 315.519989 8.44 265.690002 6.08 236.539993 163.179993 ... 6.92 13.51 8.47 64.239998 69.500000 289.049988 94.750000 138.979996 44.790001 20.370001
2018-06-29 1051.859985 3138.070068 177.479996 221.539993 287.309998 8.74 266.989990 6.18 230.240005 155.869995 ... 6.96 12.57 8.43 65.150002 70.989998 284.989990 98.050003 127.529999 43.820000 20.610001
2018-07-06 1002.099976 2960.760010 161.410004 221.710007 256.519989 8.26 258.549988 6.10 208.990005 145.179993 ... 6.90 12.30 8.14 68.870003 68.440002 285.609985 92.080002 118.599998 43.099998 18.510000
2018-07-13 1043.290039 3025.820068 167.619995 239.729996 269.850006 8.65 261.149994 6.18 210.929993 151.419998 ... 7.17 12.46 8.41 69.370003 75.080002 296.290009 109.919998 122.269997 45.509998 21.389999
2018-07-20 1070.319946 2951.830078 165.059998 280.869995 266.100006 8.74 261.149994 6.23 205.330002 151.419998 ... 7.32 12.94 8.61 68.739998 73.730003 284.410004 105.360001 116.160004 45.250000 19.820000
2018-07-27 1086.760010 2974.790039 176.380005 262.339996 279.010010 9.04 270.890015 6.28 208.059998 153.910004 ... 8.03 12.92 8.69 75.180000 69.150002 283.170013 100.309998 117.089996 43.700001 18.969999
2018-08-03 1046.819946 2689.050049 166.889999 226.979996 262.350006 8.35 262.450012 6.03 184.649994 141.919998 ... 7.63 12.33 8.30 73.580002 63.389999 267.070007 88.660004 105.940002 39.990002 17.100000
2018-08-10 1084.410034 2956.929932 175.649994 246.529999 276.929993 8.56 270.890015 6.18 198.320007 147.419998 ... 8.03 12.55 8.44 77.150002 60.139999 266.179993 73.889999 104.120003 41.480000 16.410000
2018-08-17 1035.069946 2894.429932 165.059998 270.329987 263.179993 8.38 263.750000 6.08 181.300003 137.520004 ... 7.41 12.17 8.11 72.209999 57.860001 253.410004 67.949997 93.720001 38.790001 15.720000
2018-08-24 1178.400024 3020.540039 167.250000 291.410004 259.850006 8.44 269.589996 6.28 182.440002 138.800003 ... 7.39 12.74 8.37 74.660004 58.950001 263.390015 70.809998 100.750000 37.860001 15.760000
2018-08-31 1190.150024 3188.929932 168.350006 324.570007 261.519989 8.59 264.130005 6.31 183.410004 146.690002 ... 7.60 12.44 8.47 74.980003 58.950001 265.100006 70.660004 96.629997 37.540001 15.400000
2018-09-07 1176.050049 3077.550049 163.240005 307.570007 252.360001 8.35 253.889999 6.33 177.679993 141.550003 ... 7.84 12.33 8.27 85.239998 56.959999 262.000000 68.320000 92.660004 37.540001 14.680000
2018-09-14 1156.079956 3092.139893 162.869995 312.160004 251.520004 8.23 252.529999 6.33 177.679993 147.419998 ... 7.82 12.26 8.10 79.720001 53.730000 268.230011 70.559998 89.180000 37.669998 14.510000

487 rows × 284 columns

print(factors[1].shape)
factors[1] 
(487, 284)
array([[ 0.29700801,  0.13352799,  0.119612  , ...,         nan,
                nan,         nan],
       [ 0.29700801,  0.13352799,  0.119612  , ...,         nan,
                nan,         nan],
       [ 0.29700801,  0.13352799,  0.119612  , ...,         nan,
                nan,         nan],
       ..., 
       [ 0.220341  ,  0.145767  ,  0.05608   , ...,  0.13442899,
         0.120966  ,  0.219456  ],
       [ 0.220341  ,  0.145767  ,  0.05608   , ...,  0.13442899,
         0.120966  ,  0.219456  ],
       [ 0.220341  ,  0.145767  ,  0.05608   , ...,  0.13442899,
         0.120966  ,  0.219456  ]], dtype=float32)

Now let's test the functions used in this demo (source code is at the end of this post)

ford = pct_change(factors[0].values,-1,extre=True)              # forward returns, winsorized
deme = demeaned(ford,1)                                         # equal-weight excess returns
facs,cuts = standard(factors[1],factors[0],q=5)                 # standardize the factor and bin it into groups
mask = np.isfinite(ford) & np.isfinite(facs)                    # flag entries where both return and factor are valid
h = bincount(cuts,1,ford)[:,0:-1]/bincount(cuts,1,mask)[:,0:-1] # per-group return sum / group count
p = bincount(cuts,1,deme)[:,0:-1]/bincount(cuts,1,mask)[:,0:-1] # per-group excess-return sum / group count
# Plots
plt.figure(figsize=(12,3))
_ = plt.plot(np.nancumprod(h + 1, 0) - 1)  # factor group statistics (group returns)
plt.figure(figsize=(12,3))
_ = plt.plot(np.nancumprod(p + 1, 0) - 1)  # factor group statistics (excess returns)
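
The `bincount(...)/bincount(...)` lines are the core trick: a per-row group mean computed with a single `np.bincount` call. For anyone who finds the library version at the end hard to read, here is a stripped-down sketch of the same idea in plain numpy. `group_mean_rows` is a hypothetical name, and this is my simplified reading of the technique rather than a drop-in replacement.

```python
import numpy as np

def group_mean_rows(labels, values, n_groups):
    """Mean of `values` per integer label within each row (hypothetical helper)."""
    labels = np.asarray(labels)
    values = np.asarray(values, dtype=float)
    n_rows = labels.shape[0]
    # Keep finite values whose label is a real group; labels >= n_groups mark invalid data
    ok = np.isfinite(values) & (labels >= 0) & (labels < n_groups)
    # Offset each row's labels so every (row, group) pair gets its own bin
    flat = (labels + np.arange(n_rows)[:, None] * n_groups)[ok]
    size = n_rows * n_groups
    sums = np.bincount(flat, weights=values[ok], minlength=size)
    cnts = np.bincount(flat, minlength=size)
    with np.errstate(invalid="ignore", divide="ignore"):
        return (sums / cnts).reshape(n_rows, n_groups)   # NaN where a group is empty

labels = np.array([[0, 0, 1, 1], [1, 0, 2, 2]])          # label 2 plays the "invalid" group
values = np.array([[0.01, 0.03, -0.02, 0.00], [0.05, 0.01, np.nan, 0.02]])
print(group_mean_rows(labels, values, n_groups=2))
```

The point of the offset is that one flat `np.bincount` replaces a Python loop over dates, which is where most of the speed comes from.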

Define a direction for each factor

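fsign = []
ford = pct_change(factors[0].values,-1,extre=True)                  # forward returns, winsorized
deme = demeaned(ford,1)                        # demean returns across each cross-section
for f in factors[1:]:
    # Winsorize and standardize the factor, then bin it; returns the standardized factor and the group table
    facs,cuts = standard(f,factors[0],q=5)                          # standardize the factor and bin it into groups
    # Collect the valid-value flags, used in the calculation below
    mask = np.isfinite(ford) & np.isfinite(facs)                    # flag entries where both return and factor are valid
    # Sum of returns per group in each cross-section / number of valid entries per group = mean group return
    # [:,0:-1]: the binning step puts invalid data into a group of its own, so invalid values need no separate flag
    # and the group table fits in int8, about 1/8 of the memory; the slice here skips that last, invalid group
    h = bincount(cuts,1,deme)[:,0:-1]/bincount(cuts,1,mask)[:,0:-1] # per-group return sum / group count
    # Compounded return of each group for this factor
    m = np.nanprod(h + 1, 0) - 1
    # Record the direction of each factor
    fsign.append(m[0]<m[-1])
fsign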
[True, False, False, True, False, False, False, False, False, True]
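
The direction rule boils down to one comparison: compound each group's excess return over the whole sample, then check whether the first group ends up below the last. A tiny sketch of just that step, with a hypothetical helper name and made-up numbers:

```python
import numpy as np

def factor_direction(group_excess):
    """True if the last group out-earns the first over the sample (hypothetical helper).

    group_excess: 2-D array, rows = periods, columns = groups.
    """
    total = np.nanprod(np.asarray(group_excess, dtype=float) + 1.0, axis=0) - 1.0
    return bool(total[0] < total[-1])

toy = np.array([[0.01, 0.00, 0.02],
                [np.nan, 0.01, 0.03],    # NaN periods are skipped by nanprod
                [0.00, 0.01, 0.01]])
print(factor_direction(toy))
```

Note that this uses the full sample, so the direction is fitted in hindsight; setting it by hand from economic meaning (item 3 in the list near the top) avoids that.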

Factor group statistics after redefining direction

ford = pct_change(factors[0].values,-1,extre=True)                  # forward returns, winsorized
for f,sign in zip(factors[1:],fsign):
    # Set the direction by economic meaning when binning (here, as defined in fsign)
    facs,cuts = standard(f,factors[0],asc=sign,q=5)                 # standardize the factor and bin it into groups
    mask = np.isfinite(ford) & np.isfinite(facs)                    # flag entries where both return and factor are valid
    h = bincount(cuts,1,ford)[:,0:-1]/bincount(cuts,1,mask)[:,0:-1] # per-group return sum / group count
    plt.figure(figsize=(12,2))
    _ = plt.plot(np.nancumprod(h + 1, 0) - 1)                       # factor group statistics (group returns)

Factor combination: factor scoring

ford = pct_change(factors[0].values,-1,extre=True)                         # forward returns, winsorized
deme = demeaned(ford,1)                        # demean returns across each cross-section
mask = np.isfinite(ford)                                                   # valid market-return flags
cutr = standard(ford,q=5)[-1]                                              # bin the forward returns
fdsc = []; ficr = []

for f,sign in zip(factors[1:],fsign):
    
    # Set the direction by economic meaning when binning (here, as defined in fsign)
    facs,cuts = standard(f,factors[0],asc=sign,q=5)                        # standardize the factor and bin it into groups
    mask = np.isfinite(ford) & np.isfinite(facs)                           # flag entries where both return and factor are valid
    h = bincount(cuts,1,ford)[:,0:-1]/bincount(cuts,1,mask)[:,0:-1]        # per-group return sum / group count
    
    # Return spread between group 1 and the last group, 24-period rolling mean
    disw = pd.DataFrame(h[:,0]-h[:,-1]).rolling(window=24,min_periods=1).mean()
    
    # Compute IC and IR
    ic = pearsonr(facs,ford,1,rank=True,pct=1)                             # cross-sectional correlation of factor and forward return
    ic_mean = pd.DataFrame(ic).rolling(window=24,min_periods=1).mean()
    ic_std = pd.DataFrame(ic).rolling(window=24,min_periods=1).std()
    icir = ic_mean/ic_std
    
    # Collect each factor's contribution and IR
    fdsc.append(disw)                                
    ficr.append(icir)

# Compute ranks (asc sets the ranking direction: asc=1 means a larger IR is better, otherwise use 0)
fdsc = argsort(np.c_[tuple(fdsc)],rank=1,asc=1,pct=1,fillv=0,dtype=np.int8)[-1]# rank of every factor's contribution
ficr = argsort(np.c_[tuple(ficr)],rank=1,asc=1,pct=1,fillv=0,dtype=np.int8)[-1]# rank of every factor's IR
# Simple weighted combination
fank = (fdsc*0.5)+(ficr*0.5)  

print(fank.shape)
(487, 10)
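
If the numpy version of the IC / IR step is hard to follow, the same quantities can be sketched in plain pandas. This is an alternative way of writing it on made-up data, not the code this post runs: `corrwith` on row-wise ranks gives the cross-sectional rank IC, and the rolling mean over the rolling standard deviation gives the IR. I use `min_periods=2` here (my choice) so the standard deviation is defined.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(1)
T, N = 80, 30
factor = pd.DataFrame(rng.normal(size=(T, N)))
# Toy forward returns: a small factor effect plus noise
fwd_ret = 0.02 * factor + pd.DataFrame(rng.normal(scale=0.05, size=(T, N)))

# Cross-sectional rank IC: Pearson correlation of the ranks within each row
ic = factor.rank(axis=1).corrwith(fwd_ret.rank(axis=1), axis=1)

roll = ic.rolling(window=24, min_periods=2)
icir = roll.mean() / roll.std()       # rolling information ratio of the IC
print(ic.mean(), icir.iloc[-1])
```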

Testing the combined factor

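close = factors[0].values
farg = argsort(fank,axis=1)[:,-5:]             # sort the factor score table and keep the 5 highest-scoring factors
scre = np.zeros(shape=close.shape)             # create a score table

# ================== Score each stock with the selected factors ====================
for fx in np.unique(farg):                     # loop over every factor index that was used
    rows = (farg==fx).any(1)                   # rows where the factor is active (periods in which it was selected)
    if rows.any():
        factor = factors[fx+1]
        cutr = standard(factor[rows],close[rows],q=5,asc=fsign[fx])[-1] # bin the factor into groups
        cutr = cutr * fank[:,fx][rows][:,None]
        # Accumulate into the new score factor
        scre[rows] += cutr         

# ==================================================================        
# The combined factor cannot be used for backtesting directly: a result built from
# the factor of two days back and the return of one day back is off by 2 periods,
# so the combined factor must be delayed by 2 periods to avoid look-ahead bias.
scre = shift(scre,2,fillv=0)                   # delay the factor scores by 2 periods 

# ======================== Show the combined factor below ========================

# w = 1 is the holding period; it mimics real trading to test how factor effectiveness decays. 

deme = demeaned(ford,1)                        # demean returns across each cross-section
cutr = standard(scre,ford,q=5,w=1)[-1]         # bin the combined score into groups ([-1] takes the group table)
mask = np.isfinite(ford) & np.isfinite(scre)        
# ...................

# Factor group returns
rets = bincount(cutr,1,ford)[:,0:-1]/bincount(cutr,1,mask)[:,0:-1] 
plt.figure(figsize=(12,3))
_ = plt.plot(np.nancumprod(rets+1, 0)-1)      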
# Factor excess returns
rets = bincount(cutr,1,deme)[:,0:-1]/bincount(cutr,1,mask)[:,0:-1] 
plt.figure(figsize=(12,3))
_ = plt.plot(np.nancumprod(rets+1, 0)-1)      
# 24-period rolling return of the factor groups
rets = pd.DataFrame(rets).rolling(window=24,min_periods=1).sum()
plt.figure(figsize=(12,3))
_ = plt.plot(rets)
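
The 2-period delay applied to the combined score is the step that keeps the test honest, so here it is in isolation. A minimal sketch with a hypothetical helper name; it only handles a non-negative delay along rows, unlike the general `shift` in the library code.

```python
import numpy as np

def lag_rows(a, n, fill=0.0):
    """Delay a 2-D array by n rows, filling the first n rows (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    out = np.full_like(a, fill)
    if n > 0:
        out[n:] = a[:-n]      # row t now holds the score computed at t - n
    else:
        out[:] = a
    return out

score = np.arange(12, dtype=float).reshape(6, 2)
lagged = lag_rows(score, 2)
print(lagged)
```

After the delay, the score in a given row only uses information that was available two periods earlier, so pairing it with that row's forward return no longer peeks ahead.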
from numpy.core.umath_tests import inner1d

def pearsonr(A,B,axis=1,rank=False,pct=False):
    A, B = np.asarray(A), np.asarray(B);  ndis = np.arange(A.ndim) 
    axis = 0 if axis is None else (ndis[axis] if axis<0 else axis)
    if rank:
        A = argsort(A,axis,rank=rank,pct=pct,dtype=float)[1]
        B = argsort(B,axis,rank=rank,pct=pct,dtype=float)[1]
        if pct:
            A /= np.expand_dims(A.max(axis),axis)
            B /= np.expand_dims(B.max(axis),axis)
    Af = A-np.expand_dims(A.mean(axis),axis)
    Bf = B-np.expand_dims(B.mean(axis),axis)
    if axis == ndis[-1]: # roughly 2x faster along the last axis
        mult = inner1d(Af,Bf)
        diff = np.sqrt(inner1d(Af,Af)*inner1d(Bf,Bf))
    else:
        mult = (Af*Bf).sum(axis)
        diff = np.sqrt((Af**2).sum(axis)*(Bf**2).sum(axis))  
    return mult/diff

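def bincount(self,axis=None,weights=None,minlength=None):
    ''' Per-row (axis=1) or per-column (axis=0) np.bincount over a 2D array of integer labels;
    axis=None flattens the input first. With weights, sums the weights per label instead of counting.
    Non-finite entries are skipped. '''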
    if axis is None: view = np.ravel(np.asarray(self))
    else:            view = np.asarray(self)
    mask = np.isfinite(view)
    if not weights is None:
        mask &= np.isfinite(weights);    weights = np.asarray(weights)[mask]
    if view.ndim==1: out = np.bincount(view[mask],weights=weights,minlength = minlength)
    else:
        m = view.shape[1-axis]; n = (view[mask].max()+1)*np.arange(m)
        if minlength is None: minlength = m*(view[mask].max().astype(int)+1)        
        if axis!=1: out = np.bincount((view+n)[mask].astype(int),weights=weights,minlength=minlength).reshape(m,-1).T
        else: out = np.bincount((view+n[:,None])[mask].astype(int),weights=weights,minlength=minlength).reshape(m,-1)
    return out

def demeaned(self,axis=0,func=np.nanmean):
    ''' Subtract the mean along the given axis '''
    view = np.asarray(self)
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore","Mean of empty slice")
        if axis!=0: 
              return self-func(view,axis)[:,None]
        else: return self-func(view,axis)
        
def shift(self,num,fillv=np.nan,axis=0):
    ''' General-purpose shift function
    self :  1d/2d array 
    axis :  axis to shift along, 0 or 1
    num  :  int, number of steps to shift along the row or column axis
    '''    
    result = np.empty_like(self) 
    axs = 1 if axis==0 else np.s_[0:]
    if num > 0:
        result[np.s_[:,:num][axs]] = fillv
        result[np.s_[:,num:][axs]] = self[np.s_[:,:-num][axs]]
    elif num < 0:
        result[np.s_[:,num:][axs]] = fillv
        result[np.s_[:,:num][axs]] = self[np.s_[:,-num:][axs]]
    else: result[:] = self
    return result
    
def pct_change(x,N=1,axis=0,extre=False):
    with warnings.catch_warnings(): 
        warnings.filterwarnings("ignore","invalid value encountered in (less|multiply|divide|greater)")
        warnings.filterwarnings('ignore', r'All-NaN (slice|axis) encountered')    
        if   N>0:
            rets = x/shift(np.asarray(x),N,axis=axis)-1
        elif N<0:
            rets = shift(np.asarray(x),N,axis=axis)/x-1
        if extre: # winsorize
            rets = winsorize(rets,qrange=[0.05,0.93],inclusive=True,axis=axis)   
    return rets

def fcut(self,q,axis=-1,asc=False,dtype=np.int16):
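    ''' Fast quantile-binning function
    ------------------
    self   : 1d/2d array
    axis   : 0 or 1
    q      : number of bins
    asc    : reverse the bin order (supports passing a signal table to adjust factor direction period by period)
    ------------------
    out: differs slightly from pd.qcut output; NaN entries get a bin value >= q
    '''
    if asc:
          view =   np.asarray(self)
    else: view = 0-np.asarray(self)
    mask = np.isfinite(view)
    cout = view.shape[axis]; step = (1.0/q)*cout            # average width of one quantile bin
    inds = argsort(view,axis=axis,rank=True)[1] # rank of each element
    caxs = np.expand_dims(mask.sum(axis),axis)-1.0 # number of valid entries along the axis, minus 1
    inds = inds/caxs*(view.shape[axis]-1) # rescale ranks to remove the effect of invalid entries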
    icut = (inds/step).astype(dtype) 
    icut[~mask]=q
    return icut

def argsort(self,axis=-1,asc=True,rank=False,pct=False,kind=None,fillv=None,dtype=np.int16):
    """ np.argsort with extensions; works on multi-dimensional arrays. 
    Parameters
    ----------
        self         : array data
        axis         : axis to sort along
        asc          : ascending or descending order
        rank         : return ranks (ordinal positions); 'mergesort' is the default sort kind when ranking
        pct          : percentile rank, or (pct>1) scaled-score rank, where pct gives the number of score levels
    Returns: 
    ----------
        idx          : if ranking is not requested, only the indices, same as np.argsort
        res          : if ranking is requested, the output also includes the ranks
    Note: where invalid values exist, the index or rank at those positions is undefined and must be handled separately. """    
    if asc: view =  np.asarray(self)
    else:   view = -np.asarray(self)
    if rank|(pct!=0):
        rank = True; mask = np.isfinite(view) 
        if kind is None: kind = 'mergesort'
        res = np.empty(view.shape, dtype=dtype)
        I = list(np.ogrid[tuple(map(slice,view.shape))])
        idx = view.argsort(axis=axis,kind=kind)
        rng,I[axis]=I[axis],idx; res[tuple(I)]=rng
        if pct: 
            emk  = np.expand_dims(mask.sum(axis),axis)-1.0
            emk[emk==0] = .00000001; ran = res / emk
            if pct>1: ran = np.rint(ran*(pct-1)).astype(dtype)
        else:  ran = res
        if fillv is not None: ran[~mask] = fillv    
        else:                 ran[~mask] = pct    
        return idx.astype(dtype),ran
    if kind is None: kind = 'quicksort'
    return view.argsort(axis=axis,kind=kind).astype(dtype)

def standard(factor,other=None,q=5,w=1,asc=False,ac=True):
    """ Factor standardization pipeline (output has the same shape as the input)
    Parameters
    ----------
    factor       : array, factor data
    other        : array/list/tuple, other data involved, used to collect invalid-value flags
    q            : number of quantile bins
    w            : period interval; the portfolio stays unchanged within each interval, approximating real trading
    asc          : bin order, passed on to fcut
    ac           : if True, winsorize and standardize the factor before binning
    Returns: 
    ----------
    Output: processed factor, group table
    """
    mask = np.isfinite(factor)
    if not other is None:
        if isinstance(other,(list,tuple)): 
            for o in other:                                    # walk through the extra inputs to collect invalid-value flags
                mask &= np.isfinite(o)
        else:   mask &= np.isfinite(other)
    dayx  = np.arange(factor.shape[0]); rows = mask.any(1)
    if w > 1:# =================== cross-period computation, approximating real trading ===========================
           equl,reps = np.unique(dayx[rows]//w,True,return_counts=True)[1:]        
           temp = factor[equl]; invd = mask[equl]                           # split by period
    else:  temp = factor[rows]; invd = mask[rows]   
    # =================== flag invalid entries so they stay out of standardization and binning ===================
    try:   temp[~invd] = np.nan                            
    except:temp = temp.astype(float); temp[~invd] = np.nan   
    # ============================== standardize and bin the factor ==============================
    if ac:
        temp = winsorize(temp, qrange=[0.05,0.93], inclusive=True,axis=1)   # winsorize the factor
        temp = standardlize(temp,axis=1)                                    # standardize the factor 
    cuts = fcut(temp,q,axis=1,asc=asc,dtype=np.uint8) 
    icut = np.empty(factor.shape,dtype=np.int8)                             # allocate an empty array 
    ifac = np.empty(factor.shape,dtype=factor.dtype)                        # allocate an empty array 
    if w > 1: 
          icut[rows] = np.asarray(cuts).repeat(reps,axis=0)                 # expand back to every period
          ifac[rows] = np.asarray(temp).repeat(reps,axis=0)                 # expand back to every period
    else: icut[rows] = cuts; ifac[rows] = temp
    icut[~mask] = q; ifac[~mask] = np.nan                                   # out-of-range bin number for invalid entries   
    return ifac,icut

def hist(self,bins=33,ranges=None,facecolor=None,rwidth=0.8,normed=True,alpha=0.95):
    fig = plt.figure(figsize=(6, 3))  
    order = (self,) if isinstance(self,np.ndarray) else self
    for i,f in enumerate(order):
        ht = np.asarray(f).ravel();mask = np.isfinite(ht)  
        if ranges is None:
            ranges = tuple([np.min(ht[mask]),np.max(ht[mask])])
        if facecolor is None:
            plt.hist(ht[mask],bins=bins,range=ranges,histtype='bar',rwidth=rwidth,normed=normed,alpha=alpha)
        else:
            plt.hist(ht[mask],bins=bins,range=ranges,facecolor=facecolor,histtype='bar',rwidth=rwidth,normed=normed,alpha=alpha)
    plt.grid(True ,axis= 'both', which='major', linestyle = "-.", color = "black", linewidth = 0.5, alpha=.22)  
    plt.show()
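
For anyone replacing the binning function with something easier to read, here is a minimal sketch of rank-based quantile binning that keeps the same convention of parking NaN entries in bin q. It is a simplified stand-in, not identical to the binning function above: `quantile_bins` is a hypothetical name, it only bins along rows, and it always puts low values in bin 0.

```python
import numpy as np

def quantile_bins(x, q):
    """Row-wise quantile bins 0..q-1 from ranks; NaN entries get bin q (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    ok = np.isfinite(x)
    # argsort twice gives ordinal ranks; NaNs sort last, so valid ranks are 0..n_valid-1
    ranks = np.argsort(np.argsort(x, axis=1, kind="mergesort"), axis=1)
    n_valid = ok.sum(axis=1, keepdims=True)
    bins = (ranks * q) // np.maximum(n_valid, 1)   # scale ranks into q equal-count bins
    bins[~ok] = q                                   # invalid entries go to the extra bin
    return bins

x = np.array([[0.3, np.nan, 0.1, 0.9, 0.5, 0.7],
              [1.0, 2.0, 3.0, 4.0, np.nan, np.nan]])
print(quantile_bins(x, q=5))
```

Because the invalid entries have a bin of their own, the group table can stay a small integer array and the `[:,0:-1]` slice used throughout the post simply drops that last bin.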
 
