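A few days ago the editor nudged me to write a multi-factor strategy as well. Unfortunately I have not reached the strategy stage yet and have little to show, which is rather embarrassing, haha. Since I write a summary post at the end of each stage anyway, here it is. The goal of this post is a fast multi-factor testing framework: with it, a multi-factor backtest over JoinQuant's pool of 100+ factors takes no more than 30 seconds. So let your imagination run and try thousands of tests a day, as long as you have the inspiration!
I am not strong on the mathematical theory, so I will not discuss the algorithms in depth and will focus on application and implementation only. This post implements a factor-combination test based on a simple scoring method; you can keep adding other algorithms on top of this framework. As many people know, the devil is in the details: I simply picked the first 10 factors from the factor library, so the backtest results are not great, and the only visible improvement seems to be somewhat better monotonicity. I also did not want to spend too much effort on a demo post, so it was written in a hurry, and tuning the details is left to you.
Some of the algorithms here use only the simplest possible form, because complex ones are hard to implement in a demo post: they would need several extra algorithm libraries, or far too much code if written generically. Many of them are already available on the forum, so please look them up yourselves.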
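1 Correlation: the post uses a simple rank correlation coefficient; a weighted correlation coefficient would be better.
2 Factor contribution: the post uses only the return difference between group 1 and the last group; a better measure is (group 1 return - last group return) / (market group 1 return - last group return).
3 Factor direction: the post defines it by the direction of the factor's cumulative excess return; you can instead set the direction by hand from the factor's economic meaning.
4 Factor screening ranks factors only by contribution and IR; you can add other metrics yourselves, such as drawdown, IC p-value, other test statistics, or the Sharpe ratio. There are dozens to choose from.
5 Factor rank scores are combined with equal weights here; other weighting schemes are possible.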
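Item 2 above can be sketched roughly as follows. This is only an illustration under assumptions: the function name `contribution_ratio` is hypothetical, "market spread" is read here as the group 1 minus last group spread of the market itself, and the input numbers are made up.

```python
import numpy as np

def contribution_ratio(factor_spread, market_spread, eps=1e-12):
    """Factor spread (group 1 - last group) as a share of the market spread.

    Hypothetical helper: both inputs are per-period spreads of equal length.
    Periods with a missing value or a (near) zero market spread yield NaN.
    """
    factor_spread = np.asarray(factor_spread, dtype=float)
    market_spread = np.asarray(market_spread, dtype=float)
    out = np.full(factor_spread.shape, np.nan)
    ok = (np.isfinite(factor_spread) & np.isfinite(market_spread)
          & (np.abs(market_spread) > eps))
    out[ok] = factor_spread[ok] / market_spread[ok]
    return out

# Made-up per-period spreads, for illustration only.
factor_spread = np.array([0.010, -0.004, 0.006, np.nan])
market_spread = np.array([0.050, 0.040, 0.000, 0.030])
ratio = contribution_ratio(factor_spread, market_spread)
```

Dividing by the market spread puts factors measured in calm and volatile periods on a comparable scale, which is presumably why the ratio is preferred over the raw difference.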
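In short, multi-factor strategies involve a great many details. Just skimming the table of contents takes half a day, let alone writing everything out, so you need to test and experiment item by item to get a feel for them.
You need to prepare the data for this post yourselves: I do not know where I could upload it, and the data itself is simple, with nothing special about it.
factors is a list.
factors = [price, factor 1, factor 2, factor 3, ...]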
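Since the data file is not shared, a stand-in with the same layout can be built for experiments. The sketch below is an assumption based on the outputs shown in this post (element 0 is a periods x stocks price DataFrame, the remaining elements are float32 arrays of the same shape); the names `factors_demo`, the stock codes and the random data are all made up.

```python
import pickle
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
n_periods, n_stocks, n_factors = 60, 30, 3

dates = pd.date_range("2009-01-09", periods=n_periods, freq="W-FRI")
codes = ["S%03d" % i for i in range(n_stocks)]  # placeholder stock codes

# Element 0: price table (periods x stocks), here a random walk.
steps = rng.normal(0.0, 0.02, size=(n_periods, n_stocks))
prices = pd.DataFrame(100.0 * np.exp(np.cumsum(steps, axis=0)),
                      index=dates, columns=codes)

# Elements 1..n: factor arrays with the same shape as the price table.
factors_demo = [prices] + [
    rng.normal(size=(n_periods, n_stocks)).astype(np.float32)
    for _ in range(n_factors)
]

# Round trip through pickle, mirroring how the post loads 'My10.pkl'.
blob = pickle.dumps(factors_demo, protocol=2)
restored = pickle.loads(blob)
```

Random factors carry no information, so any layered test on this stand-in should show no systematic spread; it is only useful for checking that the pipeline runs.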
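At the end I provide the full set of vectorized functions this post relies on. They cover most needs of factor computation, but because they are written in NumPy style they may be hard to follow.
If you find them too hard to read, replace these functions with implementations in whatever style you are used to.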
import time
import datetime
import pickle
import warnings
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import scipy.stats as st
import jqdata
from jqfactor import Factor,calc_factors,winsorize,standardlize
pkl_file = open('My10.pkl', 'rb')
factors = pickle.load(pkl_file)
pkl_file.close()
factors[0]
000001.XSHE | 000002.XSHE | 000060.XSHE | 000063.XSHE | 000069.XSHE | 000100.XSHE | 000157.XSHE | 000166.XSHE | 000333.XSHE | 000338.XSHE | ... | 601992.XSHG | 601997.XSHG | 601998.XSHG | 603160.XSHG | 603260.XSHG | 603288.XSHG | 603799.XSHG | 603833.XSHG | 603858.XSHG | 603993.XSHG | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2009-01-09 | 385.000000 | 706.559998 | 109.669998 | 152.389999 | 137.009995 | 3.54 | 194.220001 | NaN | NaN | 33.000000 | ... | NaN | NaN | 3.93 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-01-16 | 415.089996 | 709.640015 | 119.110001 | 156.259995 | 133.110001 | 3.66 | 207.910004 | NaN | NaN | 36.880001 | ... | NaN | NaN | 4.04 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-01-23 | 454.959991 | 721.940002 | 119.900002 | 163.119995 | 141.369995 | 3.67 | 219.089996 | NaN | NaN | 41.869999 | ... | NaN | NaN | 4.06 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-02-06 | 515.539978 | 801.929993 | 141.149994 | 167.509995 | 156.160004 | 3.86 | 236.009995 | NaN | NaN | 42.080002 | ... | NaN | NaN | 4.39 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-02-13 | 531.960022 | 876.789978 | 145.929993 | 187.690002 | 174.369995 | 4.83 | 263.970001 | NaN | NaN | 45.549999 | ... | NaN | NaN | 4.63 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-02-20 | 532.739990 | 793.729980 | 145.809998 | 183.809998 | 165.029999 | 4.84 | 303.839996 | NaN | NaN | 41.980000 | ... | NaN | NaN | 4.51 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-02-27 | 539.390015 | 733.229980 | 126.379997 | 163.289993 | 178.729996 | 4.21 | 275.149994 | NaN | NaN | 39.630001 | ... | NaN | NaN | 4.35 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-03-06 | 585.900024 | 828.599976 | 142.179993 | 178.020004 | 201.619995 | 4.49 | 301.489990 | NaN | NaN | 45.060001 | ... | NaN | NaN | 4.57 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-03-13 | 579.250000 | 790.650024 | 139.789993 | 187.740005 | 178.580002 | 4.31 | 286.190002 | NaN | NaN | 44.509998 | ... | NaN | NaN | 4.59 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-03-20 | 599.190002 | 833.719971 | 182.979996 | 198.419998 | 197.100006 | 4.52 | 299.429993 | NaN | NaN | 47.500000 | ... | NaN | NaN | 4.76 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-03-27 | 639.840027 | 854.229980 | 195.360001 | 188.130005 | 207.380005 | 4.67 | 311.790009 | NaN | NaN | 50.529999 | ... | NaN | NaN | 4.87 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-04-03 | 654.690002 | 889.099976 | 185.820007 | 193.919998 | 212.520004 | 4.65 | 323.559998 | NaN | NaN | 52.680000 | ... | NaN | NaN | 5.07 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-04-10 | 649.219971 | 860.390015 | 192.750000 | 199.100006 | 208.619995 | 4.67 | 317.820007 | NaN | NaN | 51.660000 | ... | NaN | NaN | 4.90 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-04-17 | 620.690002 | 864.489990 | 203.429993 | 211.630005 | 216.100006 | 4.98 | 336.660004 | NaN | NaN | 53.610001 | ... | NaN | NaN | 4.95 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-04-24 | 599.190002 | 828.599976 | 175.929993 | 209.940002 | 208.470001 | 5.28 | 295.019989 | NaN | NaN | 51.549999 | ... | NaN | NaN | 4.79 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-04-30 | 637.880005 | 869.619995 | 177.070007 | 214.720001 | 223.880005 | 4.83 | 313.559998 | NaN | NaN | 53.070000 | ... | NaN | NaN | 4.90 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-05-08 | 710.580017 | 1009.080017 | 190.479996 | 216.410004 | 269.410004 | 4.93 | 301.489990 | NaN | NaN | 56.009998 | ... | NaN | NaN | 5.22 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-05-15 | 684.000000 | 1067.540039 | 186.270004 | 205.110001 | 260.309998 | 5.01 | 302.670013 | NaN | NaN | 53.799999 | ... | NaN | NaN | 5.11 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-05-22 | 668.369995 | 989.599976 | 199.110001 | 198.869995 | 255.910004 | 4.88 | 297.220001 | NaN | NaN | 58.209999 | ... | NaN | NaN | 5.05 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-05-27 | 697.289978 | 998.830017 | 217.300003 | 199.660004 | 255.910004 | 4.85 | 292.369995 | NaN | NaN | 56.009998 | ... | NaN | NaN | 5.13 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-06-05 | 781.719971 | 1096.250000 | 242.419998 | 193.929993 | 255.910004 | 5.05 | 293.250000 | NaN | NaN | 54.639999 | ... | NaN | NaN | 5.21 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-06-12 | 781.719971 | 1128.979980 | 234.119995 | 202.779999 | 284.480011 | 4.91 | 279.709991 | NaN | NaN | 53.220001 | ... | NaN | NaN | 5.48 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-06-19 | 862.630005 | 1276.280029 | 229.460007 | 209.779999 | 314.000000 | 5.02 | 316.790009 | NaN | NaN | 56.349998 | ... | NaN | NaN | 5.88 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-06-26 | 844.260010 | 1297.910034 | 235.259995 | 208.970001 | 296.420013 | 5.01 | 344.309998 | NaN | NaN | 57.290001 | ... | NaN | NaN | 6.09 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-07-03 | 922.429993 | 1470.969971 | 244.350006 | 212.360001 | 390.929993 | 5.08 | 343.279999 | NaN | NaN | 57.950001 | ... | NaN | NaN | 6.06 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-07-10 | 890.380005 | 1469.939941 | 237.190002 | 234.770004 | 374.290009 | 5.23 | 387.489990 | NaN | NaN | 64.839996 | ... | NaN | NaN | 6.35 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-07-17 | 917.349976 | 1449.339966 | 276.290009 | 234.470001 | 360.940002 | 5.19 | 392.850006 | NaN | NaN | 68.519997 | ... | NaN | NaN | 6.44 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-07-24 | 887.250000 | 1464.790039 | 331.320007 | 254.229996 | 389.200012 | 5.26 | 395.940002 | NaN | NaN | 69.089996 | ... | NaN | NaN | 6.49 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-07-31 | 1023.270020 | 1376.199951 | 344.790009 | 251.720001 | 343.040009 | 5.56 | 386.519989 | NaN | NaN | 68.809998 | ... | NaN | NaN | 6.69 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2009-08-07 | 919.690002 | 1313.359985 | 304.380005 | 247.149994 | 323.730011 | 5.35 | 388.299988 | NaN | NaN | 68.370003 | ... | NaN | NaN | 6.42 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2018-02-23 | 1459.180054 | 4429.020020 | 241.800003 | 532.500000 | 348.500000 | 10.03 | 278.040009 | 6.99 | 247.869995 | 151.779999 | ... | 11.29 | 15.69 | 9.70 | 79.750000 | 55.939999 | 211.250000 | 114.599998 | 149.899994 | 51.009998 | 25.940001 |
2018-03-02 | 1382.810059 | 4117.770020 | 238.009995 | 535.900024 | 332.609985 | 10.64 | 276.089996 | 6.98 | 242.139999 | 148.389999 | ... | 11.14 | 15.37 | 9.13 | 86.650002 | 56.700001 | 213.770004 | 123.800003 | 139.729996 | 51.619999 | 27.040001 |
2018-03-09 | 1399.010010 | 4268.290039 | 237.300003 | 571.609985 | 352.480011 | 11.17 | 281.290009 | 7.09 | 258.029999 | 147.679993 | ... | 11.44 | 15.51 | 9.13 | 89.300003 | 61.779999 | 225.190002 | 120.419998 | 142.729996 | 53.619999 | 27.260000 |
2018-03-16 | 1346.930054 | 4162.410156 | 234.690002 | 544.229980 | 342.540009 | 11.14 | 277.390015 | 6.98 | 257.079987 | 149.279999 | ... | 10.97 | 14.84 | 8.96 | 88.639999 | 59.919998 | 226.750000 | 129.619995 | 145.720001 | 53.090000 | 29.850000 |
2018-03-23 | 1312.219971 | 3963.409912 | 215.250000 | 493.230011 | 323.470001 | 10.20 | 274.140015 | 6.92 | 242.229996 | 141.800003 | ... | 9.82 | 14.71 | 9.13 | 82.430000 | 61.080002 | 221.440002 | 116.870003 | 133.429993 | 50.689999 | 26.910000 |
2018-03-30 | 1261.300049 | 4246.609863 | 222.130005 | 512.609985 | 327.440002 | 10.15 | 273.489990 | 6.95 | 234.860001 | 147.139999 | ... | 10.61 | 14.61 | 8.76 | 92.330002 | 65.739998 | 216.710007 | 119.250000 | 140.660004 | 53.009998 | 27.420000 |
2018-04-04 | 1257.829956 | 4184.100098 | 224.259995 | 509.890015 | 314.329987 | 10.06 | 272.839996 | 6.85 | 228.309998 | 152.669998 | ... | 10.16 | 14.81 | 8.58 | 92.400002 | 66.339996 | 226.710007 | 112.459999 | 136.119995 | 52.860001 | 25.740000 |
2018-04-13 | 1338.829956 | 3986.379883 | 224.020004 | 530.460022 | 326.250000 | 10.09 | 270.890015 | 6.79 | 221.039993 | 155.160004 | ... | 9.95 | 14.98 | 8.80 | 90.709999 | 68.129997 | 230.649994 | 112.169998 | 136.389999 | 50.849998 | 26.709999 |
2018-04-20 | 1313.380005 | 3846.050049 | 223.550003 | 532.330017 | 311.149994 | 9.42 | 268.940002 | 6.57 | 221.380005 | 145.179993 | ... | 10.19 | 14.50 | 8.66 | 103.400002 | 62.430000 | 229.610001 | 103.000000 | 135.509995 | 48.770000 | 24.799999 |
2018-04-27 | 1255.520020 | 3622.820068 | 224.020004 | 532.330017 | 307.570007 | 9.50 | 272.839996 | 6.49 | 222.589996 | 147.320007 | ... | 8.89 | 14.68 | 8.68 | 95.809998 | 69.040001 | 239.820007 | 109.639999 | 136.589996 | 51.080002 | 24.510000 |
2018-05-04 | 1235.849976 | 3432.750000 | 219.520004 | 532.330017 | 311.149994 | 9.68 | 274.790009 | 6.85 | 225.300003 | 144.289993 | ... | 8.95 | 13.87 | 8.87 | 78.089996 | 71.070000 | 245.779999 | 100.419998 | 136.470001 | 49.919998 | 24.260000 |
2018-05-11 | 1274.030029 | 3564.139893 | 228.529999 | 532.330017 | 323.859985 | 9.65 | 275.440002 | 6.85 | 236.979996 | 147.679993 | ... | 8.95 | 13.83 | 9.02 | 74.360001 | 73.879997 | 262.730011 | 111.160004 | 141.899994 | 51.130001 | 26.360001 |
2018-05-18 | 1268.250000 | 3463.360107 | 228.240005 | 532.330017 | 322.670013 | 9.50 | 278.690002 | 6.77 | 237.600006 | 153.020004 | ... | 8.87 | 14.08 | 9.15 | 73.849998 | 74.910004 | 265.910004 | 115.900002 | 147.839996 | 54.779999 | 26.520000 |
2018-05-25 | 1225.430054 | 3366.409912 | 221.300003 | 532.330017 | 312.339996 | 9.42 | 275.440002 | 6.60 | 223.050003 | 152.669998 | ... | 9.00 | 13.76 | 9.00 | 71.769997 | 78.540001 | 268.459991 | 107.279999 | 145.750000 | 56.240002 | 24.350000 |
2018-06-01 | 1179.150024 | 3316.659912 | 215.820007 | 532.330017 | 311.940002 | 9.35 | 273.489990 | 6.51 | 229.929993 | 154.449997 | ... | 8.38 | 13.48 | 8.61 | 71.900002 | 74.330002 | 286.459991 | 104.599998 | 138.399994 | 53.349998 | 23.190001 |
2018-06-08 | 1171.050049 | 3407.229980 | 212.529999 | 532.330017 | 310.350006 | 9.44 | 268.940002 | 6.50 | 241.000000 | 157.649994 | ... | 7.83 | 13.33 | 8.43 | 72.669998 | 73.489998 | 311.070007 | 101.919998 | 143.259995 | 52.490002 | 22.959999 |
2018-06-15 | 1176.829956 | 3570.520020 | 200.850006 | 387.989990 | 321.079987 | 9.20 | 270.239990 | 6.54 | 248.139999 | 169.589996 | ... | 7.55 | 13.26 | 8.51 | 69.489998 | 69.389999 | 301.119995 | 98.709999 | 142.000000 | 49.250000 | 22.090000 |
2018-06-22 | 1139.800049 | 3584.550049 | 173.100006 | 254.690002 | 315.519989 | 8.44 | 265.690002 | 6.08 | 236.539993 | 163.179993 | ... | 6.92 | 13.51 | 8.47 | 64.239998 | 69.500000 | 289.049988 | 94.750000 | 138.979996 | 44.790001 | 20.370001 |
2018-06-29 | 1051.859985 | 3138.070068 | 177.479996 | 221.539993 | 287.309998 | 8.74 | 266.989990 | 6.18 | 230.240005 | 155.869995 | ... | 6.96 | 12.57 | 8.43 | 65.150002 | 70.989998 | 284.989990 | 98.050003 | 127.529999 | 43.820000 | 20.610001 |
2018-07-06 | 1002.099976 | 2960.760010 | 161.410004 | 221.710007 | 256.519989 | 8.26 | 258.549988 | 6.10 | 208.990005 | 145.179993 | ... | 6.90 | 12.30 | 8.14 | 68.870003 | 68.440002 | 285.609985 | 92.080002 | 118.599998 | 43.099998 | 18.510000 |
2018-07-13 | 1043.290039 | 3025.820068 | 167.619995 | 239.729996 | 269.850006 | 8.65 | 261.149994 | 6.18 | 210.929993 | 151.419998 | ... | 7.17 | 12.46 | 8.41 | 69.370003 | 75.080002 | 296.290009 | 109.919998 | 122.269997 | 45.509998 | 21.389999 |
2018-07-20 | 1070.319946 | 2951.830078 | 165.059998 | 280.869995 | 266.100006 | 8.74 | 261.149994 | 6.23 | 205.330002 | 151.419998 | ... | 7.32 | 12.94 | 8.61 | 68.739998 | 73.730003 | 284.410004 | 105.360001 | 116.160004 | 45.250000 | 19.820000 |
2018-07-27 | 1086.760010 | 2974.790039 | 176.380005 | 262.339996 | 279.010010 | 9.04 | 270.890015 | 6.28 | 208.059998 | 153.910004 | ... | 8.03 | 12.92 | 8.69 | 75.180000 | 69.150002 | 283.170013 | 100.309998 | 117.089996 | 43.700001 | 18.969999 |
2018-08-03 | 1046.819946 | 2689.050049 | 166.889999 | 226.979996 | 262.350006 | 8.35 | 262.450012 | 6.03 | 184.649994 | 141.919998 | ... | 7.63 | 12.33 | 8.30 | 73.580002 | 63.389999 | 267.070007 | 88.660004 | 105.940002 | 39.990002 | 17.100000 |
2018-08-10 | 1084.410034 | 2956.929932 | 175.649994 | 246.529999 | 276.929993 | 8.56 | 270.890015 | 6.18 | 198.320007 | 147.419998 | ... | 8.03 | 12.55 | 8.44 | 77.150002 | 60.139999 | 266.179993 | 73.889999 | 104.120003 | 41.480000 | 16.410000 |
2018-08-17 | 1035.069946 | 2894.429932 | 165.059998 | 270.329987 | 263.179993 | 8.38 | 263.750000 | 6.08 | 181.300003 | 137.520004 | ... | 7.41 | 12.17 | 8.11 | 72.209999 | 57.860001 | 253.410004 | 67.949997 | 93.720001 | 38.790001 | 15.720000 |
2018-08-24 | 1178.400024 | 3020.540039 | 167.250000 | 291.410004 | 259.850006 | 8.44 | 269.589996 | 6.28 | 182.440002 | 138.800003 | ... | 7.39 | 12.74 | 8.37 | 74.660004 | 58.950001 | 263.390015 | 70.809998 | 100.750000 | 37.860001 | 15.760000 |
2018-08-31 | 1190.150024 | 3188.929932 | 168.350006 | 324.570007 | 261.519989 | 8.59 | 264.130005 | 6.31 | 183.410004 | 146.690002 | ... | 7.60 | 12.44 | 8.47 | 74.980003 | 58.950001 | 265.100006 | 70.660004 | 96.629997 | 37.540001 | 15.400000 |
2018-09-07 | 1176.050049 | 3077.550049 | 163.240005 | 307.570007 | 252.360001 | 8.35 | 253.889999 | 6.33 | 177.679993 | 141.550003 | ... | 7.84 | 12.33 | 8.27 | 85.239998 | 56.959999 | 262.000000 | 68.320000 | 92.660004 | 37.540001 | 14.680000 |
2018-09-14 | 1156.079956 | 3092.139893 | 162.869995 | 312.160004 | 251.520004 | 8.23 | 252.529999 | 6.33 | 177.679993 | 147.419998 | ... | 7.82 | 12.26 | 8.10 | 79.720001 | 53.730000 | 268.230011 | 70.559998 | 89.180000 | 37.669998 | 14.510000 |
487 rows × 284 columns
print(factors[1].shape)
factors[1]
(487, 284)
array([[ 0.29700801, 0.13352799, 0.119612 , ..., nan, nan, nan], [ 0.29700801, 0.13352799, 0.119612 , ..., nan, nan, nan], [ 0.29700801, 0.13352799, 0.119612 , ..., nan, nan, nan], ..., [ 0.220341 , 0.145767 , 0.05608 , ..., 0.13442899, 0.120966 , 0.219456 ], [ 0.220341 , 0.145767 , 0.05608 , ..., 0.13442899, 0.120966 , 0.219456 ], [ 0.220341 , 0.145767 , 0.05608 , ..., 0.13442899, 0.120966 , 0.219456 ]], dtype=float32)
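ford = pct_change(factors[0].values,-1,extre=True) # forward returns, winsorized
deme = demeaned(ford,1) # equal-weighted excess returns (cross-sectional mean removed)
facs,cuts = standard(factors[1],factors[0],q=5) # standardize the factor and bin it into 5 groups
mask = np.isfinite(ford) & np.isfinite(facs) # flags entries where both return and factor are valid
h = bincount(cuts,1,ford)[:,0:-1]/bincount(cuts,1,mask)[:,0:-1] # per-group return sum / per-group count
p = bincount(cuts,1,deme)[:,0:-1]/bincount(cuts,1,mask)[:,0:-1] # per-group excess-return sum / per-group count
# Plots
plt.figure(figsize=(12,3))
_ = plt.plot(np.nancumprod(h + 1, 0) - 1) # layered factor statistics (cumulative group returns)
plt.figure(figsize=(12,3))
_ = plt.plot(np.nancumprod(p + 1, 0) - 1) # layered factor statistics (cumulative excess returns)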
/opt/conda/envs/python2new/lib/python2.7/site-packages/ipykernel_launcher.py:5: RuntimeWarning: invalid value encountered in divide """ /opt/conda/envs/python2new/lib/python2.7/site-packages/ipykernel_launcher.py:6: RuntimeWarning: invalid value encountered in divide
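For readers who find the vectorized helpers hard to follow, the layered test above can be approximated with plain pandas. This is a slower sketch under assumptions, not the post's implementation: `group_mean_returns` is a hypothetical name, it skips winsorizing and standardizing, and group 0 holds the lowest factor values (the post's `fcut` with `asc=False` puts the highest values in group 0).

```python
import numpy as np
import pandas as pd

def group_mean_returns(factor, fwd_ret, q=5):
    """Per-period mean forward return of each factor quantile (periods x q)."""
    factor = np.asarray(factor, dtype=float)
    fwd_ret = np.asarray(fwd_ret, dtype=float)
    out = np.full((factor.shape[0], q), np.nan)
    for t in range(factor.shape[0]):
        ok = np.isfinite(factor[t]) & np.isfinite(fwd_ret[t])
        if ok.sum() < q:  # too few valid stocks to form q groups
            continue
        # Rank first so tied factor values cannot create duplicate bin edges.
        ranks = pd.Series(factor[t][ok]).rank(method="first")
        bins = pd.qcut(ranks, q, labels=False).values
        means = pd.Series(fwd_ret[t][ok]).groupby(bins).mean()
        out[t, np.asarray(means.index, dtype=int)] = means.values
    return out

# Toy input: one valid period and one period with no factor data.
factor = np.array([np.arange(1.0, 11.0), np.full(10, np.nan)])
fwd_ret = np.array([np.arange(1.0, 11.0) * 0.01, np.full(10, 0.01)])
layers = group_mean_returns(factor, fwd_ret, q=5)
cum = np.nancumprod(layers + 1.0, axis=0) - 1.0  # cumulative group returns
```

The per-period Python loop is what the `bincount` approach avoids, so this version is mainly useful for cross-checking results on a small sample.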
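fsign = []
ford = pct_change(factors[0].values,-1,extre=True) # forward returns, winsorized
deme = demeaned(ford,1) # demeaned returns
for f in factors[1:]:
    # Winsorize and standardize the factor, then bin it; returns the standardized factor and the group table
    facs,cuts = standard(f,factors[0],q=5) # standardize and bin the factor
    # Collect the valid-value flags, used in the calculations below
    mask = np.isfinite(ford) & np.isfinite(facs) # flags entries where both return and factor are valid
    # Cross-sectional return sum of each group / number of valid elements in each group = group mean return
    # [:,0:-1]: binning puts invalid data into a group of its own (the last one), so no separate invalid-value
    # mask is needed and the table can be stored as int8 (about 1/8 of the memory); the slice skips that group
    h = bincount(cuts,1,deme)[:,0:-1]/bincount(cuts,1,mask)[:,0:-1] # per-group return sum / per-group count
    # Cumulative return of each group for this factor
    m = np.nanprod(h + 1, 0) - 1
    # Record the direction of this factor
    fsign.append(m[0]<m[-1])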
fsign
/opt/conda/envs/python2new/lib/python2.7/site-packages/ipykernel_launcher.py:12: RuntimeWarning: invalid value encountered in divide if sys.path[0] == '':
[True, False, False, True, False, False, False, False, False, True]
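The direction rule used above can be isolated in a few lines. This is a minimal sketch with a hypothetical helper name and made-up inputs: it compounds the per-period excess return of each group and reports `True` when the first group ends up behind the last one, which is the case where the later cells flip the sort order via `asc=sign`.

```python
import numpy as np

def factor_direction(group_excess):
    """Return (flip, total): flip is True when group 0 lags the last group."""
    h = np.asarray(group_excess, dtype=float)
    total = np.nanprod(h + 1.0, axis=0) - 1.0  # compounded return per group
    return bool(total[0] < total[-1]), total

# Two periods, three groups; group 0 wins in the first example.
flip_a, total_a = factor_direction([[0.01, 0.0, -0.01], [0.02, 0.0, -0.02]])
# Same numbers with the group order reversed; group 0 now loses.
flip_b, total_b = factor_direction([[-0.01, 0.0, 0.01], [-0.02, 0.0, 0.02]])
```

Because the sign is estimated on the full sample, it uses information from the whole backtest period; setting the direction by hand from economic meaning, as the post suggests, avoids that.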
ford = pct_change(factors[0].values,-1,extre=True) # forward returns, winsorized
for f,sign in zip(factors[1:],fsign):
    # Bin according to the factor's economic direction (here taken from fsign)
    facs,cuts = standard(f,factors[0],asc=sign,q=5) # standardize and bin the factor
    mask = np.isfinite(ford) & np.isfinite(facs) # flags entries where both return and factor are valid
    h = bincount(cuts,1,ford)[:,0:-1]/bincount(cuts,1,mask)[:,0:-1] # per-group return sum / per-group count
    plt.figure(figsize=(12,2))
    _ = plt.plot(np.nancumprod(h + 1, 0) - 1) # layered factor statistics (cumulative group returns)
/opt/conda/envs/python2new/lib/python2.7/site-packages/ipykernel_launcher.py:6: RuntimeWarning: invalid value encountered in divide
ford = pct_change(factors[0].values,-1,extre=True) # forward returns, winsorized
deme = demeaned(ford,1) # demeaned returns
mask = np.isfinite(ford) # valid market returns
cutr = standard(ford,q=5)[-1] # bin the forward returns themselves
fdsc = []; ficr = []
for f,sign in zip(factors[1:],fsign):
    # Bin according to the factor's economic direction (here taken from fsign)
    facs,cuts = standard(f,factors[0],asc=sign,q=5) # standardize and bin the factor
    mask = np.isfinite(ford) & np.isfinite(facs) # flags entries where both return and factor are valid
    h = bincount(cuts,1,ford)[:,0:-1]/bincount(cuts,1,mask)[:,0:-1] # per-group return sum / per-group count
    # Spread between group 1 and the last group, 24-period rolling mean
    # (pd.rolling_* is deprecated; newer pandas uses DataFrame.rolling(...).mean())
    disw = pd.rolling_mean(pd.DataFrame(h[:,0]-h[:,-1]),window=24,min_periods=1)
    # IC and IR
    ic = pearsonr(facs,ford,1,rank=True,pct=1) # cross-sectional rank correlation of factor and forward return
    ic_mean = pd.rolling_mean(pd.DataFrame(ic),window=24,min_periods = 1)
    ic_std = pd.rolling_std(pd.DataFrame(ic),window=24,min_periods = 1)
    icir = ic_mean/ic_std
    # Collect the contribution and IR of each factor
    fdsc.append(disw)
    ficr.append(icir)
# Rank the factors (asc sets the ranking direction: asc=1 means a larger value is better, otherwise use 0)
fdsc = argsort(np.c_[tuple(fdsc)],rank=1,asc=1,pct=1,fillv=0,dtype=np.int8)[-1] # contribution rank of all factors
ficr = argsort(np.c_[tuple(ficr)],rank=1,asc=1,pct=1,fillv=0,dtype=np.int8)[-1] # IR rank of all factors
# Simple weighted combination
fank = (fdsc*0.5)+(ficr*0.5)
print(fank.shape)
/opt/conda/envs/python2new/lib/python2.7/site-packages/ipykernel_launcher.py:12: RuntimeWarning: invalid value encountered in divide if sys.path[0] == '': /opt/conda/envs/python2new/lib/python2.7/site-packages/ipykernel_launcher.py:15: FutureWarning: pd.rolling_mean is deprecated for DataFrame and will be removed in a future version, replace with DataFrame.rolling(min_periods=1,window=24,center=False).mean() from ipykernel import kernelapp as app /opt/conda/envs/python2new/lib/python2.7/site-packages/ipykernel_launcher.py:20: RuntimeWarning: invalid value encountered in divide /opt/conda/envs/python2new/lib/python2.7/site-packages/ipykernel_launcher.py:19: FutureWarning: pd.rolling_mean is deprecated for DataFrame and will be removed in a future version, replace with DataFrame.rolling(min_periods=1,window=24,center=False).mean() /opt/conda/envs/python2new/lib/python2.7/site-packages/ipykernel_launcher.py:20: FutureWarning: pd.rolling_std is deprecated for DataFrame and will be removed in a future version, replace with DataFrame.rolling(min_periods=1,window=24,center=False).std()
(487, 10)
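The warnings above point to the newer `DataFrame.rolling` API. As a rough cross-check of the IC and IR step, the sketch below computes a per-period rank IC and its rolling IR with pandas only. It is an assumption-laden simplification, not the post's `pearsonr`: the function names are hypothetical, ties get average ranks, and invalid entries are dropped pairwise instead of being filled.

```python
import numpy as np
import pandas as pd

def rank_ic(factor, fwd_ret):
    """Per-period Spearman correlation between factor and forward return."""
    f = pd.DataFrame(np.asarray(factor, dtype=float))
    r = pd.DataFrame(np.asarray(fwd_ret, dtype=float))
    valid = f.notna() & r.notna()          # keep only pairs valid in both tables
    f, r = f.where(valid), r.where(valid)
    return f.rank(axis=1).corrwith(r.rank(axis=1), axis=1)

def rolling_icir(ic, window=24, min_periods=1):
    """Rolling mean of IC divided by rolling standard deviation of IC."""
    roll = ic.rolling(window=window, min_periods=min_periods)
    return roll.mean() / roll.std()

# Made-up data: 3 periods x 4 stocks, one missing factor value.
factor = [[1, 2, 3, 4], [1, 2, 3, 4], [4, np.nan, 2, 1]]
fwd_ret = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1], [0.4, 0.9, 0.2, 0.1]]
ic = rank_ic(factor, fwd_ret)
icir = rolling_icir(ic)
```

With `min_periods=1` the first IR value is NaN, because a standard deviation needs at least two observations; the same applies to the `pd.rolling_std` call in the post.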
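close = factors[0].values
farg = argsort(fank,axis=1)[:,-5:] # sort the factor score table and keep the 5 highest-scoring factors per period
scre = np.zeros(shape=close.shape) # empty score table
# ================== score each stock with the selected factors ====================
for fx in np.unique(farg): # loop over every factor index that is used
    rows = (farg==fx).any(1) # rows (periods) in which this factor was selected
    if rows.any():
        factor = factors[fx+1]
        cutr = standard(factor[rows],close[rows],q=5,asc=fsign[fx])[-1] # factor group table
        cutr = cutr * fank[:,fx][rows][:,None]
        # accumulate into the composite score factor
        scre[rows] += cutr
# ==================================================================
# The composite factor cannot be used for backtesting as is (the result built from
# the factor two days ago and yesterday's return is off by 2 periods, so the
# composite factor must be delayed by 2 periods to avoid look-ahead bias)
scre = shift(scre,2,fillv=0) # delay the factor score by 2 periods
# ======================== show the composite factor ========================
# w = 1 is the holding period; it mimics actual trading to test how factor effectiveness decays.
deme = demeaned(ford,1) # demeaned returns
cutr = standard(scre,ford,q=5,w=1)[-1] # group table of the composite factor
mask = np.isfinite(ford) & np.isfinite(scre)
# ...................
# group returns of the factor
rets = bincount(cutr,1,ford)[:,0:-1]/bincount(cutr,1,mask)[:,0:-1]
plt.figure(figsize=(12,3))
_ = plt.plot(np.nancumprod(rets+1, 0)-1)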
/opt/conda/envs/python2new/lib/python2.7/site-packages/ipykernel_launcher.py:27: RuntimeWarning: invalid value encountered in divide
# excess returns of the factor groups
rets = bincount(cutr,1,deme)[:,0:-1]/bincount(cutr,1,mask)[:,0:-1]
plt.figure(figsize=(12,3))
_ = plt.plot(np.nancumprod(rets+1, 0)-1)
/opt/conda/envs/python2new/lib/python2.7/site-packages/ipykernel_launcher.py:2: RuntimeWarning: invalid value encountered in divide
# 24-period rolling sum of the factor group excess returns
rets = pd.rolling_sum(pd.DataFrame(rets),window=24,min_periods = 1)
plt.figure(figsize=(12,3))
_ = plt.plot(rets)
/opt/conda/envs/python2new/lib/python2.7/site-packages/ipykernel_launcher.py:2: FutureWarning: pd.rolling_sum is deprecated for DataFrame and will be removed in a future version, replace with DataFrame.rolling(min_periods=1,window=24,center=False).sum()
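The 2-period delay applied to the composite score is what keeps future information out of the test. A minimal sketch of that lag, assuming time runs along axis 0 (the name `lag_rows` is hypothetical and this is a simplified stand-in for the post's `shift`):

```python
import numpy as np

def lag_rows(scores, periods, fill=0.0):
    """Delay a (periods x stocks) table by `periods` rows, padding with `fill`."""
    if periods < 0:
        raise ValueError("a negative lag would pull future rows backwards")
    scores = np.asarray(scores, dtype=float)
    out = np.full_like(scores, fill)
    if periods == 0:
        out[:] = scores
    elif periods < scores.shape[0]:
        out[periods:] = scores[:-periods]  # row t now holds the score from t - periods
    return out

scores = np.arange(10, dtype=float).reshape(5, 2)
lagged = lag_rows(scores, 2)
```

After the lag, the score used in period t only depends on data available two periods earlier, so it can be paired with the forward return of period t without overlap.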
from numpy.core.umath_tests import inner1d
def pearsonr(A,B,axis=1,rank=False,pct=False):
    ''' Pearson correlation along an axis; rank=True gives a rank correlation '''
    A, B = np.asarray(A), np.asarray(B); ndis = np.arange(A.ndim)
    axis = 0 if axis is None else (ndis[axis] if axis<0 else axis)
    if rank:
        A = argsort(A,axis,rank=rank,pct=pct,dtype=float)[1]
        B = argsort(B,axis,rank=rank,pct=pct,dtype=float)[1]
        if pct:
            A /= np.expand_dims(A.max(axis),axis)
            B /= np.expand_dims(B.max(axis),axis)
    Af = A-np.expand_dims(A.mean(axis),axis)
    Bf = B-np.expand_dims(B.mean(axis),axis)
    if axis == ndis[-1]: # roughly 2x faster along the last axis
        mult = inner1d(Af,Bf)
        diff = np.sqrt(inner1d(Af,Af)*inner1d(Bf,Bf))
    else:
        mult = (Af*Bf).sum(axis)
        diff = np.sqrt((Af**2).sum(axis)*(Bf**2).sum(axis))
    return mult/diff
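`inner1d` comes from a private NumPy module and may not be importable in newer NumPy releases. A possible replacement for the last-axis case is `np.einsum`; the sketch below is an assumption-based alternative (hypothetical name `rowwise_pearson`, 2-D inputs without NaN), not the code the post ran.

```python
import numpy as np

def rowwise_pearson(a, b):
    """Pearson correlation of matching rows of two 2-D arrays (no NaN handling)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    af = a - a.mean(axis=1, keepdims=True)
    bf = b - b.mean(axis=1, keepdims=True)
    num = np.einsum("ij,ij->i", af, bf)          # row-wise inner product
    den = np.sqrt(np.einsum("ij,ij->i", af, af) * np.einsum("ij,ij->i", bf, bf))
    return num / den

a = np.array([[1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 5.0]])
b = np.array([[2.0, 4.0, 6.0, 8.0], [5.0, 1.0, 4.0, 2.0]])
corr = rowwise_pearson(a, b)
```

Each row can be checked against `np.corrcoef(a[i], b[i])[0, 1]`, which computes the same quantity one pair at a time.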
def bincount(self,axis=None,weights=None,minlength=None):
    ''' np.bincount extended to 2-D: counts (or weighted sums) per bin along an axis '''
    if axis is None: view = np.ravel(np.asarray(self))
    else: view = np.asarray(self)
    mask = np.isfinite(view)
    if not weights is None:
        mask &= np.isfinite(weights); weights = np.asarray(weights)[mask]
    if view.ndim==1: out = np.bincount(view[mask],weights=weights,minlength=minlength or 0)
    else:
        # shift the bin numbers of each row/column by its own offset so one flat bincount covers them all
        m = view.shape[1-axis]; n = (view[mask].max()+1)*np.arange(m)
        if minlength is None: minlength = m*(view[mask].max().astype(int)+1)
        if axis!=1: out = np.bincount((view+n)[mask].astype(int),weights=weights,minlength=minlength).reshape(m,-1).T
        else: out = np.bincount((view+n[:,None])[mask].astype(int),weights=weights,minlength=minlength).reshape(m,-1)
    return out
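The 2-D branch of `bincount` relies on an offset trick: adding a different multiple of the group count to each row makes the labels of different rows land in disjoint bins, so a single flat `np.bincount` call sums every row at once. A stripped-down sketch of just that idea, with a hypothetical name and toy data:

```python
import numpy as np

def rowwise_group_sum(labels, values, n_groups):
    """Sum `values` per (row, group) using one flat np.bincount call."""
    labels = np.asarray(labels)
    values = np.asarray(values, dtype=float)
    n_rows = labels.shape[0]
    # Row i uses bins [i * n_groups, (i + 1) * n_groups).
    offsets = np.arange(n_rows)[:, None] * n_groups
    flat = (labels.astype(np.int64) + offsets).ravel()
    ok = np.isfinite(values).ravel()             # skip NaN values
    out = np.bincount(flat[ok], weights=values.ravel()[ok],
                      minlength=n_rows * n_groups)
    return out.reshape(n_rows, n_groups)

labels = np.array([[0, 1, 1], [2, 0, 2]])
values = np.array([[1.0, 2.0, 3.0], [4.0, np.nan, 6.0]])
sums = rowwise_group_sum(labels, values, n_groups=3)
```

Dividing such sums by the matching per-group counts gives the group mean returns used throughout the post.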
def demeaned(self,axis=0,func=np.nanmean):
    ''' Remove the mean along the given axis '''
    view = np.asarray(self)
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore","Mean of empty slice")
        if axis!=0:
            return self-func(view,axis)[:,None]
        else: return self-func(view,axis)
def shift(self,num,fillv=np.nan,axis=0):
    ''' General shift function
    self : 1d/2d array
    axis : axis to shift along, 0 or 1
    num : int, number of rows or columns to shift by
    '''
    result = np.empty_like(self)
    axs = 1 if axis==0 else np.s_[0:]
    if num > 0:
        result[np.s_[:,:num][axs]] = fillv
        result[np.s_[:,num:][axs]] = self[np.s_[:,:-num][axs]]
    elif num < 0:
        result[np.s_[:,num:][axs]] = fillv
        result[np.s_[:,:num][axs]] = self[np.s_[:,-num:][axs]]
    else: result[:] = self
    return result
def pct_change(x,N=1,axis=0,extre=False):
    ''' Percentage change over N periods; N<0 gives forward returns (N must be non-zero) '''
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore","invalid value encountered in (less|multiply|divide|greater)")
        warnings.filterwarnings('ignore', r'All-NaN (slice|axis) encountered')
        if N>0:
            rets = x/shift(np.asarray(x),N,axis=axis)-1
        elif N<0:
            rets = shift(np.asarray(x),N,axis=axis)/x-1
        if extre: # winsorize
            rets = winsorize(rets,qrange=[0.05,0.93],inclusive=True,axis=axis)
    return rets
def fcut(self,q,axis=-1,asc=False,dtype=np.int16):
    ''' Fast quantile binning function
    ------------------
    self : 1d/2d array
    axis : 0 or 1
    q : number of bins
    asc : reverses the bin order (a signal table can be passed to adjust the factor direction per period)
    ------------------
    out: differs slightly from the output of pd.qcut; NaN entries get bin number q
    '''
    if asc:
        view = np.asarray(self)
    else: view = 0-np.asarray(self)
    mask = np.isfinite(view)
    cout = view.shape[axis]; step = (1.0/q)*cout # average width of one quantile bin
    inds = argsort(view,axis=axis,rank=True)[1] # ordinal ranks
    caxs = np.expand_dims(mask.sum(axis),axis)-1.0 # largest valid rank along the axis
    inds = inds/caxs*(view.shape[axis]-1) # rescale to remove the effect of invalid values
    icut = (inds/step).astype(dtype)
    icut[~mask]=q
    return icut
def argsort(self,axis=-1,asc=True,rank=False,pct=False,kind=None,fillv=None,dtype=np.int16):
    """ np.argsort with extensions; works on multi-dimensional arrays.
    Parameters
    ----------
    self : array data
    axis : axis to sort along
    asc : ascending or descending order
    rank : return ordinal ranks; 'mergesort' is the default sort kind in this mode
    pct : percentile rank, or (pct>1) scaled integer rank, in which case pct is the number of levels
    Returns:
    ----------
    idx : the sort indices, same as np.argsort, when no ranking is requested
    res : (idx, ranks) when ranking is requested
    Note: where invalid values exist, their indices or ranks are undefined and must be handled separately """
    if asc: view = np.asarray(self)
    else: view = -np.asarray(self)
    if rank|(pct!=0):
        rank = True; mask = np.isfinite(view)
        if kind is None: kind = 'mergesort'
        res = np.empty(view.shape, dtype=dtype)
        I = list(np.ogrid[tuple(map(slice,view.shape))])
        idx = view.argsort(axis=axis,kind=kind)
        rng,I[axis]=I[axis],idx; res[tuple(I)]=rng # scatter positions back to get ranks
        if pct:
            emk = np.expand_dims(mask.sum(axis),axis)-1.0
            emk[emk==0] = .00000001; ran = res / emk
            if pct>1: ran = np.rint(ran*(pct-1)).astype(dtype)
        else: ran = res
        if fillv is not None: ran[~mask] = fillv
        else: ran[~mask] = pct
        return idx.astype(dtype),ran
    if kind is None: kind = 'quicksort'
    return view.argsort(axis=axis,kind=kind).astype(dtype)
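The ranking branch of `argsort` boils down to scattering the positions 0..n-1 back through the sort order. A minimal 2-D sketch of that step, assuming ranking along axis 1 and a hypothetical function name:

```python
import numpy as np

def rowwise_rank(x):
    """Ordinal rank (0 = smallest) of every element within its row."""
    x = np.asarray(x)
    order = np.argsort(x, axis=1, kind="mergesort")  # stable sort, ties keep input order
    ranks = np.empty_like(order)
    rows = np.arange(x.shape[0])[:, None]
    # order[i, k] is the column holding the k-th smallest value of row i,
    # so writing k at that column yields the rank table.
    ranks[rows, order] = np.arange(x.shape[1])[None, :]
    return ranks

x = np.array([[30.0, 10.0, 20.0], [1.0, 3.0, 2.0]])
ranks = rowwise_rank(x)
```

`np.argsort` places NaN last, so invalid entries receive the highest ranks; that is why the post's version overwrites them afterwards using the validity mask.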
def standard(factor,other=None,q=5,w=1,asc=False,ac=True):
    """ Factor standardization pipeline (output has the same shape as the input)
    Parameters
    ----------
    factor : array, factor data
    other : array/list/tuple, other data involved, used to collect invalid-value flags
    q : number of quantile bins
    w : period interval; the grouping is held fixed within each interval to approximate real trading
    asc : sort direction passed on to fcut
    ac : whether to winsorize and standardize the factor before binning
    Returns:
    ----------
    output: processed factor, group table
    """
    mask = np.isfinite(factor)
    if not other is None:
        if isinstance(other,(list,tuple)):
            for o in other: # collect invalid-value flags from every extra input
                mask &= np.isfinite(o)
        else: mask &= np.isfinite(other)
    dayx = np.arange(factor.shape[0]); rows = mask.any(1)
    if w > 1:# =================== multi-period holding, approximates real trading ===========================
        equl,reps = np.unique(dayx[rows]//w,True,return_counts=True)[1:]
        temp = factor[equl]; invd = mask[equl] # split by holding interval
    else: temp = factor[rows]; invd = mask[rows]
    # =================== flag invalid elements so they take no part in standardizing or binning ===================
    try: temp[~invd] = np.nan
    except:temp = temp.astype(float); temp[~invd] = np.nan
    # ============================== standardize and digitize the factor ==============================
    if ac:
        temp = winsorize(temp, qrange=[0.05,0.93], inclusive=True,axis=1) # winsorize the factor
        temp = standardlize(temp,axis=1) # standardize the factor
    cuts = fcut(temp,q,axis=1,asc=asc,dtype=np.uint8)
    icut = np.empty(factor.shape,dtype=np.int8) # empty output array
    ifac = np.empty(factor.shape,dtype=factor.dtype) # empty output array
    if w > 1:
        icut[rows] = np.asarray(cuts).repeat(reps,axis=0) # expand back to every period
        ifac[rows] = np.asarray(temp).repeat(reps,axis=0) # expand back to every period
    else: icut[rows] = cuts; ifac[rows] = temp
    icut[~mask] = q; ifac[~mask] = np.nan # out-of-range bin number for invalid entries
    return ifac,icut
def hist(self,bins=33,ranges=None,facecolor=None,rwidth=0.8,normed=True,alpha=0.95):
    ''' Histogram of one array or a sequence of arrays (newer Matplotlib uses density= instead of normed=) '''
    fig = plt.figure(figsize=(6, 3))
    order = (self,) if isinstance(self,np.ndarray) else self
    for i,f in enumerate(order):
        ht = np.asarray(f).ravel();mask = np.isfinite(ht)
        if ranges is None:
            ranges = tuple([np.min(ht[mask]),np.max(ht[mask])])
        if facecolor is None:
            plt.hist(ht[mask],bins=bins,range=ranges,histtype='bar',rwidth=rwidth,normed=normed,alpha=alpha)
        else:
            plt.hist(ht[mask],bins=bins,range=ranges,facecolor=facecolor,histtype='bar',rwidth=rwidth,normed=normed,alpha=alpha)
    plt.grid(True ,axis= 'both', which='major', linestyle = "-.", color = "black", linewidth = 0.5, alpha=.22)
    plt.show()