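Based on 《【研究】量化选股-因子检验和多因子模型的构建》 ("[Research] Quantitative Stock Selection: Factor Testing and Building a Multi-Factor Model"). This version adds several factors to the original source code and moves the sample period to a later window.
1. The sample period is 2011-2017; factors are screened and tested on it.
2. The benchmark is the SSE Composite Index (000001.XSHG).
Candidate factors are drawn from the following four categories:
Value factors: price-to-earnings ratio (PE), price-to-book ratio (PB), price-to-sales ratio (PS), basic earnings per share (EPS), book-to-market ratio (B/M)
Growth factors: return on equity (ROE), return on total assets (ROA), gross profit margin (gross_profit_margin), year-on-year net profit growth (inc_net_profit_year_on_year), quarter-on-quarter net profit growth (inc_net_profit_annual), year-on-year operating profit growth (inc_operation_profit_year_on_year), quarter-on-quarter operating profit growth (inc_operation_profit_annual), main-business gross margin (GP/R), net profit margin (P/R)
Size factors: net profit (net_profit), operating revenue (operating_revenue), total share capital (capitalization), circulating share capital (circulating_cap), total market cap (market_cap), circulating market cap (circulating_market_cap), debt-to-asset ratio (L/A), fixed-asset ratio (FAP)
Trading factors: turnover ratio (turnover_ratio)
Factor effectiveness is verified with the sorting method: stocks are ranked by a factor, split into portfolios, and the portfolio returns are compared.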
import pandas as pd
from pandas import Series, DataFrame
import numpy as np
import statsmodels.api as sm
import scipy.stats as scs
import matplotlib.pyplot as plt
Fetch all factor values at the start of a month, for example 2018-01-01.
factors = ['PE', 'PB', 'PS', 'EPS', 'B/M', 'ROE', 'ROA', 'gross_profit_margin',
           'inc_net_profit_year_on_year', 'inc_net_profit_annual',
           'inc_operation_profit_year_on_year', 'inc_operation_profit_annual',
           'GP/R', 'P/R', 'net_profit', 'operating_revenue', 'capitalization',
           'circulating_cap', 'market_cap', 'circulating_market_cap',
           'L/A', 'FAP', 'turnover_ratio']

# Fetch factor values at the start of the month
def get_factors(fdate, factors):
    stock_set = get_index_stocks('000001.XSHG', fdate)
    # NOTE: column labels are assigned by position further down. The query lists
    # the B/M expression first, while `factors` starts with PE, PB, PS, EPS, B/M,
    # so the first five labels are shifted relative to the queried fields
    # (the column labeled 'PE' holds the B/M expression, 'PB' holds pe_ratio, ...).
    # The outputs shown in this post were produced with this ordering.
    q = query(
        valuation.code,
        balance.total_owner_equities/valuation.market_cap/100000000,
        valuation.pe_ratio,
        valuation.pb_ratio,
        valuation.ps_ratio,
        income.basic_eps,
        indicator.roe,
        indicator.roa,
        indicator.gross_profit_margin,
        indicator.inc_net_profit_year_on_year,
        indicator.inc_net_profit_annual,
        indicator.inc_operation_profit_year_on_year,
        indicator.inc_operation_profit_annual,
        income.total_profit/income.operating_revenue,
        income.net_profit/income.operating_revenue,
        income.net_profit,
        income.operating_revenue,
        valuation.capitalization,
        valuation.circulating_cap,
        valuation.market_cap,
        valuation.circulating_market_cap,
        balance.total_liability/balance.total_assets,
        balance.fixed_assets/balance.total_assets,
        valuation.turnover_ratio
    ).filter(
        valuation.code.in_(stock_set),
        valuation.circulating_market_cap
    )
    fdf = get_fundamentals(q, date=fdate)
    fdf.index = fdf['code']
    fdf.columns = ['code'] + factors
    # Drop the 'code' column and keep the 23 factor columns
    return fdf.iloc[:,-23:]

fdf = get_factors('2018-01-01', factors)
fdf.head().T
code | 600000.XSHG | 600004.XSHG | 600006.XSHG | 600007.XSHG | 600008.XSHG |
---|---|---|---|---|---|
PE | 1.143846e+00 | 4.827871e-01 | 6.119287e-01 | 3.655787e-01 | 6.300832e-01 |
PB | 6.804400e+00 | 2.009680e+01 | 1.093361e+02 | 2.808410e+01 | 3.863470e+01 |
PS | 9.538000e-01 | 2.144100e+00 | 1.787800e+00 | 2.736200e+00 | 3.135900e+00 |
EPS | 2.244700e+00 | 4.643700e+00 | 6.311000e-01 | 6.594300e+00 | 2.607200e+00 |
B/M | 4.800000e-01 | 1.900000e-01 | -9.700000e-03 | 1.800000e-01 | 3.390000e-02 |
ROE | 3.413200e+00 | 2.754000e+00 | -2.960000e-01 | 2.928800e+00 | 1.513400e+00 |
ROA | 2.316000e-01 | 1.990600e+00 | -7.116000e-01 | 1.580100e+00 | 4.027000e-01 |
gross_profit_margin | NaN | 3.798770e+01 | 1.084940e+01 | 5.062150e+01 | 3.024400e+01 |
inc_net_profit_year_on_year | -1.588900e+00 | 8.844300e+00 | -6.544932e+02 | 6.390600e+00 | 2.954560e+01 |
inc_net_profit_annual | -7.200000e-03 | 4.770500e+00 | -2.012624e+03 | 3.357990e+01 | -3.995000e-01 |
inc_operation_profit_year_on_year | -1.833300e+00 | 1.868020e+01 | -6.605708e+02 | -4.675000e-01 | 1.377109e+02 |
inc_operation_profit_annual | 2.019000e+00 | 2.919000e+00 | -1.447837e+03 | 2.506970e+01 | -2.233990e+01 |
GP/R | 4.344424e-01 | 3.085870e-01 | -4.038174e-02 | 3.296453e-01 | 1.174919e-01 |
P/R | 3.350075e-01 | 2.308729e-01 | -3.273438e-02 | 2.473615e-01 | 8.858635e-02 |
net_profit | 1.387400e+10 | 3.935985e+08 | -1.587791e+08 | 1.823589e+08 | 1.900669e+08 |
operating_revenue | 4.141400e+10 | 1.704828e+09 | 4.850528e+09 | 7.372163e+08 | 2.145555e+09 |
capitalization | 2.935208e+06 | 2.069320e+05 | 2.000000e+05 | 1.007282e+05 | 4.820614e+05 |
circulating_cap | 2.810376e+06 | 2.069320e+05 | 2.000000e+05 | 1.007282e+05 | 4.820614e+05 |
market_cap | 3.695427e+03 | 3.041901e+02 | 1.170000e+02 | 1.726482e+02 | 2.477796e+02 |
circulating_market_cap | 3.538264e+03 | 3.041901e+02 | 1.170000e+02 | 1.726482e+02 | 2.477796e+02 |
L/A | 9.302917e-01 | 2.719476e-01 | 6.862103e-01 | 4.571281e-01 | 6.699646e-01 |
FAP | 4.168150e-03 | 3.381332e-01 | 1.754063e-01 | 1.815366e-01 | 1.011792e-01 |
turnover_ratio | 5.820000e-02 | 4.095000e-01 | 5.574000e-01 | 7.120000e-02 | 3.734000e-01 |
score = fdf['circulating_market_cap'].order()
score.head()

code
603580.XSHG    5.0777
603991.XSHG    5.2659
603330.XSHG    5.3535
603041.XSHG    5.6300
603269.XSHG    5.7038
Name: circulating_market_cap, dtype: float64
len(score)
1352
startdate = '2018-01-01'
enddate = '2018-02-01'
nextdate = '2018-03-01'
df = {}
circulating_market_cap = fdf['circulating_market_cap']
# Split the ascending-sorted stocks into five portfolios.
# `//` is integer division and gives the same result as `/` on ints under Python 2.
port1 = list(score.index)[: len(score)//5]
port2 = list(score.index)[ len(score)//5: 2*len(score)//5]
port3 = list(score.index)[ 2*len(score)//5: -2*len(score)//5]
port4 = list(score.index)[ -2*len(score)//5: -len(score)//5]
port5 = list(score.index)[ -len(score)//5: ]
def calculate_port_monthly_return(port, startdate, enddate, nextdate, circulating_market_cap):
    # Daily closes for the current month and for the following month
    close1 = get_price(port, startdate, enddate, 'daily', ['close'])
    close2 = get_price(port, enddate, nextdate, 'daily', ['close'])
    # Return from the first close of this month to the first close of the next month,
    # weighted by circulating market cap
    weighted_m_return = ((close2['close'].ix[0,:]/close1['close'].ix[0,:]-1) * circulating_market_cap).sum()/(circulating_market_cap.ix[port].sum())
    return weighted_m_return

calculate_port_monthly_return(port1, '2018-01-01', '2018-02-01', '2018-03-01', fdf['circulating_market_cap'])
-0.09004705495088357
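The two steps above (cutting the sorted stocks into five groups, then taking a cap-weighted return for one group) can be sketched without the platform API. This is only an illustration on made-up prices: `split_into_quintiles` and `cap_weighted_return` are hypothetical helper names, the boundaries use `i * n // 5` rather than the slice arithmetic in the post, and stocks with a missing return or weight are dropped from both numerator and denominator, which is an assumption that differs slightly from the original function.

```python
import pandas as pd

def split_into_quintiles(score):
    """Sort ascending and cut the index into five consecutive groups."""
    ordered = score.sort_values().index.tolist()
    n = len(ordered)
    edges = [i * n // 5 for i in range(6)]  # integer boundaries, no gaps or overlaps
    return [ordered[edges[k]:edges[k + 1]] for k in range(5)]

def cap_weighted_return(start_close, end_close, weights, port):
    """Weighted average of simple returns over the stocks in `port`."""
    ret = end_close[port] / start_close[port] - 1
    w = weights[port]
    valid = ret.notna() & w.notna()  # assumption: ignore stocks with missing data
    return (ret[valid] * w[valid]).sum() / w[valid].sum()

# Toy data, made up for illustration only
codes = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
cap = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], index=codes, dtype=float)
p0 = pd.Series(10.0, index=codes)
p1 = pd.Series([11, 11, 10, 10, 10, 10, 10, 10, 9, 9], index=codes, dtype=float)

ports = split_into_quintiles(cap)
small_cap_return = cap_weighted_return(p0, p1, cap, ports[0])
large_cap_return = cap_weighted_return(p0, p1, cap, ports[4])
```

With real data, `start_close` and `end_close` would be the first closing prices of two consecutive months and `weights` the circulating market cap taken at the start of the first month.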
def calculate_benchmark_monthly_return(startdate, enddate, nextdate):
    # Same first-close to first-close convention as the portfolio return
    close1 = get_price(['000001.XSHG'],startdate,enddate,'daily',['close'])['close']
    close2 = get_price(['000001.XSHG'],enddate, nextdate, 'daily',['close'])['close']
    benchmark_return = (close2.ix[0,:]/close1.ix[0,:]-1).sum()
    return benchmark_return

calculate_benchmark_monthly_return('2018-01-01','2018-02-01','2018-03-01')
0.029462448444448563
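The results show that, before any factor combination is built, the first four groups underperform the market: sorted by circulating market cap, port1 to port4 return between -9.0% and -6.1% for the month, against a benchmark return of 2.9%.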
benchmark_return = calculate_benchmark_monthly_return('2018-01-01', '2018-02-01', '2018-03-01')
# Monthly return of each of the five portfolios
for name, port in [('port1', port1), ('port2', port2), ('port3', port3), ('port4', port4), ('port5', port5)]:
    df[name] = calculate_port_monthly_return(port, '2018-01-01', '2018-02-01', '2018-03-01', fdf['circulating_market_cap'])
print(Series(df))
print('benchmark_return %s' % benchmark_return)

port1   -0.090047
port2   -0.088405
port3   -0.075064
port4   -0.060624
port5    0.068629
dtype: float64
benchmark_return 0.0294624484444
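Period: 2011-2017. Compute the monthly returns of groups 1 to 5 and of the benchmark for every factor, giving an 84×6 panel (84 months, five portfolios plus the benchmark) per factor.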
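Each month needs three dates: the start of the month, the start of the next month, and the start of the month after that. As a hedged alternative to indexing into hand-written year and month lists, the same triples can be generated with `pandas.date_range`. The helper name `month_triples` is hypothetical and not part of the original notebook.

```python
import pandas as pd

def month_triples(first_month, n_months):
    """Return (startdate, enddate, nextdate) strings for n_months consecutive months."""
    # 'MS' is the month-start frequency; two extra months cover enddate and nextdate
    starts = pd.date_range(first_month, periods=n_months + 2, freq='MS')
    fmt = [d.strftime('%Y-%m-%d') for d in starts]
    return [(fmt[i], fmt[i + 1], fmt[i + 2]) for i in range(n_months)]

triples = month_triples('2011-01-01', 84)
first_triple = triples[0]
last_triple = triples[-1]
```

This avoids the `IndexError` handling at the end of the sample, because the dates that spill into 2018 are produced by the same range.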
factors = ['PE', 'PB', 'PS', 'EPS', 'B/M', 'ROE', 'ROA', 'gross_profit_margin',
           'inc_net_profit_year_on_year', 'inc_net_profit_annual',
           'inc_operation_profit_year_on_year', 'inc_operation_profit_annual',
           'GP/R', 'P/R', 'net_profit', 'operating_revenue', 'capitalization',
           'circulating_cap', 'market_cap', 'circulating_market_cap',
           'L/A', 'FAP', 'turnover_ratio']

# In the research module, get_fundamentals defaults `date` to the day before the
# research date, so the monthly date sequence is built by hand.
year = ['2011','2012','2013','2014','2015','2016','2017']
month = ['01','02','03','04','05','06','07','08','09','10','11','12']
result = {}
for i in range(7*12):
    startdate = year[i//12] + '-' + month[i%12] + '-01'
    try:
        enddate = year[(i+1)//12] + '-' + month[(i+1)%12] + '-01'
    except IndexError:
        enddate = '2018-01-01'
    try:
        nextdate = year[(i+2)//12] + '-' + month[(i+2)%12] + '-01'
    except IndexError:
        if enddate == '2018-01-01':
            nextdate = '2018-02-01'
        else:
            nextdate = '2018-01-01'
    # print 'time %s'%startdate
    fdf = get_factors(startdate,factors)
    CMV = fdf['circulating_market_cap']
    # 5 portfolios plus the benchmark, 23 factors
    df = DataFrame(np.zeros(6*23).reshape(6,23),index = ['port1','port2','port3','port4','port5','benchmark'],columns = factors)
    for fac in factors:
        score = fdf[fac].order()
        port1 = list(score.index)[: len(score)//5]
        port2 = list(score.index)[ len(score)//5+1: 2*len(score)//5]
        port3 = list(score.index)[ 2*len(score)//5+1: -2*len(score)//5]
        port4 = list(score.index)[ -2*len(score)//5+1: -len(score)//5]
        port5 = list(score.index)[ -len(score)//5+1: ]
        # NOTE: the weights passed here are the global `circulating_market_cap`
        # (the 2018-01-01 snapshot defined earlier), not `CMV` computed for this
        # month. Kept as in the original run that produced the tables below.
        for name, port in [('port1', port1), ('port2', port2), ('port3', port3), ('port4', port4), ('port5', port5)]:
            df.ix[name,fac] = calculate_port_monthly_return(port,startdate,enddate,nextdate,circulating_market_cap)
        df.ix['benchmark',fac] = calculate_benchmark_monthly_return(startdate,enddate,nextdate)
        # print 'factor %s'%fac
    result[i+1]=df
# pd.Panel exists only in legacy pandas releases
monthly_return = pd.Panel(result)
monthly_return[:,:,'PE']
portfolio | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
port1 | -0.063961 | 0.057468 | -0.003538 | 0.011939 | -0.000767 | 0.028005 | 0.048595 | -0.003958 | -0.109566 | 0.062509 | ... | 0.021345 | -0.006460 | -0.001198 | 0.049791 | 0.009338 | 0.049369 | 0.058072 | 0.069637 | -0.033160 | 0.056772 |
port2 | -0.065009 | 0.076102 | -0.027128 | -0.018031 | -0.066994 | 0.031146 | 0.028017 | -0.046184 | -0.120076 | 0.034576 | ... | 0.037914 | -0.048666 | -0.053362 | 0.072887 | 0.040365 | 0.036833 | 0.069451 | 0.003253 | -0.022327 | 0.018349 |
port3 | -0.056932 | 0.079801 | -0.017569 | -0.027592 | -0.073196 | 0.034040 | -0.025730 | -0.054367 | -0.129013 | 0.045660 | ... | 0.017931 | -0.045419 | -0.053020 | 0.054920 | 0.028066 | 0.018909 | 0.027487 | 0.002008 | -0.047184 | 0.006735 |
port4 | -0.021293 | 0.046165 | -0.005278 | -0.011301 | -0.069544 | 0.019637 | -0.019397 | -0.080514 | -0.107611 | 0.081045 | ... | 0.004030 | -0.021088 | -0.005480 | 0.057372 | 0.065631 | 0.025304 | -0.005043 | 0.059699 | 0.016978 | 0.023916 |
port5 | 0.013760 | 0.024953 | 0.050458 | 0.006419 | -0.054836 | 0.007156 | -0.035373 | -0.041296 | -0.052494 | 0.068615 | ... | 0.011837 | -0.021979 | 0.048663 | 0.021466 | 0.074965 | 0.014275 | -0.003084 | -0.003351 | 0.003836 | 0.009916 |
benchmark | -0.018820 | 0.042859 | 0.016612 | -0.011870 | -0.064326 | 0.005755 | -0.020142 | -0.054642 | -0.082649 | 0.053409 | ... | 0.007198 | -0.038710 | -0.013070 | 0.030068 | 0.030266 | 0.022620 | 0.002156 | 0.006382 | -0.023056 | 0.009257 |
6 rows × 84 columns
(monthly_return[:,:,'PE'].T+1).cumprod().tail()
month | port1 | port2 | port3 | port4 | port5 | benchmark |
---|---|---|---|---|---|---|
80 | 2.173926 | 1.652334 | 1.708928 | 1.980452 | 2.433185 | 1.180349 |
81 | 2.300171 | 1.767090 | 1.755901 | 1.970465 | 2.425681 | 1.182893 |
82 | 2.460347 | 1.772839 | 1.759427 | 2.088099 | 2.417553 | 1.190442 |
83 | 2.378763 | 1.733257 | 1.676409 | 2.123552 | 2.426825 | 1.162996 |
84 | 2.513809 | 1.765060 | 1.687700 | 2.174338 | 2.450891 | 1.173762 |
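The expression `(monthly_return[:,:,'PE'].T+1).cumprod()` compounds monthly returns into a cumulative net value: each entry is the product of (1 + monthly return) up to that month. A small sketch of the same calculation, using the first three months of port1 and the benchmark from the PE table above as input (here months are already in rows, so no transpose is needed):

```python
import pandas as pd

# First three monthly returns for port1 and the benchmark, copied from the PE table
monthly = pd.DataFrame(
    {'port1': [-0.063961, 0.057468, -0.003538],
     'benchmark': [-0.018820, 0.042859, 0.016612]},
    index=[1, 2, 3])

# Cumulative net value: running product of (1 + r_t)
wealth = (1 + monthly).cumprod()

# Total return over the window is the last net value minus one
total_return = wealth.iloc[-1] - 1
```

Read this way, the last row of the table above says that one unit invested in port1 at the start of 2011 grows to about 2.51 units after 84 months under the PE sort, against about 1.17 for the benchmark.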
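Once the model is built, compute for the n portfolios the annualized compound return, the excess return, and, under different market conditions, the probability that the high-return portfolio beats the benchmark and the low-return portfolio trails it.
Quantitative criteria for testing effectiveness:
(1) Across portfolios 1 to n, the annualized compound returns should follow a clear ordering, that is, factor size and return should be strongly related. Let Xi be the annualized return of portfolio i; then the absolute correlation between Xi and i should satisfy Abs(Corr(Xi,i)) > MinCorr, where MinCorr is a given minimum correlation threshold.
(2) Let AR1 and ARn be the excess returns of the two extreme portfolios 1 and n, and let MinARtop and MinARbottom be the minimum excess-return thresholds.
if AR1 > ARn  # the smaller the factor, the larger the return
    require AR1 > MinARtop > 0 and ARn < MinARbottom < 0
if AR1 < ARn  # the smaller the factor, the smaller the return
    require ARn > MinARtop > 0 and AR1 < MinARbottom < 0
These conditions ensure that, of the two portfolios with the largest and smallest factor values, one clearly beats the market and the other clearly trails it.
(3) In any market regime, the two extreme portfolios 1 and n beat or trail the market with high probability.
Together, these three conditions pick out factors that showed good stock-selection ability over the past period.
Because many factors were selected to begin with, the three criteria are applied more strictly, as follows:
(1) Record the factor correlation; a value > 0.7 or < -0.7 passes.
(2) Record the excess returns of the winner portfolio and the loser portfolio.
(3) Record the winner portfolio's probability of beating the benchmark and the loser portfolio's probability of trailing it; > 0.6 and > 0.4 respectively pass.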
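The criteria above can be sketched on a plain DataFrame of monthly returns. This is a minimal illustration on made-up numbers, not the notebook's own code: `evaluate_factor` is a hypothetical helper, the input is assumed to have months in rows and the columns port1 to port5 plus benchmark, and the pass rule only checks the correlation and the winner's win probability (the loss probability is reported but not enforced, following the remark in the notebook's code comments).

```python
import numpy as np
import pandas as pd

def evaluate_factor(monthly, years, min_corr=0.7, min_win=0.6):
    """monthly: DataFrame, rows = months, columns = port1..port5 + 'benchmark'."""
    ports = ['port1', 'port2', 'port3', 'port4', 'port5']
    total = (1 + monthly).prod() - 1
    annual = (1 + total) ** (1.0 / years) - 1
    excess = annual[ports] - annual['benchmark']
    # (1) correlation between portfolio order 1..5 and annualized return
    corr = np.corrcoef(np.arange(1, 6), annual[ports].values)[0, 1]
    # The extreme portfolio with the higher total return is the winner
    if total['port1'] > total['port5']:
        winner, loser = 'port1', 'port5'
    else:
        winner, loser = 'port5', 'port1'
    # (3) share of months in which the winner beats / the loser trails the benchmark
    win_prob = (monthly[winner] > monthly['benchmark']).mean()
    loss_prob = (monthly[loser] < monthly['benchmark']).mean()
    return {
        'corr': corr,
        'excess_winner': excess[winner],   # (2) excess returns of the extremes
        'excess_loser': excess[loser],
        'win_prob': win_prob,
        'loss_prob': loss_prob,
        'passes': bool(abs(corr) > min_corr and win_prob > min_win),
    }

# Toy data: portfolio k earns k% every month, the benchmark earns 3%
toy = pd.DataFrame({'port%d' % k: [0.01 * k] * 12 for k in range(1, 6)})
toy['benchmark'] = 0.03
result = evaluate_factor(toy, years=1)
```

In the toy data the returns rise monotonically from port1 to port5, so the correlation is close to 1 and port5 is the winner in every month.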
total_return = {}
annual_return = {}
excess_return = {}
win_prob = {}
loss_prob = {}
effect_test = {}
MinCorr = 0.3
Minbottom = -0.05
Mintop = 0.05
for fac in factors:
    effect_test[fac] = {}
    monthly = monthly_return[:,:,fac]
    total_return[fac] = (monthly+1).T.cumprod().iloc[-1,:]-1
    # NOTE: the sample covers 84 months (7 years), so 1./7 would be the matching
    # exponent; the tables below were produced with 1./6.
    annual_return[fac] = (total_return[fac]+1)**(1./6)-1
    excess_return[fac] = annual_return[fac]- annual_return[fac][-1]
    # Factor effectiveness tests
    # 1. Correlation between annualized return and portfolio order
    effect_test[fac][1] = annual_return[fac][0:5].corr(Series([1,2,3,4,5],index = annual_return[fac][0:5].index))
    # 2. Win probability of the high-return portfolio
    if total_return[fac][0] < total_return[fac][-2]:
        # Small factor, small return: port1 is the loser, port5 is the winner
        win_row, loss_row = -2, 0
    else:
        # Small factor, large return: port1 is the winner, port5 is the loser
        win_row, loss_row = 0, -2
    loss_excess = monthly.iloc[loss_row,:]-monthly.iloc[-1,:]
    loss_prob[fac] = loss_excess[loss_excess<0].count()/float(len(loss_excess))
    win_excess = monthly.iloc[win_row,:]-monthly.iloc[-1,:]
    win_prob[fac] = win_excess[win_excess>0].count()/float(len(win_excess))
    effect_test[fac][3] = [win_prob[fac],loss_prob[fac]]
    # Excess returns in percent: [winner, loser]
    effect_test[fac][2] = [excess_return[fac][win_row]*100,excess_return[fac][loss_row]*100]

# Because many factors were selected, the test standards are set somewhat stricter:
# effect_test[1] records the factor correlation; > 0.7 or < -0.7 passes
# effect_test[2] records [winner excess return, loser excess return]
# effect_test[3] records [win probability, loss probability]; [> 0.6, > 0.4] passes
#   (given the actual results, the loss probability is not considered for now)
DataFrame(effect_test).T
factor | 1: correlation | 2: excess return in % [winner, loser] | 3: [win probability, loss probability] |
---|---|---|---|
B/M | 0.6281959 | [15.1984852636, 8.76175660448] | [0.690476190476, 0.404761904762] |
EPS | 0.2488584 | [14.2720133294, 12.9632231367] | [0.678571428571, 0.357142857143] |
FAP | -0.5671644 | [13.4503120268, 9.44267504971] | [0.619047619048, 0.380952380952] |
GP/R | 0.8064658 | [13.7519085368, 9.10242336036] | [0.619047619048, 0.357142857143] |
L/A | -0.5898578 | [16.5046555213, 12.1611504111] | [0.702380952381, 0.416666666667] |
P/R | 0.9215462 | [13.980265264, 9.09336493425] | [0.642857142857, 0.380952380952] |
PB | -0.8818369 | [13.9012096024, 6.71073706755] | [0.619047619048, 0.428571428571] |
PE | 0.1328435 | [13.9001078939, 13.4085302139] | [0.607142857143, 0.369047619048] |
PS | -0.5030761 | [14.1865783133, 9.18250270639] | [0.607142857143, 0.392857142857] |
ROA | 0.5423133 | [19.3405425743, 9.77751849214] | [0.75, 0.380952380952] |
ROE | 0.6386198 | [17.9776162079, 9.73910681099] | [0.654761904762, 0.404761904762] |
capitalization | -0.7644211 | [22.4171821446, 9.86517390072] | [0.583333333333, 0.404761904762] |
circulating_cap | -0.7761155 | [19.8132954476, 9.86514645415] | [0.571428571429, 0.369047619048] |
circulating_market_cap | -0.8791725 | [38.1580067747, 10.3384004828] | [0.714285714286, 0.369047619048] |
gross_profit_margin | 0.7770139 | [15.5893122733, 9.22929383936] | [0.642857142857, 0.452380952381] |
inc_net_profit_annual | 0.6899743 | [14.9827068239, 9.99043264863] | [0.678571428571, 0.392857142857] |
inc_net_profit_year_on_year | 0.8082138 | [13.825611634, 3.32909642528] | [0.630952380952, 0.416666666667] |
inc_operation_profit_annual | 0.5963116 | [13.1949471333, 9.79858245467] | [0.654761904762, 0.404761904762] |
inc_operation_profit_year_on_year | 0.8663793 | [14.0478401847, 3.17046201915] | [0.654761904762, 0.404761904762] |
market_cap | -0.8262643 | [44.3574164544, 10.5284689923] | [0.738095238095, 0.369047619048] |
net_profit | 0.04857344 | [12.1195026493, 8.12374126557] | [0.642857142857, 0.380952380952] |
operating_revenue | -0.7751005 | [23.9766654178, 11.219895262] | [0.630952380952, 0.345238095238] |
turnover_ratio | -0.6218568 | [10.175151521, 4.22831336907] | [0.619047619048, 0.511904761905] |
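The factors selected as satisfying the three conditions above are:
(1) Value factors: book-to-market ratio (B/M)
(2) Growth factors: net profit margin (P/R), gross profit margin (gross_profit_margin), year-on-year net profit growth (inc_net_profit_year_on_year), year-on-year operating profit growth (inc_operation_profit_year_on_year)
(3) Size factors: operating revenue (operating_revenue), total share capital (capitalization), circulating share capital (circulating_cap), total market cap (market_cap), circulating market cap (circulating_market_cap), debt-to-asset ratio (L/A)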
effective_factors = ['B/M','L/A','P/R', 'capitalization', 'circulating_cap', 'circulating_market_cap',
                     'gross_profit_margin', 'inc_net_profit_year_on_year',
                     'inc_operation_profit_year_on_year', 'market_cap', 'operating_revenue']
DataFrame(total_return).ix[:,effective_factors].T
factor | port1 | port2 | port3 | port4 | port5 | benchmark |
---|---|---|---|---|---|---|
B/M | 0.918228 | 1.480658 | 1.142045 | 1.148155 | 1.686498 | 0.173762 |
L/A | 1.870086 | 1.526532 | 0.843702 | 1.124105 | 1.297099 | 0.173762 |
P/R | 0.952724 | 1.060859 | 1.183619 | 1.649951 | 1.524196 | 0.173762 |
capitalization | 2.837346 | 1.656063 | 1.372731 | 1.964715 | 1.035016 | 0.173762 |
circulating_cap | 2.382449 | 1.692737 | 1.170379 | 1.747633 | 1.035013 | 0.173762 |
circulating_market_cap | 6.812751 | 2.619596 | 1.248171 | 1.063917 | 1.086887 | 0.173762 |
gross_profit_margin | 0.967012 | 1.086652 | 0.899555 | 1.183325 | 1.740373 | 0.173762 |
inc_net_profit_year_on_year | 0.421356 | 0.994127 | 1.055084 | 2.446268 | 1.504189 | 0.173762 |
inc_operation_profit_year_on_year | 0.408645 | 0.801897 | 1.381442 | 2.183790 | 1.532979 | 0.173762 |
market_cap | 9.116529 | 1.863749 | 1.864567 | 0.896007 | 1.108029 | 0.173762 |
operating_revenue | 3.133399 | 1.325240 | 1.267816 | 1.006326 | 1.186449 | 0.173762 |
DataFrame(annual_return).ix[:,effective_factors].T
factor | port1 | port2 | port3 | port4 | port5 | benchmark |
---|---|---|---|---|---|---|
B/M | 0.114680 | 0.163486 | 0.135372 | 0.135911 | 0.179047 | 0.027062 |
L/A | 0.192109 | 0.167045 | 0.107342 | 0.133781 | 0.148674 | 0.027062 |
P/R | 0.117996 | 0.128084 | 0.139015 | 0.176358 | 0.166865 | 0.027062 |
capitalization | 0.251234 | 0.176810 | 0.154892 | 0.198571 | 0.125714 | 0.027062 |
circulating_cap | 0.225195 | 0.179503 | 0.137861 | 0.183477 | 0.125714 | 0.027062 |
circulating_market_cap | 0.408642 | 0.239110 | 0.144559 | 0.128363 | 0.130446 | 0.027062 |
gross_profit_margin | 0.119355 | 0.130425 | 0.112864 | 0.138990 | 0.182955 | 0.027062 |
inc_net_profit_year_on_year | 0.060353 | 0.121912 | 0.127556 | 0.229018 | 0.165318 | 0.027062 |
inc_operation_profit_year_on_year | 0.058767 | 0.103117 | 0.155598 | 0.212897 | 0.167540 | 0.027062 |
market_cap | 0.470636 | 0.191669 | 0.191726 | 0.112517 | 0.132347 | 0.027062 |
operating_revenue | 0.266829 | 0.151007 | 0.146220 | 0.123053 | 0.139261 | 0.027062 |
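The annualized figures follow from the total returns through (1 + total)^(1/years) - 1. The code uses an exponent of 1/6, while the sample runs over 84 months, so it is worth seeing how much the choice of horizon matters. The sketch below uses the benchmark total return of 0.173762 from the total-return table; `annualize` is a hypothetical helper name.

```python
def annualize(total_return, years):
    """Convert a total return over `years` years into a compound annual rate."""
    return (1.0 + total_return) ** (1.0 / years) - 1.0

benchmark_total = 0.173762  # benchmark total return reported in the table above

annual_6 = annualize(benchmark_total, 6)  # exponent used in the notebook code
annual_7 = annualize(benchmark_total, 7)  # exponent matching 84 months
```

With six years the result reproduces the 0.027062 shown in the benchmark column; with seven years it falls to roughly 2.3% per year. The ranking of portfolios is unaffected, since the same exponent is applied to every column, but the annualized levels and excess returns are somewhat overstated under the six-year exponent.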
def draw_return_picture(df, fac):
    # Cumulative net value curves of the five portfolios and the benchmark
    cum = (df.T+1).cumprod()
    plt.figure(figsize =(10,4))
    for k, label in enumerate(['port1','port2','port3','port4','port5','benchmark']):
        plt.plot(cum.iloc[:,k], label = label)
    plt.xlabel('return of factor %s'%fac)
    plt.legend(loc=0)

for fac in effective_factors:
    draw_return_picture(monthly_return[:,:,fac], fac)
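Some factors share similar underlying logic, so the portfolios they select are highly correlated in both stock composition and returns. Such factors need redundancy removal: within each group of similar factors, keep the one with the best return and the strongest discrimination.
I have not completed this step; the procedure would be:
(1) Score the n portfolios of each factor, with a higher score for a higher return. Once the scores are set, assign each portfolio's score to every stock it holds in that month.
if AR1 > ARn  # the smaller the factor, the larger the return
    the score of portfolio i is (n-i+1)
if AR1 < ARn  # the smaller the factor, the smaller the return
    the score of portfolio i is i
(2) Each month, compute the correlation matrix of the stocks' scores across factors. This gives the score correlation matrix Score_Corr(t,u,v) for month t, where u and v are factor indices.
(3) Average the correlation matrices over the sample period: with m months in the sample, sum the matrices and multiply by 1/m.
(4) Set a score-correlation threshold MinScoreCorr and keep only the factors whose correlation with the other factors is low.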
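Steps (1) to (4) can be sketched as follows. This is one possible reading under stated assumptions, run on random toy data: the helper names (`quintile_score`, `mean_score_corr`, `drop_redundant`) are hypothetical, n is fixed at 5, and step (4) is implemented as a greedy pass over a caller-supplied priority order, which the procedure above does not specify. The `max_corr` argument plays the role of MinScoreCorr.

```python
import numpy as np
import pandas as pd

def quintile_score(values, small_is_better):
    """Step (1): score each stock 1..5 by its quintile; 5 marks the best group."""
    pct = values.rank(pct=True, method='first')
    group = np.ceil(pct * 5).clip(1, 5)  # portfolio index i = 1..5
    return (6 - group) if small_is_better else group

def mean_score_corr(monthly_scores):
    """Steps (2)-(3): average the monthly factor-score correlation matrices."""
    corr_sum = sum(s.corr() for s in monthly_scores)
    return corr_sum / len(monthly_scores)

def drop_redundant(mean_corr, priority, max_corr):
    """Step (4), greedy variant: keep a factor only if its absolute correlation
    with every factor already kept is below max_corr."""
    kept = []
    for fac in priority:
        if all(abs(mean_corr.loc[fac, k]) < max_corr for k in kept):
            kept.append(fac)
    return kept

# Toy data: two size factors with identical ordering and one unrelated factor
rng = np.random.RandomState(0)
monthly_scores = []
for _ in range(3):
    base = pd.Series(rng.rand(50))
    other = pd.Series(rng.rand(50))
    monthly_scores.append(pd.DataFrame({
        'market_cap': quintile_score(base, True),
        'circulating_market_cap': quintile_score(base * 0.9, True),
        'gross_profit_margin': quintile_score(other, False),
    }))

mean_corr = mean_score_corr(monthly_scores)
kept = drop_redundant(
    mean_corr,
    priority=['market_cap', 'circulating_market_cap', 'gross_profit_margin'],
    max_corr=0.8)
```

In practice the priority order would come from the effectiveness test, for example sorting factors by winner excess return, so that the stronger factor of a correlated pair is the one that survives.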
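Using the selected effective factors, compute factor scores for every stock at the start of each month and combine them into an average score with given weights. If a factor has no value for a stock in a given month, take the weighted average over the remaining factors. Rank stocks by this weighted average score and trade the top-ranked ones.
The code below sums the factor scores with equal weights and selects the highest-scoring stocks to trade.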
def score_stock(fdate):
    # Original note: for the eight factors B/M, L/A, P/R, capitalization,
    # circulating_cap, circulating_market_cap, market_cap and operating_revenue,
    # smaller values earn more and should score higher, so they are ranked in
    # descending order; for gross_profit_margin, inc_net_profit_year_on_year and
    # inc_operation_profit_year_on_year, larger values earn more, so they are
    # ranked in ascending order.
    # NOTE: in the effectiveness table, B/M and P/R have positive correlations
    # (0.63 and 0.92), which would suggest ascending=True for them; the original
    # settings are kept here.
    effective_factors = {'inc_net_profit_year_on_year':True,'gross_profit_margin':True,
                         'inc_operation_profit_year_on_year':True,
                         'B/M':False,'L/A':False,'P/R':False,
                         'capitalization':False, 'circulating_cap':False,
                         'circulating_market_cap':False, 'market_cap':False,
                         'operating_revenue':False}
    fdf = get_factors(fdate)
    score = {}
    for fac,value in effective_factors.items():
        score[fac] = fdf[fac].rank(ascending = value,method = 'first')
    # Equal-weight sum of the per-factor ranks, highest total first
    total_score = DataFrame(score).T.sum().order(ascending = False)
    print(total_score.head(5))
    ranked_stocks = list(total_score.index)
    return ranked_stocks,fdf['circulating_market_cap']

# Redefines get_factors for the 11 effective factors only
def get_factors(fdate):
    factors = ['B/M','L/A','P/R', 'capitalization', 'circulating_cap', 'circulating_market_cap',
               'gross_profit_margin', 'inc_net_profit_year_on_year',
               'inc_operation_profit_year_on_year', 'market_cap', 'operating_revenue']
    stock_set = get_index_stocks('000001.XSHG',fdate)
    q = query(
        valuation.code,
        balance.total_owner_equities/valuation.market_cap/100000000,
        balance.total_liability/balance.total_assets,
        income.net_profit/income.operating_revenue,
        valuation.capitalization,
        valuation.circulating_cap,
        valuation.circulating_market_cap,
        indicator.gross_profit_margin,
        indicator.inc_net_profit_year_on_year,
        indicator.inc_operation_profit_year_on_year,
        valuation.market_cap,
        income.operating_revenue
    ).filter(
        valuation.code.in_(stock_set)
    )
    fdf = get_fundamentals(q,date = fdate)
    fdf.index = fdf['code']
    fdf.columns = ['code'] + factors
    return fdf.iloc[:,-11:]

[score_result,circulating_market_cap] = score_stock('2017-01-01')

code
603859.XSHG    10554
603189.XSHG    10521
600817.XSHG    10451
600385.XSHG    10372
603518.XSHG    10326
dtype: float64
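`score_stock` sums raw ranks with equal weights, so a stock with a missing factor simply receives a lower total. The weighted average described earlier, which falls back to the remaining factors when one is missing, could be sketched as below. This is an assumption-laden illustration rather than the notebook's method: `composite_score` is a hypothetical helper, ranks are expressed as percentiles so that factors with different numbers of valid values stay comparable, and the weights are renormalized per stock over the factors that are present.

```python
import numpy as np
import pandas as pd

def composite_score(factor_values, ascending, weights):
    """Weighted mean of percentile ranks, skipping factors a stock is missing."""
    ranks = pd.DataFrame({
        fac: factor_values[fac].rank(ascending=ascending[fac], pct=True)
        for fac in weights
    })
    w = pd.Series(weights)
    weighted_sum = ranks.mul(w, axis=1).sum(axis=1, skipna=True)
    # Total weight of the factors actually available for each stock
    weight_present = ranks.notna().astype(float).mul(w, axis=1).sum(axis=1)
    return (weighted_sum / weight_present).sort_values(ascending=False)

# Toy data: stock B has no gross_profit_margin value
toy = pd.DataFrame({
    'market_cap': [10.0, 20.0, 30.0],
    'gross_profit_margin': [0.5, np.nan, 0.1],
}, index=['A', 'B', 'C'])

scores = composite_score(
    toy,
    ascending={'market_cap': False, 'gross_profit_margin': True},
    weights={'market_cap': 0.5, 'gross_profit_margin': 0.5})
```

Stock B is scored on market_cap alone, with that factor's weight rescaled to 1, so it is neither penalized nor favoured for the missing value.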
Compute the monthly returns of port1 to port5, of the Top20 portfolio, and of the benchmark over 7×12 = 84 months, and store all the data in a panel.
year = ['2011','2012','2013','2014','2015','2016','2017']
month = ['01','02','03','04','05','06','07','08','09','10','11','12']
factors = ['B/M','L/A','P/R', 'capitalization', 'circulating_cap', 'circulating_market_cap',
           'gross_profit_margin', 'inc_net_profit_year_on_year',
           'inc_operation_profit_year_on_year', 'market_cap', 'operating_revenue']
result = {}
for i in range(7*12):
    startdate = year[i//12] + '-' + month[i%12] + '-01'
    try:
        enddate = year[(i+1)//12] + '-' + month[(i+1)%12] + '-01'
    except IndexError:
        enddate = '2018-01-01'
    try:
        nextdate = year[(i+2)//12] + '-' + month[(i+2)%12] + '-01'
    except IndexError:
        if enddate == '2018-01-01':
            nextdate = '2018-02-01'
        else:
            nextdate = '2018-01-01'
    print('time %s' % startdate)
    # Score stocks on the 11 factors combined, then split them into portfolios
    df = DataFrame(np.zeros(7),index = ['Top20','port1','port2','port3','port4','port5','benchmark'])
    [score,circulating_market_cap] = score_stock(startdate)
    port0 = score[:20]
    port1 = score[: len(score)//5]
    port2 = score[ len(score)//5+1: 2*len(score)//5]
    port3 = score[ 2*len(score)//5+1: -2*len(score)//5]
    port4 = score[ -2*len(score)//5+1: -len(score)//5]
    port5 = score[ -len(score)//5+1: ]
    print(len(score))
    for name, port in [('Top20', port0), ('port1', port1), ('port2', port2), ('port3', port3), ('port4', port4), ('port5', port5)]:
        df.ix[name] = calculate_port_monthly_return(port,startdate,enddate,nextdate,circulating_market_cap)
    df.ix['benchmark'] = calculate_benchmark_monthly_return(startdate,enddate,nextdate)
    result[i+1]=df
time 2011-01-01 code 600671.XSHG 8250 600506.XSHG 8065 600365.XSHG 8040 600634.XSHG 7864 600647.XSHG 7843 dtype: float64 867 time 2011-02-01 code 600671.XSHG 8275 600365.XSHG 8059 600506.XSHG 8055 600634.XSHG 7874 600647.XSHG 7855 dtype: float64 867 time 2011-03-01 code 600671.XSHG 8266 600506.XSHG 8034 600365.XSHG 7951 600634.XSHG 7852 600647.XSHG 7842 dtype: float64 866 time 2011-04-01 code 600671.XSHG 8285 600365.XSHG 7943 600634.XSHG 7902 600617.XSHG 7852 600077.XSHG 7834 dtype: float64 874 time 2011-05-01 code 600671.XSHG 8522 600340.XSHG 8239 600365.XSHG 8209 600562.XSHG 8103 600613.XSHG 8097 dtype: float64 885 time 2011-06-01 code 600671.XSHG 8506 600365.XSHG 8221 600149.XSHG 8120 600562.XSHG 8104 600613.XSHG 8104 dtype: float64 885 time 2011-07-01 code 600671.XSHG 8518 600365.XSHG 8240 600149.XSHG 8140 600613.XSHG 8111 600562.XSHG 8098 dtype: float64 885 time 2011-08-01 code 600671.XSHG 8534 600149.XSHG 8126 600613.XSHG 8116 600562.XSHG 8076 600520.XSHG 7937 dtype: float64 886 time 2011-09-01 code 600634.XSHG 8410 600562.XSHG 8198 600671.XSHG 8059 600476.XSHG 7986 600077.XSHG 7970 dtype: float64 901 time 2011-10-01 code 600634.XSHG 8416 600562.XSHG 8113 600671.XSHG 8071 600476.XSHG 8037 600077.XSHG 7963 dtype: float64 902 time 2011-11-01 code 600671.XSHG 8693 600705.XSHG 8048 600421.XSHG 8030 600476.XSHG 8030 600576.XSHG 8006 dtype: float64 913 time 2011-12-01 code 600671.XSHG 8707 600576.XSHG 8080 600705.XSHG 8064 600476.XSHG 8043 600571.XSHG 7970 dtype: float64 913 time 2012-01-01 code 600671.XSHG 8688 600576.XSHG 8088 600705.XSHG 8074 600476.XSHG 8044 600421.XSHG 7984 dtype: float64 913 time 2012-02-01 code 600671.XSHG 8695 600136.XSHG 8190 600576.XSHG 8103 600705.XSHG 8086 600476.XSHG 8068 dtype: float64 913 time 2012-03-01 code 600671.XSHG 8702 600136.XSHG 8178 600576.XSHG 8088 600476.XSHG 8047 600571.XSHG 7994 dtype: float64 912 time 2012-04-01 code 600671.XSHG 8748 600365.XSHG 8250 600576.XSHG 8223 600136.XSHG 8201 600733.XSHG 8149 dtype: float64 914 
time 2012-05-01 code 600671.XSHG 8792 600593.XSHG 8544 600562.XSHG 8469 600513.XSHG 8430 600576.XSHG 8395 dtype: float64 920 time 2012-06-01 code 600634.XSHG 8708 600593.XSHG 8620 600513.XSHG 8496 600562.XSHG 8481 600455.XSHG 8228 dtype: float64 922 time 2012-07-01 code 600634.XSHG 8705 600593.XSHG 8637 600562.XSHG 8493 600513.XSHG 8400 600571.XSHG 8239 dtype: float64 922 time 2012-08-01 code 600634.XSHG 8707 600593.XSHG 8636 600562.XSHG 8496 600513.XSHG 8409 600571.XSHG 8249 dtype: float64 922 time 2012-09-01 code 600136.XSHG 9255 600485.XSHG 8874 600733.XSHG 8834 600749.XSHG 8725 600520.XSHG 8476 dtype: float64 933 time 2012-10-01 code 600136.XSHG 9251 600485.XSHG 8875 600733.XSHG 8824 600749.XSHG 8732 600758.XSHG 8475 dtype: float64 933 time 2012-11-01 code 600634.XSHG 9496 600733.XSHG 8811 600365.XSHG 8663 600758.XSHG 8474 600647.XSHG 8473 dtype: float64 940 time 2012-12-01 code 600634.XSHG 9494 600733.XSHG 8859 600365.XSHG 8682 600647.XSHG 8520 600758.XSHG 8480 dtype: float64 940 time 2013-01-01 code 600634.XSHG 9494 600733.XSHG 8849 600365.XSHG 8678 600647.XSHG 8525 600758.XSHG 8484 dtype: float64 940 time 2013-02-01 code 600634.XSHG 9480 600733.XSHG 8821 600647.XSHG 8538 600758.XSHG 8493 600980.XSHG 8458 dtype: float64 940 time 2013-03-01 code 600634.XSHG 9482 600733.XSHG 8832 600647.XSHG 8548 600758.XSHG 8504 600599.XSHG 8498 dtype: float64 942 time 2013-04-01 code 600634.XSHG 9396 600613.XSHG 8620 600985.XSHG 8602 600599.XSHG 8492 600647.XSHG 8442 dtype: float64 942 time 2013-05-01 code 600634.XSHG 9449 600136.XSHG 8910 600980.XSHG 8731 600985.XSHG 8607 600599.XSHG 8545 dtype: float64 942 time 2013-06-01 code 600485.XSHG 9022 600136.XSHG 8892 600980.XSHG 8726 600576.XSHG 8345 600706.XSHG 8332 dtype: float64 941 time 2013-07-01 code 600485.XSHG 9032 600136.XSHG 8902 600980.XSHG 8712 600706.XSHG 8331 600576.XSHG 8318 dtype: float64 941 time 2013-08-01 code 600485.XSHG 9037 600980.XSHG 8705 600576.XSHG 8343 600706.XSHG 8313 600379.XSHG 8302 dtype: float64 941 
time 2013-09-01 code 600365.XSHG 8997 600485.XSHG 8938 600980.XSHG 8832 600615.XSHG 8649 600593.XSHG 8545 dtype: float64 941 time 2013-10-01 code 600365.XSHG 8983 600485.XSHG 8922 600980.XSHG 8826 600615.XSHG 8655 600234.XSHG 8566 dtype: float64 941 time 2013-11-01 code 600733.XSHG 8684 600485.XSHG 8457 600758.XSHG 8422 600099.XSHG 8401 600520.XSHG 8390 dtype: float64 941 time 2013-12-01 code 600733.XSHG 8723 600758.XSHG 8423 600520.XSHG 8402 600099.XSHG 8397 600146.XSHG 8356 dtype: float64 941 time 2014-01-01 code 600733.XSHG 8666 600485.XSHG 8421 600758.XSHG 8417 600520.XSHG 8400 600099.XSHG 8391 dtype: float64 941 time 2014-02-01 code 600733.XSHG 8702 600758.XSHG 8421 600146.XSHG 8411 600520.XSHG 8403 600099.XSHG 8393 dtype: float64 941 time 2014-03-01 code 600733.XSHG 8683 600485.XSHG 8460 600758.XSHG 8424 600520.XSHG 8422 600146.XSHG 8392 dtype: float64 941 time 2014-04-01 code 600146.XSHG 8422 600781.XSHG 8411 600506.XSHG 8409 600576.XSHG 8357 600485.XSHG 8354 dtype: float64 944 time 2014-05-01 code 600539.XSHG 9141 600980.XSHG 9020 600753.XSHG 8852 600593.XSHG 8846 600355.XSHG 8760 dtype: float64 948 time 2014-06-01 code 600539.XSHG 9140 600980.XSHG 9039 600753.XSHG 8873 600593.XSHG 8854 600355.XSHG 8765 dtype: float64 948 time 2014-07-01 code 600539.XSHG 9115 600980.XSHG 9006 600753.XSHG 8899 600593.XSHG 8853 600355.XSHG 8729 dtype: float64 947 time 2014-08-01 code 600539.XSHG 9151 600980.XSHG 8984 600593.XSHG 8846 600576.XSHG 8844 600753.XSHG 8838 dtype: float64 947 time 2014-09-01 code 600365.XSHG 8977 600099.XSHG 8765 600355.XSHG 8750 600847.XSHG 8742 600539.XSHG 8677 dtype: float64 951 time 2014-10-01 code 600365.XSHG 8988 600355.XSHG 8806 600099.XSHG 8776 600847.XSHG 8773 600476.XSHG 8696 dtype: float64 951 time 2014-11-01 code 600599.XSHG 9072 600696.XSHG 8995 600419.XSHG 8905 600136.XSHG 8883 600539.XSHG 8838 dtype: float64 968 time 2014-12-01 code 600696.XSHG 9009 600599.XSHG 8950 600419.XSHG 8910 600136.XSHG 8875 600539.XSHG 8836 dtype: float64 969 
time 2015-01-01 code 600696.XSHG 9094 600599.XSHG 9039 600136.XSHG 8901 600419.XSHG 8895 600539.XSHG 8755 dtype: float64 969 time 2015-02-01 code 600696.XSHG 9076 600599.XSHG 8999 600419.XSHG 8902 600136.XSHG 8895 600539.XSHG 8756 dtype: float64 969 time 2015-03-01 code 600696.XSHG 9078 600599.XSHG 9007 600419.XSHG 8906 600539.XSHG 8785 600892.XSHG 8737 dtype: float64 969 time 2015-04-01 code 600696.XSHG 9142 600099.XSHG 8952 603601.XSHG 8946 600539.XSHG 8857 600599.XSHG 8817 dtype: float64 982 time 2015-05-01 code 603869.XSHG 9587 603088.XSHG 9461 600455.XSHG 9348 603898.XSHG 9339 603988.XSHG 9335 dtype: float64 1020 time 2015-06-01 code 603869.XSHG 9577 603088.XSHG 9544 603988.XSHG 9415 600455.XSHG 9412 600365.XSHG 9389 dtype: float64 1030 time 2015-07-01 code 603869.XSHG 9757 603088.XSHG 9632 603988.XSHG 9517 600455.XSHG 9494 603636.XSHG 9465 dtype: float64 1039 time 2015-08-01 code 603869.XSHG 9701 603988.XSHG 9515 600365.XSHG 9356 603010.XSHG 9319 600136.XSHG 9305 dtype: float64 1041 time 2015-09-01 code 600506.XSHG 9835 603099.XSHG 9546 600520.XSHG 9501 600593.XSHG 9441 600136.XSHG 9397 dtype: float64 1060 time 2015-10-01 code 600506.XSHG 9834 603099.XSHG 9563 600520.XSHG 9541 600593.XSHG 9476 600365.XSHG 9389 dtype: float64 1060 time 2015-11-01 code 603918.XSHG 9637 600980.XSHG 9520 600599.XSHG 9420 603601.XSHG 9391 600371.XSHG 9374 dtype: float64 1060 time 2015-12-01 code 600980.XSHG 9522 600753.XSHG 9475 603918.XSHG 9472 603010.XSHG 9364 600599.XSHG 9322 dtype: float64 1060 time 2016-01-01 code 603918.XSHG 9641 600980.XSHG 9549 600753.XSHG 9509 600599.XSHG 9438 603601.XSHG 9389 dtype: float64 1066 time 2016-02-01 code 603918.XSHG 9725 603778.XSHG 9652 600599.XSHG 9615 600980.XSHG 9538 603085.XSHG 9419 dtype: float64 1071 time 2016-03-01 code 603918.XSHG 9743 603778.XSHG 9706 600599.XSHG 9683 600980.XSHG 9576 600419.XSHG 9429 dtype: float64 1073 time 2016-04-01 code 600599.XSHG 9913 600419.XSHG 9801 603778.XSHG 9739 600080.XSHG 9710 603918.XSHG 9669 dtype: 
float64 1078 time 2016-05-01 code 603601.XSHG 9916 603918.XSHG 9907 600137.XSHG 9836 600733.XSHG 9693 603023.XSHG 9673 dtype: float64 1080 time 2016-06-01 code 600137.XSHG 9964 600733.XSHG 9869 603601.XSHG 9766 600506.XSHG 9756 603023.XSHG 9724 dtype: float64 1088 time 2016-07-01 code 600137.XSHG 10035 600733.XSHG 9957 600506.XSHG 9864 603601.XSHG 9716 603066.XSHG 9699 dtype: float64 1096 time 2016-08-01 code 600137.XSHG 10049 603322.XSHG 9969 603601.XSHG 9892 600506.XSHG 9862 600733.XSHG 9801 dtype: float64 1100 time 2016-09-01 code 600455.XSHG 10155 600980.XSHG 9933 603088.XSHG 9885 603027.XSHG 9881 603838.XSHG 9849 dtype: float64 1114 time 2016-10-01 code 600455.XSHG 10177 600980.XSHG 10053 603027.XSHG 9976 603088.XSHG 9970 603779.XSHG 9969 dtype: float64 1123 time 2016-11-01 code 603859.XSHG 10604 600817.XSHG 10441 603779.XSHG 10403 603189.XSHG 10400 600385.XSHG 10387 dtype: float64 1130 time 2016-12-01 code 603859.XSHG 10599 600817.XSHG 10443 603189.XSHG 10410 603779.XSHG 10400 600385.XSHG 10391 dtype: float64 1130 time 2017-01-01 code 603859.XSHG 10554 603189.XSHG 10521 600817.XSHG 10451 600385.XSHG 10372 603518.XSHG 10326 dtype: float64 1130 time 2017-02-01 code 603189.XSHG 10618 603859.XSHG 10489 600817.XSHG 10474 600385.XSHG 10409 603779.XSHG 10399 dtype: float64 1131 time 2017-03-01 code 603189.XSHG 10638 600817.XSHG 10488 603859.XSHG 10467 603779.XSHG 10438 600385.XSHG 10420 dtype: float64 1131 time 2017-04-01 code 603189.XSHG 10792 603779.XSHG 10609 600385.XSHG 10587 603022.XSHG 10441 603088.XSHG 10438 dtype: float64 1152 time 2017-05-01 code 603088.XSHG 11346 603903.XSHG 11275 603960.XSHG 11187 603040.XSHG 11168 603319.XSHG 11143 dtype: float64 1240 time 2017-06-01 code 603088.XSHG 11410 603040.XSHG 11337 603903.XSHG 11331 603960.XSHG 11255 603966.XSHG 11254 dtype: float64 1245 time 2017-07-01 code 603088.XSHG 11429 603903.XSHG 11410 603040.XSHG 11369 603966.XSHG 11275 603960.XSHG 11264 dtype: float64 1246 time 2017-08-01 code 603903.XSHG 11545 
603088.XSHG 11454 603040.XSHG 11379 603960.XSHG 11310 603966.XSHG 11286 dtype: float64 1248 time 2017-09-01 code 603040.XSHG 11983 600455.XSHG 11890 603326.XSHG 11672 603429.XSHG 11576 603229.XSHG 11490 dtype: float64 1309 time 2017-10-01 code 603040.XSHG 12019 600455.XSHG 11897 603326.XSHG 11673 600506.XSHG 11525 603229.XSHG 11497 dtype: float64 1309 time 2017-11-01 code 603960.XSHG 12511 603232.XSHG 12503 603859.XSHG 12377 603383.XSHG 12297 603500.XSHG 12238 dtype: float64 1352 time 2017-12-01 code 603232.XSHG 12533 603960.XSHG 12437 603859.XSHG 12353 603500.XSHG 12288 603040.XSHG 12275 dtype: float64 1352
df = pd.Panel(result)
import matplotlib
matplotlib.rcParams['axes.unicode_minus']=False
index = ['Top20','port1','port2','port3','port4','port5']

def draw_backtest_picture(ind):
    # Monthly excess return of portfolio `ind` over the benchmark
    plt.figure(figsize =(10,4))
    plt.plot(df.ix[:,ind,0]-df.ix[:,'benchmark',0], label = 'excess return: %s'%ind)
    plt.xlabel('backtest excess return of portfolio %s'%ind)
    plt.legend(loc=0)
    plt.grid()

for ind in index:
    draw_backtest_picture(ind)