Multi-Factor Stock Selection Model: Securities Investment Coursework

Verified user | Posted: May 10, 07:36 | Replies: 1

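Multi-Factor Stock Selection Model

Based on the post 《【研究】量化选股-因子检验和多因子模型的构建》 ([Research] Quantitative Stock Selection: Factor Testing and Building a Multi-Factor Model).

Starting from that post's source code, I added several factors and moved the sample period to a later window.

1. The sample period is 2011-2017, used for factor screening and testing.

2. The benchmark is the Shanghai Composite Index (000001.XSHG).

Model Construction and Factor Selection

Factors are drawn from the following four categories:

  1. Value factors: price-to-earnings ratio (PE), price-to-book ratio (PB), price-to-sales ratio (PS), basic earnings per share (EPS), book-to-market ratio (B/M)

  2. Growth factors: return on equity (ROE), return on total assets (ROA), gross profit margin on sales (gross_profit_margin), year-on-year net profit growth (inc_net_profit_year_on_year), quarter-on-quarter net profit growth (inc_net_profit_annual), year-on-year operating profit growth (inc_operation_profit_year_on_year), quarter-on-quarter operating profit growth (inc_operation_profit_annual), main-business gross margin (GP/R), net profit margin (P/R)

  3. Size factors: net profit (net_profit), operating revenue (operating_revenue), total share capital (capitalization), circulating share capital (circulating_cap), total market cap (market_cap), circulating market cap (circulating_market_cap), debt-to-asset ratio (L/A), fixed-asset ratio (FAP)

  4. Trading factors: turnover ratio (turnover_ratio)

Factor effectiveness is tested with the ranking (sorting) method.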

import pandas as pd
from pandas import Series, DataFrame
import numpy as np
import statsmodels.api as sm
import scipy.stats as scs
import matplotlib.pyplot as plt

Fetch all factor values at the start of a month, for example 2018-01-01

factors = ['PE', 'PB', 'PS', 'EPS', 'B/M',
           'ROE', 'ROA', 'gross_profit_margin', 'inc_net_profit_year_on_year', 'inc_net_profit_annual',
           'inc_operation_profit_year_on_year', 'inc_operation_profit_annual', 'GP/R', 'P/R',
           'net_profit', 'operating_revenue', 'capitalization', 'circulating_cap', 'market_cap', 'circulating_market_cap', 'L/A', 'FAP',
           'turnover_ratio']

# Fetch factor values at the start of the month
def get_factors(fdate, factors):
    # Universe: constituents of the Shanghai Composite Index on fdate
    stock_set = get_index_stocks('000001.XSHG', fdate)
    q = query(
        valuation.code,
        balance.total_owner_equities/valuation.market_cap/100000000,
        valuation.pe_ratio,
        valuation.pb_ratio,
        valuation.ps_ratio,
        income.basic_eps,
        indicator.roe,
        indicator.roa,
        indicator.gross_profit_margin,
        indicator.inc_net_profit_year_on_year,
        indicator.inc_net_profit_annual,
        indicator.inc_operation_profit_year_on_year,
        indicator.inc_operation_profit_annual,
        income.total_profit/income.operating_revenue,
        income.net_profit/income.operating_revenue,
        income.net_profit,
        income.operating_revenue,
        valuation.capitalization,
        valuation.circulating_cap,
        valuation.market_cap,
        valuation.circulating_market_cap,
        balance.total_liability/balance.total_assets,
        balance.fixed_assets/balance.total_assets,
        valuation.turnover_ratio
    ).filter(
        valuation.code.in_(stock_set),
        valuation.circulating_market_cap
    )
    fdf = get_fundamentals(q, date=fdate)
    fdf.index = fdf['code']
    # Labels are assigned by position. Caution: the query returns B/M first,
    # while `factors` lists it fifth, so the first five labels are shifted:
    # 'PE' holds B/M, 'PB' holds PE, 'PS' holds PB, 'EPS' holds PS, 'B/M' holds EPS.
    fdf.columns = ['code'] + factors
    return fdf.iloc[:,-23:]

fdf = get_factors('2018-01-01', factors)
fdf.head().T
code                                   600000.XSHG    600004.XSHG    600006.XSHG    600007.XSHG    600008.XSHG
PE                                    1.143846e+00   4.827871e-01   6.119287e-01   3.655787e-01   6.300832e-01
PB                                    6.804400e+00   2.009680e+01   1.093361e+02   2.808410e+01   3.863470e+01
PS                                    9.538000e-01   2.144100e+00   1.787800e+00   2.736200e+00   3.135900e+00
EPS                                   2.244700e+00   4.643700e+00   6.311000e-01   6.594300e+00   2.607200e+00
B/M                                   4.800000e-01   1.900000e-01  -9.700000e-03   1.800000e-01   3.390000e-02
ROE                                   3.413200e+00   2.754000e+00  -2.960000e-01   2.928800e+00   1.513400e+00
ROA                                   2.316000e-01   1.990600e+00  -7.116000e-01   1.580100e+00   4.027000e-01
gross_profit_margin                            NaN   3.798770e+01   1.084940e+01   5.062150e+01   3.024400e+01
inc_net_profit_year_on_year          -1.588900e+00   8.844300e+00  -6.544932e+02   6.390600e+00   2.954560e+01
inc_net_profit_annual                -7.200000e-03   4.770500e+00  -2.012624e+03   3.357990e+01  -3.995000e-01
inc_operation_profit_year_on_year    -1.833300e+00   1.868020e+01  -6.605708e+02  -4.675000e-01   1.377109e+02
inc_operation_profit_annual           2.019000e+00   2.919000e+00  -1.447837e+03   2.506970e+01  -2.233990e+01
GP/R                                  4.344424e-01   3.085870e-01  -4.038174e-02   3.296453e-01   1.174919e-01
P/R                                   3.350075e-01   2.308729e-01  -3.273438e-02   2.473615e-01   8.858635e-02
net_profit                            1.387400e+10   3.935985e+08  -1.587791e+08   1.823589e+08   1.900669e+08
operating_revenue                     4.141400e+10   1.704828e+09   4.850528e+09   7.372163e+08   2.145555e+09
capitalization                        2.935208e+06   2.069320e+05   2.000000e+05   1.007282e+05   4.820614e+05
circulating_cap                       2.810376e+06   2.069320e+05   2.000000e+05   1.007282e+05   4.820614e+05
market_cap                            3.695427e+03   3.041901e+02   1.170000e+02   1.726482e+02   2.477796e+02
circulating_market_cap                3.538264e+03   3.041901e+02   1.170000e+02   1.726482e+02   2.477796e+02
L/A                                   9.302917e-01   2.719476e-01   6.862103e-01   4.571281e-01   6.699646e-01
FAP                                   4.168150e-03   3.381332e-01   1.754063e-01   1.815366e-01   1.011792e-01
turnover_ratio                        5.820000e-02   4.095000e-01   5.574000e-01   7.120000e-02   3.734000e-01
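
The column labels are attached by position (`fdf.columns = ['code'] + factors`), so the order of `factors` has to match the order of the expressions passed to `query`. Below is a minimal sketch of one way to keep the two in step, assuming each label is stored next to the expression it names. `FACTOR_SPECS` and `split_specs` are hypothetical names, and the strings are stand-ins for the platform's column objects, not platform API.

```python
# Hypothetical stand-ins: on the platform the second element of each pair
# would be a column expression such as valuation.pe_ratio, not a string.
FACTOR_SPECS = [
    ("B/M", "balance.total_owner_equities / valuation.market_cap / 1e8"),
    ("PE", "valuation.pe_ratio"),
    ("PB", "valuation.pb_ratio"),
    ("PS", "valuation.ps_ratio"),
    ("EPS", "income.basic_eps"),
]


def split_specs(specs):
    """Return (labels, expressions) in one shared order."""
    labels = [label for label, _ in specs]
    expressions = [expr for _, expr in specs]
    if len(set(labels)) != len(labels):
        raise ValueError("duplicate factor label")
    return labels, expressions


labels, expressions = split_specs(FACTOR_SPECS)
for label, expr in zip(labels, expressions):
    print("%-4s <- %s" % (label, expr))
```

With this layout, reordering or adding a factor touches a single line, so a label cannot drift away from its expression.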

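Sort each factor by value (circulating market cap as an example)

# .order() is the legacy pandas Series sort (sort_values() from pandas 0.17 on)
score = fdf['circulating_market_cap'].order()
score.head()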
code
603580.XSHG    5.0777
603991.XSHG    5.2659
603330.XSHG    5.3535
603041.XSHG    5.6300
603269.XSHG    5.7038
Name: circulating_market_cap, dtype: float64

Number of stocks

len(score)
1352

Split the stock pool into five equal groups by circulating market cap

startdate = '2018-01-01'
enddate = '2018-02-01'
nextdate = '2018-03-01'
df = {}
circulating_market_cap = fdf['circulating_market_cap']
# `score` is sorted ascending, so port1 holds the smallest caps and port5 the largest
port1 = list(score.index)[: len(score)//5]
port2 = list(score.index)[ len(score)//5: 2*len(score)//5]
port3 = list(score.index)[ 2*len(score)//5: -2*len(score)//5]
port4 = list(score.index)[ -2*len(score)//5: -len(score)//5]
port5 = list(score.index)[ -len(score)//5: ]
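
The five slices above mix positive and negative offsets, which makes it easy to drop or duplicate a stock at a boundary. Below is a minimal sketch of an alternative that computes all boundaries once, assuming the input list is already sorted by factor value. `split_into_groups` is a hypothetical helper, not part of the original notebook.

```python
def split_into_groups(sorted_codes, n_groups=5):
    """Split an already sorted list into n_groups contiguous slices.

    Boundaries are computed once with integer division, so every element
    lands in exactly one group and group sizes differ by at most one.
    """
    total = len(sorted_codes)
    bounds = [total * k // n_groups for k in range(n_groups + 1)]
    return [sorted_codes[bounds[k]:bounds[k + 1]] for k in range(n_groups)]


# Toy universe of 12 codes, assumed to be sorted by factor value already.
codes = ["S%02d" % i for i in range(12)]
groups = split_into_groups(codes)
for number, members in enumerate(groups, start=1):
    print("port%d: %s" % (number, members))
```

Because the groups are contiguous and share their boundaries, concatenating them gives back the original list.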

Portfolio monthly return weighted by circulating market cap (for example, the return from 2018-01 to 2018-02)

def calculate_port_monthly_return(port, startdate, enddate, nextdate, circulating_market_cap):
    close1 = get_price(port, startdate, enddate, 'daily', ['close'])
    close2 = get_price(port, enddate, nextdate, 'daily', ['close'])
    # Return from the first close in [startdate, enddate] to the first close in
    # [enddate, nextdate], weighted by circulating market cap
    weighted_m_return = ((close2['close'].ix[0,:]/close1['close'].ix[0,:]-1)* circulating_market_cap).sum()/(circulating_market_cap.ix[port].sum())
    return weighted_m_return

calculate_port_monthly_return(port1, '2018-01-01', '2018-02-01', '2018-03-01', fdf['circulating_market_cap'])
-0.09004705495088357
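
The weighting can be checked by hand on toy numbers. Below is a minimal sketch in plain Python, assuming prices and weights are given as dictionaries keyed by stock code; `weighted_return` and all numbers are made up for illustration. One deliberate difference from the function above: a stock with a missing price is dropped from both the numerator and the denominator, whereas the original denominator, as far as the code reads, sums the cap of every stock in `port`.

```python
def weighted_return(start_prices, end_prices, weights):
    """Weighted average of simple returns over the stocks in `weights`.

    Stocks with a missing price on either date are left out of both the
    numerator and the denominator.
    """
    numerator = 0.0
    denominator = 0.0
    for code, weight in weights.items():
        p0 = start_prices.get(code)
        p1 = end_prices.get(code)
        if p0 is None or p1 is None:
            continue
        numerator += (p1 / p0 - 1.0) * weight
        denominator += weight
    if denominator == 0:
        raise ValueError("no stock has both prices")
    return numerator / denominator


# Toy numbers (not market data): two stocks, weights in arbitrary cap units.
start = {"A": 10.0, "B": 20.0}
end = {"A": 11.0, "B": 19.0}
caps = {"A": 300.0, "B": 100.0}
result = weighted_return(start, end, caps)
print(result)
```

Here stock A gains 10% with weight 300 and stock B loses 5% with weight 100, so the weighted return is (30 - 5) / 400 = 6.25%.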

Benchmark monthly return

def calculate_benchmark_monthly_return(startdate, enddate, nextdate):
    close1 = get_price(['000001.XSHG'],startdate,enddate,'daily',['close'])['close']
    close2 = get_price(['000001.XSHG'],enddate, nextdate, 'daily',['close'])['close']
    # Single-column frame, so .sum() just extracts the one return
    benchmark_return = (close2.ix[0,:]/close1.ix[0,:]-1).sum()
    return benchmark_return

calculate_benchmark_monthly_return('2018-01-01','2018-02-01','2018-03-01')
0.029462448444448563

Returns of the five portfolios over one month at the start of 2018

The results show that, before any factor portfolio is built, the first four groups underperform the market: port1 to port4 lose between 6.1% and 9.0% over the month while the benchmark gains about 2.9%, and only port5 (+6.9%) beats it.

benchmark_return = calculate_benchmark_monthly_return('2018-01-01', '2018-02-01', '2018-03-01')
df['port1'] = calculate_port_monthly_return(port1,'2018-01-01', '2018-02-01', '2018-03-01', fdf['circulating_market_cap'])
df['port2'] = calculate_port_monthly_return(port2,'2018-01-01', '2018-02-01', '2018-03-01', fdf['circulating_market_cap'])
df['port3'] = calculate_port_monthly_return(port3,'2018-01-01', '2018-02-01', '2018-03-01', fdf['circulating_market_cap'])
df['port4'] = calculate_port_monthly_return(port4,'2018-01-01', '2018-02-01', '2018-03-01', fdf['circulating_market_cap'])
df['port5'] = calculate_port_monthly_return(port5,'2018-01-01', '2018-02-01', '2018-03-01', fdf['circulating_market_cap'])
print(Series(df))
print('benchmark_return %s'%benchmark_return)
port1   -0.090047
port2   -0.088405
port3   -0.075064
port4   -0.060624
port5    0.068629
dtype: float64
benchmark_return 0.0294624484444

Build the factor portfolios and compute each portfolio's monthly return

Period: 2011-2017. Compute the monthly returns of portfolios 1-5 and the benchmark, which gives an 84×6 panel of returns for each factor.

factors = ['PE', 'PB', 'PS', 'EPS', 'B/M',
           'ROE', 'ROA', 'gross_profit_margin', 'inc_net_profit_year_on_year', 'inc_net_profit_annual',
           'inc_operation_profit_year_on_year', 'inc_operation_profit_annual', 'GP/R', 'P/R',
           'net_profit', 'operating_revenue', 'capitalization', 'circulating_cap', 'market_cap', 'circulating_market_cap', 'L/A', 'FAP',
           'turnover_ratio']

# In the research module, get_fundamentals defaults `date` to the day before
# the research date, so we build our own monthly date sequence.
year = ['2011','2012','2013','2014','2015','2016','2017']
month = ['01','02','03','04','05','06','07','08','09','10','11','12']
result = {}
for i in range(7*12):
    startdate = year[i//12] + '-' + month[i%12] + '-01'
    try:
        enddate = year[(i+1)//12] + '-' + month[(i+1)%12] + '-01'
    except IndexError:
        enddate = '2018-01-01'
    try:
        nextdate = year[(i+2)//12] + '-' + month[(i+2)%12] + '-01'
    except IndexError:
        if enddate == '2018-01-01':
            nextdate = '2018-02-01'
        else:
            nextdate = '2018-01-01'
    # print 'time %s'%startdate
    fdf = get_factors(startdate,factors)
    CMV = fdf['circulating_market_cap']
    # 5 portfolios plus the benchmark, 23 factors
    df = DataFrame(np.zeros(6*23).reshape(6,23),index = ['port1','port2','port3','port4','port5','benchmark'],columns = factors)
    for fac in factors:
        score = fdf[fac].order()
        # Note: the "+1" offsets leave the stock at each group boundary out of every portfolio
        port1 = list(score.index)[: len(score)//5]
        port2 = list(score.index)[ len(score)//5+1: 2*len(score)//5]
        port3 = list(score.index)[ 2*len(score)//5+1: -2*len(score)//5]
        port4 = list(score.index)[ -2*len(score)//5+1: -len(score)//5]
        port5 = list(score.index)[ -len(score)//5+1: ]
        # Note: the weights passed below are the global `circulating_market_cap`
        # (the 2018-01-01 snapshot defined earlier), not this month's `CMV`
        df.ix['port1',fac] = calculate_port_monthly_return(port1,startdate,enddate,nextdate,circulating_market_cap)
        df.ix['port2',fac] = calculate_port_monthly_return(port2,startdate,enddate,nextdate,circulating_market_cap)
        df.ix['port3',fac] = calculate_port_monthly_return(port3,startdate,enddate,nextdate,circulating_market_cap)
        df.ix['port4',fac] = calculate_port_monthly_return(port4,startdate,enddate,nextdate,circulating_market_cap)
        df.ix['port5',fac] = calculate_port_monthly_return(port5,startdate,enddate,nextdate,circulating_market_cap)
        df.ix['benchmark',fac] = calculate_benchmark_monthly_return(startdate,enddate,nextdate)
        # print 'factor %s'%fac
    result[i+1]=df
monthly_return = pd.Panel(result)
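
The `try`/`except IndexError` blocks above exist only to roll the date over into 2018 for the last two sample months. Below is a minimal sketch of an alternative that generates all month-start strings up front, assuming plain 'YYYY-MM-01' strings are all the loop needs. `month_starts` is a hypothetical helper.

```python
def month_starts(first_year, first_month, count):
    """Return `count` consecutive month-start strings, 'YYYY-MM-01'."""
    dates = []
    for offset in range(count):
        # Count months from year 0 so that the year rolls over naturally.
        months = first_year * 12 + (first_month - 1) + offset
        dates.append("%04d-%02d-01" % (months // 12, months % 12 + 1))
    return dates


# 84 sample months plus the two extra boundaries needed for the last return.
dates = month_starts(2011, 1, 86)
for i in range(84):
    startdate, enddate, nextdate = dates[i], dates[i + 1], dates[i + 2]
    if i in (0, 82, 83):
        print("%s %s %s" % (startdate, enddate, nextdate))
```

For the last sample month this yields 2017-12-01, 2018-01-01 and 2018-02-01, the same triple the exception handling produces.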

Monthly returns of the five portfolios for one factor (PE as an example)

monthly_return[:,:,'PE']

                   1          2          3          4          5          6          7          8          9         10  ...         75         76         77         78         79         80         81         82         83         84
port1      -0.063961   0.057468  -0.003538   0.011939  -0.000767   0.028005   0.048595  -0.003958  -0.109566   0.062509  ...   0.021345  -0.006460  -0.001198   0.049791   0.009338   0.049369   0.058072   0.069637  -0.033160   0.056772
port2      -0.065009   0.076102  -0.027128  -0.018031  -0.066994   0.031146   0.028017  -0.046184  -0.120076   0.034576  ...   0.037914  -0.048666  -0.053362   0.072887   0.040365   0.036833   0.069451   0.003253  -0.022327   0.018349
port3      -0.056932   0.079801  -0.017569  -0.027592  -0.073196   0.034040  -0.025730  -0.054367  -0.129013   0.045660  ...   0.017931  -0.045419  -0.053020   0.054920   0.028066   0.018909   0.027487   0.002008  -0.047184   0.006735
port4      -0.021293   0.046165  -0.005278  -0.011301  -0.069544   0.019637  -0.019397  -0.080514  -0.107611   0.081045  ...   0.004030  -0.021088  -0.005480   0.057372   0.065631   0.025304  -0.005043   0.059699   0.016978   0.023916
port5       0.013760   0.024953   0.050458   0.006419  -0.054836   0.007156  -0.035373  -0.041296  -0.052494   0.068615  ...   0.011837  -0.021979   0.048663   0.021466   0.074965   0.014275  -0.003084  -0.003351   0.003836   0.009916
benchmark  -0.018820   0.042859   0.016612  -0.011870  -0.064326   0.005755  -0.020142  -0.054642  -0.082649   0.053409  ...   0.007198  -0.038710  -0.013070   0.030068   0.030266   0.022620   0.002156   0.006382  -0.023056   0.009257

6 rows × 84 columns

Cumulative return

(monthly_return[:,:,'PE'].T+1).cumprod().tail()

         port1     port2     port3     port4     port5  benchmark
80    2.173926  1.652334  1.708928  1.980452  2.433185   1.180349
81    2.300171  1.767090  1.755901  1.970465  2.425681   1.182893
82    2.460347  1.772839  1.759427  2.088099  2.417553   1.190442
83    2.378763  1.733257  1.676409  2.123552  2.426825   1.162996
84    2.513809  1.765060  1.687700  2.174338  2.450891   1.173762
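
The cumulative figures are a running product of (1 + monthly return), and a growth figure over a known number of months can be converted to a compound annual rate. Below is a minimal sketch of both steps in plain Python; `cumulative_growth` and `annualize` are hypothetical helper names, and the annualization uses the number of months actually covered.

```python
def cumulative_growth(monthly_returns):
    """Running product of (1 + r), i.e. the value of 1 unit invested."""
    value = 1.0
    path = []
    for r in monthly_returns:
        value *= 1.0 + r
        path.append(value)
    return path


def annualize(total_growth, n_months):
    """Convert growth over n_months into a compound annual rate."""
    return total_growth ** (12.0 / n_months) - 1.0


# First three monthly returns of port1 from the PE table above.
path = cumulative_growth([-0.063961, 0.057468, -0.003538])
print(path)

# A portfolio that doubles over 24 months grows about 41.4% per year.
print(annualize(2.0, 24))
```

Passing the number of months rather than a hard-coded year count keeps the exponent consistent with the length of the return series.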

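Quantitative Criteria for Factor Testing

Once the model is set up, compute for the n portfolios the compound annualized return, the excess return, and, under different market conditions, the probability that the high-return portfolio beats the benchmark and that the low-return portfolio trails it.

Quantitative criteria for effectiveness:

(1) The compound annualized returns of portfolios 1 to n should follow a clear ordering, i.e. the size of the factor should be strongly related to return. Let Xi be the annualized return of portfolio i; the absolute correlation between Xi and i must satisfy Abs(Corr(Xi,i)) > MinCorr, where MinCorr is a given minimum correlation threshold.

(2) Let AR1 and ARn be the excess returns of the two extreme portfolios, 1 and n, and let MinARtop and MinARbottom be the minimum excess-return thresholds.

if AR1 > ARn  # the smaller the factor, the higher the return

then AR1 > MinARtop > 0 and ARn < MinARbottom < 0 must hold

if AR1 < ARn  # the smaller the factor, the lower the return

then ARn > MinARtop > 0 and AR1 < MinARbottom < 0 must hold

These conditions ensure that, of the two portfolios with the largest and smallest factor values, one clearly beats the market and the other clearly trails it.

(3) Under any market conditions, the two extreme portfolios 1 and n beat or trail the market with high probability.

Together, these three criteria pick out the factors that showed good stock-selection ability over the past period.

Because many factors were chosen initially, the three criteria are applied more strictly, using the following thresholds:

(1) Record the factor correlation; a value > 0.7 or < -0.7 passes.

(2) Record the excess returns of the winner and loser portfolios.

(3) Record the probabilities; a winner-portfolio win probability > 0.6 and a loser-portfolio loss probability > 0.4 pass.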

total_return = {}
annual_return = {}
excess_return = {}
win_prob = {}
loss_prob = {}
effect_test = {}
MinCorr = 0.3
Minbottom = -0.05
Mintop = 0.05
for fac in factors:
    effect_test[fac] = {}
    monthly = monthly_return[:,:,fac]
    total_return[fac] = (monthly+1).T.cumprod().iloc[-1,:]-1
    # Note: the 84 monthly returns cover 7 years, so the matching exponent
    # would be 1./7; 1./6 overstates the annualized figures
    annual_return[fac] = (total_return[fac]+1)**(1./6)-1
    excess_return[fac] = annual_return[fac]- annual_return[fac][-1]
    # Factor effectiveness tests
    # 1. Correlation between annualized return and portfolio rank, to compare with the threshold
    effect_test[fac][1] = annual_return[fac][0:5].corr(Series([1,2,3,4,5],index = annual_return[fac][0:5].index))
    # 2. Probability that the high-return portfolio beats the benchmark
    if total_return[fac][0] < total_return[fac][-2]:
        # Small factor, small return: port1 is the loser portfolio, port5 the winner
        loss_excess = monthly.iloc[0,:]-monthly.iloc[-1,:]
        loss_prob[fac] = loss_excess[loss_excess<0].count()/float(len(loss_excess))
        win_excess = monthly.iloc[-2,:]-monthly.iloc[-1,:]
        win_prob[fac] = win_excess[win_excess>0].count()/float(len(win_excess))
        effect_test[fac][3] = [win_prob[fac],loss_prob[fac]]
        # Excess returns, in percent
        effect_test[fac][2] = [excess_return[fac][-2]*100,excess_return[fac][0]*100]
    else:
        # Small factor, large return: port1 is the winner portfolio, port5 the loser
        loss_excess = monthly.iloc[-2,:]-monthly.iloc[-1,:]
        loss_prob[fac] = loss_excess[loss_excess<0].count()/float(len(loss_excess))
        win_excess = monthly.iloc[0,:]-monthly.iloc[-1,:]
        win_prob[fac] = win_excess[win_excess>0].count()/float(len(win_excess))
        effect_test[fac][3] = [win_prob[fac],loss_prob[fac]]
        # Excess returns, in percent
        effect_test[fac][2] = [excess_return[fac][0]*100,excess_return[fac][-2]*100]

# Because many factors were chosen, the test thresholds are set somewhat strictly
# effect_test[1]: factor correlation; > 0.7 or < -0.7 passes
# effect_test[2]: [winner portfolio excess return, loser portfolio excess return]
# effect_test[3]: [winner win probability, loser loss probability]; [> 0.6, > 0.4] passes
#                 (in practice the loss probability is not considered for now)
DataFrame(effect_test).T

                                             1                               2                                 3
B/M                                  0.6281959  [15.1984852636, 8.76175660448]  [0.690476190476, 0.404761904762]
EPS                                  0.2488584  [14.2720133294, 12.9632231367]  [0.678571428571, 0.357142857143]
FAP                                 -0.5671644  [13.4503120268, 9.44267504971]  [0.619047619048, 0.380952380952]
GP/R                                 0.8064658  [13.7519085368, 9.10242336036]  [0.619047619048, 0.357142857143]
L/A                                 -0.5898578  [16.5046555213, 12.1611504111]  [0.702380952381, 0.416666666667]
P/R                                  0.9215462   [13.980265264, 9.09336493425]  [0.642857142857, 0.380952380952]
PB                                  -0.8818369  [13.9012096024, 6.71073706755]  [0.619047619048, 0.428571428571]
PE                                   0.1328435  [13.9001078939, 13.4085302139]  [0.607142857143, 0.369047619048]
PS                                  -0.5030761  [14.1865783133, 9.18250270639]  [0.607142857143, 0.392857142857]
ROA                                  0.5423133  [19.3405425743, 9.77751849214]            [0.75, 0.380952380952]
ROE                                  0.6386198  [17.9776162079, 9.73910681099]  [0.654761904762, 0.404761904762]
capitalization                      -0.7644211  [22.4171821446, 9.86517390072]  [0.583333333333, 0.404761904762]
circulating_cap                     -0.7761155  [19.8132954476, 9.86514645415]  [0.571428571429, 0.369047619048]
circulating_market_cap              -0.8791725  [38.1580067747, 10.3384004828]  [0.714285714286, 0.369047619048]
gross_profit_margin                  0.7770139  [15.5893122733, 9.22929383936]  [0.642857142857, 0.452380952381]
inc_net_profit_annual                0.6899743  [14.9827068239, 9.99043264863]  [0.678571428571, 0.392857142857]
inc_net_profit_year_on_year          0.8082138   [13.825611634, 3.32909642528]  [0.630952380952, 0.416666666667]
inc_operation_profit_annual          0.5963116  [13.1949471333, 9.79858245467]  [0.654761904762, 0.404761904762]
inc_operation_profit_year_on_year    0.8663793  [14.0478401847, 3.17046201915]  [0.654761904762, 0.404761904762]
market_cap                          -0.8262643  [44.3574164544, 10.5284689923]  [0.738095238095, 0.369047619048]
net_profit                          0.04857344  [12.1195026493, 8.12374126557]  [0.642857142857, 0.380952380952]
operating_revenue                   -0.7751005   [23.9766654178, 11.219895262]  [0.630952380952, 0.345238095238]
turnover_ratio                      -0.6218568   [10.175151521, 4.22831336907]  [0.619047619048, 0.511904761905]
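
Criteria (1) and (3) can be reproduced by hand for a single factor. Below is a minimal sketch in plain Python, assuming the annualized returns of port1 to port5 and the monthly return series are available as lists. `pearson` and `win_probability` are hypothetical helper names; the input numbers are copied from the circulating_market_cap row of the annualized-return table later in this post and from the first five columns of the PE table.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def win_probability(portfolio, benchmark):
    """Share of months in which the portfolio beats the benchmark."""
    wins = sum(1 for p, b in zip(portfolio, benchmark) if p > b)
    return wins / float(len(portfolio))


# Criterion (1): annualized returns of port1..port5 for circulating_market_cap.
annual = [0.408642, 0.239110, 0.144559, 0.128363, 0.130446]
corr = pearson([1, 2, 3, 4, 5], annual)
print(round(corr, 4))

# Criterion (3): first five months of port1 (PE) against the benchmark.
port1 = [-0.063961, 0.057468, -0.003538, 0.011939, -0.000767]
bench = [-0.018820, 0.042859, 0.016612, -0.011870, -0.064326]
print(win_probability(port1, bench))
```

The correlation should land close to the -0.879 reported for circulating_market_cap in the table above. The win probability here covers only five months, so it illustrates the calculation and is not comparable with the 84-month figures.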

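Effective Factors

The factors that satisfy all three conditions above are:

(1) Value factors: book-to-market ratio (B/M)

(2) Growth factors: net profit margin (P/R), gross profit margin on sales (gross_profit_margin), year-on-year net profit growth (inc_net_profit_year_on_year), year-on-year operating profit growth (inc_operation_profit_year_on_year)

(3) Size factors: operating revenue (operating_revenue), total share capital (capitalization), circulating share capital (circulating_cap), total market cap (market_cap), circulating market cap (circulating_market_cap), debt-to-asset ratio (L/A)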

Total returns of the effective factors

effective_factors = ['B/M','L/A','P/R', 'capitalization', 'circulating_cap', 'circulating_market_cap', 'gross_profit_margin',
                     'inc_net_profit_year_on_year', 'inc_operation_profit_year_on_year', 'market_cap', 'operating_revenue']
DataFrame(total_return).ix[:,effective_factors].T

                                        port1     port2     port3     port4     port5  benchmark
B/M                                  0.918228  1.480658  1.142045  1.148155  1.686498   0.173762
L/A                                  1.870086  1.526532  0.843702  1.124105  1.297099   0.173762
P/R                                  0.952724  1.060859  1.183619  1.649951  1.524196   0.173762
capitalization                       2.837346  1.656063  1.372731  1.964715  1.035016   0.173762
circulating_cap                      2.382449  1.692737  1.170379  1.747633  1.035013   0.173762
circulating_market_cap               6.812751  2.619596  1.248171  1.063917  1.086887   0.173762
gross_profit_margin                  0.967012  1.086652  0.899555  1.183325  1.740373   0.173762
inc_net_profit_year_on_year          0.421356  0.994127  1.055084  2.446268  1.504189   0.173762
inc_operation_profit_year_on_year    0.408645  0.801897  1.381442  2.183790  1.532979   0.173762
market_cap                           9.116529  1.863749  1.864567  0.896007  1.108029   0.173762
operating_revenue                    3.133399  1.325240  1.267816  1.006326  1.186449   0.173762

Annualized returns of the effective factors

DataFrame(annual_return).ix[:,effective_factors].T

                                        port1     port2     port3     port4     port5  benchmark
B/M                                  0.114680  0.163486  0.135372  0.135911  0.179047   0.027062
L/A                                  0.192109  0.167045  0.107342  0.133781  0.148674   0.027062
P/R                                  0.117996  0.128084  0.139015  0.176358  0.166865   0.027062
capitalization                       0.251234  0.176810  0.154892  0.198571  0.125714   0.027062
circulating_cap                      0.225195  0.179503  0.137861  0.183477  0.125714   0.027062
circulating_market_cap               0.408642  0.239110  0.144559  0.128363  0.130446   0.027062
gross_profit_margin                  0.119355  0.130425  0.112864  0.138990  0.182955   0.027062
inc_net_profit_year_on_year          0.060353  0.121912  0.127556  0.229018  0.165318   0.027062
inc_operation_profit_year_on_year    0.058767  0.103117  0.155598  0.212897  0.167540   0.027062
market_cap                           0.470636  0.191669  0.191726  0.112517  0.132347   0.027062
operating_revenue                    0.266829  0.151007  0.146220  0.123053  0.139261   0.027062

Time-series plots of the six return series for each factor:

def draw_return_picture(df):
    plt.figure(figsize =(10,4))
    # Cumulative growth of each portfolio and of the benchmark
    plt.plot((df.T+1).cumprod().ix[:,0], label = 'port1')
    plt.plot((df.T+1).cumprod().ix[:,1], label = 'port2')
    plt.plot((df.T+1).cumprod().ix[:,2], label = 'port3')
    plt.plot((df.T+1).cumprod().ix[:,3], label = 'port4')
    plt.plot((df.T+1).cumprod().ix[:,4], label = 'port5')
    plt.plot((df.T+1).cumprod().ix[:,5], label = 'benchmark')
    # `fac` is read from the loop below
    plt.xlabel('return of factor %s'%fac)
    plt.legend(loc=0)

for fac in effective_factors:
    draw_return_picture(monthly_return[:,:,fac])

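Removing Redundant Factors

Some factors share similar underlying logic, so the portfolios they select are highly correlated in both constituents and returns. Such factors should be pruned, keeping within each group of similar factors the one with the best return and the strongest discrimination. Given my limited ability, I did not complete this step; the method is as follows:

(1) Score the n portfolios of each factor, giving a higher score to a higher return. Once scored, assign each portfolio's score to every stock it holds in that month.

if AR1 > ARn  # the smaller the factor, the higher the return

then portfolio i scores (n-i+1)

if AR1 < ARn  # the smaller the factor, the lower the return

then portfolio i scores i

(2) Each month, compute the correlation matrix of the stocks' scores across factors. This gives Score_Corr_{t,u,v}, the score correlation matrix for month t, where u and v index the factors.

(3) Average the correlation matrices over the sample period: with m months in the sample, sum the matrices and multiply by 1/m.

(4) Set a score-correlation threshold MinScoreCorr and keep only the factors whose correlation with the other factors is low.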
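
Steps (1) to (4) can be sketched in plain Python as below. This is an illustration under stated assumptions, not the method of the referenced post: scores are supplied as per-stock lists in a fixed stock order, step (4) is implemented as a greedy pass over factors ordered by preference (the text does not specify how ties between correlated factors are resolved), and all function names and the toy scores are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = float(len(xs))
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def group_score(group, n_groups, small_factor_wins):
    """Step (1): score of portfolio `group` (1-based)."""
    return n_groups - group + 1 if small_factor_wins else group


def mean_score_corr(monthly_scores, factors):
    """Steps (2)-(3): average over months of the pairwise score correlation.

    monthly_scores holds one dict per month: factor -> list of per-stock
    scores, with every list in the same stock order.
    """
    m = float(len(monthly_scores))
    avg = {}
    for u in factors:
        for v in factors:
            total = sum(pearson(month[u], month[v]) for month in monthly_scores)
            avg[(u, v)] = total / m
    return avg


def drop_redundant(factors, avg_corr, min_score_corr):
    """Step (4), greedy variant: keep a factor only if its average score
    correlation with every factor already kept is below the threshold.
    `factors` should be ordered from most to least preferred."""
    kept = []
    for f in factors:
        if all(abs(avg_corr[(f, k)]) < min_score_corr for k in kept):
            kept.append(f)
    return kept


# Toy scores for five stocks over two months (made-up numbers).
months = [
    {"market_cap": [5, 4, 3, 2, 1],
     "circulating_market_cap": [5, 4, 3, 1, 2],
     "gross_profit_margin": [2, 5, 1, 4, 3]},
    {"market_cap": [4, 5, 3, 2, 1],
     "circulating_market_cap": [4, 5, 2, 3, 1],
     "gross_profit_margin": [1, 3, 5, 2, 4]},
]
names = ["market_cap", "circulating_market_cap", "gross_profit_margin"]
avg = mean_score_corr(months, names)
kept = drop_redundant(names, avg, min_score_corr=0.7)
print(kept)
```

In the toy data the two market-cap factors score the stocks almost identically (average correlation 0.9), so only the first of them survives the 0.7 threshold, while gross_profit_margin is kept.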

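Building the Model and Selecting Stocks

Using the effective factors chosen above, compute factor scores for every stock in the market at the start of each month, then take a weighted average across all factors. If a factor has no value in a given month, take the weighted average over the remaining factor scores. Rank the stocks by this weighted average score and trade the top-ranked ones.

The code below sums the factor scores with equal weights and selects the highest-scoring stocks for trading.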

def score_stock(fdate):
    # B/M, L/A, P/R, capitalization, circulating_cap, circulating_market_cap, market_cap, operating_revenue:
    # for these eight factors a smaller value is taken to mean a higher return and thus a higher score, so rank in descending order;
    # gross_profit_margin, inc_net_profit_year_on_year, inc_operation_profit_year_on_year:
    # for these three a larger value means a higher return, so rank in ascending order.
    # (In the total-return table above, B/M and P/R earn more in port5 than in port1.)
    effective_factors = {'inc_net_profit_year_on_year':True,'gross_profit_margin':True,'inc_operation_profit_year_on_year':True,
                         'B/M':False,'L/A':False,'P/R':False,
                         'capitalization':False, 'circulating_cap':False,'circulating_market_cap':False,
                         'market_cap':False, 'operating_revenue':False}
    fdf = get_factors(fdate)
    score = {}
    for fac,value in effective_factors.items():
        score[fac] = fdf[fac].rank(ascending = value,method = 'first')
    # Equal-weight sum of the factor ranks, highest total first
    print(DataFrame(score).T.sum().order(ascending = False).head(5))
    ranked = list(DataFrame(score).T.sum().order(ascending = False).index)
    return ranked,fdf['circulating_market_cap']

# Redefines get_factors for the 11 effective factors only
def get_factors(fdate):
    factors = ['B/M','L/A','P/R', 'capitalization', 'circulating_cap', 'circulating_market_cap', 'gross_profit_margin',
               'inc_net_profit_year_on_year', 'inc_operation_profit_year_on_year', 'market_cap', 'operating_revenue']
    stock_set = get_index_stocks('000001.XSHG',fdate)
    q = query(
        valuation.code,
        balance.total_owner_equities/valuation.market_cap/100000000,
        balance.total_liability/balance.total_assets,
        income.net_profit/income.operating_revenue,
        valuation.capitalization,
        valuation.circulating_cap,
        valuation.circulating_market_cap,
        indicator.gross_profit_margin,
        indicator.inc_net_profit_year_on_year,
        indicator.inc_operation_profit_year_on_year,
        valuation.market_cap,
        income.operating_revenue
    ).filter(
        valuation.code.in_(stock_set)
    )
    fdf = get_fundamentals(q,date = fdate)
    fdf.index = fdf['code']
    fdf.columns = ['code'] + factors
    return fdf.iloc[:,-11:]

[score_result,circulating_market_cap] = score_stock('2017-01-01')
code
603859.XSHG    10554
603189.XSHG    10521
600817.XSHG    10451
600385.XSHG    10372
603518.XSHG    10326
dtype: float64
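
The text above asks for a weighted average that falls back to the remaining factors when one has no value for the month. Below is a minimal sketch of that rule for a single stock, assuming scores and weights are plain dictionaries keyed by factor name; `weighted_score`, the weights and the scores are all hypothetical and chosen for illustration only.

```python
def weighted_score(scores, weights):
    """Weighted average of one stock's factor scores.

    `scores` maps factor -> score, with None for a factor that has no
    value this month; its weight is spread over the remaining factors.
    """
    total = 0.0
    weight_sum = 0.0
    for factor, weight in weights.items():
        value = scores.get(factor)
        if value is None:
            continue
        total += weight * value
        weight_sum += weight
    if weight_sum == 0:
        return None  # no factor available for this stock
    return total / weight_sum


# Hypothetical weights and scores, for illustration only.
weights = {"market_cap": 0.5, "gross_profit_margin": 0.3, "L/A": 0.2}
stock_a = {"market_cap": 900, "gross_profit_margin": 400, "L/A": 700}
stock_b = {"market_cap": 900, "gross_profit_margin": None, "L/A": 700}
score_a = weighted_score(stock_a, weights)
score_b = weighted_score(stock_b, weights)
print(score_a)
print(score_b)
```

Dividing by the sum of the weights actually used keeps a stock with a missing factor on the same scale as the others, whereas a plain sum of ranks would penalize it for the gap.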

Monthly returns of the six portfolios and the benchmark over seven years

Compute the monthly returns of port1-port5, the Top20 portfolio and the benchmark over 7×12=84 months, and store all the data in a panel.

year = ['2011','2012','2013','2014','2015','2016','2017']
month = ['01','02','03','04','05','06','07','08','09','10','11','12']
factors = ['B/M','L/A','P/R', 'capitalization', 'circulating_cap', 'circulating_market_cap', 'gross_profit_margin',
           'inc_net_profit_year_on_year', 'inc_operation_profit_year_on_year', 'market_cap', 'operating_revenue']
result = {}
for i in range(7*12):
    startdate = year[i//12] + '-' + month[i%12] + '-01'
    try:
        enddate = year[(i+1)//12] + '-' + month[(i+1)%12] + '-01'
    except IndexError:
        enddate = '2018-01-01'
    try:
        nextdate = year[(i+2)//12] + '-' + month[(i+2)%12] + '-01'
    except IndexError:
        if enddate == '2018-01-01':
            nextdate = '2018-02-01'
        else:
            nextdate = '2018-01-01'
    print('time %s'%startdate)
    # Score every stock on the 11 factors combined, then split into portfolios
    df = DataFrame(np.zeros(7),index = ['Top20','port1','port2','port3','port4','port5','benchmark'])
    [score,circulating_market_cap] = score_stock(startdate)
    port0 = score[:20]
    # Note: the "+1" offsets leave the stock at each group boundary out of every portfolio
    port1 = score[: len(score)//5]
    port2 = score[ len(score)//5+1: 2*len(score)//5]
    port3 = score[ 2*len(score)//5+1: -2*len(score)//5]
    port4 = score[ -2*len(score)//5+1: -len(score)//5]
    port5 = score[ -len(score)//5+1: ]
    print(len(score))
    df.ix['Top20'] = calculate_port_monthly_return(port0,startdate,enddate,nextdate,circulating_market_cap)
    df.ix['port1'] = calculate_port_monthly_return(port1,startdate,enddate,nextdate,circulating_market_cap)
    df.ix['port2'] = calculate_port_monthly_return(port2,startdate,enddate,nextdate,circulating_market_cap)
    df.ix['port3'] = calculate_port_monthly_return(port3,startdate,enddate,nextdate,circulating_market_cap)
    df.ix['port4'] = calculate_port_monthly_return(port4,startdate,enddate,nextdate,circulating_market_cap)
    df.ix['port5'] = calculate_port_monthly_return(port5,startdate,enddate,nextdate,circulating_market_cap)
    df.ix['benchmark'] = calculate_benchmark_monthly_return(startdate,enddate,nextdate)
    result[i+1]=df
time 2011-01-01
code
600671.XSHG    8250
600506.XSHG    8065
600365.XSHG    8040
600634.XSHG    7864
600647.XSHG    7843
dtype: float64
867
time 2011-02-01
code
600671.XSHG    8275
600365.XSHG    8059
600506.XSHG    8055
600634.XSHG    7874
600647.XSHG    7855
dtype: float64
867
time 2011-03-01
code
600671.XSHG    8266
600506.XSHG    8034
600365.XSHG    7951
600634.XSHG    7852
600647.XSHG    7842
dtype: float64
866
time 2011-04-01
code
600671.XSHG    8285
600365.XSHG    7943
600634.XSHG    7902
600617.XSHG    7852
600077.XSHG    7834
dtype: float64
874
time 2011-05-01
code
600671.XSHG    8522
600340.XSHG    8239
600365.XSHG    8209
600562.XSHG    8103
600613.XSHG    8097
dtype: float64
885
time 2011-06-01
code
600671.XSHG    8506
600365.XSHG    8221
600149.XSHG    8120
600562.XSHG    8104
600613.XSHG    8104
dtype: float64
885
time 2011-07-01
code
600671.XSHG    8518
600365.XSHG    8240
600149.XSHG    8140
600613.XSHG    8111
600562.XSHG    8098
dtype: float64
885
time 2011-08-01
code
600671.XSHG    8534
600149.XSHG    8126
600613.XSHG    8116
600562.XSHG    8076
600520.XSHG    7937
dtype: float64
886
time 2011-09-01
code
600634.XSHG    8410
600562.XSHG    8198
600671.XSHG    8059
600476.XSHG    7986
600077.XSHG    7970
dtype: float64
901
time 2011-10-01
code
600634.XSHG    8416
600562.XSHG    8113
600671.XSHG    8071
600476.XSHG    8037
600077.XSHG    7963
dtype: float64
902
time 2011-11-01
code
600671.XSHG    8693
600705.XSHG    8048
600421.XSHG    8030
600476.XSHG    8030
600576.XSHG    8006
dtype: float64
913
time 2011-12-01
code
600671.XSHG    8707
600576.XSHG    8080
600705.XSHG    8064
600476.XSHG    8043
600571.XSHG    7970
dtype: float64
913
time 2012-01-01
code
600671.XSHG    8688
600576.XSHG    8088
600705.XSHG    8074
600476.XSHG    8044
600421.XSHG    7984
dtype: float64
913
time 2012-02-01
code
600671.XSHG    8695
600136.XSHG    8190
600576.XSHG    8103
600705.XSHG    8086
600476.XSHG    8068
dtype: float64
913
time 2012-03-01
code
600671.XSHG    8702
600136.XSHG    8178
600576.XSHG    8088
600476.XSHG    8047
600571.XSHG    7994
dtype: float64
912
time 2012-04-01
code
600671.XSHG    8748
600365.XSHG    8250
600576.XSHG    8223
600136.XSHG    8201
600733.XSHG    8149
dtype: float64
914
time 2012-05-01
code
600671.XSHG    8792
600593.XSHG    8544
600562.XSHG    8469
600513.XSHG    8430
600576.XSHG    8395
dtype: float64
920
time 2012-06-01
code
600634.XSHG    8708
600593.XSHG    8620
600513.XSHG    8496
600562.XSHG    8481
600455.XSHG    8228
dtype: float64
922
time 2012-07-01
code
600634.XSHG    8705
600593.XSHG    8637
600562.XSHG    8493
600513.XSHG    8400
600571.XSHG    8239
dtype: float64
922
time 2012-08-01
code
600634.XSHG    8707
600593.XSHG    8636
600562.XSHG    8496
600513.XSHG    8409
600571.XSHG    8249
dtype: float64
922
time 2012-09-01
code
600136.XSHG    9255
600485.XSHG    8874
600733.XSHG    8834
600749.XSHG    8725
600520.XSHG    8476
dtype: float64
933
time 2012-10-01
code
600136.XSHG    9251
600485.XSHG    8875
600733.XSHG    8824
600749.XSHG    8732
600758.XSHG    8475
dtype: float64
933
time 2012-11-01
code
600634.XSHG    9496
600733.XSHG    8811
600365.XSHG    8663
600758.XSHG    8474
600647.XSHG    8473
dtype: float64
940
time 2012-12-01
code
600634.XSHG    9494
600733.XSHG    8859
600365.XSHG    8682
600647.XSHG    8520
600758.XSHG    8480
dtype: float64
940
time 2013-01-01
code
600634.XSHG    9494
600733.XSHG    8849
600365.XSHG    8678
600647.XSHG    8525
600758.XSHG    8484
dtype: float64
940
time 2013-02-01
code
600634.XSHG    9480
600733.XSHG    8821
600647.XSHG    8538
600758.XSHG    8493
600980.XSHG    8458
dtype: float64
940
time 2013-03-01
code
600634.XSHG    9482
600733.XSHG    8832
600647.XSHG    8548
600758.XSHG    8504
600599.XSHG    8498
dtype: float64
942
time 2013-04-01
code
600634.XSHG    9396
600613.XSHG    8620
600985.XSHG    8602
600599.XSHG    8492
600647.XSHG    8442
dtype: float64
942
time 2013-05-01
code
600634.XSHG    9449
600136.XSHG    8910
600980.XSHG    8731
600985.XSHG    8607
600599.XSHG    8545
dtype: float64
942
time 2013-06-01
code
600485.XSHG    9022
600136.XSHG    8892
600980.XSHG    8726
600576.XSHG    8345
600706.XSHG    8332
dtype: float64
941
time 2013-07-01
code
600485.XSHG    9032
600136.XSHG    8902
600980.XSHG    8712
600706.XSHG    8331
600576.XSHG    8318
dtype: float64
941
time 2013-08-01
code
600485.XSHG    9037
600980.XSHG    8705
600576.XSHG    8343
600706.XSHG    8313
600379.XSHG    8302
dtype: float64
941
time 2013-09-01
code
600365.XSHG    8997
600485.XSHG    8938
600980.XSHG    8832
600615.XSHG    8649
600593.XSHG    8545
dtype: float64
941
time 2013-10-01
code
600365.XSHG    8983
600485.XSHG    8922
600980.XSHG    8826
600615.XSHG    8655
600234.XSHG    8566
dtype: float64
941
time 2013-11-01
code
600733.XSHG    8684
600485.XSHG    8457
600758.XSHG    8422
600099.XSHG    8401
600520.XSHG    8390
dtype: float64
941
time 2013-12-01
code
600733.XSHG    8723
600758.XSHG    8423
600520.XSHG    8402
600099.XSHG    8397
600146.XSHG    8356
dtype: float64
941
time 2014-01-01
code
600733.XSHG    8666
600485.XSHG    8421
600758.XSHG    8417
600520.XSHG    8400
600099.XSHG    8391
dtype: float64
941
time 2014-02-01
code
600733.XSHG    8702
600758.XSHG    8421
600146.XSHG    8411
600520.XSHG    8403
600099.XSHG    8393
dtype: float64
941
time 2014-03-01
code
600733.XSHG    8683
600485.XSHG    8460
600758.XSHG    8424
600520.XSHG    8422
600146.XSHG    8392
dtype: float64
941
time 2014-04-01
code
600146.XSHG    8422
600781.XSHG    8411
600506.XSHG    8409
600576.XSHG    8357
600485.XSHG    8354
dtype: float64
944
time 2014-05-01
code
600539.XSHG    9141
600980.XSHG    9020
600753.XSHG    8852
600593.XSHG    8846
600355.XSHG    8760
dtype: float64
948
time 2014-06-01
code
600539.XSHG    9140
600980.XSHG    9039
600753.XSHG    8873
600593.XSHG    8854
600355.XSHG    8765
dtype: float64
948
time 2014-07-01
code
600539.XSHG    9115
600980.XSHG    9006
600753.XSHG    8899
600593.XSHG    8853
600355.XSHG    8729
dtype: float64
947
time 2014-08-01
code
600539.XSHG    9151
600980.XSHG    8984
600593.XSHG    8846
600576.XSHG    8844
600753.XSHG    8838
dtype: float64
947
time 2014-09-01
code
600365.XSHG    8977
600099.XSHG    8765
600355.XSHG    8750
600847.XSHG    8742
600539.XSHG    8677
dtype: float64
951
time 2014-10-01
code
600365.XSHG    8988
600355.XSHG    8806
600099.XSHG    8776
600847.XSHG    8773
600476.XSHG    8696
dtype: float64
951
time 2014-11-01
code
600599.XSHG    9072
600696.XSHG    8995
600419.XSHG    8905
600136.XSHG    8883
600539.XSHG    8838
dtype: float64
968
time 2014-12-01
code
600696.XSHG    9009
600599.XSHG    8950
600419.XSHG    8910
600136.XSHG    8875
600539.XSHG    8836
dtype: float64
969
time 2015-01-01
code
600696.XSHG    9094
600599.XSHG    9039
600136.XSHG    8901
600419.XSHG    8895
600539.XSHG    8755
dtype: float64
969
time 2015-02-01
code
600696.XSHG    9076
600599.XSHG    8999
600419.XSHG    8902
600136.XSHG    8895
600539.XSHG    8756
dtype: float64
969
time 2015-03-01
code
600696.XSHG    9078
600599.XSHG    9007
600419.XSHG    8906
600539.XSHG    8785
600892.XSHG    8737
dtype: float64
969
time 2015-04-01
code
600696.XSHG    9142
600099.XSHG    8952
603601.XSHG    8946
600539.XSHG    8857
600599.XSHG    8817
dtype: float64
982
time 2015-05-01
code
603869.XSHG    9587
603088.XSHG    9461
600455.XSHG    9348
603898.XSHG    9339
603988.XSHG    9335
dtype: float64
1020
time 2015-06-01
code
603869.XSHG    9577
603088.XSHG    9544
603988.XSHG    9415
600455.XSHG    9412
600365.XSHG    9389
dtype: float64
1030
time 2015-07-01
code
603869.XSHG    9757
603088.XSHG    9632
603988.XSHG    9517
600455.XSHG    9494
603636.XSHG    9465
dtype: float64
1039
time 2015-08-01
code
603869.XSHG    9701
603988.XSHG    9515
600365.XSHG    9356
603010.XSHG    9319
600136.XSHG    9305
dtype: float64
1041
time 2015-09-01
code
600506.XSHG    9835
603099.XSHG    9546
600520.XSHG    9501
600593.XSHG    9441
600136.XSHG    9397
dtype: float64
1060
time 2015-10-01
code
600506.XSHG    9834
603099.XSHG    9563
600520.XSHG    9541
600593.XSHG    9476
600365.XSHG    9389
dtype: float64
1060
time 2015-11-01
code
603918.XSHG    9637
600980.XSHG    9520
600599.XSHG    9420
603601.XSHG    9391
600371.XSHG    9374
dtype: float64
1060
time 2015-12-01
code
600980.XSHG    9522
600753.XSHG    9475
603918.XSHG    9472
603010.XSHG    9364
600599.XSHG    9322
dtype: float64
1060
time 2016-01-01
code
603918.XSHG    9641
600980.XSHG    9549
600753.XSHG    9509
600599.XSHG    9438
603601.XSHG    9389
dtype: float64
1066
time 2016-02-01
code
603918.XSHG    9725
603778.XSHG    9652
600599.XSHG    9615
600980.XSHG    9538
603085.XSHG    9419
dtype: float64
1071
time 2016-03-01
code
603918.XSHG    9743
603778.XSHG    9706
600599.XSHG    9683
600980.XSHG    9576
600419.XSHG    9429
dtype: float64
1073
time 2016-04-01
code
600599.XSHG    9913
600419.XSHG    9801
603778.XSHG    9739
600080.XSHG    9710
603918.XSHG    9669
dtype: float64
1078
time 2016-05-01
code
603601.XSHG    9916
603918.XSHG    9907
600137.XSHG    9836
600733.XSHG    9693
603023.XSHG    9673
dtype: float64
1080
time 2016-06-01
code
600137.XSHG    9964
600733.XSHG    9869
603601.XSHG    9766
600506.XSHG    9756
603023.XSHG    9724
dtype: float64
1088
time 2016-07-01
code
600137.XSHG    10035
600733.XSHG     9957
600506.XSHG     9864
603601.XSHG     9716
603066.XSHG     9699
dtype: float64
1096
time 2016-08-01
code
600137.XSHG    10049
603322.XSHG     9969
603601.XSHG     9892
600506.XSHG     9862
600733.XSHG     9801
dtype: float64
1100
time 2016-09-01
code
600455.XSHG    10155
600980.XSHG     9933
603088.XSHG     9885
603027.XSHG     9881
603838.XSHG     9849
dtype: float64
1114
time 2016-10-01
code
600455.XSHG    10177
600980.XSHG    10053
603027.XSHG     9976
603088.XSHG     9970
603779.XSHG     9969
dtype: float64
1123
time 2016-11-01
code
603859.XSHG    10604
600817.XSHG    10441
603779.XSHG    10403
603189.XSHG    10400
600385.XSHG    10387
dtype: float64
1130
time 2016-12-01
code
603859.XSHG    10599
600817.XSHG    10443
603189.XSHG    10410
603779.XSHG    10400
600385.XSHG    10391
dtype: float64
1130
time 2017-01-01
code
603859.XSHG    10554
603189.XSHG    10521
600817.XSHG    10451
600385.XSHG    10372
603518.XSHG    10326
dtype: float64
1130
time 2017-02-01
code
603189.XSHG    10618
603859.XSHG    10489
600817.XSHG    10474
600385.XSHG    10409
603779.XSHG    10399
dtype: float64
1131
time 2017-03-01
code
603189.XSHG    10638
600817.XSHG    10488
603859.XSHG    10467
603779.XSHG    10438
600385.XSHG    10420
dtype: float64
1131
time 2017-04-01
code
603189.XSHG    10792
603779.XSHG    10609
600385.XSHG    10587
603022.XSHG    10441
603088.XSHG    10438
dtype: float64
1152
time 2017-05-01
code
603088.XSHG    11346
603903.XSHG    11275
603960.XSHG    11187
603040.XSHG    11168
603319.XSHG    11143
dtype: float64
1240
time 2017-06-01
code
603088.XSHG    11410
603040.XSHG    11337
603903.XSHG    11331
603960.XSHG    11255
603966.XSHG    11254
dtype: float64
1245
time 2017-07-01
code
603088.XSHG    11429
603903.XSHG    11410
603040.XSHG    11369
603966.XSHG    11275
603960.XSHG    11264
dtype: float64
1246
time 2017-08-01
code
603903.XSHG    11545
603088.XSHG    11454
603040.XSHG    11379
603960.XSHG    11310
603966.XSHG    11286
dtype: float64
1248
time 2017-09-01
code
603040.XSHG    11983
600455.XSHG    11890
603326.XSHG    11672
603429.XSHG    11576
603229.XSHG    11490
dtype: float64
1309
time 2017-10-01
code
603040.XSHG    12019
600455.XSHG    11897
603326.XSHG    11673
600506.XSHG    11525
603229.XSHG    11497
dtype: float64
1309
time 2017-11-01
code
603960.XSHG    12511
603232.XSHG    12503
603859.XSHG    12377
603383.XSHG    12297
603500.XSHG    12238
dtype: float64
1352
time 2017-12-01
code
603232.XSHG    12533
603960.XSHG    12437
603859.XSHG    12353
603500.XSHG    12288
603040.XSHG    12275
dtype: float64
1352
df = pd.Panel(result)

Plot the monthly excess returns of the six portfolios

import matplotlib
matplotlib.rcParams['axes.unicode_minus']=False
index = ['Top20','port1','port2','port3','port4','port5']

def draw_backtest_picture(ind):
    plt.figure(figsize =(10,4))
    # Monthly return of the portfolio minus the benchmark's
    plt.plot(df.ix[:,ind,0]-df.ix[:,'benchmark',0], label = 'excess return: %s'%ind)
    plt.xlabel('backtest excess return of factor %s'%ind)
    plt.legend(loc=0)
    plt.grid()

for ind in index:
    draw_backtest_picture(ind)
