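See the research notebook.
The code prints progress as it runs because the computation is heavy, which leaves a long stretch of unfolded output.
Please ignore it.
Apparently a good strategy can earn some pocket money, so I guess I am the only one still posting research threads...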
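I like being a quant because I get to control my own destiny. - Ding Peng
Over the past couple of days I have been studying Ding Peng's 《量化投资:策略与技术》 (Quantitative Investment: Strategies and Techniques). It is an introductory book, and it does offer many practical methods and guidance.
Below I share its multi-factor model along with some empirical results.
Stock selection falls broadly into fundamental selection and market-behavior selection. Fundamental selection covers multi-factor models, style rotation models, and industry rotation models. Market-behavior selection covers money-flow selection, momentum and reversal models, consensus-expectation models, trend-following models, and chip-distribution selection.
Today's topic is the multi-factor model.
The multi-factor stock selection model is a widely used approach. It takes a set of factors as selection criteria: stocks that meet them are bought, and stocks that do not are sold. Because some factors are always at work in any market regime, the model is relatively stable.
Its advantage is that it combines a lot of information into a single selection result. Different choices of factors, and different ways of combining them into a final judgment, produce different models. Factors are generally combined by either scoring or regression, with scoring being the more common.
The years 2006 to 2011 serve as the sample period for factor testing and screening. The years 2012 to 2015 serve as the out-of-sample test (OS-test) to check backtest performance.
The universe is stocks that have been listed for more than one quarter, with benchmark = 000001.XSHG.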
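Factors are chosen based on market experience and economic logic. Choosing more, and more effective, factors strengthens the model's ability to capture information. Examples include fundamental indicators (PB, PE, EPS, growth rates), technical indicators (momentum, turnover, volatility), and other indicators (expected earnings growth, changes in analyst consensus, macroeconomic variables).
Given the data that JQ provides, I selected factors in the following three areas:
(1) Valuation: book-to-market ratio (B/M), earnings yield (EPS), dynamic P/E (PEG)
(2) Growth: ROE, ROA, gross margin on main operations (GP/R), net margin (P/R)
(3) Capital structure: debt-to-asset ratio (L/A), fixed-asset ratio (FAP), circulating market cap (CMV)
The effectiveness of these 10 factors is tested below. In the code, the EPS column holds basic earnings per share (income.basic_eps) and the PEG column holds valuation.pe_ratio.
Candidate factors are tested for effectiveness by sorting.
For any one factor, compute its value for every stock in the market at the start of the first month, sort the stock pool from smallest to largest, split it evenly into n portfolios, and hold them until month end. Rebalance the same way at the start of each following month. The model is built on data from a given sample period.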
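The sorting step can be illustrated without the platform's data API. The following is a minimal sketch in plain Python, not the code used in this post: the function name `split_into_groups`, the tickers, and the factor values are all made up for illustration, and stocks with no factor value are assumed to be dropped before sorting.

```python
def split_into_groups(factor_values, n_groups=5):
    """Sort stocks by factor value (ascending) and split them into
    n_groups consecutive groups whose sizes differ by at most one."""
    # Drop stocks with no factor value, then sort from smallest to largest.
    ranked = sorted(
        (code for code, value in factor_values.items() if value is not None),
        key=lambda code: factor_values[code],
    )
    size, remainder = divmod(len(ranked), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        # The first `remainder` groups take one extra stock.
        end = start + size + (1 if g < remainder else 0)
        groups.append(ranked[start:end])
        start = end
    return groups


# Hypothetical factor values for 11 made-up tickers (one has no value).
values = [0.8, 0.1, 0.5, None, 1.2, 0.3, 0.9, 0.2, 0.7, 0.4, 1.0]
factor_values = {"S%02d" % i: v for i, v in enumerate(values)}
groups = split_into_groups(factor_values, n_groups=5)
print(groups[0], groups[-1])
```

Splitting with running start and end indices guarantees that every ranked stock lands in exactly one group, which is easy to get wrong when the boundaries are written out by hand as slices.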
```python
import pandas as pd
from pandas import Series, DataFrame
import numpy as np
import statsmodels.api as sm
import scipy.stats as scs
import matplotlib.pyplot as plt
```
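Note: stocks with a circulating market cap above 50 billion yuan are excluded here to avoid the influence of heavyweight stocks. In the example there were originally 193 stocks, of which 13 were removed.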
```python
factors = ['B/M', 'EPS', 'PEG', 'ROE', 'ROA', 'GP/R', 'P/R', 'L/A', 'FAP', 'CMV']

# Fetch factor values at the start of the month
def get_factors(fdate, factors):
    stock_set = get_index_stocks('000001.XSHG', fdate)
    q = query(
        valuation.code,
        balance.total_owner_equities / valuation.market_cap / 100000000,  # B/M
        income.basic_eps,                                                 # EPS
        valuation.pe_ratio,                                               # PEG
        income.net_profit / balance.total_owner_equities,                 # ROE
        income.net_profit / balance.total_assets,                         # ROA
        income.total_profit / income.operating_revenue,                   # GP/R
        income.net_profit / income.operating_revenue,                     # P/R
        balance.total_liability / balance.total_assets,                   # L/A
        balance.fixed_assets / balance.total_assets,                      # FAP
        valuation.circulating_market_cap                                  # CMV
    ).filter(
        valuation.code.in_(stock_set),
        # No comparison is applied to circulating_market_cap here, so the
        # market-cap limit is not enforced and large caps stay in the result.
        valuation.circulating_market_cap
    )
    fdf = get_fundamentals(q, date=fdate)
    fdf.index = fdf['code']
    fdf.columns = ['code'] + factors
    return fdf.iloc[:, -10:]

fdf = get_factors('2015-01-01', factors)
fdf.head()
```
code | B/M | EPS | PEG | ROE | ROA | GP/R | P/R | L/A | FAP | CMV |
---|---|---|---|---|---|---|---|---|---|---|
600000.XSHG | 0.801232 | 0.6510 | 6.38 | 0.052452 | 0.003109 | 0.521640 | 0.400260 | 0.940733 | 0.002444 | 2341.3799 |
600004.XSHG | 0.668192 | 0.1900 | 12.89 | 0.028343 | 0.022621 | 0.238295 | 0.170218 | 0.201903 | 0.636765 | 125.7000 |
600005.XSHG | 1.045151 | 0.0340 | 66.09 | 0.009316 | 0.003532 | 0.023846 | 0.017453 | 0.620912 | 0.586177 | 361.3600 |
600006.XSHG | 0.669808 | 0.0318 | 56.26 | 0.008670 | 0.003421 | 0.018401 | 0.017138 | 0.605433 | 0.155339 | 119.0000 |
600007.XSHG | 0.332304 | 0.1400 | 36.47 | 0.028500 | 0.014895 | 0.345910 | 0.260073 | 0.477369 | 0.127698 | 154.0100 |
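```python
# Sort ascending by B/M. Series.order() belongs to the older pandas on the
# platform; later pandas versions call it sort_values().
score = fdf['B/M'].order()
score.head()
```

```
code
600301.XSHG   -0.045989
600444.XSHG   -0.029723
600228.XSHG   -0.026231
600217.XSHG   -0.026090
600876.XSHG   -0.010862
Name: B/M, dtype: float64
```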
Number of stocks in the pool:
len(score)
966
```python
startdate = '2015-01-01'
enddate = '2015-02-01'
nextdate = '2015-03-01'
df = {}
CMV = fdf['CMV']
n = len(score)
# Five groups from the smallest factor values (port1) to the largest (port5)
port1 = list(score.index)[: n // 5]
port2 = list(score.index)[n // 5: 2 * n // 5]
port3 = list(score.index)[2 * n // 5: -2 * n // 5]
port4 = list(score.index)[-2 * n // 5: -n // 5]
port5 = list(score.index)[-n // 5:]
```
15066.599999999999
```python
def caculate_port_monthly_return(port, startdate, enddate, nextdate, CMV):
    close1 = get_price(port, startdate, enddate, 'daily', ['close'])
    close2 = get_price(port, enddate, nextdate, 'daily', ['close'])
    # Cap-weighted return from the first close of this month
    # to the first close of the next month
    weighted_m_return = ((close2['close'].ix[0, :] / close1['close'].ix[0, :] - 1) * CMV).sum() / (CMV.ix[port].sum())
    # Equal-weighted alternative:
    # weighted_m_return = (close['close'].ix[-1,:]/close['close'].ix[0,:]-1).mean()
    return weighted_m_return

caculate_port_monthly_return(port1, '2015-01-01', '2015-02-01', '2015-03-01', fdf['CMV'])
```
0.042660461430416276
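To make the weighting explicit, here is a small worked example in plain Python with made-up prices and market caps. It is a sketch of a cap-weighted return, not the function above: `cap_weighted_return` and its inputs are hypothetical, and it assumes that a stock with a missing price is left out of both the numerator and the denominator. The function above appears to keep such a stock's cap in the denominator, which is a different convention.

```python
def cap_weighted_return(start_prices, end_prices, market_caps):
    """Market-cap-weighted return of a portfolio over one period.

    Stocks missing a price at either end are excluded from both the
    weighted sum and the total cap (an assumption of this sketch)."""
    weighted_sum = 0.0
    total_cap = 0.0
    for code, cap in market_caps.items():
        p0 = start_prices.get(code)
        p1 = end_prices.get(code)
        if p0 is None or p1 is None:
            continue
        weighted_sum += (p1 / p0 - 1.0) * cap
        total_cap += cap
    return weighted_sum / total_cap


# Made-up data: A gains 10%, B loses 5%, C has no end price.
start_prices = {"A": 10.0, "B": 20.0, "C": 5.0}
end_prices = {"A": 11.0, "B": 19.0}
market_caps = {"A": 100.0, "B": 300.0, "C": 50.0}

r = cap_weighted_return(start_prices, end_prices, market_caps)
print(r)
```

Here the result is (0.10 × 100 − 0.05 × 300) / 400 = −1.25%: the larger stock dominates, which is why the post excludes heavyweight stocks before forming groups.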
```python
def caculate_benchmark_monthly_return(startdate, enddate, nextdate):
    close1 = get_price(['000001.XSHG'], startdate, enddate, 'daily', ['close'])['close']
    close2 = get_price(['000001.XSHG'], enddate, nextdate, 'daily', ['close'])['close']
    # Only one column, so .sum() simply extracts the benchmark's return
    benchmark_return = (close2.ix[0, :] / close1.ix[0, :] - 1).sum()
    return benchmark_return

caculate_benchmark_monthly_return('2015-01-01', '2015-02-01', '2015-03-01')
```
-0.06632375461831419
```python
benchmark_return = caculate_benchmark_monthly_return(startdate, enddate, nextdate)
df['port1'] = caculate_port_monthly_return(port1, startdate, enddate, nextdate, CMV)
df['port2'] = caculate_port_monthly_return(port2, startdate, enddate, nextdate, CMV)
df['port3'] = caculate_port_monthly_return(port3, startdate, enddate, nextdate, CMV)
df['port4'] = caculate_port_monthly_return(port4, startdate, enddate, nextdate, CMV)
df['port5'] = caculate_port_monthly_return(port5, startdate, enddate, nextdate, CMV)
print(Series(df))
print('benchmark_return %s' % benchmark_return)
```

```
port1    0.042660
port2   -0.047200
port3    0.012783
port4   -0.063027
port5   -0.117817
dtype: float64
benchmark_return -0.0663237546183
```
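Data range: 2009 to 2015, 7 years in total.
The result, monthly_return, is panel data holding the monthly returns of the 5 portfolios and the benchmark for every factor over 7×12 months.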
```python
factors = ['B/M', 'EPS', 'PEG', 'ROE', 'ROA', 'GP/R', 'P/R', 'L/A', 'FAP', 'CMV']
# In the research module, fundamentals default to the day before the research
# date, so we build our own monthly date sequence.
year = ['2009', '2010', '2011', '2012', '2013', '2014', '2015']
month = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']
result = {}
for i in range(7 * 12):
    startdate = year[i // 12] + '-' + month[i % 12] + '-01'
    try:
        enddate = year[(i + 1) // 12] + '-' + month[(i + 1) % 12] + '-01'
    except IndexError:
        enddate = '2016-01-01'
    try:
        nextdate = year[(i + 2) // 12] + '-' + month[(i + 2) % 12] + '-01'
    except IndexError:
        if enddate == '2016-01-01':
            nextdate = '2016-02-01'
        else:
            nextdate = '2016-01-01'
    print('time %s' % startdate)
    fdf = get_factors(startdate, factors)
    CMV = fdf['CMV']
    # 5 portfolios plus the benchmark, 10 factors
    df = DataFrame(np.zeros(6 * 10).reshape(6, 10),
                   index=['port1', 'port2', 'port3', 'port4', 'port5', 'benchmark'],
                   columns=factors)
    for fac in factors:
        score = fdf[fac].order()
        n = len(score)
        # The +1 offsets leave out one stock at each group boundary
        port1 = list(score.index)[: n // 5]
        port2 = list(score.index)[n // 5 + 1: 2 * n // 5]
        port3 = list(score.index)[2 * n // 5 + 1: -2 * n // 5]
        port4 = list(score.index)[-2 * n // 5 + 1: -n // 5]
        port5 = list(score.index)[-n // 5 + 1:]
        df.ix['port1', fac] = caculate_port_monthly_return(port1, startdate, enddate, nextdate, CMV)
        df.ix['port2', fac] = caculate_port_monthly_return(port2, startdate, enddate, nextdate, CMV)
        df.ix['port3', fac] = caculate_port_monthly_return(port3, startdate, enddate, nextdate, CMV)
        df.ix['port4', fac] = caculate_port_monthly_return(port4, startdate, enddate, nextdate, CMV)
        df.ix['port5', fac] = caculate_port_monthly_return(port5, startdate, enddate, nextdate, CMV)
        df.ix['benchmark', fac] = caculate_benchmark_monthly_return(startdate, enddate, nextdate)
        print('factor %s' % fac)
    result[i + 1] = df
monthly_return = pd.Panel(result)
```
```
time 2009-01-01
factor B/M
factor EPS
factor PEG
factor ROE
factor ROA
factor GP/R
factor P/R
factor L/A
factor FAP
factor CMV
time 2009-02-01
...
time 2015-12-01
factor B/M
...
factor CMV
```

(Output abbreviated: the same ten factor lines are printed for each of the 84 months from 2009-01-01 to 2015-12-01.)
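The date bookkeeping in the loop above (start of the month, start of the next month, start of the month after that) can also be written without the try/except fallbacks. This is a minimal sketch in plain Python; `month_starts` is a hypothetical helper, not part of the post's code or of the platform.

```python
def month_starts(first_year, n_months):
    """Return n_months + 2 first-of-month date strings, starting in January
    of first_year, so that every month has a start, end and next date."""
    dates = []
    for k in range(n_months + 2):
        year, month = first_year + k // 12, k % 12 + 1
        dates.append("%04d-%02d-01" % (year, month))
    return dates


dates = month_starts(2009, 7 * 12)
# One (startdate, enddate, nextdate) triple per month of the 84-month sample
windows = list(zip(dates, dates[1:], dates[2:]))
print(windows[0])
print(windows[-1])
```

Generating two extra month starts up front removes the special cases at the end of 2015, because the last two windows simply reach into January and February 2016.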
monthly_return[:,:,'L/A']
portfolio | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
port1 | 0.085377 | 0.036437 | 0.171064 | 0.068139 | 0.031353 | 0.071192 | 0.151097 | -0.187432 | 0.106754 | 0.053656 | ... | 0.160294 | 0.175091 | 0.184501 | -0.173977 | -0.126870 | -0.129739 | 0.020640 | 0.092677 | 0.059210 | -0.024342 |
port2 | 0.103112 | 0.051596 | 0.187690 | 0.086921 | 0.023387 | 0.060204 | 0.174042 | -0.207356 | 0.082787 | 0.068394 | ... | 0.108189 | 0.156054 | 0.061243 | -0.171119 | -0.069029 | -0.126619 | -0.008985 | 0.042185 | 0.040142 | -0.059092 |
port3 | 0.147836 | 0.047156 | 0.180190 | 0.078112 | 0.043121 | 0.076799 | 0.247551 | -0.256659 | 0.092075 | 0.058820 | ... | 0.135661 | 0.189277 | 0.101805 | -0.176225 | -0.131486 | -0.116174 | 0.003847 | 0.059488 | 0.023647 | -0.034763 |
port4 | 0.128160 | 0.053384 | 0.206332 | 0.051714 | 0.060888 | 0.075527 | 0.181603 | -0.234078 | 0.096960 | 0.089126 | ... | 0.186579 | 0.251944 | 0.131996 | -0.204895 | -0.156339 | -0.126159 | -0.002444 | 0.058323 | 0.032130 | -0.083936 |
port5 | 0.087002 | 0.042383 | 0.176984 | 0.053508 | 0.075892 | 0.210837 | 0.127438 | -0.238531 | 0.106269 | 0.077865 | ... | 0.123681 | 0.153539 | -0.017625 | -0.102327 | -0.066117 | -0.107363 | -0.025164 | 0.042663 | 0.037691 | -0.036141 |
benchmark | 0.069637 | 0.040645 | 0.150264 | 0.063078 | 0.063037 | 0.105417 | 0.151070 | -0.224937 | 0.084953 | 0.056645 | ... | 0.142077 | 0.175884 | 0.077732 | -0.160505 | -0.106272 | -0.125943 | -0.007348 | 0.057813 | 0.039465 | -0.046307 |
6 rows × 84 columns
(monthly_return[:,:,'L/A'].T+1).cumprod().tail()
month | port1 | port2 | port3 | port4 | port5 | benchmark |
---|---|---|---|---|---|---|
80 | 2.333138 | 2.106109 | 1.935200 | 2.070456 | 2.384574 | 1.683733 |
81 | 2.381294 | 2.087185 | 1.942645 | 2.065396 | 2.324569 | 1.671362 |
82 | 2.601984 | 2.175233 | 2.058208 | 2.185855 | 2.423743 | 1.767989 |
83 | 2.756049 | 2.262552 | 2.106879 | 2.256086 | 2.515096 | 1.837762 |
84 | 2.688961 | 2.128853 | 2.033638 | 2.066720 | 2.424198 | 1.752661 |
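Once the model is built, compute for the n portfolios the annualized compound return, the excess return, and, under different market conditions, the probability that the high-return portfolio beats the benchmark and that the low-return portfolio trails it.
Quantitative criteria for effectiveness:
(1) Across portfolios 1 to n, the annualized compound returns should follow a clear ordering, meaning that factor size and return are strongly correlated. Let Xi be the annualized return of portfolio i; then the absolute correlation between Xi and i should satisfy Abs(Corr(Xi, i)) > MinCorr, where MinCorr is a given minimum correlation threshold.
(2) Let AR1 and ARn be the excess returns of the two extreme portfolios 1 and n, and let MinARtop and MinARbottom be the minimum excess-return thresholds.
if AR1 > ARn  # smaller factor, larger return
then require AR1 > MinARtop > 0 and ARn < MinARbottom < 0
if AR1 < ARn  # smaller factor, smaller return
then require ARn > MinARtop > 0 and AR1 < MinARbottom < 0
These conditions ensure that, of the portfolios with the largest and smallest factor values, one clearly beats the market and the other clearly trails it.
(3) Under any market conditions, the two extreme portfolios 1 and n beat or trail the market with fairly high probability.
Together, these three conditions pick out the factors that showed good stock-selection ability over the past period.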
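The three criteria can be tried out on toy numbers before running them on real data. The sketch below is plain Python with made-up monthly returns; `factor_tests` and its helpers are hypothetical names, and the comparison against the thresholds (MinCorr, MinARtop, MinARbottom) is left to the caller.

```python
import math


def compound(monthly):
    """Total return from a list of monthly returns."""
    total = 1.0
    for r in monthly:
        total *= 1.0 + r
    return total - 1.0


def annualize(total_return, years):
    return (1.0 + total_return) ** (1.0 / years) - 1.0


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def factor_tests(groups, benchmark):
    """groups: monthly-return lists ordered from the smallest factor value
    (portfolio 1) to the largest (portfolio n)."""
    months = len(benchmark)
    years = months / 12.0
    annual = [annualize(compound(g), years) for g in groups]
    bench_annual = annualize(compound(benchmark), years)
    # Criterion 1: correlation between portfolio index and annualized return
    corr = pearson(list(range(1, len(groups) + 1)), annual)
    # Criterion 2: excess returns of the two extreme portfolios
    first, last = 0, len(groups) - 1
    winner, loser = (first, last) if annual[first] > annual[last] else (last, first)
    # Criterion 3: share of months the winner beats / the loser trails the benchmark
    win_prob = sum(g > b for g, b in zip(groups[winner], benchmark)) / months
    loss_prob = sum(g < b for g, b in zip(groups[loser], benchmark)) / months
    return {
        "corr": corr,
        "excess_winner": annual[winner] - bench_annual,
        "excess_loser": annual[loser] - bench_annual,
        "win_prob": win_prob,
        "loss_prob": loss_prob,
    }


# Made-up data: 12 months of constant returns, decreasing with factor size.
groups = [[0.03] * 12, [0.02] * 12, [0.015] * 12, [0.005] * 12, [0.0] * 12]
benchmark = [0.01] * 12
result = factor_tests(groups, benchmark)
print(result)
```

With these toy numbers the smallest-factor portfolio is the winner, so the correlation is strongly negative and both probabilities equal 1; real data will of course be far noisier.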
```python
total_return = {}
annual_return = {}
excess_return = {}
win_prob = {}
loss_prob = {}
effect_test = {}
MinCorr = 0.3
Minbottom = -0.05
Mintop = 0.05
for fac in factors:
    effect_test[fac] = {}
    monthly = monthly_return[:, :, fac]
    total_return[fac] = (monthly + 1).T.cumprod().iloc[-1, :] - 1
    # The exponent 1/6 annualizes over 6 years; the 84 months of data span
    # 7 years, which would correspond to 1/7.
    annual_return[fac] = (total_return[fac] + 1) ** (1. / 6) - 1
    excess_return[fac] = annual_return[fac] - annual_return[fac][-1]
    # Test factor effectiveness
    # 1. Correlation between annualized return and portfolio index, compared with the threshold
    effect_test[fac][1] = annual_return[fac][0:5].corr(
        Series([1, 2, 3, 4, 5], index=annual_return[fac][0:5].index))
    # 2. Probability that the high-return portfolio beats the benchmark
    # Smaller factor, smaller return: port1 is the loser, port5 is the winner
    if total_return[fac][0] < total_return[fac][-2]:
        loss_excess = monthly.iloc[0, :] - monthly.iloc[-1, :]
        loss_prob[fac] = loss_excess[loss_excess < 0].count() / float(len(loss_excess))
        win_excess = monthly.iloc[-2, :] - monthly.iloc[-1, :]
        win_prob[fac] = win_excess[win_excess > 0].count() / float(len(win_excess))
        effect_test[fac][3] = [win_prob[fac], loss_prob[fac]]
        # Excess returns
        effect_test[fac][2] = [excess_return[fac][-2] * 100, excess_return[fac][0] * 100]
    # Smaller factor, larger return: port1 is the winner, port5 is the loser
    else:
        loss_excess = monthly.iloc[-2, :] - monthly.iloc[-1, :]
        loss_prob[fac] = loss_excess[loss_excess < 0].count() / float(len(loss_excess))
        win_excess = monthly.iloc[0, :] - monthly.iloc[-1, :]
        win_prob[fac] = win_excess[win_excess > 0].count() / float(len(win_excess))
        effect_test[fac][3] = [win_prob[fac], loss_prob[fac]]
        # Excess returns
        effect_test[fac][2] = [excess_return[fac][0] * 100, excess_return[fac][-2] * 100]

# effect_test[1]: factor correlation; passes if > 0.5 or < -0.5
# effect_test[2]: [winner excess return, loser excess return]
# effect_test[3]: [winner win probability, loser loss probability];
#                 [> 0.5, > 0.4] passes (in practice the loss probability is not considered for now)
DataFrame(effect_test)
```
test | B/M | CMV | EPS | FAP | GP/R | L/A | P/R | PEG | ROA | ROE |
---|---|---|---|---|---|---|---|---|---|---|
1 | 0.8575994 | -0.9803264 | -0.2417111 | -0.7649626 | 0.4933612 | -0.3150764 | 0.7777911 | -0.9718449 | 0.440798 | 0.33929 |
2 | [8.20404772765, 0.444474144989] | [48.222405245, 0.73877402636] | [5.37013402955, 3.7224243677] | [6.81859530015, -1.43912556501] | [5.19660373631, 3.03172671649] | [8.11916063464, 6.0994646067] | [5.50314924722, 2.78928297967] | [10.1571258275, -1.27862288343] | [6.16875242846, 1.86261189929] | [5.89690556199, 2.87424422408] |
3 | [0.583333333333, 0.47619047619] | [0.714285714286, 0.47619047619] | [0.547619047619, 0.47619047619] | [0.52380952381, 0.511904761905] | [0.571428571429, 0.47619047619] | [0.630952380952, 0.47619047619] | [0.559523809524, 0.47619047619] | [0.595238095238, 0.464285714286] | [0.595238095238, 0.47619047619] | [0.535714285714, 0.5] |
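Test results: 5 factors satisfy all three conditions above:
(1) Valuation: book-to-market ratio (B/M), dynamic P/E (PEG)
(2) Growth: net margin (P/R)
(3) Capital structure: fixed-asset ratio (FAP), circulating market cap (CMV)
Of these, CMV, FAP, and PEG earn more the smaller they are, while B/M and P/R earn more the larger they are.
Small caps are monsters!! When sorting by CMV, the smallest-CMV portfolio returned 14.6x in total, or 58% annualized!
Outside the CMV factor, the highest total return is FAP's port2 at about 2.7x. (This is also why the return correlation for the FAP portfolios is somewhat lower.)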
```python
effective_factors = ['B/M', 'PEG', 'P/R', 'FAP', 'CMV']
DataFrame(total_return).ix[:, effective_factors]
```
portfolio | B/M | PEG | P/R | FAP | CMV |
---|---|---|---|---|---|
port1 | 0.795662 | 1.980116 | 1.037343 | 1.515856 | 14.572930 |
port2 | 0.491867 | 1.270821 | 1.222759 | 2.716078 | 6.133603 |
port3 | 0.858536 | 1.135401 | 1.134942 | 1.530595 | 3.234002 |
port4 | 1.537122 | 0.916800 | 1.529077 | 0.805181 | 2.061771 |
port5 | 1.700595 | 0.633717 | 1.350320 | 0.619273 | 0.824615 |
benchmark | 0.752661 | 0.752661 | 0.752661 | 0.752661 | 0.752661 |
DataFrame(annual_return).ix[:,effective_factors]
portfolio | B/M | PEG | P/R | FAP | CMV |
---|---|---|---|---|---|
port1 | 0.102480 | 0.199607 | 0.125928 | 0.166221 | 0.580259 |
port2 | 0.068944 | 0.146473 | 0.142393 | 0.244555 | 0.387453 |
port3 | 0.108822 | 0.134784 | 0.134743 | 0.167357 | 0.271916 |
port4 | 0.167858 | 0.114541 | 0.167241 | 0.103452 | 0.205023 |
port5 | 0.180076 | 0.085249 | 0.153067 | 0.083644 | 0.105423 |
benchmark | 0.098035 | 0.098035 | 0.098035 | 0.098035 | 0.098035 |
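As a quick check on how the annualized figures relate to the total returns, the conversion can be redone by hand. The sketch below uses the total return of CMV port1 from the table above. The post's code uses the exponent 1/6, while the data covers 84 months, so both 6 and 7 years are shown for comparison; `annualize` is a hypothetical helper name.

```python
def annualize(total_return, years):
    """Convert a cumulative return over `years` years to a compound annual rate."""
    return (1.0 + total_return) ** (1.0 / years) - 1.0


cmv_port1_total = 14.572930  # total return of CMV port1 from the table above

six_year = annualize(cmv_port1_total, 6)    # exponent used in the post's code
seven_year = annualize(cmv_port1_total, 7)  # exponent matching 84 months of data
print(six_year, seven_year)
```

The 6-year exponent reproduces the roughly 58% shown in the table, while a 7-year exponent gives roughly 48%. The ranking of portfolios is unaffected, since the same exponent is applied to every column.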
```python
def draw_return_picture(df, fac):
    plt.figure(figsize=(10, 4))
    # Cumulative net value of each portfolio and the benchmark
    cumulative = (df.T + 1).cumprod()
    plt.plot(cumulative.ix[:, 0], label='port1')
    plt.plot(cumulative.ix[:, 1], label='port2')
    plt.plot(cumulative.ix[:, 2], label='port3')
    plt.plot(cumulative.ix[:, 3], label='port4')
    plt.plot(cumulative.ix[:, 4], label='port5')
    plt.plot(cumulative.ix[:, 5], label='benchmark')
    plt.xlabel('return of factor %s' % fac)
    plt.legend(loc=0)

for fac in effective_factors:
    draw_return_picture(monthly_return[:, :, fac], fac)
```
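Some factors share similar underlying logic, so the portfolios they select are highly correlated in constituents and returns. Such factors need redundancy removal, keeping within each group of similar factors the one with the best return and the strongest discrimination. The steps are:
(1) Score the n portfolios of each factor, with higher returns earning higher scores. Once a portfolio's score is set, assign it to every stock in that portfolio for that month.
if AR1 > ARn  # smaller factor, larger return
then portfolio i gets the score (n-i+1)
if AR1 < ARn  # smaller factor, smaller return
then portfolio i gets the score i
(2) Each month, compute the correlation matrix of the stocks' scores across factors. This gives the score correlation matrix for month t, Score_Corr(t, u, v), where u and v are factor indices.
(3) Average the correlation matrices over the sample period: with m months in the sample, sum the matrices and multiply by 1/m.
(4) Set a score-correlation threshold MinScoreCorr, and keep only the factors whose correlation with the other factors is low.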
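Steps (2) to (4) are not implemented in this post, so here is one possible sketch in plain Python on made-up scores. The names `mean_score_corr` and `drop_redundant` are hypothetical, and the rule used in step (4) is an assumption: factors are visited in a priority order (best first), and a factor is kept only if its mean score correlation with every factor already kept is below the threshold in absolute value.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def mean_score_corr(monthly_scores, factors):
    """monthly_scores: one dict per month, mapping factor -> {stock: score}.
    Returns {(u, v): correlation averaged over all months}."""
    pairs = list(combinations(factors, 2))
    sums = {pair: 0.0 for pair in pairs}
    for scores in monthly_scores:
        for u, v in pairs:
            # Only stocks scored on both factors enter the correlation.
            common = sorted(set(scores[u]) & set(scores[v]))
            sums[(u, v)] += pearson([scores[u][s] for s in common],
                                    [scores[v][s] for s in common])
    m = len(monthly_scores)
    return {pair: total / m for pair, total in sums.items()}


def drop_redundant(factors, mean_corr, min_score_corr):
    """Keep a factor only if it is weakly correlated with all kept factors."""
    kept = []
    for f in factors:
        corrs = [mean_corr[(k, f)] if (k, f) in mean_corr else mean_corr[(f, k)]
                 for k in kept]
        if all(abs(c) < min_score_corr for c in corrs):
            kept.append(f)
    return kept


# Made-up scores for 4 stocks, 3 factors, 2 months.
month1 = {"A": {"s1": 1, "s2": 2, "s3": 3, "s4": 4},
          "B": {"s1": 1, "s2": 2, "s3": 4, "s4": 3},
          "C": {"s1": 4, "s2": 1, "s3": 3, "s4": 2}}
month2 = {"A": {"s1": 1, "s2": 2, "s3": 3, "s4": 4},
          "B": {"s1": 1, "s2": 2, "s3": 3, "s4": 4},
          "C": {"s1": 4, "s2": 1, "s3": 3, "s4": 2}}
factors_abc = ["A", "B", "C"]
mean_corr = mean_score_corr([month1, month2], factors_abc)
kept = drop_redundant(factors_abc, mean_corr, min_score_corr=0.7)
print(mean_corr, kept)
```

In this toy data, factors A and B score the stocks almost identically, so B is dropped as redundant while C survives.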
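Using the effective factors selected, compute each stock's factor scores at the start of every month and take a weighted average across all factors. If a factor has no value in a given month, take the weighted average over the remaining factors. Rank the stocks by this weighted average score and trade the top-ranked ones.
The code below sums the factor scores with equal weights and selects the highest-scoring stocks to trade.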
```python
def score_stock(fdate):
    # CMV, FAP, PEG: smaller value -> higher return -> higher score, so rank descending;
    # B/M, P/R: larger value -> higher return, so rank ascending
    effective_factors = {'B/M': True, 'PEG': False, 'P/R': True, 'FAP': False, 'CMV': False}
    fdf = get_factors(fdate)
    score = {}
    for fac, value in effective_factors.items():
        score[fac] = fdf[fac].rank(ascending=value, method='first')
    # Equal-weight sum of the five factor ranks per stock, best first
    total = DataFrame(score).T.sum().order(ascending=False)
    print(total.head(5))
    return list(total.index), fdf['CMV']

def get_factors(fdate):
    factors = ['B/M', 'PEG', 'P/R', 'FAP', 'CMV']
    stock_set = get_index_stocks('000001.XSHG', fdate)
    q = query(
        valuation.code,
        balance.total_owner_equities / valuation.market_cap / 100000000,
        valuation.pe_ratio,
        income.net_profit / income.operating_revenue,
        balance.fixed_assets / balance.total_assets,
        valuation.circulating_market_cap
    ).filter(
        valuation.code.in_(stock_set)
    )
    fdf = get_fundamentals(q, date=fdate)
    fdf.index = fdf['code']
    fdf.columns = ['code'] + factors
    return fdf.iloc[:, -5:]

[score_result, CMV] = score_stock('2016-01-01')
```

```
code
600382.XSHG    4274
600638.XSHG    4224
600291.XSHG    4092
600791.XSHG    4078
600284.XSHG    4031
dtype: float64
```
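The code above uses an equal-weight sum. The weighted average described in the text, including the rule that a factor with no value is skipped and the remaining weights are renormalized, could look like the following plain-Python sketch. The function name `composite_scores`, the weights, and the scores are all made up for illustration.

```python
def composite_scores(factor_scores, weights):
    """factor_scores: factor -> {stock: score or None}.
    Returns {stock: weighted average over the factors that have a score}."""
    stocks = set()
    for scores in factor_scores.values():
        stocks.update(scores)
    result = {}
    for stock in stocks:
        num = den = 0.0
        for factor, w in weights.items():
            s = factor_scores[factor].get(stock)
            if s is None:
                continue  # no value for this factor this month: skip it
            num += w * s
            den += w
        if den > 0:
            result[stock] = num / den
    return result


# Hypothetical weights and scores; stock Z has no CMV score this month.
weights = {"B/M": 0.5, "CMV": 0.3, "PEG": 0.2}
factor_scores = {
    "B/M": {"X": 3, "Y": 1, "Z": 2},
    "CMV": {"X": 2, "Y": 3, "Z": None},
    "PEG": {"X": 1, "Y": 2, "Z": 3},
}
composite = composite_scores(factor_scores, weights)
ranked = sorted(composite, key=composite.get, reverse=True)
print(ranked)
```

Dividing by the sum of the weights actually used keeps a stock with a missing factor comparable to the others, whereas a plain sum would push it down the ranking simply for lacking data.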
```python
year = ['2009', '2010', '2011', '2012', '2013', '2014', '2015']
month = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']
factors = ['B/M', 'PEG', 'P/R', 'FAP', 'CMV']
result = {}
for i in range(7 * 12):
    startdate = year[i // 12] + '-' + month[i % 12] + '-01'
    try:
        enddate = year[(i + 1) // 12] + '-' + month[(i + 1) % 12] + '-01'
    except IndexError:
        enddate = '2016-01-01'
    try:
        nextdate = year[(i + 2) // 12] + '-' + month[(i + 2) % 12] + '-01'
    except IndexError:
        if enddate == '2016-01-01':
            nextdate = '2016-02-01'
        else:
            nextdate = '2016-01-01'
    print('time %s' % startdate)
    # After scoring on all 5 factors, split the ranked stocks into portfolios
    row = Series(np.zeros(7),
                 index=['Top20', 'port1', 'port2', 'port3', 'port4', 'port5', 'benchmark'])
    [score, CMV] = score_stock(startdate)
    n = len(score)
    port0 = score[:20]
    port1 = score[: n // 5]
    port2 = score[n // 5: 2 * n // 5]
    port3 = score[2 * n // 5: -2 * n // 5]
    port4 = score[-2 * n // 5: -n // 5]
    port5 = score[-n // 5:]
    print(n)
    row.ix['Top20'] = caculate_port_monthly_return(port0, startdate, enddate, nextdate, CMV)
    row.ix['port1'] = caculate_port_monthly_return(port1, startdate, enddate, nextdate, CMV)
    row.ix['port2'] = caculate_port_monthly_return(port2, startdate, enddate, nextdate, CMV)
    row.ix['port3'] = caculate_port_monthly_return(port3, startdate, enddate, nextdate, CMV)
    row.ix['port4'] = caculate_port_monthly_return(port4, startdate, enddate, nextdate, CMV)
    row.ix['port5'] = caculate_port_monthly_return(port5, startdate, enddate, nextdate, CMV)
    row.ix['benchmark'] = caculate_benchmark_monthly_return(startdate, enddate, nextdate)
    result[i + 1] = row
# One column per month, one row per portfolio
backtest_results = pd.DataFrame(result)
```
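As more people use the model, some factors will gradually stop working, and new ones may need to be added to the factor library. The factor weights also leave room for improvement. The model itself needs continual re-evaluation and refinement to keep up with market changes.
This study is purely a personal-interest project that reproduces the book's method (the book uses data from January 2005 to December 2010).
How to turn the validated factors into a strategy is something I will share next time~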