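This article is a JoinQuant (聚宽) implementation of a small part of the Orient Securities (东方证券) research report 《质优股量化投资》 (Quantitative Investing in High-Quality Stocks). It reuses some code from Uqer (优矿) and is shared for reference.
The factors used in the code are explained below.
Profitability factors:
GPOA: gross profit divided by total assets
GPM: gross profit margin
ROE: return on equity
ROA: return on total assets
INC: return on equity excluding non-recurring gains and losses
EPS: earnings per share
Growth factors:
ITRYOY: year-over-year growth rate of total operating revenue
IRYOY: year-over-year growth rate of operating revenue
IOPYOY: year-over-year growth rate of operating profit
INPYOY: year-over-year growth rate of net profit
INPTSYOY: year-over-year growth rate of net profit attributable to shareholders of the parent company
Valuation factors:
PB: price-to-book ratio
PE: price-to-earnings ratio
PS: price-to-sales ratio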
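As a quick illustration of the definitions above, the following sketch computes GPM, GPOA and PE from made-up statement figures. The function name `quality_ratios` and every number in it are hypothetical; they are not taken from the research report or from JoinQuant data.

```python
def quality_ratios(revenue, cost_of_sales, total_assets, price, eps):
    """Toy versions of three ratios from the list above (hypothetical inputs)."""
    gross_profit = revenue - cost_of_sales
    return {
        "GPM": gross_profit / revenue,        # gross profit margin
        "GPOA": gross_profit / total_assets,  # gross profit over total assets
        "PE": price / eps,                    # price-to-earnings ratio
    }

ratios = quality_ratios(revenue=200.0, cost_of_sales=120.0,
                        total_assets=400.0, price=30.0, eps=2.0)
print(ratios)
```

In the notebook itself these values come straight from JoinQuant's `indicator`, `income`, `balance` and `valuation` tables rather than being computed by hand.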
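Performance of the composite factor:
Backtest results when selecting ten stocks:
Backtest results after adding a stop-loss:
The conclusion is that value investing holds up quite well.
For the detailed analysis, please refer to the research report.
Note that this research notebook was run in the desktop financial terminal and then uploaded; the web version may not have enough memory to run it.
The backtest selects the top 10% of stocks: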
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as st
import time
#from CAL.PyCAL import *
from datetime import datetime, timedelta
from scipy.stats import ttest_ind
from multiprocessing.dummy import Pool as ThreadPool
import statsmodels.api as sm
from pandas import DataFrame,Series
import jqdata
sns.set_style('whitegrid')
import matplotlib as mpl
mpl.rcParams['font.family']='serif'
mpl.rcParams['font.serif']='SimHei'
mpl.rcParams['axes.unicode_minus']=False # render minus signs correctly with the CJK font
import warnings
warnings.filterwarnings("ignore")
TTM_factors = []
earningFactors = ['GPOA','GPM','ROE','ROA','INC','EPS']
growthFactors = ['ITRYOY','IRYOY','IOPYOY','INPYOY','INPTSYOY']
valueFactors = ['PE','PB','PS']
other = []
factors = earningFactors + growthFactors + valueFactors + other
fac_dict = {
    'MC': valuation.market_cap,  # total market capitalization
    'GP': indicator.gross_profit_margin * income.operating_revenue,  # gross profit = gross profit margin * operating revenue
    'OPI': income.operating_profit,  # operating profit (income statement); this key was 'OP' and was silently overwritten by the indicator entry below
    'OR': income.operating_revenue,  # operating revenue
    'NP': income.net_profit,  # net profit
    # EV: market cap plus debt items minus cash and cash equivalents
    'EV': (valuation.market_cap + balance.shortterm_loan + balance.non_current_liability_in_one_year
           + balance.longterm_loan + balance.bonds_payable + balance.longterm_account_payable
           - cash_flow.cash_and_equivalents_at_end),
    'TOE': balance.total_owner_equities,  # total owners' equity (yuan)
    'TOR': income.total_operating_revenue,  # total operating revenue
    'EBIT': income.net_profit + income.financial_expense + income.income_tax_expense,
    'GPOA': indicator.gross_profit_margin * income.operating_revenue / balance.total_assets,  # gross profit / total assets = gross profit margin * operating revenue / total assets
    'GPM': indicator.gross_profit_margin,  # gross profit margin
    'OPM': income.operating_profit / income.operating_revenue,  # operating profit margin
    'NPM': indicator.net_profit_margin,  # net profit margin
    'ROA': indicator.roa,  # ROA
    'ROE': indicator.roe,  # ROE
    'INC': indicator.inc_return,  # return on equity excluding non-recurring gains and losses (%)
    'EPS': indicator.eps,  # earnings per share
    'AP': indicator.adjusted_profit,  # net profit excluding non-recurring gains and losses (yuan)
    'OP': indicator.operating_profit,  # net income from operating activities (yuan)
    'VCP': indicator.value_change_profit,  # net income from value changes (yuan) = net fair-value gains + net investment income + net exchange gains
    'ETTR': indicator.expense_to_total_revenue,  # total operating cost / total operating revenue (%)
    'OPTTR': indicator.operation_profit_to_total_revenue,  # operating profit / total operating revenue (%)
    'NPTTR': indicator.net_profit_to_total_revenue,  # net profit / total operating revenue (%)
    'OETTR': indicator.operating_expense_to_total_revenue,  # operating expense / total operating revenue
    'GETTR': indicator.ga_expense_to_total_revenue,  # administrative expense / total operating revenue (%)
    'FETTR': indicator.financing_expense_to_total_revenue,  # financial expense / total operating revenue (%)
    'OPTP': indicator.operating_profit_to_profit,  # net income from operating activities / total profit (%)
    'IPTP': indicator.invesment_profit_to_profit,  # net income from value changes / total profit (%)
    'GSASTR': indicator.goods_sale_and_service_to_revenue,  # cash received from sales of goods and services / operating revenue (%)
    'OTR': indicator.ocf_to_revenue,  # net operating cash flow / operating revenue (%)
    'OTOP': indicator.ocf_to_operating_profit,  # net operating cash flow / net income from operating activities (%)
    'ITRYOY': indicator.inc_total_revenue_year_on_year,  # total operating revenue, year-over-year growth (%)
    'ITRA': indicator.inc_total_revenue_annual,  # total operating revenue, quarter-over-quarter growth (%)
    'IRYOY': indicator.inc_revenue_year_on_year,  # operating revenue, year-over-year growth (%)
    'IRA': indicator.inc_revenue_annual,  # operating revenue, quarter-over-quarter growth (%)
    'IOPYOY': indicator.inc_operation_profit_year_on_year,  # operating profit, year-over-year growth (%)
    'IOPA': indicator.inc_operation_profit_annual,  # operating profit, quarter-over-quarter growth (%)
    'INPYOY': indicator.inc_net_profit_year_on_year,  # net profit, year-over-year growth (%)
    'INPA': indicator.inc_net_profit_annual,  # net profit, quarter-over-quarter growth (%)
    'INPTSYOY': indicator.inc_net_profit_to_shareholders_year_on_year,  # net profit attributable to parent-company shareholders, year-over-year growth (%)
    'INPTSA': indicator.inc_net_profit_to_shareholders_annual,  # net profit attributable to parent-company shareholders, quarter-over-quarter growth (%)
    'ROIC': ((income.net_profit + income.financial_expense + income.income_tax_expense)
             / (balance.total_owner_equities + balance.shortterm_loan + balance.non_current_liability_in_one_year
                + balance.longterm_loan + balance.bonds_payable + balance.longterm_account_payable)),
    'OPTT': income.operating_profit / income.total_profit,  # operating profit as a share of total profit
    'TA': balance.total_assets,  # total assets
    'DER': balance.total_liability / balance.equities_parent_company_owners,  # debt-to-equity ratio = total liabilities / equity attributable to parent-company owners
    'FCFF/TNCL': (cash_flow.net_operate_cash_flow - cash_flow.net_invest_cash_flow) / balance.total_non_current_liability,  # free cash flow over non-current liabilities
    'NOCF/TL': cash_flow.net_operate_cash_flow / balance.total_liability,  # net operating cash flow / total liabilities
    'TCA/TCL': balance.total_current_assets / balance.total_current_liability,  # current ratio
    'PE': valuation.pe_ratio,  # price-to-earnings ratio
    'PB': valuation.pb_ratio,  # price-to-book ratio
    'PR': valuation.pcf_ratio,  # price-to-cash-flow ratio
    'PS': valuation.ps_ratio  # price-to-sales ratio
}
index = 'all'
start = '2006-09-02'
end = '2018-01-10'
interval = 20
date_list = list(jqdata.get_trade_days(start_date=start, end_date=end))
# rebalance dates: every `interval`-th trading day
trade_date_list = date_list[::interval]
# dates to backtest
date_list_to_backPro = [d for d in trade_date_list if d.strftime("%Y-%m-%d") > '2007-01-01']
len(date_list_to_backPro)
135
def get_factor_by_day(tdate):
    '''
    Fetch the factor values for one day, retrying up to three times.
    tdate: datetime.date or str in 'YYYY-MM-DD' format
    Returns None if all three attempts fail.
    '''
    cnt = 0
    while True:
        try:
            x = get_all_factors(tdate, factors, TTM_factors, fac_dict, index)
            return x
        except Exception as e:
            cnt += 1
            if cnt >= 3:
                print('error get factor data: ', tdate)
                break
'''tdate = trade_date_list[0]
get_factor_by_day(tdate)'''
# Fetch factor data
def get_all_factors(fdate, factors, TTM_factors, fac_dict, index):
    if index == 'all':
        stock_list = get_all_securities(types=['stock'], date=fdate).index.tolist()
    else:
        stock_list = get_index_stocks(index, date=fdate)
    # point-in-time factors
    q = query(valuation.code)  # stock code
    for fac in factors:
        q = q.add_column(fac_dict[fac])
    # filter() returns a new query, so the result has to be assigned back
    q = q.filter(valuation.code.in_(stock_list))
    fdf = get_fundamentals(q, date=fdate)
    fdf.index = fdf['code']
    fdf.columns = ['code'] + factors
    # TTM factors: sum over the four most recent report quarters
    # year
    if type(fdate) == str:
        date = fdate
    else:
        date = fdate.strftime('%Y-%m-%d')
    year = int(date[:4])
    # month and day
    month_and_day = date[5:10]
    # list of quarters
    if month_and_day < '05-01':
        statDate_list = [str(year-2) + "q4", str(year-1) + "q1", str(year-1) + "q2", str(year-1) + "q3"]
    elif month_and_day >= '05-01' and month_and_day < '09-01':
        statDate_list = [str(year-1) + "q2", str(year-1) + "q3", str(year-1) + "q4", str(year) + "q1"]
    elif month_and_day >= '09-01' and month_and_day < '11-01':
        statDate_list = [str(year-1) + "q3", str(year-1) + "q4", str(year) + "q1", str(year) + "q2"]
    elif month_and_day >= '11-01':
        statDate_list = [str(year-1) + "q4", str(year) + "q1", str(year) + "q2", str(year) + "q3"]
    q = query(valuation.code)  # stock code
    for fac in TTM_factors:
        q = q.add_column(fac_dict[fac])
    q = q.filter(valuation.code.in_(stock_list))
    TTM_fdf = ''
    for statDate in statDate_list:
        if type(TTM_fdf) == str:
            df = get_fundamentals(q, statDate=statDate)
            df.index = df['code']
            TTM_fdf = df
        else:
            df = get_fundamentals(q, statDate=statDate)
            df.index = df['code']
            TTM_fdf.iloc[:,1:] += df.iloc[:,1:]
    TTM_fdf.columns = ['code'] + TTM_factors
    fdf = fdf.merge(TTM_fdf, on=['code'], how='inner')
    fdf['tradeDate'] = fdate
    fdf = fdf[['code','tradeDate'] + TTM_factors + factors]
    # reset to a plain integer index and sort by stock code
    fdf.index.name = ''
    fdf.index = range(len(fdf))
    fdf = fdf.sort_values(by='code')
    return fdf
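The quarter-selection rule inside `get_all_factors` can be isolated as a small pure function, which makes the date thresholds easier to check. The sketch below mirrors the thresholds in the code above (1 May, 1 September, 1 November); the name `ttm_quarters` is hypothetical and not part of the notebook.

```python
def ttm_quarters(date_str):
    """Return the four report quarters summed for a TTM value on `date_str` ('YYYY-MM-DD')."""
    year = int(date_str[:4])
    month_and_day = date_str[5:10]
    # latest quarter assumed to be published on this date
    if month_and_day < '05-01':
        last_year, last_q = year - 1, 3
    elif month_and_day < '09-01':
        last_year, last_q = year, 1
    elif month_and_day < '11-01':
        last_year, last_q = year, 2
    else:
        last_year, last_q = year, 3
    # walk back four quarters from the latest one
    quarters = []
    y, q = last_year, last_q
    for _ in range(4):
        quarters.append('%dq%d' % (y, q))
        q -= 1
        if q == 0:
            y, q = y - 1, 4
    return quarters[::-1]

print(ttm_quarters('2017-06-15'))
```

The returned labels use the same `'2017q1'` style that the notebook passes to `get_fundamentals` through `statDate`.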
'''
fdate = '2017-03-01'
fdate = trade_date_list[0]
index = 'all'
df = get_all_factors(fdate, factors, [], fac_dict, index)
#df.index = df['code']
#df.ix[['000001.XSHE','601939.XSHG','601988.XSHG']]
df'''
def get_easy_factor_report(factor, month_return, direction):
    """
    Build a simple factor report. Financial stocks are excluded in the analysis.
    The index of month_return should match the index of factor.
    Inputs:
        factor: DataFrame, index = dates, columns = stock codes, values = factor values
        month_return: DataFrame, index = dates, columns = stock codes, values = stock returns
        direction: -1 goes long the highest-factor group and short the lowest; 1 does the reverse
    Returns:
        DataFrame with IC, IC_IR and p-value per universe before neutralization, the same
        statistics after neutralization, and the long-short performance per universe
    """
    columns = [c for c in factor.columns if c not in finance]
    universes = [
        get_universe_factor(factor, univ=univ_hs300).loc[:, columns],
        get_universe_factor(factor, univ=univ_zz500).loc[:, columns],
        factor,
    ]
    rows = []
    for fac in universes:
        fac_neu = pretreat_factor(fac)
        row = []
        # rank IC statistics before and after neutralization
        for f in (fac, fac_neu):
            ic = get_rank_ic(f, month_return)['IC']
            ic_valid = ic.dropna().tolist()
            # p-value of a t-test of the IC series against an all-zero sample
            pvalue = ttest_ind(ic_valid, [0] * len(ic_valid))[1]
            row += [ic.mean(), ic.mean() / ic.std(), pvalue]
        # long-short portfolio built from the decile groups of the neutralized factor
        excess = get_group_ret(fac_neu, month_return, n_quantile=10)
        ls_ret = (excess.iloc[:, np.sign(direction-1)] - excess.iloc[:, -np.sign(direction+1)]).fillna(0)
        # maximum drawdown of the cumulative net value, which starts at 1
        nav = (ls_ret + 1).cumprod()
        max_drawdown = max([1 - v / max(1, max(nav[:i+1])) for i, v in enumerate(nav)])
        win_ratio = float(len(ls_ret[ls_ret > 0])) / len(ls_ret)
        row += [ls_ret.mean(), win_ratio, max_drawdown, ls_ret.mean() / ls_ret.std()]
        rows.append(row)
    # summary table
    report_columns = pd.MultiIndex.from_arrays([
        ['Raw factor'] * 3 + ['Industry- and market-cap-neutralized factor'] * 7,
        ['IC', 'IC_IR', 'pvalue', 'IC', 'IC_IR', 'pvalue',
         'Long-short monthly return', 'Win rate', 'Max drawdown', 'Sharpe ratio']])
    report = pd.DataFrame(rows, index=['CSI 300', 'CSI 500', 'All A-shares'], columns=report_columns)
    return report
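The maximum-drawdown expression in `get_easy_factor_report` rebuilds the running peak for every period. A vectorized equivalent, assuming the same convention that the net value starts at 1, might look like the sketch below. The name `max_drawdown` and the sample returns are illustrative only.

```python
import numpy as np

def max_drawdown(period_returns):
    """Largest peak-to-trough loss of the cumulative net value, which starts at 1."""
    nav = np.cumprod(1.0 + np.asarray(period_returns, dtype=float))
    # running peak, never below the starting value of 1
    peak = np.maximum(np.maximum.accumulate(nav), 1.0)
    return float(np.max(1.0 - nav / peak))

returns = [0.10, -0.20, 0.05, -0.10]
print(max_drawdown(returns))
```

For these four periods the net value peaks at 1.1 and ends at 0.8316, so the drawdown is measured from the first period's peak to the last period.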
def group_mean_report_plot(group_return, direction=1):
    """
    Plot group returns.
    group_return: group returns, columns = group number, index = dates, values = portfolio return
    in each rebalancing period. Can be produced by get_group_ret.
    """
    fig = plt.figure(figsize=(12, 8))
    ax1 = fig.add_subplot(212)
    ax2 = ax1.twinx()
    ax3 = fig.add_subplot(211)
    ax2.grid(False)
    # long-short return per period (bars) and its cumulative net value (red line)
    month_return = (group_return.iloc[:, np.sign(direction-1)] - group_return.iloc[:, -np.sign(direction+1)]).fillna(0)
    ax1.bar(pd.to_datetime(month_return.index), month_return.values)
    ax2.plot(pd.to_datetime(month_return.index), (month_return.values+1).cumprod(), color='r')
    ax1.set_title(u"Factor performance in the CSI All Share universe (ex-financials)", fontsize=16)
    # mean excess return of each group: positive in red, negative in green
    excess_returns_means_dist = group_return.mean()
    excess_dist_plus = excess_returns_means_dist[excess_returns_means_dist>0]
    excess_dist_minus = excess_returns_means_dist[excess_returns_means_dist<0]
    lns2 = ax3.bar(excess_dist_plus.index, excess_dist_plus.values, align='center', color='r', width=0.35)
    lns3 = ax3.bar(excess_dist_minus.index, excess_dist_minus.values, align='center', color='g', width=0.35)
    ax3.set_xlim(left=0.5, right=len(excess_returns_means_dist)+0.5)
    ax3.set_xticks(excess_returns_means_dist.index)
    ax3.set_title(u"Excess return by factor group", fontsize=16)
    ax3.grid(True)
def get_rank_ic(factor, forward_return):
    """
    Compute the information coefficient (IC) of a factor.
    Inputs:
        factor: DataFrame, index = dates, columns = stock codes, values = factor values
        forward_return: DataFrame, index = dates, columns = stock codes, values = next-period stock returns
    Returns:
        DataFrame: index = dates, columns = IC and the p-value of the IC
    Note: factor and forward_return should share the same index and columns
    """
    common_index = factor.index.intersection(forward_return.index)
    ic_data = pd.DataFrame(index=common_index, columns=['IC','pValue'])
    # compute the correlation for each date
    for dt in ic_data.index:
        cor = factor.loc[dt].to_frame('corr')
        cor['ret'] = forward_return.loc[dt]
        cor = cor.dropna(subset=['corr', 'ret'])
        if len(cor) < 5:
            continue
        ic, p_value = st.spearmanr(cor['corr'], cor['ret'])  # Spearman rank correlation (rank IC)
        ic_data.loc[dt, 'IC'] = ic
        ic_data.loc[dt, 'pValue'] = p_value
    return ic_data
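Spearman rank IC is the Pearson correlation of the ranks, so a single cross-section can be checked with pandas alone. The following is a minimal sketch on made-up numbers; `rank_ic` is a hypothetical helper and the five stocks are invented.

```python
import pandas as pd

def rank_ic(factor_values, forward_returns):
    """Spearman rank correlation between one cross-section of factor values and returns."""
    df = pd.DataFrame({'factor': factor_values, 'ret': forward_returns}).dropna()
    return df['factor'].rank().corr(df['ret'].rank())

factor_values = pd.Series({'A': 0.5, 'B': 1.2, 'C': -0.3, 'D': 2.0, 'E': 0.9})
forward_returns = pd.Series({'A': 0.01, 'B': 0.03, 'C': -0.02, 'D': 0.02, 'E': 0.04})
ic = rank_ic(factor_values, forward_returns)
print(ic)
```

Only stocks D and E swap places between the two rankings, so the IC is positive but below 1.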
def get_group_ret(factor, month_ret, n_quantile=10):
    """
    Compute group excess returns. Both the groups and the benchmark are equal-weighted.
    Note: month_ret must lag factor by one period, i.e. it holds the return of the period after the factor date.
    Inputs:
        factor: DataFrame, index = dates, columns = stock codes, values = factor values
        month_ret: DataFrame, index = dates, columns = stock codes, values = returns, with the same date frequency as factor
        n_quantile: int, number of groups
    Returns:
        DataFrame: columns = group number, index = dates, values = portfolio return in each rebalancing period
    """
    # one column per quantile group
    cols = [i+1 for i in range(n_quantile)]
    excess_returns_means = pd.DataFrame(index=month_ret.index[:len(factor)], columns=cols)
    # mean excess return of each factor group
    for t, dt in enumerate(excess_returns_means.index):
        qt_mean_results = []
        # drop NaN values
        tmp_factor = factor.loc[dt].dropna()
        tmp_return = month_ret.loc[dt].dropna()
        tmp_return = tmp_return.reindex(tmp_factor.index)
        tmp_return_mean = tmp_return.mean()
        pct_quantiles = 1.0 / n_quantile
        for i in range(n_quantile):
            down = tmp_factor.quantile(pct_quantiles*i)
            up = tmp_factor.quantile(pct_quantiles*(i + 1))
            i_quantile_index = tmp_factor[(tmp_factor <= up) & (tmp_factor >= down)].index
            mean_tmp = tmp_return[i_quantile_index].mean() - tmp_return_mean
            qt_mean_results.append(mean_tmp)
        excess_returns_means.iloc[t] = qt_mean_results
    return excess_returns_means
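Because the `down` and `up` bounds in `get_group_ret` are inclusive on both sides, a stock sitting exactly on a quantile boundary is counted in two neighbouring groups. If non-overlapping groups are preferred, `pandas.qcut` applied to ranks is one option. The sketch below uses synthetic data and a hypothetical helper name; it is an alternative, not the notebook's method.

```python
import numpy as np
import pandas as pd

def group_excess_returns(factor_values, returns, n_quantile=10):
    """Equal-weighted excess return of each non-overlapping factor group."""
    df = pd.DataFrame({'factor': factor_values, 'ret': returns}).dropna()
    # rank first so that tied factor values cannot produce duplicate bin edges
    groups = pd.qcut(df['factor'].rank(method='first'), n_quantile, labels=False) + 1
    return df['ret'].groupby(groups).mean() - df['ret'].mean()

codes = ['s%02d' % i for i in range(20)]
factor_values = pd.Series(np.arange(20, dtype=float), index=codes)
returns = pd.Series(np.linspace(-0.095, 0.095, 20), index=codes)
excess = group_excess_returns(factor_values, returns, n_quantile=10)
print(excess)
```

With 20 stocks and 10 groups each group holds exactly two stocks, and the excess return rises from group 1 to group 10 because the synthetic returns increase with the factor.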
# Winsorize: clip values outside the 2.5% and 97.5% quantiles (modifies se in place)
def winsorize(se):
    q = se.quantile([0.025, 0.975])
    if isinstance(q, pd.Series) and len(q) == 2:
        se[se < q.iloc[0]] = q.iloc[0]
        se[se > q.iloc[1]] = q.iloc[1]
    return se
# Standardize to zero mean and unit standard deviation
def standardize(se):
    mean = se.mean()
    std = se.std()
    se = (se - mean)/std
    return se
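The same winsorize-then-standardize step can also be written without mutating its input by using `Series.clip`. This is a small sketch on synthetic data, and `winsorize_standardize` is an illustrative name rather than a function from the notebook.

```python
import numpy as np
import pandas as pd

def winsorize_standardize(se, lower=0.025, upper=0.975):
    """Clip to the given quantiles, then scale to zero mean and unit standard deviation."""
    lo, hi = se.quantile(lower), se.quantile(upper)
    clipped = se.clip(lower=lo, upper=hi)  # returns a new Series, `se` is untouched
    return (clipped - clipped.mean()) / clipped.std()

# 99 ordinary values plus one extreme outlier
raw = pd.Series(np.r_[np.arange(1.0, 100.0), 1000.0])
z = winsorize_standardize(raw)
print(z.describe())
```

The outlier of 1000 is pulled back to the 97.5% quantile before scaling, while `raw` itself keeps its original values.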
# Neutralize against industry and market cap
def neutralize(factor_se, market_cap_se, concept_se):
    stock_list = factor_se.index.tolist()
    # industry dummy variables
    groups = np.array(concept_se.reindex(stock_list).tolist())
    dummy = sm.categorical(groups, drop=True)
    # log market cap
    market_cap_log = np.log(market_cap_se.reindex(stock_list).tolist())
    # regressors
    X = np.c_[dummy, market_cap_log]
    # dependent variable
    y = factor_se.loc[stock_list]
    # fit
    model = sm.OLS(y, X)
    results = model.fit()
    # fitted values
    y_fitted = results.fittedvalues
    # the regression residual is the neutralized factor
    neutralize_factor_se = factor_se - y_fitted
    return neutralize_factor_se
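Neutralization keeps the residual of a cross-sectional regression of the factor on industry dummies and log market cap. The sketch below reproduces that idea with `numpy.linalg.lstsq` and `pandas.get_dummies` on synthetic data. It is not the notebook's statsmodels code; the function name, the three industries and all values are invented, and the three input Series are assumed to share the same index order.

```python
import numpy as np
import pandas as pd

def neutralize_sketch(factor_se, market_cap_se, industry_se):
    """Residual of factor ~ industry dummies + log(market cap), fitted by least squares."""
    dummies = pd.get_dummies(industry_se).astype(float)
    X = np.column_stack([dummies.values, np.log(market_cap_se.values)])
    y = factor_se.values.astype(float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return pd.Series(y - X.dot(beta), index=factor_se.index)

rng = np.random.RandomState(0)
codes = ['s%02d' % i for i in range(30)]
industry = pd.Series(['bank', 'tech', 'food'] * 10, index=codes)
mkt_cap = pd.Series(np.exp(rng.normal(10, 1, 30)), index=codes)
# synthetic factor with a deliberate size exposure
factor = pd.Series(0.5 * np.log(mkt_cap.values) + rng.normal(0, 1, 30), index=codes)
resid = neutralize_sketch(factor, mkt_cap, industry)
print(resid.groupby(industry).mean())
```

After the regression the residual has zero mean inside every industry and no linear relation to log market cap, which is what "industry and market-cap neutral" means here.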
def get_Atickers(date):
    """
    Return the stocks that have been listed for at least 60 trading days on the given date
    (following the construction rules of the CSI All Share index).
    Input:
        date: datetime.date or str in 'YYYY-MM-DD' format
    Returns:
        list of stock codes
    """
    df = get_all_securities(types=['stock'], date=date)
    daysBefore = jqdata.get_trade_days(end_date=date, count=60)[0]
    df['60DaysBefore'] = daysBefore
    df = df[df['start_date'] < df['60DaysBefore']]
    return df.index.tolist()
def pretreat_factor(factor_df, neu=True):
    """
    Factor preprocessing.
    Inputs:
        factor_df: DataFrame, index = dates, columns = stock codes, values = factor values
        neu: bool, whether to neutralize against industry and market cap. If True:
             winsorize -> neutralize -> standardize; otherwise: winsorize -> standardize
    Returns:
        factor_df: DataFrame, the processed factor
    """
    pretreat_data = factor_df.copy(deep=True)
    for dt in pretreat_data.index:
        concept_se = indu.loc[dt]
        market_cap_se = mkt.loc[dt]
        try:
            factor_dt = pretreat_data.loc[dt].dropna()
            if neu:
                pretreat_data.loc[dt] = standardize(neutralize(winsorize(factor_dt), market_cap_se, concept_se))
            else:
                pretreat_data.loc[dt] = standardize(winsorize(factor_dt))
        except Exception as excp:
            print(dt)
            print(excp)
            continue
    return pretreat_data
all_stocks = get_all_securities(types=['stock'], date=None).index.tolist()
def getPriceData(date):
    # closing prices of all stocks on one day
    price = get_price(all_stocks, start_date=date, end_date=date, frequency='1d', fields=['close'])['close']
    return price
#getPriceData('2007-01-04')
def getStockIndustry(fdate):
    stock_list = get_all_securities(types=['stock'], date=fdate).index.tolist()
    # Shenwan level-1 industry codes
    industry_set = ['801010', '801020', '801030', '801040', '801050', '801080', '801110', '801120', '801130',
                    '801140', '801150', '801160', '801170', '801180', '801200', '801210', '801230', '801710',
                    '801720', '801730', '801740', '801750', '801760', '801770', '801780', '801790', '801880','801890']
    df = pd.DataFrame(index=stock_list, columns=[fdate])
    df.index.name = 'code'
    for industry_code in industry_set:
        industry = get_industry_stocks(industry_code, date=fdate)
        industry = list(set(industry) & set(df.index.tolist()))
        df.loc[industry, fdate] = industry_code
    return df.T
#getStockIndustry('2018-04-16')
def getStockMktValue(fdate):
    df = get_all_factors(fdate, ['MC'], [], fac_dict, index)
    df = df.pivot(index='tradeDate', columns='code', values='MC')
    return df
def get_universe_factor(factor, idx=None, univ=None):
    """
    Select the factor values of an index's constituents or of a given universe.
    Inputs:
        factor: DataFrame, index = dates, columns = stock codes, values = factor values
        idx: index code, 000300: CSI 300, 000905: CSI 500, 000985: CSI All Share
        univ: DataFrame, index = dates, one column 'code' holding the stock codes
    Returns:
        factor: DataFrame, factor values within the universe, index = dates, columns = stock codes
    """
    universe_factor = pd.DataFrame()
    if idx is not None:
        for date in factor.index:
            universe = get_idx_cons(idx, date)
            universe_factor = universe_factor.append(factor.loc[date, universe].to_frame(date).T)
    else:
        if univ is not None:
            for date in factor.index:
                universe = univ.loc[date, 'code'].tolist()
                universe_factor = universe_factor.append(factor.loc[date, universe].to_frame(date).T)
        else:
            raise Exception('Please specify index constituents or a universe')
    return universe_factor
def replace_nan_indu(factor):
    """Fill missing values with the industry median.
    Input:
        factor: DataFrame, index = dates, columns = stock codes, values = factor values
    Returns:
        factor: same layout, with missing values filled
    """
    fill_factor = pd.DataFrame()
    for date in factor.index:
        # factor values
        factor_array = factor.loc[date, :].to_frame('values')
        # industry labels
        indu_array = indu.loc[date, :].dropna().to_frame('industryName1')
        # merge
        factor_array = factor_array.merge(indu_array, left_index=True, right_index=True, how='inner')
        # industry medians
        mid = factor_array.groupby('industryName1').median()
        factor_array = factor_array.merge(mid, left_on='industryName1', right_index=True, how='left')
        # fill missing values with the industry median
        factor_array['values_x'] = factor_array['values_x'].fillna(factor_array['values_y'])
        # append this date's factor values to the result
        fill_factor = fill_factor.append(factor_array['values_x'].to_frame(date).T)
    return fill_factor
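For a single cross-section, the industry-median fill can be expressed with `groupby(...).transform('median')` instead of two merges. The sketch below assumes two aligned Series for one date; the helper name and the six stocks are invented for illustration.

```python
import numpy as np
import pandas as pd

def fill_with_industry_median(factor_se, industry_se):
    """Replace NaN factor values with the median of the stock's industry."""
    medians = factor_se.groupby(industry_se).transform('median')
    return factor_se.fillna(medians)

factor_se = pd.Series({'A': 1.0, 'B': np.nan, 'C': 3.0,
                       'D': 10.0, 'E': np.nan, 'F': 20.0})
industry_se = pd.Series({'A': 'bank', 'B': 'bank', 'C': 'bank',
                         'D': 'tech', 'E': 'tech', 'F': 'tech'})
filled = fill_with_industry_median(factor_se, industry_se)
print(filled)
```

Unlike `replace_nan_indu`, which drops stocks without an industry label through its inner merge, this sketch keeps such stocks and leaves their missing values unfilled.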
def get_idx_cons(idx, date):
    """
    Return the constituents of an index on a given day.
    Inputs:
        idx: str, index code
        date: str in 'YYYY-MM-DD' format
    Returns:
        list of constituent stock codes
    """
    universe_idx = get_index_stocks(idx, date=date)
    universe_A = get_Atickers(date)
    return list(set(universe_idx) & set(universe_A))
In this part we gather the data needed for the analysis that follows, including the factor data and the stock price data.
# Returns per rebalancing period
print('Loading stock price data...')
pool = ThreadPool(processes=16)
frame_list = pool.map(getPriceData, trade_date_list)
pool.close()
pool.join()
price = pd.concat(frame_list, axis=0)
#month_return = price.pct_change()
# forward return: the row for date t holds the return from t to the next rebalance date
month_return = price.pct_change().shift(-1)
print('Stock price data loaded')
print('---------------------')
print('Building the stock universes defined above...')
univ, univ_zz500, univ_hs300 = pd.DataFrame(), pd.DataFrame(), pd.DataFrame()
trade_date_list = month_return.index.tolist()
for date in trade_date_list:
    current_universe = pd.Series(get_Atickers(date)).to_frame(name='code')
    current_universe.index = [date] * len(current_universe)
    univ = univ.append(current_universe)
    current_hs300_universe = pd.Series(get_idx_cons('000300.XSHG', date)).to_frame(name='code')
    current_hs300_universe.index = [date] * len(current_hs300_universe)
    univ_hs300 = univ_hs300.append(current_hs300_universe)
    current_zz500_universe = pd.Series(get_idx_cons('000905.XSHG', date)).to_frame(name='code')
    current_zz500_universe.index = [date] * len(current_zz500_universe)
    univ_zz500 = univ_zz500.append(current_zz500_universe)
print('Stock universes built')
print('--------------------')
print('Computing factor data...')
pool = ThreadPool(processes=16)
frame_list = pool.map(get_factor_by_day, trade_date_list)
pool.close()
pool.join()
factor_csv = pd.concat(frame_list, axis=0)
factor_csv.reset_index(inplace=True, drop=True)
print('Factor data computed')
print('--------------------')
Computing factor data... Factor data computed --------------------
print('Computing industry data...')
pool = ThreadPool(processes=16)
frame_list = pool.map(getStockIndustry, trade_date_list)
pool.close()
pool.join()
indu = pd.concat(frame_list, axis=0)
print('Industry data computed')
print('--------------------')
print('Computing market-cap data...')
pool = ThreadPool(processes=16)
frame_list = pool.map(getStockMktValue, trade_date_list)
pool.close()
pool.join()
mkt = pd.concat(frame_list, axis=0)
print('Market-cap data computed')
print('--------------------')
# Find financial stocks (banks and non-bank financials) so they can be excluded later
finance = indu.iloc[-1, :]
finance = finance[finance.isin(['801780', '801790'])].index
factor_csv = factor_csv.drop_duplicates(subset=['code','tradeDate'], keep='first', inplace=False)
startDate = [d for d in date_list_to_backPro if d.strftime("%Y-%m-%d") >= '2007-08-30'][0]
endDate = date_list_to_backPro[-2]
startDate,endDate
(datetime.date(2007, 8, 30), datetime.date(2017, 12, 8))
class FactorWeight():
    def __init__(self):
        pass
    @staticmethod
    def weighted(factor_dict, factor_weight):
        """
        Combine factors. The factors must be aligned with each other and with their weights.
        Inputs:
            factor_dict: dict, key = factor name, value = DataFrame (index = dates, columns = stock codes)
            factor_weight: factor weights, either a dict of scalars or a DataFrame with
                           index = dates, columns = factor names, values = weight in that period
        Returns:
            DataFrame: the combined factor
        """
        weighted_factor = 0
        for factor_name, factor in factor_dict.items():
            weighted_factor += factor.multiply(factor_weight[factor_name], axis=0)
        return weighted_factor
    @staticmethod
    def equal_weight(factor_dict):
        factor_weight = pd.Series([1. / len(factor_dict)] * len(factor_dict), index=factor_dict.keys()).to_dict()
        weighted_factor = FactorWeight.weighted(factor_dict, factor_weight)
        return weighted_factor
    @staticmethod
    def ic_weight(factor_dict, forward_month_return, window):
        # collect the IC series
        all_rolling_ic_list = []
        for factor_name, factor in factor_dict.items():
            ic = get_rank_ic(factor, forward_month_return)['IC'].astype(float)
            # rolling mean IC of this factor, lagged one period to avoid look-ahead
            ic = ic.rolling(window=window).mean()
            ic = ic.shift(1)
            ic.name = factor_name
            all_rolling_ic_list.append(ic)
        # merge into one DataFrame and normalize each row into weights
        all_rolling_ic_df = pd.concat(all_rolling_ic_list, axis=1)
        all_rolling_ic_df = all_rolling_ic_df.divide(all_rolling_ic_df.sum(axis=1), axis=0)
        # combine the factors
        weighted_factor = FactorWeight.weighted(factor_dict, all_rolling_ic_df)
        return weighted_factor
    @staticmethod
    def ic_ir_weight(factor_dict, forward_month_return, window):
        # collect the IC_IR series
        all_rolling_ic_ir_list = []
        for factor_name, factor in factor_dict.items():
            ic = get_rank_ic(factor, forward_month_return)['IC'].astype(float)
            # rolling IC_IR of this factor, lagged one period to avoid look-ahead
            ic_ir = ic.rolling(window=window).mean() / ic.rolling(window=window).std()
            ic_ir = ic_ir.shift(1)
            ic_ir.name = factor_name
            all_rolling_ic_ir_list.append(ic_ir)
        # merge into one DataFrame and normalize each row into weights
        all_rolling_ic_ir_df = pd.concat(all_rolling_ic_ir_list, axis=1)
        all_rolling_ic_ir_df = all_rolling_ic_ir_df.divide(all_rolling_ic_ir_df.sum(axis=1), axis=0)
        # combine the factors
        weighted_factor = FactorWeight.weighted(factor_dict, all_rolling_ic_ir_df)
        return weighted_factor, all_rolling_ic_ir_df
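The weighting step in `ic_ir_weight` can be tried in isolation on a small table of IC values. The sketch below assumes a DataFrame with one IC column per factor; the helper name `rolling_ic_ir_weights` and the IC numbers are made up for illustration.

```python
import pandas as pd

def rolling_ic_ir_weights(ic_df, window):
    """Per-date factor weights proportional to the lagged rolling IC_IR."""
    ic_ir = ic_df.rolling(window).mean() / ic_df.rolling(window).std()
    ic_ir = ic_ir.shift(1)  # only use ICs observed before each date
    return ic_ir.divide(ic_ir.sum(axis=1), axis=0)

dates = pd.date_range('2017-01-02', periods=8, freq='B')
ic_df = pd.DataFrame({
    'ROE': [0.04, 0.06, 0.05, 0.07, 0.03, 0.05, 0.06, 0.04],
    'ROA': [0.02, 0.01, 0.03, 0.02, 0.04, 0.01, 0.03, 0.02],
}, index=dates)
weights = rolling_ic_ir_weights(ic_df, window=3)
print(weights)
```

The first `window` rows are NaN because no full lagged window exists yet. Dividing by the row sum also assumes the IC_IR values of a date do not sum to something close to zero, otherwise the weights become unstable.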
## Earnings factors
factor_dict = {}
for facName in earningFactors:
    factor = get_universe_factor(replace_nan_indu(factor_csv.pivot(index='tradeDate', columns='code', values=facName)),univ=univ)
    factor_neu = pretreat_factor(factor, neu=True)
    factor_dict[facName] = factor_neu
earning_factor_compose, weight_df = FactorWeight().ic_ir_weight(factor_dict, month_return, 12)
# convert the index to datetime
factor = earning_factor_compose
factor.index = pd.to_datetime(factor.index)
columns = [c for c in factor.columns if c not in finance]
factor_report = get_easy_factor_report(factor.loc[startDate:endDate, columns], month_return.loc[startDate:endDate, :], -1)
factor_report
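In the table below, "raw" refers to the raw factor; the "neutralized" columns and the four long-short columns refer to the factor after industry and market-cap neutralization.
| Universe | IC (raw) | IC_IR (raw) | p-value (raw) | IC (neutralized) | IC_IR (neutralized) | p-value (neutralized) | Long-short monthly return | Win rate | Max drawdown | Sharpe ratio |
|---|---|---|---|---|---|---|---|---|---|---|
| CSI 300 | 0.051911 | 0.381916 | 2.587632e-05 | 0.036814 | 0.335032 | 2.110475e-04 | 0.011124 | 0.611111 | 0.241203 | 0.296951 |
| CSI 500 | 0.049733 | 0.463550 | 4.081009e-07 | 0.057505 | 0.674788 | 6.984649e-13 | 0.013194 | 0.666667 | 0.187564 | 0.407292 |
| All A-shares | 0.044062 | 0.569011 | 8.186156e-10 | 0.044315 | 0.578099 | 4.600589e-10 | 0.009133 | 0.642857 | 0.159361 | 0.330115 |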
factor_neu = pretreat_factor(factor.loc[startDate:endDate, columns], neu=False)
factor_neu_excess_returns = get_group_ret(factor_neu.loc[startDate:endDate, columns], month_return.loc[startDate:endDate, :], 10)
ax = group_mean_report_plot(factor_neu_excess_returns, -1)
## Growth factors
factor_dict = {}
for facName in growthFactors:
    factor = get_universe_factor(replace_nan_indu(factor_csv.pivot(index='tradeDate', columns='code', values=facName)),univ=univ)
    factor_neu = pretreat_factor(factor, neu=True)
    factor_dict[facName] = factor_neu
growth_factor_compose, weight_df = FactorWeight().ic_ir_weight(factor_dict, month_return, 12)
# convert the index to datetime
factor = growth_factor_compose
factor.index = pd.to_datetime(factor.index)
columns = [c for c in factor.columns if c not in finance]
factor_report = get_easy_factor_report(factor.loc[startDate:endDate, columns], month_return.loc[startDate:endDate, :], -1)
factor_report
| Universe | IC (raw) | IC_IR (raw) | p-value (raw) | IC (neutralized) | IC_IR (neutralized) | p-value (neutralized) | Long-short monthly return | Win rate | Max drawdown | Sharpe ratio |
|---|---|---|---|---|---|---|---|---|---|---|
| CSI 300 | 0.033818 | 0.375935 | 3.422561e-05 | 0.027904 | 0.375405 | 3.507752e-05 | 0.008791 | 0.595238 | 0.185598 | 0.264244 |
| CSI 500 | 0.032365 | 0.407479 | 7.537701e-06 | 0.041226 | 0.637598 | 9.135114e-12 | 0.011359 | 0.650794 | 0.084877 | 0.426981 |
| All A-shares | 0.033879 | 0.607347 | 6.911809e-11 | 0.033213 | 0.608571 | 6.376321e-11 | 0.008085 | 0.611111 | 0.056928 | 0.355129 |
facotr_neu = pretreat_factor(factor.ix[startDate:endDate, columns], neu=False)
factor_neu_excess_returns = get_group_ret(facotr_neu.ix[startDate:endDate, columns], month_return.ix[startDate:endDate, :], 10)
ax = group_mean_report_plot(factor_neu_excess_returns, -1)
quality_factor_compose = FactorWeight.equal_weight({'earning':earning_factor_compose, 'growth':growth_factor_compose})
"## \xe8\xb4\xa8\xe9\x87\x8f\xe5\x9b\xa0\xe5\xad\x90\nfactor_dict = {}\nfor facName in (earningFactors + growthFactors):\n factor = get_universe_factor(replace_nan_indu(factor_csv.pivot(index='tradeDate', columns='code', values=facName)),univ=univ)\n factor_neu = pretreat_factor(factor, neu=True)\n factor_dict[facName] = factor_neu\n \nquality_factor_compose, weight_df = FactorWeight().ic_ir_weight(factor_dict, month_return, 12)"
# Date conversion
factor = quality_factor_compose
date_str_list = map(lambda x:x.strftime('%Y-%m-%d'),factor.index)
factor.index = map(lambda x:datetime.strptime(x, '%Y-%m-%d'),date_str_list)
columns = filter(lambda x:x not in finance,factor.columns)
factor_report = get_easy_factor_report(factor.ix[startDate:endDate, columns], month_return.ix[startDate:endDate, :], -1)
factor_report
Universe | Raw factor IC | Raw factor IC_IR | Raw factor pvalue | Industry/cap-neutral IC | Industry/cap-neutral IC_IR | Industry/cap-neutral pvalue | Long-short monthly return | Win rate | Max drawdown | Sharpe ratio |
---|---|---|---|---|---|---|---|---|---|---|
CSI 300 | 0.055540 | 0.424996 | 3.125175e-06 | 0.042408 | 0.405978 | 8.117729e-06 | 0.008981 | 0.642857 | 0.228634 | 0.286641 |
CSI 500 | 0.048700 | 0.472505 | 2.496071e-07 | 0.058568 | 0.727632 | 1.564002e-14 | 0.013130 | 0.626984 | 0.085730 | 0.429916 |
All A-shares | 0.044597 | 0.613959 | 4.465081e-11 | 0.044487 | 0.621634 | 2.678478e-11 | 0.009650 | 0.666667 | 0.114691 | 0.376853 |
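The last four columns of the report describe the long-short portfolio. As a rough guide to how such numbers are usually obtained from a series of monthly long-short returns, here is a sketch. The exact definitions used by `get_easy_factor_report` are not shown in this post, so treat these as assumptions, in particular the Sharpe ratio, which I compute per period without annualization or a risk-free rate. The helper name `long_short_stats` is made up.

```python
import pandas as pd

def long_short_stats(ls_ret):
    """Summary statistics of a long-short return series (one value per period)."""
    nav = (1 + ls_ret).cumprod()
    # Running peak; clip at 1.0 so a loss in the first period counts as a drawdown
    peak = nav.cummax().clip(lower=1.0)
    drawdown = 1 - nav / peak
    return {
        "mean_return": ls_ret.mean(),
        "win_rate": (ls_ret > 0).mean(),      # share of periods with a positive return
        "max_drawdown": drawdown.max(),
        "sharpe_per_period": ls_ret.mean() / ls_ret.std(),
    }

# Toy long-short returns for four periods
ls_ret = pd.Series([0.10, -0.10, 0.05, -0.20])
stats = long_short_stats(ls_ret)
print(stats)
```

In the toy series the portfolio peaks at 1.1 after the first period and ends at 0.8316, so the maximum drawdown is 1 - 0.8316 / 1.1 = 0.244, and two of four periods are positive, giving a win rate of 0.5.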
factor_neu = pretreat_factor(factor.ix[startDate:endDate, columns], neu=False)
factor_neu_excess_returns = get_group_ret(factor_neu.ix[startDate:endDate, columns], month_return.ix[startDate:endDate, :], 10)
ax = group_mean_report_plot(factor_neu_excess_returns, -1)
factor_dict = {}
for facName in valueFactors:
    factor = get_universe_factor(replace_nan_indu(factor_csv.pivot(index='tradeDate', columns='code', values=facName)),univ=univ)
    if facName == 'PE':
        # Keep only positive P/E values before inverting
        factor = factor[factor>0]
    # Invert the ratio (e.g. P/E -> E/P); infinities from division by zero become NaN
    factor = (1 / factor).replace([-np.inf, np.inf], np.NaN)
    factor_neu = pretreat_factor(factor, neu=True)
    factor_dict[facName] = factor_neu
value_factor_compose, weight_df = FactorWeight().ic_ir_weight(factor_dict, month_return, 12)
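As I read the loop above, each valuation ratio is inverted before use (P/E becomes E/P and so on), and for P/E the non-positive values are dropped first. The toy example below shows the effect of those two steps on made-up numbers. The helper name `invert_valuation` and its `positive_only` flag are hypothetical and only wrap the same two pandas operations.

```python
import numpy as np
import pandas as pd

def invert_valuation(ratio, positive_only=False):
    """Turn a price multiple into a yield-style factor (higher = cheaper)."""
    if positive_only:
        ratio = ratio[ratio > 0]  # non-positive entries become NaN
    # 1 / 0 gives inf in pandas; replace it so it cannot distort later steps
    return (1 / ratio).replace([-np.inf, np.inf], np.nan)

# Toy P/E table: stock B has a loss on the first date, stock C a zero value
pe = pd.DataFrame({"A": [10.0, 20.0], "B": [-5.0, 25.0], "C": [0.0, 50.0]})
ep = invert_valuation(pe, positive_only=True)
print(ep)

# Toy P/B table without the positivity filter: the zero still ends up as NaN
pb = pd.DataFrame({"A": [0.0, 2.0]})
bp = invert_valuation(pb)
print(bp)
```

Without the positivity filter a P/E of -5 would turn into an E/P of -0.2 and stay in the cross-section. With the filter, that stock simply has no value factor on that date.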
# Date conversion
factor = value_factor_compose
date_str_list = map(lambda x:x.strftime('%Y-%m-%d'),factor.index)
factor.index = map(lambda x:datetime.strptime(x, '%Y-%m-%d'),date_str_list)
columns = filter(lambda x:x not in finance,factor.columns)
factor_report = get_easy_factor_report(factor.ix[startDate:endDate, columns], month_return.ix[startDate:endDate, :], -1)
factor_report
Universe | Raw factor IC | Raw factor IC_IR | Raw factor pvalue | Industry/cap-neutral IC | Industry/cap-neutral IC_IR | Industry/cap-neutral pvalue | Long-short monthly return | Win rate | Max drawdown | Sharpe ratio |
---|---|---|---|---|---|---|---|---|---|---|
CSI 300 | 0.063546 | 0.395906 | 1.327636e-05 | 0.054597 | 0.450103 | 8.427076e-07 | 0.014182 | 0.619048 | 0.204698 | 0.346838 |
CSI 500 | 0.070719 | 0.596500 | 1.406081e-10 | 0.057387 | 0.557036 | 1.732787e-09 | 0.011801 | 0.611111 | 0.135804 | 0.307697 |
All A-shares | 0.060943 | 0.677619 | 5.722654e-13 | 0.058574 | 0.660546 | 1.889161e-12 | 0.014107 | 0.626984 | 0.116313 | 0.416161 |
factor_neu = pretreat_factor(factor.ix[startDate:endDate, columns], neu=False)
factor_neu_excess_returns = get_group_ret(factor_neu.ix[startDate:endDate, columns], month_return.ix[startDate:endDate, :], 10)
ax = group_mean_report_plot(factor_neu_excess_returns, -1)
factor_compose = FactorWeight.equal_weight({'quality':quality_factor_compose, 'value':value_factor_compose})
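The final composite, like the quality factor before it, comes from `FactorWeight.equal_weight`. That method's body is not shown in this part, so the sketch below is an assumption about what an equal-weight combination typically does: standardize each component cross-sectionally on every date, then take the simple average. The helper names `zscore_rows` and `equal_weight_sketch` are mine.

```python
import pandas as pd

def zscore_rows(df):
    """Standardize each row (date) to zero mean and unit standard deviation."""
    return df.sub(df.mean(axis=1), axis=0).div(df.std(axis=1), axis=0)

def equal_weight_sketch(factor_dict):
    """Average of the row-standardized component factors.

    A stock missing in any component is NaN in the result.
    """
    z = [zscore_rows(f) for f in factor_dict.values()]
    return sum(z) / len(z)

# Toy components: one date, three made-up stocks
quality = pd.DataFrame([[1.0, 2.0, 3.0]], columns=list("ABC"))
value = pd.DataFrame([[10.0, 30.0, 20.0]], columns=list("ABC"))
composite = equal_weight_sketch({"quality": quality, "value": value})
print(composite)
```

Standardizing first matters because the components can be on very different scales. In the toy data the raw value numbers are ten times larger, yet after z-scoring both components contribute equally.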
# Date conversion
factor = factor_compose
date_str_list = map(lambda x:x.strftime('%Y-%m-%d'),factor.index)
factor.index = map(lambda x:datetime.strptime(x, '%Y-%m-%d'),date_str_list)
columns = filter(lambda x:x not in finance,factor.columns)
factor_report = get_easy_factor_report(factor.ix[startDate:endDate, columns], month_return.ix[startDate:endDate, :], -1)
factor_report
Universe | Raw factor IC | Raw factor IC_IR | Raw factor pvalue | Industry/cap-neutral IC | Industry/cap-neutral IC_IR | Industry/cap-neutral pvalue | Long-short monthly return | Win rate | Max drawdown | Sharpe ratio |
---|---|---|---|---|---|---|---|---|---|---|
CSI 300 | 0.083652 | 0.636234 | 1.002109e-11 | 0.070079 | 0.665316 | 1.355704e-12 | 0.020218 | 0.690476 | 0.169976 | 0.509479 |
CSI 500 | 0.081217 | 0.817618 | 1.712030e-17 | 0.075381 | 0.832855 | 5.193972e-18 | 0.017878 | 0.698413 | 0.135111 | 0.506639 |
All A-shares | 0.074043 | 1.058150 | 4.268384e-26 | 0.070227 | 1.002462 | 4.901411e-24 | 0.018616 | 0.777778 | 0.066582 | 0.680651 |
factor_neu = pretreat_factor(factor.ix[startDate:endDate, columns], neu=False)
factor_neu_excess_returns = get_group_ret(factor_neu.ix[startDate:endDate, columns], month_return.ix[startDate:endDate, :], 10)
ax = group_mean_report_plot(factor_neu_excess_returns, -1)
correlation = 0
for date in earning_factor_compose.index[12:]:
    tmp = pd.concat([earning_factor_compose.loc[date, :].to_frame('Earnings'),
                     growth_factor_compose.loc[date, :].to_frame('Growth'),
                     #quality_factor_compose.loc[date, :].to_frame('Quality'),
                     value_factor_compose.loc[date, :].to_frame('Value')], axis=1)
    correlation = tmp.corr() + correlation
# Average cross-sectional correlation over all dates after the first 12
correlation / len(earning_factor_compose.index[12:])
Factor | Earnings | Growth | Value |
---|---|---|---|
Earnings | 1.000000 | 0.369274 | 0.069565 |
Growth | 0.369274 | 1.000000 | -0.019578 |
Value | 0.069565 | -0.019578 | 1.000000 |
The average level of the factor differs considerably across industries.
# Industry names (Shenwan level-1 codes)
industry_name_sw1 = {'801740':'国防军工','801020':'采掘','801110':'家用电器','801160':'公用事业','801770':'通信','801010':'农林牧渔','801120':'食品饮料','801750':'计算机','801050':'有色金属','801890':'机械设备','801170':'交通运输','801710':'建筑材料','801040':'钢铁','801130':'纺织服装','801880':'汽车','801180':'房地产','801230':'综合','801760':'传媒','801200':'商业贸易','801780':'银行','801140':'轻工制造','801720':'建筑装饰','801080':'电子','801790':'非银金融','801030':'化工','801730':'电气设备','801210':'休闲服务','801150':'医药生物'}
indu_last = (indu.iloc[-1]).to_frame('indu')
fac_last = factor.iloc[-1, :].to_frame('all')
fac_indu_mean = pd.merge(fac_last, indu_last, how='inner', right_index=True, left_index=True).groupby('indu').mean()
indu_dist = pd.concat([fac_indu_mean], axis=1)
fig = plt.figure(figsize=(16, 8))
for i in range(indu_dist.shape[1]):
    k = 100 + indu_dist.shape[1] * 10 + i + 1  # subplot code: 1 row, shape[1] columns, panel i + 1
    ax = indu_dist.iloc[:, i].plot(kind='barh', ax=fig.add_subplot(k), color='r')
    ax.set_xlabel(indu_dist.columns[i])
    ax.set_xticklabels(ax.get_xticks(), rotation=45)
    if i == 0:
        # Separate name for the comprehension variable: in Python 2 it would otherwise overwrite the loop index i
        s = ax.set_yticklabels([industry_name_sw1[code].decode('utf-8') for code in indu_dist.index])
        s = ax.set_ylabel(u'Industry', fontsize=14)
    else:
        ax.set_yticklabels([])
        ax.set_ylabel('')
# Mean of each factor within each Shenwan level-1 industry
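During testing, to keep the factors comparable across stocks, financial stocks are excluded. Here "financial" means the Banks and Non-bank Financials industries under the Shenwan level-1 classification.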
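The code in this post filters columns against a list called `finance`, but how that list is built is not shown here. One plausible way, sketched below, is to look up the stocks whose Shenwan level-1 code is 801780 (Banks) or 801790 (Non-bank Financials), the two codes from the industry name table. The helper name `drop_financials` and the three-stock toy universe are only for illustration.

```python
import pandas as pd

# Shenwan level-1 codes treated as financial: Banks, Non-bank Financials
FINANCIAL_SW1 = ["801780", "801790"]

def drop_financials(factor, industry):
    """Remove columns (stocks) whose industry code is in FINANCIAL_SW1.

    factor:   rows = dates, columns = stock codes
    industry: Series mapping stock code -> Shenwan level-1 industry code
    """
    finance = industry[industry.isin(FINANCIAL_SW1)].index
    return factor.loc[:, ~factor.columns.isin(finance)]

# Toy universe: a bank, a broker and a food and beverage stock
industry = pd.Series({"000001.XSHE": "801780",
                      "600030.XSHG": "801790",
                      "600519.XSHG": "801120"})
factor = pd.DataFrame([[0.1, 0.2, 0.3]], columns=industry.index)
kept = drop_financials(factor, industry)
print(list(kept.columns))
```

Since industry membership can change over time, a real implementation would probably redo this lookup per date rather than once.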