

Factor Combination Strategy Based on ICIR

Posted by 美联储主席 on May 9, 18:40

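This post is a JoinQuant implementation of a small part of the Orient Securities research report 《质优股量化投资》 (Quantitative Investing in Quality Stocks). It reuses some code from Uqer (优矿) and is shared for reference.

The factors used in the code are explained below.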

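Profitability factors:

GPOA: gross profit divided by total assets
GPM: gross profit margin
ROE: return on equity
ROA: return on assets
INC: return on equity excluding non-recurring gains and losses
EPS: earnings per share

Growth factors:

ITRYOY: year-on-year growth rate of total operating revenue
IRYOY: year-on-year growth rate of operating revenue
IOPYOY: year-on-year growth rate of operating profit
INPYOY: year-on-year growth rate of net profit
INPTSYOY: year-on-year growth rate of net profit attributable to shareholders of the parent company

Valuation factors:

PB: price-to-book ratio
PE: price-to-earnings ratio
PS: price-to-sales ratio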

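Performance of the combined factor:

Backtest results when selecting ten stocks:

Backtest results after adding a stop-loss:

The conclusion is that value investing works rather well.

For the detailed analysis, please refer to the research report.

Note that the research notebook was run in the desktop financial terminal and then uploaded; the web version may run out of memory.

The backtest selects the top 10% of stocks: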

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as st
import time

#from CAL.PyCAL import *
from datetime import datetime, timedelta
from scipy.stats import ttest_ind
from multiprocessing.dummy import Pool as ThreadPool

import statsmodels.api as sm
from pandas import DataFrame,Series

import jqdata
sns.set_style('whitegrid')

import matplotlib as mpl
mpl.rcParams['font.family']='serif'
mpl.rcParams['font.serif']='SimHei'
mpl.rcParams['axes.unicode_minus']=False # render minus signs correctly with this font

import warnings
warnings.filterwarnings("ignore")

1 Parameter preparation

TTM_factors = []

earningFactors = ['GPOA','GPM','ROE','ROA','INC','EPS']
growthFactors = ['ITRYOY','IRYOY','IOPYOY','INPYOY','INPTSYOY']
valueFactors = ['PE','PB','PS']
other = []
factors = earningFactors + growthFactors + valueFactors + other

fac_dict = {
    'MC':valuation.market_cap, # total market capitalization
    'GP':indicator.gross_profit_margin * income.operating_revenue, # gross profit
    'OP':income.operating_profit, # operating profit
    'OR':income.operating_revenue, # operating revenue
    'NP':income.net_profit, # net profit
    # enterprise value: market cap + interest-bearing debt - cash
    'EV':valuation.market_cap + balance.shortterm_loan+balance.non_current_liability_in_one_year+balance.longterm_loan+balance.bonds_payable+balance.longterm_account_payable - cash_flow.cash_and_equivalents_at_end,
    
    'TOE':balance.total_owner_equities, # total owners' equity (CNY)
    'TOR':income.total_operating_revenue, # total operating revenue
    'EBIT':income.net_profit+income.financial_expense+income.income_tax_expense, # net profit + financial expense + income tax
    
    
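    'GPOA':indicator.gross_profit_margin * income.operating_revenue / balance.total_assets,  # gross profit / total assets = gross margin * operating revenue / total assets
    'GPM':indicator.gross_profit_margin, # gross profit margin
    'OPM':income.operating_profit / income.operating_revenue, # operating profit margin
    'NPM':indicator.net_profit_margin, # net profit margin
    'ROA':indicator.roa, # ROA
    'ROE':indicator.roe, # ROE
    'INC':indicator.inc_return, # return on equity excluding non-recurring gains and losses (%)
    'EPS':indicator.eps, # earnings per share
    'AP':indicator.adjusted_profit, # net profit after deducting non-recurring gains and losses (CNY)
    'OP':indicator.operating_profit, # net income from operating activities (CNY); NOTE: this repeats the key 'OP' used earlier in the dict, so it overrides income.operating_profit
    'VCP':indicator.value_change_profit, # net income from value changes (CNY) = fair-value change gains + investment gains + FX gains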
    
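    'ETTR':indicator.expense_to_total_revenue, # total operating cost / total operating revenue (%)
    'OPTTR':indicator.operation_profit_to_total_revenue, # operating profit / total operating revenue (%)
    'NPTTR':indicator.net_profit_to_total_revenue, # net profit / total operating revenue (%)
    'OETTR':indicator.operating_expense_to_total_revenue, # selling expense / total operating revenue
    'GETTR':indicator.ga_expense_to_total_revenue, # administrative expense / total operating revenue (%)
    'FETTR':indicator.financing_expense_to_total_revenue, # financial expense / total operating revenue (%)
    
    'OPTP':indicator.operating_profit_to_profit, # net income from operating activities / total profit (%)
    'IPTP':indicator.invesment_profit_to_profit, # net income from value changes / total profit (%)
    'GSASTR':indicator.goods_sale_and_service_to_revenue, # cash received from sales of goods and services / operating revenue (%)
    'OTR':indicator.ocf_to_revenue, # net operating cash flow / operating revenue (%)
    'OTOP':indicator.ocf_to_operating_profit, # net operating cash flow / net income from operating activities (%)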
    
    'ITRYOY':indicator.inc_total_revenue_year_on_year, # total operating revenue, year-on-year growth (%)
    'ITRA':indicator.inc_total_revenue_annual, # total operating revenue, period-on-period growth (%)
    'IRYOY':indicator.inc_revenue_year_on_year, # operating revenue, year-on-year growth (%)
    'IRA':indicator.inc_revenue_annual, # operating revenue, period-on-period growth (%)
    'IOPYOY':indicator.inc_operation_profit_year_on_year, # operating profit, year-on-year growth (%)
    'IOPA':indicator.inc_operation_profit_annual, # operating profit, period-on-period growth (%)
    'INPYOY':indicator.inc_net_profit_year_on_year, # net profit, year-on-year growth (%)
    'INPA':indicator.inc_net_profit_annual, # net profit, period-on-period growth (%)
    'INPTSYOY':indicator.inc_net_profit_to_shareholders_year_on_year, # net profit attributable to parent shareholders, year-on-year growth (%)
    'INPTSA':indicator.inc_net_profit_to_shareholders_annual, # net profit attributable to parent shareholders, period-on-period growth (%)
    
    
    # return on invested capital: EBIT / (equity + interest-bearing debt)
    'ROIC':(income.net_profit+income.financial_expense+income.income_tax_expense)/(balance.total_owner_equities+balance.shortterm_loan+balance.non_current_liability_in_one_year+balance.longterm_loan+balance.bonds_payable+balance.longterm_account_payable),
    'OPTT':income.operating_profit / income.total_profit, # operating profit as a share of total profit

    'TA':balance.total_assets, # total assets

    'DER':balance.total_liability / balance.equities_parent_company_owners, # debt-to-equity ratio = total liabilities / equity attributable to parent company owners
    'FCFF/TNCL':(cash_flow.net_operate_cash_flow - cash_flow.net_invest_cash_flow) / balance.total_non_current_liability, # free cash flow / non-current liabilities
    'NOCF/TL': cash_flow.net_operate_cash_flow / balance.total_liability, # net operating cash flow / total liabilities
    'TCA/TCL':balance.total_current_assets / balance.total_current_liability, # current ratio

    'PE':valuation.pe_ratio, # PE, price-to-earnings ratio
    'PB':valuation.pb_ratio, # PB, price-to-book ratio
    'PR':valuation.pcf_ratio, # PR, price-to-cash-flow ratio
    'PS':valuation.ps_ratio # PS, price-to-sales ratio
    }

index = 'all'
start = '2006-09-02'
end = '2018-01-10'
interval = 20

date_list = list(jqdata.get_trade_days(start_date = start, end_date = end))
#date_str_list = map(lambda x:x.strftime("%Y-%m-%d"),date_list)

# Dates to backtest: every `interval`-th trading day
trade_date_list = date_list[::interval]
#date_str_list = map(lambda x:x.strftime("%Y-%m-%d"),trade_date_list)

date_list_to_backPro = [x for x in trade_date_list if x.strftime("%Y-%m-%d") > '2007-01-01']
len(date_list_to_backPro)
135
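
The sampling of rebalance dates above can be tried without the JoinQuant calendar. The following is a minimal sketch that assumes a weekday-only calendar as a stand-in for `jqdata.get_trade_days` (real exchange holidays are ignored); `weekdays_between` and `rebalance_dates` are hypothetical helper names, not part of the original notebook.

```python
from datetime import date, timedelta

def weekdays_between(start, end):
    """Stand-in trading calendar: every Monday-Friday from start to end, inclusive."""
    days = []
    d = start
    while d <= end:
        if d.weekday() < 5:
            days.append(d)
        d += timedelta(days=1)
    return days

def rebalance_dates(trade_days, interval):
    """Keep every `interval`-th trading day, starting with the first one."""
    return trade_days[::interval]

calendar = weekdays_between(date(2018, 1, 1), date(2018, 3, 30))
picked = rebalance_dates(calendar, 20)
print(len(calendar), picked)
```

Slicing with a step gives the same dates as testing `index % interval == 0` for each element, but it avoids a linear search per date.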

2 Utility functions

def get_factor_by_day(tdate):
    '''
    Fetch the factor values for the given date, retrying up to three times.
    tdate: the trade date, either a 'YYYY-MM-DD' string or a date object
    Returns None if all three attempts fail.
    '''
    
    cnt = 0
    while True:
        try:
            x = get_all_factors(tdate, factors, TTM_factors, fac_dict, index)
            return x
        except Exception as e:
            cnt += 1
            if cnt >= 3:
                print('error get factor data: ', tdate)
                break
                
'''tdate = trade_date_list[0]                
get_factor_by_day(tdate)'''
# Fetch factor data
def get_all_factors(fdate, factors, TTM_factors, fac_dict, index):
    if index == 'all':
        stock_list = get_all_securities(types=['stock'], date=fdate).index.tolist()
    else:
        stock_list = get_index_stocks(index, date=fdate)
        
    # factor
    q = query(valuation.code) # stock code
    for fac in factors:
        q = q.add_column(fac_dict[fac]) 
    # filter() returns a new query, so the result has to be assigned back
    q = q.filter(valuation.code.in_(stock_list))
    fdf = get_fundamentals(q, date=fdate)
    fdf.index = fdf['code']
    fdf.columns = ['code'] + factors
    
    # TTM_factors
    # year
    if type(fdate) == str:
        date = fdate
    else:
        date = fdate.strftime('%Y-%m-%d')
        
    year = int(date[:4])
    # month and day
    month_and_day = date[5:10]

    # the four most recent quarters assumed to have been reported by this date
    if month_and_day < '05-01':
        statDate_list = [str(year-2) + "q4", str(year-1) + "q1", str(year-1) + "q2", str(year-1) + "q3"]
    elif month_and_day < '09-01':
        statDate_list = [str(year-1) + "q2", str(year-1) + "q3", str(year-1) + "q4", str(year) + "q1"]
    elif month_and_day < '11-01':
        statDate_list = [str(year-1) + "q3", str(year-1) + "q4", str(year) + "q1", str(year) + "q2"]
    else:
        statDate_list = [str(year-1) + "q4", str(year) + "q1", str(year) + "q2", str(year) + "q3"]

    q = query(valuation.code) # stock code
    for fac in TTM_factors:
        q = q.add_column(fac_dict[fac]) 
    # filter() returns a new query, so the result has to be assigned back
    q = q.filter(valuation.code.in_(stock_list))
    
    # sum the four quarterly values to obtain trailing-twelve-month figures
    TTM_fdf = None
    for statDate in statDate_list:    
        df = get_fundamentals(q, statDate=statDate)
        df.index = df['code']
        if TTM_fdf is None:
            TTM_fdf = df
        else:
            TTM_fdf.iloc[:,1:] += df.iloc[:,1:]
    TTM_fdf.columns = ['code'] + TTM_factors
    
    fdf=fdf.merge(TTM_fdf, on=['code'], how='inner')
    
    #fdf.index = fdf['code']
    
    fdf['tradeDate'] = fdate
    fdf = fdf[['code','tradeDate'] + TTM_factors + factors]
    # rows: keep all; columns: the commented-out slice below would return every factor except the stock code
    fdf.index = range(len(fdf))
    fdf = fdf.sort_values(by='code')
    return fdf#.iloc[:,1:]
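
The quarter selection inside `get_all_factors` can be isolated and checked on its own. The sketch below reproduces the same cut-off dates (May 1, September 1, November 1) in a standalone function; `last_four_quarters` is a hypothetical name introduced here for illustration, and the cut-offs are taken from the code above rather than from any official disclosure calendar.

```python
def last_four_quarters(date_str):
    """Return the four report quarters assumed available on date_str ('YYYY-MM-DD'), oldest first."""
    year = int(date_str[:4])
    month_and_day = date_str[5:10]

    # Latest quarter assumed to be published by this date
    if month_and_day < '05-01':
        y, q = year - 1, 3
    elif month_and_day < '09-01':
        y, q = year, 1
    elif month_and_day < '11-01':
        y, q = year, 2
    else:
        y, q = year, 3

    # Walk backwards four quarters, rolling over the year boundary
    quarters = []
    for _ in range(4):
        quarters.append('%dq%d' % (y, q))
        q -= 1
        if q == 0:
            y, q = y - 1, 4
    return quarters[::-1]

for d in ['2017-03-01', '2017-06-15', '2017-10-10', '2017-12-08']:
    print(d, last_four_quarters(d))
```

Summing a flow item (revenue, profit) over these four single-quarter reports is what produces the trailing-twelve-month value.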
'''
fdate = '2017-03-01'
fdate = trade_date_list[0]
index = 'all'
df = get_all_factors(fdate, factors, [], fac_dict, index)
#df.index = df['code']
#df.ix[['000001.XSHE','601939.XSHG','601988.XSHG']]
df'''
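def get_easy_factor_report(factor, month_return, direction):
    """
    Produce a simple factor analysis report. Note that the analysis excludes the financial sector.
    The index of month_return should match that of factor.
    Inputs:
        factor: DataFrame, index = dates, columns = stock codes, values = factor values
        month_return: DataFrame, index = dates, columns = stock codes, values = stock returns
        direction: 1 or -1; selects which extreme group is the long leg of the long-short portfolio
    Returns:
        DataFrame: IC, IC_IR and p-value of the raw factor in each universe, plus IC, IC_IR and long-short performance of the neutralized factor in each universe
    """
    columns = [x for x in factor.columns if x not in finance]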
    factor_hs300 = get_universe_factor(factor, univ=univ_hs300).loc[:, columns]
    factor_zz500 = get_universe_factor(factor, univ=univ_zz500).loc[:, columns]

    factor_hs300_neu = pretreat_factor(factor_hs300)
    factor_zz500_neu = pretreat_factor(factor_zz500)
    factor_a_neu = pretreat_factor(factor)

    # factor analysis before neutralization
    rank_ic_hs300 = get_rank_ic(factor_hs300, month_return)
    rank_ic_zz500 = get_rank_ic(factor_zz500, month_return)
    rank_ic_a = get_rank_ic(factor, month_return)

    rank_ic_hs300_mean = rank_ic_hs300['IC'].mean()
    rank_ic_zz500_mean = rank_ic_zz500['IC'].mean()
    rank_ic_a_mean = rank_ic_a['IC'].mean()

    rank_ic_hs300_pvalue = ttest_ind(rank_ic_hs300['IC'].dropna().tolist(), [0] * len(rank_ic_hs300.dropna()))[1]
    rank_ic_zz500_pvalue = ttest_ind(rank_ic_zz500['IC'].dropna().tolist(), [0] * len(rank_ic_zz500.dropna()))[1]
    rank_ic_a_pvalue = ttest_ind(rank_ic_a['IC'].dropna().tolist(), [0] * len(rank_ic_a.dropna()))[1]

    rank_ic_ir_hs300 = rank_ic_hs300['IC'].mean() / rank_ic_hs300['IC'].std()
    rank_ic_ir_zz500 = rank_ic_zz500['IC'].mean() / rank_ic_zz500['IC'].std()
    rank_ic_ir_a = rank_ic_a['IC'].mean() / rank_ic_a['IC'].std()

    # factor analysis after neutralization
    rank_ic_neu_hs300 = get_rank_ic(factor_hs300_neu, month_return)
    rank_ic_neu_zz500 = get_rank_ic(factor_zz500_neu, month_return)
    rank_ic_neu_a = get_rank_ic(factor_a_neu, month_return)

    rank_ic_hs300_neu_pvalue = ttest_ind(rank_ic_neu_hs300['IC'].dropna().tolist(), [0] * len(rank_ic_neu_hs300['IC'].dropna()))[1]
    rank_ic_zz500_neu_pvalue = ttest_ind(rank_ic_neu_zz500['IC'].dropna().tolist(), [0] * len(rank_ic_neu_zz500['IC'].dropna()))[1]
    rank_ic_a_neu_pvalue = ttest_ind(rank_ic_neu_a['IC'].dropna().tolist(), [0] * len(rank_ic_neu_a['IC'].dropna()))[1]

    rank_ic_neu_hs300_mean = rank_ic_neu_hs300['IC'].mean()
    rank_ic_neu_zz500_mean = rank_ic_neu_zz500['IC'].mean()
    rank_ic_neu_a_mean = rank_ic_neu_a['IC'].mean()

    rank_ic_ir_neu_hs300 = rank_ic_neu_hs300['IC'].mean() / rank_ic_neu_hs300['IC'].std()
    rank_ic_ir_neu_zz500 = rank_ic_neu_zz500['IC'].mean() / rank_ic_neu_zz500['IC'].std()
    rank_ic_ir_neu_a = rank_ic_neu_a['IC'].mean() / rank_ic_neu_a['IC'].std()

    hs300_excess_returns = get_group_ret(factor_hs300_neu, month_return, n_quantile=10)
    zz500_excess_returns = get_group_ret(factor_zz500_neu, month_return, n_quantile=10)
    a_excess_returns = get_group_ret(factor_a_neu, month_return, n_quantile=10)

    hs300_long_short_ret = (hs300_excess_returns.iloc[:, np.sign(direction-1)] - 
                            hs300_excess_returns.iloc[:, -np.sign(direction+1)]).fillna(0)
    zz500_long_short_ret = (zz500_excess_returns.iloc[:, np.sign(direction-1)] - zz500_excess_returns.iloc[:, -np.sign(direction+1)]).fillna(0)
    a_long_short_ret = (a_excess_returns.iloc[:, np.sign(direction-1)] - a_excess_returns.iloc[:, -np.sign(direction+1)]).fillna(0)

    hs300_long_short_month_ret = hs300_long_short_ret.mean()
    zz500_long_short_month_ret = zz500_long_short_ret.mean()
    a_long_short_month_ret = a_long_short_ret.mean()

    hs300_long_short_win_ratio = float(len(hs300_long_short_ret[hs300_long_short_ret > 0])) / len(hs300_long_short_ret)
    zz500_long_short_win_ratio = float(len(zz500_long_short_ret[zz500_long_short_ret > 0])) / len(zz500_long_short_ret)
    a_long_short_win_ratio = float(len(a_long_short_ret[a_long_short_ret > 0])) / len(a_long_short_ret)

    hs300_long_short_sharp_ratio = hs300_long_short_ret.mean() / hs300_long_short_ret.std()
    zz500_long_short_sharp_ratio = zz500_long_short_ret.mean() / zz500_long_short_ret.std()
    a_long_short_sharp_ratio = a_long_short_ret.mean() / a_long_short_ret.std()

    # maximum drawdown of the cumulative long-short return
    hs300_long_short_max_drawdown = max([1 - v/max(1, max((hs300_long_short_ret+1).cumprod()[:i+1])) for i,v in enumerate((hs300_long_short_ret+1).cumprod())])
    zz500_long_short_max_drawdown = max([1 - v/max(1, max((zz500_long_short_ret+1).cumprod()[:i+1])) for i,v in enumerate((zz500_long_short_ret+1).cumprod())])
    a_long_short_max_drawdown = max([1 - v/max(1, max((a_long_short_ret+1).cumprod()[:i+1])) for i,v in enumerate((a_long_short_ret+1).cumprod())])

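    # collect the results
    report = pd.DataFrame(index=['CSI 300', 'CSI 500', 'All A'], 
                          columns=[['Raw factor', 'Raw factor', 'Raw factor', 'Industry- and size-neutralized factor', 'Industry- and size-neutralized factor','Industry- and size-neutralized factor',
                                    'Industry- and size-neutralized factor','Industry- and size-neutralized factor','Industry- and size-neutralized factor', 'Industry- and size-neutralized factor'], 
                                   ['IC', 'IC_IR', 'pvalue', 'IC', 'IC_IR', 'pvalue', 'Long-short monthly return', 'Win rate', 'Max drawdown', 'Sharpe ratio']])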
    report.iloc[:, 0] = [rank_ic_hs300_mean, rank_ic_zz500_mean, rank_ic_a_mean]
    report.iloc[:, 1] = [rank_ic_ir_hs300, rank_ic_ir_zz500, rank_ic_ir_a]
    report.iloc[:, 2] = [rank_ic_hs300_pvalue, rank_ic_zz500_pvalue, rank_ic_a_pvalue]
    report.iloc[:, 3] = [rank_ic_neu_hs300_mean, rank_ic_neu_zz500_mean, rank_ic_neu_a_mean]
    report.iloc[:, 4] = [rank_ic_ir_neu_hs300, rank_ic_ir_neu_zz500, rank_ic_ir_neu_a]
    report.iloc[:, 5] = [rank_ic_hs300_neu_pvalue, rank_ic_zz500_neu_pvalue, rank_ic_a_neu_pvalue]
    report.iloc[:, 6] = [hs300_long_short_month_ret, zz500_long_short_month_ret, a_long_short_month_ret]
    report.iloc[:, 7] = [hs300_long_short_win_ratio, zz500_long_short_win_ratio, a_long_short_win_ratio]
    report.iloc[:, 8] = [hs300_long_short_max_drawdown, zz500_long_short_max_drawdown, a_long_short_max_drawdown]
    report.iloc[:, 9] = [hs300_long_short_sharp_ratio, zz500_long_short_sharp_ratio, a_long_short_sharp_ratio]
    return report
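
The maximum-drawdown list comprehensions in `get_easy_factor_report` recompute the running peak for every period. A vectorized sketch of the same definition is shown below, assuming NumPy is available; `max_drawdown` is a hypothetical helper name, and the peak is floored at the starting value of 1.0 exactly as in the original expression.

```python
import numpy as np

def max_drawdown(period_returns):
    """Largest peak-to-trough loss of the compounded return series."""
    nav = np.cumprod(1.0 + np.asarray(period_returns, dtype=float))
    # Running peak so far, never below the initial capital of 1.0
    peak = np.maximum(np.maximum.accumulate(nav), 1.0)
    return float(np.max(1.0 - nav / peak))

example_returns = [0.10, -0.20, 0.05, -0.10]
print(round(max_drawdown(example_returns), 6))
```

For this example the net asset value peaks after the first period and ends at its lowest point, so the drawdown is measured from that first peak to the final value.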
def group_mean_report_plot(group_return, direction=1):
    """
    Plot the group returns.
    group_return: group returns, columns = group number, index = dates, values = portfolio return for each rebalancing period. Can be produced by get_group_ret
    """
    fig = plt.figure(figsize=(12, 8))
    ax1 = fig.add_subplot(212)
    ax2 = ax1.twinx()
    ax3 = fig.add_subplot(211)
    ax2.grid(False)
    
    month_return = (group_return.iloc[:, np.sign(direction-1)] - group_return.iloc[:, -np.sign(direction+1)]).fillna(0)
    
    ax1.bar(pd.to_datetime(month_return.index), month_return.values)
    ax2.plot(pd.to_datetime(month_return.index), (month_return.values+1).cumprod(), color='r')
    ax1.set_title(u"Factor performance in the CSI All Share universe (ex-financials)", fontsize=16)
    
    excess_returns_means_dist = group_return.mean()
    excess_dist_plus = excess_returns_means_dist[excess_returns_means_dist>0]
    excess_dist_minus = excess_returns_means_dist[excess_returns_means_dist<0]
    lns2 = ax3.bar(excess_dist_plus.index, excess_dist_plus.values, align='center', color='r', width=0.35)
    lns3 = ax3.bar(excess_dist_minus.index, excess_dist_minus.values, align='center', color='g', width=0.35)

    ax3.set_xlim(left=0.5, right=len(excess_returns_means_dist)+0.5)
    ax3.set_xticks(excess_returns_means_dist.index)
    ax3.set_title(u"Excess return by factor group", fontsize=16)
    ax3.grid(True)
def get_rank_ic(factor, forward_return):
    """
    Compute the factor's information coefficient (IC).
    Inputs:
        factor: DataFrame, index = dates, columns = stock codes, values = factor values
        forward_return: DataFrame, index = dates, columns = stock codes, values = next-period stock returns
    Returns:
        DataFrame: index = dates, columns = IC and the p-value of the IC
    Note: the index and columns of factor and forward_return should match
    """
    common_index = factor.index.intersection(forward_return.index)
    ic_data = pd.DataFrame(index=common_index, columns=['IC','pValue'])

    # compute the correlation date by date
    for dt in ic_data.index:
        tmp_factor = factor.loc[dt]
        tmp_ret = forward_return.loc[dt]
        cor = pd.DataFrame(tmp_factor)
        ret = pd.DataFrame(tmp_ret)
        cor.columns = ['corr']
        ret.columns = ['ret']
        cor['ret'] = ret['ret']
        # keep only stocks that have both a factor value and a return
        cor = cor.dropna(subset=['corr', 'ret'])
        if len(cor) < 5:
            continue

        ic, p_value = st.spearmanr(cor['corr'], cor['ret'])   # Spearman rank correlation (rank IC)
        ic_data.loc[dt, 'IC'] = ic
        ic_data.loc[dt, 'pValue'] = p_value
    return ic_data
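
As an illustration of what `get_rank_ic` computes for a single date, here is a minimal sketch on toy data. It uses the fact that the Spearman correlation equals the Pearson correlation of the ranks, so only pandas is needed; `rank_ic` and the sample values are made up for the example and are not part of the notebook.

```python
import pandas as pd

def rank_ic(factor_row, return_row, min_obs=5):
    """Spearman rank correlation between one cross-section of factor values and returns."""
    pair = pd.concat([factor_row, return_row], axis=1, keys=['factor', 'ret']).dropna()
    if len(pair) < min_obs:
        return float('nan')
    # Pearson correlation of ranks is the Spearman correlation
    return pair['factor'].rank().corr(pair['ret'].rank())

codes = list('abcdef')
factor_row = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=codes)
# Returns increase with the factor, but not linearly
return_row = pd.Series([0.01, 0.02, 0.04, 0.08, 0.16, 0.32], index=codes)
ic = rank_ic(factor_row, return_row)
print(ic)
```

Because only the ordering matters, a monotonic but nonlinear relationship still gives a rank IC at the maximum, which is one reason rank IC is preferred over a plain correlation for noisy fundamentals.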
def get_group_ret(factor, month_ret, n_quantile=10):
    """
    Compute excess returns by factor group. Portfolios are equal-weighted, and so is the benchmark.
    Note: month_ret must hold the return of the period that follows each factor date (that is, forward returns)
    Inputs:
        factor: DataFrame, index = dates, columns = stock codes, values = factor values
        month_ret: DataFrame, index = dates, columns = stock codes, values = returns; its date frequency should match factor
        n_quantile: int, number of groups
    Returns:
        DataFrame: columns = group number, index = dates, values = portfolio excess return for each rebalancing period
    """
    # group labels 1..n_quantile
    cols_mean = [i+1 for i in range(n_quantile)]
    cols = cols_mean
    

    excess_returns_means = pd.DataFrame(index=month_ret.index[:len(factor)], columns=cols)

    # average excess return of each factor group
    for t, dt in enumerate(excess_returns_means.index):
        qt_mean_results = []

        # drop NaN
        tmp_factor = factor.loc[dt].dropna()
        tmp_return = month_ret.loc[dt].dropna()
        tmp_return = tmp_return.reindex(tmp_factor.index)
        tmp_return_mean = tmp_return.mean()

        pct_quantiles = 1.0 / n_quantile
        for i in range(n_quantile):
            down = tmp_factor.quantile(pct_quantiles*i)
            up = tmp_factor.quantile(pct_quantiles*(i + 1))
            # both bounds are inclusive, so a stock exactly on a boundary falls into two adjacent groups
            i_quantile_index = tmp_factor[(tmp_factor <= up) & (tmp_factor >= down)].index
            mean_tmp = tmp_return[i_quantile_index].mean() - tmp_return_mean
            qt_mean_results.append(mean_tmp)
        excess_returns_means.iloc[t] = qt_mean_results
    return excess_returns_means
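
For comparison, the grouping step for one date can also be written with `pd.qcut`. This is a sketch of a slightly different technique, not a drop-in replacement: it ranks first and assigns every stock to exactly one group, whereas `get_group_ret` uses inclusive quantile bounds on both sides. `group_excess_returns` and the toy data are assumptions made for the example.

```python
import pandas as pd

def group_excess_returns(factor_row, return_row, n_quantile=10):
    """Equal-weighted excess return of each factor quantile for one cross-section."""
    data = pd.concat([factor_row, return_row], axis=1, keys=['factor', 'ret']).dropna()
    # Rank first so that tied factor values cannot create duplicate bin edges
    ranks = data['factor'].rank(method='first')
    groups = pd.qcut(ranks, n_quantile, labels=False) + 1
    group_mean = data['ret'].groupby(groups).mean()
    # Benchmark is the equal-weighted mean of all stocks in the cross-section
    return group_mean - data['ret'].mean()

codes = ['s%02d' % i for i in range(20)]
factor_row = pd.Series(range(20), index=codes, dtype=float)
return_row = factor_row * 0.001   # toy returns that rise with the factor
excess = group_excess_returns(factor_row, return_row)
print(excess)
```

With a factor that truly orders returns, the group excess returns rise monotonically from group 1 to group 10, which is the pattern the bar chart in `group_mean_report_plot` is meant to reveal.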
# Winsorize: clip values to the 2.5% and 97.5% quantiles
def winsorize(se):
    q = se.quantile([0.025, 0.975])
    if isinstance(q, pd.Series) and len(q) == 2:
        se[se < q.iloc[0]] = q.iloc[0]
        se[se > q.iloc[1]] = q.iloc[1]
    return se

# Standardize to zero mean and unit standard deviation
def standardize(se):
    mean = se.mean()
    std = se.std()
    se = (se - mean)/std
    return se

# Neutralize: remove industry and size effects by regression
def neutralize(factor_se, market_cap_se, concept_se):
    
    stock_list = factor_se.index.tolist()
    
    # industry dummy variables
    groups = np.array(concept_se.reindex(stock_list).tolist())
    dummy = sm.categorical(groups, drop=True)
    
    # log of market capitalization
    market_cap_log = np.log(market_cap_se.reindex(stock_list).tolist())
    
    # independent variables
    X = np.c_[dummy,market_cap_log]
    # dependent variable
    y = factor_se.loc[stock_list]

    # fit
    model = sm.OLS(y,X)
    results = model.fit()
    
    # fitted values
    y_fitted = results.fittedvalues
    
    # the residual is the neutralized factor
    neutralize_factor_se = factor_se - y_fitted
    
    return neutralize_factor_se
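
To see what the neutralization step achieves, the following sketch repeats the regression on synthetic data with NumPy least squares instead of statsmodels. `neutralize_sketch`, the two industry labels and the random inputs are all assumptions for illustration; the regressors (industry dummies plus log market cap, no separate intercept) follow the function above.

```python
import numpy as np
import pandas as pd

def neutralize_sketch(factor, market_cap, industry):
    """Residual of factor after regressing on industry dummies and log market cap.

    All three inputs are Series sharing the same stock index (an assumption of this sketch).
    """
    X = pd.get_dummies(industry).astype(float)   # one 0/1 column per industry
    X['log_cap'] = np.log(market_cap)
    beta = np.linalg.lstsq(X.values, factor.values, rcond=None)[0]
    return factor - X.values.dot(beta)

rng = np.random.RandomState(0)
stocks = ['s%d' % i for i in range(8)]
industry = pd.Series(['bank'] * 4 + ['tech'] * 4, index=stocks)
market_cap = pd.Series(np.exp(rng.uniform(20, 25, size=8)), index=stocks)
# Synthetic factor with a built-in industry effect and a size effect
factor = pd.Series(rng.randn(8), index=stocks) + (industry == 'tech') * 2.0 + 0.5 * np.log(market_cap)

residual = neutralize_sketch(factor, market_cap, industry)
print(residual.groupby(industry).mean())
```

After the regression the residual has zero mean within every industry and is uncorrelated with log market cap, so a ranking on it no longer rewards a stock merely for being in a particular sector or size bucket.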
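def get_Atickers(date):
    """
    For a given date, return the stocks that had been listed for at least 60 trading days (following the CSI All Share index methodology).
    Input:
        date: str, 'YYYY-MM-DD' format
    Returns:
        list: stock tickers
    """
    # NOTE: the original notebook overrode the argument here with date = '2018-04-16',
    # which ignores the requested date and lets future listings leak into the universe.
    # The override is removed; the results shown in this post were presumably produced with it in place.
    df = get_all_securities(types=['stock'], date=date)
    daysBefore = jqdata.get_trade_days(end_date=date, count=60)[0]
    df['60DaysBefore'] = daysBefore
    df = (df[df['start_date'] < df['60DaysBefore']])
    return df.index.tolist()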
def pretreat_factor(factor_df, neu=True):
    """
    Factor preprocessing.
    Inputs:
        factor_df: DataFrame, index = dates, columns = stock codes, values = factor values
        neu: bool, whether to neutralize by industry and market cap. If True: winsorize -> neutralize -> standardize; otherwise: winsorize -> standardize
    Returns:
        factor_df: DataFrame, the processed factor
    """
    pretreat_data = factor_df.copy(deep=True)
    for dt in pretreat_data.index:
        concept_se = indu.loc[dt]
        market_cap_se = mkt.loc[dt]
        
        try:
            factor_dt = pretreat_data.loc[dt].dropna()
            if neu:
                pretreat_data.loc[dt] = standardize(neutralize(winsorize(factor_dt),market_cap_se,concept_se))
            else:
                pretreat_data.loc[dt] = standardize(winsorize(factor_dt))
        except Exception as excp:
            print (dt)
            print (excp)
            continue
    return pretreat_data
all_stocks = get_all_securities(types=['stock'], date=None).index.tolist()
def getPriceData(date):
    price=get_price(all_stocks, start_date=date, end_date=date, frequency='1d',fields=['close'])['close']
    return price

#getPriceData('2007-01-04')
def getStockIndustry(fdate):
    stock_list = get_all_securities(types=['stock'], date=fdate).index.tolist()
    industry_set = ['801010', '801020', '801030', '801040', '801050', '801080', '801110', '801120', '801130', 
              '801140', '801150', '801160', '801170', '801180', '801200', '801210', '801230', '801710',
              '801720', '801730', '801740', '801750', '801760', '801770', '801780', '801790', '801880','801890']
    df = pd.DataFrame(index = stock_list,columns = [fdate])
    df.index.name = 'code'
    for i in range(len(industry_set)):
        industry = get_industry_stocks(industry_set[i], date = fdate)
        industry = list(set(industry) & set(df.index.tolist()))
        df.loc[industry, fdate] = industry_set[i]
        
    return df.T

#getStockIndustry('2018-04-16')        
def getStockMktValue(fdate):
#    df = get_factors(fdate, ['MC'], fac_dict, index)
    df = get_all_factors(fdate, ['MC'], [], fac_dict, index)
    df = df.pivot(index='tradeDate', columns='code', values='MC')
    return df
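def get_universe_factor(factor, idx=None, univ=None):
    """
    Select the factor values for the constituents of an index or for a given universe.
    Inputs:
        factor: DataFrame, index = dates, columns = stock codes, values = factor values
        idx: index code, 000300: CSI 300, 000905: CSI 500, 000985: CSI All Share
        univ: DataFrame, index = dates, single column 'code' holding the stock codes
    Returns:
        factor: DataFrame restricted to the universe, index = dates, columns = stock codes, values = factor values
    """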
    universe_factor = pd.DataFrame()
    if idx is not None:
        for date in factor.index:
            universe = get_idx_cons(idx, date)
            universe_factor = universe_factor.append(factor.loc[date, universe].to_frame(date).T)
    else:
        if univ is not None:
            for date in factor.index:
                universe = univ.loc[date, 'code'].tolist()
                universe_factor = universe_factor.append(factor.loc[date, universe].to_frame(date).T)
        else:
            raise Exception('Please specify either an index (idx) or a universe (univ)')
    return universe_factor
def replace_nan_indu(factor):
    """Fill missing values with the industry median.
    Input:
        factor: DataFrame, index = dates, columns = stock codes, values = factor values
    Returns:
        factor: same layout, with missing values filled
    """ 
    fill_factor = pd.DataFrame()
    for date in factor.index:
        # factor values
        factor_array = factor.loc[date, :].to_frame('values')
        # industry labels
        indu_array = indu.loc[date, :].dropna().to_frame('industryName1')
        # merge
        factor_array = factor_array.merge(indu_array, left_index=True, right_index=True, how='inner')
        # industry median
        mid = factor_array.groupby('industryName1').median()
        factor_array = factor_array.merge(mid, left_on='industryName1', right_index=True, how='left')
        # fill missing values with the industry median
        factor_array['values_x'] = factor_array['values_x'].fillna(factor_array['values_y'])
        # append this date's factor data to the result
        fill_factor = fill_factor.append(factor_array['values_x'].to_frame(date).T)
    return fill_factor
def get_idx_cons(idx, date):
    """
    Get the list of index constituents on a given day.
    Inputs:
        idx: str, index code
        date: str, 'YYYY-MM-DD' format
    Returns:
        list: tickers of the index constituents
    """
    universe_idx = get_index_stocks(idx, date=date)
    
    universe_A = get_Atickers(date)
    return list(set(universe_idx) & set(universe_A))

3 Data preparation


This part collects the data needed for the analysis that follows, namely the factor data and the stock price data.

# Monthly returns
print ('Computing stock price data...')
pool = ThreadPool(processes=16)
frame_list = pool.map(getPriceData, trade_date_list)
pool.close()
pool.join()
price = pd.concat(frame_list, axis=0)
#month_return = price.pct_change()
# shift(-1) aligns each date with the return of the following period (forward return)
month_return = price.pct_change().shift(-1)

print ('Stock price data done')
print ('---------------------')
print ('Building the stock universes defined above...')
univ, univ_zz500, univ_hs300 = pd.DataFrame(), pd.DataFrame(), pd.DataFrame()
trade_date_list = month_return.index.tolist()
for date in trade_date_list:
    current_universe = pd.Series(get_Atickers(date)).to_frame(name='code')
    current_universe.index = [date] * len(current_universe)
    univ = univ.append(current_universe)
    
    current_hs300_universe = pd.Series(get_idx_cons('000300.XSHG', date)).to_frame(name='code')
    current_hs300_universe.index = [date] * len(current_hs300_universe)
    univ_hs300 = univ_hs300.append(current_hs300_universe)
    
    current_zz500_universe = pd.Series(get_idx_cons('000905.XSHG', date)).to_frame(name='code')
    current_zz500_universe.index = [date] * len(current_zz500_universe)
    univ_zz500 = univ_zz500.append(current_zz500_universe)
print ('Stock universes done')
print ('--------------------')
print ('Computing factor data...')

pool = ThreadPool(processes=16)
frame_list = pool.map(get_factor_by_day, trade_date_list)
pool.close()
pool.join()
factor_csv = pd.concat(frame_list, axis=0)
factor_csv.reset_index(inplace=True, drop=True)
print ('Factor data done')
print ('--------------------')
Computing factor data...
Factor data done
--------------------
print ('Computing industry data...')
pool = ThreadPool(processes=16)
frame_list = pool.map(getStockIndustry, trade_date_list)
pool.close()
pool.join()

indu = pd.concat(frame_list, axis=0)
print ('Industry data done')
print ('--------------------')
print ('Computing market cap data...')
pool = ThreadPool(processes=16)
frame_list = pool.map(getStockMktValue, trade_date_list)
pool.close()
pool.join()

mkt = pd.concat(frame_list, axis=0)
print ('Market cap data done')
print ('--------------------')
# Identify financial stocks (banks and non-bank financials) so they can be excluded later
finance = indu.iloc[-1, :]
finance = finance[finance.isin(['801780', '801790'])].index

4 Factor testing


factor_csv = factor_csv.drop_duplicates(subset=['code','tradeDate'], keep='first', inplace=False)
startDate = [x for x in date_list_to_backPro if x.strftime("%Y-%m-%d") >= '2007-08-30'][0]
endDate = date_list_to_backPro[-2]
startDate,endDate
(datetime.date(2007, 8, 30), datetime.date(2017, 12, 8))
  
class FactorWeight():
    def __init__(self):
        pass
    
    @staticmethod
    def weighted(factor_dict, factor_weight):
        """
        Combine factors into one. The factors must be aligned with each other, and each factor must be aligned with its weight.
        Inputs:
            factor_dict: dict of factors, key = factor name, value = DataFrame (index = dates, columns = stock codes)
            factor_weight: factor weights, a DataFrame with index = dates and one column per factor name holding that period's weight
        Returns:
            DataFrame: the combined factor
        """
        weighted_factor = 0
        for factor_name, factor in factor_dict.items():
            weighted_factor += factor.multiply(factor_weight[factor_name], axis=0)
        return weighted_factor

    @staticmethod
    def equal_weight(factor_dict):
        factor_weight = pd.Series([1. / len(factor_dict)] * len(factor_dict), index=factor_dict.keys()).to_dict()
        weighted_factor = FactorWeight.weighted(factor_dict, factor_weight)
        return weighted_factor
    
    @staticmethod
    def ic_weight(factor_dict, forward_month_return, window):
        
        # build the IC series
        all_rolling_ic_list = []
        for factor_name, factor in factor_dict.items():
            ic = get_rank_ic(factor, forward_month_return)['IC'].astype(float)
            # rolling mean IC of the current factor
            ic = ic.rolling(window=window).mean()
            # lag by one period so the weight only uses ICs known beforehand
            ic = ic.shift(1)
            ic.name = factor_name
            all_rolling_ic_list.append(ic)
        
        # merge into one DataFrame and normalize the weights to sum to 1 on each date
        all_rolling_ic_df = pd.concat(all_rolling_ic_list, axis=1)
        all_rolling_ic_df = all_rolling_ic_df.divide(all_rolling_ic_df.sum(axis=1), axis=0)
        # combine the factors
        weighted_factor = FactorWeight.weighted(factor_dict, all_rolling_ic_df)
        return weighted_factor
    
    @staticmethod
    def ic_ir_weight(factor_dict, forward_month_return, window):
        
        # build the IC_IR series
        all_rolling_ic_ir_list = []
        for factor_name, factor in factor_dict.items():
            ic = get_rank_ic(factor, forward_month_return)['IC'].astype(float)
            # rolling IC_IR of the current factor: mean IC / standard deviation of IC
            ic_ir = ic.rolling(window=window).mean() / ic.rolling(window=window).std()
            # lag by one period so the weight only uses ICs known beforehand
            ic_ir = ic_ir.shift(1)
            ic_ir.name = factor_name
            all_rolling_ic_ir_list.append(ic_ir)
        
        # merge into one DataFrame and normalize the weights to sum to 1 on each date
        all_rolling_ic_ir_df = pd.concat(all_rolling_ic_ir_list, axis=1)
        all_rolling_ic_ir_df = all_rolling_ic_ir_df.divide(all_rolling_ic_ir_df.sum(axis=1), axis=0)
        
        # combine the factors
        weighted_factor = FactorWeight.weighted(factor_dict, all_rolling_ic_ir_df)
        return weighted_factor, all_rolling_ic_ir_df
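
The ICIR weighting in `ic_ir_weight` can be illustrated on synthetic IC series. The sketch below assumes a modern pandas with the `.rolling()` API and uses made-up ICs drawn around 0.05 for three factor names; `icir_weights` is a hypothetical helper, not the notebook's function.

```python
import numpy as np
import pandas as pd

def icir_weights(ic_df, window):
    """Per-period factor weights proportional to the lagged rolling IC_IR."""
    ic_df = ic_df.astype(float)
    rolling = ic_df.rolling(window=window)
    # IC_IR = rolling mean of IC / rolling standard deviation of IC,
    # lagged one period so each weight only uses ICs observed earlier
    icir = (rolling.mean() / rolling.std()).shift(1)
    # Normalize so the weights on each date sum to 1
    return icir.divide(icir.sum(axis=1), axis=0)

rng = np.random.RandomState(1)
ic_df = pd.DataFrame(0.05 + 0.02 * rng.randn(24, 3),
                     index=pd.RangeIndex(24, name='period'),
                     columns=['ROE', 'ROA', 'EPS'])
weights = icir_weights(ic_df, window=12)
print(weights.tail(3))
```

The first `window` periods have no weight because the rolling statistics and the one-period lag need that much history. Note also that dividing by the row sum can produce very large weights when positive and negative IC_IR values nearly cancel, which is worth checking on real data.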
  

Profitability factors

factor_dict = {}
for facName in earningFactors:
    factor = get_universe_factor(replace_nan_indu(factor_csv.pivot(index='tradeDate', columns='code', values=facName)),univ=univ)
    factor_neu = pretreat_factor(factor, neu=True)
    factor_dict[facName] = factor_neu
earning_factor_compose, weight_df = FactorWeight().ic_ir_weight(factor_dict, month_return, 12)
# Date conversion
factor = earning_factor_compose
date_str_list = map(lambda x:x.strftime('%Y-%m-%d'),factor.index)
factor.index = map(lambda x:datetime.strptime(x, '%Y-%m-%d'),date_str_list)
columns = filter(lambda x:x not in finance,factor.columns)
factor_report = get_easy_factor_report(factor.ix[startDate:endDate, columns], month_return.ix[startDate:endDate, :], -1)
factor_report
              Raw factor                        Industry- and market-cap-neutralized factor
              IC        IC_IR     pvalue        IC        IC_IR     pvalue        Long-short monthly return  Win rate  Max drawdown  Sharpe ratio
CSI 300       0.051911  0.381916  2.587632e-05  0.036814  0.335032  2.110475e-04  0.011124                   0.611111  0.241203      0.296951
CSI 500       0.049733  0.463550  4.081009e-07  0.057505  0.674788  6.984649e-13  0.013194                   0.666667  0.187564      0.407292
All A-shares  0.044062  0.569011  8.186156e-10  0.044315  0.578099  4.600589e-10  0.009133                   0.642857  0.159361      0.330115
facotr_neu = pretreat_factor(factor.ix[startDate:endDate, columns], neu=False)
factor_neu_excess_returns = get_group_ret(facotr_neu.ix[startDate:endDate, columns], month_return.ix[startDate:endDate, :], 10)
ax = group_mean_report_plot(factor_neu_excess_returns, -1)
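
For readers who want to see what the IC and IC_IR columns in the report measure, here is a small sketch. It assumes the rank IC is the per-date Spearman correlation between factor values and forward returns, and that IC_IR is the mean of that series divided by its standard deviation. get_rank_ic and get_easy_factor_report themselves are defined earlier in the post, so rank_ic_series below is a hypothetical stand-in, written for Python 3, and the data is made up.

```python
import pandas as pd

def rank_ic_series(factor, forward_return):
    """Per-date Spearman rank correlation between factor values and forward returns."""
    ics = {}
    for date in factor.index.intersection(forward_return.index):
        pair = pd.concat([factor.loc[date], forward_return.loc[date]],
                         axis=1, keys=["f", "r"]).dropna()
        # Spearman correlation equals the Pearson correlation of the ranks.
        ics[date] = pair["f"].rank().corr(pair["r"].rank())
    return pd.Series(ics)

# Made-up data: two dates, four stocks.
stocks = ["s1", "s2", "s3", "s4"]
factor = pd.DataFrame([[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]],
                      index=["2017-01", "2017-02"], columns=stocks)
forward_return = pd.DataFrame([[0.01, 0.02, 0.03, 0.04], [0.02, 0.01, 0.03, 0.04]],
                              index=["2017-01", "2017-02"], columns=stocks)

ic = rank_ic_series(factor, forward_return)
ic_ir = ic.mean() / ic.std()
print(ic)
print(ic_ir)
```

How the pvalue column is computed is not shown in this part of the post, so the sketch leaves it out.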

Growth factors¶

## Growth factors
factor_dict = {}
for facName in growthFactors:
    factor = get_universe_factor(replace_nan_indu(factor_csv.pivot(index='tradeDate', columns='code', values=facName)),univ=univ)
    factor_neu = pretreat_factor(factor, neu=True)
    factor_dict[facName] = factor_neu
growth_factor_compose, weight_df = FactorWeight().ic_ir_weight(factor_dict, month_return, 12)
# Date conversion
factor = growth_factor_compose
date_str_list = map(lambda x:x.strftime('%Y-%m-%d'),factor.index)
factor.index = map(lambda x:datetime.strptime(x, '%Y-%m-%d'),date_str_list)
columns = filter(lambda x:x not in finance,factor.columns)
factor_report = get_easy_factor_report(factor.ix[startDate:endDate, columns], month_return.ix[startDate:endDate, :], -1)
factor_report
              Raw factor                        Industry- and market-cap-neutralized factor
              IC        IC_IR     pvalue        IC        IC_IR     pvalue        Long-short monthly return  Win rate  Max drawdown  Sharpe ratio
CSI 300       0.033818  0.375935  3.422561e-05  0.027904  0.375405  3.507752e-05  0.008791                   0.595238  0.185598      0.264244
CSI 500       0.032365  0.407479  7.537701e-06  0.041226  0.637598  9.135114e-12  0.011359                   0.650794  0.084877      0.426981
All A-shares  0.033879  0.607347  6.911809e-11  0.033213  0.608571  6.376321e-11  0.008085                   0.611111  0.056928      0.355129
facotr_neu = pretreat_factor(factor.ix[startDate:endDate, columns], neu=False)
factor_neu_excess_returns = get_group_ret(facotr_neu.ix[startDate:endDate, columns], month_return.ix[startDate:endDate, :], 10)
ax = group_mean_report_plot(factor_neu_excess_returns, -1)
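
The get_group_ret(..., 10) calls sort stocks into 10 groups by factor value. A rough sketch of that kind of grouping for a single date is below. get_group_ret is defined earlier in the post and may differ (for example, it may return excess rather than raw returns), so group_mean_returns is a hypothetical helper and the numbers are made up.

```python
import pandas as pd

def group_mean_returns(factor_row, return_row, n_groups):
    """Mean forward return of each factor-sorted group on one date."""
    pair = pd.concat([factor_row, return_row], axis=1, keys=["f", "r"]).dropna()
    # Rank first so tied factor values cannot produce duplicate bin edges.
    groups = pd.qcut(pair["f"].rank(method="first"), n_groups, labels=False)
    return pair["r"].groupby(groups).mean()

# Made-up cross-section: six stocks, three groups.
stocks = ["s1", "s2", "s3", "s4", "s5", "s6"]
factor_row = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=stocks)
return_row = pd.Series([0.01, 0.02, 0.03, 0.04, 0.05, 0.06], index=stocks)

group_ret = group_mean_returns(factor_row, return_row, n_groups=3)
print(group_ret)
```

Group 0 holds the lowest factor values and the last group the highest, so a monotonic pattern across groups is what a useful factor should show.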

Quality factor¶

quality_factor_compose = FactorWeight.equal_weight({'earning':earning_factor_compose, 'growth':growth_factor_compose})
# Alternative left unexecuted in the notebook as a string literal:
# weight all earnings and growth factors together by IC_IR.
# factor_dict = {}
# for facName in (earningFactors + growthFactors):
#     factor = get_universe_factor(replace_nan_indu(factor_csv.pivot(index='tradeDate', columns='code', values=facName)),univ=univ)
#     factor_neu = pretreat_factor(factor, neu=True)
#     factor_dict[facName] = factor_neu
# quality_factor_compose, weight_df = FactorWeight().ic_ir_weight(factor_dict, month_return, 12)
# Date conversion
factor = quality_factor_compose
date_str_list = map(lambda x:x.strftime('%Y-%m-%d'),factor.index)
factor.index = map(lambda x:datetime.strptime(x, '%Y-%m-%d'),date_str_list)
columns = filter(lambda x:x not in finance,factor.columns)
factor_report = get_easy_factor_report(factor.ix[startDate:endDate, columns], month_return.ix[startDate:endDate, :], -1)
factor_report
              Raw factor                        Industry- and market-cap-neutralized factor
              IC        IC_IR     pvalue        IC        IC_IR     pvalue        Long-short monthly return  Win rate  Max drawdown  Sharpe ratio
CSI 300       0.055540  0.424996  3.125175e-06  0.042408  0.405978  8.117729e-06  0.008981                   0.642857  0.228634      0.286641
CSI 500       0.048700  0.472505  2.496071e-07  0.058568  0.727632  1.564002e-14  0.013130                   0.626984  0.085730      0.429916
All A-shares  0.044597  0.613959  4.465081e-11  0.044487  0.621634  2.678478e-11  0.009650                   0.666667  0.114691      0.376853
facotr_neu = pretreat_factor(factor.ix[startDate:endDate, columns], neu=False)
factor_neu_excess_returns = get_group_ret(facotr_neu.ix[startDate:endDate, columns], month_return.ix[startDate:endDate, :], 10)
ax = group_mean_report_plot(factor_neu_excess_returns, -1)

Valuation factors¶

factor_dict = {}

for facName in valueFactors:
    factor = get_universe_factor(replace_nan_indu(factor_csv.pivot(index='tradeDate', columns='code', values=facName)),univ=univ)
    if facName == 'PE':
        factor = factor[factor>0]
    factor = (1 / factor).replace([-np.inf, np.inf], np.NaN)
    factor_neu = pretreat_factor(factor, neu=True)
    factor_dict[facName] = factor_neu
value_factor_compose, weight_df = FactorWeight().ic_ir_weight(factor_dict, month_return, 12)
# Date conversion
factor = value_factor_compose
date_str_list = map(lambda x:x.strftime('%Y-%m-%d'),factor.index)
factor.index = map(lambda x:datetime.strptime(x, '%Y-%m-%d'),date_str_list)
columns = filter(lambda x:x not in finance,factor.columns)
factor_report = get_easy_factor_report(factor.ix[startDate:endDate, columns], month_return.ix[startDate:endDate, :], -1)
factor_report
              Raw factor                        Industry- and market-cap-neutralized factor
              IC        IC_IR     pvalue        IC        IC_IR     pvalue        Long-short monthly return  Win rate  Max drawdown  Sharpe ratio
CSI 300       0.063546  0.395906  1.327636e-05  0.054597  0.450103  8.427076e-07  0.014182                   0.619048  0.204698      0.346838
CSI 500       0.070719  0.596500  1.406081e-10  0.057387  0.557036  1.732787e-09  0.011801                   0.611111  0.135804      0.307697
All A-shares  0.060943  0.677619  5.722654e-13  0.058574  0.660546  1.889161e-12  0.014107                   0.626984  0.116313      0.416161
facotr_neu = pretreat_factor(factor.ix[startDate:endDate, columns], neu=False)
factor_neu_excess_returns = get_group_ret(facotr_neu.ix[startDate:endDate, columns], month_return.ix[startDate:endDate, :], 10)
ax = group_mean_report_plot(factor_neu_excess_returns, -1)
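
The valuation loop in this section keeps only positive PE values and then takes the reciprocal of each ratio, so that a larger value means a cheaper stock. A minimal sketch of that transformation, assuming Python 3 and a recent pandas; to_yield is a hypothetical name and the ratios are made up:

```python
import numpy as np
import pandas as pd

def to_yield(ratio, positive_only=False):
    """Invert a valuation ratio (PE, PB, PS) so that higher means cheaper."""
    if positive_only:
        # Boolean masking on a DataFrame turns non-positive entries into NaN.
        ratio = ratio[ratio > 0]
    # 1/0 gives +/-inf, which is replaced by NaN so it cannot distort later steps.
    return (1 / ratio).replace([-np.inf, np.inf], np.nan)

# Made-up ratios: stock B has a negative and a zero PE.
pe = pd.DataFrame({"A": [10.0, 20.0], "B": [-5.0, 0.0], "C": [25.0, 8.0]},
                  index=["2017-01", "2017-02"])
pb = pd.DataFrame({"A": [0.0, 2.0]}, index=["2017-01", "2017-02"])

ep = to_yield(pe, positive_only=True)
bp = to_yield(pb)
print(ep)
print(bp)
```

Without the positive-only filter, a loss-making stock with a PE of -5 would get a reciprocal of -0.2 and be ranked alongside expensive stocks, which is presumably why PE is filtered first.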

Composite factor¶

factor_compose = FactorWeight.equal_weight({'quality':quality_factor_compose, 'value':value_factor_compose})
# Date conversion
factor = factor_compose
date_str_list = map(lambda x:x.strftime('%Y-%m-%d'),factor.index)
factor.index = map(lambda x:datetime.strptime(x, '%Y-%m-%d'),date_str_list)
columns = filter(lambda x:x not in finance,factor.columns)
factor_report = get_easy_factor_report(factor.ix[startDate:endDate, columns], month_return.ix[startDate:endDate, :], -1)
factor_report
              Raw factor                        Industry- and market-cap-neutralized factor
              IC        IC_IR     pvalue        IC        IC_IR     pvalue        Long-short monthly return  Win rate  Max drawdown  Sharpe ratio
CSI 300       0.083652  0.636234  1.002109e-11  0.070079  0.665316  1.355704e-12  0.020218                   0.690476  0.169976      0.509479
CSI 500       0.081217  0.817618  1.712030e-17  0.075381  0.832855  5.193972e-18  0.017878                   0.698413  0.135111      0.506639
All A-shares  0.074043  1.058150  4.268384e-26  0.070227  1.002462  4.901411e-24  0.018616                   0.777778  0.066582      0.680651
facotr_neu = pretreat_factor(factor.ix[startDate:endDate, columns], neu=False)
factor_neu_excess_returns = get_group_ret(facotr_neu.ix[startDate:endDate, columns], month_return.ix[startDate:endDate, :], 10)
ax = group_mean_report_plot(factor_neu_excess_returns, -1)
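
The composite is an equal-weight blend of the quality and valuation factors. FactorWeight.weighted is defined earlier in the post, so the following is only a sketch of what equal weighting amounts to, assuming each factor is a date x stock DataFrame and that a stock missing from either factor ends up as NaN. equal_weight_sketch and the numbers are made up.

```python
import pandas as pd

def equal_weight_sketch(factor_dict):
    """Average several date x stock factor tables with weight 1/n each."""
    weight = 1.0 / len(factor_dict)
    total = None
    for frame in factor_dict.values():
        scaled = frame * weight
        # DataFrame.add aligns on dates and stock codes before summing.
        total = scaled if total is None else total.add(scaled)
    return total

# Made-up one-date example with two stocks.
quality = pd.DataFrame([[1.0, -1.0]], index=["2017-01"], columns=["s1", "s2"])
value = pd.DataFrame([[0.0, 3.0]], index=["2017-01"], columns=["s1", "s2"])

composite = equal_weight_sketch({"quality": quality, "value": value})
print(composite)
```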

Correlation between factor groups¶

corlation = 0
for date in earning_factor_compose.index[12:]:
    tmp = pd.concat([earning_factor_compose.loc[date, :].to_frame('earnings'), 
                     growth_factor_compose.loc[date, :].to_frame('growth'),
                     #quality_factor_compose.loc[date, :].to_frame('quality'),
                     value_factor_compose.loc[date, :].to_frame('valuation')], axis=1)
    corlation = tmp.corr() + corlation
corlation / len(earning_factor_compose.index[12:])
           earnings   growth     valuation
earnings   1.000000   0.369274   0.069565
growth     0.369274   1.000000  -0.019578
valuation  0.069565  -0.019578   1.000000
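
The loop above computes a cross-sectional correlation matrix on each date and averages the matrices over time. The same idea on toy data, as a sketch for Python 3; mean_cross_sectional_corr and the factor values are made up:

```python
import pandas as pd

def mean_cross_sectional_corr(frames, dates):
    """Average over dates of the cross-sectional correlation between factors."""
    total = 0
    for date in dates:
        # One column per factor, one row per stock, for this date.
        cross_section = pd.concat({name: f.loc[date] for name, f in frames.items()}, axis=1)
        total = cross_section.corr() + total
    return total / len(dates)

# Made-up factors: two dates, three stocks.
dates = ["2017-01", "2017-02"]
stocks = ["s1", "s2", "s3"]
earnings = pd.DataFrame([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]], index=dates, columns=stocks)
valuation = pd.DataFrame([[1.0, 2.0, 3.0], [1.0, 3.0, 2.0]], index=dates, columns=stocks)

avg_corr = mean_cross_sectional_corr({"earnings": earnings, "valuation": valuation}, dates)
print(avg_corr)
```

Here the correlation is 1.0 on the first date and 0.5 on the second, so the average is 0.75. If any date produces a NaN entry, that entry stays NaN in the sum, so it can be worth checking for dates with too few valid stocks.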

4.1 Industry analysis¶

The factor's average level differs considerably from one industry to another.

# Industry names, keyed by Shenwan level-1 industry code
industry_name_sw1 = {'801740':'国防军工','801020':'采掘','801110':'家用电器','801160':'公用事业','801770':'通信','801010':'农林牧渔','801120':'食品饮料','801750':'计算机','801050':'有色金属','801890':'机械设备','801170':'交通运输','801710':'建筑材料','801040':'钢铁','801130':'纺织服装','801880':'汽车','801180':'房地产','801230':'综合','801760':'传媒','801200':'商业贸易','801780':'银行','801140':'轻工制造','801720':'建筑装饰','801080':'电子','801790':'非银金融','801030':'化工','801730':'电气设备','801210':'休闲服务','801150':'医药生物'}
indu_last = (indu.iloc[-1]).to_frame('indu')
fac_last = factor.iloc[-1, :].to_frame('all')
fac_indu_mean = pd.merge(fac_last, indu_last, how='inner', right_index=True, left_index=True).groupby('indu').mean()
indu_dist = pd.concat([fac_indu_mean], axis=1)
fig = plt.figure(figsize=(16, 8))
for i in range(indu_dist.shape[1]):
    k = 100 + indu_dist.shape[1] * 10 + i + 1
    ax = indu_dist.iloc[:, i].plot(kind='barh', ax=fig.add_subplot(k), color='r')
    ax.set_xlabel(indu_dist.columns[i])
    ax.set_xticklabels(ax.get_xticks(), rotation=45)
    if i == 0:
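        # Use a separate name in the comprehension: in Python 2 its variable leaks and would overwrite the loop index i
        s = ax.set_yticklabels([industry_name_sw1[code].decode('utf-8') for code in indu_dist.index])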
        s = ax.set_ylabel(u'行业', fontsize=14)
    else:
        ax.set_yticklabels([])
        ax.set_ylabel('')
# Mean of each factor within each Shenwan level-1 industry
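
The merge-then-groupby step used for the industry averages can be checked on a toy example. This is a sketch for Python 3 with made-up stock names and factor values. The industry codes 801780, 801120 and 801180 are taken from the industry_name_sw1 dictionary (banks, food and beverage, real estate).

```python
import pandas as pd

# Made-up latest factor values; stock_e has an industry but no factor value.
fac_last = pd.Series({"stock_a": 0.2, "stock_b": 0.4, "stock_c": 1.0, "stock_d": 0.6},
                     name="all").to_frame()
indu_last = pd.Series({"stock_a": "801780", "stock_b": "801780",
                       "stock_c": "801120", "stock_d": "801120",
                       "stock_e": "801180"}, name="indu").to_frame()

# The inner join keeps only stocks present in both tables.
merged = pd.merge(fac_last, indu_last, how="inner", left_index=True, right_index=True)
fac_indu_mean = merged.groupby("indu").mean()
print(fac_indu_mean)
```

Because of the inner join, an industry whose stocks all lack a factor value simply does not appear in the result, which is worth remembering when reading the bar chart.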

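During testing, financial stocks were excluded to keep the factors comparable across stocks. Here "financial" means the banks and non-bank financials industries under the Shenwan level-1 classification.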
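
In the code this exclusion shows up as the filter on factor.columns against the finance list, which is built earlier in the post. One way such a filter could be derived from industry codes is sketched below for Python 3. FINANCIAL_INDUSTRIES, non_financial_columns and the stock names are hypothetical; the two codes are the banks and non-bank financials entries of industry_name_sw1.

```python
import pandas as pd

# Shenwan level-1 codes for banks and non-bank financials.
FINANCIAL_INDUSTRIES = {"801780", "801790"}

def non_financial_columns(factor, industry_of):
    """Columns of `factor` whose industry code is not a financial one."""
    return [code for code in factor.columns
            if industry_of.get(code) not in FINANCIAL_INDUSTRIES]

# Made-up factor table and industry mapping.
factor = pd.DataFrame([[0.1, 0.2, 0.3]], index=["2017-01"],
                      columns=["stock_a", "stock_b", "stock_c"])
industry_of = pd.Series({"stock_a": "801780", "stock_b": "801120", "stock_c": "801790"})

columns = non_financial_columns(factor, industry_of)
print(factor.loc[:, columns])
```

Stocks with no industry code are kept by this sketch, since Series.get returns None for them. Whether the original finance list treats them the same way is not shown here.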

 
