

Factor Research: Review of Factors in 2018

Posted by 外汇老法师 on May 10, 05:37

Multi-factor series: 2018 annual factor review


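Multi-factor models are a mainstream approach in quantitative trading. Factor mining and single-factor testing are a key part of building them. In this study we select twenty-three factors built from financial statement data and price/volume information. Each one is a sub-factor within a style category such as valuation, growth, profitability or reversal.

We build a factor-testing framework and use regressions to measure the relationship between each factor and stock returns. For the more significant factors, we then run stratified (quantile-group) backtests, sorting stocks into groups to see how each factor actually performed.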

Factor research series¶

Research objective¶

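Factor mining and single-factor testing are a key part of the multi-factor model workflow. We selected twenty-odd common factors based on financial data and price/volume information, fetched price data, and built a factor-testing framework. Regressions measure the correlation between each factor and stock returns. Stratified backtests then group stocks by the more significant factors and compare group returns, showing how these common factors performed in 2018.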
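As a rough illustration, the sketch below runs one period of a stratified (quantile-group) backtest. It sorts stocks into equal-sized groups by factor value and averages each group's next-period return.

It assumes a pandas Series of factor values and a Series of next-period returns, both indexed by stock code. The function name `quantile_group_returns` is hypothetical. The post's own `get_all_pct` works differently: it slices a pre-sorted stock list and tracks daily returns within each period.

```python
import pandas as pd

def quantile_group_returns(factor, fwd_ret, groups=5):
    """Equal-weighted next-period return of each factor group (group 1 = lowest factor values)."""
    df = pd.concat({"factor": factor, "ret": fwd_ret}, axis=1).dropna()
    # rank first so tied factor values cannot produce duplicate bin edges in qcut
    labels = pd.qcut(df["factor"].rank(method="first"), groups,
                     labels=list(range(1, groups + 1)))
    return df["ret"].groupby(labels, observed=True).mean()

# toy cross-section of 10 stocks whose next-period return rises with the factor
codes = [f"s{i}" for i in range(10)]
factor = pd.Series(range(10), index=codes, dtype=float)
fwd_ret = factor * 0.01
print(quantile_group_returns(factor, fwd_ret))
```

A factor that predicts returns should show group returns that rise or fall roughly monotonically from group 1 to group 5.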

Research content¶

Below we check factor effectiveness using the method described above. The study is organized as follows:

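  • Part 1: Data preparation
    • Utility functions
    • A function to fetch factor data
    • Fetching factor data
  • Part 2: Factor computation and processing
    • Factor data processing (standardization, neutralization, cross-sectional regression, etc.)
    • Adding the return series
    • Factor performance statistics
  • Part 3: Results summary
    • Presenting factor performance
    • Stratified (group) backtest
    • Summary of conclusions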

Summary of conclusions¶

After testing the twenty-odd factors above, we reached the following conclusions:

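  • The size (market cap) factor ranks first, with an IC of about 0.05. Its factor return is positive in 8 of the 12 months, but the mean of 0.45% per month is not high. The intuitive takeaway is that in 2018 large caps simply lost money more slowly than small caps.
  • The turnover factors rank last. The 1-month turnover factor in particular has an absolute IC above 0.1 (IC ≈ -0.106), which makes it a strongly negative factor. Its factor return series is clearly below zero, averaging about -0.8% per month (-0.79%). It is also the most consistently significant factor in the test, with |t| > 2 in 11 of 12 months. In other words, stocks with higher turnover earned lower returns in the following period, so in practice a portfolio should avoid high-turnover stocks where possible.
  • Factors with relatively high IC, such as size, ROE YoY growth and total asset turnover, also have relatively significant factor returns.
  • Factor returns are low overall. In a market like 2018's, the goal was no longer to earn more but to lose less.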

Factor test model¶

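To test factor effectiveness we follow the research report and use cross-sectional regression. The factor exposures are treated as known regressors, and each period's regression yields one factor return $f_t$.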

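When we use cross-sectional regressions to judge each factor's return and significance, we also account for the characteristics that strongly affect individual A-share returns, namely industry and market cap. For a long time the size (market cap) factor has been one of the most significant drivers of A-share returns. To isolate a factor's true return in single-factor testing, we therefore control for both industry and size in the regression.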

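With the industry and size factors added, the single-factor test regression is:

$$r_{t+1}^{i} = \beta_t^{i} f_t + \sum_{j} I_{t,j}^{i} f_{t,j} + m_t^{i} f_{t,m} + \varepsilon_{t+1}^{i}$$

where $r_{t+1}^{i}$ is the return of stock $i$ over the next period, and: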

$\beta_t^{i}$ is stock $i$'s exposure to the factor being tested.

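$I_{t,j}^{i}$ is stock $i$'s industry exposure. It is a dummy variable: if the stock belongs to industry $j$, its exposure to that industry is 1, and its exposure to every other industry is 0. CITIC level-1 was the stated industry classification, but the code below actually uses the Shenwan level-1 classification (`sw_l1`).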

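$m_t^{i}$ is stock $i$'s size (market cap) exposure.

Robust regression is commonly used in single-factor regression tests. RLM (robust linear model) uses iteratively reweighted regression to reduce the influence of outliers, which otherwise undermine the validity and stability of ordinary least squares (OLS) estimates. For a detailed introduction to RLM, see *Multi-Factor Series Report No. 1: Factor Testing Framework* (《多因子系列报告之一:因子测试框架》).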
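Here is a minimal sketch of one period of this regression on simulated data. For brevity it uses plain OLS via `numpy.linalg.lstsq` rather than RLM.

The industry dummies sum to one for every stock, so they play the role of the intercept and no separate constant is added. The coefficient on the tested factor is that period's factor return. The function name and the simulated numbers are hypothetical.

```python
import numpy as np

def cross_section_regression(ret, exposure, industry_codes, size):
    """One period of the single-factor test: OLS of returns on [factor, industry dummies, size]."""
    industry_codes = np.asarray(industry_codes)
    industries = np.unique(industry_codes)
    # one-hot industry exposure: 1 for the stock's own industry, 0 elsewhere
    dummies = (industry_codes[:, None] == industries[None, :]).astype(float)
    X = np.column_stack([exposure, dummies, size])
    coef, *_ = np.linalg.lstsq(X, ret, rcond=None)
    return coef[0]  # coefficient on the tested factor = this period's factor return

# simulated cross-section: 500 stocks, 5 industries, true factor return 0.02
rng = np.random.default_rng(42)
n = 500
exposure = rng.standard_normal(n)
industry = rng.integers(0, 5, size=n)
size = rng.standard_normal(n)
industry_ret = np.array([0.01, -0.02, 0.0, 0.03, -0.01])
ret = 0.02 * exposure + industry_ret[industry] - 0.01 * size + 0.01 * rng.standard_normal(n)

f_hat = cross_section_regression(ret, exposure, industry, size)
print(f_hat)
```

Repeating this for every rebalancing date gives the factor return series.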

Part 1: Fetching factor data

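# Utility functions
# jqdata, jqfactor and jqlib are only available in the JoinQuant research environment
import jqdata
from jqdata import *                      # get_price, get_fundamentals, query, valuation, ...
from jqfactor import get_factor_values
from jqlib.technical_analysis import *   # PSY, RSI, BIAS, MACD
# standard imports come after the star imports so these names are not shadowed
import time
import math
import random
import datetime                          # module import: use datetime.datetime / datetime.timedelta
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels import regression
import matplotlib.pyplot as plt
from scipy import stats

# plotting style
plt.style.use('ggplot')

# return every calendar date from begin_date to end_date (inclusive) as 'YYYY-MM-DD' strings
def get_date_list(begin_date, end_date):
    dates = []
    dt = datetime.datetime.strptime(begin_date, "%Y-%m-%d")
    end_dt = datetime.datetime.strptime(end_date, "%Y-%m-%d")
    while dt <= end_dt:
        dates.append(dt.strftime("%Y-%m-%d"))
        dt += datetime.timedelta(days=1)
    return dates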

# Winsorization by MAD (median absolute deviation):
# clip values to [median - n*MAD, median + n*MAD]
def filter_extreme_MAD(series,n):
    median = series.quantile(0.5)
    new_median = ((series - median).abs()).quantile(0.50)   # MAD
    max_range = median + n*new_median
    min_range = median - n*new_median
    return np.clip(series,min_range,max_range)

# Winsorization at mean ± std standard deviations (3-sigma rule)
def winsorize(factor, std=3, have_negative=True):
    '''
    Clip extreme values.
    factor: Series indexed by stock code, with factor values as values
    std: number of standard deviations used for the clipping bounds
    have_negative: bool; if False, negative values are dropped first
    Returns a Series
    '''
    r = factor.dropna().copy()
    if not have_negative:
        r = r[r >= 0]
    # clip at the bounds
    edge_up = r.mean() + std*r.std()
    edge_low = r.mean() - std*r.std()
    r[r > edge_up] = edge_up
    r[r < edge_low] = edge_low
    return r

# Standardization
def standardize(s, ty=2):
    '''
    s: Series
    ty: standardization type: 1 = min-max, 2 = z-score, 3 = max-abs (decimal scaling)
    '''
    data = s.dropna().copy()
    if int(ty) == 1:
        re = (data - data.min())/(data.max() - data.min())
    elif ty == 2:
        re = (data - data.mean())/data.std()
    elif ty == 3:
        re = data/10**np.ceil(np.log10(data.abs().max()))
    else:
        raise ValueError('ty must be 1, 2 or 3')
    return re
    

# Neutralization
# factor: Series indexed by stock code, with factor values as values
# mkt_cap: Series indexed by stock code, with market cap as values (leave as False to skip size)
# industry: whether to include industry dummies
# Returns: the neutralized factor values (OLS residuals) as a Series
def neutralization(factor, mkt_cap=False, industry=True):
    y = factor
    if isinstance(mkt_cap, pd.Series):
        LnMktCap = mkt_cap.apply(lambda x: math.log(x))
        if industry:  # industry and size
            dummy_industry = get_industry_exposure(factor.index)
            x = pd.concat([LnMktCap, dummy_industry.T], axis=1)
        else:  # size only
            x = LnMktCap
    elif industry:  # industry only
        dummy_industry = get_industry_exposure(factor.index)
        x = dummy_industry.T
    else:
        raise ValueError('nothing to neutralize against')
    result = sm.OLS(y.astype(float), x.astype(float)).fit()
    return result.resid

# Build an industry dummy matrix (SW level-1 industries x stocks) for the stock pool;
# helper for the neutralization function
def get_industry_exposure(stock_list):
    df = pd.DataFrame(0, index=jqdata.get_industries(name='sw_l1').index, columns=stock_list)
    for stock in stock_list:
        code = get_industry_code_from_security(stock)
        if code in df.index:  # stocks without an SW industry keep an all-zero column
            df.loc[code, stock] = 1
    return df


# Look up a stock's SW level-1 industry code; helper for the neutralization function.
# Returns None if the stock is not found in any industry.
def get_industry_code_from_security(security, date=None):
    industry_index = jqdata.get_industries(name='sw_l1').index
    for industry_code in industry_index:
        if security in get_industry_stocks(industry_code, date=date):
            return industry_code
    return None

# Example: PB factor winsorized, standardized, then neutralized against size and industry
def get_win_stand_neutra(stocks):
    h=get_fundamentals(query(valuation.pb_ratio,valuation.code,valuation.market_cap)\
        .filter(valuation.code.in_(stocks)))
    stocks_pb_se=pd.Series(list(h.pb_ratio),index=list(h.code))
    stocks_pb_win_standse=standardize(winsorize(stocks_pb_se))
    stocks_mktcap_se=pd.Series(list(h.market_cap),index=list(h.code))
    stocks_neutra_se=neutralization(stocks_pb_win_standse,stocks_mktcap_se)
    return stocks_neutra_se 

# Get a list of trading days (using the SSE Composite Index's trading calendar)
# frequency: None/'day' = every trading day; 'month', 'quarter', 'halfyear' = first trading day of each period
def get_tradeday_list(start,end,frequency=None,count=None):
    if count != None:
        df = get_price('000001.XSHG',end_date=end,count=count)
    else:
        df = get_price('000001.XSHG',start_date=start,end_date=end)
    if frequency == None or frequency =='day':
        return df.index
    else:
        df['year-month'] = [str(i)[0:7] for i in df.index]
        if frequency == 'month':
            return df.drop_duplicates('year-month').index
        elif frequency == 'quarter':
            df['month'] = [str(i)[5:7] for i in df.index]
            df = df[(df['month']=='01') | (df['month']=='04') | (df['month']=='07') | (df['month']=='10') ]
            return df.drop_duplicates('year-month').index
        elif frequency =='halfyear':
            df['month'] = [str(i)[5:7] for i in df.index]
            df = df[(df['month']=='01') | (df['month']=='07')]   # half-years start in January and July
            return df.drop_duplicates('year-month').index 
        
# Daily gross return (1 + r) of a stock pool between two dates, equal-weighted (weight=0)
# or weighted by log free-float market cap (weight != 0)
def ret_se(start_date='2018-6-1',end_date='2018-7-1',stock_pool=None,weight=0):
    pool = stock_pool
    if len(pool) != 0:
        # historical close prices of the stocks
        df = get_price(list(pool),start_date=start_date,end_date=end_date,fields=['close']).close
        df = df.dropna(axis=1)
        # log of each stock's free-float market cap, used as weights
        df_mkt = get_fundamentals(query(valuation.code,valuation.circulating_market_cap).filter(valuation.code.in_(df.columns)))
        df_mkt.index = df_mkt['code'].values
        fact_se =pd.Series(df_mkt['circulating_market_cap'].values,index = df_mkt['code'].values)
        fact_se = np.log(fact_se)
    else:
        # empty pool: use the index's trading days with a constant placeholder column
        df = get_price('000001.XSHG',start_date=start_date,end_date=end_date,fields=['close'])
        df['v'] = [1]*len(df)
        del df['close']
    # gross return relative to the previous day (1 + daily return)
    pct = df.pct_change()+1
    pct.iloc[0,:] = 1
    if weight == 0:
        # equal-weighted average across stocks
        se = pct.cumsum(axis=1).iloc[:,-1]/pct.shape[1]
        return se
    else:
        # weighted average, weights = log free-float market cap
        se = (pct*fact_se).cumsum(axis=1).iloc[:,-1]/sum(fact_se)
        return se
    
# Daily gross returns of each factor group for every rebalancing period.
# pool_dict maps each rebalancing date to a stock list, assumed to be sorted by factor value.
def get_all_pct(pool_dict,trade_list,groups=5):
    period_frames = []
    for s,e in zip(trade_list[:-1],trade_list[1:]):
        stock_list = pool_dict[s]
        stock_num = len(stock_list)//groups
        # one return series per group: consecutive slices of the sorted list
        pct_se_list = [ret_se(start_date=s,end_date=e,stock_pool=stock_list[i*stock_num:(i+1)*stock_num])
                       for i in range(groups)]
        period_frames.append(pd.concat(pct_se_list,axis=1))
    return pd.concat(period_frames,axis=0)

def tradedays_before(date,count):  # the trading day `count` trading days before `date`
    date = get_price('000001.XSHG',end_date=date,count=count+1).index[0]
    return date
# Compute all factor values for the stocks in `stock` as of `date`
def get_factor_data(stock,date):
    data=pd.DataFrame(index=stock)
    q = query(valuation,balance,cash_flow,income,indicator).filter(valuation.code.in_(stock))
    df = get_fundamentals(q, date)
    # valuation.market_cap is in units of 100 million yuan; convert to yuan
    df['market_cap']=df['market_cap']*100000000
    factor_data=get_factor_values(stock,['roe_ttm','roa_ttm','total_asset_turnover_rate',\
                               'net_operate_cash_flow_ttm','net_profit_ttm','net_profit_ratio',\
                              'cash_to_current_liability','current_ratio',\
                             'gross_income_ratio','non_recurring_gain_loss',\
                            'operating_revenue_ttm','net_profit_growth_rate'],end_date=date,count=1)
    # one DataFrame (dates x stocks) per factor; keep the single row for `date`
    factor=pd.DataFrame(index=stock)
    for i in factor_data.keys():
        factor[i]=factor_data[i].iloc[0,:]
    df.index = df['code']
    data['code'] = df['code']
    del df['code'],df['id']
    # merge into one wide table
    df=pd.concat([df,factor],axis=1)
    # log of total market cap
    data['size_lg']=np.log(df['market_cap'])
    # net profit (TTM) / total market cap
    data['EP']=df['net_profit_ttm']/df['market_cap']
    # net assets / total market cap
    data['BP']=1/df['pb_ratio']
    # operating revenue (TTM) / total market cap
    data['SP']=1/df['ps_ratio']
    # net cash flow (TTM) / total market cap
    data['NCFP']=1/df['pcf_ratio']
    # operating cash flow (TTM) / total market cap
    data['OCFP']=df['net_operate_cash_flow_ttm']/df['market_cap']
    # net profit YoY growth
    data['net_g'] = df['net_profit_growth_rate']
    # net profit (TTM) YoY growth / PE (TTM)
    data['G/PE']=df['net_profit_growth_rate']/df['pe_ratio']
    # ROE (TTM)
    data['roe_ttm']=df['roe_ttm']
    # ROE (YTD)
    data['roe_q']=df['roe']
    # ROA (TTM)
    data['roa_ttm']=df['roa_ttm']
    # ROA (YTD)
    data['roa_q']=df['roa']
    # net profit margin
    data['netprofitratio_ttm'] = df['net_profit_ratio']
    # gross margin (TTM)
    data['grossprofitmargin_ttm']=df['gross_income_ratio']
    # gross margin (YTD)
    data['grossprofitmargin_q']=df['gross_profit_margin']

    # net margin after non-recurring items (YTD)
    data['profitmargin_q']=df['adjusted_profit']/df['operating_revenue']
    # asset turnover (TTM)
    data['assetturnover_ttm']=df['total_asset_turnover_rate']
    # total asset turnover (YTD): operating revenue / total assets
    data['assetturnover_q']=df['operating_revenue']/df['total_assets']
    # operating cash flow / net profit (TTM)
    data['operationcashflowratio_ttm']=df['net_operate_cash_flow_ttm']/df['net_profit_ttm']
    # operating cash flow / net profit (YTD)
    data['operationcashflowratio_q']=df['net_operate_cash_flow']/df['net_profit']
    # net assets
    df['net_assets']=df['total_assets']-df['total_liability']
    # total assets / net assets
    data['financial_leverage']=df['total_assets']/df['net_assets']
    # non-current liabilities / net assets
    data['debtequityratio']=df['total_non_current_liability']/df['net_assets']
    # cash ratio = (cash + marketable securities) / current liabilities
    data['cashratio']=df['cash_to_current_liability']
    # current ratio = current assets / current liabilities * 100%
    data['currentratio']=df['current_ratio']
    # log of total market cap (same as size_lg)
    data['ln_capital']=np.log(df['market_cap'])
    # dates needed for TTM: query 0, 90, 180 and 270 days before `date` and sum the four quarterly values
    his_date = [pd.to_datetime(date) - datetime.timedelta(90*i) for i in range(0, 4)]
    tmp = pd.DataFrame()
    tmp['code']=list(stock)
    for i in his_date:
        tmp_adjusted_dividend = get_fundamentals(query(indicator.code, indicator.adjusted_profit, \
                                                     cash_flow.dividend_interest_payment).
                                               filter(indicator.code.in_(stock)), date = i)
        tmp=pd.merge(tmp,tmp_adjusted_dividend,how='outer',on='code')

        tmp=tmp.rename(columns={'adjusted_profit':'adjusted_profit'+str(i.month), \
                                'dividend_interest_payment':'dividend_interest_payment'+str(i.month)})
    tmp=tmp.set_index('code')
    tmp_columns=tmp.columns.values.tolist()
    # row-wise sums over the four quarters (explicit DataFrame.sum, so a star-imported `sum` cannot interfere)
    tmp_adjusted=tmp[[i for i in tmp_columns if 'adjusted_profit' in i]].sum(axis=1)
    tmp_dividend=tmp[[i for i in tmp_columns if 'dividend_interest_payment' in i]].sum(axis=1)
    # net profit after non-recurring items (TTM) / total market cap
    data['EPcut']=tmp_adjusted/df['market_cap']
    # cash dividends over the last 12 months (by ex-dividend date) / total market cap
    data['DP']=tmp_dividend/df['market_cap']
    # net margin after non-recurring items (TTM)
    data['profitmargin_ttm']=tmp_adjusted/df['operating_revenue_ttm']
    # YoY growth (YTD): compare with the same fields one year earlier
    # columns with the 'last_year' suffix hold the values from one year ago
    his_date = pd.to_datetime(date) - datetime.timedelta(365)
    name=['operating_revenue','net_profit','net_operate_cash_flow','roe']
    temp_data=df[name]
    his_temp_data = get_fundamentals(query(valuation.code, income.operating_revenue,income.net_profit,\
                                            cash_flow.net_operate_cash_flow,indicator.roe).
                                      filter(valuation.code.in_(stock)), date = his_date)
    his_temp_data=his_temp_data.set_index('code')
    # add the 'last_year' suffix to his_temp_data's columns
    for i in name:
        his_temp_data=his_temp_data.rename(columns={i:i+'last_year'})

    temp_data =pd.concat([temp_data,his_temp_data],axis=1)
    # operating revenue (YTD) YoY growth
    data['sales_g_q']=temp_data['operating_revenue']/temp_data['operating_revenuelast_year']-1
    # net profit (YTD) YoY growth
    data['profit_g_q']=temp_data['net_profit']/temp_data['net_profitlast_year']-1
    # operating cash flow (YTD) YoY growth
    data['ocf_g_q']=temp_data['net_operate_cash_flow']/temp_data['net_operate_cash_flowlast_year']-1
    # ROE (YTD) YoY growth
    data['roe_g_q']=temp_data['roe']/temp_data['roelast_year']-1
    
    
    # Beta
    # helper for linear regression (not used below)
    def linreg(X,Y,columns=3):
        X=sm.add_constant(np.asarray(X))
        Y=np.asarray(Y)
        if len(Y)>1:
            results = regression.linear_model.OLS(Y, X).fit()
            return results.params
        else:
            return [float("nan")]*(columns+1)
    # alpha (intercept) and beta from regressing each stock's daily returns over the last
    # 240 trading days (about one year) on the SSE Composite Index's daily returns
    stock_close=get_price(list(stock), count = 12*20+1, end_date=date, frequency='daily', fields=['close'])['close']
    SZ_close=get_price('000001.XSHG', count = 12*20+1, end_date=date, frequency='daily', fields=['close'])['close']
    stock_pchg=stock_close.pct_change().iloc[1:]
    SZ_pchg=SZ_close.pct_change().iloc[1:]
    beta=[]
    stockalpha=[]
    for i in stock:
        # linregress(x, y) returns (slope, intercept, ...): slope = beta, intercept = alpha
        temp_beta, temp_stockalpha = stats.linregress(SZ_pchg, stock_pchg[i])[:2]
        beta.append(temp_beta)
        stockalpha.append(temp_stockalpha)
    # alpha and beta are lists in the same order as `stock`
    #data['alpha']=stockalpha
    data['beta']=beta
    
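    # reversal: close 20 (1m) or 62 (3m) trading days ago relative to the latest close, minus 1
    data['reverse_1m']=stock_close.iloc[-21]/stock_close.iloc[-1]-1
    data['reverse_3m']=stock_close.iloc[-63]/stock_close.iloc[-1]-1
    
    # volatility (1-month and 3-month); note: standard deviation of closing prices, not of returns
    data['std_1m']=stock_close.iloc[-20:].std()
    data['std_3m']=stock_close.iloc[-60:].std()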
    
    # turnover
    #tradedays_1m = get_tradeday_list(start=date,end=date,frequency='day',count=21)  # last ~1 month of trading days
    tradedays_3m = get_tradeday_list(start=date,end=date,frequency='day',count=63)  # last ~3 months of trading days
    data_turnover_ratio=pd.DataFrame()
    data_turnover_ratio['code']=list(stock)
    for i in tradedays_3m:
        q = query(valuation.code,valuation.turnover_ratio).filter(valuation.code.in_(stock))
        temp = get_fundamentals(q, i)
        data_turnover_ratio=pd.merge(data_turnover_ratio, temp,how='left',on='code')
        data_turnover_ratio=data_turnover_ratio.rename(columns={'turnover_ratio':i})
    # average daily turnover over the last 63 and the last 21 trading days
    data['turn_3m']= (data_turnover_ratio.set_index('code').T).mean()
    data['turn_1m']= (data_turnover_ratio.set_index('code').T).iloc[-21:].mean()
    
    # technical indicators, computed as of the previous trading day (date_1)
    date_1 = tradedays_before(date,1)
    data['PSY']=pd.Series(PSY(stock, date_1, timeperiod=20))
    data['RSI']=pd.Series(RSI(stock, date_1, N1=20))
    data['BIAS']=pd.Series(BIAS(stock,date_1, N1=20)[0])
    dif,dea,macd=MACD(stock, date_1, SHORT = 10, LONG = 30, MID = 15)
    #data['DIF']=pd.Series(dif)
    #data['DEA']=pd.Series(dea)
    data['MACD']=pd.Series(macd)
    
    return data

Enter the names of the factors you want to check here:

factor_test_list = ['turn_1m','turn_3m','reverse_3m','size_lg','roa_q','roe_g_q','roe_q','assetturnover_q','RSI',\
                   'BIAS','currentratio','reverse_1m']
# test period
start_date = '2018-1-1'
end_date = '2019-1-1'
# stock pool: constituents of the CSI 800 index
index = '000906.XSHG'
trade_list = get_tradeday_list(start=start_date,end=end_date,frequency='month')

# factor list
factor_list = ['code','EP','BP','SP','net_g','roe_g_q','sales_g_q','roe_q','roa_q','grossprofitmargin_ttm','netprofitratio_ttm',\
              'assetturnover_q','currentratio','size_lg','reverse_1m','reverse_3m','std_1m','std_3m','turn_1m','turn_3m',\
              'PSY','RSI','BIAS','MACD']
factor_name = ['Stock code','Net profit / market cap','Net assets / market cap','Operating revenue / market cap','Net profit YoY growth','ROE YoY growth','Operating revenue YoY growth',\
               'ROE','ROA','Gross margin TTM','Net margin TTM','Total asset turnover q','Current ratio',\
               'Size','Reversal (1m)','Reversal (3m)','Volatility (1m)','Volatility (3m)','Turnover (1m)',\
               'Turnover (3m)','PSY','RSI','BIAS','MACD']

df_dict = {}
pool = {}

# collect every stock that was an index constituent on any rebalancing date
for d in trade_list[:]:
    pool = set(pool) | set(get_index_stocks(index,date=d))
pool = list(pool)
print(len(pool))
# fetch factor data for each period
for date in trade_list[:]:
    temp_df = get_factor_data(pool,date)
    temp_df = temp_df[factor_list]
    df_dict[date] = temp_df
# pd.Panel was removed in pandas 0.25; this notebook requires an older pandas
pl = pd.Panel(df_dict)
# reorder axes: items = factors, major_axis = dates, minor_axis = stocks
pl = pl.transpose(2,0,1)
pl
852
<class 'pandas.core.panel.Panel'>
Dimensions: 24 (items) x 12 (major_axis) x 852 (minor_axis)
Items axis: code to MACD
Major_axis axis: 2018-01-02 00:00:00 to 2018-12-03 00:00:00
Minor_axis axis: 600050.XSHG to 000860.XSHE

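Here we add a random series as a baseline for comparing each factor's performance. (Note that `random` is not added to `factor_list`, so the tests below do not actually include it.)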

# uniform random values with one row per date and one column per stock
random_matrix = np.random.rand(len(pl.major_axis), len(pl.minor_axis))
pl['random'] = pd.DataFrame(random_matrix,index=pl.major_axis,columns=pl.minor_axis)
pl
<class 'pandas.core.panel.Panel'>
Dimensions: 25 (items) x 12 (major_axis) x 852 (minor_axis)
Items axis: code to random
Major_axis axis: 2018-01-02 00:00:00 to 2018-12-03 00:00:00
Minor_axis axis: 600050.XSHG to 000860.XSHE
# keep the raw data so later steps can be modified and rerun
pl_pro = pl.copy()
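# drop stocks with any missing value
print('Number of stocks before dropping missing values: %s'%len(pl.minor_axis))
pl_pro = (pl_pro.loc[:,:,:]).dropna(axis=2)
pl_pro
Number of stocks before dropping missing values: 852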
<class 'pandas.core.panel.Panel'>
Dimensions: 25 (items) x 12 (major_axis) x 745 (minor_axis)
Items axis: code to random
Major_axis axis: 2018-01-02 00:00:00 to 2018-12-03 00:00:00
Minor_axis axis: 600050.XSHG to 000860.XSHE

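Part 2: Factor data processing

Factor data processing¶

  • Factor cross-sectional regression
    • Factor standardization
    • Industry and size neutralization
    • Building the return data
    • Cross-sectional regression (RLM)
  • Factor IC series
  • Stratified backtest

Raw factor values cannot be used directly in later calculations because they contain outliers and come in different units. Data cleaning is therefore necessary: it keeps data errors and extreme values from distorting the test results, and working with standardized data keeps the resulting model robust.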

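Outlier handling: data cleaning covers two things, outliers and missing values. The common 3σ clipping method assumes the sample is normally distributed, but most factor values are not, and fat tails are common. We therefore use the more robust MAD (Median Absolute Deviation) method. First compute the median of the factor values, $Median_f$, and define the median absolute deviation as:

$$MAD = \mathrm{median}(|f_i - Median_f|)$$

Values are then clipped to the interval $[Median_f - n \cdot MAD,\ Median_f + n \cdot MAD]$. The code below uses $n = 3$ (see `filter_extreme_MAD`).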
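The sketch below contrasts MAD clipping with the mean ± 3σ rule on a tiny made-up sample. The MAD function is written the same way as `filter_extreme_MAD`, with no 1.4826 consistency factor.

A single large outlier inflates the standard deviation so much that the 3σ bounds fail to clip it. The MAD bounds are barely affected. Here the median is 3.5 and the MAD is 1.5, so MAD clipping caps 100 at 8, while 3σ leaves it untouched.

```python
import numpy as np

def mad_clip(x, n=3.0):
    """Clip to [median - n*MAD, median + n*MAD] with MAD = median(|x - median|), as in filter_extreme_MAD."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return np.clip(x, med - n * mad, med + n * mad)

def sigma_clip(x, n=3.0):
    """Clip to mean ± n standard deviations (the 3-sigma rule)."""
    x = np.asarray(x, dtype=float)
    mu, sd = x.mean(), x.std()
    return np.clip(x, mu - n * sd, mu + n * sd)

values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])
print(mad_clip(values))    # bounds [-1, 8]: the outlier is clipped to 8
print(sigma_clip(values))  # sd is inflated to about 36, so 100 stays inside mean ± 3 sd
```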

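Factor standardization: we use the common z-score, $z_i = (f_i - \mu_f)/\sigma_f$, where $\mu_f$ and $\sigma_f$ are the cross-sectional mean and standard deviation of the factor.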

# cross-sectional winsorization and standardization of each factor
start = time.time()
print('Winsorizing and standardizing factors for %s stocks......'%len(pl_pro.minor_axis))

for factor in factor_list[1:]:
    factor_df = pl_pro.loc[factor,:,:].copy()  # factor DataFrame: dates x stocks
    for i in factor_df.index:
        factor_df.loc[i,:] = filter_extreme_MAD(factor_df.loc[i,:],3)  # MAD winsorization
        factor_df.loc[i,:] = standardize(factor_df.loc[i,:],ty=2)      # z-score
    pl_pro[factor] = pd.DataFrame(factor_df,index=pl_pro.major_axis, columns=pl_pro.minor_axis)
    print('%s factor done'%factor)
end = time.time()
print('Factor standardization finished, number of stocks: %s, elapsed: %s s'%(len(pl_pro.minor_axis),end-start))
pl_pro
Winsorizing and standardizing factors for 745 stocks......
EP factor done
BP factor done
SP factor done
net_g factor done
roe_g_q factor done
sales_g_q factor done
roe_q factor done
roa_q factor done
grossprofitmargin_ttm factor done
netprofitratio_ttm factor done
assetturnover_q factor done
currentratio factor done
size_lg factor done
reverse_1m factor done
reverse_3m factor done
std_1m factor done
std_3m factor done
turn_1m factor done
turn_3m factor done
PSY factor done
RSI factor done
BIAS factor done
MACD factor done
Factor standardization finished, number of stocks: 745, elapsed: 2.0331106185913086 s
<class 'pandas.core.panel.Panel'>
Dimensions: 25 (items) x 12 (major_axis) x 745 (minor_axis)
Items axis: code to random
Major_axis axis: 2018-01-02 00:00:00 to 2018-12-03 00:00:00
Minor_axis axis: 600050.XSHG to 000860.XSHE

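Next we test the factors' performance over the past year. The size factor had shown no significant effect over the preceding year, so here we neutralize only for industry, removing industry-style effects. We do not neutralize for market cap for now, and we keep the size factor in the test to verify its performance.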
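When industry dummies are the only regressors, each stock's OLS fitted value is its industry's mean factor value. The residual is therefore just the factor minus that industry mean.

This minimal numpy sketch assumes every stock has an industry code. In the post's code, a stock without an SW industry gets an all-zero dummy row and keeps its raw value. The function name is hypothetical.

```python
import numpy as np

def industry_neutralize(factor, industry_codes):
    """Residuals of an OLS regression of the factor on industry dummies (no intercept)."""
    factor = np.asarray(factor, dtype=float)
    codes = np.asarray(industry_codes)
    industries = np.unique(codes)
    dummies = (codes[:, None] == industries[None, :]).astype(float)
    coef, *_ = np.linalg.lstsq(dummies, factor, rcond=None)   # coef = per-industry means
    return factor - dummies @ coef

factor = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
industry = ["bank", "bank", "bank", "tech", "tech", "tech"]
resid = industry_neutralize(factor, industry)
# resid ≈ [-1, 0, 1, -10, 0, 10]: each industry's mean (2 and 20) has been removed
```

Adding a size column to `dummies` would give the joint industry-and-size neutralization that the post chose not to apply here.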

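# build the industry dummy matrix (SW level-1 industries x stocks)
start = time.time()
print('Industry neutralization in progress......')
from jqdata import *
sw=get_industries(name='sw_l1').index
industry_df=pd.DataFrame(0,columns=pl_pro.minor_axis,index=range(len(sw)))

for i in range(len(sw)):
    # stocks in our pool that belong to industry sw[i] (no date given, so current membership is used)
    temp=list(set(pl_pro.minor_axis).intersection(set(get_industry_stocks(sw[i]))))
    industry_df.loc[i,temp]=1
for factor in factor_list[1:]:
    data = pl_pro.loc[factor,:,:].copy()
    data = data.astype(float)

    # regress out industry effects to get the new factor values
    data_df=pd.DataFrame()
    for i in range(len(data.index)):
        # Size neutralization is not applied. To include it, fetch the standardized
        # free-float market cap m for date data.index[i], e.g.
        #   m = get_fundamentals(query(valuation.circulating_market_cap,valuation.code).filter(valuation.code.in_(data.columns)), date=data.index[i])
        #   m.index = np.array(m['code']); m = m.iloc[:,0]; m = (m - m.mean())/m.std()
        # and regress on [x, m, industry_df.T] instead of [x, industry_df.T].
        x=data.iloc[i,:]
        conc=pd.concat([x,industry_df.T],axis=1)
        est=sm.OLS(conc.iloc[:,:1],conc.iloc[:,1:]).fit()
        data_df[i]=est.resid   # residual = industry-neutral factor value
    data_df=data_df.T
    data_df.index=data.index
    pl_pro[factor] = data_df
    print('%s industry neutralization done'%factor)
end = time.time()
print('Industry neutralization finished, elapsed: %s s'%(end-start))
Industry neutralization in progress......
EP industry neutralization done
BP industry neutralization done
SP industry neutralization done
net_g industry neutralization done
roe_g_q industry neutralization done
sales_g_q industry neutralization done
roe_q industry neutralization done
roa_q industry neutralization done
grossprofitmargin_ttm industry neutralization done
netprofitratio_ttm industry neutralization done
assetturnover_q industry neutralization done
currentratio industry neutralization done
size_lg industry neutralization done
reverse_1m industry neutralization done
reverse_3m industry neutralization done
std_1m industry neutralization done
std_3m industry neutralization done
turn_1m industry neutralization done
turn_3m industry neutralization done
PSY industry neutralization done
RSI industry neutralization done
BIAS industry neutralization done
MACD industry neutralization done
Industry neutralization finished, elapsed: 262.64986753463745 s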
trade_list1 = get_tradeday_list(start='2017-12-1',end='2019-1-10',frequency='month')
trade_list1
DatetimeIndex(['2017-12-01', '2018-01-02', '2018-02-01', '2018-03-01',
               '2018-04-02', '2018-05-02', '2018-06-01', '2018-07-02',
               '2018-08-01', '2018-09-03', '2018-10-08', '2018-11-01',
               '2018-12-03', '2019-01-02'],
              dtype='datetime64[ns]', freq=None)
pubdate_df = pl_pro.loc['code',:,:].copy() 
pubdate_df.columns
Index(['600050.XSHG', '600900.XSHG', '600393.XSHG', '600690.XSHG',
       '000157.XSHE', '000938.XSHE', '002465.XSHE', '600808.XSHG',
       '002073.XSHE', '600770.XSHG',
       ...
       '300070.XSHE', '600066.XSHG', '600522.XSHG', '001979.XSHE',
       '002359.XSHE', '603169.XSHG', '600811.XSHG', '600482.XSHG',
       '002241.XSHE', '000860.XSHE'],
      dtype='object', length=745)
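# next-period return of each stock for every rebalancing date
price_df = get_price(list(pl_pro.minor_axis),start_date=trade_list1[0],end_date=trade_list1[-1],fields=['close'])['close'].fillna(method='ffill')
price_df = price_df.loc[trade_list1,:].shift(-1)  # keep the rebalancing dates, then shift back one period so row t holds the close at t+1
pct_df = price_df/price_df.shift(1)-1  # next-period return, recorded on the current date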
pct_df
600050.XSHG 600900.XSHG 600393.XSHG 600690.XSHG 000157.XSHE 000938.XSHE 002465.XSHE 600808.XSHG 002073.XSHE 600770.XSHG ... 300070.XSHE 600066.XSHG 600522.XSHG 001979.XSHE 002359.XSHE 603169.XSHG 600811.XSHG 600482.XSHG 002241.XSHE 000860.XSHE
2017-12-01 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-02 0.040562 0.033267 -0.024233 0.128480 -0.043880 -0.175899 -0.072597 0.034653 -0.024876 -0.125819 ... -0.103009 -0.078331 -0.121056 0.277665 0.072931 -0.066106 -0.030303 -0.023810 -0.190920 0.033054
2018-02-01 -0.044978 0.019961 -0.031457 -0.085863 -0.016908 0.117453 0.108049 -0.050239 0.132653 -0.002999 ... 0.062581 0.045266 0.055092 -0.161303 0.090909 -0.014157 0.002232 0.046714 0.116547 -0.030980
2018-03-01 -0.109890 -0.031566 0.000000 -0.117800 -0.009828 0.204095 0.042786 -0.153652 -0.130631 0.102256 ... 0.096539 -0.069819 -0.047468 0.003316 0.023318 -0.043081 -0.006682 0.001580 -0.097938 0.132075
2018-04-02 0.000000 -0.000652 0.000000 0.011176 -0.004963 -0.098335 0.020038 0.077381 -0.056995 0.039563 ... 0.029347 -0.024703 -0.201827 0.002361 0.009339 -0.110505 -0.020179 -0.063091 -0.210714 0.240741
2018-05-02 -0.045855 0.051533 0.000000 0.124491 0.000000 -0.089230 -0.095416 -0.055249 -0.041209 -0.104987 ... -0.151694 0.045300 0.024974 0.001884 -0.275352 -0.065951 -0.020595 -0.106481 0.022624 0.245522
2018-06-01 -0.112754 -0.016749 -0.410256 -0.028971 -0.037406 -0.058058 -0.138573 -0.040936 -0.190544 -0.149560 ... -0.145847 -0.197111 -0.146193 -0.170193 0.000000 -0.198686 0.000000 -0.192652 -0.127434 0.107549
2018-07-02 0.075000 0.058675 -0.127536 -0.116143 0.018135 0.067301 0.039616 0.167683 0.033628 -0.001724 ... -0.015590 -0.054556 0.053508 0.025496 0.000000 0.059426 0.000000 0.040257 -0.044625 0.237490
2018-08-01 0.060078 -0.048868 -0.083056 -0.102471 -0.025445 -0.046497 -0.047344 0.010444 -0.046233 -0.008636 ... -0.171946 -0.071209 0.012415 -0.022099 -0.064862 -0.040619 0.000000 0.098710 -0.052017 -0.050929
2018-09-03 -0.025594 0.002506 -0.144928 0.054399 -0.052219 -0.124694 0.023030 0.031008 -0.035907 -0.104530 ... -0.130237 -0.054858 -0.074693 -0.002825 -0.498635 0.004032 -0.093458 0.175600 -0.109742 -0.027176
2018-10-08 0.016886 -0.031250 -0.118644 -0.179618 -0.066116 -0.118291 -0.114929 0.027569 -0.094972 -0.107004 ... -0.113089 -0.239860 -0.115663 0.073654 0.002179 -0.116466 0.010309 -0.059922 -0.076730 -0.183712
2018-11-01 0.003690 -0.051613 0.125000 0.150621 0.085546 0.055395 0.052209 -0.095122 0.051440 0.215686 ... 0.073200 0.077277 0.102180 0.000000 -0.001087 0.025000 -0.010204 0.069284 0.027248 0.056845
2018-12-03 -0.049632 0.080272 -0.094017 -0.065452 -0.032609 -0.145435 -0.007634 -0.067385 -0.113503 -0.155914 ... -0.141914 0.011956 0.007417 -0.084433 -0.106638 -0.053215 -0.056701 -0.038013 -0.087533 -0.124863
2019-01-02 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

14 rows × 745 columns
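The two shifts above store each stock's return over the *next* month on the date the factor is observed. Factor values at $t$ are therefore paired with returns from $t$ to $t+1$.

Here is a small sketch with made-up month-start closes for one stock. The direct form `prices.shift(-1) / prices - 1` matches the post's two-step version on every row except the first, which the two-step version leaves as NaN (like the 2017-12-01 row above).

```python
import pandas as pd

# made-up month-start closing prices for one stock
prices = pd.Series(
    [10.0, 11.0, 9.9, 9.9],
    index=pd.to_datetime(["2018-01-02", "2018-02-01", "2018-03-01", "2018-04-02"]),
)

# direct form: return from t to t+1, stored on date t
fwd_ret = prices.shift(-1) / prices - 1          # ≈ [0.1, -0.1, 0.0, NaN]

# the post's two-step form: shift back one period, then divide by the previous row
shifted = prices.shift(-1)
post_style = shifted / shifted.shift(1) - 1      # ≈ [NaN, -0.1, 0.0, NaN]
```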

# add the next-period return series to the panel
start = time.time()
print('Computing per-period returns for %s stocks......'%len(pl_pro.minor_axis))
# add price series
#price_df = get_price(list(pl_pro.minor_axis),start_date=trade_list1[0],end_date=trade_list1[-1],fields=['close'])['close']
pubdate_df = pl_pro.loc['code',:,:].copy()  # template DataFrame (dates x stocks)
for colm in pubdate_df.columns:
    pubdate_df[colm] = pct_df.loc[trade_list,colm].values
pl_pro['pct'] = pd.DataFrame(pubdate_df,index=pl_pro.major_axis, columns=pl_pro.minor_axis)
# compute return changes
#pl_pro['pct'] = (pl_pro.loc['close',:,:]/pl_pro.loc['close',:,:].shift(1)).fillna(1)
end = time.time()
print('Data ready, elapsed: %s s'%str(end-start))
pl_pro
Computing per-period returns for 745 stocks......
Data ready, elapsed: 0.916231632232666 s
<class 'pandas.core.panel.Panel'>
Dimensions: 26 (items) x 12 (major_axis) x 745 (minor_axis)
Items axis: code to pct
Major_axis axis: 2018-01-02 00:00:00 to 2018-12-03 00:00:00
Minor_axis axis: 600050.XSHG to 000860.XSHE
pl_pro.major_axis
DatetimeIndex(['2018-01-02', '2018-02-01', '2018-03-01', '2018-04-02',
               '2018-05-02', '2018-06-01', '2018-07-02', '2018-08-01',
               '2018-09-03', '2018-10-08', '2018-11-01', '2018-12-03'],
              dtype='datetime64[ns]', freq=None)

Cross-sectional RLM regression

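Following Everbright Securities' multi-factor research series, we test factors with RLM robust regression. Compared with ordinary least squares (OLS), RLM estimates the coefficients by iteratively reweighted least squares. Each observation gets a weight $w_i$ based on the size of its residual, which makes the estimates more robust.

Along the way we compute and store each factor's return series $f$ and its IC values.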
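To make the reweighting concrete, the sketch below runs a minimal Huber IRLS for the single-regressor, no-intercept case used in the code that follows (`sm.RLM(y, x, M=sm.robust.norms.HuberT())`).

This is a simplified illustration, not statsmodels' exact algorithm. It assumes the usual Huber constant c = 1.345 and re-estimates the residual scale at each iteration as a normalised MAD. On simulated data with ten high-leverage outliers, OLS is pulled far from the true slope of 2, while the Huber estimate stays close to it.

```python
import numpy as np

def huber_irls(x, y, c=1.345, max_iter=100, tol=1e-10):
    """Huber M-estimate of b in y = b * x + e (no intercept) by iteratively reweighted least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b = (x @ y) / (x @ x)                                   # OLS starting value
    for _ in range(max_iter):
        r = y - b * x
        s = np.median(np.abs(r - np.median(r))) / 0.6745    # robust residual scale (normalised MAD)
        u = np.abs(r) / s
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))       # Huber weights: 1 inside c, c/u outside
        b_new = ((w * x) @ y) / ((w * x) @ x)               # weighted least squares step
        if abs(b_new - b) < tol:
            return b_new
        b = b_new
    return b

# simulated cross-section: true slope 2, plus ten high-leverage outliers
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = 2.0 * x + 0.1 * rng.standard_normal(200)
x[:10], y[:10] = 3.0, -10.0

b_ols = (x @ y) / (x @ x)
b_huber = huber_irls(x, y)
print(f"OLS: {b_ols:.3f}  Huber IRLS: {b_huber:.3f}")
```

Points with small residuals keep full weight, and the outliers' weights shrink toward zero as the fit approaches the true slope. This is why RLM factor returns are less sensitive to a few extreme stocks.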

# match factor values (period t) with next-period returns (stored on period t)
start = time.time()
print('Running RLM regressions......')
end_label = pl_pro.major_axis[-1]
start_label = pl_pro.major_axis[0]
t_dict = {}
f_dict = {}
IC_spearman_dict = {}
IC_pearson_dict = {}
for factor in factor_list[1:]:
    x_df = pl_pro.loc[factor,:end_label,:].astype(float)  # processed factor values
    y_df = pl_pro.loc['pct',start_label:,:].astype(float)  # next-period returns
    t_list = []
    f_list = []
    IC_spearman_list = []
    IC_pearson_list = []

    for i in range(len(y_df.index)):
        # single-factor RLM (Huber norm, no intercept): the slope is this period's factor return
        rlm_model = sm.RLM(y_df.iloc[i,:], x_df.iloc[i,:], M=sm.robust.norms.HuberT()).fit()
        f_list.append(float(rlm_model.params.iloc[0]))
        t_list.append(float(rlm_model.tvalues.iloc[0]))
        # IC: cross-sectional correlation between factor values and next-period returns
        IC_pearson_list.append(x_df.iloc[i,:].corr(y_df.iloc[i,:],method='pearson'))
        IC_spearman_list.append(x_df.iloc[i,:].corr(y_df.iloc[i,:],method='spearman'))

    t_dict[factor] = t_list
    f_dict[factor] = f_list
    IC_spearman_dict[factor] = IC_spearman_list
    IC_pearson_dict[factor] = IC_pearson_list

    print('%s factor done'%factor)
end = time.time()
print('Regression finished, elapsed: %s s'%(end-start))
Running RLM regressions......
EP factor done
BP factor done
SP factor done
net_g factor done
roe_g_q factor done
sales_g_q factor done
roe_q factor done
roa_q factor done
grossprofitmargin_ttm factor done
netprofitratio_ttm factor done
assetturnover_q factor done
currentratio factor done
size_lg factor done
reverse_1m factor done
reverse_3m factor done
std_1m factor done
std_3m factor done
turn_1m factor done
turn_3m factor done
PSY factor done
RSI factor done
BIAS factor done
MACD factor done
Regression finished, elapsed: 5684.793751239777 s

Factor performance statistics

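Here we summarize the test metrics for all factors. The multi-period cross-sectional regressions give a factor return series $f_t$ and a series of per-period t-values. From these two series we judge each factor's effectiveness and stability with the following metrics:

  1. The t-statistic of the factor return series $f_t$
  2. The proportion of periods with $f_t > 0$
  3. The mean of the absolute t-values
  4. The proportion of periods with $|t| > 2$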
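Here is a minimal sketch of these four statistics, assuming `f_list` and `t_list` are the per-period factor returns and t-values collected by the regression loop above.

Item 1 is computed here as a one-sample t-statistic of the mean factor return. The summary code below reports the mean of $f_t$ instead. The function name and the toy numbers are hypothetical.

```python
import numpy as np

def factor_return_summary(f_list, t_list):
    """Summary statistics of a factor-return series f_t and its per-period regression t-values."""
    f = np.asarray(f_list, dtype=float)
    t = np.asarray(t_list, dtype=float)
    # one-sample t-statistic of the mean factor return (H0: E[f_t] = 0)
    f_tstat = f.mean() / (f.std(ddof=1) / np.sqrt(len(f)))
    return {
        "f_mean": f.mean(),
        "f_tstat": f_tstat,
        "P(f>0)": np.mean(f > 0),
        "mean|t|": np.mean(np.abs(t)),
        "P(|t|>2)": np.mean(np.abs(t) > 2),   # strict '>', as in the summary code below
    }

# toy series for four periods
summary = factor_return_summary([0.01, -0.005, 0.02, 0.015], [2.5, -1.0, 3.0, 2.0])
print(summary)
```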

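The information coefficient (IC) shows how stable a factor's return prediction is and whether it has momentum, so it can serve as a screening metric in portfolio optimization. IC is commonly computed in one of two ways: the Pearson correlation or the Spearman rank correlation. This study uses the rank correlation (Spearman). The IC-based metrics for judging a factor's effectiveness and predictive power are:

  1. Mean IC
  2. Standard deviation of IC
  3. Proportion of periods with IC > 0
  4. Proportion of periods with |IC| > 0.02
  5. IR (mean IC divided by the IC standard deviation)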
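Here is a minimal sketch of the rank IC and the IC summary statistics, assuming factor values and next-period returns as pandas Series indexed by stock code.

The rank IC is computed as the Pearson correlation of the cross-sectional ranks, with average ranks for ties; this is Spearman's correlation. The standard deviation uses the population form (`np.std`), as the summary code below does. Function names and numbers are hypothetical.

```python
import numpy as np
import pandas as pd

def rank_ic(factor, fwd_ret):
    """Spearman rank IC: Pearson correlation of cross-sectional ranks (ties get average ranks)."""
    df = pd.concat({"f": factor, "r": fwd_ret}, axis=1).dropna()
    ranks = df.rank()
    return float(np.corrcoef(ranks["f"], ranks["r"])[0, 1])

def ic_summary(ic_series):
    """Mean, population std, hit rates and IR of a per-period IC series."""
    ic = np.asarray(ic_series, dtype=float)
    mean, std = ic.mean(), ic.std()
    return {
        "IC_mean": mean,
        "IC_std": std,
        "P(IC>0)": np.mean(ic > 0),
        "P(|IC|>0.02)": np.mean(np.abs(ic) > 0.02),
        "IR": mean / std,
    }

codes = list("abcde")
factor = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=codes)
fwd_ret = pd.Series([0.01, 0.03, 0.02, 0.05, 0.04], index=codes)
ic = rank_ic(factor, fwd_ret)                  # 0.8 for this toy cross-section
summary = ic_summary([0.05, -0.01, 0.03, 0.01])
print(ic, summary)
```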
factor_name = ['股票代码','净利润/总市值','净资产/总市值','营业收入/总市值','净利润同比增长率','ROE同比增长率','营业收入同比增长率',\
               'ROE','ROA','毛利率ttm','净利率ttm','总资产周转率q','流动比率',\
               '市值','反转因子(1个月)','反转因子(3个月)','波动率(1个月)','波动率(3个月)','换手率(1个月)',\
               '换手率(3个月)','PSY','RSI','BIAS','MACD']
index_list = ['f均值','fi>0','abs(T)均值','abs(T)>2','IC均值','IR值','abs(IC)>0.02','IC>0']
summary_df = pd.DataFrame(index=index_list)

for i in range(1,len(factor_list)):
    factor = factor_list[i]
    IC_spearman_list = IC_spearman_dict[factor]
    IC_pearson_list = IC_pearson_dict[factor]
    f_list = f_dict[factor]
    t_list = t_dict[factor]
    f_mean = np.mean(f_list)
    f_ratio = sum(np.where(np.array(f_list)>0,1,0))*1.0/len(f_list)
    #print('因子收益序列fi大于0概率:%s'%round(f_ratio,4))
    t_abs_mean = np.mean([abs(i) for i in t_list])
    #print('t值绝对值的均值:%s'%round(t_abs_mean,4))
    t_abs_dayu2 = sum(np.where(((np.array(t_list)>2) | (np.array(t_list)<-2)),1,0))*1.0/len(t_list)
    #print('t值绝对值大于等于2的概率:%s'%round(t_abs_dayu2,4))

    ic_mean = np.mean(IC_spearman_list)
    ic_std = np.std(IC_spearman_list)
    ic_dayu0 = sum(np.where(np.array(IC_spearman_list)>0,1,0))*1.0/len(IC_spearman_list)
    ic_abs_dayu = sum(np.where(((np.array(IC_spearman_list)>0.02) | (np.array(IC_spearman_list)<-0.02)),1,0))*1.0/len(IC_spearman_list)
    ir = ic_mean/ic_std
    #print('IC均值:%s,标准差:%s,IR值:%s'%(round(ic_mean,4),round(ic_std,4),round(ir,4)))
    #print('IC值大于0概率:%s'%round(ic_dayu0,4))
    #print('IC值绝对值大于0.02的均值:%s'%round(ic_abs_dayu,4))

    index_values = [round(f_mean,4),round(f_ratio,4),round(t_abs_mean,4),round(t_abs_dayu2,4),round(ic_mean,4),round(ir,4),\
                    round(ic_abs_dayu,4),round(ic_dayu0,4)]
    
    summary_df[factor_name[i]] = index_values
summary_df
def showColor(val):
    color = 'red' if val > 2 else 'black'
    return 'color:%s'%color
#summary_df.style.applymap(showColor)#全表格
#summary_df.style.applymap(showColor,subset=pd.IndexSlice[3:4,:])#指定表格位置?
summary_df.T.sort_values('IC均值',ascending=0)
.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }
                          mean f  P(f>0) mean |t| P(|t|>2) mean IC      IR P(|IC|>0.02) P(IC>0)
Market cap                0.0045  0.6667   3.5050   0.8333  0.0508  0.3121       1.0000  0.6667
ROA                       0.0047  0.5833   2.4901   0.6667  0.0477  0.4439       1.0000  0.5833
ROE YoY growth            0.0035  0.7500   1.5877   0.4167  0.0456  0.5508       0.9167  0.7500
ROE                       0.0043  0.5833   2.7568   0.6667  0.0456  0.3698       1.0000  0.5833
Asset turnover (q)        0.0027  0.7500   1.3834   0.1667  0.0316  0.3888       0.9167  0.7500
RSI                       0.0035  0.6667   2.1623   0.5833  0.0285  0.2545       1.0000  0.6667
BIAS                      0.0022  0.7500   1.4236   0.2500  0.0251  0.3017       0.8333  0.7500
Current ratio             0.0019  0.6667   0.8127   0.0000  0.0220  0.7412       0.5000  0.8333
Reversal (1M)             0.0017  0.5833   2.0656   0.2500  0.0217  0.1894       1.0000  0.5000
Revenue YoY growth        0.0015  0.5833   1.1681   0.1667  0.0203  0.3068       0.8333  0.5833
Net profit / market cap   0.0021  0.5833   2.3378   0.5833  0.0195  0.1559       0.9167  0.5833
Net margin (TTM)          0.0024  0.5833   1.6957   0.1667  0.0159  0.1998       1.0000  0.5833
Net assets / market cap  -0.0001  0.4167   1.6847   0.3333  0.0115  0.1391       0.9167  0.5833
Revenue / market cap      0.0008  0.5833   1.4091   0.1667  0.0103  0.1526       1.0000  0.5833
Gross margin (TTM)        0.0020  0.5833   1.4184   0.2500  0.0031  0.0413       0.8333  0.5833
Volatility (1M)           0.0026  0.6667   1.6098   0.3333  0.0030  0.0374       0.6667  0.5000
Net profit YoY growth     0.0005  0.4167   1.5106   0.3333 -0.0013 -0.0163       0.9167  0.4167
PSY                       0.0002  0.5000   0.9646   0.1667 -0.0033 -0.0558       0.8333  0.4167
MACD                     -0.0012  0.4167   1.4396   0.1667 -0.0108 -0.1911       0.8333  0.4167
Volatility (3M)           0.0001  0.5833   1.6781   0.3333 -0.0163 -0.1793       0.7500  0.4167
Reversal (3M)            -0.0045  0.4167   3.3537   0.7500 -0.0400 -0.2506       0.9167  0.3333
Turnover (3M)            -0.0081  0.2500   3.2661   0.8333 -0.1046 -0.7815       0.9167  0.1667
Turnover (1M)            -0.0079  0.2500   3.1853   0.9167 -0.1064 -0.7856       1.0000  0.1667
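Each column of this table is a simple statistic of three per-period series: the regression factor return f, its t-value, and the rank IC. The sketch below recomputes them for a single factor. The helper name `summarize_factor` and the four-period toy numbers are hypothetical, not taken from the study; IR is taken as mean IC over the population standard deviation of IC, as `np.std` gives in the loop above.

```python
import numpy as np

def summarize_factor(f, t, ic):
    """Summary statistics for one factor (hypothetical helper).

    f  : per-period factor returns from the cross-sectional regression
    t  : per-period t-values of those factor returns
    ic : per-period rank ICs
    """
    f, t, ic = (np.asarray(x, dtype=float) for x in (f, t, ic))
    ic_mean = ic.mean()
    return {
        "mean f": f.mean(),
        "P(f>0)": np.mean(f > 0),                  # share of periods with f > 0
        "mean |t|": np.abs(t).mean(),
        "P(|t|>2)": np.mean(np.abs(t) > 2),        # share of significant periods
        "mean IC": ic_mean,
        "IR": ic_mean / ic.std(),                  # population std (ddof=0)
        "P(|IC|>0.02)": np.mean(np.abs(ic) > 0.02),
        "P(IC>0)": np.mean(ic > 0),
    }

# Toy example with four periods (made-up numbers)
result = summarize_factor(
    f=[0.01, -0.005, 0.002, 0.004],
    t=[2.5, -1.0, 0.5, 3.0],
    ic=[0.05, -0.01, 0.03, 0.04],
)
for name, value in result.items():
    print(f"{name:>13}: {value:.4f}")
```

With only twelve monthly periods in 2018, each "P(...)" column can take only multiples of 1/12 (0.0833), which is why values such as 0.5833 and 0.9167 recur in the table.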
factor_list
['code',
 'EP',
 'BP',
 'SP',
 'net_g',
 'roe_g_q',
 'sales_g_q',
 'roe_q',
 'roa_q',
 'grossprofitmargin_ttm',
 'netprofitratio_ttm',
 'assetturnover_q',
 'currentratio',
 'size_lg',
 'reverse_1m',
 'reverse_3m',
 'std_1m',
 'std_3m',
 'turn_1m',
 'turn_3m',
 'PSY',
 'RSI',
 'BIAS',
 'MACD']
# Collect each factor's IC series (Spearman) and factor-return series, one column per factor
factor_ic = pd.DataFrame(index=pl_pro.major_axis)
factor_fi = pd.DataFrame(index=pl_pro.major_axis)
for i in range(1, len(factor_list)):  # skip 'code'
    factor = factor_list[i]
    factor_ic[factor_name[i]] = IC_spearman_dict[factor]
    factor_fi[factor_name[i]] = f_dict[factor]
# Mean monthly factor return in 2018, per factor
factor_fi.mean().sort_values().plot(kind='bar', color='blue', figsize=(12, 8))  # bar chart of mean factor return per factor
[Figure: mean monthly factor return in 2018, by factor]
# Mean monthly IC in 2018, per factor
factor_ic.mean().sort_values().plot(kind='bar', figsize=(12, 8))  # bar chart of mean IC per factor
[Figure: mean monthly IC in 2018, by factor]
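Whether a factor's mean monthly return is distinguishable from zero can be checked with a one-sample t-test on one column of `factor_fi`. The sketch below is an illustration under a strong assumption: that monthly factor returns are roughly independent, which twelve observations can hardly confirm. The helper name `factor_return_ttest` and the sample returns are hypothetical.

```python
import numpy as np
from scipy import stats

def factor_return_ttest(f):
    """One-sample t-test of H0: mean factor return == 0 (hypothetical helper)."""
    f = np.asarray(f, dtype=float)
    f = f[~np.isnan(f)]                      # drop missing periods
    res = stats.ttest_1samp(f, popmean=0.0)  # t = mean / (sample std / sqrt(n))
    return f.mean(), res[0], res[1]

# Made-up monthly factor returns for one factor over 12 months
f = [0.006, 0.004, -0.002, 0.008, 0.005, 0.003, 0.007, -0.001, 0.004, 0.006, 0.002, 0.005]
mean_f, t_stat, p_value = factor_return_ttest(f)
print(f"mean f = {mean_f:.4f}, t = {t_stat:.2f}, p = {p_value:.3f}")
```

In the notebook the same call would take a column such as `factor_fi['市值']`.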

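Over the past year, ten factors have a mean IC above 0.02, and three more (3-month reversal and the 1- and 3-month turnover factors) have a mean IC below -0.02.

The four factors with the largest positive mean IC are size (size_lg), ROA (roa_q), ROE YoY growth (roe_g_q) and ROE (roe_q).

The four factors with the most negative mean IC are the two turnover factors (turn_1m, turn_3m), the 3-month reversal factor (reverse_3m) and the 3-month volatility factor (std_3m).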
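Counting and ranking factors by mean IC can be done directly on the mean-IC column of the summary table. A short pandas sketch, with the 23 values copied by hand from that table and the factor labels given in English:

```python
import pandas as pd

# Mean IC per factor, copied from the summary table
mean_ic = pd.Series({
    "Market cap": 0.0508, "ROA": 0.0477, "ROE YoY growth": 0.0456, "ROE": 0.0456,
    "Asset turnover (q)": 0.0316, "RSI": 0.0285, "BIAS": 0.0251, "Current ratio": 0.0220,
    "Reversal (1M)": 0.0217, "Revenue YoY growth": 0.0203, "Net profit / market cap": 0.0195,
    "Net margin (TTM)": 0.0159, "Net assets / market cap": 0.0115,
    "Revenue / market cap": 0.0103, "Gross margin (TTM)": 0.0031, "Volatility (1M)": 0.0030,
    "Net profit YoY growth": -0.0013, "PSY": -0.0033, "MACD": -0.0108,
    "Volatility (3M)": -0.0163, "Reversal (3M)": -0.0400,
    "Turnover (3M)": -0.1046, "Turnover (1M)": -0.1064,
})

above = mean_ic[mean_ic > 0.02]    # clearly positive mean IC
below = mean_ic[mean_ic < -0.02]   # clearly negative mean IC
print(len(above), len(below))
print(mean_ic.nlargest(4))         # strongest positive factors
print(mean_ic.nsmallest(4))        # strongest negative factors
```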

Below we look at the size factor in more detail.

factor = 'size_lg'
factor_fi['市值'].plot(kind='bar')  # factor-return series of the size factor ('市值' = market cap)
[Figure: monthly factor return of the size factor]
factor_ic['市值'].plot(kind='bar')  # IC series of the size factor
[Figure: monthly IC of the size factor]

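Factor performance under different test methods

Group backtest

Next, the factors that tested as most significant above are backtested in groups to check their returns directly.

The factor-return series from single-factor regression does not directly show a factor's historical return in each period, nor whether its effect is monotonic. To show monotonicity as well, we supplement the regression with layered backtesting: at the end of each period, stocks are sorted by factor value into five equal groups (quintiles), and each group is backtested.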
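The layered backtest described above can be sketched for synthetic data as follows. This is a minimal illustration, not the study's `get_all_pct` helper (defined elsewhere in the notebook, which may weight or rebalance differently). It assumes one DataFrame per period with hypothetical columns `factor` (value at period end) and `fwd_ret` (next-period return); the helper name `quintile_returns` and the data are made up. Stocks are split into five equal-count groups with `pd.qcut` and each group is equally weighted.

```python
import numpy as np
import pandas as pd

def quintile_returns(df, n_groups=5):
    """Equal-weighted next-period return of each factor-sorted group (hypothetical helper).

    Group 0 holds the smallest factor values, group n_groups-1 the largest.
    Ranking first (method="first") breaks ties so qcut always gets unique edges.
    """
    labels = pd.qcut(df["factor"].rank(method="first"), n_groups, labels=False)
    return df.groupby(labels)["fwd_ret"].mean()

# Synthetic cross-sections: 12 monthly periods, 500 stocks each
rng = np.random.default_rng(0)
periods = []
for _ in range(12):
    factor = rng.normal(size=500)
    fwd_ret = 0.002 * factor + rng.normal(scale=0.05, size=500)  # weak positive link
    periods.append(quintile_returns(pd.DataFrame({"factor": factor, "fwd_ret": fwd_ret})))

group_ret = pd.DataFrame(periods).reset_index(drop=True)  # rows: periods, columns: groups 0-4
net_value = (1 + group_ret).cumprod()                     # cumulative net value of each group
print(net_value.iloc[-1])
```

A factor with a monotonic effect should show final net values that rise (or fall) steadily from group 0 to group 4.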

 
# Factor used for the grouped check
factor = 'roa_q'
pool_dict = {}
group = 5  # number of groups (quintiles)

x_df = pl_pro.loc[factor, :end_label, :]  # factor values up to end_label; other inputs (e.g. factor_mad_std) can be swapped in
x_df = x_df.astype(float)

for i in range(len(x_df.index)):
    temp_se = x_df.iloc[i, :].sort_values()  # factor values of period i, ascending
    pool = temp_se[temp_se > 0].index        # keep only stocks with a positive factor value
    #pool = temp_se.index                    # alternative: keep all values, including negatives
    pool_dict[x_df.index[i]] = pool
group_pct = get_all_pct(pool_dict, trade_list, groups=group)
group_pct.columns = ['group' + str(i) for i in range(len(group_pct.columns))]
group_pct.cumprod().plot(figsize=(12, 8))  # cumulative net value of each group
[Figure: cumulative net value of the five ROA (roa_q) groups]

Below we summarize the layered backtest results for these factors.

#factor_test_list = ['BP','reverse_1m','size_lg','SP','turn_1m','turn_3m','std_3m','net_g']
factor_test_list = ['turn_1m', 'turn_3m', 'reverse_3m', 'size_lg', 'roa_q', 'roe_g_q', 'roe_q',
                    'assetturnover_q', 'RSI', 'BIAS', 'currentratio', 'reverse_1m']
# Map factor codes to display names (used as column labels below)
factor_name_df = pd.DataFrame(factor_name, index=factor_list)
group_bt_df = pd.DataFrame()
pl_pro = pl  # read factor values from the panel pl for all grouped backtests
for factor in factor_test_list:
    pool_dict = {}
    x_df = pl_pro.loc[factor, :end_label, :].astype(float)  # other inputs (e.g. factor_mad_std) can be swapped in

    for i in range(len(x_df.index)):
        temp_se = x_df.iloc[i, :].sort_values()  # factor values of period i, ascending
        pool = temp_se[temp_se > 0].index        # keep only stocks with a positive factor value
        #pool = temp_se.index                    # alternative: keep all values, including negatives
        pool_dict[x_df.index[i]] = pool
    group_pct = get_all_pct(pool_dict, trade_list, groups=group)
    group_pct.columns = ['group' + str(i) for i in range(len(group_pct.columns))]

    # Final cumulative net value of each group
    group_bt_df[factor_name_df.loc[factor, 0]] = group_pct.cumprod().iloc[-1, :]
group_bt_df.T
                        group0    group1    group2    group3    group4
Turnover (1M)         0.786156  0.758600  0.759814  0.745471  0.716257
Turnover (3M)         0.779276  0.774761  0.753092  0.741997  0.708396
Reversal (3M)         0.773100  0.747032  0.730056  0.716518  0.693109
Market cap            0.718257  0.708390  0.748777  0.763048  0.802676
ROA                   0.749958  0.749005  0.724369  0.728483  0.799551
ROE YoY growth        0.774324  0.789104  0.725273  0.763684  0.683714
ROE                   0.715477  0.755160  0.764683  0.743334  0.767890
Asset turnover (q)    0.763339  0.706895  0.751873  0.776780  0.741547
RSI                   0.721866  0.730678  0.752513  0.759153  0.773420
BIAS                  0.751607  0.771928  0.754852  0.764260  0.723879
Current ratio         0.737381  0.706054  0.740910  0.730728  0.798765
Reversal (1M)         0.731763  0.792804  0.733336  0.744278  0.675730
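One way to put a number on the monotonicity that layered backtesting is meant to reveal is the Spearman rank correlation between group number and final net value (+1 or -1 means perfectly monotonic), together with the top-minus-bottom spread (group4 minus group0). This metric choice is ours, not the study's, and reading the sign assumes groups are numbered in ascending order of factor value, as the ascending sort in the code suggests. The sketch uses four rows copied from the table above.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Final cumulative net value of groups 0-4, copied from the group backtest table
net = pd.DataFrame(
    {
        "Turnover (1M)": [0.786156, 0.758600, 0.759814, 0.745471, 0.716257],
        "Market cap":    [0.718257, 0.708390, 0.748777, 0.763048, 0.802676],
        "RSI":           [0.721866, 0.730678, 0.752513, 0.759153, 0.773420],
        "Reversal (1M)": [0.731763, 0.792804, 0.733336, 0.744278, 0.675730],
    },
    index=["group0", "group1", "group2", "group3", "group4"],
)

rows = []
for name, col in net.items():
    rho, _ = stats.spearmanr(np.arange(len(col)), col)  # rank correlation with group number
    rows.append((name, rho, col.iloc[-1] - col.iloc[0]))
summary = pd.DataFrame(rows, columns=["factor", "spearman_rho", "group4_minus_group0"])
print(summary.round(4))
```

On these rows, RSI is perfectly monotonic (rho = 1.0), market cap nearly so (0.9), 1-month turnover nearly monotonic in the opposite direction (-0.9), while 1-month reversal is far from monotonic (-0.3) despite a sizeable negative spread.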
factor_name_df.loc['EP'][0]
'净利润/总市值'  (net profit / total market cap)
factor_name_df = pd.DataFrame(factor_name,index=factor_list)
factor_name_df
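                       0
code                   股票代码 (stock code)
EP                     净利润/总市值 (net profit / total market cap)
BP                     净资产/总市值 (net assets / total market cap)
SP                     营业收入/总市值 (revenue / total market cap)
net_g                  净利润同比增长率 (net profit YoY growth)
roe_g_q                ROE同比增长率 (ROE YoY growth)
sales_g_q              营业收入同比增长率 (revenue YoY growth)
roe_q                  ROE
roa_q                  ROA
grossprofitmargin_ttm  毛利率ttm (gross margin, TTM)
netprofitratio_ttm     净利率ttm (net margin, TTM)
assetturnover_q        总资产周转率q (total asset turnover, quarterly)
currentratio           流动比率 (current ratio)
size_lg                市值 (market cap)
reverse_1m             反转因子(1个月) (reversal, 1 month)
reverse_3m             反转因子(3个月) (reversal, 3 months)
std_1m                 波动率(1个月) (volatility, 1 month)
std_3m                 波动率(3个月) (volatility, 3 months)
turn_1m                换手率(1个月) (turnover, 1 month)
turn_3m                换手率(3个月) (turnover, 3 months)
PSY                    PSY
RSI                    RSI
BIAS                   BIAS
MACD                   MACD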
