A Code Framework for Multi-Factor Stock Selection

Posted by 我是小编 on May 10 at 07:22

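I wrote a multi-factor stock selection framework with modules for data acquisition, data preprocessing, data analysis, factor selection, stock selection, and backtesting.
It defines many helper functions. The stock selection and backtest modules at the end of this post use only some of them; the rest may be useful in other multi-factor strategies, and interested readers are free to build their own strategies on top of this framework.
The code mainly implements the basic framework, and some calculations are deliberately simple, the Sharpe ratio for example. There is plenty of room for improvement, which interested readers can fill in themselves.
The final stock selection strategy works as follows:
Stocks are split into five groups by the value of each factor. You can of course pick factors to suit your own strategy and change the number of groups. The factors in this post were chosen arbitrarily; factor selection is a craft of its own, and parts of the code here can be used to score factors. I will post a separate study on factor selection later.
Factor value and return are not linearly related: for some factors the best returns come from middle values, for example group 3 of 5. For each factor, find the group with the best performance (measured here by the forward Sharpe ratio). Each group contains many stocks, and the intersection of the best groups across all factors is the selected portfolio.
A quick backtest gave decent returns.
Using 2013 data to determine the best group for each factor, the return over 2014-2018 was 368%.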
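The group-and-intersect idea described above can be sketched on made-up data. This is a minimal illustration, not the code used for the backtest: the stock codes, factor values and the "best group" choices are all hypothetical, and `pd.qcut` on ranks is used here as a stand-in for the `cut_data` helper defined later in the post.

```python
import pandas as pd

def best_group_members(factor: pd.Series, best_group: int, ncut: int = 5) -> set:
    """Stocks whose factor value falls in quantile group `best_group` (0 = lowest)."""
    # Ranking with method="first" breaks ties, so qcut always finds ncut distinct bins
    groups = pd.qcut(factor.rank(method="first"), ncut, labels=False)
    return set(groups[groups == best_group].index)

# Hypothetical factor values for ten made-up stock codes
codes = [f"S{i:02d}" for i in range(10)]
factors = pd.DataFrame(
    {"PE": [5, 8, 12, 15, 18, 22, 30, 45, 60, 90],
     "EPS": [0.9, 0.1, 0.5, 0.6, 0.2, 0.55, 0.3, 0.8, 0.4, 0.7]},
    index=codes,
)
# Hypothetical best group per factor, as a training period might produce
best = {"PE": 2, "EPS": 2}

# The portfolio is the intersection of each factor's best group
selected = set(codes)
for name, group in best.items():
    selected &= best_group_members(factors[name], group)
print(sorted(selected))
```

With more factors the intersection shrinks quickly, so in practice the number of factors and groups limits how many stocks survive.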

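import pandas as pd
from jqdata import *
from sklearn.preprocessing import StandardScaler
from jqfactor import *
import scipy.stats as st
import statsmodels.api as sm
# Imported explicitly because the code below uses np, datetime and plt
import datetime
import numpy as np
import matplotlib.pyplot as plt

# Render minus signs correctly in plots
import matplotlib
matplotlib.rcParams['axes.unicode_minus']=False
import warnings
warnings.filterwarnings('ignore')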
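ttm_factors = []

date = '2013-01-01'    # snapshot date for cross-sectional data, e.g. the stock list on that day
start_date = '2013-01-01'
end_date = '2013-11-01'

all_securities = get_all_securities(date=date)   # all stocks listed as of `date`
all_stocks = list(all_securities.index)[:10]     # only the first 10 stocks, to keep the demo cells fast


INDUSTRY_NAME = 'sw_l1'
PRE_PROFIT_DAYS = 1 # number of days to look back when computing returns
ADJUST_DAYS = 60 # rebalancing interval in trading days (roughly 22 trading days per month)
CALCULATE_IR_PERIOD = 12 # number of IC observations used to compute IR

SELECT_STOCKS = 0.2 # take the top 20% of stocks within each industry
MONTH_MEAN_DAY_NUM = 3 # monthly returns use the mean of the last N days of each month
# Candidate factors and their descriptions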
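'''
Capital expenditure:
TOC: total operating cost
OTOP: net operating cash flow / net income from operating activities (%)

Cash flow:
NOCF/MC: net operating cash flow / total market cap
OTR: net operating cash flow / operating revenue

Growth:
ITRYOY: total operating revenue, year-on-year growth (%)
EPS: earnings per share
INPYOY: net profit, year-on-year growth

Liquidity:
TCA/TCL: current ratio

Valuation:
PE: price-to-earnings ratio
PB: price-to-book ratio
PR: price-to-cash-flow ratio
PS: price-to-sales ratio

Profitability:
ROA: return on total assets
ROE: return on equity
OP/TOR: operating profit / total operating revenue
TP/TOR: total profit / total operating revenue
NP/TOR: net profit / total operating revenue

Operations:
TOR/TA: total asset turnover
TOR/FA: fixed asset turnover
TOR/TCA: current asset turnover
LTL/OC: long-term loans / operating cost

Leverage:
TL/TA: total liabilities / total assets
TL/TOE: total liabilities / shareholders' equity

'''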
factor_list = ['MC','TOC','OTOP','NOCF/MC','OTR','ITRYOY','EPS','INPYOY','TCA/TCL','PE','PB','PR','PS','ROA',
               'OP/TOR','TP/TOR','NP/TOR','TOR/TA','TOR/FA','TOR/TCA','LTL/OC','TL/TA','TL/TOE']
'''
Mapping from factor names to fundamental-data fields
'''
fac_dict = {
    'MC':valuation.market_cap, # total market cap
    'GP':indicator.gross_profit_margin * income.operating_revenue, # gross profit
    'OP':income.operating_profit, # operating profit
    'OR':income.operating_revenue, # operating revenue
    'NP':income.net_profit, # net profit
    # enterprise value = market cap + interest-bearing debt - cash
    'EV':valuation.market_cap + balance.shortterm_loan+balance.non_current_liability_in_one_year+balance.longterm_loan+balance.bonds_payable+balance.longterm_account_payable - cash_flow.cash_and_equivalents_at_end,
    
    'TOE':balance.total_owner_equities, # total shareholders' equity (CNY)
    'TOR':income.total_operating_revenue, # total operating revenue
    'EBIT':income.net_profit+income.financial_expense+income.income_tax_expense,
    
    'TOC':income.total_operating_cost, # total operating cost
    'NOCF/MC':cash_flow.net_operate_cash_flow / valuation.market_cap, # net operating cash flow / total market cap
    'OTR':indicator.ocf_to_revenue, # net operating cash flow / operating revenue (%)
    
    
    'GPOA':indicator.gross_profit_margin * income.operating_revenue / balance.total_assets,  # gross profit / total assets = gross margin * operating revenue / total assets
    'GPM':indicator.gross_profit_margin, # gross margin
    'OPM':income.operating_profit / income.operating_revenue, # operating margin
    'NPM':indicator.net_profit_margin, # net margin
    'ROA':indicator.roa, # ROA
    'ROE':indicator.roe, # ROE
    'INC':indicator.inc_return, # return on equity excluding non-recurring items (%)
    'EPS':indicator.eps, # earnings per share
    'AP':indicator.adjusted_profit, # net profit excluding non-recurring items (CNY)
    'OPNI':indicator.operating_profit, # net income from operating activities (CNY); keyed separately so it does not overwrite 'OP'
    'VCP':indicator.value_change_profit, # net income from value changes (CNY) = fair-value gains + investment income + FX gains
    
    'ETTR':indicator.expense_to_total_revenue, # total operating cost / total operating revenue (%)
    'OPTTR':indicator.operation_profit_to_total_revenue, # operating profit / total operating revenue (%)
    'NPTTR':indicator.net_profit_to_total_revenue, # net profit / total operating revenue (%)
    'OETTR':indicator.operating_expense_to_total_revenue, # selling expense / total operating revenue
    'GETTR':indicator.ga_expense_to_total_revenue, # administrative expense / total operating revenue (%)
    'FETTR':indicator.financing_expense_to_total_revenue, # financial expense / total operating revenue (%)
    
    'OPTP':indicator.operating_profit_to_profit, # net income from operating activities / total profit (%)
    'IPTP':indicator.invesment_profit_to_profit, # net income from value changes / total profit (%)
    'GSASTR':indicator.goods_sale_and_service_to_revenue, # cash received from sales of goods and services / operating revenue (%)
    'OTOP':indicator.ocf_to_operating_profit, # net operating cash flow / net income from operating activities (%)
    
    'ITRYOY':indicator.inc_total_revenue_year_on_year, # total operating revenue, year-on-year growth (%)
    'ITRA':indicator.inc_total_revenue_annual, # total operating revenue, quarter-on-quarter growth (%)
    'IRYOY':indicator.inc_revenue_year_on_year, # operating revenue, year-on-year growth (%)
    'IRA':indicator.inc_revenue_annual, # operating revenue, quarter-on-quarter growth (%)
    'IOPYOY':indicator.inc_operation_profit_year_on_year, # operating profit, year-on-year growth (%)
    'IOPA':indicator.inc_operation_profit_annual, # operating profit, quarter-on-quarter growth (%)
    'INPYOY':indicator.inc_net_profit_year_on_year, # net profit, year-on-year growth (%)
    'INPA':indicator.inc_net_profit_annual, # net profit, quarter-on-quarter growth (%)
    'INPTSYOY':indicator.inc_net_profit_to_shareholders_year_on_year, # net profit attributable to parent shareholders, year-on-year growth (%)
    'INPTSA':indicator.inc_net_profit_to_shareholders_annual, # net profit attributable to parent shareholders, quarter-on-quarter growth (%)
    
    
    'ROIC':(income.net_profit+income.financial_expense+income.income_tax_expense)/(balance.total_owner_equities+balance.shortterm_loan+balance.non_current_liability_in_one_year+balance.longterm_loan+balance.bonds_payable+balance.longterm_account_payable),
    'OPTT':income.operating_profit / income.total_profit, # operating profit as a share of total profit
    'TP/TOR':income.total_profit / income.total_operating_revenue, # total profit / total operating revenue
    'OP/TOR':income.operating_profit / income.total_operating_revenue,
    'NP/TOR':income.net_profit / income.total_operating_revenue,
    
    'TA':balance.total_assets, # total assets

    'DER':balance.total_liability / balance.equities_parent_company_owners, # debt-to-equity = total liabilities / equity attributable to parent company owners
    'FCFF/TNCL':(cash_flow.net_operate_cash_flow - cash_flow.net_invest_cash_flow) / balance.total_non_current_liability, # free cash flow / non-current liabilities
    'NOCF/TL': cash_flow.net_operate_cash_flow / balance.total_liability, # net operating cash flow / total liabilities
    'TCA/TCL':balance.total_current_assets / balance.total_current_liability, # current ratio

    'PE':valuation.pe_ratio, # price-to-earnings ratio
    'PB':valuation.pb_ratio, # price-to-book ratio
    'PR':valuation.pcf_ratio, # price-to-cash-flow ratio
    'PS':valuation.ps_ratio, # price-to-sales ratio
    
    'TOR/TA':income.total_operating_revenue / balance.total_assets, # total asset turnover
    'TOR/FA':income.total_operating_revenue / balance.fixed_assets, # fixed asset turnover
    'TOR/TCA':income.total_operating_revenue / balance.total_current_assets, # current asset turnover
    'LTL/OC':balance.longterm_loan / income.operating_cost, # long-term loans / operating cost
    
    'TL/TA':balance.total_liability / balance.total_assets, # total liabilities / total assets
    'TL/TOE':balance.total_liability / balance.total_owner_equities, # total liabilities / shareholders' equity
    
    }
def get_fundamental_data(factor_list,ttm_factors, date, securities=all_stocks):
    '''
    Get cross-sectional fundamental data, determined by date, stocks and factors.
    The result contains NaN values, which are usually filled with the industry mean.
    input:
    factor_list: list, point-in-time factors
    ttm_factors: list, TTM factors, summed over the last four quarterly reports
    date: str or datetime.date, the date of the data
    securities: list, defaults to all_stocks
    output:
    DataFrame combining the point-in-time and TTM factors, indexed by stock code
    '''
    if len(factor_list) == 0:
        raise ValueError('factor_list is empty, please pass at least one factor')
    # Build the query for the point-in-time factors
    q = query(valuation.code)
    for fac in factor_list:
        q = q.add_column(fac_dict[fac])
    q = q.filter(valuation.code.in_(securities))
    fundamental_df = get_fundamentals(q,date)
    fundamental_df.columns = ['code'] + factor_list

    if isinstance(date,datetime.date):
        date = date.strftime('%Y-%m-%d')
    elif not isinstance(date,str):
        raise TypeError('date must be a str or datetime.date')
    year = int(date[:4])
    month_day = date[5:]
    
    # Pick the four most recent consecutive quarters assumed to be published by `date`
    if month_day < '05-01':
        statdate_list = [str(year-2)+'q4', str(year-1)+'q1', str(year-1)+'q2', str(year-1)+'q3']
    elif month_day < '09-01':
        statdate_list = [str(year-1)+'q2', str(year-1)+'q3', str(year-1)+'q4', str(year)+'q1']
    elif month_day < '11-01':
        statdate_list = [str(year-1)+'q3', str(year-1)+'q4', str(year)+'q1', str(year)+'q2']
    else:
        statdate_list = [str(year-1)+'q4', str(year)+'q1', str(year)+'q2', str(year)+'q3']
   
    ttm_q = query(valuation.code)
    for fac in ttm_factors:
        ttm_q = ttm_q.add_column(fac_dict[fac])
    ttm_q = ttm_q.filter(valuation.code.in_(securities))  
                             
    # Sum the quarterly values, aligned on stock code
    ttm_fundamental_data = None
    for statdate in statdate_list:
        fundamental_data = get_fundamentals(ttm_q, statDate=statdate)
        fundamental_data.index = fundamental_data['code']
        if ttm_fundamental_data is None:
            ttm_fundamental_data = fundamental_data
        else:
            ttm_fundamental_data.iloc[:,1:] += fundamental_data.iloc[:,1:]
    ttm_fundamental_data.columns = ['code'] + ttm_factors
    # Drop the 'code' index so that merging on the 'code' column is unambiguous
    ttm_fundamental_data = ttm_fundamental_data.reset_index(drop=True)
    results = pd.merge(fundamental_df,ttm_fundamental_data,on=['code'],how='inner')
    results = results.sort_values(by='code')
    results.index = results['code']
    results = results.drop(['code'],axis=1)
    return results
fund_data = get_fundamental_data(factor_list,ttm_factors,date,all_stocks)
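A TTM (trailing twelve months) figure sums the four most recent consecutive quarterly reports. The sketch below shows one way to derive those quarter labels from a date. It is only an illustration: `trailing_quarters` is a hypothetical helper, and the disclosure cut-offs (May 1, September 1, November 1) are the ones assumed in `get_fundamental_data`, not an official calendar.

```python
def trailing_quarters(date_str):
    """Return the four latest quarter labels assumed to be published by date_str ('YYYY-MM-DD')."""
    year, month_day = int(date_str[:4]), date_str[5:]
    # Latest report assumed available at each point in the year (assumption, see lead-in)
    if month_day < '05-01':
        y, q = year - 1, 3
    elif month_day < '09-01':
        y, q = year, 1
    elif month_day < '11-01':
        y, q = year, 2
    else:
        y, q = year, 3
    labels = []
    for _ in range(4):
        labels.append(f"{y}q{q}")
        q -= 1
        if q == 0:            # step back from Q1 to Q4 of the previous year
            y, q = y - 1, 4
    return labels[::-1]       # oldest quarter first

print(trailing_quarters('2013-06-15'))
```

Walking backwards from the latest available quarter guarantees that the four quarters are consecutive, so no quarter is skipped or counted twice.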
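def get_stock_industry(name,date,output_csv = False):
    '''
    Get the industry each stock belongs to
    input:
    name: str, industry classification
    "sw_l1": Shenwan level-1 industries
    "sw_l2": Shenwan level-2 industries
    "sw_l3": Shenwan level-3 industries
    "jq_l1": JoinQuant level-1 industries
    "jq_l2": JoinQuant level-2 industries
    "zjw": CSRC industries
    date: date of the classification
    output: DataFrame, index is stock code, single column 'industry_code'
    (stocks not found in any industry keep the placeholder value 1)
    '''
    industries = list(get_industries(name).index)
    all_securities = get_all_securities(date=date)   # all stocks listed as of `date`
    all_securities['industry_code'] = pd.Series(1,index=all_securities.index,dtype=object)
    for ind in industries:
        industry_stocks = get_industry_stocks(ind,date)
        # some industry constituents are not in the listed-securities table
        industry_stocks = all_securities.index.intersection(industry_stocks)
        all_securities.loc[industry_stocks,'industry_code'] = ind
    stock_industry = all_securities['industry_code'].to_frame()
    if output_csv == True:
        stock_industry.to_csv('stock_industry.csv') # write the stock-to-industry mapping to CSV
    return stock_industry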
def fillna_with_industry(data,date):
    '''
    Fill NaN values with the industry mean
    input:
    data: DataFrame, input data, index is stock code
    date: str, must match the date of the values in data
    output:
    DataFrame, missing values filled with the industry mean; where the industry
    mean is itself NaN, the column mean is used
    '''
    stocks_industry = get_stock_industry(INDUSTRY_NAME,date)
    stocks_industry_merge = data.merge(stocks_industry, left_index=True,right_index=True,how='left')
    columns = list(data.columns)
    select_data = []
    group_data_mean = stocks_industry_merge.groupby('industry_code').mean()
    # after the merge, '<col>_x' holds the stock's value and '<col>_y' its industry mean
    group_data = stocks_industry_merge.merge(group_data_mean,left_on='industry_code',right_index=True,how='left')
    for column in columns:
        # skip string columns such as 'code'
        if not isinstance(data[column].iloc[0],str):
            filled = group_data[column+'_x'].fillna(group_data[column+'_y'])
            select_data.append(filled.rename(column))
            
    result = pd.concat(select_data,axis=1)
    # where the industry mean is NaN as well, fall back to the overall column mean
    result = result.fillna(result.mean())
    return result
fillna_data = fillna_with_industry(fund_data,date)
def get_pure_values(data):
    '''
    Keep only the numeric columns of a DataFrame, dropping non-numeric ones such as 'code' (str)
    input:
    data: pd.DataFrame, index is stock code
    output:
    DataFrame containing numeric columns only
    '''
    # select_dtypes also keeps numpy integer columns, which a plain isinstance(x, int) check misses
    return data.select_dtypes(include=[np.number])
res = get_pure_values(fillna_data)
fund_data = get_pure_values(fillna_data)

Getting stock return data

def get_month_profit(stocks,start_date,end_date):
    '''
    Build a monthly price series for month-over-month returns: for each month, take the
    mean close of the last MONTH_MEAN_DAY_NUM trading days up to the 1st of that month
    input:
    stocks: list, stock codes
    start_date: str, start date
    end_date: str, end date
    output:
    DataFrame, index is stock code, columns are month-start dates, values are mean closing prices
    '''
    start_year = int(start_date[:4])
    end_year = int(end_date[:4])
    start_month = int(start_date[5:7])
    end_month = int(end_date[5:7])
    len_month = (end_year - start_year)*12 + (end_month - start_month)
    price_list = []
    for i in range(len_month):
        # roll over into the next year once the month counter passes December
        month_index = start_month - 1 + i
        date = '%d-%02d-01' % (start_year + month_index//12, month_index%12 + 1)
        price = get_price(stocks,fields=['close'],count=MONTH_MEAN_DAY_NUM,end_date=date)['close']
        price_mean = price.mean().to_frame()
        price_mean.columns = [date]
        price_list.append(price_mean)
    month_profit = pd.concat(price_list,axis=1)
    return month_profit
month_profit = get_month_profit(list(res.index),start_date,end_date)
month_profit_pct = month_profit.pct_change(1,axis=1).dropna(axis=1,how='all')
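Generating the month-start dates is the one step of `get_month_profit` that is easy to get wrong when the start month is not January, because the month number must wrap from 12 back to 1. A standalone sketch of that step, with `month_starts` as a hypothetical helper name:

```python
def month_starts(start_date, end_date):
    """First-of-month strings from start_date's month up to, but excluding, end_date's month."""
    year, month = int(start_date[:4]), int(start_date[5:7])
    end_year, end_month = int(end_date[:4]), int(end_date[5:7])
    dates = []
    while (year, month) < (end_year, end_month):
        dates.append(f"{year}-{month:02d}-01")
        month += 1
        if month > 12:        # wrap December into January of the next year
            year, month = year + 1, 1
    return dates

print(month_starts('2013-11-15', '2014-02-01'))
```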
def get_day_profit(stocks,end_date,start_date=None,count=-1,pre_num=PRE_PROFIT_DAYS):
    '''
    Get daily returns
    input:
    stocks: list or Series, stock codes
    start_date: start date
    end_date: end date
    count: alternative to start_date, number of observations to take up to end_date
    pre_num: int, number of days the return looks back
    output:
    profit: DataFrame, index is date, columns are stock codes, values are returns
    '''
    if count == -1:
        price = get_price(stocks,start_date,end_date,fields=['close'])['close']
    else:
        price = get_price(stocks,end_date=end_date,count=count,fields=['close'])['close']
    profit = price.pct_change(periods=pre_num).dropna()
    return profit
day_profit = get_day_profit(all_stocks,start_date=start_date,end_date=end_date)
def get_day_profit_forward(stocks,end_date,start_date=None,count=-1,pre_num=PRE_PROFIT_DAYS):
    '''
    Get forward returns. pre_num is the horizon: the value stored at each date is the
    return over the following pre_num trading days.
    Example: with pre_num=3, the value at 2013-01-01 is the return from the close of
    2013-01-01 to the close three trading days later
    input:
    stocks: list or Series, stock codes
    start_date: start date
    end_date: end date
    count: alternative to start_date, number of observations to take up to end_date
    pre_num: int, forward horizon in trading days
    output:
    profit: DataFrame, index is date, columns are stock codes, values are forward returns
    '''
    if count == -1:
        price = get_price(stocks,start_date,end_date,fields=['close'])['close']
        date_list = get_trade_days(start_date=start_date,end_date=end_date)
        price.index = date_list

    else:
        price = get_price(stocks,end_date=end_date,count=count,fields=['close'])['close']
        date_list = get_trade_days(end_date=end_date,count=count)
        price.index = date_list
    # shift(-pre_num) moves each future return back onto the date it is measured from
    profit = price.pct_change(periods=pre_num).shift(-pre_num).dropna()
    return profit
day_profit_forward = get_day_profit_forward(all_stocks,start_date=start_date,end_date=end_date)
def get_sharpe_ratio_forward(stocks,end_date,start_date=None,count=-1,pre_num=ADJUST_DAYS):
    '''
    Get a forward Sharpe-style ratio. pre_num is the horizon: the value stored at each
    date covers the following pre_num trading days.
    Example: with pre_num=3, the value at 2013-01-01 covers 2013-01-01 through the
    third trading day after it.
    The risk-free rate is taken as 4% per year.
    input:
    stocks: list or Series, stock codes
    start_date: start date
    end_date: end date
    count: alternative to start_date, number of observations to take up to end_date
    pre_num: int, forward horizon in trading days
    output:
    sr: DataFrame, index is stock code, columns are dates, values are the ratios
    '''
    if count == -1:
        price = get_price(stocks,start_date,end_date,fields=['close'])['close']
        date_list = get_trade_days(start_date=start_date,end_date=end_date)
        price.index = date_list

    else:
        price = get_price(stocks,end_date=end_date,count=count,fields=['close'])['close']
        date_list = get_trade_days(end_date=end_date,count=count)
        price.index = date_list
        price = price.fillna(0)
    profit = price.pct_change(periods=pre_num).shift(-pre_num)
    drop_list = list(profit.index)[-pre_num:]
    profit = profit.drop(drop_list)
    date_list = list(price.index)
    srl = []
    for index in range(len(date_list) - pre_num):
        # NOTE: this is the std of price levels over the window, not of daily returns,
        # so the result is a rough risk-adjusted score rather than a textbook Sharpe ratio
        std = price.iloc[index:index+pre_num,:].std()
        # excess return over the window; 0.001 guards against division by zero
        sharpe_ratio = (profit.iloc[index,:] - (0.04/250*pre_num))/(std+0.001)
        srl.append(sharpe_ratio)
        
    sr = pd.concat(srl,axis=1)
    sr.columns = date_list[:-pre_num]
    return sr
#day_sharpe_ratio_forward = get_sharpe_ratio_forward(all_stocks,start_date=start_date,end_date=end_date)
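The post notes that its Sharpe ratio calculation is kept simple and can be improved. One possible refinement, sketched here under stated assumptions, is to measure volatility on daily returns and annualize. The 4% risk-free rate and 250 trading days per year come from the post; the function name `sharpe_ratio` and the sample prices are hypothetical, and this is not the calculation behind the results shown later.

```python
import math
import statistics

def sharpe_ratio(prices, risk_free_annual=0.04, periods_per_year=250):
    """Annualized Sharpe ratio from a list of daily closing prices."""
    # Simple daily returns from consecutive closes
    returns = [p1 / p0 - 1 for p0, p1 in zip(prices, prices[1:])]
    daily_rf = risk_free_annual / periods_per_year
    excess = [r - daily_rf for r in returns]
    vol = statistics.stdev(excess)          # sample std of daily excess returns
    if vol == 0:
        return float('nan')                 # undefined when returns never vary
    return statistics.mean(excess) / vol * math.sqrt(periods_per_year)

# Made-up closing prices for illustration
print(sharpe_ratio([10.0, 10.1, 10.0, 10.3, 10.2, 10.5]))
```

Because the ratio is computed from returns, it does not depend on the price level of the stock, which makes scores comparable across stocks.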

Data processing functions

def demeaned(data,axis=0):
    ''' Remove the mean along the given axis
    input:
    data: DataFrame, list or Series, input data
    axis: for a DataFrame, 0 removes each column's mean and 1 removes each row's mean
    output:
    demean: same type as the input
    '''
    if isinstance(data,pd.DataFrame):
        mean = data.mean(axis=axis)
        # column means are subtracted along the columns, row means along the rows
        demean = data.sub(mean,axis=1-axis)
    elif isinstance(data,pd.Series):
        mean = data.mean()
        demean = data - mean
    else:
        data = np.asarray(data)
        mean = np.nanmean(data)
        demean = list(data - mean)
    return demean
# transpose so that rows are stocks and columns are dates, as cut_factor_plot expects
demean = demeaned(day_profit,1).T
# Winsorize and standardize
def winsorize_and_standarlize(data,qrange=[0.05,0.95],axis=0):
    '''
    input:
    data: DataFrame or Series, input data
    qrange: list, qrange[0] is the lower quantile and qrange[1] the upper quantile;
    values beyond them are replaced by the quantile
    axis: 0 processes each column, any other value each row (DataFrame only)
    output:
    winsorized data scaled to zero mean and unit standard deviation; the input is not modified
    '''
    if isinstance(data,pd.DataFrame):
        if axis != 0:
            data = data.T
        q_down = data.quantile(qrange[0])
        q_up = data.quantile(qrange[1])
        # clip every column to its own quantile bounds
        data = data.clip(lower=q_down,upper=q_up,axis=1)
        data = (data - data.mean())/data.std()
        data = data.fillna(0)
        if axis != 0:
            data = data.T
            
    elif isinstance(data,pd.Series):
        data = data.clip(lower=data.quantile(qrange[0]),upper=data.quantile(qrange[1]))
        data = (data - data.mean())/data.std()
    return data
data_win_sta = winsorize_and_standarlize(fillna_data)
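To see what winsorizing and standardizing does to a single factor, here is a small sketch on made-up numbers. It uses the same 5% / 95% quantile bounds as the default `qrange`; the values and index labels are hypothetical.

```python
import pandas as pd

# One made-up factor with an obvious outlier
values = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0], index=list("abcde"))

lower, upper = values.quantile(0.05), values.quantile(0.95)
clipped = values.clip(lower=lower, upper=upper)          # pull extremes in to the bounds
zscore = (clipped - clipped.mean()) / clipped.std()      # zero mean, unit std

print(clipped)
print(zscore)
```

Clipping before standardizing keeps a single extreme value from dominating the mean and standard deviation used for the z-score.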
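def cut_data(data,ncut=5):
    '''
    Split data into groups by quantile
    input:
    data: pd.Series, index is stock code, values are factor values
    ncut: number of groups
    output:
    res: list of Series, one per group, ordered from the smallest values to the largest
    '''
    l_q = []
    # start just below the minimum so the smallest value falls into the first group
    l_q.append(data.min()-1)
    for i in range(ncut):
        qan = data.quantile((i+1)/ncut)
        l_q.append(qan)
    res = []
    for n in range(ncut):
        r = data[(data>l_q[n])&(data<=l_q[n+1])]
        res.append(r)
    return res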
res = cut_data(fillna_data['EPS'])
        
print(type(res[1]))
<class 'pandas.core.series.Series'>
def neutralize(data,name,date,market_cap):
    '''
    Neutralize factors against industry and market cap
    input:
    data: DataFrame, index is stock code, columns are factors, values are factor values
    name: str, industry classification
    "sw_l1": Shenwan level-1 industries
    "sw_l2": Shenwan level-2 industries
    "sw_l3": Shenwan level-3 industries
    "jq_l1": JoinQuant level-1 industries
    "jq_l2": JoinQuant level-2 industries
    "zjw": CSRC industries
    date: date of the industry classification
    market_cap: Series or DataFrame of market cap, indexed by stock code
    output:
    DataFrame of regression residuals, same shape as data
    '''
    industry_se = get_stock_industry(name,date)
    if isinstance(industry_se,pd.Series):
        industry_se = industry_se.to_frame()
    if isinstance(market_cap,pd.Series):
        market_cap = market_cap.to_frame()
        
    index = list(data.index)
    # one dummy column per industry (no intercept term)
    industry_dummy = pd.get_dummies(industry_se.loc[index].iloc[:,0].astype(str)).astype(float)
    market_cap = np.log(market_cap.loc[index])
    x = pd.concat([industry_dummy,market_cap],axis=1)
    # regress each factor on the exposures and keep the residual
    neu_result = pd.DataFrame(index=data.index,columns=data.columns,dtype=float)
    for column in data.columns:
        neu_result[column] = sm.OLS(data[column],x,missing='drop').fit().resid
    return neu_result
market_cap = get_fundamental_data(securities=all_stocks,factor_list=['MC'],ttm_factors=ttm_factors,date=date)
market_cap = get_pure_values(market_cap)
neu_res = neutralize(fund_data,'sw_l1',date,market_cap)
neu_win_sta = winsorize_and_standarlize(neu_res)
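Neutralization amounts to regressing a factor on industry dummies and log market cap and keeping the residual. The sketch below reproduces that idea with plain NumPy least squares on made-up data; `neutralize_sketch` and all the numbers are hypothetical, and it is meant only to show what the residual looks like, not to replace `neutralize`.

```python
import numpy as np

def neutralize_sketch(factor, industry, market_cap):
    """Residual of `factor` after regressing on industry dummies and log market cap."""
    factor = np.asarray(factor, dtype=float)
    labels = sorted(set(industry))
    # One 0/1 column per industry; together they play the role of the intercept
    dummies = np.array([[1.0 if ind == lab else 0.0 for lab in labels] for ind in industry])
    x = np.column_stack([dummies, np.log(np.asarray(market_cap, dtype=float))])
    beta, *_ = np.linalg.lstsq(x, factor, rcond=None)   # ordinary least squares
    return factor - x @ beta

# Made-up data: six stocks in two industries
factor = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0]
industry = ['A', 'A', 'A', 'B', 'B', 'B']
market_cap = [10.0, 20.0, 40.0, 15.0, 30.0, 90.0]

resid = neutralize_sketch(factor, industry, market_cap)
print(resid)
```

By construction the residual averages to zero within each industry and is uncorrelated with log market cap, which is exactly what "industry and size neutral" means here.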
def cut_factor_plot(data,profit,ncut=5):
    '''
    Plot cumulative returns of the factor groups
    input:
    data: pd.Series, factor values indexed by stock code
    profit: DataFrame, index is stock code, columns are dates, values are returns
    ncut: number of groups
    '''
    cut_res = cut_data(data,ncut=ncut)
    l = []
    for i in range(len(cut_res)):
        stocks = cut_res[i].index
        profit_cut = profit.loc[stocks]
        # average return of the group on each date
        profit_cut = profit_cut.mean()
        l.append(profit_cut)
    df = pd.concat(l,axis=1)
    df = df.cumsum()
    df.plot(figsize=(20,10))
    plt.legend()
    

cut_factor_plot(fillna_data['EPS'],demean)

Evaluation metrics

def calculate_IC_IR_without_data_process(stocks,factors,start_date,end_date):
    '''
    Compute factor IC / IR values
    input:
    stocks: list, stock codes
    factors: list, factors to evaluate
    start_date: start date
    end_date: end date
    output:
    df_ic: DataFrame, index is date, columns are factors
    df_ir: DataFrame, index is date, columns are factors; each value uses the
    trailing CALCULATE_IR_PERIOD IC observations
    '''
    
    profit = get_day_profit_forward(stocks=stocks,start_date=start_date,end_date=end_date,pre_num=ADJUST_DAYS)
    date_list = profit.index
    l_ic = []
    for date in date_list:
        fund_data = get_fundamental_data(securities=stocks,factor_list=factors,ttm_factors=ttm_factors,date=date)
        l = []
        for factor in factors:
            # align factor values and forward returns on stock code before ranking
            pair = pd.concat([fund_data[factor],profit.loc[date]],axis=1,join='inner').dropna()
            ic,pvalue = st.spearmanr(pair.iloc[:,0],pair.iloc[:,1])
            l.append(ic)
        l_ic.append(l)
    df_ic = pd.DataFrame(l_ic,index=date_list,columns=factors)
    
    if CALCULATE_IR_PERIOD > len(df_ic):
        print('CALCULATE_IR_PERIOD length is too large')
    # IR = rolling mean of IC / rolling std of IC
    rolling_ic = df_ic.rolling(CALCULATE_IR_PERIOD)
    df_ir = (rolling_ic.mean()/rolling_ic.std()).dropna(how='all')
    return df_ic,df_ir

#ic,ir = calculate_IC_IR_without_data_process(all_stocks,factors=factor_list[:5],start_date=start_date,end_date=end_date)
def calculate_IC_IR_with_data_process(stocks,factors,start_date,end_date,name='sw_l1'):
    '''
    Compute factor IC / IR values after neutralizing the factors against industry and market cap
    input:
    stocks: list, stock codes
    factors: list, factors to evaluate (must include 'MC', which is used for neutralization)
    start_date: start date
    end_date: end date
    name: industry classification, defaults to Shenwan level-1
    output:
    df_ic: DataFrame, index is date, columns are factors
    df_ir: DataFrame, index is date, columns are factors; each value uses the
    trailing CALCULATE_IR_PERIOD IC observations
    '''
    
    profit = get_day_profit_forward(stocks=stocks,start_date=start_date,end_date=end_date,pre_num=ADJUST_DAYS)
    date_list = profit.index
    l_ic = []
    for date in date_list:
        fund_data = get_fundamental_data(securities=stocks,factor_list=factors,ttm_factors=ttm_factors,date=date)
        fund_data = neutralize(fund_data,name=name,date=date,market_cap=fund_data['MC'])
        l = []
        for factor in factors:
            # align factor values and forward returns on stock code before ranking
            pair = pd.concat([fund_data[factor],profit.loc[date]],axis=1,join='inner').dropna()
            ic,pvalue = st.spearmanr(pair.iloc[:,0],pair.iloc[:,1])
            l.append(ic)
        l_ic.append(l)
    df_ic = pd.DataFrame(l_ic,index=date_list,columns=factors)
    
    if CALCULATE_IR_PERIOD > len(df_ic):
        print('CALCULATE_IR_PERIOD length is too large')
    # IR = rolling mean of IC / rolling std of IC
    rolling_ic = df_ic.rolling(CALCULATE_IR_PERIOD)
    df_ir = (rolling_ic.mean()/rolling_ic.std()).dropna(how='all')
    return df_ic,df_ir
#ic1,ir1 = calculate_IC_IR_with_data_process(all_stocks,factors=factor_list[:5],start_date=start_date,end_date=end_date)
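The IC used above is the Spearman rank correlation between factor values and forward returns on one date, and the IR is the mean of a series of ICs divided by their standard deviation. A minimal sketch on made-up data follows; `rank_ic` and `information_ratio` are hypothetical helper names, and the rank correlation is computed as a Pearson correlation of ranks so that only pandas is needed.

```python
import pandas as pd

def rank_ic(factor: pd.Series, forward_return: pd.Series) -> float:
    """Spearman rank correlation after aligning both series on stock code."""
    pair = pd.concat([factor, forward_return], axis=1, join="inner").dropna()
    # Pearson correlation of the ranks equals the Spearman correlation
    return pair.iloc[:, 0].rank().corr(pair.iloc[:, 1].rank())

def information_ratio(ic_series: pd.Series) -> float:
    """Mean IC divided by the sample standard deviation of IC."""
    return ic_series.mean() / ic_series.std()

# Made-up data; note that the two series list the stocks in different orders
factor = pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"])
forward_return = pd.Series([0.04, 0.03, 0.02, 0.01], index=["d", "c", "b", "a"])

print(rank_ic(factor, forward_return))
print(information_ratio(pd.Series([0.1, 0.2, 0.3])))
```

The example is built so that alignment matters: matched by stock code the relation is perfectly increasing, whereas comparing the two series position by position would give the opposite sign.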
def calculate_sharpe_ratio(stocks,start_date,end_date):
    '''
    Compute a Sharpe ratio for each stock over a time range
    input:
    stocks: list, stock codes
    start_date: str or datetime.date, start of the period
    end_date: str or datetime.date, end of the period
    output:
    DataFrame, index is stock code, single column 'sharpe_ratio'
    '''
    profit = get_day_profit(stocks=stocks,start_date=start_date,end_date=end_date,pre_num=1)
    price_start = get_price(stocks,fields=['close'],count=1,end_date=start_date)['close']
    price_end = get_price(stocks,fields=['close'],count=1,end_date=end_date)['close']
    price = pd.concat([price_start,price_end])
    # total return over the whole period
    price_profit = price.pct_change().dropna()
    # std of daily returns (not scaled to the length of the period)
    std = profit.std()
    
    trade_days =get_trade_days(start_date=start_date,end_date=end_date)
    length = len(trade_days)
    gz_profit = (0.04/250)*length # government bond yield of 4%, 250 trading days per year
    sharpe_ratio = (price_profit-gz_profit)/std
    sharpe_ratio = sharpe_ratio.stack().unstack(0)
    sharpe_ratio.columns = ['sharpe_ratio']
    return sharpe_ratio
#sharpe_ratio = calculate_sharpe_ratio(all_stocks,start_date,end_date)
    

Stock selection module

# Factors used for stock selection
factor_list = ['MC','EPS','TCA/TCL','PE']
trade_days_all = (get_trade_days(start_date=start_date,end_date=end_date)).tolist()
trade_days = trade_days_all[:-20]
# Fetch fundamental data once every 20 trading days
res_df = pd.DataFrame(0,columns=factor_list,index=range(5))
for date in trade_days[::20]:
    index = trade_days.index(date)
    all_stocks = list(get_all_securities(date=date).index)
    fund_data = get_fundamental_data(factor_list=factor_list,ttm_factors=ttm_factors,date=date,securities=all_stocks)
    fillna_d = fillna_with_industry(fund_data,date=date)
    fac_sr_list = []
    for factor in factor_list:
        cut_res = cut_data(fund_data[factor])
        sr_list = []
        for i in range(len(cut_res)):
            stocks = list(cut_res[i].index)
            # score each factor group by its forward Sharpe ratio over the next 20 trading days
            sr = get_sharpe_ratio_forward(stocks=stocks,start_date=date,end_date=trade_days_all[index+20],pre_num=20)
            sr = sr.mean()
            sr_list.append(sr)
        # one value per group, indexed 0..4 from the lowest factor values to the highest
        sr_df = pd.concat(sr_list).reset_index(drop=True)
        fac_sr_list.append(sr_df)
    fac_sr_df = pd.concat(fac_sr_list,axis=1)
    fac_sr_df.columns = factor_list
    fac_sr_df.index = range(len(fac_sr_df))
    # accumulate each group's Sharpe ratio over time
    res_df += fac_sr_df
print(res_df)

            
    
         MC       EPS   TCA/TCL        PE
0  0.910873  0.131916 -0.341434 -0.275349
1  0.826918  0.572168  0.507743  0.821135
2  0.420978  0.822004  0.401287  0.935940
3  0.327025  0.563808  0.545216  0.431795
4 -0.139886  0.259293  1.274726  0.433319

Backtest module

# For each factor, find which group has the highest accumulated Sharpe ratio; stocks are selected from that group
factor_index = {}
for fac in res_df.columns:
    factor_index[fac] = int(res_df[fac].values.argmax())

    
test_start_date = '2015-01-01' 
test_end_date = '2016-01-01'
trade_days_all = (get_trade_days(start_date=test_start_date,end_date=test_end_date)).tolist()
trade_days = trade_days_all[:-20]
profit_list = []
for date in trade_days[::20]:
    index = trade_days.index(date)
    all_stocks = list(get_all_securities(date=date).index)
    fund_data = get_fundamental_data(factor_list=factor_list,ttm_factors=ttm_factors,date=date,securities=all_stocks)
    fillna_d = fillna_with_industry(fund_data,date=date)
    stocks_set = set(all_stocks)
    for factor in factor_list:
        cut_res = cut_data(fillna_d[factor])
        stocks_sel_series = cut_res[factor_index[factor]] # pick the group chosen for this factor
        stocks_set &= set(list(stocks_sel_series.index)) # intersect the chosen groups across factors
    profit_stocks = list(stocks_set)
    # hold for one month (20 trading days) and compute the return at the end
    profit = get_day_profit_forward(stocks=profit_stocks,start_date=date,end_date=trade_days_all[index+20],pre_num=20)
    profit_mean = profit.mean(axis=1) # equal-weighted mean return across the selected stocks
    profit_list.append(profit_mean)
profit_df = pd.concat(profit_list)
profit_df = profit_df + 1
print(profit_df)

profit_res = profit_df.cumprod() # cumulative return over time
print(profit_res)
'''
Factor groups were chosen on 2013 data. Test results:
2014 data: return of 63%
2015 data: return of 174%
2016 data: return of 25%
2014-2018 data: return of 368%

'''
2015-01-05    1.094958
2015-02-02    1.106351
2015-03-09    1.367881
2015-04-07    1.021829
2015-05-06    1.735548
2015-06-03    0.634690
2015-07-02    0.921978
2015-07-30    0.912793
2015-08-27    1.065059
2015-09-28    1.225566
2015-11-02    1.153636
2015-11-30    1.157163
dtype: float64
2015-01-05    1.094958
2015-02-02    1.211408
2015-03-09    1.657063
2015-04-07    1.693235
2015-05-06    2.938691
2015-06-03    1.865156
2015-07-02    1.719633
2015-07-30    1.569668
2015-08-27    1.671790
2015-09-28    2.048889
2015-11-02    2.363671
2015-11-30    2.735152
dtype: float64
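The second column of numbers is simply the running product of the first. As a check, the sketch below compounds the twelve per-period growth factors printed above for 2015 using only the standard library; the variable names are illustrative.

```python
from itertools import accumulate
import operator

# Per-period growth factors (1 + return) copied from the printed 2015 output
period_growth = [
    1.094958, 1.106351, 1.367881, 1.021829, 1.735548, 0.634690,
    0.921978, 0.912793, 1.065059, 1.225566, 1.153636, 1.157163,
]

# Running product = value of 1 unit invested at the start
cumulative = list(accumulate(period_growth, operator.mul))
total_return = cumulative[-1] - 1

print(f"final value: {cumulative[-1]:.4f}")
print(f"total return: {total_return:.1%}")
```

The final value agrees with the last entry of the printed cumulative series and with the roughly 174% return quoted for 2015.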
