

Elite Task Public Review. Author: 云帆 (Yunfan)

Posted by 吃瓜群众小王 on July 1 at 14:43

Short-Period Stock Selection Factors from Intraday High-Frequency Data (High-Frequency Data Factor Research Series, Part 1)

Introduction

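By holding period, quantitative investing can be divided into long-term and short-term investing, and the two use different data and methods. Long-term investing usually means value investing: it typically analyzes macroeconomic data, company fundamentals and low-frequency price-volume relationships, using the macro outlook for market timing and fundamentals plus low-frequency price-volume relationships for stock selection. Short-term investing trades at a higher frequency, so it requires in-depth analysis of short-horizon data to uncover patterns. Short-term timing relies mostly on technical factors, while stock selection analyzes prices, volumes and similar information to find profitable factors. This article explores stock-selection factors for short-term investing.
Traditional multi-factor models are widely used in the A-share market. Factors that worked for long periods have been used so heavily that they have lost much of their effectiveness in recent years, which creates an urgent need for new factors.
Mining stock-selection factors essentially means finding a dimension along which stocks differ, classifying stocks by it, and earning excess returns. Commonly used data include open, close, high and low prices and volume. Common ways to build factors include momentum, trend, reversal, correlation analysis, statistical-regularity analysis, goodness-of-fit and residual analysis, and higher-order moment analysis. This article applies higher-order moment analysis to intraday high-frequency data to search for effective factors.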

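Research Objectives

1. Use intraday high-frequency data and test whether such data are useful for building factors.

2. Difference the high-frequency data, compute higher-order moments, and test whether these moments are useful for building factors.

3. Rebalance weekly. The parameters of the implementation, including the rebalancing period, can be changed to test the factors under different rebalancing periods.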

Research Content

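When computing the factors, the research report uses intraday 5-minute prices and takes first differences of their logarithms, which are in effect 5-minute returns. Higher-order moments are then computed from these short-horizon returns. Skewness measures the asymmetry of a distribution (a normal distribution has zero skewness). Negative skewness means a left-skewed shape; as a rule of thumb, a negatively skewed distribution has mode > median > mean, and a positively skewed one the reverse. Judging from the shape, when skewness is negative most observations lie to the right, on the side of larger values, with a long left tail made up of a few large losses; in other words, relatively high returns occur more often. If this historical pattern persists, stocks selected on it should earn higher returns. This offers an intuitive explanation of why lower skewness goes with higher excess returns.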
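As an illustration of the step just described, the sketch below computes one day's realized variance, skewness and kurtosis from a series of 5-minute closes. It is a minimal sketch, assuming `closes` holds a single trading day's bars; the function name `daily_realized_moments` is hypothetical, and the moment formulas are the ones used in `caculate_factor` in the strategy code.

```python
import numpy as np


def daily_realized_moments(closes):
    """Realized variance, skewness and kurtosis for one trading day.

    closes: sequence of intraday closing prices (e.g. 48 five-minute bars).
    Returns None when the realized variance is zero (e.g. a stock locked at
    its price limit all day), mirroring the `continue` in caculate_factor.
    """
    log_prices = np.log(np.asarray(closes, dtype=float))
    r = np.diff(log_prices)           # intraday log returns, M = len(closes) - 1
    m = len(r)
    rd_var = np.sum(r ** 2)           # realized variance
    if rd_var == 0:
        return None
    rd_skew = np.sqrt(m) * np.sum(r ** 3) / rd_var ** 1.5  # realized skewness
    rd_kurt = m * np.sum(r ** 4) / rd_var ** 2             # realized kurtosis
    return rd_var, rd_skew, rd_kurt


# A hypothetical day whose returns alternate up and down by the same amount
print(daily_realized_moments([100, 101, 100, 101, 100]))
```

With N = 48 closes the function sees 47 returns, because the first difference of each day is dropped (the `dropna()` in the original code), so overnight returns are excluded.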

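To keep the run time manageable, this article does not reproduce the report's full period and sample; it uses a shorter window instead.

Sample period: January 1, 2015 to March 27, 2019

Universe: historical CSI 500 constituents, excluding stocks listed for less than one year, ST and *ST stocks, and stocks suspended on the trading day

Data frequency: 5-minute closing prices of each stock on each trading day

Grouping: on each rebalancing date, stocks are sorted from smallest to largest by each factor, namely realized volatility $RVol_t$, realized skewness $RSkew_t$ and realized kurtosis $RKurt_t$, and split into 5 groups containing equal numbers of stocks

Rebalancing: weekly; Q1 is the group with the smallest factor values and Q5 the group with the largest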
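For reference, the definitions below are reconstructed from the `caculate_factor` code rather than quoted from the research report. Here $r_{t,j}$ is the $j$-th intraday log return on day $t$, $P_{t,j}$ the corresponding 5-minute close, $M$ the number of returns per day (47 when $N = 48$ closes are used), $n$ the look-back window, and 242 the number of trading days per year used for annualization.

```latex
\begin{aligned}
r_{t,j} &= \ln P_{t,j} - \ln P_{t,j-1}, \qquad j = 1, \dots, M \\
RDVar_t &= \sum_{j=1}^{M} r_{t,j}^{2} \\
RDSkew_t &= \frac{\sqrt{M}\,\sum_{j=1}^{M} r_{t,j}^{3}}{RDVar_t^{3/2}} \\
RDKurt_t &= \frac{M\,\sum_{j=1}^{M} r_{t,j}^{4}}{RDVar_t^{2}} \\
RVol_t &= \left(\frac{242}{n}\sum_{i=0}^{n} RDVar_{t-i}\right)^{1/2} \\
RSkew_t &= \frac{1}{n}\sum_{i=0}^{n} RDSkew_{t-i} \\
RKurt_t &= \frac{1}{n}\sum_{i=0}^{n} RDKurt_{t-i}
\end{aligned}
```

The sums run over the current day and the previous $n$ days ($n + 1$ terms) but are divided by $n$, as in the code. When a day is skipped (for example because its realized variance is zero), the code divides by the number of days actually used minus one.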

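Conclusions

1. With weekly rebalancing, a stock-selection strategy based on the skewness factor earns a clear excess return.

2. From 2015 onward, the proportion of negative ICs is about 60% or higher in every year (the lowest, 59.6%, is in 2018), so the skewness factor is negatively correlated with next-period returns.

3. Compared with the long-only portfolio, the long-short portfolio has noticeably lower maximum drawdown and volatility in most years.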

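Notes

The final results follow the same trend as the research report, but the exact numbers differ. Possible reasons:

1. Rebalancing dates are obtained by sampling the trading-day series every five days. Because of holidays, the rebalancing day is not a fixed day of the week (as it would be with, for example, rebalancing every Monday).

2. Other factors that were not taken into account.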
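The effect of sampling every fifth trading day can be seen on a toy calendar. This is a sketch under stated assumptions: the calendar is plain weekdays with one hypothetical holiday, not the real exchange calendar, and `trading_days` is a hypothetical helper.

```python
import datetime


def trading_days(start, end, holidays):
    """Weekdays from start to end (inclusive), excluding the given holidays.

    A stand-in for a real exchange calendar; `holidays` is made-up input.
    """
    days = []
    d = start
    while d <= end:
        if d.weekday() < 5 and d not in holidays:
            days.append(d)
        d += datetime.timedelta(days=1)
    return days


# Four weeks starting Monday 2019-04-01, with a hypothetical holiday on Friday 2019-04-05
days = trading_days(datetime.date(2019, 4, 1), datetime.date(2019, 4, 26),
                    holidays={datetime.date(2019, 4, 5)})

every_fifth = days[::5]                          # sampling used in this article
mondays = [d for d in days if d.weekday() == 0]  # fixed-weekday alternative

print([d.weekday() for d in every_fifth])  # 0 = Monday, 1 = Tuesday -> [0, 1, 1, 1]
print([d.weekday() for d in mondays])      # -> [0, 0, 0, 0]
```

After the holiday, the every-fifth-day schedule shifts from Monday to Tuesday and stays there until the next holiday, whereas a fixed-weekday schedule keeps the same day (and would need its own rule for when that day is a holiday).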

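Some Thoughts

1. The report applies higher-order moment analysis directly to the data. A basic correlation analysis is another option: compute the correlation between the differenced prices of a stock and those of the benchmark. This correlation partly reflects how idiosyncratic the stock is, and correlation analysis performs well in some high-frequency strategies.

2. The calculation periods can also be varied, including the choice of N and n in the factor calculation and the rebalancing period.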
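The correlation idea can be sketched as follows. This is not part of the original study; it assumes the stock's and the benchmark's 5-minute closes are already aligned on the same timestamps, and `return_correlation` is a hypothetical name. It uses log-price differences, consistent with how returns are computed elsewhere in this article; a lower value would suggest a more idiosyncratic stock.

```python
import numpy as np


def return_correlation(stock_closes, index_closes):
    """Pearson correlation between a stock's and the benchmark's log returns.

    Both inputs are sequences of closes sampled at the same timestamps
    (e.g. 5-minute bars); the alignment is assumed, not checked.
    """
    r_stock = np.diff(np.log(np.asarray(stock_closes, dtype=float)))
    r_index = np.diff(np.log(np.asarray(index_closes, dtype=float)))
    return float(np.corrcoef(r_stock, r_index)[0, 1])


# Hypothetical intraday closes for a benchmark and a stock
index_closes = [100.0, 101.0, 100.5, 102.0, 101.0]
stock_closes = [10.0, 10.2, 10.0, 10.3, 10.3]
print(return_correlation(stock_closes, index_closes))
```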

Strategy Code

import numpy as np
import pandas as pd
import datetime
import statsmodels.api as sm
from jqdata import *
import matplotlib
import matplotlib.pyplot as plt
import time
import warnings
import pickle
import seaborn as sns
warnings.filterwarnings('ignore')
matplotlib.rcParams['axes.unicode_minus'] = False  # display minus signs correctly in plots
# Parameters
start_date = '2015-01-01'
end_date = '2019-03-27'

N = 48     # 5-minute bars per trading day (4 trading hours)
n = 5      # look-back window in trading days
cycle = 5  # rebalancing period in trading days

Utility Functions

def get_tradeday_list(start, end, frequency=None, count=None):
    '''
    Get a list of trading dates
    input:
    start: str or datetime, start date; use either start or count
    end: str or datetime, end date
    frequency: int, or str in {'day', 'month', 'quarter', 'halfyear'}; default 'day'.
               An int k returns every k-th trading day between start and end
    count: int, number of trading days up to end; used instead of start when given
    output:
    list of 'YYYY-MM-DD' strings
    '''
    if isinstance(frequency, int):
        all_trade_days = get_trade_days(start, end)
        trade_days = all_trade_days[::frequency]
        days = [d.strftime('%Y-%m-%d') for d in trade_days]
        return days

    # Use the Shanghai Composite's daily bars as the trading calendar
    if count is not None:
        df = get_price('000001.XSHG', end_date=end, count=count)
    else:
        df = get_price('000001.XSHG', start_date=start, end_date=end)
    if frequency is None or frequency == 'day':
        days = df.index
    else:
        df['year-month'] = [str(i)[0:7] for i in df.index]
        if frequency == 'month':
            # first trading day of each month
            days = df.drop_duplicates('year-month').index
        elif frequency == 'quarter':
            # first trading day of January, April, July and October
            df['month'] = [str(i)[5:7] for i in df.index]
            df = df[df['month'].isin(['01', '04', '07', '10'])]
            days = df.drop_duplicates('year-month').index
        elif frequency == 'halfyear':
            # first trading day of January and July
            df['month'] = [str(i)[5:7] for i in df.index]
            df = df[df['month'].isin(['01', '07'])]
            days = df.drop_duplicates('year-month').index
    trade_days = [d.strftime('%Y-%m-%d') for d in days]
    return trade_days



def cut_data_with_quantile(data,ncut= 5):
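    '''
    Split stocks into groups by factor-value quantiles
    input:
    data: pd.Series, index = stock codes, values = factor values
    ncut: number of groups
    output:
    res: list of lists of stock codes, ordered from the smallest to the largest factor values
    '''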
    if isinstance(data,pd.DataFrame):
        col = list(data.columns)[0]
        data = data[col]
    q = 1/ncut
    l_q = []
    l_q.append(data.min()-1)
    for i in range(ncut):
        qan = data.quantile(q*(i+1))
        l_q.append(qan)
    res = []
    for n in range(ncut):
        r = data[(data>l_q[n])&(data<=l_q[n+1])]
        ind = list(r.index)
        res.append(ind)
    return res

def cut_data_with_num(data,ncut=5):
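    '''
    Split stocks into ncut groups with equal numbers of stocks, sorted by factor value
    input:
    data: pd.Series, index = stock codes, values = factor values
    ncut: number of groups
    output:
    res: list of lists of stock codes, ordered from the smallest to the largest factor values
    '''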
    if isinstance(data,pd.Series):
        data = data.to_frame()
    col = list(data.columns)[0]
    data = data.sort_values(by=col)
    length = len(data)
    ind = list(data.index)
    res = []
    for i in range(ncut):
        r = ind[int((i*length)/ncut):int((i+1)*length/ncut)]
        res.append(r)
    return res


# Maximum drawdown
def find_max_drawdown(returns):
    '''
    returns: Series of cumulative net values, e.g. (1 + r).cumprod()
    output: maximum drawdown as a fraction (0.25 means 25%)
    '''
    result = 0             # largest drawdown seen so far
    historical_return = 0  # highest net value seen so far
    for value in returns:
        # update the running peak
        historical_return = max(historical_return, value)
        # drawdown from the running peak
        drawdown = 1 - value / historical_return
        result = max(drawdown, result)
    return result
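The loop in `find_max_drawdown` can also be written with NumPy's running maximum, which avoids per-element Python work on long series. A minimal sketch, assuming the input is a positive cumulative net-value series; `max_drawdown_vectorized` is a hypothetical name.

```python
import numpy as np


def max_drawdown_vectorized(net_values):
    """Maximum drawdown of a cumulative net-value series such as (1 + r).cumprod().

    Uses the same definition as find_max_drawdown: the largest value of
    1 - value / running peak. Assumes all values are positive.
    """
    values = np.asarray(net_values, dtype=float)
    running_peak = np.maximum.accumulate(values)  # highest value seen so far
    drawdowns = 1.0 - values / running_peak       # relative fall from that peak
    return float(drawdowns.max())


# The worst fall here is from 1.2 down to 0.9, i.e. 25%
print(max_drawdown_vectorized([1.0, 1.2, 0.9, 1.3, 1.04]))
```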

Stock Filtering Functions

def stocks_filter(stocks, date, n=250):
    '''
    Remove stocks listed for fewer than n days, ST / *ST stocks, and suspended stocks
    input:
    stocks: list, stock codes
    date: str, date in 'YYYY-MM-DD' format
    n: int, minimum number of calendar days since listing
    output:
    set, remaining stock codes
    '''
    # ST and *ST stocks on the given date
    st_data = get_extras('is_st', stocks, count=1, end_date=date)
    st_stocks = [stock for stock in stocks if st_data[stock].iloc[0]]
    # Stocks suspended on the given date
    paused = get_price(stocks, end_date=date, count=1, fields='paused')['paused']
    paused_stocks = [stock for stock in stocks if paused[stock].iloc[0]]
    # Stocks listed for fewer than n calendar days
    new_stocks = []
    date = datetime.datetime.strptime(date, '%Y-%m-%d').date()
    for stock in stocks:
        days_public = (date - get_security_info(stock).start_date).days
        if days_public < n:
            new_stocks.append(stock)
    remove_stocks = set(st_stocks) | set(paused_stocks) | set(new_stocks)
    return set(stocks) - remove_stocks

def get_filter_stocks_period(date_list,index='000982.XSHG',n=250):
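    '''
    Get the filtered stock universe for each date in a series
    input:
    date_list: list, dates
    index: index code whose constituents form the universe; the default '000982.XSHG'
           is used here for the CSI 500 universe; 'all' selects from the whole market
    n: int, remove stocks listed for fewer than n days
    output:
    dict, keys = dates, values = filtered stock sets
    '''
    dic = {}
    if index != 'all':  # constituents of the index on each date
        for date in date_list:
            stocks = get_index_stocks(index,date)
            filter_stocks = stocks_filter(stocks,date,n)
            dic[date] = filter_stocks
    else:
        stocks = list(get_all_securities().index)  # whole market
        for date in date_list:
            filter_stocks = stocks_filter(stocks,date,n)
            dic[date] = filter_stocks
    return dic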
# Trading-day series
date_list = get_tradeday_list(start_date,end_date)
# Filtered universe for each date, saved to disk for reuse
stocks_dic = get_filter_stocks_period(date_list,index='000982.XSHG',n=250)
with open('stocks_dic.pkl','wb') as pk_file:
    pickle.dump(stocks_dic,pk_file)
with open('stocks_dic.pkl','rb') as pk_file:
    stocks_dic = pickle.load(pk_file)
# For testing only: a small subset of dates
date_list_sel = date_list[:8]
stocks_dic_sel = {}
for date in date_list_sel:
    stocks_dic_sel[date] = stocks_dic[date]

Factor Calculation

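# Factor values use the current day plus the previous n days of data; returns use the close on the current day and the close five trading days later. In practice, the old holdings are sold and the new stocks bought shortly before the close on the rebalancing day.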

def caculate_factor(stocks_dic,N=48,n=5,fre='5m'):
    '''
    Compute realized volatility, skewness and kurtosis factors
    input:
    stocks_dic: dict, keys = trading dates, values = filtered stock codes
    N: int, bars per day matching fre; with 5-minute bars and 4 trading hours, N = 48
    n: int, look-back window in days
    fre: bar frequency, default '5m'
    output:
    dict, keys = dates, values = DataFrame (index = stock codes, columns RVol / RSkew / RKurt)
    '''
    st = time.time()

    # Following the report, each cumulative factor sums the current day and the
    # previous n days, so n + 1 daily values enter each sum
    n = n + 1

    date_list = list(stocks_dic.keys())
    len_date = len(date_list)
    dic = {}
    for i in range(n, len_date - 1):
        date = date_list[i - 1]
        pre_date = date_list[i - n : i]
        # With a date-only end_date, minute-level get_price returns the bars of the
        # previous trading day, so this window is shifted forward by one day
        pre_date_for_minute_fre = date_list[i - n + 1 : i + 1]
        # Keep only stocks that pass the filter on every day of the window
        stocks = set(stocks_dic[pre_date[0]])
        for d in pre_date[1:]:
            stocks &= set(stocks_dic[d])
        rvol_l = []
        rskew_l = []
        rkurt_l = []
        stock_l = []
        for stock in stocks:
            date_var = []
            date_skew = []
            date_kurt = []
            date_l = []
            for date_minute in pre_date_for_minute_fre:
                price = get_price(stock, end_date=date_minute, count=N, frequency=fre, fields='close')['close']
                r = np.log(price).diff().dropna()  # intraday log returns
                RDVar = (r ** 2).sum()  # daily realized variance
                if RDVar == 0:  # e.g. a stock locked at its price limit all day
                    continue
                RDSkew = np.sqrt(len(r)) * (r ** 3).sum() / RDVar ** 1.5  # daily realized skewness
                RDKurt = len(r) * (r ** 4).sum() / RDVar ** 2  # daily realized kurtosis
                date_var.append(RDVar)
                date_skew.append(RDSkew)
                date_kurt.append(RDKurt)
                date_l.append(date_minute)  # record the days actually used (some may be skipped)
            if len(date_var) == 0:
                continue  # no usable data for this stock; move on to the next one
            date_df = pd.DataFrame(date_var, index=date_l, columns=['var'])
            date_df['skew'] = date_skew
            date_df['kurt'] = date_kurt
            # Divide by (number of days used - 1), i.e. n when no day was skipped
            length = max(len(date_df) - 1, 1)
            rvol = ((242 / length) * date_df['var'].sum()) ** 0.5  # annualized with 242 trading days
            rskew = date_df['skew'].sum() / length
            rkurt = date_df['kurt'].sum() / length
            rvol_l.append(rvol)
            rskew_l.append(rskew)
            rkurt_l.append(rkurt)
            stock_l.append(stock)

        stocks_df = pd.DataFrame(rvol_l, index=stock_l, columns=['RVol'])
        stocks_df['RSkew'] = rskew_l
        stocks_df['RKurt'] = rkurt_l
        dic[date] = stocks_df
    et = time.time()
    print('time (min):', (et - st) / 60)
    return dic
# This function takes a long time to run, so the result is saved for later reuse
res_day = caculate_factor(stocks_dic,N=N,n=n)
with open('factor_dic_day_new.pkl','wb') as pk_file:
    pickle.dump(res_day,pk_file)
with open('factor_dic_day_new.pkl','rb') as pk_file:
    res = pickle.load(pk_file)
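The aggregation step at the end of `caculate_factor` can be isolated for readability. A sketch, assuming the daily values for the current day and the previous n days have already been computed; `aggregate_realized_factors` is a hypothetical name. It keeps the code's choice of dividing the n + 1 daily terms by n.

```python
import numpy as np


def aggregate_realized_factors(daily_var, daily_skew, daily_kurt, days_per_year=242):
    """Combine daily realized moments into RVol, RSkew and RKurt.

    Mirrors the aggregation in caculate_factor: the inputs hold the current
    day plus the previous n days, the sums are divided by
    (number of days - 1), and RVol is annualized with 242 trading days.
    """
    n = max(len(daily_var) - 1, 1)  # same guard as `length` in caculate_factor
    rvol = np.sqrt(days_per_year / n * np.sum(daily_var))
    rskew = np.sum(daily_skew) / n
    rkurt = np.sum(daily_kurt) / n
    return float(rvol), float(rskew), float(rkurt)


# Six days of hypothetical daily values (the current day plus the previous 5)
rvol, rskew, rkurt = aggregate_realized_factors([1e-4] * 6, [-0.6] * 6, [3.0] * 6)
print(rvol, rskew, rkurt)
```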

Distribution Characteristics

keys = list(res.keys())
# Distribution of the factor values over the sample period
resample_keys = keys[::10]  # take every 10th date
resample_l = []
for key in resample_keys:
    resample_l.append(res[key])
resample_df = pd.concat(resample_l)
figure = plt.figure(figsize=(18,8))
ax1 = plt.subplot(131)
plt.title('Market Volatility')
sns.kdeplot(resample_df['RVol'], shade=True, color="g", label="RVol",alpha=.7)
ax2 = plt.subplot(132)
plt.title('Market Skewness')
sns.kdeplot(resample_df['RSkew'], shade=True, color="g", label="RSkew",alpha=.7)
ax3 = plt.subplot(133)
plt.title('Market Kurtosis')
sns.kdeplot(resample_df['RKurt'], shade=True, color="g", label="RKurt",alpha=.7)
[Figure: kernel density plots titled Market Volatility, Market Skewness and Market Kurtosis]

Percentile Data and Plots

# Percentiles of each factor over time (10th, 25th, 50th, 75th and 90th)
q1 = []
q2 = []
q3 = []
for key in resample_keys:
    quantile = res[key].quantile([0.1,0.25,0.5,0.75,0.9])
    qvol = quantile['RVol']
    qvol.name = key
    q1.append(qvol)
    qskew = quantile['RSkew']
    qskew.name = key
    q2.append(qskew)
    qkurt = quantile['RKurt']
    qkurt.name = key
    q3.append(qkurt)
q1_df = pd.concat(q1,axis=1)
q1_df = q1_df.stack().unstack(0)
q2_df = pd.concat(q2,axis=1)
q2_df = q2_df.stack().unstack(0)
q3_df = pd.concat(q3,axis=1)
q3_df = q3_df.stack().unstack(0)
figure = plt.figure(figsize=(12,15))
ax1 = plt.subplot(311)
plt.title('Volatility percentiles')
q1_df.plot(ax=ax1,xticks=range(0,len(q1_df),20))

ax2 = plt.subplot(312)
plt.title('Skewness percentiles')
q2_df.plot(ax=ax2,xticks=range(0,len(q2_df),20))

ax3 = plt.subplot(313)
plt.title('Kurtosis percentiles')
q3_df.plot(ax=ax3,xticks=range(0,len(q3_df),20))
[Figure: Volatility, Skewness and Kurtosis percentiles over time]

Group Performance

def group_profit(week_list,stocks_dic,factor_dic):
    '''
    Compute the returns of the factor groups
    input:
    week_list: list, rebalancing dates (weekly in this strategy)
    stocks_dic: dict, keys = dates, values = filtered stock codes
    factor_dic: dict, keys = dates, values = factor DataFrames (columns RVol, RSkew, RKurt)
    output:
    six DataFrames: per-period and cumulative returns of groups 1-5 for
    RVol, RSkew and RKurt, in that order
    '''
    factor_names = ['RVol', 'RSkew', 'RKurt']
    group_profit_lists = {name: [] for name in factor_names}
    date_l = []
    for i in range(len(week_list)-1):
        date = week_list[i]
        date_pro = week_list[i+1]
        stocks = list(stocks_dic[date])
        # Return over the next holding period ('5d' bars, matching weekly rebalancing)
        price = get_price(stocks,end_date=date_pro,count=2,frequency='5d',fields=['close'])['close']
        profit = price.pct_change().dropna(how='all')
        profit = profit.stack().unstack(0)  # index = stock codes
        factor = factor_dic[date]
        for name in factor_names:
            cut_list = cut_data_with_num(factor[name])
            # Equal-weighted mean return of each group; stocks without a price become NaN and are skipped
            group_mean = [profit.reindex(sel_stocks).mean() for sel_stocks in cut_list]
            group_profit_lists[name].append(pd.concat(group_mean,axis=1))
        date_l.append(date)

    results = []
    for name in factor_names:
        profit_df = pd.concat(group_profit_lists[name])
        profit_df.index = date_l
        profit_df.columns = range(1,6)
        results.append(profit_df)                  # per-period returns
        results.append((profit_df + 1).cumprod())  # cumulative returns
    return tuple(results)
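The grouping step in `group_profit` (equal-count groups, then equal-weighted mean returns) can be sketched with pandas alone for a single cross-section. This swaps in `pd.qcut` on ranks for the index slicing in `cut_data_with_num`; the inputs are assumed to be Series indexed by stock code, and `quantile_group_returns` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def quantile_group_returns(factor, next_returns, ncut=5):
    """Equal-weighted next-period return of each factor group.

    factor, next_returns: pd.Series indexed by stock code.
    Groups are numbered 1..ncut from the smallest to the largest factor value.
    Ranking first (ties broken by order) lets pd.qcut split the stocks into
    groups of (nearly) equal size, similar in spirit to cut_data_with_num.
    """
    data = pd.concat({'factor': factor, 'ret': next_returns}, axis=1).dropna()
    ranks = data['factor'].rank(method='first')
    groups = pd.qcut(ranks, ncut, labels=list(range(1, ncut + 1)))
    return data['ret'].groupby(groups, observed=True).mean()


# Hypothetical cross-section: ten stocks whose next return falls as the factor rises
codes = ['s%d' % i for i in range(10)]
factor = pd.Series(np.arange(10, dtype=float), index=codes)
next_returns = -0.01 * factor
print(quantile_group_returns(factor, next_returns))
```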

Rebalancing Functions

# Weekly rebalancing
week_list = keys[::cycle]  # sample every `cycle` (= 5) trading days
profit_df_vol,profit_cump_vol,profit_df_skew,profit_cump_skew,profit_df_kurt,profit_cump_kurt = group_profit(week_list,stocks_dic,res)
# Note: if the rebalancing period is changed, fre must be changed to match
def get_base_profit(week_list,fre=str(cycle)+'d',base_index='000982.XSHG'):
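    '''
    Compute benchmark returns
    input:
    week_list: list, rebalancing dates
    fre: bar frequency, '5d' in this strategy; must match the spacing of the dates in week_list
    base_index: benchmark index code, default '000982.XSHG'
    output:
    Series, index = dates, values = benchmark return over the following period
    '''
    length = len(week_list)
    base_profit_l = []
    date_l = []
    # Benchmark return over each holding period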
    for i in range(length-1):
        date = week_list[i]
        date_pro = week_list[i+1]
        price = get_price(base_index,end_date=date_pro,count=2,frequency=fre,fields=['close'])['close']
        profit = price.pct_change().dropna(how='all')
        base_profit_l.append(profit)
        date_l.append(date)
    base_profit = pd.concat(base_profit_l)
    base_profit.index = date_l
    return base_profit
figure = plt.figure(figsize=(12,16))
ax1 = plt.subplot(311)
plt.title('Volatility profit')
profit_cump_vol.plot(ax=ax1)

ax2 = plt.subplot(312)
plt.title('Skewness profit')
profit_cump_skew.plot(ax=ax2)

ax3 = plt.subplot(313)
plt.title('Kurtosis profit')
profit_cump_kurt.plot(ax=ax3,xticks=range(0,len(profit_cump_kurt),30))
[Figure: cumulative returns of groups 1 to 5 for the Volatility, Skewness and Kurtosis factors]

Evaluation

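# Information coefficient (IC)
# Weekly rebalancing
def caculate_IC(week_list,factor_dic):
    '''
    Compute the IC of the skewness factor (RSkew) for each period
    input:
    week_list: list, rebalancing dates (weekly in this strategy)
    factor_dic: dict, keys = dates, values = factor DataFrames
    output:
    DataFrame, index = dates, column 'IC'
    note: the stock universe is read from the global variable stocks_dic
    '''
    length = len(week_list)

    ic_l = []
    date_l = []
    for i in range(length-1):
        date = week_list[i]
        date_pro = week_list[i+1]
        stocks = list(stocks_dic[date])
        price = get_price(stocks,end_date=date_pro,count=2,frequency='5d',fields=['close'])['close']
        profit = price.pct_change().dropna(how='all')  # indexed by the next date, but this is the current period's return
        profit = profit.stack().unstack(0)
        factor = factor_dic[date]['RSkew']
        ic_day = profit.corrwith(factor)  # Pearson correlation across stocks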
        ic_l.append(ic_day.values)
        date_l.append(date)
    ic_df = pd.DataFrame(ic_l,index=date_l,columns=['IC'])
    return ic_df
ic_df = caculate_IC(week_list,res)
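For a single cross-section, the IC computed by `caculate_IC` reduces to a correlation between two Series. A sketch, assuming factor values and next-period returns indexed by stock code; `information_coefficient` and its `rank` flag are hypothetical, and the rank IC is included only as a common alternative, not as what this article reports.

```python
import pandas as pd


def information_coefficient(factor, next_returns, rank=False):
    """Cross-sectional IC between factor values and next-period returns.

    rank=False gives the Pearson correlation, which is what DataFrame.corrwith
    computes by default in caculate_IC. rank=True gives the rank IC
    (Pearson correlation of the ranks, i.e. the Spearman correlation).
    """
    data = pd.concat({'factor': factor, 'ret': next_returns}, axis=1).dropna()
    x, y = data['factor'], data['ret']
    if rank:
        x, y = x.rank(), y.rank()
    return float(x.corr(y))


factor = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=list('abcde'))
next_returns = -factor ** 3  # monotone but nonlinear in the factor
print(information_coefficient(factor, next_returns))             # Pearson IC, about -0.94
print(information_coefficient(factor, next_returns, rank=True))  # rank IC, -1 up to rounding
```

The two differ whenever the relation is monotone but not linear; the Pearson IC is also more sensitive to a few extreme returns.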

Full-Period IC Statistics

# IC over the full sample
max_ic = ic_df.max()
min_ic = ic_df.min()
std_ic = ic_df.std()
nag_value = (ic_df < 0).astype(int)
nag_ratio = nag_value.sum() / len(nag_value)  # proportion of periods with negative IC
print('Min IC:',min_ic.values[0])
print('Max IC:',max_ic.values[0])
print('IC std:',std_ic.values[0])
print('Negative IC ratio:',nag_ratio.values[0])
Min IC: -0.3173311157771765
Max IC: 0.22862344636517615
IC std: 0.09732507632155726
Negative IC ratio: 0.6617647058823529
# IC and its 12-period rolling mean
ic_rolling_mean = ic_df.rolling(12).mean()
ic_merge_df = pd.concat([ic_df,ic_rolling_mean],axis=1).dropna()
ic_merge_df.columns = ['IC','IC rolling mean']

figure = plt.figure(figsize=(12,6))
ax = plt.subplot(111)
plt.title('IC and rolling mean')
ic_merge_df.plot(ax=ax,xticks=range(0,len(ic_merge_df),20))
[Figure: IC and its 12-period rolling mean]
# Benchmark return
base_profit = get_base_profit(week_list)
base_profit_cump = (base_profit + 1).cumprod()
# Strategy return: as written, the long leg is group Q1 of the volatility factor (profit_df_vol);
# use profit_df_skew.iloc[:,0] to plot the skewness factor instead
buy_profit = profit_df_vol.iloc[:,0]
buy_profit_cump = (buy_profit + 1).cumprod()
# Per-period excess return over the benchmark
delta_profit = buy_profit - base_profit
figure = plt.figure(figsize=(12,6))
ax = plt.subplot(111)
ax_sub = ax.twinx()
plt.title('Return curves')
ax.plot(base_profit_cump,'r',label='Benchmark')
ax.plot(buy_profit_cump,'g',label='Long (Q1)')
ax_sub.plot(delta_profit,label='Excess return (right axis)')

plt.xticks(list(base_profit_cump.index)[::30])
ax.legend(loc='upper left')
ax_sub.legend(loc='upper right')
[Figure: Return curves (benchmark, long leg, and per-period excess return on the right axis)]

Annual Performance Metrics

# Metrics by calendar year
def indicator_caculate(week_list,profit_df_vol,factor_dic,base_index='000982.XSHG'):
    '''
    Yearly metrics of the long portfolio (first group) and the long-short portfolio
    (first group minus last group) built from the group returns passed in.
    As called below, profit_df_vol holds the RVol groups; pass profit_df_skew to
    evaluate the skewness factor instead. factor_dic is not used.
    '''
    base_profit = get_base_profit(week_list,base_index=base_index)
    ind = list(profit_df_vol.index)
    profit_buy = profit_df_vol.iloc[:,0]    # long leg: Q1, smallest factor values
    profit_sell = profit_df_vol.iloc[:,-1]  # short leg: Q5, largest factor values

    week_date_set = sorted(set(week_list) & set(ind))
    year_num = sorted(set(y[:4] for y in week_date_set))
    cumprod_profit_l = []
    buy_sell_cumprod_profit_l = []
    buy_maxdrowdown_l = []
    buy_sell_maxdrowdowm_l = []
    buy_volitility_l = []
    buy_sell_volitility_l = []
    ir_l = []
    for y in year_num:
        year_date = [i for i in week_date_set if y == i[:4]]
        profit_buy_year = profit_buy.loc[year_date]
        base_profit_year = base_profit.loc[year_date]
        profit_buy_year_cump = (profit_buy_year + 1).cumprod()
        profit_sell_year = profit_sell.loc[year_date]
        # Returns for the year
        buy_sell = profit_buy_year - profit_sell_year
        buy_sell_cump = (buy_sell + 1).cumprod()
        cumprod_profit = profit_buy_year_cump.iloc[-1] - 1
        cumprod_profit_l.append(cumprod_profit)
        buy_sell_cumprod_profit = buy_sell_cump.iloc[-1] - 1
        buy_sell_cumprod_profit_l.append(buy_sell_cumprod_profit)

        # Maximum drawdown
        buy_maxdrowdown = find_max_drawdown(profit_buy_year_cump)
        buy_maxdrowdown_l.append(buy_maxdrowdown)
        buy_sell_maxdrowdowm = find_max_drawdown(buy_sell_cump)
        buy_sell_maxdrowdowm_l.append(buy_sell_maxdrowdowm)

        # Volatility: per-period std scaled by sqrt(number of periods in the year)
        buy_volitility = profit_buy_year.std()
        buy_volitility_l.append(buy_volitility*len(profit_buy_year)**(1/2))
        buy_sell_volitility = buy_sell.std()
        buy_sell_volitility_l.append(buy_sell_volitility*len(buy_sell)**(1/2))

        # Information ratio
        base_profit_cump = (base_profit_year + 1).cumprod()    # benchmark cumulative return
        base_last_profit = base_profit_cump.iloc[-1] - 1       # benchmark return for the year
        delta_last_profit = cumprod_profit - base_last_profit  # strategy minus benchmark, for the year
        delta_profit = profit_buy_year - base_profit_year      # per-period excess return
        ir = delta_last_profit / (delta_profit.std()*len(delta_profit)**(1/2))  # std scaled to the year
        ir_l.append(ir)

    indicator_df = pd.DataFrame(cumprod_profit_l,index=year_num,columns=['Return'])
    indicator_df['Long-short return'] = buy_sell_cumprod_profit_l
    indicator_df['Max drawdown'] = buy_maxdrowdown_l
    indicator_df['Long-short max drawdown'] = buy_sell_maxdrowdowm_l
    indicator_df['Annualized volatility'] = buy_volitility_l
    indicator_df['Long-short annualized volatility'] = buy_sell_volitility_l
    indicator_df['Information ratio'] = ir_l
    # Show every column except the information ratio as a percentage
    for col in indicator_df.columns[:-1]:
        indicator_df[col] = indicator_df[col].apply(lambda x: '%.2f%%' % (x*100))
    return indicator_df
indicator_df = indicator_caculate(week_list,profit_df_vol,res)
indicator_df
Year  Return   Long-short return  Max drawdown  Long-short max drawdown  Annualized volatility  Long-short annualized volatility  Information ratio
2015  72.48%   62.31%             41.11%        9.56%                    52.10%                 24.25%                            3.625162
2016  5.13%    23.44%             11.96%        7.93%                    26.21%                 14.46%                            1.637960
2017  -6.79%   -5.10%             12.61%        13.18%                   14.50%                 14.72%                            -0.644916
2018  -31.77%  9.33%              34.73%        8.92%                    24.99%                 11.95%                            0.644675
2019  28.35%   -9.45%             2.51%         16.15%                   9.58%                  11.09%                            -1.110282
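To make the information-ratio column easier to audit, the same calculation can be written on plain arrays. A sketch, assuming per-period (weekly) returns for one calendar year; `yearly_information_ratio` is a hypothetical name. Because the denominator scales by the number of periods actually observed, the values for 2019, a partial year, may not be directly comparable with full years.

```python
import numpy as np


def yearly_information_ratio(strategy_returns, benchmark_returns):
    """Information ratio in the form used by indicator_caculate.

    Numerator: compounded strategy return minus compounded benchmark return
    over the year. Denominator: sample standard deviation (ddof=1, as in
    pandas .std()) of the per-period excess return, times sqrt(number of periods).
    """
    s = np.asarray(strategy_returns, dtype=float)
    b = np.asarray(benchmark_returns, dtype=float)
    excess_total = np.prod(1 + s) - np.prod(1 + b)
    excess = s - b
    return float(excess_total / (excess.std(ddof=1) * np.sqrt(len(excess))))


# Four hypothetical weekly returns for a strategy and its benchmark
ir = yearly_information_ratio([0.02, 0.01, -0.01, 0.03], [0.01, 0.0, -0.02, 0.01])
print(ir)
```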

Annual IC Statistics

def IC_indicator(week_list,factor_dic):
    
    year_num = set([y[:4] for y in week_list])
    year_num = list(year_num)
    year_num.sort()
    ic_mean_l = []
    ic_min_l = []
    ic_max_l = []
    ic_std_l = []
    ic_nag_l = []
    for y in year_num:
        year_date = [i for i in week_list if y == i[:4]]
        ic = caculate_IC(year_date,factor_dic)
        ic_mean_l.append(ic.mean().values[0])
        ic_min_l.append(ic.min().values[0])
        ic_max_l.append(ic.max().values[0])
        ic_std_l.append(ic.std().values[0])
        nag_value = (ic < 0).astype(int)
        nag_ratio = nag_value.sum().values[0] / len(nag_value)
        ic_nag_l.append(nag_ratio)
    ic_indicator = pd.DataFrame(ic_mean_l,index=year_num,columns=['Mean IC'])
    ic_indicator['Min IC'] = ic_min_l
    ic_indicator['Max IC'] = ic_max_l
    ic_indicator['IC std'] = ic_std_l
    ic_indicator['Negative IC ratio'] = ic_nag_l
    return ic_indicator
IC_indicator(week_list,res)
Year  Mean IC    Min IC     Max IC    IC std    Negative IC ratio
2015  -0.045475  -0.262305  0.171658  0.101779  0.723404
2016  -0.044675  -0.317331  0.228623  0.098602  0.666667
2017  -0.034935  -0.182469  0.130435  0.079650  0.687500
2018  -0.032929  -0.253466  0.202934  0.094992  0.595745
2019  0.018365   -0.196028  0.223170  0.150785  0.600000

Summary

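1. Higher-order moments of individual stocks were built from their high-frequency price data.

2. Empirically, with weekly rebalancing, volatility and kurtosis separate individual stock returns poorly, whereas skewness sorts CSI 500 constituents into groups with clearly different and monotonically ordered returns.

3. From 2015 onward, the proportion of negative ICs is about 60% or higher in every year (the lowest, 59.6%, is in 2018), so the skewness factor is negatively correlated with next-period returns.

4. Compared with the long-only portfolio, the long-short portfolio has noticeably lower maximum drawdown and volatility in most years.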
