"The Temperature of the Order Book" Research Series (1): The Fine Structure of the Reversal Factor

Posted by 我太难了 on July 25, 12:00

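Research objective:

This post follows the Soochow Securities (东吴证券) research report "The Temperature of the Order Book" Research Series (1): The Fine Structure of the Reversal Factor. The report observes that the A-share market is order-driven: from a dynamical point of view, the whole evolution of a stock's price can be determined precisely, from the bottom up, by the order book. Tick-by-tick trade and order data are very rich in information. The report starts from the simplest data and examines the number of trades, that is, the number of matched transactions, a statistic aggregated from tick-by-tick trade data. Using this information, it cuts the traditional reversal factor into two parts and proposes an "ideal reversal factor" for predicting future returns, which offers a starting point for mining factors from the order book.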

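Research content:

(1) Mine alpha factors from order book data. The traditional reversal factor is unstable, and this post takes the view that it mixes a momentum effect with a reversal effect. The average amount per trade is therefore used to perform the W cut, and the new factor obtained after the cut is called the ideal reversal factor.
(2) On all A-share data, test the ideal reversal factor as a single factor from three angles: a significance test of factor returns, factor IC analysis, and a layered backtest.
(3) Analyse the ideal reversal factor further from four angles: the value of the parameter N, the choice of sample space, the accumulation of factor returns, and the split ratio.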

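Research conclusions:

(1) Single-factor tests of the ideal reversal factor: in the significance test of factor returns, the mean of the |t| series is 3.96; in the IC analysis, the mean of the IC series is 0.0492 and the IR is 0.54. In the layered backtest, portfolio 1 clearly outperforms portfolio 5, every portfolio outperforms the HS300 index, and portfolio 1 earns a clearly higher return.
(2) Deeper analysis of the ideal reversal factor: with N = 20, 40 and 60 the IC results all remain strong, but the factor becomes less effective as N grows, so a sensible choice of N helps to obtain a factor that predicts future returns more effectively.
(3) On the HS300 stock pool, the five-group long-short hedge of the ideal reversal factor has a total return of 47.74%, annualised volatility of 31.38%, a Sharpe ratio of 0.314 and a maximum drawdown of 9.69%; for the original reversal factor Ret20 the figures are a total return of 16.22%, annualised volatility of 53.88%, a Sharpe ratio of -0.016 and a maximum drawdown of 20.88%.
(4) Because the long-short return accumulates fairly evenly over time, weekly or semi-monthly rebalancing is worth trying. For N = 60, any split ratio X for the high-D group between 25 and 50 gives good results, and the IC reaches its maximum at X = 42.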

Note: the tick-level daily trade data for individual stocks is in trades.csv

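1 Data Preparation

1.1 Building the Date List

Factor data are extracted at the end of each month, so the month-end dates are needed first.
The inputs are peroid, start_date and end_date: peroid selects the frequency, week (W), month (M) or quarter (Q); start_date and end_date are the first and last dates.
The function returns the corresponding period-end dates, with the day before start_date prepended. This post uses 2013.1.1 as the start date and 2018.1.1 as the end date.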

from jqdata import *
import datetime
import pandas as pd
import numpy as np
from six import StringIO
import warnings
import time
import pickle
from jqfactor import winsorize_med
from jqfactor import neutralize
from jqfactor import standardlize
import statsmodels.api as sm
warnings.filterwarnings("ignore")
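# Get the list of period-end dates for a given frequency: 'W', 'M' or 'Q'
def get_period_date(peroid,start_date, end_date):
    # peroid is the resample rule: 'W' for weeks, 'M' for months, 'Q' for quarters (also e.g. '5min', '12D')
    stock_data = get_price('000001.XSHE',start_date,end_date,'daily',fields=['close'])
    # Record the last trading day of each period
    stock_data['date']=stock_data.index
    # Resample: each variable takes its value on the last trading day of the period
    # (how='last' is the legacy pandas call; newer pandas writes .resample(peroid).last())
    period_stock_data=stock_data.resample(peroid,how='last')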
    date=period_stock_data.index
    pydate_array = date.to_pydatetime()
    date_only_array = np.vectorize(lambda s: s.strftime('%Y-%m-%d'))(pydate_array )
    date_only_series = pd.Series(date_only_array)
    start_date = datetime.datetime.strptime(start_date, "%Y-%m-%d")
    start_date=start_date-datetime.timedelta(days=1)
    start_date = start_date.strftime("%Y-%m-%d")
    date_list=date_only_series.values.tolist()
    date_list.insert(0,start_date)
    return date_list
get_period_date('M','2017-01-01', '2018-01-01')
['2016-12-31',
 '2017-01-31',
 '2017-02-28',
 '2017-03-31',
 '2017-04-30',
 '2017-05-31',
 '2017-06-30',
 '2017-07-31',
 '2017-08-31',
 '2017-09-30',
 '2017-10-31',
 '2017-11-30',
 '2017-12-31']
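
For readers without access to the jqdata environment, the date list above can be approximated with the standard library alone. The sketch below is a stand-in under stated assumptions, not the platform's method: `month_end_dates` is a hypothetical helper that uses calendar month-ends instead of a trading calendar, keeps the month-ends that fall inside [start_date, end_date], and prepends the day before start_date as `get_period_date` does.

```python
import calendar
import datetime


def month_end_dates(start_date, end_date):
    """Hypothetical stand-in for get_period_date('M', ...) using calendar month-ends.

    Assumption: no trading calendar is consulted, so a month-end is kept
    whenever it lies inside [start_date, end_date].
    """
    start = datetime.datetime.strptime(start_date, "%Y-%m-%d").date()
    end = datetime.datetime.strptime(end_date, "%Y-%m-%d").date()
    # The original function prepends the day before start_date.
    dates = [(start - datetime.timedelta(days=1)).strftime("%Y-%m-%d")]
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        last_day = calendar.monthrange(year, month)[1]
        month_end = datetime.date(year, month, last_day)
        if start <= month_end <= end:
            dates.append(month_end.strftime("%Y-%m-%d"))
        # Advance to the next month.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return dates


print(month_end_dates("2017-01-01", "2018-01-01"))
```

With the arguments of the call above, the sketch returns the same thirteen dates. For ranges that end mid-month the two would differ, because the resample-based original labels every month that contains at least one trading day.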

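1.2 Building the Stock List

Stock universe: all A-shares
Screening: ST stocks are removed and stocks listed within the last 3 months are removed (note that `delect_stop` below defaults to n = 60 days); each stock is treated as one sample
The example call below takes the constituents as of 2018-12-31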

# Drop stocks listed less than n days (default 60, about 2 months) before beginDate
def delect_stop(stocks,beginDate,n=30*2):
    stockList = []
    beginDate = datetime.datetime.strptime(beginDate, "%Y-%m-%d")
    for stock in stocks:
        start_date = get_security_info(stock).start_date
        if start_date < (beginDate-datetime.timedelta(days = n)).date():
            stockList.append(stock)
    return stockList

# Build the stock universe
def get_stock_A(begin_date):
    begin_date = str(begin_date)
    stockList = get_index_stocks('000002.XSHG',begin_date)+get_index_stocks('399107.XSHE',begin_date)
    # Drop ST stocks
    st_data = get_extras('is_st', stockList, count = 1, end_date=begin_date)
    stockList = [stock for stock in stockList if not st_data[stock][0]]
    # Drop recently listed stocks
    stockList = delect_stop(stockList, begin_date)
    return stockList
get_stock_A("2018-12-31")
[u'600000.XSHG',
 u'600004.XSHG',
 u'600006.XSHG',
 u'600007.XSHG',
 u'600008.XSHG',
 u'600009.XSHG',
 u'600010.XSHG',
 u'600011.XSHG',
 u'600012.XSHG',
 u'600015.XSHG',
 u'600016.XSHG',
 u'600017.XSHG',
 u'600018.XSHG',
 u'600019.XSHG',
 u'600020.XSHG',
 u'600021.XSHG',
 u'600022.XSHG',
 u'600023.XSHG',
 u'600025.XSHG',
 u'600026.XSHG',
 u'600027.XSHG',
 u'600028.XSHG',
 u'600029.XSHG',
 u'600030.XSHG',
 u'600031.XSHG',
 u'600033.XSHG',
 u'600035.XSHG',
 u'600036.XSHG',
 u'600037.XSHG',
 u'600038.XSHG',
 u'600039.XSHG',
 u'600048.XSHG',
 u'600050.XSHG',
 u'600051.XSHG',
 u'600052.XSHG',
 u'600053.XSHG',
 u'600054.XSHG',
 u'600055.XSHG',
 u'600056.XSHG',
 u'600057.XSHG',
 u'600058.XSHG',
 u'600059.XSHG',
 u'600060.XSHG',
 u'600061.XSHG',
 u'600062.XSHG',
 u'600063.XSHG',
 u'600064.XSHG',
 u'600066.XSHG',
 u'600067.XSHG',
 u'600068.XSHG',
 u'600069.XSHG',
 u'600070.XSHG',
 u'600071.XSHG',
 u'600072.XSHG',
 u'600073.XSHG',
 u'600075.XSHG',
 u'600076.XSHG',
 u'600077.XSHG',
 u'600078.XSHG',
 u'600079.XSHG',
 u'600080.XSHG',
 u'600081.XSHG',
 u'600082.XSHG',
 u'600083.XSHG',
 u'600084.XSHG',
 u'600085.XSHG',
 u'600086.XSHG',
 u'600088.XSHG',
 u'600089.XSHG',
 u'600090.XSHG',
 u'600093.XSHG',
 u'600094.XSHG',
 u'600095.XSHG',
 u'600096.XSHG',
 u'600097.XSHG',
 u'600098.XSHG',
 u'600099.XSHG',
 u'600100.XSHG',
 u'600101.XSHG',
 u'600103.XSHG',
 u'600104.XSHG',
 u'600105.XSHG',
 u'600106.XSHG',
 u'600107.XSHG',
 u'600108.XSHG',
 u'600109.XSHG',
 u'600110.XSHG',
 u'600111.XSHG',
 u'600112.XSHG',
 u'600113.XSHG',
 u'600114.XSHG',
 u'600115.XSHG',
 u'600116.XSHG',
 u'600117.XSHG',
 u'600118.XSHG',
 u'600119.XSHG',
 u'600120.XSHG',
 u'600121.XSHG',
 u'600122.XSHG',
 u'600123.XSHG',
 u'600125.XSHG',
 u'600126.XSHG',
 u'600127.XSHG',
 u'600128.XSHG',
 u'600129.XSHG',
 u'600130.XSHG',
 u'600131.XSHG',
 u'600132.XSHG',
 u'600133.XSHG',
 u'600135.XSHG',
 u'600136.XSHG',
 u'600137.XSHG',
 u'600138.XSHG',
 u'600139.XSHG',
 u'600141.XSHG',
 u'600143.XSHG',
 u'600146.XSHG',
 u'600148.XSHG',
 u'600151.XSHG',
 u'600152.XSHG',
 u'600153.XSHG',
 u'600155.XSHG',
 u'600156.XSHG',
 u'600157.XSHG',
 u'600158.XSHG',
 u'600159.XSHG',
 u'600160.XSHG',
 u'600161.XSHG',
 u'600162.XSHG',
 u'600163.XSHG',
 u'600165.XSHG',
 u'600166.XSHG',
 u'600167.XSHG',
 u'600168.XSHG',
 u'600169.XSHG',
 u'600170.XSHG',
 u'600171.XSHG',
 u'600172.XSHG',
 u'600173.XSHG',
 u'600175.XSHG',
 u'600176.XSHG',
 u'600177.XSHG',
 u'600178.XSHG',
 u'600179.XSHG',
 u'600180.XSHG',
 u'600183.XSHG',
 u'600184.XSHG',
 u'600185.XSHG',
 u'600186.XSHG',
 u'600187.XSHG',
 u'600188.XSHG',
 u'600189.XSHG',
 u'600190.XSHG',
 u'600191.XSHG',
 u'600192.XSHG',
 u'600195.XSHG',
 u'600196.XSHG',
 u'600197.XSHG',
 u'600199.XSHG',
 u'600200.XSHG',
 u'600201.XSHG',
 u'600203.XSHG',
 u'600206.XSHG',
 u'600207.XSHG',
 u'600208.XSHG',
 u'600210.XSHG',
 u'600211.XSHG',
 u'600212.XSHG',
 u'600213.XSHG',
 u'600215.XSHG',
 u'600216.XSHG',
 u'600217.XSHG',
 u'600218.XSHG',
 u'600219.XSHG',
 u'600220.XSHG',
 u'600221.XSHG',
 u'600222.XSHG',
 u'600223.XSHG',
 u'600225.XSHG',
 u'600226.XSHG',
 u'600227.XSHG',
 u'600229.XSHG',
 u'600230.XSHG',
 u'600231.XSHG',
 u'600232.XSHG',
 u'600233.XSHG',
 u'600235.XSHG',
 u'600236.XSHG',
 u'600237.XSHG',
 u'600239.XSHG',
 u'600240.XSHG',
 u'600241.XSHG',
 u'600242.XSHG',
 u'600243.XSHG',
 u'600246.XSHG',
 u'600248.XSHG',
 u'600249.XSHG',
 u'600250.XSHG',
 u'600251.XSHG',
 u'600252.XSHG',
 u'600255.XSHG',
 u'600256.XSHG',
 u'600257.XSHG',
 u'600258.XSHG',
 u'600259.XSHG',
 u'600260.XSHG',
 u'600261.XSHG',
 u'600262.XSHG',
 u'600266.XSHG',
 u'600267.XSHG',
 u'600268.XSHG',
 u'600269.XSHG',
 u'600271.XSHG',
 u'600272.XSHG',
 u'600273.XSHG',
 u'600276.XSHG',
 u'600277.XSHG',
 u'600278.XSHG',
 u'600279.XSHG',
 u'600280.XSHG',
 u'600281.XSHG',
 u'600282.XSHG',
 u'600283.XSHG',
 u'600284.XSHG',
 u'600285.XSHG',
 u'600287.XSHG',
 u'600288.XSHG',
 u'600290.XSHG',
 u'600291.XSHG',
 u'600292.XSHG',
 u'600293.XSHG',
 u'600295.XSHG',
 u'600297.XSHG',
 u'600298.XSHG',
 u'600299.XSHG',
 u'600300.XSHG',
 u'600302.XSHG',
 u'600303.XSHG',
 u'600305.XSHG',
 u'600306.XSHG',
 u'600307.XSHG',
 u'600308.XSHG',
 u'600309.XSHG',
 u'600310.XSHG',
 u'600311.XSHG',
 u'600312.XSHG',
 u'600313.XSHG',
 u'600315.XSHG',
 u'600316.XSHG',
 u'600317.XSHG',
 u'600318.XSHG',
 u'600319.XSHG',
 u'600320.XSHG',
 u'600322.XSHG',
 u'600323.XSHG',
 u'600325.XSHG',
 u'600326.XSHG',
 u'600327.XSHG',
 u'600328.XSHG',
 u'600329.XSHG',
 u'600330.XSHG',
 u'600331.XSHG',
 u'600332.XSHG',
 u'600333.XSHG',
 u'600335.XSHG',
 u'600336.XSHG',
 u'600337.XSHG',
 u'600338.XSHG',
 u'600339.XSHG',
 u'600340.XSHG',
 u'600343.XSHG',
 u'600345.XSHG',
 u'600346.XSHG',
 u'600348.XSHG',
 u'600350.XSHG',
 u'600351.XSHG',
 u'600352.XSHG',
 u'600353.XSHG',
 u'600354.XSHG',
 u'600355.XSHG',
 u'600356.XSHG',
 u'600358.XSHG',
 u'600359.XSHG',
 u'600360.XSHG',
 u'600361.XSHG',
 u'600362.XSHG',
 u'600363.XSHG',
 u'600365.XSHG',
 u'600366.XSHG',
 u'600367.XSHG',
 u'600368.XSHG',
 u'600369.XSHG',
 u'600370.XSHG',
 u'600371.XSHG',
 u'600372.XSHG',
 u'600373.XSHG',
 u'600375.XSHG',
 u'600376.XSHG',
 u'600377.XSHG',
 u'600378.XSHG',
 u'600379.XSHG',
 u'600380.XSHG',
 u'600381.XSHG',
 u'600382.XSHG',
 u'600383.XSHG',
 u'600385.XSHG',
 u'600386.XSHG',
 u'600387.XSHG',
 u'600388.XSHG',
 u'600389.XSHG',
 u'600390.XSHG',
 u'600391.XSHG',
 u'600392.XSHG',
 u'600393.XSHG',
 u'600395.XSHG',
 u'600396.XSHG',
 u'600398.XSHG',
 u'600400.XSHG',
 u'600403.XSHG',
 u'600405.XSHG',
 u'600406.XSHG',
 u'600409.XSHG',
 u'600410.XSHG',
 u'600415.XSHG',
 u'600416.XSHG',
 u'600418.XSHG',
 u'600419.XSHG',
 u'600420.XSHG',
 u'600422.XSHG',
 u'600425.XSHG',
 u'600426.XSHG',
 u'600428.XSHG',
 u'600429.XSHG',
 u'600433.XSHG',
 u'600435.XSHG',
 u'600436.XSHG',
 u'600438.XSHG',
 u'600439.XSHG',
 u'600444.XSHG',
 u'600446.XSHG',
 u'600448.XSHG',
 u'600449.XSHG',
 u'600452.XSHG',
 u'600455.XSHG',
 u'600456.XSHG',
 u'600458.XSHG',
 u'600459.XSHG',
 u'600460.XSHG',
 u'600461.XSHG',
 u'600462.XSHG',
 u'600463.XSHG',
 u'600466.XSHG',
 u'600467.XSHG',
 u'600468.XSHG',
 u'600469.XSHG',
 u'600470.XSHG',
 u'600475.XSHG',
 u'600476.XSHG',
 u'600477.XSHG',
 u'600478.XSHG',
 u'600479.XSHG',
 u'600480.XSHG',
 u'600481.XSHG',
 u'600482.XSHG',
 u'600483.XSHG',
 u'600485.XSHG',
 u'600486.XSHG',
 u'600487.XSHG',
 u'600488.XSHG',
 u'600489.XSHG',
 u'600490.XSHG',
 u'600491.XSHG',
 u'600493.XSHG',
 u'600495.XSHG',
 u'600496.XSHG',
 u'600497.XSHG',
 u'600498.XSHG',
 u'600499.XSHG',
 u'600500.XSHG',
 u'600501.XSHG',
 u'600502.XSHG',
 u'600503.XSHG',
 u'600505.XSHG',
 u'600506.XSHG',
 u'600507.XSHG',
 u'600508.XSHG',
 u'600509.XSHG',
 u'600510.XSHG',
 u'600511.XSHG',
 u'600512.XSHG',
 u'600513.XSHG',
 u'600515.XSHG',
 u'600516.XSHG',
 u'600517.XSHG',
 u'600518.XSHG',
 u'600519.XSHG',
 u'600520.XSHG',
 u'600521.XSHG',
 u'600522.XSHG',
 u'600523.XSHG',
 u'600525.XSHG',
 u'600526.XSHG',
 u'600527.XSHG',
 u'600528.XSHG',
 u'600529.XSHG',
 u'600530.XSHG',
 u'600531.XSHG',
 u'600532.XSHG',
 u'600533.XSHG',
 u'600535.XSHG',
 u'600536.XSHG',
 u'600537.XSHG',
 u'600538.XSHG',
 u'600540.XSHG',
 u'600543.XSHG',
 u'600545.XSHG',
 u'600546.XSHG',
 u'600547.XSHG',
 u'600548.XSHG',
 u'600549.XSHG',
 u'600550.XSHG',
 u'600551.XSHG',
 u'600552.XSHG',
 u'600555.XSHG',
 u'600557.XSHG',
 u'600558.XSHG',
 u'600559.XSHG',
 u'600560.XSHG',
 u'600561.XSHG',
 u'600562.XSHG',
 u'600563.XSHG',
 u'600565.XSHG',
 u'600566.XSHG',
 u'600567.XSHG',
 u'600568.XSHG',
 u'600569.XSHG',
 u'600570.XSHG',
 u'600571.XSHG',
 u'600572.XSHG',
 u'600573.XSHG',
 u'600575.XSHG',
 u'600576.XSHG',
 u'600577.XSHG',
 u'600578.XSHG',
 u'600579.XSHG',
 u'600580.XSHG',
 u'600581.XSHG',
 u'600582.XSHG',
 u'600583.XSHG',
 u'600584.XSHG',
 u'600585.XSHG',
 u'600586.XSHG',
 u'600587.XSHG',
 u'600588.XSHG',
 u'600589.XSHG',
 u'600590.XSHG',
 u'600592.XSHG',
 u'600593.XSHG',
 u'600594.XSHG',
 u'600595.XSHG',
 u'600596.XSHG',
 u'600597.XSHG',
 u'600598.XSHG',
 u'600599.XSHG',
 u'600600.XSHG',
 u'600601.XSHG',
 u'600602.XSHG',
 u'600603.XSHG',
 u'600604.XSHG',
 u'600605.XSHG',
 u'600606.XSHG',
 u'600609.XSHG',
 u'600611.XSHG',
 u'600612.XSHG',
 u'600613.XSHG',
 u'600614.XSHG',
 u'600615.XSHG',
 u'600616.XSHG',
 u'600617.XSHG',
 u'600618.XSHG',
 u'600619.XSHG',
 u'600620.XSHG',
 u'600621.XSHG',
 u'600622.XSHG',
 u'600623.XSHG',
 u'600624.XSHG',
 u'600626.XSHG',
 u'600628.XSHG',
 u'600629.XSHG',
 u'600630.XSHG',
 u'600633.XSHG',
 u'600635.XSHG',
 u'600636.XSHG',
 u'600637.XSHG',
 u'600638.XSHG',
 u'600639.XSHG',
 u'600640.XSHG',
 u'600641.XSHG',
 u'600642.XSHG',
 u'600643.XSHG',
 u'600644.XSHG',
 u'600645.XSHG',
 u'600647.XSHG',
 u'600648.XSHG',
 u'600649.XSHG',
 u'600650.XSHG',
 u'600651.XSHG',
 u'600652.XSHG',
 u'600653.XSHG',
 u'600655.XSHG',
 u'600657.XSHG',
 u'600658.XSHG',
 u'600660.XSHG',
 u'600661.XSHG',
 u'600662.XSHG',
 u'600663.XSHG',
 u'600664.XSHG',
 u'600665.XSHG',
 u'600666.XSHG',
 u'600667.XSHG',
 u'600668.XSHG',
 u'600671.XSHG',
 u'600673.XSHG',
 u'600674.XSHG',
 u'600675.XSHG',
 u'600676.XSHG',
 u'600677.XSHG',
 u'600678.XSHG',
 u'600679.XSHG',
 u'600681.XSHG',
 u'600682.XSHG',
 u'600683.XSHG',
 u'600684.XSHG',
 u'600685.XSHG',
 u'600686.XSHG',
 u'600687.XSHG',
 u'600688.XSHG',
 u'600689.XSHG',
 u'600690.XSHG',
 u'600691.XSHG',
 u'600692.XSHG',
 u'600693.XSHG',
 u'600694.XSHG',
 u'600695.XSHG',
 u'600697.XSHG',
 u'600698.XSHG',
 u'600699.XSHG',
 u'600702.XSHG',
 u'600703.XSHG',
 u'600704.XSHG',
 u'600705.XSHG',
 u'600706.XSHG',
 u'600707.XSHG',
 u'600708.XSHG',
 u'600710.XSHG',
 u'600711.XSHG',
 u'600712.XSHG',
 u'600713.XSHG',
 u'600714.XSHG',
 u'600715.XSHG',
 u'600716.XSHG',
 u'600717.XSHG',
 u'600718.XSHG',
 u'600719.XSHG',
 u'600720.XSHG',
 u'600721.XSHG',
 u'600722.XSHG',
 u'600723.XSHG',
 u'600724.XSHG',
 u'600726.XSHG',
 u'600727.XSHG',
 u'600728.XSHG',
 u'600729.XSHG',
 u'600730.XSHG',
 u'600731.XSHG',
 u'600733.XSHG',
 u'600734.XSHG',
 u'600735.XSHG',
 u'600736.XSHG',
 u'600737.XSHG',
 u'600738.XSHG',
 u'600739.XSHG',
 u'600740.XSHG',
 u'600741.XSHG',
 u'600742.XSHG',
 u'600743.XSHG',
 u'600744.XSHG',
 u'600745.XSHG',
 u'600746.XSHG',
 u'600748.XSHG',
 u'600750.XSHG',
 u'600751.XSHG',
 u'600753.XSHG',
 u'600754.XSHG',
 u'600755.XSHG',
 u'600756.XSHG',
 u'600757.XSHG',
 u'600758.XSHG',
 u'600759.XSHG',
 u'600760.XSHG',
 u'600761.XSHG',
 u'600763.XSHG',
 u'600764.XSHG',
 u'600765.XSHG',
 u'600766.XSHG',
 u'600768.XSHG',
 u'600769.XSHG',
 u'600770.XSHG',
 u'600771.XSHG',
 u'600773.XSHG',
 u'600774.XSHG',
 u'600775.XSHG',
 u'600776.XSHG',
 u'600777.XSHG',
 u'600779.XSHG',
 u'600780.XSHG',
 u'600781.XSHG',
 u'600782.XSHG',
 u'600783.XSHG',
 u'600784.XSHG',
 u'600785.XSHG',
 u'600787.XSHG',
 u'600789.XSHG',
 u'600790.XSHG',
 u'600791.XSHG',
 u'600792.XSHG',
 u'600793.XSHG',
 u'600794.XSHG',
 u'600795.XSHG',
 u'600796.XSHG',
 u'600797.XSHG',
 u'600798.XSHG',
 u'600800.XSHG',
 u'600801.XSHG',
 u'600802.XSHG',
 u'600803.XSHG',
 u'600804.XSHG',
 u'600805.XSHG',
 u'600808.XSHG',
 u'600809.XSHG',
 u'600810.XSHG',
 u'600811.XSHG',
 u'600812.XSHG',
 u'600814.XSHG',
 u'600815.XSHG',
 u'600816.XSHG',
 u'600818.XSHG',
 u'600819.XSHG',
 u'600820.XSHG',
 u'600821.XSHG',
 u'600822.XSHG',
 u'600823.XSHG',
 u'600824.XSHG',
 u'600825.XSHG',
 u'600826.XSHG',
 u'600827.XSHG',
 u'600828.XSHG',
 u'600829.XSHG',
 u'600830.XSHG',
 u'600831.XSHG',
 u'600833.XSHG',
 u'600834.XSHG',
 u'600835.XSHG',
 u'600836.XSHG',
 u'600837.XSHG',
 u'600838.XSHG',
 u'600839.XSHG',
 u'600841.XSHG',
 u'600843.XSHG',
 u'600844.XSHG',
 u'600845.XSHG',
 u'600846.XSHG',
 u'600847.XSHG',
 u'600848.XSHG',
 u'600850.XSHG',
 u'600851.XSHG',
 u'600853.XSHG',
 u'600854.XSHG',
 u'600855.XSHG',
 u'600856.XSHG',
 u'600857.XSHG',
 u'600858.XSHG',
 u'600859.XSHG',
 u'600860.XSHG',
 u'600861.XSHG',
 u'600862.XSHG',
 u'600863.XSHG',
 u'600864.XSHG',
 u'600865.XSHG',
 u'600866.XSHG',
 u'600867.XSHG',
 u'600868.XSHG',
 u'600869.XSHG',
 u'600872.XSHG',
 u'600873.XSHG',
 u'600874.XSHG',
 u'600875.XSHG',
 u'600876.XSHG',
 u'600879.XSHG',
 u'600880.XSHG',
 u'600881.XSHG',
 u'600882.XSHG',
 u'600883.XSHG',
 u'600884.XSHG',
 u'600885.XSHG',
 u'600886.XSHG',
 u'600887.XSHG',
 u'600888.XSHG',
 u'600889.XSHG',
 u'600890.XSHG',
 u'600891.XSHG',
 u'600892.XSHG',
 u'600893.XSHG',
 u'600894.XSHG',
 u'600895.XSHG',
 u'600897.XSHG',
 u'600898.XSHG',
 u'600900.XSHG',
 u'600901.XSHG',
 u'600903.XSHG',
 u'600908.XSHG',
 u'600909.XSHG',
 u'600917.XSHG',
 u'600919.XSHG',
 u'600926.XSHG',
 u'600929.XSHG',
 u'600933.XSHG',
 u'600936.XSHG',
 u'600939.XSHG',
 u'600958.XSHG',
 u'600959.XSHG',
 u'600960.XSHG',
 u'600961.XSHG',
 u'600962.XSHG',
 u'600963.XSHG',
 u'600965.XSHG',
 u'600966.XSHG',
 u'600967.XSHG',
 u'600969.XSHG',
 u'600970.XSHG',
 u'600971.XSHG',
 u'600973.XSHG',
 u'600975.XSHG',
 u'600976.XSHG',
 u'600977.XSHG',
 u'600978.XSHG',
 u'600979.XSHG',
 u'600980.XSHG',
 u'600981.XSHG',
 u'600982.XSHG',
 u'600983.XSHG',
 u'600984.XSHG',
 u'600985.XSHG',
 u'600986.XSHG',
 u'600987.XSHG',
 u'600988.XSHG',
 u'600990.XSHG',
 u'600992.XSHG',
 u'600993.XSHG',
 u'600995.XSHG',
 u'600996.XSHG',
 u'600997.XSHG',
 u'600998.XSHG',
 u'600999.XSHG',
 u'601000.XSHG',
 u'601001.XSHG',
 u'601002.XSHG',
 u'601003.XSHG',
 u'601005.XSHG',
 u'601006.XSHG',
 u'601007.XSHG',
 u'601008.XSHG',
 u'601009.XSHG',
 u'601010.XSHG',
 u'601011.XSHG',
 u'601012.XSHG',
 u'601015.XSHG',
 u'601016.XSHG',
 u'601018.XSHG',
 u'601019.XSHG',
 u'601020.XSHG',
 u'601021.XSHG',
 u'601028.XSHG',
 u'601038.XSHG',
 u'601058.XSHG',
 u'601066.XSHG',
 u'601068.XSHG',
 u'601069.XSHG',
 u'601086.XSHG',
 u'601088.XSHG',
 u'601098.XSHG',
 u'601099.XSHG',
 u'601100.XSHG',
 u'601101.XSHG',
 u'601106.XSHG',
 u'601107.XSHG',
 u'601108.XSHG',
 u'601111.XSHG',
 u'601113.XSHG',
 u'601116.XSHG',
 u'601117.XSHG',
 u'601118.XSHG',
 u'601126.XSHG',
 u'601127.XSHG',
 u'601128.XSHG',
 u'601137.XSHG',
 u'601138.XSHG',
 u'601139.XSHG',
 u'601155.XSHG',
 u'601158.XSHG',
 u'601162.XSHG',
 u'601163.XSHG',
 u'601166.XSHG',
 u'601168.XSHG',
 u'601169.XSHG',
 u'601177.XSHG',
 u'601179.XSHG',
 u'601186.XSHG',
 u'601188.XSHG',
 u'601198.XSHG',
 u'601199.XSHG',
 u'601200.XSHG',
 u'601208.XSHG',
 u'601211.XSHG',
 u'601212.XSHG',
 u'601216.XSHG',
 u'601218.XSHG',
 u'601222.XSHG',
 u'601225.XSHG',
 u'601226.XSHG',
 u'601228.XSHG',
 u'601229.XSHG',
 u'601231.XSHG',
 u'601233.XSHG',
 u'601238.XSHG',
 u'601258.XSHG',
 u'601288.XSHG',
 u'601311.XSHG',
 u'601318.XSHG',
 u'601326.XSHG',
 u'601328.XSHG',
 u'601330.XSHG',
 u'601333.XSHG',
 u'601336.XSHG',
 u'601339.XSHG',
 u'601360.XSHG',
 u'601366.XSHG',
 u'601368.XSHG',
 u'601369.XSHG',
 u'601375.XSHG',
 u'601377.XSHG',
 u'601388.XSHG',
 u'601390.XSHG',
 u'601398.XSHG',
 u'601500.XSHG',
 u'601515.XSHG',
 u'601518.XSHG',
 u'601519.XSHG',
 u'601555.XSHG',
 u'601566.XSHG',
 u'601567.XSHG',
 u'601577.XSHG',
 u'601579.XSHG',
 u'601588.XSHG',
 u'601595.XSHG',
 u'601599.XSHG',
 u'601600.XSHG',
 u'601601.XSHG',
 u'601606.XSHG',
 u'601607.XSHG',
 u'601608.XSHG',
 u'601611.XSHG',
 u'601616.XSHG',
 u'601618.XSHG',
 u'601619.XSHG',
 u'601628.XSHG',
 u'601633.XSHG',
 u'601636.XSHG',
 u'601666.XSHG',
 u'601668.XSHG',
 u'601669.XSHG',
 u'601677.XSHG',
 u'601678.XSHG',
 u'601688.XSHG',
 u'601689.XSHG',
 u'601699.XSHG',
 u'601700.XSHG',
 u'601717.XSHG',
 u'601718.XSHG',
 u'601727.XSHG',
 u'601766.XSHG',
 u'601777.XSHG',
 u'601788.XSHG',
 u'601789.XSHG',
 u'601799.XSHG',
 u'601800.XSHG',
 u'601801.XSHG',
 u'601808.XSHG',
 u'601811.XSHG',
 u'601818.XSHG',
 u'601828.XSHG',
 u'601838.XSHG',
 u'601857.XSHG',
 u'601858.XSHG',
 u'601866.XSHG',
 u'601869.XSHG',
 u'601872.XSHG',
 u'601877.XSHG',
 u'601878.XSHG',
 u'601880.XSHG',
 u'601881.XSHG',
 u'601882.XSHG',
 u'601886.XSHG',
 u'601888.XSHG',
 u'601890.XSHG',
 u'601898.XSHG',
 u'601899.XSHG',
 u'601900.XSHG',
 u'601901.XSHG',
 u'601908.XSHG',
 u'601918.XSHG',
 u'601919.XSHG',
 u'601928.XSHG',
 u'601929.XSHG',
 u'601933.XSHG',
 u'601939.XSHG',
 u'601949.XSHG',
 u'601952.XSHG',
 u'601958.XSHG',
 u'601965.XSHG',
 u'601966.XSHG',
 u'601968.XSHG',
 u'601969.XSHG',
 u'601985.XSHG',
 u'601988.XSHG',
 u'601989.XSHG',
 u'601990.XSHG',
 u'601991.XSHG',
 u'601992.XSHG',
 u'601996.XSHG',
 u'601997.XSHG',
 u'601998.XSHG',
 u'601999.XSHG',
 u'603000.XSHG',
 u'603001.XSHG',
 u'603002.XSHG',
 u'603003.XSHG',
 u'603005.XSHG',
 u'603006.XSHG',
 u'603007.XSHG',
 u'603008.XSHG',
 u'603009.XSHG',
 u'603010.XSHG',
 u'603011.XSHG',
 u'603012.XSHG',
 u'603013.XSHG',
 u'603015.XSHG',
 u'603016.XSHG',
 u'603017.XSHG',
 u'603018.XSHG',
 u'603019.XSHG',
 u'603020.XSHG',
 u'603021.XSHG',
 u'603022.XSHG',
 u'603023.XSHG',
 u'603025.XSHG',
 u'603026.XSHG',
 u'603027.XSHG',
 u'603028.XSHG',
 u'603029.XSHG',
 u'603030.XSHG',
 u'603031.XSHG',
 u'603032.XSHG',
 u'603033.XSHG',
 u'603035.XSHG',
 u'603036.XSHG',
 u'603037.XSHG',
 u'603038.XSHG',
 u'603039.XSHG',
 u'603040.XSHG',
 u'603041.XSHG',
 u'603042.XSHG',
 u'603043.XSHG',
 u'603045.XSHG',
 u'603050.XSHG',
 u'603055.XSHG',
 u'603056.XSHG',
 u'603058.XSHG',
 u'603059.XSHG',
 u'603060.XSHG',
 u'603063.XSHG',
 u'603066.XSHG',
 u'603067.XSHG',
 u'603069.XSHG',
 u'603076.XSHG',
 u'603077.XSHG',
 u'603078.XSHG',
 u'603079.XSHG',
 u'603080.XSHG',
 u'603081.XSHG',
 ...]
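
The listing-age screen can be tried without the platform. The following is a minimal sketch with made-up data: `drop_recent_listings` and the `listing_dates` mapping are hypothetical names, where the mapping stands in for `get_security_info(code).start_date`.

```python
import datetime


def drop_recent_listings(listing_dates, as_of, min_days=60):
    """Keep codes listed more than `min_days` calendar days before `as_of`.

    `listing_dates` is a hypothetical {code: datetime.date} mapping standing in
    for get_security_info(code).start_date.
    """
    cutoff = as_of - datetime.timedelta(days=min_days)
    return [code for code, listed in sorted(listing_dates.items()) if listed < cutoff]


# Toy data: the codes and listing dates are made up for illustration.
sample = {
    "AAA": datetime.date(2010, 1, 4),
    "BBB": datetime.date(2018, 12, 1),
    "CCC": datetime.date(2018, 10, 1),
}
print(drop_recent_listings(sample, datetime.date(2018, 12, 31)))
```

The comparison is strict, as in `delect_stop`, so a stock listed exactly `min_days` before the reference date is still excluded.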

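1.3 Data Retrieval

The factor is computed as follows:
(1) At the end of each month, look back over the past N trading days for stock s (N is taken to be even for convenience);
(2) For stock s, compute the average amount per trade D for each day (D = daily turnover amount / daily number of trades), sort the N trading days by D in descending order, and call the first N/2 days the high-D group and the last N/2 days the low-D group;
(3) For stock s, sum the daily returns over the high-D days to get the factor M_high, and sum the daily returns over the low-D days to get the factor M_low;
(4) Repeat the procedure above for every stock.
The reversal factor is computed as:

$M = M_{high} - M_{low}$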
begin_date = '2013-01-01'
end_date = '2018-01-01'
dateList = get_period_date('M',begin_date, end_date)
# Daily number of trades: rows are dates (YYYYMMDD), columns are stock codes
Trades = pd.read_csv("trades.csv", index_col = 0)
Trades.columns = [normalize_code(code) for code in Trades.columns]
Trades.index = [datetime.datetime.strptime(str(i), "%Y%m%d") for i in Trades.index]
factorData = {}
for date in dateList:
    stockList = get_stock_A(date)
    # 21 closes give N = 20 daily returns
    df_data = get_price(stockList, count = 21, end_date=date, frequency='1d', fields=['money','close'])
    Amount = df_data["money"]
    Amount = Amount.iloc[1:]
    Pchg = df_data["close"].pct_change()
    Pchg = Pchg.iloc[1:]  
    trade = Trades.loc[Pchg.index,Pchg.columns]
    # D: average amount per trade for each stock and day
    SingleAmount = Amount / trade
    result = pd.DataFrame(index = SingleAmount.columns)
    M_high = []
    M_low = []
    for i in SingleAmount.columns:
        # Rank the 20 days by D, largest first (DataFrame.sort is the legacy pandas call; newer pandas uses sort_values)
        temp = SingleAmount.sort([i], ascending = False)
        # Compounded (rather than summed) return over the 10 high-D days and the 10 low-D days
        M_high.append((1+Pchg.loc[temp.index[:10], i]).cumprod()[-1] - 1)
        M_low.append((1+Pchg.loc[temp.index[10:], i]).cumprod()[-1] - 1)
    result["M_high"] = M_high
    result["M_low"] = M_low
    # The stored factor is M = M_high - M_low with its sign flipped
    result["reverse"] = -1 *(result["M_high"] - result["M_low"])
    # Traditional 20-day reversal factor Ret20
    result["ret20"] = df_data["close"].iloc[-1] / df_data["close"].iloc[0] - 1
    factorData[date] = result
# Save the factor data to a file in the research environment
content = pickle.dumps(factorData) 
write_file('factorData.pkl', content, append=False)
factorData['2017-12-31'].head()
               M_high      M_low    reverse      ret20
600000.XSHG  0.004776  -0.029655  -0.034431  -0.025020
600004.XSHG  0.023991   0.021090  -0.002901   0.045586
600006.XSHG  0.016217  -0.034556  -0.050773  -0.018900
600007.XSHG  0.003648  -0.019194  -0.022842  -0.015616
600008.XSHG  0.038771  -0.081927  -0.120698  -0.046332
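
The W cut for a single stock can be illustrated on toy numbers. This is a minimal sketch, assuming each day is given as a hypothetical `(amount, n_trades, pct_change)` tuple; `ideal_reversal` is not a name from the post. It sums the daily returns, as the step list describes, whereas the cell above compounds them.

```python
def ideal_reversal(days):
    """Split days into high-D and low-D halves and return (M_high, M_low, M).

    `days` is a list of (amount, n_trades, pct_change) tuples with an even
    length N. D = amount / n_trades is the average amount per trade.
    """
    ranked = sorted(days, key=lambda d: d[0] / d[1], reverse=True)
    half = len(ranked) // 2
    m_high = sum(d[2] for d in ranked[:half])   # returns on the high-D days
    m_low = sum(d[2] for d in ranked[half:])    # returns on the low-D days
    return m_high, m_low, m_high - m_low


# Toy data (made up): D is 100, 30, 60 and 20 for the four days.
toy_days = [
    (1000.0, 10, 0.02),
    (900.0, 30, -0.01),
    (1200.0, 20, 0.03),
    (500.0, 25, -0.02),
]
m_high, m_low, m = ideal_reversal(toy_days)
print(m_high, m_low, m)
```

Here the two days with the largest D carry the positive returns, so M_high is about 0.05, M_low about -0.03 and M about 0.08.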

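2 Single-Factor Effectiveness Analysis

2.1 Significance Test of Factor Returns

The analysis relies mainly on the t-test. Following the APT model, a multiple linear regression is run on historical data to obtain the t value of the factor return, and two statistics are then examined:
(1) Mean of the |t| series: the absolute value is taken because a t value significantly different from 0 is enough to conclude that the factor and returns are clearly correlated in that period. The correlation is sometimes positive and sometimes negative, so without the absolute value positive and negative values would cancel out and the factor's effectiveness would be underestimated;
(2) Share of the |t| series above 2: checking the share of |t| > 2 guards the stability of the mean of |t|, so that a few very large values cannot pull the mean up.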

def factor_t_test(factorData, begin_date, end_date):
    dateList = get_period_date('M', begin_date, end_date)
    WLS_params = {}
    WLS_t_test = {}
    for date in dateList[:-1]:
        R_T = pd.DataFrame()
        # Stock universe for this date
        stockList = list(factorData[date].index)
        # Cross-sectional return over the following period
        df_close = get_price(stockList, date, dateList[dateList.index(date)+1], 'daily', ['close'])
        if df_close.empty:
            continue
        df_pchg=df_close['close'].iloc[-1,:]/df_close['close'].iloc[0,:]-1
        R_T['pchg'] = df_pchg
        # Factor values
        factor_data = -1*factorData[date]["reverse"]
        #factor_data = winsorize_med(factor_data, scale=1, inclusive=True, inf2nan=True, axis=0)
        # Industry and market-cap neutralisation
        #factor_data = neutralize(factor_data, how=['sw_l1', 'market_cap'], date=dateList[0], axis=0)
        # Standardise
        factor_data = standardlize(factor_data, inf2nan=True, axis=0)
        R_T['factor'] = factor_data
        R_T = R_T.dropna()
        X = R_T['factor']
        y = R_T['pchg']   
        # Regression of returns on the factor (the names say WLS, but sm.OLS is called, with no intercept added)
        wls = sm.OLS(y, X)
        result = wls.fit()
        WLS_params[date] = result.params[-1]
        WLS_t_test[date] = result.tvalues[-1]  
    t_test = pd.Series(WLS_t_test).dropna()
    print 'Mean of |t|: ',np.sum(np.abs(t_test.values))/len(t_test)
    # |t| values above 2 (collected here but not printed)
    n = [x for x in t_test.values if np.abs(x)>2]
    print '|mean(t)| / std(t): ',np.abs(t_test.mean())/t_test.std()
    return WLS_t_test
WLS_t_test = factor_t_test(factorData, begin_date, end_date)
Mean of |t|:  3.96367285165
|mean(t)| / std(t):  0.592209956948

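From the results above, the mean of the |t| series is 3.96, which is above the threshold of 2. The second printed figure, 0.592, is the absolute value of the mean of the t series divided by its standard deviation; the share of |t| above 2 is computed in the function but not printed. By the criteria of the factor return significance test, the factor is an effective factor.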
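
The quantities in this subsection can be reproduced without statsmodels. The sketch below is an illustration under assumptions, not the post's code: `t_value_no_intercept` and `summarise_t` are hypothetical helpers. The first computes the t statistic of a one-variable least-squares fit without a constant, which is the regression form used above; the second computes the mean of |t| and the share of |t| above 2.

```python
import math


def t_value_no_intercept(x, y):
    """t statistic of beta in y = beta * x + e, fitted by least squares without a constant."""
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    beta = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    sse = sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y))
    # One parameter is estimated, so the residual degrees of freedom are n - 1.
    std_err = math.sqrt(sse / (n - 1) / sxx)
    return beta / std_err


def summarise_t(t_values):
    """Return (mean of |t|, share of |t| > 2) for a series of per-period t values."""
    abs_t = [abs(t) for t in t_values]
    mean_abs = sum(abs_t) / len(abs_t)
    share_above_2 = sum(1 for t in abs_t if t > 2) / float(len(abs_t))
    return mean_abs, share_above_2


# Toy inputs, made up for illustration.
print(t_value_no_intercept([1.0, 2.0, 3.0], [2.0, 4.0, 7.0]))
print(summarise_t([3.0, -2.5, 1.0, -0.5]))
```

In the toy t series two of the four values exceed 2 in absolute value, so the share is 0.5 while the mean of |t| is 1.75; the two statistics answer different questions, which is why both are worth reporting.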

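2.2 Factor IC Analysis

The IC of factor k usually means the correlation coefficient between stocks' exposures to factor k in period T and their returns in period T + 1. Once the IC series is available, it can be analysed in the same way as the t-test in the previous subsection:
(1) Mean and mean absolute value of the IC series: judges factor effectiveness;
(2) Standard deviation of the IC series: judges factor stability;
(3) Ratio of the mean of the IC series to its standard deviation (IR): judges factor effectiveness;
(4) Share of the IC series above zero (or below zero): judges the consistency of the factor's effect.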

import scipy.stats as st
def factor_IC_analysis(factorData, begin_date, end_date, rule='normal'):  
    dateList = get_period_date('M', begin_date, end_date)
    IC = {}
    # NOTE: R_T is created once, outside the loop, so later months are aligned to the index
    # that survives the earlier dropna() calls; creating it inside the loop (as factor_t_test
    # does) would use each month's full universe
    R_T = pd.DataFrame()
    for date in dateList[:-1]:
        # Stock universe for this date
        stockList = list(factorData[date].index)
        # Cross-sectional return over the following period
        df_close=get_price(stockList, date, dateList[dateList.index(date)+1], 'daily', ['close'])
        if df_close.empty:
            continue
        df_pchg=df_close['close'].iloc[-1,:]/df_close['close'].iloc[0,:]-1
        R_T['pchg']=df_pchg
        # Factor values
        factor_data = factorData[date]["reverse"]
        #factor_data = winsorize_med(factor_data, scale=1, inclusive=True, inf2nan=True, axis=0)
        # Industry and market-cap neutralisation
        #factor_data = neutralize(factor_data, how=['sw_l1', 'market_cap'], date=dateList[0], axis=0)
        # Standardise
        factor_data = standardlize(factor_data, inf2nan=True, axis=0)
        R_T['factor'] = factor_data
        R_T = R_T.dropna()
        # 'normal' gives the Pearson IC; 'rank' gives the rank IC (Pearson correlation of ranks)
        if rule=='normal':
            IC[date]=st.pearsonr(R_T.pchg, R_T['factor'])[0]
        elif rule=='rank':
            IC[date]=st.pearsonr(R_T.pchg.rank(), R_T['factor'].rank())[0]
    IC = pd.Series(IC).dropna()
    print 'Mean of the IC series',IC.mean()
    print 'Mean of the absolute IC series',np.mean(np.abs(IC))
    print 'Standard deviation of the IC series',IC.std()
    print 'IR (mean of the IC series divided by its standard deviation)',IC.mean()/IC.std()
    n = [x for x in IC.values if x>0]
    print 'Share of the IC series above zero',len(n)/float(len(IC))
factor_IC_analysis(factorData, begin_date, end_date)
Mean of the IC series 0.0492137327837
Mean of the absolute IC series 0.0851283558956
Standard deviation of the IC series 0.0917614290168
IR (mean of the IC series divided by its standard deviation) 0.536322650061
Share of the IC series above zero 0.65

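As shown above, the mean of the IC series is 0.0492, the IR is 0.54, and 65% of the IC values are above 0. These figures indicate that the factor predicts returns fairly stably and meets the screening criteria of the IC analysis, so it is judged to be an effective factor.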
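
The IC statistics can also be sketched with the standard library only. This is an illustration under assumptions: `pearson`, `average_ranks`, `rank_ic` and `summarise_ic` are hypothetical helpers, not functions from the post or from jqfactor. The standard deviation uses the sample form (n - 1), matching the default of pandas `Series.std()`.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = float(len(x))
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_ic(factor, next_return):
    """Rank IC: Pearson correlation of the ranks of factor values and next-period returns."""
    return pearson(average_ranks(factor), average_ranks(next_return))


def summarise_ic(ic_series):
    """Mean, mean absolute value, sample std, IR and share of positive values of an IC series."""
    n = len(ic_series)
    mean = sum(ic_series) / float(n)
    std = math.sqrt(sum((v - mean) ** 2 for v in ic_series) / (n - 1))
    return {
        "mean": mean,
        "abs_mean": sum(abs(v) for v in ic_series) / float(n),
        "std": std,
        "ir": mean / std,
        "positive_share": sum(1 for v in ic_series if v > 0) / float(n),
    }


# Toy IC series, made up for illustration.
print(summarise_ic([0.10, -0.05, 0.20, 0.15]))
```

A rank IC is less sensitive to a few extreme returns than the Pearson IC, which is one reason both options are offered through the `rule` argument above.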

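2.3 Layered Backtest

Strategy steps:
(1) On the last trading day of each month, compute the reversal factor (M) for all A-shares;
(2) Sort the stocks by the reversal factor (M) in ascending order and split them into 5 equal groups;
(3) On each rebalancing day, rebalance each group's portfolio, which yields the return curves of the 5 portfolios.
Evaluation: annualised backtest return, Sharpe ratio, maximum drawdown, win rate, etc.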
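
The grouping step and one of the evaluation metrics can be sketched outside the backtest engine. The code below is a minimal illustration with made-up data; `split_into_groups` and `max_drawdown` are hypothetical helpers, and the drawdown treats its input as cumulative returns measured from the start (0.10 meaning +10%), starting from a peak of 0.

```python
def split_into_groups(factor_by_stock, n_groups=5):
    """Sort stocks by factor value (ascending) and cut them into n_groups nearly equal groups."""
    ranked = sorted(factor_by_stock, key=factor_by_stock.get)
    n = len(ranked)
    bounds = [n * g // n_groups for g in range(n_groups + 1)]
    return [ranked[bounds[g]:bounds[g + 1]] for g in range(n_groups)]


def max_drawdown(cumulative_returns):
    """Largest peak-to-trough loss of a cumulative-return series."""
    peak = 0.0
    worst = 0.0
    for r in cumulative_returns:
        peak = max(peak, r)
        worst = max(worst, 1 - (1 + r) / (1 + peak))
    return worst


# Toy factor values for ten made-up stocks.
factors = {"A": 0.05, "B": -0.02, "C": 0.01, "D": -0.07, "E": 0.03,
           "F": 0.00, "G": -0.01, "H": 0.08, "I": 0.02, "J": -0.04}
groups = split_into_groups(factors)
print(groups)

# Toy cumulative returns: the series peaks at +20% and ends at -4%.
print(max_drawdown([0.0, 0.10, 0.05, 0.20, -0.04]))
```

Group 1 holds the stocks with the smallest factor values and group 5 the largest; when the number of stocks is not a multiple of 5, the integer boundaries keep group sizes within one stock of each other.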

#1 Import the required packages
import datetime
import numpy as np 
import pandas as pd
import time
from jqdata import *
from pandas import Series, DataFrame
import matplotlib.pyplot as plt
import seaborn as sns
import itertools
import copy
import pickle

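# Class for multi-parameter backtest analysis
class parameter_analysis(object):
    
    # Initialise the attributes used by the methods below
    def __init__(self, algorithm_id=None):
        self.algorithm_id = algorithm_id            # id of the strategy to backtest
        
        self.params_df = pd.DataFrame()             # all candidate parameter values; column names are the variable names, matching g.XXXX in the strategy
        self.results = {}                           # backtest returns; key is the row number in params_df, value is the list returned by get_results()
        self.evaluations = {}                       # backtest metrics; key is the row number in params_df, value is a dataframe
        self.backtest_ids = {}                      # backtest ids
        
        # backtest id of an added benchmark; may be left as '' to use the benchmark set in the strategy
        self.benchmark_id = 'ae0684d86e9e7128b1ab9c7d77893029'                      
        
        self.benchmark_returns = []                 # returns of the added benchmark
        self.returns = {}                           # all returns
        self.excess_returns = {}                    # excess returns
        self.log_returns = {}                       # log of returns
        self.log_excess_returns = {}                # log of excess returns
        self.dates = []                             # all dates covered by the backtests
        self.excess_max_drawdown = {}               # maximum drawdown of excess returns
        self.excess_annual_return = {}              # annualised excess return
        self.evaluations_df = pd.DataFrame()        # backtest metrics other than daily returns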
    
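    # Queue and run the backtests for all parameter combinations
    def run_backtest(self,                          #
                     algorithm_id=None,             # strategy id
                     running_max=10,                # maximum number of backtests running at the same time
                     start_date='2006-01-01',       # backtest start date
                     end_date='2016-11-30',         # backtest end date
                     frequency='day',               # backtest frequency
                     initial_cash='1000000',        # initial capital
                     param_names=[],                # names of the variables to tune
                     param_values=[]                # candidate values for each variable
                     ):
        # If no strategy id is given here, fall back to the id passed to the class
        if algorithm_id is None: algorithm_id=self.algorithm_id
        
        # Build every parameter combination and load them into a df
        # A list of tuples, one tuple per combination of candidate values
        param_combinations = list(itertools.product(*param_values))
        # One column per tuned variable, one row per combination
        to_run_df = pd.DataFrame(param_combinations)
        # Rename the columns to the names of the tuned variables
        to_run_df.columns = param_names
        
        # Record the start time
        start = time.time()
        # Finished backtests
        finished_backtests = {}
        # Running backtests
        running_backtests = {}
        # Pointer to the next combination to launch
        pointer = 0
        # Total number of backtests, equal to the number of combinations
        total_backtest_num = len(param_combinations)
        # Returns of each backtest
        all_results = {}
        # Metrics of each backtest
        all_evaluations = {}
        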
        # Printed once when the run starts
        print '[done|running|pending]:', 
        # Loop until every backtest has finished
        while len(finished_backtests)<total_backtest_num:
            # Show the number of finished, running and pending backtests
            print('[%s|%s|%s].' % (len(finished_backtests), 
                                   len(running_backtests), 
                                   (total_backtest_num-len(finished_backtests)-len(running_backtests)) )),
            # Number of free slots that can be filled now
            to_run = min(running_max-len(running_backtests), total_backtest_num-len(running_backtests)-len(finished_backtests))
            # Launch backtests in the free slots
            for i in range(pointer, pointer+to_run):
                # Row i of the parameter-combination df as a dict: key is the column name, value is the candidate value
                params = to_run_df.iloc[i].to_dict()
                # Create the backtest and keep its id; extras passes params to the strategy
                backtest = create_backtest(algorithm_id = algorithm_id,
                                           start_date = start_date, 
                                           end_date = end_date, 
                                           frequency = frequency, 
                                           initial_cash = initial_cash, 
                                           extras = params, 
                                           # Name the backtest after the parameter values it uses
                                           name = str(params)
                                           )
                # Store the backtest id of run i
                running_backtests[i] = backtest
            # Advance the pointer past the launched backtests
            pointer = pointer+to_run
            
            # Poll the running backtests
            failed = []
            finished = []
            # Keys are the row numbers of the parameter combinations in to_run_df
            for key in running_backtests.keys():
                # running_backtests[key] holds the backtest id; fetch its handle
                bt = get_backtest(running_backtests[key])
                # 'done' or 'failed' is reported only after the run has ended; anything else means it is still running
                status = bt.get_status()
                # The backtest failed
                if status == 'failed':
                    # Record its key so it can be dropped from the running set
                    failed.append(key)
                # The backtest finished successfully
                elif status == 'done':
                    # finished only records successful runs
                    finished.append(key)
                    # all_results[key]: list of dicts, each holding the time, the strategy return and the benchmark return
                    all_results[key] = bt.get_results()
                    # all_evaluations[key]: risk and performance metrics of the backtest
                    all_evaluations[key] = bt.get_risk()
            # Drop failed runs from the running set
            for key in failed:
                running_backtests.pop(key)
            # Move successful runs from the running set to finished_backtests
            for key in finished:
                finished_backtests[key] = running_backtests.pop(key)
            # Report timing each time a full batch of running_max backtests has finished
            if len(finished_backtests) != 0 and len(finished_backtests) % running_max == 0 and to_run !=0:
                # Current time
                middle = time.time()
                # Estimated time left, assuming every backtest takes about the same time
                remain_time = (middle - start) * (total_backtest_num - len(finished_backtests)) / len(finished_backtests)
                # Message: "elapsed %s h, %s h left, do not close the browser"
                print('[已用%s时,尚余%s时,请不要关闭浏览器].' % (str(round((middle - start) / 60.0 / 60.0,3)), 
                                          str(round(remain_time / 60.0 / 60.0,3)))),
            # Poll again in 5 seconds
            time.sleep(5)
        # End time
        end = time.time()
        print('')
        # Message: "backtests complete, total time: %s s (%s h)"
        print('【回测完成】总用时:%s秒(即%s小时)。' % (str(int(end-start)), 
                                           str(round((end-start)/60.0/60.0,2)))),
        # Store the results on the instance
        self.params_df = to_run_df
        self.results = all_results
        self.evaluations = all_evaluations
        self.backtest_ids = finished_backtests

        
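    # Maximum drawdown of a cumulative-return series
    def find_max_drawdown(self, returns):
        # Largest drawdown seen so far
        result = 0
        # Highest cumulative return seen so far (running peak)
        historical_return = 0
        # Walk through all dates
        for i in range(len(returns)):
            # Update the running peak
            historical_return = max(historical_return, returns[i])
            # Drawdown from the peak, measured on net value (1 + return)
            drawdown = 1-(returns[i] + 1) / (historical_return + 1)
            # Keep the largest drawdown
            result = max(drawdown, result)
        # Return the maximum drawdown
        return result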

    # Log returns, excess returns over the (new) benchmark, and max drawdown of the excess returns
    def organize_backtest_results(self, benchmark_id=None):
        # No backtest id was given for a new benchmark
        if benchmark_id is None:
            # Use the default benchmark returns, i.e. the benchmark set in the strategy
            self.benchmark_returns = [x['benchmark_returns'] for x in self.results[0]]
        # A new benchmark was given
        else:
            # Use the returns of that backtest as the benchmark
            self.benchmark_returns = [x['returns'] for x in get_backtest(benchmark_id).get_results()]
        # Dates are taken from the first backtest's results
        self.dates = [x['time'] for x in self.results[0]]
        
        # For each backtest (key = its position among the candidates), convert
        # {key: [{u'benchmark_returns': 0.022480100091729405,
        #         u'returns': 0.03184566700000002,
        #         u'time': u'2006-02-14'}, ...]} into
        # {key: [...]}, where the list holds one return per date
        for key in self.results.keys():
            self.returns[key] = [x['returns'] for x in self.results[key]]
        # Excess return over the benchmark (or the new benchmark)
        for key in self.results.keys():
            self.excess_returns[key] = [(x+1)/(y+1)-1 for (x,y) in zip(self.returns[key], self.benchmark_returns)]
        # Log returns (np.log, so that no bare `log` has to be in scope)
        for key in self.results.keys():
            self.log_returns[key] = [np.log(x+1) for x in self.returns[key]]
        # Log excess returns
        for key in self.results.keys():
            self.log_excess_returns[key] = [np.log(x+1) for x in self.excess_returns[key]]
        # Max drawdown of the excess returns
        for key in self.results.keys():
            self.excess_max_drawdown[key] = self.find_max_drawdown(self.excess_returns[key])
        # Annualized excess return (252 trading days per year)
        for key in self.results.keys():
            self.excess_annual_return[key] = (self.excess_returns[key][-1]+1)**(252./float(len(self.dates)))-1
        # Join the parameter combinations with the corresponding evaluation metrics
        self.evaluations_df = pd.concat([self.params_df, pd.DataFrame(self.evaluations).T], axis=1)

    # Build the final analysis data: runs the queued backtests, then organizes the results
    def get_backtest_data(self,
                          algorithm_id=None,                         # id of the strategy to backtest
                          benchmark_id=None,                         # backtest id of a new benchmark
                          file_name='results.pkl',                   # pickle file the results are saved to
                          running_max=10,                            # max number of backtests running at once
                          start_date='2006-01-01',                   # backtest start date
                          end_date='2016-11-30',                     # backtest end date
                          frequency='day',                           # backtest frequency
                          initial_cash='1000000',                    # initial cash
                          param_names=[],                            # names of the parameters to vary
                          param_values=[]                            # candidate values for each parameter
                          ):
        # Run the queued backtests with these settings
        self.run_backtest(algorithm_id=algorithm_id,
                          running_max=running_max,
                          start_date=start_date,
                          end_date=end_date,
                          frequency=frequency,
                          initial_cash=initial_cash,
                          param_names=param_names,
                          param_values=param_values
                          )
        # Add log returns, excess returns and related metrics
        self.organize_backtest_results(benchmark_id)
        # Collect everything in one dict
        results = {'returns':self.returns,
                   'excess_returns':self.excess_returns,
                   'log_returns':self.log_returns,
                   'log_excess_returns':self.log_excess_returns,
                   'dates':self.dates,
                   'benchmark_returns':self.benchmark_returns,
                   'evaluations':self.evaluations,
                   'params_df':self.params_df,
                   'backtest_ids':self.backtest_ids,
                   'excess_max_drawdown':self.excess_max_drawdown,
                   'excess_annual_return':self.excess_annual_return,
                   'evaluations_df':self.evaluations_df}
        # Save to a pickle file; the with-block closes the file even if dump raises
        with open(file_name, 'wb') as pickle_file:
            pickle.dump(results, pickle_file)

    # Load a saved pickle file and restore the instance attributes from it
    def read_backtest_data(self, file_name='results.pkl'):
        # The with-block closes the file after loading
        with open(file_name, 'rb') as pickle_file:
            results = pickle.load(pickle_file)
        self.returns = results['returns']
        self.excess_returns = results['excess_returns']
        self.log_returns = results['log_returns']
        self.log_excess_returns = results['log_excess_returns']
        self.dates = results['dates']
        self.benchmark_returns = results['benchmark_returns']
        self.evaluations = results['evaluations']
        self.params_df = results['params_df']
        self.backtest_ids = results['backtest_ids']
        self.excess_max_drawdown = results['excess_max_drawdown']
        self.excess_annual_return = results['excess_annual_return']
        self.evaluations_df = results['evaluations_df']
        
    # Line chart of cumulative returns
    def plot_returns(self):
        # figsize sets the width and height of the figure in inches
        fig = plt.figure(figsize=(20,8))
        ax = fig.add_subplot(111)
        # One line per backtest
        for key in self.returns.keys():
            ax.plot(range(len(self.returns[key])), self.returns[key], label=key)
        # Benchmark as a dashed black line
        ax.plot(range(len(self.benchmark_returns)), self.benchmark_returns, label='benchmark', c='k', linestyle='--') 
        # 11 evenly spaced date ticks on the x axis
        ticks = [int(x) for x in np.linspace(0, len(self.dates)-1, 11)]
        plt.xticks(ticks, [self.dates[i] for i in ticks])
        # Legend
        ax.legend(loc = 2, fontsize = 10)
        # y-axis label
        ax.set_ylabel('returns',fontsize=20)
        # Show y ticks as percentages
        ax.set_yticklabels([str(x*100)+'% 'for x in ax.get_yticks()])
        # Title
        ax.set_title("Strategy's performances with different parameters", fontsize=21)
        plt.xlim(0, len(self.returns[0]))
    
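    # Net value of the long-short portfolio (long group 1, short group 5)
    def plot_long_short(self):
        # figsize sets the width and height of the figure in inches
        fig = plt.figure(figsize=(20,8))
        ax = fig.add_subplot(111)
        # Net values of group 1 (key 0) and group 5 (key 4), starting from 1
        a1 = [i+1 for i in self.returns[0]]
        a2 = [i+1 for i in self.returns[4]]
        a1.insert(0,1)
        a2.insert(0,1)
        # Daily long-short return: (group 1 daily return - group 5 daily return) / 2
        b = []
        for i in range(len(a1)-1):
            b.append((a1[i+1]/a1[i]-a2[i+1]/a2[i])/2)
        # Compound the daily returns into a net-value series
        c = []
        c.append(1)
        for i in range(len(b)):
            c.append(c[i]*(1+b[i]))
        # Labelled so that the legend has an entry
        ax.plot(range(len(c)), c, label='long_short')
        ticks = [int(x) for x in np.linspace(0, len(self.dates)-1, 11)]
        plt.xticks(ticks, [self.dates[i] for i in ticks])
        # Legend
        ax.legend(loc = 2, fontsize = 10)
        # Title
        ax.set_title("Strategy's long_short performances",fontsize=20)
        plt.xlim(0, len(c))
        return c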
        
    # Return of each backtest in each calendar year
    def get_profit_year(self):
        profit_year = {}
        for key in self.returns.keys():
            # Cumulative return on the last trading day of each year
            temp = []
            date_year = []
            for i in range(len(self.dates)-1):
                # A change of year between consecutive dates marks a year end
                if self.dates[i][:4] != self.dates[i+1][:4]:
                    temp.append(self.returns[key][i])
                    date_year.append(self.dates[i][:4])
            # The final date closes the last (possibly partial) year
            temp.append(self.returns[key][-1])
            date_year.append(self.dates[-1][:4])
            # Convert year-end cumulative returns into per-year returns
            temp1 = []
            temp1.append(temp[0])
            for i in range(len(temp)-1):
                temp1.append((temp[i+1]+1)/(temp[i]+1)-1)
            profit_year[key] = temp1
        # Rows: backtests; columns: years
        result = pd.DataFrame(index = list(self.returns.keys()), columns = date_year)
        for key in self.returns.keys():
            result.loc[key,:] = profit_year[key]
        return result
            
    # Line chart of excess returns
    def plot_excess_returns(self):
        
        # figsize sets the width and height of the figure in inches
        fig = plt.figure(figsize=(20,8))
        ax = fig.add_subplot(111)
        # One line per backtest
        for key in self.returns.keys():
            ax.plot(range(len(self.excess_returns[key])), self.excess_returns[key], label=key)
        # The benchmark is the zero line
        ax.plot(range(len(self.benchmark_returns)), [0]*len(self.benchmark_returns), label='benchmark', c='k', linestyle='--')
        ticks = [int(x) for x in np.linspace(0, len(self.dates)-1, 11)]
        plt.xticks(ticks, [self.dates[i] for i in ticks])
        # Legend
        ax.legend(loc = 2, fontsize = 10)
        # y-axis label
        ax.set_ylabel('excess returns',fontsize=20)
        # Show y ticks as percentages
        ax.set_yticklabels([str(x*100)+'% 'for x in ax.get_yticks()])
        # Title
        ax.set_title("Strategy's performances with different parameters", fontsize=21)
        plt.xlim(0, len(self.excess_returns[0]))
        
    # Line chart of log returns
    def plot_log_returns(self):
        # figsize sets the width and height of the figure in inches
        fig = plt.figure(figsize=(20,8))
        ax = fig.add_subplot(111)
        # One line per backtest
        for key in self.returns.keys():
            ax.plot(range(len(self.log_returns[key])), self.log_returns[key], label=key)
        # Benchmark log return as a dashed black line (np.log, so that no bare `log` has to be in scope)
        ax.plot(range(len(self.benchmark_returns)), [np.log(x+1) for x in self.benchmark_returns], label='benchmark', c='k', linestyle='--')
        ticks = [int(x) for x in np.linspace(0, len(self.dates)-1, 11)]
        plt.xticks(ticks, [self.dates[i] for i in ticks])
        # Legend
        ax.legend(loc = 2, fontsize = 10)
        # y-axis label
        ax.set_ylabel('log returns',fontsize=20)
        # Title
        ax.set_title("Strategy's performances with different parameters", fontsize=21)
        plt.xlim(0, len(self.log_returns[0]))
    
    # Line chart of log excess returns
    def plot_log_excess_returns(self):
        # figsize sets the width and height of the figure in inches
        fig = plt.figure(figsize=(20,8))
        ax = fig.add_subplot(111)
        # One line per backtest
        for key in self.returns.keys():
            ax.plot(range(len(self.log_excess_returns[key])), self.log_excess_returns[key], label=key)
        # The benchmark is the zero line
        ax.plot(range(len(self.benchmark_returns)), [0]*len(self.benchmark_returns), label='benchmark', c='k', linestyle='--')
        ticks = [int(x) for x in np.linspace(0, len(self.dates)-1, 11)]
        plt.xticks(ticks, [self.dates[i] for i in ticks])
        # Legend
        ax.legend(loc = 2, fontsize = 10)
        # y-axis label
        ax.set_ylabel('log excess returns',fontsize=20)
        # Title
        ax.set_title("Strategy's performances with different parameters", fontsize=21)
        plt.xlim(0, len(self.log_excess_returns[0]))

        
    # Four main backtest metrics as bar charts: total return, max drawdown, Sharpe ratio and volatility
    def get_eval4_bar(self, sort_by=[]): 
        
        # Order the backtests by the requested parameter columns
        # (sort_values replaces DataFrame.sort, which newer pandas no longer provides)
        sorted_params = self.params_df
        for by in sort_by:
            sorted_params = sorted_params.sort_values(by)
        indices = sorted_params.index
        
        fig = plt.figure(figsize=(20,7))

        # Top left: total return
        ax1 = fig.add_subplot(221)
        # x axis: group; y axis: metric value
        ax1.bar(range(len(indices)), 
                [self.evaluations[x]['algorithm_return'] for x in indices], 0.6, label = 'Algorithm_return')
        plt.xticks([x+0.3 for x in range(len(indices))], indices)
        # Legend
        ax1.legend(loc='best',fontsize=15)
        # y-axis label
        ax1.set_ylabel('Algorithm_return', fontsize=15)
        # Show y ticks as percentages
        ax1.set_yticklabels([str(x*100)+'% 'for x in ax1.get_yticks()])
        # Title
        ax1.set_title("Strategy's of Algorithm_return performances of different quantile", fontsize=15)
        # x-axis range
        plt.xlim(0, len(indices))

        # Bottom right: max drawdown
        ax2 = fig.add_subplot(224)
        # x axis: group; y axis: metric value
        ax2.bar(range(len(indices)), 
                [self.evaluations[x]['max_drawdown'] for x in indices], 0.6, label = 'Max_drawdown')
        plt.xticks([x+0.3 for x in range(len(indices))], indices)
        # Legend
        ax2.legend(loc='best',fontsize=15)
        # y-axis label
        ax2.set_ylabel('Max_drawdown', fontsize=15)
        # Show y ticks as percentages
        ax2.set_yticklabels([str(x*100)+'% 'for x in ax2.get_yticks()])
        # Title
        ax2.set_title("Strategy's of Max_drawdown performances of different quantile", fontsize=15)
        # x-axis range
        plt.xlim(0, len(indices))
        # Bottom left: Sharpe ratio
        ax3 = fig.add_subplot(223)
        # x axis: group; y axis: metric value
        ax3.bar(range(len(indices)),
                [self.evaluations[x]['sharpe'] for x in indices], 0.6, label = 'Sharpe')
        plt.xticks([x+0.3 for x in range(len(indices))], indices)
        # Legend
        ax3.legend(loc='best',fontsize=15)
        # y-axis label; the Sharpe ratio is unitless, so its ticks are left as plain numbers
        ax3.set_ylabel('Sharpe', fontsize=15)
        # Title
        ax3.set_title("Strategy's of Sharpe performances of different quantile", fontsize=15)
        # x-axis range
        plt.xlim(0, len(indices))

        # Top right: volatility
        ax4 = fig.add_subplot(222)
        # x axis: group; y axis: metric value
        ax4.bar(range(len(indices)), 
                [self.evaluations[x]['algorithm_volatility'] for x in indices], 0.6, label = 'Algorithm_volatility')
        plt.xticks([x+0.3 for x in range(len(indices))], indices)
        # Legend
        ax4.legend(loc='best',fontsize=15)
        # y-axis label
        ax4.set_ylabel('Algorithm_volatility', fontsize=15)
        # Show y ticks as percentages
        ax4.set_yticklabels([str(x*100)+'% 'for x in ax4.get_yticks()])
        # Title
        ax4.set_title("Strategy's of Algorithm_volatility performances of different quantile", fontsize=15)
        # x-axis range
        plt.xlim(0, len(indices))
        
    # Annualized return and max drawdown, shown as bars in two colours
    def get_eval(self, sort_by=[]):

        # Order the backtests by the requested parameter columns
        # (sort_values replaces DataFrame.sort, which newer pandas no longer provides)
        sorted_params = self.params_df
        for by in sort_by:
            sorted_params = sorted_params.sort_values(by)
        indices = sorted_params.index
        
        # Figure size
        fig = plt.figure(figsize = (20, 8))
        # Single subplot
        ax = fig.add_subplot(111)
        # Green bars: max drawdown, drawn as a negative value
        ax.bar([x+0.3 for x in range(len(indices))],
               [-self.evaluations[x]['max_drawdown'] for x in indices], color = '#32CD32',  
                     width = 0.6, label = 'Max_drawdown', zorder=10)
        # Red bars: annualized return
        ax.bar([x for x in range(len(indices))],
               [self.evaluations[x]['annual_algo_return'] for x in indices], color = 'r', 
                     width = 0.6, label = 'Annual_return')
        plt.xticks([x+0.3 for x in range(len(indices))], indices)
        # Legend
        ax.legend(loc='best',fontsize=15)
        # Zero line
        plt.plot([0, len(indices)], [0, 0], c='k', 
                 linestyle='--', label='zero')
        # Redraw the legend so that it includes the zero line
        ax.legend(loc='best',fontsize=15)
        # y-axis label
        ax.set_ylabel('Max_drawdown', fontsize=15)
        # Show y ticks as percentages
        ax.set_yticklabels([str(x*100)+'% 'for x in ax.get_yticks()])
        # Title
        ax.set_title("Strategy's performances of different quantile", fontsize=15)
        # x-axis range
        plt.xlim(0, len(indices))

    # Annualized excess return and max drawdown of the excess return,
    # measured against the benchmark (or the new benchmark, if one was given)
    def get_excess_eval(self, sort_by=[]):

        # Order the backtests by the requested parameter columns
        # (sort_values replaces DataFrame.sort, which newer pandas no longer provides)
        sorted_params = self.params_df
        for by in sort_by:
            sorted_params = sorted_params.sort_values(by)
        indices = sorted_params.index
        
        # Figure size
        fig = plt.figure(figsize = (20, 8))
        # Single subplot
        ax = fig.add_subplot(111)
        # Green bars: max drawdown of the excess return, drawn as a negative value
        ax.bar([x+0.3 for x in range(len(indices))],
               [-self.excess_max_drawdown[x] for x in indices], color = '#32CD32',  
                     width = 0.6, label = 'Excess_max_drawdown')
        # Red bars: annualized excess return
        ax.bar([x for x in range(len(indices))],
               [self.excess_annual_return[x] for x in indices], color = 'r', 
                     width = 0.6, label = 'Excess_annual_return')
        plt.xticks([x+0.3 for x in range(len(indices))], indices)
        # Legend
        ax.legend(loc='best',fontsize=15)
        # Zero line
        plt.plot([0, len(indices)], [0, 0], c='k', 
                 linestyle='--', label='zero')
        # Redraw the legend so that it includes the zero line
        ax.legend(loc='best',fontsize=15)
        # y-axis label
        ax.set_ylabel('Max_drawdown', fontsize=15)
        # Show y ticks as percentages
        ax.set_yticklabels([str(x*100)+'% 'for x in ax.get_yticks()])
        # Title
        ax.set_title("Strategy's performances of different quantile", fontsize=15)
        # x-axis range
        plt.xlim(0, len(indices))
def group_backtest(start_date,end_date,num):
    warnings.filterwarnings("ignore")
    pa = parameter_analysis()
    pa.get_backtest_data(file_name = 'results.pkl',
                          running_max = 10,
                          algorithm_id = 'df3c8774e33e3f94ad068574276d94a3',
                          start_date=start_date,
                          end_date=end_date,
                          frequency = 'day',
                          initial_cash = '10000000',
                          param_names = ['num'],
                          param_values = [num]                     
                          )
start_date = '2013-01-01' 
end_date = '2018-01-01' 
num = range(1,6)
group_backtest(start_date,end_date,num)
【已完成|运行中|待运行】: [0|0|5]. [0|5|0]. … [1|4|0]. … [2|3|0]. … [3|2|0]. … [4|1|0]. …
【回测完成】总用时:1788秒(即0.5小时)。
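
The three counters in the progress log above are [done|running|pending], and they come from a polling loop that keeps at most running_max backtests in flight. The scheduling logic can be sketched as a small simulation with no JoinQuant calls. The function run_queue and the durations list (the number of polls each job needs) are hypothetical stand-ins for real backtests, and running_max is assumed to be at least 1.

```python
def run_queue(durations, running_max):
    """Simulate a bounded polling queue.

    durations[i] is the (made-up) number of polls job i needs before it is done.
    Returns (snapshots, finished): one (done, running, pending) tuple per poll,
    and the job indices in completion order.
    """
    total = len(durations)
    running = {}     # job index -> polls still needed
    finished = []    # completed job indices, in completion order
    pointer = 0      # index of the next job to submit
    snapshots = []
    while len(finished) < total:
        # Status as it would be printed at the top of each loop pass
        snapshots.append((len(finished), len(running),
                          total - len(finished) - len(running)))
        # Fill the free slots with jobs that have not been submitted yet
        to_run = min(running_max - len(running), total - pointer)
        for i in range(pointer, pointer + to_run):
            running[i] = durations[i]
        pointer += to_run
        # One poll: every running job makes one step of progress
        for i in running:
            running[i] -= 1
        # Move the jobs that just completed from running to finished
        for i in sorted(k for k, left in running.items() if left <= 0):
            finished.append(i)
            del running[i]
    return snapshots, finished


snapshots, finished = run_queue([3, 1, 2], running_max=2)
for done, active, pending in snapshots:
    print("[%d|%d|%d]" % (done, active, pending))
```

In the real loop the "one poll" step is the call to get_status() followed by time.sleep(5), so a long backtest shows up as many identical status entries in a row.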

2.3.1 Performance metrics of the layered backtest

pa = parameter_analysis()
pa.read_backtest_data('results.pkl')
pa.evaluations_df
num __version algorithm_return algorithm_volatility alpha annual_algo_return annual_bm_return avg_excess_return benchmark_return benchmark_volatility ... excess_return_sharpe information max_drawdown max_drawdown_period max_leverage period_label sharpe sortino trading_days treasury_return
0 1 101 1.932663 0.2481609 0.1608663 0.2477985 0.1012095 0.0005597281 0.5976734 0.2445952 ... 0.5400897 0.8510752 0.4187934 [2015-06-12, 2015-09-15] 0 2017-12 0.837354 0.9641156 1215 0.1994521
1 2 101 1.168112 0.233569 0.08594064 0.1726073 0.1012095 0.0002979421 0.5976734 0.2445952 ... 0.1628063 0.4691913 0.480852 [2015-06-12, 2016-01-28] 0 2017-12 0.5677436 0.6160955 1215 0.1994521
2 3 101 1.178165 0.2761494 0.07824558 0.1737241 0.1012095 0.0003107313 0.5976734 0.2445952 ... 0.1554558 0.4362111 0.493236 [2015-06-12, 2015-09-15] 0 2017-12 0.4842454 0.5400217 1215 0.1994521
3 4 101 0.8102447 0.2893782 0.03235476 0.1298801 0.1012095 0.00016554 0.5976734 0.2445952 ... -0.07909306 0.1625373 0.5212983 [2015-06-12, 2017-12-25] 0 2017-12 0.3105974 0.3480259 1215 0.1994521
4 5 101 0.2513894 0.3099486 -0.05167179 0.04722403 0.1012095 -0.0001188837 0.5976734 0.2445952 ... -0.4408885 -0.2673853 0.6343049 [2015-06-12, 2017-12-05] 0 2017-12 0.0233072 0.02718096 1215 0.1994521

5 rows × 24 columns

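2.3.2 Net value of the layered backtest

To compare the five portfolios more directly, the cumulative return curves of the five portfolios and of the HS300 benchmark are plotted below.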

pa.plot_returns()

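The chart shows that portfolio 1 clearly outperforms portfolio 5 and earns the highest return of the five. Measured by the total returns in the metrics table of 2.3.1, portfolios 1 to 4 finish ahead of the benchmark (59.8%), while portfolio 5 (25.1%) finishes behind it. The ordering of the portfolios is consistent with the single-factor effectiveness tests and supports the reversal factor as an effective factor.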
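
How far each portfolio ends above or below the benchmark can be checked with the same excess-return formula that organize_backtest_results uses, (1+r)/(1+b)-1. The sketch below applies it to the total returns reported in the metrics table of 2.3.1; the helper name excess_return is illustrative only.

```python
def excess_return(strategy_return, benchmark_return):
    """Excess return of a strategy over a benchmark, both given as cumulative returns."""
    return (1.0 + strategy_return) / (1.0 + benchmark_return) - 1.0


# algorithm_return per portfolio and benchmark_return, copied from the table in 2.3.1
benchmark = 0.5976734
portfolios = {1: 1.932663, 2: 1.168112, 3: 1.178165, 4: 0.8102447, 5: 0.2513894}

excess = {k: excess_return(v, benchmark) for k, v in portfolios.items()}
for k in sorted(excess):
    print("portfolio %d: excess return %.4f" % (k, excess[k]))
```

A positive value means the portfolio ended the sample above the benchmark, a negative value below it.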

2.3.3 Backtest analysis of the strategy portfolios

pa.get_eval4_bar()
pa.get_profit_year()
        2013       2014       2015         2016        2017
0  0.2506491  0.4726465  0.8775374  0.001742926  -0.1533904
1  0.1832616  0.3692289  0.7419862   -0.1218846  -0.1251602
2  0.1652528  0.4097099  0.7372976   -0.1098825  -0.1425292
3  0.1718918  0.4142254  0.6589003   -0.1714063  -0.2053621
4  0.1407754  0.3323591  0.5827961   -0.2567138  -0.3001741

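In terms of per-portfolio performance, annualized return and Sharpe ratio are roughly monotonic across the five groups, and portfolio 1 is far ahead of portfolio 5. Maximum drawdown also rises from portfolio 1 to portfolio 5, so portfolio 1 carries the least drawdown risk. The yearly returns are likewise roughly monotonic from portfolio 1 to portfolio 5 in each year, which suggests that the factor's stock-selection effect is stable over time.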
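
The yearly figures are derived from the cumulative return series: the cumulative return on the last trading day of each year is divided by that of the previous year end. A standalone sketch of that conversion, mirroring get_profit_year on made-up dates and returns (the function name yearly_returns and the sample data are illustrative):

```python
def yearly_returns(dates, cum_returns):
    """Per-year returns from a cumulative-return series.

    dates are 'YYYY-MM-DD' strings in ascending order; cum_returns[i] is the
    cumulative return since inception as of dates[i].
    """
    # Cumulative return on the last available day of each year
    year_end = []
    for i in range(len(dates) - 1):
        if dates[i][:4] != dates[i + 1][:4]:
            year_end.append((dates[i][:4], cum_returns[i]))
    year_end.append((dates[-1][:4], cum_returns[-1]))

    # Ratio of consecutive year-end net values gives the return of each year
    result = {}
    previous = 0.0
    for year, cum in year_end:
        result[year] = (1.0 + cum) / (1.0 + previous) - 1.0
        previous = cum
    return result


sample_dates = ['2013-12-30', '2013-12-31', '2014-01-02', '2014-12-31', '2015-01-05']
sample_cum = [0.08, 0.10, 0.12, 0.32, 0.45]
by_year = yearly_returns(sample_dates, sample_cum)
for year in sorted(by_year):
    print("%s: %.4f" % (year, by_year[year]))
```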

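2.3.4 Net value of the long-short portfolio

The net value curves of the layered backtest show that every portfolio is volatile, so each one on its own carries considerable risk; a long-short portfolio is therefore built. It buys portfolio 1 and shorts portfolio 5, rebalanced monthly. For ease of calculation, its daily return is taken as (daily return of portfolio 1 - daily return of portfolio 5) / 2, and these daily returns are compounded into a net value curve.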
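
The construction just described can be sketched on two short, made-up cumulative-return series. The function name long_short_nav and the numbers are illustrative; the arithmetic follows plot_long_short.

```python
def long_short_nav(cum_long, cum_short):
    """Net value of a portfolio that is long one leg and short the other.

    cum_long and cum_short hold the cumulative returns of the two legs,
    one value per day. The daily long-short return is half the difference
    of the two legs' daily returns.
    """
    nav_long = [1.0] + [1.0 + r for r in cum_long]
    nav_short = [1.0] + [1.0 + r for r in cum_short]
    nav = [1.0]
    for t in range(1, len(nav_long)):
        daily_long = nav_long[t] / nav_long[t - 1] - 1.0
        daily_short = nav_short[t] / nav_short[t - 1] - 1.0
        nav.append(nav[-1] * (1.0 + (daily_long - daily_short) / 2.0))
    return nav


# Two days: the long leg gains 2% then about 2.9%; the short leg gains 1% then loses about 2%
nav = long_short_nav([0.02, 0.05], [0.01, -0.01])
print([round(v, 5) for v in nav])
```

Halving the difference means the long and short legs together use one unit of capital, which keeps the net value comparable with the single-portfolio curves.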

long_short = pa.plot_long_short()
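def MaxDrawdown(return_list):
    '''Maximum drawdown of a net-value series'''
    # Index where the drawdown from the running peak is largest (end of the drawdown)
    i = np.argmax((np.maximum.accumulate(return_list) - return_list) / np.maximum.accumulate(return_list))
    if i == 0:
        return 0
    # Index of the preceding peak (start of the drawdown)
    j = np.argmax(return_list[:i])
    return (return_list[j] - return_list[i]) / (return_list[j])

def cal_indictor(long_short):
    # Total and annualized return of the net-value series (250 trading days per year)
    total_return = long_short[-1] / long_short[0] - 1
    ann_return = pow((1+total_return), 250/float(len(long_short)))-1
    pchg = []
    # Daily returns
    for i in range(1, len(long_short)):
        pchg.append(long_short[i]/long_short[i-1] - 1)
    # Annualized volatility: sample standard deviation of the daily returns times sqrt(250)
    temp = 0
    for i in pchg:
        temp += pow(i-np.mean(pchg), 2)
    annualVolatility = np.sqrt(250/float((len(pchg)-1))*temp)
    # Sharpe ratio with a 4% annual risk-free rate
    sharpe_ratio = (ann_return - 0.04)/annualVolatility
    print("Total return: %s" % total_return)
    print("Annualized return: %s" % ann_return)
    print("Annualized volatility: %s" % annualVolatility)
    print("Sharpe ratio: %s" % sharpe_ratio)
    print("Maximum drawdown: %s" % MaxDrawdown(long_short))
cal_indictor(long_short)

Total return: 0.46021377323
Annualized return: 0.0809428225227
Annualized volatility: 0.044586380942
Sharpe ratio: 0.918280911292
Maximum drawdown: 0.0812605354762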

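As the chart shows, the long-short net value curve is markedly less volatile than any single portfolio, so it earns a steadier return with better risk control.
In summary, the layered backtest indicates that the reversal factor (M) is effective.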
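
Two of the reported figures are easy to check by hand. The sketch below computes a maximum drawdown on a made-up net-value series with a single pass over the data, and recomputes the Sharpe ratio from the annualized return and volatility printed above, using the same 4% risk-free rate. The function names and the toy series are illustrative.

```python
def max_drawdown(nav):
    """Largest peak-to-trough decline of a net-value series, as a fraction of the peak."""
    peak = nav[0]
    worst = 0.0
    for value in nav:
        peak = max(peak, value)                  # running peak
        worst = max(worst, 1.0 - value / peak)   # decline from that peak
    return worst


def sharpe_ratio(annual_return, annual_volatility, risk_free=0.04):
    """Excess annualized return per unit of annualized volatility."""
    return (annual_return - risk_free) / annual_volatility


# Toy series: falls from 1.2 to 0.9 (25%), later from 1.3 to 1.04 (20%)
toy_nav = [1.0, 1.2, 0.9, 1.1, 1.3, 1.04]
print("max drawdown: %.4f" % max_drawdown(toy_nav))

# Annualized return and volatility as printed by cal_indictor above
print("Sharpe ratio: %.4f" % sharpe_ratio(0.0809428225227, 0.044586380942))
```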

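3 Further analysis

3.1 Sensitivity to the parameter N

The reversal factor above is computed from the past N (N=20) days of data, but how the choice of N affects the factor's effectiveness is not yet clear. The factor is therefore recomputed for other values of N and its effectiveness is compared, as follows.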

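def GetData(N):
    # For each month-end date, build the ideal reversal factor from the past N trading days
    factordata = {}
    for date in dateList:
        stockList = get_stock_A(date)
        # N+1 daily bars, so that N daily returns are available
        df_data = get_price(stockList, count = N+1, end_date=date, frequency='1d', fields=['money','close'])
        Amount = df_data["money"]
        Amount = Amount.iloc[1:]
        Pchg = df_data["close"].pct_change()
        Pchg = Pchg.iloc[1:]
        # Daily number of trades, aligned to the same dates and stocks
        trade = Trades.loc[Pchg.index,Pchg.columns]
        # Average amount per trade
        SingleAmount = Amount / trade
        result = pd.DataFrame(index = SingleAmount.columns)
        M_high = []
        M_low = []
        for i in SingleAmount.columns:
            # Days ordered by amount per trade, largest first
            # (sort_values replaces DataFrame.sort, which newer pandas no longer provides)
            temp = SingleAmount.sort_values(by=[i], ascending = False)
            # NOTE: the high group is fixed at 10 days whatever N is, whereas section 3.4 describes an N/2 split
            M_high.append((1+Pchg.loc[temp.index[:10], i]).cumprod().iloc[-1] - 1)
            M_low.append((1+Pchg.loc[temp.index[10:], i]).cumprod().iloc[-1] - 1)
        result["M_high"] = M_high
        result["M_low"] = M_low
        # Ideal reversal factor: M = -(M_high - M_low)
        result["reverse"] = -1 *(result["M_high"] - result["M_low"])
        # Plain N-day return
        result["ret"] = df_data["close"].iloc[-1] / df_data["close"].iloc[0] - 1
        factordata[date] = result
    return factordata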

begin_date = '2013-01-01'
end_date = '2018-01-01'
dateList = get_period_date('M',begin_date, end_date)
Trades = pd.read_csv("trades.csv", index_col = 0)
Trades.columns = [normalize_code(code) for code in Trades.columns]
Trades.index = [datetime.datetime.strptime(str(i), "%Y%m%d") for i in Trades.index]
factorData_40 = GetData(40)
factorData_60 = GetData(60)
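import scipy.stats as st
def factor_IC(factorData, begin_date, end_date, rule='normal'):  
    dateList = get_period_date('M', begin_date, end_date)
    IC = {}
    # NOTE: R_T is created once, outside the loop. After the first month, assignments align to the
    # index that survived dropna(), so stocks absent from it are silently excluded in later months.
    # Creating R_T inside the loop would avoid this, but would change the IC values reported below.
    R_T = pd.DataFrame()
    for date in dateList[:-1]:
        # Stock universe on this date
        stockList = list(factorData[date].index)
        # Cross-sectional return from this rebalancing date to the next
        df_close=get_price(stockList, date, dateList[dateList.index(date)+1], 'daily', ['close'])
        if df_close.empty:
            continue
        df_pchg=df_close['close'].iloc[-1,:]/df_close['close'].iloc[0,:]-1
        R_T['pchg']=df_pchg
        # Factor values
        factor_data = factorData[date]["reverse"]
        #factor_data = winsorize_med(factor_data, scale=1, inclusive=True, inf2nan=True, axis=0)
        # Industry and market-cap neutralization (disabled)
        #factor_data = neutralize(factor_data, how=['sw_l1', 'market_cap'], date=dateList[0], axis=0)
        # Standardize the factor
        factor_data = standardlize(factor_data, inf2nan=True, axis=0)
        R_T['factor'] = factor_data
        R_T = R_T.dropna()
        # 'normal': Pearson correlation; 'rank': Pearson correlation of the ranks (rank IC)
        if rule=='normal':
            IC[date]=st.pearsonr(R_T.pchg, R_T['factor'])[0]
        elif rule=='rank':
            IC[date]=st.pearsonr(R_T.pchg.rank(), R_T['factor'].rank())[0]
    IC = pd.Series(IC).dropna()
    # Mean of the monthly IC series
    return IC.mean()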
IC_20 = factor_IC(factorData, begin_date, end_date)
IC_40 = factor_IC(factorData_40, begin_date, end_date)
IC_60 = factor_IC(factorData_60, begin_date, end_date)
print("N = 20 mean IC: %s" % IC_20)
print("N = 40 mean IC: %s" % IC_40)
print("N = 60 mean IC: %s" % IC_60)
N = 20 mean IC: 0.0492137327837
N = 40 mean IC: 0.041864465093
N = 60 mean IC: 0.0305036425806

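For N = 20, 40 and 60 the mean IC stays positive (0.049, 0.042 and 0.031), but it declines steadily as N grows, so the choice of N matters for how well the factor predicts future returns.
Among the values tested, N = 20 gives the most effective factor.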
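
For reference, the IC of one cross-section is the correlation between the factor values on a rebalancing date and the returns over the following period; the rank IC is the same correlation computed on ranks. A dependency-free sketch on four made-up stocks follows. The helper names are illustrative, and ranks assumes there are no ties.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / float(n)
    mean_y = sum(y) / float(n)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


# Made-up standardized factor values and next-period returns for four stocks
factor = [0.5, -1.0, 1.5, 0.0]
next_ret = [0.01, -0.02, 0.04, 0.00]

normal_ic = pearson(factor, next_ret)
rank_ic = pearson(ranks(factor), ranks(next_ret))
print("IC: %.4f, rank IC: %.4f" % (normal_ic, rank_ic))
```

The figures reported above are the mean of such monthly IC values over the sample period.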

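3.2 Tests on another sample space

The tests above use all A shares. To check the factor further, it is analyzed on another sample space, the HS300 constituents: a long-short portfolio is built from five groups, and the ideal reversal factor is compared with the original reversal factor Ret20.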

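begin_date = '2013-01-01'
end_date = '2018-01-01'
dateList = get_period_date('M',begin_date, end_date)
HS300Data = {}
for date in dateList:
    # HS300 constituents on this date
    stockList = get_index_stocks('000300.XSHG',date)
    # 21 daily bars, so that 20 daily returns are available
    df_data = get_price(stockList, count = 21, end_date=date, frequency='1d', fields=['money','close'])
    Amount = df_data["money"]
    Amount = Amount.iloc[1:]
    Pchg = df_data["close"].pct_change()
    Pchg = Pchg.iloc[1:]
    # Daily number of trades, aligned to the same dates and stocks
    trade = Trades.loc[Pchg.index,Pchg.columns]
    # Average amount per trade
    SingleAmount = Amount / trade
    result = pd.DataFrame(index = SingleAmount.columns)
    M_high = []
    M_low = []
    for i in SingleAmount.columns:
        # Days ordered by amount per trade, largest first
        # (sort_values replaces DataFrame.sort, which newer pandas no longer provides)
        temp = SingleAmount.sort_values(by=[i], ascending = False)
        # High group: the 10 days with the largest amount per trade; low group: the other 10
        M_high.append((1+Pchg.loc[temp.index[:10], i]).cumprod().iloc[-1] - 1)
        M_low.append((1+Pchg.loc[temp.index[10:], i]).cumprod().iloc[-1] - 1)
    result["M_high"] = M_high
    result["M_low"] = M_low
    # Ideal reversal factor: M = -(M_high - M_low)
    result["reverse"] = -1 *(result["M_high"] - result["M_low"])
    # Original reversal factor: negated 20-day return
    result["ret20"] = -1*(df_data["close"].iloc[-1] / df_data["close"].iloc[0] - 1)
    HS300Data[date] = result
HS300Data[date].head()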
               M_high      M_low    reverse      ret20
000001.XSHE  0.033571  -0.010335  -0.043905  -0.022889
000002.XSHE  0.080624  -0.064591  -0.145214  -0.010825
000008.XSHE  0.035967  -0.016506  -0.052473  -0.018868
000060.XSHE  0.039267  -0.015695  -0.054961  -0.022956
000063.XSHE  0.083332   0.000093  -0.083239  -0.083433
def GetPchg(field):
    # Monthly long-short return: top quintile minus bottom quintile of the HS300 pool, halved
    pchg = []
    for date in dateList[:-1]:
        # sort_values replaces DataFrame.sort, which newer pandas no longer provides
        tempData = HS300Data[date].sort_values(by=[field], ascending = False)
        # 300 stocks in five groups: 60 stocks per group
        top5 = list(tempData.index[:60])
        last5 = list(tempData.index[-60:])
        # Return of each group from this rebalancing date to the next
        df_close = get_price(top5, date, dateList[dateList.index(date)+1], 'daily', ['close'])['close']
        top5_pchg = df_close.iloc[-1] / df_close.iloc[0] - 1
        df_close = get_price(last5, date, dateList[dateList.index(date)+1], 'daily', ['close'])['close']
        last5_pchg = df_close.iloc[-1] / df_close.iloc[0] - 1
        pchg.append((np.mean(top5_pchg) - np.mean(last5_pchg)) / 2)
    return pchg
pchgReverse = GetPchg('reverse')
pchRet20 = GetPchg('ret20')
# figsize sets the width and height of the figure in inches
fig = plt.figure(figsize=(20,8))
ax = fig.add_subplot(111)
# Compound the monthly long-short returns into net values, starting from 1
netValue1 = []
netValue1.append(1)
netValue2 = []
netValue2.append(1)
for i in range(len(pchgReverse)):
    netValue1.append(netValue1[i]*(1+pchgReverse[i]))
for i in range(len(pchRet20)):
    netValue2.append(netValue2[i]*(1+pchRet20[i]))
# Labels give the legend one entry per curve
ax.plot(range(len(netValue1)), netValue1, label='ideal reversal factor')
ax.plot(range(len(netValue2)), netValue2, label='Ret20')
ticks = [int(x) for x in np.linspace(0, len(dateList)-1, 11)]
plt.xticks(ticks, [dateList[i] for i in ticks])
# Legend
ax.legend(loc = 2, fontsize = 10)
ax.set_title("Strategy's long_short performances",fontsize=20)

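The five-group long-short net values of the original reversal factor Ret20 and of the ideal reversal factor are plotted above. The ideal reversal factor has a total long-short return of 47.74%, annualized volatility of 31.38%, a Sharpe ratio of 0.314 and a maximum drawdown of 9.69%; the original reversal factor Ret20 has a total return of 16.22%, annualized volatility of 53.88%, a Sharpe ratio of -0.016 and a maximum drawdown of 20.88%.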
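
The return of one rebalancing period in this five-group long-short test can be sketched as follows: rank the stocks by factor value, take the top and bottom fifth, and halve the difference of their mean next-period returns, as GetPchg does. The function name, the ten made-up stocks and their values are illustrative; the pool is assumed to hold at least as many stocks as there are groups.

```python
def quintile_long_short(factor, next_return, groups=5):
    """Long-short return of one period: (mean of top group - mean of bottom group) / 2.

    factor and next_return are dicts keyed by stock code. The top group holds
    the stocks with the largest factor values.
    """
    names = sorted(factor, key=lambda s: factor[s], reverse=True)
    size = len(names) // groups
    top = names[:size]
    bottom = names[-size:]
    mean_top = sum(next_return[s] for s in top) / float(size)
    mean_bottom = sum(next_return[s] for s in bottom) / float(size)
    return (mean_top - mean_bottom) / 2.0


# Ten made-up stocks: S0 has the largest factor value, S9 the smallest
codes = ['S%d' % i for i in range(10)]
factor = dict((c, 0.9 - 0.2 * i) for i, c in enumerate(codes))
rets = [0.04, 0.02, 0.01, 0.00, 0.01, -0.01, 0.00, 0.02, -0.01, -0.03]
next_return = dict(zip(codes, rets))

print("long-short return: %.4f" % quintile_long_short(factor, next_return))
```

With 300 constituents and five groups, each leg holds 60 stocks, which is where the hard-coded 60 in GetPchg comes from.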

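3.3 How the factor return accumulates

All backtests in this report rebalance monthly, but a higher rebalancing frequency may also be of interest. To examine this, the cumulative return of the long-short portfolio is computed for each of the first 20 trading days after a rebalancing date.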

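# Split the daily long-short net value into consecutive 20-day holding windows
pchg = []
for i in range(0, len(long_short) - 20, 20):
    tempPchg = []
    for j in range(20):
        # Cumulative return j days after the start of the window
        tempPchg.append(long_short[i+j] / long_short[i] - 1)
    pchg.append(tempPchg)
# Average the cumulative return over all windows, day by day
pchgTotal = []
for i in range(20):
    pchgTotal.append(np.mean(np.array(pchg)[:,i]))
plt.bar(range(len(pchgTotal)), pchgTotal)
plt.show()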

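The chart above shows, for N=20, how the long-short return of the ideal reversal factor accumulates after positions are opened at the start of the month (all stocks, five groups). Because the return accumulates fairly evenly over the holding period, a qualitative reading is that weekly or semi-monthly rebalancing is worth trying.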
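
The averaging behind that chart can be sketched on a short, made-up net-value series with a 3-day window in place of the 20-day one. The function name and data are illustrative; the series is assumed to be longer than one window.

```python
def average_cumulative_path(nav, window):
    """Average cumulative return by day-in-window over consecutive holding windows.

    The net-value series is cut into windows of `window` days; within each
    window the return is measured from the window's first day.
    """
    paths = []
    for start in range(0, len(nav) - window, window):
        paths.append([nav[start + j] / nav[start] - 1.0 for j in range(window)])
    return [sum(p[j] for p in paths) / float(len(paths)) for j in range(window)]


toy_nav = [1.0, 1.01, 1.02, 1.02, 1.03, 1.05, 1.05]
path = average_cumulative_path(toy_nav, 3)
print([round(v, 5) for v in path])
```

A path that rises roughly linearly means the return is earned evenly through the holding period rather than in the first few days, which is the pattern the text uses to motivate shorter rebalancing intervals.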

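3.4 Effect of the split ratio

So far the high-D and low-D groups each take half of the look-back days, i.e. N/2 days. How much does the result change if the split ratio is adjusted?
Taking N=60 as an example, the X trading days with the largest amount per trade form the high-D group and the remaining 60-X days form the low-D group; X is varied and the information ratio (IR) of the M factor is computed for each value.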
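
For a single stock, the split works as sketched below: days are ordered by amount per trade, the returns of the top X days are compounded into M_high, the rest into M_low, and the factor is M = -(M_high - M_low), matching the sign used in the code of this post. The function name and the four-day sample are illustrative.

```python
def ideal_reversal(per_trade_amount, daily_return, x):
    """Ideal reversal factor of one stock for a split at x days.

    per_trade_amount[d] is the average amount per trade on day d and
    daily_return[d] the return on that day. The x days with the largest
    amount per trade form the high-D group, the rest the low-D group.
    """
    days = sorted(range(len(per_trade_amount)),
                  key=lambda d: per_trade_amount[d], reverse=True)

    def compound(day_indices):
        value = 1.0
        for d in day_indices:
            value *= 1.0 + daily_return[d]
        return value - 1.0

    m_high = compound(days[:x])
    m_low = compound(days[x:])
    return -(m_high - m_low)


amount = [5.0, 1.0, 4.0, 2.0]
ret = [0.10, -0.02, 0.05, 0.01]
for x in (1, 2, 3):
    print("X=%d: M=%.5f" % (x, ideal_reversal(amount, ret, x)))
```

Varying X only moves the cut point in the ordered list of days, so every value of X can reuse the same ordering for a given stock and date.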

begin_date = '2013-01-01'
end_date = '2018-01-01'
dateList = get_period_date('M',begin_date, end_date)
Trades = pd.read_csv("trades.csv", index_col = 0)
Trades.columns = [normalize_code(code) for code in Trades.columns]
Trades.index = [datetime.datetime.strptime(str(i), "%Y%m%d") for i in Trades.index]
factorData_60 = {}
for date in dateList:
    stockList = get_stock_A(date)
    df_data = get_price(stockList, count = 61, end_date=date, frequency='1d', fields=['money','close'])
    Amount = df_data["money"]
    Amount = Amount.iloc[1:]
    Pchg = df_data["close"].pct_change()
    Pchg = Pchg.iloc[1:]  
    trade = Trades.loc[Pchg.index,Pchg.columns]
    SingleAmount = Amount / trade
    result = pd.DataFrame(index = SingleAmount.columns)
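    for j in range(10,51,4):
        # j is X, the number of days (out of 60) assigned to the high-D group
        M_high = []
        M_low = []
        for i in SingleAmount.columns:
            # Rank the days by this stock's average amount per trade, largest first.
            # DataFrame.sort is the legacy pandas API; pandas 0.17+ uses sort_values.
            temp = SingleAmount.sort([i], ascending = False)
            # Compound the daily returns within each group
            M_high.append((1+Pchg.loc[temp.index[:j], i]).cumprod().iloc[-1] - 1)
            M_low.append((1+Pchg.loc[temp.index[j:], i]).cumprod().iloc[-1] - 1)
        result["M_high"] = M_high
        result["M_low"] = M_low
        # Factor value for this X: -(M_high - M_low)
        result[j] = -1 *(result["M_high"] - result["M_low"])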
    factorData_60[date] = result
import scipy.stats as st
def factor_IC_analysis(factorData, begin_date, end_date):
    dateList = get_period_date('M', begin_date, end_date)
    result = []
    for date in dateList[:-1]:
        # Stock universe on this rebalancing date
        stockList = list(factorData[date].index)
        # Cross-sectional return from this date to the next rebalancing date
        df_close=get_price(stockList, date, dateList[dateList.index(date)+1], 'daily', ['close'])
        if df_close.empty:
            continue
        df_pchg=df_close['close'].iloc[-1,:]/df_close['close'].iloc[0,:]-1
        IC = []
        for j in range(10,51,4):
            # Factor values for X = j
            factor_data = factorData[date][j]
            # Standardize the factor
            factor_data = standardlize(factor_data, inf2nan=True, axis=0)
            # Build a fresh frame for every date and every X, so that rows dropped
            # for one X (or one date) are not silently excluded from the next
            R_T = pd.DataFrame({'pchg': df_pchg})
            R_T['factor'] = factor_data
            R_T = R_T.dropna()
            IC.append(st.pearsonr(R_T['pchg'], R_T['factor'])[0])
        result.append(IC)
    return result
result = factor_IC_analysis(factorData_60, begin_date, end_date)
xtick = range(10,51,4)
IC = []
for j in range(len(xtick)):
    IC.append(np.mean(np.array(result)[:,j]))
plt.plot(xtick, IC)

The result is shown in the chart above. Any X between 25 and 50 gives good results, and the IC reaches its maximum at X = 42.
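
The chart plots only the mean IC. An information ratio per X can be read off the same matrix, using the common definition IR = mean(IC) / std(IC). The sketch below assumes that definition and a matrix shaped like the `result` returned by `factor_IC_analysis` (one row per month, one column per X). The function name `ic_summary` and the two-month toy matrix are assumptions made for illustration.

```python
import numpy as np

def ic_summary(ic_matrix, x_values):
    """Mean IC, IR and best X from a (months x X-values) matrix of ICs."""
    ic = np.asarray(ic_matrix, dtype=float)
    # Average over months, ignoring any missing ICs
    mean_ic = np.nanmean(ic, axis=0)
    # IR taken here as mean IC divided by the sample standard deviation of IC
    ir = mean_ic / np.nanstd(ic, axis=0, ddof=1)
    best_x = x_values[int(np.nanargmax(mean_ic))]
    return mean_ic, ir, best_x

# Toy matrix: two months, two candidate values of X (assumed numbers)
toy_ic = [[0.1, 0.3],
          [0.3, 0.5]]
mean_ic, ir, best_x = ic_summary(toy_ic, [10, 14])
print(mean_ic, ir, best_x)
```

In the notebook this could be called as `ic_summary(result, list(range(10, 51, 4)))`. Comparing the X with the highest mean IC against the X with the highest IR shows whether the peak is also the most stable choice.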

Summary

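The tests above examined the validity of the ideal reversal factor. The preliminary conclusions are:
(1) Single-factor validity tests of the ideal reversal factor: in the significance test of factor returns, the mean of the absolute t-value series is 3.96; in the IC analysis, the mean of the IC series is 0.0492 and the IR is 0.54; in the layered backtest, portfolio 1 clearly outperforms portfolio 5, every portfolio beats the HS300 index, and portfolio 1 earns markedly higher returns.
(2) Further analysis of the ideal reversal factor: with N = 20, 40 and 60 the IC results remain strong, but the factor's effectiveness declines as N increases, so choosing N sensibly helps obtain a factor that predicts future returns more effectively.
(3) For the HS300 stock pool, the quintile long-short hedged total return of the ideal reversal factor is 47.74%, with annualized volatility of 31.38%, a Sharpe ratio of 0.314 and a maximum drawdown of 9.69%. For the original reversal factor Ret20, the total return is 16.22%, with annualized volatility of 53.88%, a Sharpe ratio of -0.016 and a maximum drawdown of 20.88%.
(4) Because the long-short portfolio's return accumulates fairly evenly, weekly or semi-monthly rebalancing is worth trying. For N=60, any high-D group size X between 25 and 50 gives good results, and the IC reaches its maximum at X = 42.
(5) The study offers one approach to mining order-book information and a reference point for mining high-frequency trading data.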
