
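"Order Book Temperature" Research Series (I): The Fine Structure of the Reversal Factor

Author: djfjsfj, 2019-07-25 12:00. Source: FX168财经网 People Channel

Research purpose:

This post follows the Soochow Securities (东吴证券) research report "'Order Book Temperature' Research Series (I): The Fine Structure of the Reversal Factor". According to the report, the A-share market is an order-driven market. From a dynamical point of view, the entire evolution of stock prices is determined exactly, from the bottom up, by the order book. Tick-by-tick trade and order data carry a great deal of information. Starting from the simplest data, the report examines a single indicator, the number of trades. The number of trades is the number of matched executions, a statistic aggregated from tick-by-tick trade data. Using this information, the report cuts the traditional reversal factor and proposes, for the first time, an "ideal reversal factor" that predicts future returns and offers one line of attack for mining order-book factors.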

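Research content:

(1) Mine alpha factors from order book data. Because the traditional reversal factor is unstable, this post takes the view that it mixes a momentum effect with a reversal effect, and therefore uses the average amount per trade to perform the W cut. The new factor obtained after the cut is called the ideal reversal factor.
(2) On the full A-share universe, run single-factor tests of the ideal reversal factor from three angles: the significance test of factor returns, factor IC analysis, and layered (quantile) backtesting.
(3) Analyze the ideal reversal factor further from four angles: the value of the parameter N, the choice of sample universe, how factor returns accumulate, and the grouping ratio.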

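Research conclusions:

(1) Single-factor tests of the ideal reversal factor: in the significance test of factor returns, the mean of the |t| series is 3.96; in the IC analysis, the mean IC is 0.0492 and the IR is 0.54; in the layered backtest, portfolio 1 clearly outperforms portfolio 5, every portfolio beats the HS300 index, and portfolio 1 earns a clearly higher return than the others.
(2) Looking deeper at the ideal reversal factor, the IC results remain strong for N = 20, 40 and 60, but factor effectiveness declines as N grows, so a sensible choice of N yields a factor that predicts future returns better.
(3) On the HS300 universe, the five-group long-short hedged portfolio of the ideal reversal factor has a total return of 47.74%, annualized volatility of 31.38%, a Sharpe ratio of 0.314 and a maximum drawdown of 9.69%. The original reversal factor Ret20 has a total return of 16.22%, annualized volatility of 53.88%, a Sharpe ratio of -0.016 and a maximum drawdown of 20.88%.
(4) Because the long-short portfolio accumulates returns fairly evenly, weekly or semi-monthly rebalancing is worth trying. For N = 60, results are good whenever the grouping ratio X of the high-D group lies between 25 and 50, and the IC reaches its maximum at X = 42.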

Note: the per-stock daily trade data, aggregated from tick-by-tick records, are in trades.csv

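1 Data Preparation

1.1 Getting the Date List

Factor data are extracted at the end of each month, so the month-end dates are needed first.
The function takes peroid, start_date and end_date. peroid selects the period: week (W), month (M) or quarter (Q). start_date and end_date are the start and end dates.
The function returns the period-end dates for the chosen period. This post uses 2013-01-01 as the start date and 2018-01-01 as the end date.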

from jqdata import *
import datetime
import pandas as pd
import numpy as np
from six import StringIO
import warnings
import time
import pickle
from jqfactor import winsorize_med
from jqfactor import neutralize
from jqfactor import standardlize
import statsmodels.api as sm
warnings.filterwarnings("ignore")
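# Get the list of period-end dates for a given period: 'W', 'M' or 'Q'
def get_period_date(peroid,start_date, end_date):
    # peroid is a pandas resample rule: 'W' weekly, 'M' monthly, 'Q' quarterly (also e.g. '5min', '12D')
    stock_data = get_price('000001.XSHE',start_date,end_date,'daily',fields=['close'])
    # record the date of each row so the last trading day in each period is available
    stock_data['date']=stock_data.index
    # resample: each period takes the values of the last trading day in that period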
    period_stock_data=stock_data.resample(peroid,how='last')
    date=period_stock_data.index
    pydate_array = date.to_pydatetime()
    date_only_array = np.vectorize(lambda s: s.strftime('%Y-%m-%d'))(pydate_array )
    date_only_series = pd.Series(date_only_array)
    start_date = datetime.datetime.strptime(start_date, "%Y-%m-%d")
    start_date=start_date-datetime.timedelta(days=1)
    start_date = start_date.strftime("%Y-%m-%d")
    date_list=date_only_series.values.tolist()
    date_list.insert(0,start_date)
    return date_list
get_period_date('M','2017-01-01', '2018-01-01')
['2016-12-31',
 '2017-01-31',
 '2017-02-28',
 '2017-03-31',
 '2017-04-30',
 '2017-05-31',
 '2017-06-30',
 '2017-07-31',
 '2017-08-31',
 '2017-09-30',
 '2017-10-31',
 '2017-11-30',
 '2017-12-31']
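
Note that the list above contains calendar period ends (2017-04-30, for example, is a Sunday) rather than trading days, because the resampled index is labeled by the end of each period. If the last actual trading day of each month is wanted instead, a minimal standard-library sketch could look like the following. It assumes a list of trading dates is already available; the helper name `last_trading_day_per_month` and the weekday-only sample calendar are hypothetical and not part of the original notebook.

```python
import datetime


def last_trading_day_per_month(trading_days):
    """Return the last date seen in each (year, month), in chronological order.

    `trading_days` is assumed to be an iterable of datetime.date objects
    that contains only days on which the market was open.
    """
    last_by_month = {}
    for day in sorted(trading_days):
        # Later days overwrite earlier ones, so the stored value ends up
        # being the last trading day of that month.
        last_by_month[(day.year, day.month)] = day
    return [last_by_month[key] for key in sorted(last_by_month)]


# Hypothetical sample calendar: weekdays of April and May 2017 only,
# standing in for a real exchange calendar (holidays are ignored).
first_day = datetime.date(2017, 4, 1)
sample_days = [
    first_day + datetime.timedelta(days=i)
    for i in range(61)
    if (first_day + datetime.timedelta(days=i)).weekday() < 5
]
month_ends = last_trading_day_per_month(sample_days)
```

With this sample calendar, April resolves to Friday 2017-04-28 instead of Sunday 2017-04-30.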

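1.2 Getting the Stock List

Universe: all A-shares
Filters: exclude ST stocks and recently listed stocks (the text specifies 3 months; delect_stop below defaults to a 60-day cutoff); each stock is treated as one sample
The example call below retrieves the universe as of 2018-12-31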

# Drop stocks listed less than n days (default 60, about 2 months) before beginDate
def delect_stop(stocks,beginDate,n=30*2):
    stockList = []
    beginDate = datetime.datetime.strptime(beginDate, "%Y-%m-%d")
    for stock in stocks:
        start_date = get_security_info(stock).start_date
        if start_date < (beginDate-datetime.timedelta(days = n)).date():
            stockList.append(stock)
    return stockList

# Build the stock universe
def get_stock_A(begin_date):
    begin_date = str(begin_date)
    stockList = get_index_stocks('000002.XSHG',begin_date)+get_index_stocks('399107.XSHE',begin_date)
    # drop ST stocks
    st_data = get_extras('is_st', stockList, count = 1, end_date=begin_date)
    stockList = [stock for stock in stockList if not st_data[stock][0]]
    # drop recently listed stocks (delect_stop only checks the listing date)
    stockList = delect_stop(stockList, begin_date)
    return stockList
get_stock_A("2018-12-31")
[u'600000.XSHG',
 u'600004.XSHG',
 u'600006.XSHG',
 u'600007.XSHG',
 u'600008.XSHG',
 u'600009.XSHG',
 u'600010.XSHG',
 u'600011.XSHG',
 u'600012.XSHG',
 u'600015.XSHG',
 u'600016.XSHG',
 u'600017.XSHG',
 u'600018.XSHG',
 u'600019.XSHG',
 u'600020.XSHG',
 u'600021.XSHG',
 u'600022.XSHG',
 u'600023.XSHG',
 u'600025.XSHG',
 u'600026.XSHG',
 u'600027.XSHG',
 u'600028.XSHG',
 u'600029.XSHG',
 u'600030.XSHG',
 u'600031.XSHG',
 u'600033.XSHG',
 u'600035.XSHG',
 u'600036.XSHG',
 u'600037.XSHG',
 u'600038.XSHG',
 u'600039.XSHG',
 u'600048.XSHG',
 u'600050.XSHG',
 u'600051.XSHG',
 u'600052.XSHG',
 u'600053.XSHG',
 u'600054.XSHG',
 u'600055.XSHG',
 u'600056.XSHG',
 u'600057.XSHG',
 u'600058.XSHG',
 u'600059.XSHG',
 u'600060.XSHG',
 u'600061.XSHG',
 u'600062.XSHG',
 u'600063.XSHG',
 u'600064.XSHG',
 u'600066.XSHG',
 u'600067.XSHG',
 u'600068.XSHG',
 u'600069.XSHG',
 u'600070.XSHG',
 u'600071.XSHG',
 u'600072.XSHG',
 u'600073.XSHG',
 u'600075.XSHG',
 u'600076.XSHG',
 u'600077.XSHG',
 u'600078.XSHG',
 u'600079.XSHG',
 u'600080.XSHG',
 u'600081.XSHG',
 u'600082.XSHG',
 u'600083.XSHG',
 u'600084.XSHG',
 u'600085.XSHG',
 u'600086.XSHG',
 u'600088.XSHG',
 u'600089.XSHG',
 u'600090.XSHG',
 u'600093.XSHG',
 u'600094.XSHG',
 u'600095.XSHG',
 u'600096.XSHG',
 u'600097.XSHG',
 u'600098.XSHG',
 u'600099.XSHG',
 u'600100.XSHG',
 u'600101.XSHG',
 u'600103.XSHG',
 u'600104.XSHG',
 u'600105.XSHG',
 u'600106.XSHG',
 u'600107.XSHG',
 u'600108.XSHG',
 u'600109.XSHG',
 u'600110.XSHG',
 u'600111.XSHG',
 u'600112.XSHG',
 u'600113.XSHG',
 u'600114.XSHG',
 u'600115.XSHG',
 u'600116.XSHG',
 u'600117.XSHG',
 u'600118.XSHG',
 u'600119.XSHG',
 u'600120.XSHG',
 u'600121.XSHG',
 u'600122.XSHG',
 u'600123.XSHG',
 u'600125.XSHG',
 u'600126.XSHG',
 u'600127.XSHG',
 u'600128.XSHG',
 u'600129.XSHG',
 u'600130.XSHG',
 u'600131.XSHG',
 u'600132.XSHG',
 u'600133.XSHG',
 u'600135.XSHG',
 u'600136.XSHG',
 u'600137.XSHG',
 u'600138.XSHG',
 u'600139.XSHG',
 u'600141.XSHG',
 u'600143.XSHG',
 u'600146.XSHG',
 u'600148.XSHG',
 u'600151.XSHG',
 u'600152.XSHG',
 u'600153.XSHG',
 u'600155.XSHG',
 u'600156.XSHG',
 u'600157.XSHG',
 u'600158.XSHG',
 u'600159.XSHG',
 u'600160.XSHG',
 u'600161.XSHG',
 u'600162.XSHG',
 u'600163.XSHG',
 u'600165.XSHG',
 u'600166.XSHG',
 u'600167.XSHG',
 u'600168.XSHG',
 u'600169.XSHG',
 u'600170.XSHG',
 u'600171.XSHG',
 u'600172.XSHG',
 u'600173.XSHG',
 u'600175.XSHG',
 u'600176.XSHG',
 u'600177.XSHG',
 u'600178.XSHG',
 u'600179.XSHG',
 u'600180.XSHG',
 u'600183.XSHG',
 u'600184.XSHG',
 u'600185.XSHG',
 u'600186.XSHG',
 u'600187.XSHG',
 u'600188.XSHG',
 u'600189.XSHG',
 u'600190.XSHG',
 u'600191.XSHG',
 u'600192.XSHG',
 u'600195.XSHG',
 u'600196.XSHG',
 u'600197.XSHG',
 u'600199.XSHG',
 u'600200.XSHG',
 u'600201.XSHG',
 u'600203.XSHG',
 u'600206.XSHG',
 u'600207.XSHG',
 u'600208.XSHG',
 u'600210.XSHG',
 u'600211.XSHG',
 u'600212.XSHG',
 u'600213.XSHG',
 u'600215.XSHG',
 u'600216.XSHG',
 u'600217.XSHG',
 u'600218.XSHG',
 u'600219.XSHG',
 u'600220.XSHG',
 u'600221.XSHG',
 u'600222.XSHG',
 u'600223.XSHG',
 u'600225.XSHG',
 u'600226.XSHG',
 u'600227.XSHG',
 u'600229.XSHG',
 u'600230.XSHG',
 u'600231.XSHG',
 u'600232.XSHG',
 u'600233.XSHG',
 u'600235.XSHG',
 u'600236.XSHG',
 u'600237.XSHG',
 u'600239.XSHG',
 u'600240.XSHG',
 u'600241.XSHG',
 u'600242.XSHG',
 u'600243.XSHG',
 u'600246.XSHG',
 u'600248.XSHG',
 u'600249.XSHG',
 u'600250.XSHG',
 u'600251.XSHG',
 u'600252.XSHG',
 u'600255.XSHG',
 u'600256.XSHG',
 u'600257.XSHG',
 u'600258.XSHG',
 u'600259.XSHG',
 u'600260.XSHG',
 u'600261.XSHG',
 u'600262.XSHG',
 u'600266.XSHG',
 u'600267.XSHG',
 u'600268.XSHG',
 u'600269.XSHG',
 u'600271.XSHG',
 u'600272.XSHG',
 u'600273.XSHG',
 u'600276.XSHG',
 u'600277.XSHG',
 u'600278.XSHG',
 u'600279.XSHG',
 u'600280.XSHG',
 u'600281.XSHG',
 u'600282.XSHG',
 u'600283.XSHG',
 u'600284.XSHG',
 u'600285.XSHG',
 u'600287.XSHG',
 u'600288.XSHG',
 u'600290.XSHG',
 u'600291.XSHG',
 u'600292.XSHG',
 u'600293.XSHG',
 u'600295.XSHG',
 u'600297.XSHG',
 u'600298.XSHG',
 u'600299.XSHG',
 u'600300.XSHG',
 u'600302.XSHG',
 u'600303.XSHG',
 u'600305.XSHG',
 u'600306.XSHG',
 u'600307.XSHG',
 u'600308.XSHG',
 u'600309.XSHG',
 u'600310.XSHG',
 u'600311.XSHG',
 u'600312.XSHG',
 u'600313.XSHG',
 u'600315.XSHG',
 u'600316.XSHG',
 u'600317.XSHG',
 u'600318.XSHG',
 u'600319.XSHG',
 u'600320.XSHG',
 u'600322.XSHG',
 u'600323.XSHG',
 u'600325.XSHG',
 u'600326.XSHG',
 u'600327.XSHG',
 u'600328.XSHG',
 u'600329.XSHG',
 u'600330.XSHG',
 u'600331.XSHG',
 u'600332.XSHG',
 u'600333.XSHG',
 u'600335.XSHG',
 u'600336.XSHG',
 u'600337.XSHG',
 u'600338.XSHG',
 u'600339.XSHG',
 u'600340.XSHG',
 u'600343.XSHG',
 u'600345.XSHG',
 u'600346.XSHG',
 u'600348.XSHG',
 u'600350.XSHG',
 u'600351.XSHG',
 u'600352.XSHG',
 u'600353.XSHG',
 u'600354.XSHG',
 u'600355.XSHG',
 u'600356.XSHG',
 u'600358.XSHG',
 u'600359.XSHG',
 u'600360.XSHG',
 u'600361.XSHG',
 u'600362.XSHG',
 u'600363.XSHG',
 u'600365.XSHG',
 u'600366.XSHG',
 u'600367.XSHG',
 u'600368.XSHG',
 u'600369.XSHG',
 u'600370.XSHG',
 u'600371.XSHG',
 u'600372.XSHG',
 u'600373.XSHG',
 u'600375.XSHG',
 u'600376.XSHG',
 u'600377.XSHG',
 u'600378.XSHG',
 u'600379.XSHG',
 u'600380.XSHG',
 u'600381.XSHG',
 u'600382.XSHG',
 u'600383.XSHG',
 u'600385.XSHG',
 u'600386.XSHG',
 u'600387.XSHG',
 u'600388.XSHG',
 u'600389.XSHG',
 u'600390.XSHG',
 u'600391.XSHG',
 u'600392.XSHG',
 u'600393.XSHG',
 u'600395.XSHG',
 u'600396.XSHG',
 u'600398.XSHG',
 u'600400.XSHG',
 u'600403.XSHG',
 u'600405.XSHG',
 u'600406.XSHG',
 u'600409.XSHG',
 u'600410.XSHG',
 u'600415.XSHG',
 u'600416.XSHG',
 u'600418.XSHG',
 u'600419.XSHG',
 u'600420.XSHG',
 u'600422.XSHG',
 u'600425.XSHG',
 u'600426.XSHG',
 u'600428.XSHG',
 u'600429.XSHG',
 u'600433.XSHG',
 u'600435.XSHG',
 u'600436.XSHG',
 u'600438.XSHG',
 u'600439.XSHG',
 u'600444.XSHG',
 u'600446.XSHG',
 u'600448.XSHG',
 u'600449.XSHG',
 u'600452.XSHG',
 u'600455.XSHG',
 u'600456.XSHG',
 u'600458.XSHG',
 u'600459.XSHG',
 u'600460.XSHG',
 u'600461.XSHG',
 u'600462.XSHG',
 u'600463.XSHG',
 u'600466.XSHG',
 u'600467.XSHG',
 u'600468.XSHG',
 u'600469.XSHG',
 u'600470.XSHG',
 u'600475.XSHG',
 u'600476.XSHG',
 u'600477.XSHG',
 u'600478.XSHG',
 u'600479.XSHG',
 u'600480.XSHG',
 u'600481.XSHG',
 u'600482.XSHG',
 u'600483.XSHG',
 u'600485.XSHG',
 u'600486.XSHG',
 u'600487.XSHG',
 u'600488.XSHG',
 u'600489.XSHG',
 u'600490.XSHG',
 u'600491.XSHG',
 u'600493.XSHG',
 u'600495.XSHG',
 u'600496.XSHG',
 u'600497.XSHG',
 u'600498.XSHG',
 u'600499.XSHG',
 u'600500.XSHG',
 u'600501.XSHG',
 u'600502.XSHG',
 u'600503.XSHG',
 u'600505.XSHG',
 u'600506.XSHG',
 u'600507.XSHG',
 u'600508.XSHG',
 u'600509.XSHG',
 u'600510.XSHG',
 u'600511.XSHG',
 u'600512.XSHG',
 u'600513.XSHG',
 u'600515.XSHG',
 u'600516.XSHG',
 u'600517.XSHG',
 u'600518.XSHG',
 u'600519.XSHG',
 u'600520.XSHG',
 u'600521.XSHG',
 u'600522.XSHG',
 u'600523.XSHG',
 u'600525.XSHG',
 u'600526.XSHG',
 u'600527.XSHG',
 u'600528.XSHG',
 u'600529.XSHG',
 u'600530.XSHG',
 u'600531.XSHG',
 u'600532.XSHG',
 u'600533.XSHG',
 u'600535.XSHG',
 u'600536.XSHG',
 u'600537.XSHG',
 u'600538.XSHG',
 u'600540.XSHG',
 u'600543.XSHG',
 u'600545.XSHG',
 u'600546.XSHG',
 u'600547.XSHG',
 u'600548.XSHG',
 u'600549.XSHG',
 u'600550.XSHG',
 u'600551.XSHG',
 u'600552.XSHG',
 u'600555.XSHG',
 u'600557.XSHG',
 u'600558.XSHG',
 u'600559.XSHG',
 u'600560.XSHG',
 u'600561.XSHG',
 u'600562.XSHG',
 u'600563.XSHG',
 u'600565.XSHG',
 u'600566.XSHG',
 u'600567.XSHG',
 u'600568.XSHG',
 u'600569.XSHG',
 u'600570.XSHG',
 u'600571.XSHG',
 u'600572.XSHG',
 u'600573.XSHG',
 u'600575.XSHG',
 u'600576.XSHG',
 u'600577.XSHG',
 u'600578.XSHG',
 u'600579.XSHG',
 u'600580.XSHG',
 u'600581.XSHG',
 u'600582.XSHG',
 u'600583.XSHG',
 u'600584.XSHG',
 u'600585.XSHG',
 u'600586.XSHG',
 u'600587.XSHG',
 u'600588.XSHG',
 u'600589.XSHG',
 u'600590.XSHG',
 u'600592.XSHG',
 u'600593.XSHG',
 u'600594.XSHG',
 u'600595.XSHG',
 u'600596.XSHG',
 u'600597.XSHG',
 u'600598.XSHG',
 u'600599.XSHG',
 u'600600.XSHG',
 u'600601.XSHG',
 u'600602.XSHG',
 u'600603.XSHG',
 u'600604.XSHG',
 u'600605.XSHG',
 u'600606.XSHG',
 u'600609.XSHG',
 u'600611.XSHG',
 u'600612.XSHG',
 u'600613.XSHG',
 u'600614.XSHG',
 u'600615.XSHG',
 u'600616.XSHG',
 u'600617.XSHG',
 u'600618.XSHG',
 u'600619.XSHG',
 u'600620.XSHG',
 u'600621.XSHG',
 u'600622.XSHG',
 u'600623.XSHG',
 u'600624.XSHG',
 u'600626.XSHG',
 u'600628.XSHG',
 u'600629.XSHG',
 u'600630.XSHG',
 u'600633.XSHG',
 u'600635.XSHG',
 u'600636.XSHG',
 u'600637.XSHG',
 u'600638.XSHG',
 u'600639.XSHG',
 u'600640.XSHG',
 u'600641.XSHG',
 u'600642.XSHG',
 u'600643.XSHG',
 u'600644.XSHG',
 u'600645.XSHG',
 u'600647.XSHG',
 u'600648.XSHG',
 u'600649.XSHG',
 u'600650.XSHG',
 u'600651.XSHG',
 u'600652.XSHG',
 u'600653.XSHG',
 u'600655.XSHG',
 u'600657.XSHG',
 u'600658.XSHG',
 u'600660.XSHG',
 u'600661.XSHG',
 u'600662.XSHG',
 u'600663.XSHG',
 u'600664.XSHG',
 u'600665.XSHG',
 u'600666.XSHG',
 u'600667.XSHG',
 u'600668.XSHG',
 u'600671.XSHG',
 u'600673.XSHG',
 u'600674.XSHG',
 u'600675.XSHG',
 u'600676.XSHG',
 u'600677.XSHG',
 u'600678.XSHG',
 u'600679.XSHG',
 u'600681.XSHG',
 u'600682.XSHG',
 u'600683.XSHG',
 u'600684.XSHG',
 u'600685.XSHG',
 u'600686.XSHG',
 u'600687.XSHG',
 u'600688.XSHG',
 u'600689.XSHG',
 u'600690.XSHG',
 u'600691.XSHG',
 u'600692.XSHG',
 u'600693.XSHG',
 u'600694.XSHG',
 u'600695.XSHG',
 u'600697.XSHG',
 u'600698.XSHG',
 u'600699.XSHG',
 u'600702.XSHG',
 u'600703.XSHG',
 u'600704.XSHG',
 u'600705.XSHG',
 u'600706.XSHG',
 u'600707.XSHG',
 u'600708.XSHG',
 u'600710.XSHG',
 u'600711.XSHG',
 u'600712.XSHG',
 u'600713.XSHG',
 u'600714.XSHG',
 u'600715.XSHG',
 u'600716.XSHG',
 u'600717.XSHG',
 u'600718.XSHG',
 u'600719.XSHG',
 u'600720.XSHG',
 u'600721.XSHG',
 u'600722.XSHG',
 u'600723.XSHG',
 u'600724.XSHG',
 u'600726.XSHG',
 u'600727.XSHG',
 u'600728.XSHG',
 u'600729.XSHG',
 u'600730.XSHG',
 u'600731.XSHG',
 u'600733.XSHG',
 u'600734.XSHG',
 u'600735.XSHG',
 u'600736.XSHG',
 u'600737.XSHG',
 u'600738.XSHG',
 u'600739.XSHG',
 u'600740.XSHG',
 u'600741.XSHG',
 u'600742.XSHG',
 u'600743.XSHG',
 u'600744.XSHG',
 u'600745.XSHG',
 u'600746.XSHG',
 u'600748.XSHG',
 u'600750.XSHG',
 u'600751.XSHG',
 u'600753.XSHG',
 u'600754.XSHG',
 u'600755.XSHG',
 u'600756.XSHG',
 u'600757.XSHG',
 u'600758.XSHG',
 u'600759.XSHG',
 u'600760.XSHG',
 u'600761.XSHG',
 u'600763.XSHG',
 u'600764.XSHG',
 u'600765.XSHG',
 u'600766.XSHG',
 u'600768.XSHG',
 u'600769.XSHG',
 u'600770.XSHG',
 u'600771.XSHG',
 u'600773.XSHG',
 u'600774.XSHG',
 u'600775.XSHG',
 u'600776.XSHG',
 u'600777.XSHG',
 u'600779.XSHG',
 u'600780.XSHG',
 u'600781.XSHG',
 u'600782.XSHG',
 u'600783.XSHG',
 u'600784.XSHG',
 u'600785.XSHG',
 u'600787.XSHG',
 u'600789.XSHG',
 u'600790.XSHG',
 u'600791.XSHG',
 u'600792.XSHG',
 u'600793.XSHG',
 u'600794.XSHG',
 u'600795.XSHG',
 u'600796.XSHG',
 u'600797.XSHG',
 u'600798.XSHG',
 u'600800.XSHG',
 u'600801.XSHG',
 u'600802.XSHG',
 u'600803.XSHG',
 u'600804.XSHG',
 u'600805.XSHG',
 u'600808.XSHG',
 u'600809.XSHG',
 u'600810.XSHG',
 u'600811.XSHG',
 u'600812.XSHG',
 u'600814.XSHG',
 u'600815.XSHG',
 u'600816.XSHG',
 u'600818.XSHG',
 u'600819.XSHG',
 u'600820.XSHG',
 u'600821.XSHG',
 u'600822.XSHG',
 u'600823.XSHG',
 u'600824.XSHG',
 u'600825.XSHG',
 u'600826.XSHG',
 u'600827.XSHG',
 u'600828.XSHG',
 u'600829.XSHG',
 u'600830.XSHG',
 u'600831.XSHG',
 u'600833.XSHG',
 u'600834.XSHG',
 u'600835.XSHG',
 u'600836.XSHG',
 u'600837.XSHG',
 u'600838.XSHG',
 u'600839.XSHG',
 u'600841.XSHG',
 u'600843.XSHG',
 u'600844.XSHG',
 u'600845.XSHG',
 u'600846.XSHG',
 u'600847.XSHG',
 u'600848.XSHG',
 u'600850.XSHG',
 u'600851.XSHG',
 u'600853.XSHG',
 u'600854.XSHG',
 u'600855.XSHG',
 u'600856.XSHG',
 u'600857.XSHG',
 u'600858.XSHG',
 u'600859.XSHG',
 u'600860.XSHG',
 u'600861.XSHG',
 u'600862.XSHG',
 u'600863.XSHG',
 u'600864.XSHG',
 u'600865.XSHG',
 u'600866.XSHG',
 u'600867.XSHG',
 u'600868.XSHG',
 u'600869.XSHG',
 u'600872.XSHG',
 u'600873.XSHG',
 u'600874.XSHG',
 u'600875.XSHG',
 u'600876.XSHG',
 u'600879.XSHG',
 u'600880.XSHG',
 u'600881.XSHG',
 u'600882.XSHG',
 u'600883.XSHG',
 u'600884.XSHG',
 u'600885.XSHG',
 u'600886.XSHG',
 u'600887.XSHG',
 u'600888.XSHG',
 u'600889.XSHG',
 u'600890.XSHG',
 u'600891.XSHG',
 u'600892.XSHG',
 u'600893.XSHG',
 u'600894.XSHG',
 u'600895.XSHG',
 u'600897.XSHG',
 u'600898.XSHG',
 u'600900.XSHG',
 u'600901.XSHG',
 u'600903.XSHG',
 u'600908.XSHG',
 u'600909.XSHG',
 u'600917.XSHG',
 u'600919.XSHG',
 u'600926.XSHG',
 u'600929.XSHG',
 u'600933.XSHG',
 u'600936.XSHG',
 u'600939.XSHG',
 u'600958.XSHG',
 u'600959.XSHG',
 u'600960.XSHG',
 u'600961.XSHG',
 u'600962.XSHG',
 u'600963.XSHG',
 u'600965.XSHG',
 u'600966.XSHG',
 u'600967.XSHG',
 u'600969.XSHG',
 u'600970.XSHG',
 u'600971.XSHG',
 u'600973.XSHG',
 u'600975.XSHG',
 u'600976.XSHG',
 u'600977.XSHG',
 u'600978.XSHG',
 u'600979.XSHG',
 u'600980.XSHG',
 u'600981.XSHG',
 u'600982.XSHG',
 u'600983.XSHG',
 u'600984.XSHG',
 u'600985.XSHG',
 u'600986.XSHG',
 u'600987.XSHG',
 u'600988.XSHG',
 u'600990.XSHG',
 u'600992.XSHG',
 u'600993.XSHG',
 u'600995.XSHG',
 u'600996.XSHG',
 u'600997.XSHG',
 u'600998.XSHG',
 u'600999.XSHG',
 u'601000.XSHG',
 u'601001.XSHG',
 u'601002.XSHG',
 u'601003.XSHG',
 u'601005.XSHG',
 u'601006.XSHG',
 u'601007.XSHG',
 u'601008.XSHG',
 u'601009.XSHG',
 u'601010.XSHG',
 u'601011.XSHG',
 u'601012.XSHG',
 u'601015.XSHG',
 u'601016.XSHG',
 u'601018.XSHG',
 u'601019.XSHG',
 u'601020.XSHG',
 u'601021.XSHG',
 u'601028.XSHG',
 u'601038.XSHG',
 u'601058.XSHG',
 u'601066.XSHG',
 u'601068.XSHG',
 u'601069.XSHG',
 u'601086.XSHG',
 u'601088.XSHG',
 u'601098.XSHG',
 u'601099.XSHG',
 u'601100.XSHG',
 u'601101.XSHG',
 u'601106.XSHG',
 u'601107.XSHG',
 u'601108.XSHG',
 u'601111.XSHG',
 u'601113.XSHG',
 u'601116.XSHG',
 u'601117.XSHG',
 u'601118.XSHG',
 u'601126.XSHG',
 u'601127.XSHG',
 u'601128.XSHG',
 u'601137.XSHG',
 u'601138.XSHG',
 u'601139.XSHG',
 u'601155.XSHG',
 u'601158.XSHG',
 u'601162.XSHG',
 u'601163.XSHG',
 u'601166.XSHG',
 u'601168.XSHG',
 u'601169.XSHG',
 u'601177.XSHG',
 u'601179.XSHG',
 u'601186.XSHG',
 u'601188.XSHG',
 u'601198.XSHG',
 u'601199.XSHG',
 u'601200.XSHG',
 u'601208.XSHG',
 u'601211.XSHG',
 u'601212.XSHG',
 u'601216.XSHG',
 u'601218.XSHG',
 u'601222.XSHG',
 u'601225.XSHG',
 u'601226.XSHG',
 u'601228.XSHG',
 u'601229.XSHG',
 u'601231.XSHG',
 u'601233.XSHG',
 u'601238.XSHG',
 u'601258.XSHG',
 u'601288.XSHG',
 u'601311.XSHG',
 u'601318.XSHG',
 u'601326.XSHG',
 u'601328.XSHG',
 u'601330.XSHG',
 u'601333.XSHG',
 u'601336.XSHG',
 u'601339.XSHG',
 u'601360.XSHG',
 u'601366.XSHG',
 u'601368.XSHG',
 u'601369.XSHG',
 u'601375.XSHG',
 u'601377.XSHG',
 u'601388.XSHG',
 u'601390.XSHG',
 u'601398.XSHG',
 u'601500.XSHG',
 u'601515.XSHG',
 u'601518.XSHG',
 u'601519.XSHG',
 u'601555.XSHG',
 u'601566.XSHG',
 u'601567.XSHG',
 u'601577.XSHG',
 u'601579.XSHG',
 u'601588.XSHG',
 u'601595.XSHG',
 u'601599.XSHG',
 u'601600.XSHG',
 u'601601.XSHG',
 u'601606.XSHG',
 u'601607.XSHG',
 u'601608.XSHG',
 u'601611.XSHG',
 u'601616.XSHG',
 u'601618.XSHG',
 u'601619.XSHG',
 u'601628.XSHG',
 u'601633.XSHG',
 u'601636.XSHG',
 u'601666.XSHG',
 u'601668.XSHG',
 u'601669.XSHG',
 u'601677.XSHG',
 u'601678.XSHG',
 u'601688.XSHG',
 u'601689.XSHG',
 u'601699.XSHG',
 u'601700.XSHG',
 u'601717.XSHG',
 u'601718.XSHG',
 u'601727.XSHG',
 u'601766.XSHG',
 u'601777.XSHG',
 u'601788.XSHG',
 u'601789.XSHG',
 u'601799.XSHG',
 u'601800.XSHG',
 u'601801.XSHG',
 u'601808.XSHG',
 u'601811.XSHG',
 u'601818.XSHG',
 u'601828.XSHG',
 u'601838.XSHG',
 u'601857.XSHG',
 u'601858.XSHG',
 u'601866.XSHG',
 u'601869.XSHG',
 u'601872.XSHG',
 u'601877.XSHG',
 u'601878.XSHG',
 u'601880.XSHG',
 u'601881.XSHG',
 u'601882.XSHG',
 u'601886.XSHG',
 u'601888.XSHG',
 u'601890.XSHG',
 u'601898.XSHG',
 u'601899.XSHG',
 u'601900.XSHG',
 u'601901.XSHG',
 u'601908.XSHG',
 u'601918.XSHG',
 u'601919.XSHG',
 u'601928.XSHG',
 u'601929.XSHG',
 u'601933.XSHG',
 u'601939.XSHG',
 u'601949.XSHG',
 u'601952.XSHG',
 u'601958.XSHG',
 u'601965.XSHG',
 u'601966.XSHG',
 u'601968.XSHG',
 u'601969.XSHG',
 u'601985.XSHG',
 u'601988.XSHG',
 u'601989.XSHG',
 u'601990.XSHG',
 u'601991.XSHG',
 u'601992.XSHG',
 u'601996.XSHG',
 u'601997.XSHG',
 u'601998.XSHG',
 u'601999.XSHG',
 u'603000.XSHG',
 u'603001.XSHG',
 u'603002.XSHG',
 u'603003.XSHG',
 u'603005.XSHG',
 u'603006.XSHG',
 u'603007.XSHG',
 u'603008.XSHG',
 u'603009.XSHG',
 u'603010.XSHG',
 u'603011.XSHG',
 u'603012.XSHG',
 u'603013.XSHG',
 u'603015.XSHG',
 u'603016.XSHG',
 u'603017.XSHG',
 u'603018.XSHG',
 u'603019.XSHG',
 u'603020.XSHG',
 u'603021.XSHG',
 u'603022.XSHG',
 u'603023.XSHG',
 u'603025.XSHG',
 u'603026.XSHG',
 u'603027.XSHG',
 u'603028.XSHG',
 u'603029.XSHG',
 u'603030.XSHG',
 u'603031.XSHG',
 u'603032.XSHG',
 u'603033.XSHG',
 u'603035.XSHG',
 u'603036.XSHG',
 u'603037.XSHG',
 u'603038.XSHG',
 u'603039.XSHG',
 u'603040.XSHG',
 u'603041.XSHG',
 u'603042.XSHG',
 u'603043.XSHG',
 u'603045.XSHG',
 u'603050.XSHG',
 u'603055.XSHG',
 u'603056.XSHG',
 u'603058.XSHG',
 u'603059.XSHG',
 u'603060.XSHG',
 u'603063.XSHG',
 u'603066.XSHG',
 u'603067.XSHG',
 u'603069.XSHG',
 u'603076.XSHG',
 u'603077.XSHG',
 u'603078.XSHG',
 u'603079.XSHG',
 u'603080.XSHG',
 u'603081.XSHG',
 ...]

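1.3 Getting the Data

The factor is computed as follows:
(1) At the end of each month, look back over the past N trading days of stock s (N is taken to be even for convenience);
(2) For stock s, compute the average amount per trade D for each day (D = daily turnover amount / daily number of trades), sort the N trading days by D from largest to smallest, and call the first N/2 days the high-D group and the last N/2 days the low-D group;
(3) For stock s, sum the daily returns of the high-D days to obtain the factor M_high, and sum the daily returns of the low-D days to obtain the factor M_low;
(4) Repeat the procedure for every stock.
The reversal factor is defined as:

$M = M_{high} - M_{low}$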
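
The four steps and the formula can be illustrated for a single stock with a small standard-library sketch. This is only an illustration under stated assumptions: the function name `w_cut_factor` and the four-day sample data are hypothetical, and the returns are summed as described in step (3), whereas the notebook code compounds them with `cumprod` and stores the sign-flipped value as `reverse`.

```python
def w_cut_factor(avg_trade_size, daily_returns):
    """Split N days by average trade size D and return (M_high, M_low, M)."""
    if len(avg_trade_size) != len(daily_returns):
        raise ValueError("inputs must have the same length")
    n = len(avg_trade_size)
    if n == 0 or n % 2 != 0:
        raise ValueError("N must be a positive even number")
    # Day indices ordered by D, largest first (ties keep their original order).
    order = sorted(range(n), key=lambda i: avg_trade_size[i], reverse=True)
    high_days, low_days = order[: n // 2], order[n // 2:]
    # Step (3): sum the daily returns within each half.
    m_high = sum(daily_returns[i] for i in high_days)
    m_low = sum(daily_returns[i] for i in low_days)
    return m_high, m_low, m_high - m_low


# Hypothetical 4-day example for one stock.
turnover = [4000.0, 1000.0, 9000.0, 2000.0]   # daily turnover amount
trade_count = [4, 2, 3, 4]                    # daily number of trades
d_values = [a / c for a, c in zip(turnover, trade_count)]  # D per day
returns = [0.02, -0.01, 0.03, 0.01]           # daily returns
m_high, m_low, m = w_cut_factor(d_values, returns)
```

Here the two days with the largest D (3000 and 1000) form the high-D group, so M_high is the sum of their returns and M_low is the sum of the remaining two.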
begin_date = '2013-01-01'
end_date = '2018-01-01'
dateList = get_period_date('M',begin_date, end_date)
# daily number of trades per stock: rows are dates (YYYYMMDD), columns are stock codes
Trades = pd.read_csv("trades.csv", index_col = 0)
Trades.columns = [normalize_code(code) for code in Trades.columns]
Trades.index = [datetime.datetime.strptime(str(i), "%Y%m%d") for i in Trades.index]
factorData = {}
for date in dateList:
    stockList = get_stock_A(date)
    # 21 daily rows give N = 20 daily returns
    df_data = get_price(stockList, count = 21, end_date=date, frequency='1d', fields=['money','close'])
    Amount = df_data["money"]
    Amount = Amount.iloc[1:]
    Pchg = df_data["close"].pct_change()
    Pchg = Pchg.iloc[1:]
    trade = Trades.loc[Pchg.index,Pchg.columns]
    # D: average amount per trade = daily turnover / daily number of trades
    SingleAmount = Amount / trade
    result = pd.DataFrame(index = SingleAmount.columns)
    M_high = []
    M_low = []
    for i in SingleAmount.columns:
        # sort the 20 days by D for stock i, largest first
        temp = SingleAmount.sort([i], ascending = False)
        # compound the returns of the 10 high-D days and of the 10 low-D days
        M_high.append((1+Pchg.loc[temp.index[:10], i]).cumprod()[-1] - 1)
        M_low.append((1+Pchg.loc[temp.index[10:], i]).cumprod()[-1] - 1)
    result["M_high"] = M_high
    result["M_low"] = M_low
    # stored with the sign flipped: reverse = -(M_high - M_low)
    result["reverse"] = -1 *(result["M_high"] - result["M_low"])
    # Ret20: the plain 20-day return, i.e. the traditional reversal factor
    result["ret20"] = df_data["close"].iloc[-1] / df_data["close"].iloc[0] - 1
    factorData[date] = result
# save the factor data as a pickle file in the research environment
content = pickle.dumps(factorData)
write_file('factorData.pkl', content, append=False)
factorData['2017-12-31'].head()
               M_high      M_low    reverse      ret20
600000.XSHG  0.004776  -0.029655  -0.034431  -0.025020
600004.XSHG  0.023991   0.021090  -0.002901   0.045586
600006.XSHG  0.016217  -0.034556  -0.050773  -0.018900
600007.XSHG  0.003648  -0.019194  -0.022842  -0.015616
600008.XSHG  0.038771  -0.081927  -0.120698  -0.046332

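2 Single-Factor Effectiveness Analysis

2.1 Significance Test of Factor Returns

The main tool is the t-test. Following the APT model, a multiple linear regression on historical data gives the t value of the return on the factor under study, which is then analyzed in two ways:
(1) Mean of the |t| series: the absolute value is taken because a t value significantly different from 0 is enough to say that the factor and returns are clearly correlated in that period. The correlation is sometimes positive and sometimes negative, so without the absolute value positive and negative values would cancel and the factor's effectiveness would be underestimated;
(2) Share of |t| values above 2: checking the share with |t| > 2 guards the stability of the mean |t|, so that a few very large values cannot inflate the average.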
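
Both statistics are simple to compute once a series of monthly t values is available. The sketch below is a minimal illustration that assumes the t values have already been collected in a plain list; the helper name `summarize_t_values` and the sample numbers are hypothetical.

```python
def summarize_t_values(t_values, threshold=2.0):
    """Return (mean of |t|, share of periods with |t| above the threshold)."""
    abs_t = [abs(t) for t in t_values]
    mean_abs_t = sum(abs_t) / float(len(abs_t))
    # Count the periods whose |t| exceeds the threshold (2 in the text).
    n_significant = sum(1 for t in abs_t if t > threshold)
    share_significant = n_significant / float(len(abs_t))
    return mean_abs_t, share_significant


# Hypothetical t values for four months.
sample_t = [3.0, -2.5, 0.5, -4.0]
mean_abs_t, share = summarize_t_values(sample_t)
```

In this made-up sample, three of the four months have |t| above 2, so a large mean |t| is not driven by a single outlier.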

def factor_t_test(factorData, begin_date, end_date):
    dateList = get_period_date('M', begin_date, end_date)
    WLS_params = {}
    WLS_t_test = {}
    for date in dateList[:-1]:
        R_T = pd.DataFrame()
        # stock universe for this date
        stockList = list(factorData[date].index)
        # cross-sectional return over the next period
        df_close = get_price(stockList, date, dateList[dateList.index(date)+1], 'daily', ['close'])
        if df_close.empty:
            continue
        df_pchg=df_close['close'].iloc[-1,:]/df_close['close'].iloc[0,:]-1
        R_T['pchg'] = df_pchg
        # factor values (the sign flip recovers M_high - M_low)
        factor_data = -1*factorData[date]["reverse"]
        #factor_data = winsorize_med(factor_data, scale=1, inclusive=True, inf2nan=True, axis=0)
        # industry and market-cap neutralization
        #factor_data = neutralize(factor_data, how=['sw_l1', 'market_cap'], date=dateList[0], axis=0)
        # standardization
        factor_data = standardlize(factor_data, inf2nan=True, axis=0)
        R_T['factor'] = factor_data
        R_T = R_T.dropna()
        X = R_T['factor']
        y = R_T['pchg']
        # OLS regression without an intercept (the variable names keep the 'wls' prefix)
        wls = sm.OLS(y, X)
        result = wls.fit()
        WLS_params[date] = result.params[-1]
        WLS_t_test[date] = result.tvalues[-1]
    t_test = pd.Series(WLS_t_test).dropna()
    print 'Mean of |t|: ',np.sum(np.abs(t_test.values))/len(t_test)
    # |t| values above 2 (collected here but not printed)
    n = [x for x in t_test.values if np.abs(x)>2]
    print 'Absolute mean of t divided by the standard deviation of t: ',np.abs(t_test.mean())/t_test.std()
    return WLS_t_test
WLS_t_test = factor_t_test(factorData, begin_date, end_date)
Mean of |t|:  3.96367285165
Absolute mean of t divided by the standard deviation of t:  0.592209956948

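From the results above, the mean of the |t| series is 3.96, which satisfies the requirement of being greater than 2. The original write-up reports the share of |t| values above 2 as 59.27%; the code shown here does not print that share (the list n is built but never used), and its second output, 0.59, is instead the absolute mean of the t series divided by its standard deviation. By the criteria of the significance test of factor returns, the factor is judged effective.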

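2.2 Factor IC Analysis

The IC of factor k usually means the correlation coefficient between the stocks' exposures to factor k in period T and their returns in period T + 1. Once the IC series is available, it can be analyzed in the same way as the t values in the previous subsection:
(1) Mean and mean absolute value of the IC series: measures factor effectiveness;
(2) Standard deviation of the IC series: measures factor stability;
(3) Ratio of the mean to the standard deviation of the IC series (IR): measures effectiveness relative to stability;
(4) Share of IC values above zero (or below zero): measures how consistent the factor's effect is.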
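
The four summary statistics can be sketched without any third-party library. The code below is a minimal illustration under stated assumptions: `pearson` and `summarize_ic` are hypothetical helper names, the exposures, returns and IC values are made up, and the standard deviation uses the n - 1 denominator so that it matches the default of pandas' `Series.std`.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = float(len(x))
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def summarize_ic(ic_values):
    """Mean, mean |IC|, sample standard deviation, IR and share of positive ICs."""
    n = len(ic_values)
    mean_ic = sum(ic_values) / float(n)
    mean_abs_ic = sum(abs(v) for v in ic_values) / float(n)
    # Sample standard deviation (n - 1 in the denominator).
    std_ic = math.sqrt(sum((v - mean_ic) ** 2 for v in ic_values) / (n - 1))
    positive_share = sum(1 for v in ic_values if v > 0) / float(n)
    return {
        "mean": mean_ic,
        "mean_abs": mean_abs_ic,
        "std": std_ic,
        "ir": mean_ic / std_ic,
        "positive_share": positive_share,
    }


# One period: hypothetical exposures at T and returns at T + 1 for four stocks.
exposures = [1.0, 2.0, 3.0, 4.0]
next_returns = [0.01, 0.03, 0.02, 0.05]
one_period_ic = pearson(exposures, next_returns)

# Several periods: a hypothetical IC series.
stats = summarize_ic([0.10, -0.02, 0.06, 0.02])
```

A rank IC follows from the same `pearson` helper by passing the ranks of the exposures and returns instead of their raw values.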

import scipy.stats as st
def factor_IC_analysis(factorData, begin_date, end_date, rule='normal'):
    dateList = get_period_date('M', begin_date, end_date)
    IC = {}
    # NOTE: R_T is created once, outside the loop. After the first dropna() its index can only
    # shrink, so later months are evaluated on the surviving stocks only. Creating it inside the
    # loop would rebuild the cross-section each month; the results printed below come from the
    # code as shown.
    R_T = pd.DataFrame()
    for date in dateList[:-1]:
        # stock universe for this date
        stockList = list(factorData[date].index)
        # cross-sectional return over the next period
        df_close=get_price(stockList, date, dateList[dateList.index(date)+1], 'daily', ['close'])
        if df_close.empty:
            continue
        df_pchg=df_close['close'].iloc[-1,:]/df_close['close'].iloc[0,:]-1
        R_T['pchg']=df_pchg
        # factor values
        factor_data = factorData[date]["reverse"]
        #factor_data = winsorize_med(factor_data, scale=1, inclusive=True, inf2nan=True, axis=0)
        # industry and market-cap neutralization
        #factor_data = neutralize(factor_data, how=['sw_l1', 'market_cap'], date=dateList[0], axis=0)
        # standardization
        factor_data = standardlize(factor_data, inf2nan=True, axis=0)
        R_T['factor'] = factor_data
        R_T = R_T.dropna()
        if rule=='normal':
            # Pearson correlation of raw values
            IC[date]=st.pearsonr(R_T.pchg, R_T['factor'])[0]
        elif rule=='rank':
            # Pearson correlation of ranks, i.e. the rank IC
            IC[date]=st.pearsonr(R_T.pchg.rank(), R_T['factor'].rank())[0]
    IC = pd.Series(IC).dropna()
    print 'Mean of the IC series',IC.mean()
    print 'Mean of the absolute IC series',np.mean(np.abs(IC))
    print 'Standard deviation of the IC series',IC.std()
    print 'IR (mean of the IC series divided by its standard deviation)',IC.mean()/IC.std()
    n = [x for x in IC.values if x>0]
    print 'Share of IC values above zero',len(n)/float(len(IC))
factor_IC_analysis(factorData, begin_date, end_date)
Mean of the IC series 0.0492137327837
Mean of the absolute IC series 0.0851283558956
Standard deviation of the IC series 0.0917614290168
IR (mean of the IC series divided by its standard deviation) 0.536322650061
Share of IC values above zero 0.65

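As shown above, the mean IC is 0.0492, the IR is 0.54, and 65% of the IC values are above 0. These figures indicate that the factor predicts returns fairly stably and meets the screening criteria of the IC analysis, so it is judged to be an effective factor.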

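2.3 Layered Backtest

Strategy steps:
(1) On the last trading day of each month, compute the reversal factor value (M) for all A-shares;
(2) Sort the stocks by the reversal factor value (M) from smallest to largest and split them into 5 equal groups;
(3) On each rebalancing day, rebalance each group's portfolio, which yields the return curves of the 5 portfolios.
Evaluation: annualized backtest return, Sharpe ratio, maximum drawdown, win rate, and so on.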
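
Step (2), the grouping itself, can be sketched as follows. This is a minimal standard-library illustration rather than the strategy code used for the backtests; the helper name `split_into_groups`, the stock codes `S00` to `S09` and their factor values are hypothetical.

```python
def split_into_groups(factor_by_stock, n_groups=5):
    """Sort stocks by factor value (ascending) and cut them into n_groups nearly equal groups."""
    ranked = sorted(factor_by_stock, key=lambda code: factor_by_stock[code])
    n = len(ranked)
    groups = []
    for g in range(n_groups):
        # Integer boundaries spread any remainder across the groups.
        start = g * n // n_groups
        stop = (g + 1) * n // n_groups
        groups.append(ranked[start:stop])
    return groups


# Hypothetical factor values for ten stocks.
values = [0.5, -0.2, 0.1, 0.9, -0.7, 0.3, 0.0, -0.4, 0.6, 0.2]
factor_values = {"S%02d" % i: v for i, v in enumerate(values)}
groups = split_into_groups(factor_values)
```

`groups[0]` holds the stocks with the smallest factor values and `groups[4]` those with the largest; each rebalancing day would repeat the split with that month's factor values.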

#1 Import the required packages
import datetime
import numpy as np
import pandas as pd
import time
from jqdata import *
from pandas import Series, DataFrame
import matplotlib.pyplot as plt
import seaborn as sns
import itertools
import copy
import pickle

# Define the class 'parameter analysis'
class parameter_analysis(object):
    
    # Initialize the attributes used by the class
    def __init__(self, algorithm_id=None):
        self.algorithm_id = algorithm_id            # strategy id used for the backtests
        
        self.params_df = pd.DataFrame()             # all candidate parameter values; column names are the tuned variables, i.e. g.XXXX in the backtest
        self.results = {}                           # backtest returns; key is the row number in params_df, value is that backtest's results
        self.evaluations = {}                       # backtest metrics; key is the row number in params_df, value is a dataframe
        self.backtest_ids = {}                      # ids of the backtest results
        
        # backtest id of an additional benchmark; may be left empty (''), in which case the benchmark set in the backtest is used
        self.benchmark_id = 'ae0684d86e9e7128b1ab9c7d77893029'                      
        
        self.benchmark_returns = []                 # returns of the additional benchmark
        self.returns = {}                           # all returns
        self.excess_returns = {}                    # excess returns
        self.log_returns = {}                       # log of the returns
        self.log_excess_returns = {}                # log of the excess returns
        self.dates = []                             # all dates covered by the backtests
        self.excess_max_drawdown = {}               # maximum drawdown of the excess returns
        self.excess_annual_return = {}              # annualized excess return
        self.evaluations_df = pd.DataFrame()        # backtest metrics other than daily returns
    
    # 定义排队运行多参数回测函数
    def run_backtest(self,                          #
                     algorithm_id=None,             # 回测策略id
                     running_max=10,                # 回测中同时巡行最大回测数量
                     start_date='2006-01-01',       # 回测的起始日期
                     end_date='2016-11-30',         # 回测的结束日期
                     frequency='day',               # 回测的运行频率
                     initial_cash='1000000',        # 回测的初始持仓金额
                     param_names=[],                # 回测中调整参数涉及的变量
                     param_values=[]                # 回测中每个变量的备选参数值
                     ):
        # 当此处回测策略的 id 没有给出时,调用类输入的策略 id
        if algorithm_id == None: algorithm_id=self.algorithm_id
        
        # 生成所有参数组合并加载到 df 中
        # 包含了不同参数具体备选值的排列组合中一组参数的 tuple 的 list
        param_combinations = list(itertools.product(*param_values))
        # 生成一个 dataframe, 对应的列为每个调参的变量,每个值为调参对应的备选值
        to_run_df = pd.DataFrame(param_combinations)
        # 修改列名称为调参变量的名字
        to_run_df.columns = param_names
        
        # 设定运行起始时间和保存格式
        start = time.time()
        # 记录结束的运行回测
        finished_backtests = {}
        # 记录运行中的回测
        running_backtests = {}
        # 计数器
        pointer = 0
        # 总运行回测数目,等于排列组合中的元素个数
        total_backtest_num = len(param_combinations)
        # 记录回测结果的回报率
        all_results = {}
        # 记录回测结果的各项指标
        all_evaluations = {}
        
        # 在运行开始时显示
        print '【已完成|运行中|待运行】:', 
        # 当运行回测开始后,如果没有全部运行完全的话:
        while len(finished_backtests)<total_backtest_num:
            # 显示运行、完成和待运行的回测个数
            print('[%s|%s|%s].' % (len(finished_backtests), 
                                   len(running_backtests), 
                                   (total_backtest_num-len(finished_backtests)-len(running_backtests)) )),
            # 记录当前运行中的空位数量
            to_run = min(running_max-len(running_backtests), total_backtest_num-len(running_backtests)-len(finished_backtests))
            # 把可用的空位进行跑回测
            for i in range(pointer, pointer+to_run):
                # 备选的参数排列组合的 df 中第 i 行变成 dict,每个 key 为列名字,value 为 df 中对应的值
                params = to_run_df.ix[i].to_dict()
                # 记录策略回测结果的 id,调整参数 extras 使用 params 的内容
                backtest = create_backtest(algorithm_id = algorithm_id,
                                           start_date = start_date, 
                                           end_date = end_date, 
                                           frequency = frequency, 
                                           initial_cash = initial_cash, 
                                           extras = params, 
                                           # 再回测中把改参数的结果起一个名字,包含了所有涉及的变量参数值
                                           name = str(params)
                                           )
                # 记录运行中 i 回测的回测 id
                running_backtests[i] = backtest
            # 计数器计数运行完的数量    
            pointer = pointer+to_run
            
            # Collect backtest results
            failed = []
            finished = []
            # Keys of running_backtests are row positions in to_run_df
            for key in running_backtests.keys():
                # running_backtests[key] is the id of the backtest started for this row
                bt = get_backtest(running_backtests[key])
                # Status is 'done' or 'failed' once the run has ended; anything else means it is still running
                status = bt.get_status()
                # The backtest failed
                if status == 'failed':
                    # Record the key of the failed backtest
                    failed.append(key)
                # The backtest succeeded
                elif status == 'done':
                    # Record the key; finished only holds successful runs
                    finished.append(key)
                    # Returns per backtest: key is the row position in to_run_df,
                    # value is a list of dicts with time, cumulative return and benchmark return
                    all_results[key] = bt.get_results()
                    # Risk metrics per backtest: key is the row position in to_run_df
                    all_evaluations[key] = bt.get_risk()
            # Remove failed runs from the running set
            for key in failed:
                running_backtests.pop(key)
            # Move successful runs from the running set to the finished set
            for key in finished:
                finished_backtests[key] = running_backtests.pop(key)
            # Report timing whenever a full batch of concurrent backtests has finished
            if len(finished_backtests) != 0 and len(finished_backtests) % running_max == 0 and to_run !=0:
                # Current time
                middle = time.time()
                # Estimate the remaining time, assuming every backtest takes about the same time
                remain_time = (middle - start) * (total_backtest_num - len(finished_backtests)) / len(finished_backtests)
                # Print elapsed and estimated remaining time
                print('[%s h elapsed, about %s h remaining; please keep the browser open].' % (str(round((middle - start) / 60.0 / 60.0,3)),
                                          str(round(remain_time / 60.0 / 60.0,3)))),
            # Poll again in 5 seconds
            time.sleep(5)
        # Record the end time
        end = time.time()
        print ''
        print('[Backtests complete] Total time: %s s (about %s hours).' % (str(int(end-start)),
                                           str(round((end-start)/60.0/60.0,2)))),
        # Store the results on the instance
        self.params_df = to_run_df
        self.results = all_results
        self.evaluations = all_evaluations
        self.backtest_ids = finished_backtests

        
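    # Maximum drawdown of a cumulative-return series
    def find_max_drawdown(self, returns):
        # Largest drawdown seen so far
        result = 0
        # Highest cumulative return seen so far
        historical_return = 0
        # Walk through all dates
        for i in range(len(returns)):
            # Update the running peak
            historical_return = max(historical_return, returns[i])
            # Drawdown from the running peak at this date
            drawdown = 1-(returns[i] + 1) / (historical_return + 1)
            # Keep the largest drawdown
            result = max(drawdown, result)
        # Return the maximum drawdown
        return result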

    # Log returns, excess returns over the (new) benchmark and the maximum drawdown of the excess returns
    def organize_backtest_results(self, benchmark_id=None):
        # If no new benchmark backtest id is given
        if benchmark_id is None:
            # Use the default benchmark returns, as configured in the backtest strategy
            self.benchmark_returns = [x['benchmark_returns'] for x in self.results[0]]
        # If a new benchmark backtest id is given
        else:
            # Use the returns of that backtest as the benchmark
            self.benchmark_returns = [x['returns'] for x in get_backtest(benchmark_id).get_results()]
        # Dates are taken from the records of the first backtest
        self.dates = [x['time'] for x in self.results[0]]
        
        # For every backtest key, convert records of the form
        # {u'benchmark_returns': 0.022480100091729405,
        #  u'returns': 0.03184566700000002,
        #  u'time': u'2006-02-14'}
        # into {key: []}, where the list holds the cumulative return on each date
        for key in self.results.keys():
            self.returns[key] = [x['returns'] for x in self.results[key]]
        # Excess return over the benchmark (or the new benchmark)
        for key in self.results.keys():
            self.excess_returns[key] = [(x+1)/(y+1)-1 for (x,y) in zip(self.returns[key], self.benchmark_returns)]
        # Log returns
        for key in self.results.keys():
            self.log_returns[key] = [log(x+1) for x in self.returns[key]]
        # Log excess returns
        for key in self.results.keys():
            self.log_excess_returns[key] = [log(x+1) for x in self.excess_returns[key]]
        # Maximum drawdown of the excess returns
        for key in self.results.keys():
            self.excess_max_drawdown[key] = self.find_max_drawdown(self.excess_returns[key])
        # Annualized excess return (252 trading days per year)
        for key in self.results.keys():
            self.excess_annual_return[key] = (self.excess_returns[key][-1]+1)**(252./float(len(self.dates)))-1
        # Join the parameter combinations with the corresponding risk metrics
        self.evaluations_df = pd.concat([self.params_df, pd.DataFrame(self.evaluations).T], axis=1)

    # Produce the final analysis data: run the queued backtests, then organize the results
    def get_backtest_data(self,
                          algorithm_id=None,                         # id of the backtest strategy
                          benchmark_id=None,                         # backtest id of the new benchmark
                          file_name='results.pkl',                   # pickle file used to store the results
                          running_max=10,                            # maximum number of concurrent backtests
                          start_date='2006-01-01',                   # backtest start date
                          end_date='2016-11-30',                     # backtest end date
                          frequency='day',                           # backtest frequency
                          initial_cash='1000000',                    # initial capital
                          param_names=[],                            # names of the parameters to vary
                          param_values=[]                            # candidate values for each parameter
                          ):
        # Run the queued backtests with the given settings
        self.run_backtest(algorithm_id=algorithm_id,
                          running_max=running_max,
                          start_date=start_date,
                          end_date=end_date,
                          frequency=frequency,
                          initial_cash=initial_cash,
                          param_names=param_names,
                          param_values=param_values
                          )
        # Add log returns, excess returns and related metrics
        self.organize_backtest_results(benchmark_id)
        # Collect everything in one dict
        results = {'returns':self.returns,
                   'excess_returns':self.excess_returns,
                   'log_returns':self.log_returns,
                   'log_excess_returns':self.log_excess_returns,
                   'dates':self.dates,
                   'benchmark_returns':self.benchmark_returns,
                   'evaluations':self.evaluations,
                   'params_df':self.params_df,
                   'backtest_ids':self.backtest_ids,
                   'excess_max_drawdown':self.excess_max_drawdown,
                   'excess_annual_return':self.excess_annual_return,
                   'evaluations_df':self.evaluations_df}
        # Save to a pickle file; the with block closes the file even if dump fails
        with open(file_name, 'wb') as pickle_file:
            pickle.dump(results, pickle_file)

    # Load a saved pickle file and restore the corresponding attributes
    def read_backtest_data(self, file_name='results.pkl'):
        # The with block closes the file after loading
        with open(file_name, 'rb') as pickle_file:
            results = pickle.load(pickle_file)
        self.returns = results['returns']
        self.excess_returns = results['excess_returns']
        self.log_returns = results['log_returns']
        self.log_excess_returns = results['log_excess_returns']
        self.dates = results['dates']
        self.benchmark_returns = results['benchmark_returns']
        self.evaluations = results['evaluations']
        self.params_df = results['params_df']
        self.backtest_ids = results['backtest_ids']
        self.excess_max_drawdown = results['excess_max_drawdown']
        self.excess_annual_return = results['excess_annual_return']
        self.evaluations_df = results['evaluations_df']
        
    # Line chart of cumulative returns
    def plot_returns(self):
        # figsize sets the figure width and height in inches
        fig = plt.figure(figsize=(20,8))
        ax = fig.add_subplot(111)
        # One curve per backtest
        for key in self.returns.keys():
            ax.plot(range(len(self.returns[key])), self.returns[key], label=key)
        # Benchmark curve, drawn as a dashed black line
        ax.plot(range(len(self.benchmark_returns)), self.benchmark_returns, label='benchmark', c='k', linestyle='--') 
        ticks = [int(x) for x in np.linspace(0, len(self.dates)-1, 11)]
        plt.xticks(ticks, [self.dates[i] for i in ticks])
        # Legend style
        ax.legend(loc = 2, fontsize = 10)
        # y-axis label
        ax.set_ylabel('returns',fontsize=20)
        # Show y-axis ticks as percentages
        ax.set_yticklabels([str(x*100)+'% 'for x in ax.get_yticks()])
        # Figure title
        ax.set_title("Strategy's performances with different parameters", fontsize=21)
        plt.xlim(0, len(self.returns[0]))
    
    # Long-short portfolio chart
    def plot_long_short(self):
        # figsize sets the figure width and height in inches
        fig = plt.figure(figsize=(20,8))
        ax = fig.add_subplot(111)
        # Net values of group 1 (key 0) and group 5 (key 4), with an initial value of 1 prepended
        a1 = [i+1 for i in self.returns[0]]
        a2 = [i+1 for i in self.returns[4]]
        a1.insert(0,1)   
        a2.insert(0,1)
        # Daily long-short return: half the difference of the two daily growth ratios
        b = []
        for i in range(len(a1)-1):
            b.append((a1[i+1]/a1[i]-a2[i+1]/a2[i])/2)
        # Compound the daily returns into a net-value curve starting at 1
        c = []
        c.append(1)
        for i in range(len(b)):
            c.append(c[i]*(1+b[i]))
        ax.plot(range(len(c)), c, label='long_short')
        ticks = [int(x) for x in np.linspace(0, len(self.dates)-1, 11)]
        plt.xticks(ticks, [self.dates[i] for i in ticks])
        # Legend style
        ax.legend(loc = 2, fontsize = 10)
        # Figure title
        ax.set_title("Strategy's long_short performances",fontsize=20)
        plt.xlim(0, len(c))     
        return c
        
    # Return of each backtest in each calendar year
    def get_profit_year(self):
        profit_year = {}
        for key in self.returns.keys():
            temp = []
            date_year = []
            for i in range(len(self.dates)-1):
                if self.dates[i][:4] != self.dates[i+1][:4]:
                    temp.append(self.returns[key][i])
                    date_year.append(self.dates[i][:4])
            temp.append(self.returns[key][-1])
            date_year.append(self.dates[-1][:4]) 
            temp1 = []
            temp1.append(temp[0])
            for i in range(len(temp)-1):
                temp1.append((temp[i+1]+1)/(temp[i]+1)-1)
            profit_year[key] = temp1
        result = pd.DataFrame(index = list(self.returns.keys()), columns = date_year)
        for key in self.returns.keys():
            result.loc[key,:] = profit_year[key]
        return result
            
    # Excess-return chart
    def plot_excess_returns(self):
        
        # figsize sets the figure width and height in inches
        fig = plt.figure(figsize=(20,8))
        ax = fig.add_subplot(111)
        # One curve per backtest
        for key in self.returns.keys():
            ax.plot(range(len(self.excess_returns[key])), self.excess_returns[key], label=key)
        # Benchmark drawn as a dashed zero line
        ax.plot(range(len(self.benchmark_returns)), [0]*len(self.benchmark_returns), label='benchmark', c='k', linestyle='--')
        ticks = [int(x) for x in np.linspace(0, len(self.dates)-1, 11)]
        plt.xticks(ticks, [self.dates[i] for i in ticks])
        # Legend style
        ax.legend(loc = 2, fontsize = 10)
        # y-axis label
        ax.set_ylabel('excess returns',fontsize=20)
        # Show y-axis ticks as percentages
        ax.set_yticklabels([str(x*100)+'% 'for x in ax.get_yticks()])
        # Figure title
        ax.set_title("Strategy's performances with different parameters", fontsize=21)
        plt.xlim(0, len(self.excess_returns[0]))
        
    # Log-return chart
    def plot_log_returns(self):
        # figsize sets the figure width and height in inches
        fig = plt.figure(figsize=(20,8))
        ax = fig.add_subplot(111)
        # One curve per backtest
        for key in self.returns.keys():
            ax.plot(range(len(self.log_returns[key])), self.log_returns[key], label=key)
        # Benchmark log return, drawn as a dashed black line
        ax.plot(range(len(self.benchmark_returns)), [log(x+1) for x in self.benchmark_returns], label='benchmark', c='k', linestyle='--')
        ticks = [int(x) for x in np.linspace(0, len(self.dates)-1, 11)]
        plt.xticks(ticks, [self.dates[i] for i in ticks])
        # Legend style
        ax.legend(loc = 2, fontsize = 10)
        # y-axis label
        ax.set_ylabel('log returns',fontsize=20)
        # Figure title
        ax.set_title("Strategy's performances with different parameters", fontsize=21)
        plt.xlim(0, len(self.log_returns[0]))
    
    # Log excess-return chart
    def plot_log_excess_returns(self):
        # figsize sets the figure width and height in inches
        fig = plt.figure(figsize=(20,8))
        ax = fig.add_subplot(111)
        # One curve per backtest
        for key in self.returns.keys():
            ax.plot(range(len(self.log_excess_returns[key])), self.log_excess_returns[key], label=key)
        # Benchmark drawn as a dashed zero line
        ax.plot(range(len(self.benchmark_returns)), [0]*len(self.benchmark_returns), label='benchmark', c='k', linestyle='--')
        ticks = [int(x) for x in np.linspace(0, len(self.dates)-1, 11)]
        plt.xticks(ticks, [self.dates[i] for i in ticks])
        # Legend style
        ax.legend(loc = 2, fontsize = 10)
        # y-axis label
        ax.set_ylabel('log excess returns',fontsize=20)
        # Figure title
        ax.set_title("Strategy's performances with different parameters", fontsize=21)
        plt.xlim(0, len(self.log_excess_returns[0]))

        
    # Four main backtest metrics: total return, maximum drawdown, Sharpe ratio and volatility
    def get_eval4_bar(self, sort_by=[]): 
        
        sorted_params = self.params_df
        for by in sort_by:
            sorted_params = sorted_params.sort(by)
        indices = sorted_params.index
        
        fig = plt.figure(figsize=(20,7))

        # Subplot position
        ax1 = fig.add_subplot(221)
        # x axis: quantile group, y axis: metric
        ax1.bar(range(len(indices)), 
                [self.evaluations[x]['algorithm_return'] for x in indices], 0.6, label = 'Algorithm_return')
        plt.xticks([x+0.3 for x in range(len(indices))], indices)
        # Legend style
        ax1.legend(loc='best',fontsize=15)
        # y-axis label
        ax1.set_ylabel('Algorithm_return', fontsize=15)
        # Show y-axis ticks as percentages
        ax1.set_yticklabels([str(x*100)+'% 'for x in ax1.get_yticks()])
        # Subplot title
        ax1.set_title("Strategy's of Algorithm_return performances of different quantile", fontsize=15)
        # x-axis range
        plt.xlim(0, len(indices))

        # Subplot position
        ax2 = fig.add_subplot(224)
        # x axis: quantile group, y axis: metric
        ax2.bar(range(len(indices)), 
                [self.evaluations[x]['max_drawdown'] for x in indices], 0.6, label = 'Max_drawdown')
        plt.xticks([x+0.3 for x in range(len(indices))], indices)
        # Legend style
        ax2.legend(loc='best',fontsize=15)
        # y-axis label
        ax2.set_ylabel('Max_drawdown', fontsize=15)
        # Show y-axis ticks as percentages
        ax2.set_yticklabels([str(x*100)+'% 'for x in ax2.get_yticks()])
        # Subplot title
        ax2.set_title("Strategy's of Max_drawdown performances of different quantile", fontsize=15)
        # x-axis range
        plt.xlim(0, len(indices))
        # Subplot position
        ax3 = fig.add_subplot(223)
        # x axis: quantile group, y axis: metric
        ax3.bar(range(len(indices)),
                [self.evaluations[x]['sharpe'] for x in indices], 0.6, label = 'Sharpe')
        plt.xticks([x+0.3 for x in range(len(indices))], indices)
        # Legend style
        ax3.legend(loc='best',fontsize=15)
        # y-axis label; the Sharpe ratio is a plain number, so the ticks are not shown as percentages
        ax3.set_ylabel('Sharpe', fontsize=15)
        # Subplot title
        ax3.set_title("Strategy's of Sharpe performances of different quantile", fontsize=15)
        # x-axis range
        plt.xlim(0, len(indices))

        # Subplot position
        ax4 = fig.add_subplot(222)
        # x axis: quantile group, y axis: metric
        ax4.bar(range(len(indices)), 
                [self.evaluations[x]['algorithm_volatility'] for x in indices], 0.6, label = 'Algorithm_volatility')
        plt.xticks([x+0.3 for x in range(len(indices))], indices)
        # Legend style
        ax4.legend(loc='best',fontsize=15)
        # y-axis label
        ax4.set_ylabel('Algorithm_volatility', fontsize=15)
        # Show y-axis ticks as percentages
        ax4.set_yticklabels([str(x*100)+'% 'for x in ax4.get_yticks()])
        # Subplot title
        ax4.set_title("Strategy's of Algorithm_volatility performances of different quantile", fontsize=15)
        # x-axis range
        plt.xlim(0, len(indices))
        
        
    # Annualized return and maximum drawdown, drawn as positive and negative bars in two colors
    def get_eval(self, sort_by=[]):

        sorted_params = self.params_df
        for by in sort_by:
            sorted_params = sorted_params.sort(by)
        indices = sorted_params.index
        
        # Figure size
        fig = plt.figure(figsize = (20, 8))
        # Subplot position
        ax = fig.add_subplot(111)
        # Maximum drawdown, plotted as negative bars
        ax.bar([x+0.3 for x in range(len(indices))],
               [-self.evaluations[x]['max_drawdown'] for x in indices], color = '#32CD32',  
                     width = 0.6, label = 'Max_drawdown', zorder=10)
        # Annualized return
        ax.bar([x for x in range(len(indices))],
               [self.evaluations[x]['annual_algo_return'] for x in indices], color = 'r', 
                     width = 0.6, label = 'Annual_return')
        plt.xticks([x+0.3 for x in range(len(indices))], indices)
        # Zero line
        plt.plot([0, len(indices)], [0, 0], c='k', 
                 linestyle='--', label='zero')
        # Legend style
        ax.legend(loc='best',fontsize=15)
        # y-axis label
        ax.set_ylabel('Max_drawdown', fontsize=15)
        # Show y-axis ticks as percentages
        ax.set_yticklabels([str(x*100)+'% 'for x in ax.get_yticks()])
        # Figure title
        ax.set_title("Strategy's performances of different quantile", fontsize=15)
        # x-axis range
        plt.xlim(0, len(indices))

    # Annualized excess return and maximum drawdown of the excess return,
    # measured against the benchmark (or the new benchmark, if one was given)
    def get_excess_eval(self, sort_by=[]):

        sorted_params = self.params_df
        for by in sort_by:
            sorted_params = sorted_params.sort(by)
        indices = sorted_params.index
        
        # Figure size
        fig = plt.figure(figsize = (20, 8))
        # Subplot position
        ax = fig.add_subplot(111)
        # Maximum drawdown of the excess return, plotted as negative bars
        ax.bar([x+0.3 for x in range(len(indices))],
               [-self.excess_max_drawdown[x] for x in indices], color = '#32CD32',  
                     width = 0.6, label = 'Excess_max_drawdown')
        # Annualized excess return
        ax.bar([x for x in range(len(indices))],
               [self.excess_annual_return[x] for x in indices], color = 'r', 
                     width = 0.6, label = 'Excess_annual_return')
        plt.xticks([x+0.3 for x in range(len(indices))], indices)
        # Zero line
        plt.plot([0, len(indices)], [0, 0], c='k', 
                 linestyle='--', label='zero')
        # Legend style
        ax.legend(loc='best',fontsize=15)
        # y-axis label
        ax.set_ylabel('Max_drawdown', fontsize=15)
        # Show y-axis ticks as percentages
        ax.set_yticklabels([str(x*100)+'% 'for x in ax.get_yticks()])
        # Figure title
        ax.set_title("Strategy's performances of different quantile", fontsize=15)
        # x-axis range
        plt.xlim(0, len(indices))
def group_backtest(start_date,end_date,num):
    warnings.filterwarnings("ignore")
    pa = parameter_analysis()
    pa.get_backtest_data(file_name = 'results.pkl',
                          running_max = 10,
                          algorithm_id = 'df3c8774e33e3f94ad068574276d94a3',
                          start_date=start_date,
                          end_date=end_date,
                          frequency = 'day',
                          initial_cash = '10000000',
                          param_names = ['num'],
                          param_values = [num]                     
                          )
start_date = '2013-01-01' 
end_date = '2018-01-01' 
num = range(1,6)
group_backtest(start_date,end_date,num)
[Done|Running|Pending]: [0|0|5]. [0|5|0]. ... [1|4|0]. ... [2|3|0]. ... [3|2|0]. ... [4|1|0].
[Backtests complete] Total time: 1788 s (about 0.5 hours).

2.3.1 Return metrics of the layered backtest

pa = parameter_analysis()
pa.read_backtest_data('results.pkl')
pa.evaluations_df
num __version algorithm_return algorithm_volatility alpha annual_algo_return annual_bm_return avg_excess_return benchmark_return benchmark_volatility ... excess_return_sharpe information max_drawdown max_drawdown_period max_leverage period_label sharpe sortino trading_days treasury_return
0 1 101 1.932663 0.2481609 0.1608663 0.2477985 0.1012095 0.0005597281 0.5976734 0.2445952 ... 0.5400897 0.8510752 0.4187934 [2015-06-12, 2015-09-15] 0 2017-12 0.837354 0.9641156 1215 0.1994521
1 2 101 1.168112 0.233569 0.08594064 0.1726073 0.1012095 0.0002979421 0.5976734 0.2445952 ... 0.1628063 0.4691913 0.480852 [2015-06-12, 2016-01-28] 0 2017-12 0.5677436 0.6160955 1215 0.1994521
2 3 101 1.178165 0.2761494 0.07824558 0.1737241 0.1012095 0.0003107313 0.5976734 0.2445952 ... 0.1554558 0.4362111 0.493236 [2015-06-12, 2015-09-15] 0 2017-12 0.4842454 0.5400217 1215 0.1994521
3 4 101 0.8102447 0.2893782 0.03235476 0.1298801 0.1012095 0.00016554 0.5976734 0.2445952 ... -0.07909306 0.1625373 0.5212983 [2015-06-12, 2017-12-25] 0 2017-12 0.3105974 0.3480259 1215 0.1994521
4 5 101 0.2513894 0.3099486 -0.05167179 0.04722403 0.1012095 -0.0001188837 0.5976734 0.2445952 ... -0.4408885 -0.2673853 0.6343049 [2015-06-12, 2017-12-05] 0 2017-12 0.0233072 0.02718096 1215 0.1994521

5 rows × 24 columns

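2.3.2 Net value of the layered backtest

To compare the five portfolios more directly, the cumulative return curves of the five portfolios and of the HS300 benchmark are plotted below.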

pa.plot_returns()

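The chart shows that portfolio 1 clearly outperforms portfolio 5 and earns markedly higher returns than every other portfolio. Measured by total return over the period, portfolios 1 to 4 (193.3%, 116.8%, 117.8% and 81.0%) beat the HS300 benchmark (59.8%), while portfolio 5 (25.1%) lags it. This ordering is consistent with the single-factor effectiveness tests and supports the view that the reversal factor is effective.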

2.3.3 Portfolio backtest analysis

pa.get_eval4_bar()
pa.get_profit_year()
   2013       2014       2015       2016         2017
0  0.2506491  0.4726465  0.8775374  0.001742926  -0.1533904
1  0.1832616  0.3692289  0.7419862  -0.1218846   -0.1251602
2  0.1652528  0.4097099  0.7372976  -0.1098825   -0.1425292
3  0.1718918  0.4142254  0.6589003  -0.1714063   -0.2053621
4  0.1407754  0.3323591  0.5827961  -0.2567138   -0.3001741

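Looking at the performance of the five groups, annualized return and Sharpe ratio are roughly monotonic: annualized return falls from 24.8% for portfolio 1 to 4.7% for portfolio 5, and the Sharpe ratio from 0.84 to 0.02. Portfolio 1 also has the smallest maximum drawdown (41.9% against 63.4% for portfolio 5), which points to better risk control. The yearly returns of portfolios 1 to 5 are likewise roughly monotonic in every year, and portfolio 1 beats portfolio 5 in each of the five years, so the factor's stock-selection ability looks stable over time.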
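As a quick check of how monotonic the yearly returns are, the sketch below re-enters the get_profit_year() table shown above and computes two things per year: the spread between group 1 and group 5, and how many of the four adjacent group pairs are in the expected order. The variable names are illustrative and not part of the original notebook.

```python
import numpy as np

# Yearly returns copied from the get_profit_year() output above
# (rows: groups 1..5, columns: 2013..2017).
years = [2013, 2014, 2015, 2016, 2017]
yearly = np.array([
    [0.2506491, 0.4726465, 0.8775374, 0.001742926, -0.1533904],
    [0.1832616, 0.3692289, 0.7419862, -0.1218846, -0.1251602],
    [0.1652528, 0.4097099, 0.7372976, -0.1098825, -0.1425292],
    [0.1718918, 0.4142254, 0.6589003, -0.1714063, -0.2053621],
    [0.1407754, 0.3323591, 0.5827961, -0.2567138, -0.3001741],
])

# Spread between group 1 and group 5 in each year.
spread = yearly[0] - yearly[-1]

# Per year, the number of adjacent group pairs in which the better-ranked
# group earns more than the next one; 4 means strictly monotonic.
ordered_pairs = (np.diff(yearly, axis=0) < 0).sum(axis=0)

for y, s, k in zip(years, spread, ordered_pairs):
    print("%d: spread %.4f, ordered pairs %d/4" % (y, s, k))
```

The spread is positive in all five years, while only 2015 is strictly monotonic across all five groups, which is why "roughly monotonic" is the fair description.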

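2.3.4 Long-short portfolio net value

Judging from the layered backtest curves, each portfolio is quite volatile and the strategy carries considerable risk, so a long-short portfolio is constructed. It goes long portfolio 1 and short portfolio 5, rebalanced monthly. For ease of calculation, its daily return is defined as (daily return of portfolio 1 - daily return of portfolio 5) / 2, and compounding these daily returns gives the net value curve of the long-short portfolio.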
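The definition above can be sketched as a small standalone function. This is a minimal illustration, assuming the inputs are cumulative-return lists like those stored in pa.returns; the function name long_short_nav and the toy numbers are made up for this example.

```python
def long_short_nav(cum_ret_long, cum_ret_short):
    """Build a long-short net-value curve from two cumulative-return series.

    Each input is a list of cumulative returns (0.05 means +5% since the start).
    The daily long-short return is (r_long - r_short) / 2, as defined in the text.
    """
    # Convert cumulative returns to net values and prepend the starting value 1
    nav_long = [1.0] + [1.0 + r for r in cum_ret_long]
    nav_short = [1.0] + [1.0 + r for r in cum_ret_short]
    nav = [1.0]
    for t in range(1, len(nav_long)):
        r_long = nav_long[t] / nav_long[t - 1] - 1.0
        r_short = nav_short[t] / nav_short[t - 1] - 1.0
        # Compound half the daily return difference
        nav.append(nav[-1] * (1.0 + (r_long - r_short) / 2.0))
    return nav

# Toy inputs, made up for illustration only
nav = long_short_nav([0.02, 0.05], [0.01, 0.00])
```

Dividing by 2 treats the long and short legs as each using half of the capital, so the curve is comparable to a fully invested single portfolio.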

long_short = pa.plot_long_short()
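def MaxDrawdown(return_list):
    '''Maximum drawdown of a net-value series'''
    # Position where the drop from the running peak is largest (end of the drawdown)
    i = np.argmax((np.maximum.accumulate(return_list) - return_list) / np.maximum.accumulate(return_list))
    if i == 0:
        return 0
    # Position of the peak before it (start of the drawdown)
    j = np.argmax(return_list[:i])
    return (return_list[j] - return_list[i]) / (return_list[j])

def cal_indictor(long_short):
    total_return = long_short[-1] / long_short[0] - 1
    # Annualize with 250 trading days per year
    ann_return = pow((1+total_return), 250/float(len(long_short)))-1
    pchg = []
    # Daily returns
    for i in range(1, len(long_short)):
        pchg.append(long_short[i]/long_short[i-1] - 1)
    # Sum of squared deviations from the mean daily return
    avg = np.mean(pchg)
    temp = 0
    for i in pchg:
        temp += pow(i-avg, 2)
    # Annualized sample standard deviation of daily returns
    annualVolatility = np.sqrt(250/float((len(pchg)-1))*temp)
    # Sharpe ratio with a 4% annual risk-free rate
    sharpe_ratio = (ann_return - 0.04)/annualVolatility
    print "Total return: ", total_return
    print "Annualized return: ", ann_return
    print "Annualized volatility: ", annualVolatility
    print "Sharpe ratio: ",sharpe_ratio
    print "Max drawdown: ",MaxDrawdown(long_short)
cal_indictor(long_short)

Total return:  0.46021377323
Annualized return:  0.0809428225227
Annualized volatility:  0.044586380942
Sharpe ratio:  0.918280911292
Max drawdown:  0.0812605354762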

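As the chart shows, the net value curve of the long-short portfolio is clearly less volatile than that of any single portfolio. It delivers steadier returns and better risk control.
In summary, the layered backtest indicates that the reversal factor (M) is effective.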
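The reported figures can be cross-checked against each other with a few lines of arithmetic. The sketch below re-enters the values printed by cal_indictor above. The number of points in the curve (1216) is an assumption, taken as the 1215 trading days listed in the evaluation table plus the initial value of 1 that plot_long_short prepends.

```python
# Values copied from the cal_indictor output above
total_return = 0.46021377323
ann_return = 0.0809428225227
ann_volatility = 0.044586380942
risk_free = 0.04  # annual risk-free rate hard-coded in cal_indictor

# Sharpe ratio as defined in cal_indictor
sharpe = (ann_return - risk_free) / ann_volatility

# Annualized return, assuming the curve has 1216 points (see the lead-in)
n_points = 1216
ann_check = (1 + total_return) ** (250.0 / n_points) - 1

print("Sharpe: %.6f" % sharpe)
print("Annualized return check: %.6f" % ann_check)
```

Both values agree with the printed output, so the Sharpe ratio of about 0.92 follows directly from an 8.1% annualized return, a 4% risk-free rate and 4.5% annualized volatility.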

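3 Further analysis

3.1 Sensitivity to the parameter N

So far the reversal factor has been computed from the past N (N = 20) days of data, but how the choice of N affects factor effectiveness is still unclear. This section therefore repeats the analysis for different values of N, as follows.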

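def GetData(N):
    # Build the factor table for every rebalancing date, using an N-day lookback
    factordata = {}
    for date in dateList:
        stockList = get_stock_A(date)
        # N+1 daily bars, so that N daily returns are available
        df_data = get_price(stockList, count = N+1, end_date=date, frequency='1d', fields=['money','close'])
        Amount = df_data["money"]
        Amount = Amount.iloc[1:]
        Pchg = df_data["close"].pct_change()
        Pchg = Pchg.iloc[1:]  
        # Number of trades per day and stock, aligned with the return window
        trade = Trades.loc[Pchg.index,Pchg.columns]
        # Average amount per trade
        SingleAmount = Amount / trade
        result = pd.DataFrame(index = SingleAmount.columns)
        M_high = []
        M_low = []
        for i in SingleAmount.columns:
            # Sort the days of stock i by average amount per trade, largest first
            temp = SingleAmount.sort([i], ascending = False)
            # The high group is fixed at the top 10 days for every N (this equals N/2 only when N = 20)
            M_high.append((1+Pchg.loc[temp.index[:10], i]).cumprod()[-1] - 1)
            M_low.append((1+Pchg.loc[temp.index[10:], i]).cumprod()[-1] - 1)
        result["M_high"] = M_high
        result["M_low"] = M_low
        # Factor value: negative of (high-group return minus low-group return)
        result["reverse"] = -1 *(result["M_high"] - result["M_low"])
        # Plain N-day return over the same window
        result["ret"] = df_data["close"].iloc[-1] / df_data["close"].iloc[0] - 1
        factordata[date] = result
    return factordata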

begin_date = '2013-01-01'
end_date = '2018-01-01'
dateList = get_period_date('M',begin_date, end_date)
Trades = pd.read_csv("trades.csv", index_col = 0)
Trades.columns = [normalize_code(code) for code in Trades.columns]
Trades.index = [datetime.datetime.strptime(str(i), "%Y%m%d") for i in Trades.index]
factorData_40 = GetData(40)
factorData_60 = GetData(60)
import scipy.stats as st
def factor_IC(factorData, begin_date, end_date, rule='normal'):  
    # rule='normal' gives the Pearson IC, rule='rank' the rank IC
    dateList = get_period_date('M', begin_date, end_date)
    IC = {}
    for date in dateList[:-1]:
        # Stock universe on this date
        stockList = list(factorData[date].index)
        # Cross-sectional return from this rebalancing date to the next one
        df_close=get_price(stockList, date, dateList[dateList.index(date)+1], 'daily', ['close'])
        if df_close.empty:
            continue
        df_pchg=df_close['close'].iloc[-1,:]/df_close['close'].iloc[0,:]-1
        # Start from a fresh frame each period, so rows dropped in earlier periods
        # do not shrink the later cross-sections
        R_T = pd.DataFrame()
        R_T['pchg']=df_pchg
        # Factor values
        factor_data = factorData[date]["reverse"]
        #factor_data = winsorize_med(factor_data, scale=1, inclusive=True, inf2nan=True, axis=0)
        # Industry and market-cap neutralization (disabled)
        #factor_data = neutralize(factor_data, how=['sw_l1', 'market_cap'], date=dateList[0], axis=0)
        # Standardize the factor
        factor_data = standardlize(factor_data, inf2nan=True, axis=0)
        R_T['factor'] = factor_data
        R_T = R_T.dropna()
        if rule=='normal':
            IC[date]=st.pearsonr(R_T.pchg, R_T['factor'])[0]
        elif rule=='rank':
            IC[date]=st.pearsonr(R_T.pchg.rank(), R_T['factor'].rank())[0]
    IC = pd.Series(IC).dropna()
    return IC.mean()
IC_20 = factor_IC(factorData, begin_date, end_date)
IC_40 = factor_IC(factorData_40, begin_date, end_date)
IC_60 = factor_IC(factorData_60, begin_date, end_date)
print "N = 20 mean IC: ", IC_20
print "N = 40 mean IC: ", IC_40
print "N = 60 mean IC: ", IC_60
N = 20 mean IC:  0.0492137327837
N = 40 mean IC:  0.041864465093
N = 60 mean IC:  0.0305036425806

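For N = 20, 40 and 60 the mean IC stays clearly positive (0.049, 0.042 and 0.031), but factor effectiveness declines steadily as N grows. A sensible choice of N therefore helps to obtain a factor that predicts future returns better.
Among the values tested, N = 20 gives the most effective factor.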
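For readers who want to see what a single-period IC measures, here is a minimal sketch on synthetic data. It uses only NumPy; the function names pearson_ic and rank_ic and the random data are made up for this example and are not part of the notebook, which relies on scipy.stats.pearsonr and real cross-sections.

```python
import numpy as np

def pearson_ic(factor, fwd_ret):
    """Pearson correlation between factor values and next-period returns."""
    return float(np.corrcoef(factor, fwd_ret)[0, 1])

def rank_ic(factor, fwd_ret):
    """Rank IC: Pearson correlation of the ranks (assumes no tied values)."""
    def rank(x):
        # argsort of argsort gives the 0-based rank of each element
        return np.argsort(np.argsort(x))
    return pearson_ic(rank(factor), rank(fwd_ret))

# Synthetic cross-section of 500 stocks, made up for illustration
rng = np.random.RandomState(0)
factor = rng.randn(500)
fwd_ret = 0.05 * factor + rng.randn(500)  # weak relation plus noise

ic = pearson_ic(factor, fwd_ret)
ric = rank_ic(factor, fwd_ret)
```

The mean IC values reported above are averages of such single-period correlations over all monthly rebalancing dates.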

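3.2 Tests on another stock universe

The ideal reversal factor has so far been tested on all A shares. To check its effectiveness further, the factor is analyzed on another universe, the HS300 constituents: a five-group long-short portfolio is built, and the ideal reversal factor is compared with the original reversal factor Ret20.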

begin_date = '2013-01-01'
end_date = '2018-01-01'
dateList = get_period_date('M',begin_date, end_date)
HS300Data = {}
for date in dateList:
    stockList = get_index_stocks('000300.XSHG',date)
    df_data = get_price(stockList, count = 21, end_date=date, frequency='1d', fields=['money','close'])
    Amount = df_data["money"]
    Amount = Amount.iloc[1:]
    Pchg = df_data["close"].pct_change()
    Pchg = Pchg.iloc[1:]  
    trade = Trades.loc[Pchg.index,Pchg.columns]
    SingleAmount = Amount / trade
    result = pd.DataFrame(index = SingleAmount.columns)
    M_high = []
    M_low = []
    for i in SingleAmount.columns:
        temp = SingleAmount.sort([i], ascending = False)
        M_high.append((1+Pchg.loc[temp.index[:10], i]).cumprod()[-1] - 1)
        M_low.append((1+Pchg.loc[temp.index[10:], i]).cumprod()[-1] - 1)
    result["M_high"] = M_high
    result["M_low"] = M_low
    result["reverse"] = -1 *(result["M_high"] - result["M_low"])
    result["ret20"] = -1*(df_data["close"].iloc[-1] / df_data["close"].iloc[0] - 1)
    HS300Data[date] = result
HS300Data[date].head()
             M_high    M_low      reverse    ret20
000001.XSHE  0.033571  -0.010335  -0.043905  -0.022889
000002.XSHE  0.080624  -0.064591  -0.145214  -0.010825
000008.XSHE  0.035967  -0.016506  -0.052473  -0.018868
000060.XSHE  0.039267  -0.015695  -0.054961  -0.022956
000063.XSHE  0.083332  0.000093   -0.083239  -0.083433
def GetPchg(field):
    # Monthly long-short return of the top and bottom fifth of HS300, ranked by the given column
    pchg = []
    for date in dateList[:-1]:
        tempData = HS300Data[date].sort([field], ascending = False)
        # 60 stocks on each side, i.e. one fifth of the 300 constituents
        top5 = list(tempData.index[:60])
        last5 = list(tempData.index[-60:])
        df_close = get_price(top5, date, dateList[dateList.index(date)+1], 'daily', ['close'])['close']
        top5_pchg = df_close.iloc[-1] / df_close.iloc[0] - 1
        df_close = get_price(last5, date, dateList[dateList.index(date)+1], 'daily', ['close'])['close']
        last5_pchg = df_close.iloc[-1] / df_close.iloc[0] - 1
        # Half the difference between the two equal-weighted group returns
        pchg.append((np.mean(top5_pchg) - np.mean(last5_pchg)) / 2)
    return pchg
pchgReverse = GetPchg('reverse')
pchRet20 = GetPchg('ret20')
# figsize sets the figure width and height in inches
fig = plt.figure(figsize=(20,8))
ax = fig.add_subplot(111)
# Compound the monthly long-short returns into net-value curves starting at 1
netValue1 = []
netValue1.append(1)
netValue2 = []
netValue2.append(1)
for i in range(len(pchgReverse)):
    netValue1.append(netValue1[i]*(1+pchgReverse[i]))
for i in range(len(pchRet20)):
    netValue2.append(netValue2[i]*(1+pchRet20[i]))
ax.plot(range(len(netValue1)), netValue1, label='ideal reversal')
ax.plot(range(len(netValue2)), netValue2, label='Ret20')
ticks = [int(x) for x in np.linspace(0, len(dateList)-1, 11)]
plt.xticks(ticks, [dateList[i] for i in ticks])
# Legend style
ax.legend(loc = 2, fontsize = 10)
ax.set_title("Strategy's long_short performances",fontsize=20)
    

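The chart above shows the five-group long-short net value curves for the original reversal factor Ret20 and for the ideal reversal factor. For the ideal reversal factor, the five-group long-short portfolio has a total return of 47.74%, annualized volatility of 31.38%, a Sharpe ratio of 0.314 and a maximum drawdown of 9.69%. For the original reversal factor Ret20, the total return is 16.22%, annualized volatility 53.88%, the Sharpe ratio -0.016 and the maximum drawdown 20.88%.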
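The per-period return behind these curves can be sketched as follows. This is an illustrative NumPy version of the five-group long-short step, assuming clean inputs without missing values; the function name quintile_long_short and the toy numbers are made up for this example.

```python
import numpy as np

def quintile_long_short(factor, fwd_ret):
    """Half the spread between the top and bottom fifth, ranked by factor value.

    factor and fwd_ret are equal-length sequences, one entry per stock.
    Assumes no missing values.
    """
    factor = np.asarray(factor, dtype=float)
    fwd_ret = np.asarray(fwd_ret, dtype=float)
    # Stock positions ordered by factor value, largest first
    order = np.argsort(-factor)
    n = len(order) // 5
    top, bottom = order[:n], order[-n:]
    # Equal-weighted group returns, long the top and short the bottom
    return (fwd_ret[top].mean() - fwd_ret[bottom].mean()) / 2.0

# Toy cross-section of 10 stocks, made up for illustration
factor = [5, 4, 3, 2, 1, 0, -1, -2, -3, -4]
fwd_ret = [0.04, 0.02, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.01, -0.03]
ls = quintile_long_short(factor, fwd_ret)
```

With 300 constituents, one fifth is 60 stocks per side, which is the group size hard-coded in GetPchg.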

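3.3 How factor returns accumulate

All factor backtests in this report rebalance monthly, but more frequent rebalancing may also be of interest. To explore this, the accumulation of factor returns is computed: the cumulative return of the long-short portfolio from the rebalancing day through each of the following trading days, up to the twentieth.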

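pchg = []
# Split the long-short curve into consecutive 20-day blocks
for i in range(0, len(long_short) - 20, 20):
    tempPchg = []
    for j in range(20):
        # Cumulative return j days after the start of the block
        tempPchg.append(long_short[i+j] / long_short[i] - 1)
    pchg.append(tempPchg)
# Average the cumulative return on each day across all blocks
pchgTotal = []
for i in range(20):
    pchgTotal.append(np.mean(np.array(pchg)[:,i]))
plt.bar(range(len(pchgTotal)), pchgTotal)  
plt.show()  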

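The chart above shows how the long-short return accumulates after positions are opened at the start of the month, for the ideal reversal factor with N = 20 (all A shares, five groups). Since the return accumulates fairly evenly over the holding period, a qualitative judgment is that weekly or half-monthly rebalancing is worth trying.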
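The same averaging can be written in vectorized form. The sketch below is an illustrative NumPy version of the block-wise accumulation, using the same block starts as range(0, len(nav) - window, window); the function name average_accumulation and the toy curve are made up for this example.

```python
import numpy as np

def average_accumulation(nav, window=20):
    """Average cumulative return on each day of consecutive `window`-day blocks."""
    nav = np.asarray(nav, dtype=float)
    # Block start positions, matching range(0, len(nav) - window, window)
    starts = np.arange(0, len(nav) - window, window)
    # Row k holds the positions start_k, start_k + 1, ..., start_k + window - 1
    idx = starts[:, None] + np.arange(window)[None, :]
    # Cumulative return of each day relative to the start of its block
    paths = nav[idx] / nav[starts][:, None] - 1.0
    return paths.mean(axis=0)

# Toy net-value curve growing 1% per day, made up for illustration
toy_nav = 1.01 ** np.arange(61)
profile = average_accumulation(toy_nav)
```

Note that the first entry is always 0, because day 0 of each block is compared with itself. The blocks are fixed 20-day slices of the curve, so they only approximate the actual monthly rebalancing dates.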

3.4 分组比例的影响¶

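So far, the high-D and low-D groups have each taken half of the lookback trading days, i.e. N/2 days. How much does the result change if the split ratio is adjusted?
Taking N=60 as an example, the X trading days with the largest single-trade amount form the high-D group and the remaining 60-X days form the low-D group. We sweep X from 10 to 50 in steps of 4 and compute the mean information coefficient (IC) of the M factor for each value.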
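
For a single stock, the split can be sketched as follows. This is an illustration on made-up numbers, using plain NumPy arrays rather than the pandas frames in the article's code. The function name `split_by_single_trade_amount` is hypothetical. The sign of the final factor follows the article's code, which stores `-1 * (M_high - M_low)`.

```python
import numpy as np

def split_by_single_trade_amount(amount, trades, pchg, x):
    """Split one stock's lookback days into high-D and low-D groups.

    amount: daily turnover, trades: daily trade count,
    pchg: daily return, x: number of days in the high-D group.
    """
    amount = np.asarray(amount, dtype=float)
    trades = np.asarray(trades, dtype=float)
    pchg = np.asarray(pchg, dtype=float)
    # D: average amount per trade on each day
    single_amount = amount / trades
    # Day positions ordered by D, largest first
    order = np.argsort(-single_amount, kind="mergesort")
    # Compounded return over the high-D days and over the remaining days
    m_high = np.prod(1.0 + pchg[order[:x]]) - 1.0
    m_low = np.prod(1.0 + pchg[order[x:]]) - 1.0
    return m_high, m_low, -(m_high - m_low)

# Four toy days; D = 10, 40, 9, 20, so the high-D days are the 2nd and 4th
m_high, m_low, factor = split_by_single_trade_amount(
    amount=[100.0, 400.0, 90.0, 200.0],
    trades=[10, 10, 10, 10],
    pchg=[0.01, 0.10, -0.02, 0.05],
    x=2,
)
print(m_high, m_low, factor)
```

Because compounding is order-independent, multiplying the returns of the selected days gives the same value as the last element of the cumulative product used in the article's code, provided there are no missing values.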

begin_date = '2013-01-01'
end_date = '2018-01-01'
dateList = get_period_date('M',begin_date, end_date)
Trades = pd.read_csv("trades.csv", index_col = 0)
Trades.columns = [normalize_code(code) for code in Trades.columns]
Trades.index = [datetime.datetime.strptime(str(i), "%Y%m%d") for i in Trades.index]
factorData_60 = {}
for date in dateList:
    stockList = get_stock_A(date)
    df_data = get_price(stockList, count = 61, end_date=date, frequency='1d', fields=['money','close'])
    Amount = df_data["money"]
    Amount = Amount.iloc[1:]
    Pchg = df_data["close"].pct_change()
    Pchg = Pchg.iloc[1:]  
    trade = Trades.loc[Pchg.index,Pchg.columns]
    SingleAmount = Amount / trade
    result = pd.DataFrame(index = SingleAmount.columns)
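    for j in range(10,51,4):
        # j is X, the number of days assigned to the high-D group
        M_high = []
        M_low = []
        for i in SingleAmount.columns:
            # Rank this stock's days by single-trade amount, largest first
            # (sort_values replaces DataFrame.sort, which newer pandas removed)
            temp = SingleAmount.sort_values(by = i, ascending = False)
            # Compounded return over the high-D days and over the remaining low-D days
            M_high.append((1+Pchg.loc[temp.index[:j], i]).cumprod().iloc[-1] - 1)
            M_low.append((1+Pchg.loc[temp.index[j:], i]).cumprod().iloc[-1] - 1)
        result["M_high"] = M_high
        result["M_low"] = M_low
        result[j] = -1 *(result["M_high"] - result["M_low"])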
    factorData_60[date] = result
import scipy.stats as st
def factor_IC_analysis(factorData, begin_date, end_date):
    dateList = get_period_date('M', begin_date, end_date)
    result = []
    for date in dateList[:-1]:
        # Stock universe for this date
        stockList = list(factorData[date].index)
        # Cross-sectional return from this date to the next rebalance date
        df_close=get_price(stockList, date, dateList[dateList.index(date)+1], 'daily', ['close'])
        if df_close.empty:
            continue
        df_pchg=df_close['close'].iloc[-1,:]/df_close['close'].iloc[0,:]-1
        IC = []
        for j in range(10,51,4):
            # Factor values for X = j
            factor_data = factorData[date][j]
            # Standardize the factor
            factor_data = standardlize(factor_data, inf2nan=True, axis=0)
            # Build a fresh frame each time so that rows dropped for one date or X
            # do not shrink the universe used for later ones
            R_T = pd.DataFrame({'pchg': df_pchg, 'factor': factor_data}).dropna()
            IC.append(st.pearsonr(R_T['pchg'], R_T['factor'])[0])
        result.append(IC)
    return result
result = factor_IC_analysis(factorData_60, begin_date, end_date)
xtick = range(10,51,4)
IC = []
for j in range(len(xtick)):
    IC.append(np.mean(np.array(result)[:,j]))
plt.plot(xtick, IC)

The result is shown in the figure above. Any X between 25 and 50 gives good results, and the IC reaches its maximum at X = 42.
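
The IC used above is the cross-sectional Pearson correlation between factor values and next-period returns, averaged over rebalance dates. The sketch below computes it on toy data with `np.corrcoef`, which yields the same coefficient as `st.pearsonr`. It also reports an IR under the assumption that IR means the mean IC divided by the standard deviation of the IC series, a common convention. The article does not state its own definition, and the function name `ic_summary` is hypothetical.

```python
import numpy as np

def ic_summary(factors, returns):
    """Per-period IC, mean IC, and IR (assumed: mean IC / std of IC)."""
    ics = []
    for f, r in zip(factors, returns):
        f = np.asarray(f, dtype=float)
        r = np.asarray(r, dtype=float)
        # Keep only stocks with both a factor value and a return
        ok = np.isfinite(f) & np.isfinite(r)
        ics.append(np.corrcoef(f[ok], r[ok])[0, 1])
    ics = np.array(ics)
    mean_ic = ics.mean()
    ir = mean_ic / ics.std(ddof=1)
    return ics, mean_ic, ir

# Two toy periods with three stocks each, not the article's data
ics, mean_ic, ir = ic_summary(
    factors=[[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]],
    returns=[[1.0, 2.0, 3.0], [1.0, 3.0, 2.0]],
)
print(ics, mean_ic, ir)
```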

Summary

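The tests above examined the effectiveness of the ideal reversal factor. The preliminary conclusions are:
(1) Single-factor effectiveness analysis of the ideal reversal factor: in the significance test of factor returns, the mean of the absolute t-value series is 3.96. In the IC analysis, the mean of the IC series is 0.0492 and the IR is 0.54. In the layered backtest, portfolio 1 clearly outperforms portfolio 5, every portfolio beats the HS300 index, and portfolio 1 earns a clearly higher return.
(2) Looking deeper, the IC analysis is also strong for N = 20, 40, and 60, but the factor's effectiveness declines steadily as N increases, so choosing N sensibly helps obtain a factor that predicts future returns more effectively.
(3) For the HS300 stock pool, the five-group long-short hedge on the ideal reversal factor has a total return of 47.74%, annualized volatility of 31.38%, a Sharpe ratio of 0.314, and a maximum drawdown of 9.69%. For the original reversal factor Ret20, the total return is 16.22%, annualized volatility 53.88%, the Sharpe ratio -0.016, and the maximum drawdown 20.88%.
(4) Because the long-short return accumulates fairly evenly, weekly or semi-monthly rebalancing is worth trying. For N=60, any high-D group size X between 25 and 50 gives good results, and the IC peaks at X = 42.
(5) The study offers one approach to mining order book information and a reference for mining high-frequency trading data.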
