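Research objective:
This article follows the GF Securities research report "Factor Weights from the Perspective of Maximizing the Composite Factor's Single-Period IC" (《从最大化复合因子单期 IC 角度看因子权重》). According to the report, the factor-weighting methods in common use today are equal weighting, IC weighting, IC_IR weighting, and optimized (maximized) IC_IR weighting. Equal weighting is the most traditional method, and its results are visibly affected by differences in factor effectiveness and by linear correlation between factors. IC weighting and IC_IR weighting address equal weighting's neglect of differences in factor effectiveness and outperform it in most cases. Weighting that maximizes the composite factor's IC_IR is already widely used.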
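Research content:
(1) Limitations of traditional factor weighting: with ZZ800 (CSI 800) as the stock universe, and taking the market-cap factor and year-on-year operating profit growth as an example, we compare equal weighting with IC weighting and assess both schemes from the backtest results;
(2) The theoretically optimal weights that maximize the composite factor's single-period IC: this article adopts Qian's optimization framework to obtain factor weights, but changes the objective from maximizing the composite factor's IC_IR to maximizing its single-period IC. The closed-form solution shows that the resulting weights depend on two things: the effectiveness of each factor, that is, its IC, and the correlation coefficients between factors.
(3) Application of maximizing the single-period IC: the empirical example shows that maximizing the single-period IC effectively corrects the allocation bias of equal weighting; across most of the factor space, the portfolio built with optimal IC weighting outperforms the portfolio built with equal weighting.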
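Research conclusions:
(1) In the example with the market-cap factor and year-on-year operating profit growth, IC weighting addresses equal weighting's neglect of differences in factor effectiveness and outperforms equal weighting in most cases.
(2) This article adopts Qian's optimization framework to obtain factor weights, but changes the objective from maximizing the composite factor's IC_IR to maximizing its single-period IC, and computes the factor weights accordingly. The closed-form solution shows that the weights depend on two things: the effectiveness of each factor, that is, its IC, and the correlation coefficients between factors.
(3) The weighting methods are tested on 7 factors: price-to-earnings (PE), price-to-book (PB), price-to-sales (PS), year-on-year operating profit growth, the debt-to-asset ratio, reversal (cumulative return over the previous month) and turnover (average daily turnover over the previous 10 trading days). The empirical results also show that maximizing the single-period IC effectively corrects the allocation bias of equal weighting: in most cases the portfolio built with optimal IC weighting outperforms the equal-weight portfolio, and it gives the best result of the three schemes tested.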
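Factor data are extracted at the end of each month, so the month-end dates have to be collected first.
The input parameters are peroid, start_date and end_date. peroid selects the period, which can be week (W), month (M) or quarter (Q); start_date and end_date are the start and end dates.
The function returns the corresponding period-end dates, preceded by the day before start_date. The example below uses a start date of 2018-01-01 and an end date of 2019-01-01.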
from jqdata import *
import datetime
import pandas as pd
import numpy as np
from six import StringIO
import warnings
import time
import pickle
from jqfactor import standardlize
from jqfactor import winsorize_med
from jqfactor import neutralize
import matplotlib.pyplot as plt
import scipy.stats as st
warnings.filterwarnings("ignore")
plt.rcParams['axes.unicode_minus']=False
# Get the list of period-end dates for a given period: 'W', 'M' or 'Q'
def get_period_date(peroid, start_date, end_date):
    # peroid is a pandas resample rule: 'W' for weekly, 'M' for monthly, 'Q' for quarterly
    # (rules such as '5min' or '12D' are also valid)
    stock_data = get_price('000001.XSHE', start_date, end_date, 'daily', fields=['close'])
    # Record the trading date of each row
    stock_data['date'] = stock_data.index
    # Resample so that each period takes the values of its last trading day;
    # the resulting index is labelled with the calendar end of each period
    period_stock_data = stock_data.resample(peroid, how='last')
    date = period_stock_data.index
    pydate_array = date.to_pydatetime()
    date_only_array = np.vectorize(lambda s: s.strftime('%Y-%m-%d'))(pydate_array)
    date_only_series = pd.Series(date_only_array)
    # Prepend the day before start_date as the first date in the list
    start_date = datetime.datetime.strptime(start_date, "%Y-%m-%d")
    start_date = start_date - datetime.timedelta(days=1)
    start_date = start_date.strftime("%Y-%m-%d")
    date_list = date_only_series.values.tolist()
    date_list.insert(0, start_date)
    return date_list
get_period_date('M','2018-01-01', '2019-01-01')
['2017-12-31', '2018-01-31', '2018-02-28', '2018-03-31', '2018-04-30', '2018-05-31', '2018-06-30', '2018-07-31', '2018-08-31', '2018-09-30', '2018-10-31', '2018-11-30', '2018-12-31']
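The monthly case can be cross-checked without any market data. The following is a minimal sketch in Python 3 using only the standard library; `month_end_dates` is a hypothetical helper, not part of jqdata, and it assumes whole calendar months, so it mirrors only the `'M'` behaviour of `get_period_date`.

```python
import calendar
import datetime


def month_end_dates(start_date, end_date):
    """Calendar month-end dates up to end_date, preceded by the day before start_date."""
    start = datetime.datetime.strptime(start_date, "%Y-%m-%d").date()
    end = datetime.datetime.strptime(end_date, "%Y-%m-%d").date()
    # First entry: the day before the start date, as in get_period_date
    dates = [(start - datetime.timedelta(days=1)).strftime("%Y-%m-%d")]
    year, month = start.year, start.month
    while True:
        last_day = calendar.monthrange(year, month)[1]
        month_end = datetime.date(year, month, last_day)
        if month_end > end:
            break
        dates.append(month_end.strftime("%Y-%m-%d"))
        # Step to the next calendar month
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return dates


dates_2018 = month_end_dates('2018-01-01', '2019-01-01')
print(dates_2018)
```

For the 2018 example this yields 13 dates, from 2017-12-31 through 2018-12-31, which matches the list returned above.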
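Stock universe: ZZ800
Stock screening: ST stocks and stocks listed for less than 3 months are removed; each stock is treated as one sample
As an example, the ZZ800 universe is built and filtered as of 2016-08-31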
# Drop stocks listed less than 3 months (n days) before beginDate
def delect_stop(stocks, beginDate, n=30*3):
    stockList = []
    beginDate = datetime.datetime.strptime(beginDate, "%Y-%m-%d")
    for stock in stocks:
        start_date = get_security_info(stock).start_date
        if start_date < (beginDate - datetime.timedelta(days=n)).date():
            stockList.append(stock)
    return stockList

# Build the stock universe
def get_stock_ZZ800(begin_date):
    begin_date = str(begin_date)
    # No date is passed here, so the default constituent date is used rather than
    # begin_date; get_index_stocks also accepts a date argument for point-in-time membership
    stockList = get_index_stocks('000906.XSHG')
    # Drop ST stocks
    st_data = get_extras('is_st', stockList, count = 1, end_date=begin_date)
    stockList = [stock for stock in stockList if not st_data[stock][0]]
    # Drop recently listed stocks
    stockList = delect_stop(stockList, begin_date)
    return stockList
get_stock_ZZ800('2016-08-31')
[u'000001.XSHE', u'000002.XSHE', u'000006.XSHE', u'000008.XSHE', u'000009.XSHE', u'000012.XSHE', u'000021.XSHE', u'000025.XSHE', u'000027.XSHE', u'000028.XSHE', u'000031.XSHE', u'000039.XSHE', u'000060.XSHE', u'000061.XSHE', u'000062.XSHE', u'000063.XSHE', u'000066.XSHE', u'000069.XSHE', u'000078.XSHE', u'000089.XSHE', u'000090.XSHE', u'000100.XSHE', u'000156.XSHE', u'000157.XSHE', u'000158.XSHE', u'000166.XSHE', u'000301.XSHE', u'000333.XSHE', u'000338.XSHE', u'000400.XSHE', u'000401.XSHE', u'000402.XSHE', u'000413.XSHE', u'000415.XSHE', u'000423.XSHE', u'000425.XSHE', u'000426.XSHE', u'000488.XSHE', u'000501.XSHE', u'000513.XSHE', u'000519.XSHE', u'000528.XSHE', u'000536.XSHE', u'000537.XSHE', u'000538.XSHE', u'000541.XSHE', u'000543.XSHE', u'000547.XSHE', u'000552.XSHE', u'000553.XSHE', u'000559.XSHE', u'000563.XSHE', u'000564.XSHE', u'000568.XSHE', u'000581.XSHE', u'000587.XSHE', u'000596.XSHE', u'000598.XSHE', u'000600.XSHE', u'000623.XSHE', u'000625.XSHE', u'000627.XSHE', u'000630.XSHE', u'000636.XSHE', u'000651.XSHE', u'000656.XSHE', u'000661.XSHE', u'000671.XSHE', u'000681.XSHE', u'000685.XSHE', u'000686.XSHE', u'000690.XSHE', u'000703.XSHE', u'000709.XSHE', u'000712.XSHE', u'000718.XSHE', u'000723.XSHE', u'000725.XSHE', u'000727.XSHE', u'000728.XSHE', u'000729.XSHE', u'000732.XSHE', u'000738.XSHE', u'000750.XSHE', u'000758.XSHE', u'000761.XSHE', u'000766.XSHE', u'000768.XSHE', u'000776.XSHE', u'000778.XSHE', u'000783.XSHE', u'000786.XSHE', u'000807.XSHE', u'000813.XSHE', u'000826.XSHE', u'000829.XSHE', u'000830.XSHE', u'000848.XSHE', u'000858.XSHE', u'000860.XSHE', u'000869.XSHE', u'000876.XSHE', u'000877.XSHE', u'000878.XSHE', u'000883.XSHE', u'000887.XSHE', u'000895.XSHE', u'000898.XSHE', u'000926.XSHE', u'000930.XSHE', u'000932.XSHE', u'000937.XSHE', u'000938.XSHE', u'000959.XSHE', u'000960.XSHE', u'000961.XSHE', u'000963.XSHE', u'000970.XSHE', u'000975.XSHE', u'000980.XSHE', u'000983.XSHE', u'000987.XSHE', u'000988.XSHE', u'000990.XSHE', 
u'000997.XSHE', u'000998.XSHE', u'000999.XSHE', u'001979.XSHE', u'002001.XSHE', u'002002.XSHE', u'002004.XSHE', u'002007.XSHE', u'002008.XSHE', u'002010.XSHE', u'002019.XSHE', u'002024.XSHE', u'002027.XSHE', u'002028.XSHE', u'002030.XSHE', u'002032.XSHE', u'002038.XSHE', u'002044.XSHE', u'002048.XSHE', u'002049.XSHE', u'002050.XSHE', u'002051.XSHE', u'002056.XSHE', u'002064.XSHE', u'002065.XSHE', u'002074.XSHE', u'002075.XSHE', u'002078.XSHE', u'002081.XSHE', u'002085.XSHE', u'002092.XSHE', u'002093.XSHE', u'002110.XSHE', u'002118.XSHE', u'002120.XSHE', u'002127.XSHE', u'002128.XSHE', u'002129.XSHE', u'002131.XSHE', u'002142.XSHE', u'002146.XSHE', u'002152.XSHE', u'002153.XSHE', u'002155.XSHE', u'002157.XSHE', u'002174.XSHE', u'002176.XSHE', u'002179.XSHE', u'002183.XSHE', u'002191.XSHE', u'002195.XSHE', u'002202.XSHE', u'002212.XSHE', u'002217.XSHE', u'002221.XSHE', u'002223.XSHE', u'002230.XSHE', u'002233.XSHE', u'002236.XSHE', u'002241.XSHE', u'002242.XSHE', u'002244.XSHE', u'002249.XSHE', u'002250.XSHE', u'002251.XSHE', u'002252.XSHE', u'002254.XSHE', u'002266.XSHE', u'002268.XSHE', u'002271.XSHE', u'002273.XSHE', u'002280.XSHE', u'002281.XSHE', u'002285.XSHE', u'002294.XSHE', u'002299.XSHE', u'002302.XSHE', u'002304.XSHE', u'002310.XSHE', u'002311.XSHE', u'002317.XSHE', u'002332.XSHE', u'002340.XSHE', u'002344.XSHE', u'002352.XSHE', u'002353.XSHE', u'002358.XSHE', u'002366.XSHE', u'002368.XSHE', u'002371.XSHE', u'002372.XSHE', u'002373.XSHE', u'002375.XSHE', u'002382.XSHE', u'002384.XSHE', u'002385.XSHE', u'002387.XSHE', u'002390.XSHE', u'002399.XSHE', u'002405.XSHE', u'002407.XSHE', u'002408.XSHE', u'002410.XSHE', u'002411.XSHE', u'002414.XSHE', u'002415.XSHE', u'002416.XSHE', u'002419.XSHE', u'002422.XSHE', u'002424.XSHE', u'002426.XSHE', u'002431.XSHE', u'002434.XSHE', u'002437.XSHE', u'002439.XSHE', u'002440.XSHE', u'002444.XSHE', u'002456.XSHE', u'002460.XSHE', u'002463.XSHE', u'002465.XSHE', u'002466.XSHE', u'002468.XSHE', u'002470.XSHE', u'002475.XSHE', 
u'002482.XSHE', u'002489.XSHE', u'002491.XSHE', u'002493.XSHE', u'002500.XSHE', u'002503.XSHE', u'002505.XSHE', u'002506.XSHE', u'002507.XSHE', u'002508.XSHE', u'002509.XSHE', u'002512.XSHE', u'002517.XSHE', u'002544.XSHE', u'002555.XSHE', u'002558.XSHE', u'002572.XSHE', u'002573.XSHE', u'002583.XSHE', u'002589.XSHE', u'002594.XSHE', u'002601.XSHE', u'002602.XSHE', u'002603.XSHE', u'002607.XSHE', u'002624.XSHE', u'002625.XSHE', u'002635.XSHE', u'002640.XSHE', u'002665.XSHE', u'002670.XSHE', u'002672.XSHE', u'002673.XSHE', u'002681.XSHE', u'002690.XSHE', u'002699.XSHE', u'002701.XSHE', u'002707.XSHE', u'002709.XSHE', u'002714.XSHE', u'002736.XSHE', u'002739.XSHE', u'002745.XSHE', u'002773.XSHE', u'002797.XSHE', u'300001.XSHE', u'300002.XSHE', u'300003.XSHE', u'300009.XSHE', u'300010.XSHE', u'300015.XSHE', u'300017.XSHE', u'300024.XSHE', u'300026.XSHE', u'300027.XSHE', u'300033.XSHE', u'300055.XSHE', u'300058.XSHE', u'300059.XSHE', u'300070.XSHE', u'300072.XSHE', u'300088.XSHE', u'300113.XSHE', u'300115.XSHE', u'300122.XSHE', u'300124.XSHE', u'300133.XSHE', u'300134.XSHE', u'300136.XSHE', u'300142.XSHE', u'300144.XSHE', u'300159.XSHE', u'300166.XSHE', u'300168.XSHE', u'300182.XSHE', u'300197.XSHE', u'300199.XSHE', u'300207.XSHE', u'300244.XSHE', u'300251.XSHE', u'300253.XSHE', u'300257.XSHE', u'300266.XSHE', u'300274.XSHE', u'300287.XSHE', u'300296.XSHE', u'300297.XSHE', u'300308.XSHE', u'300315.XSHE', u'300316.XSHE', u'300324.XSHE', u'300376.XSHE', u'300383.XSHE', u'300408.XSHE', u'300413.XSHE', u'300418.XSHE', u'300433.XSHE', u'300450.XSHE', u'300459.XSHE', u'300498.XSHE', u'600000.XSHG', u'600004.XSHG', u'600006.XSHG', u'600008.XSHG', u'600009.XSHG', u'600010.XSHG', u'600011.XSHG', u'600015.XSHG', u'600016.XSHG', u'600017.XSHG', u'600018.XSHG', u'600019.XSHG', u'600021.XSHG', u'600022.XSHG', u'600023.XSHG', u'600026.XSHG', u'600027.XSHG', u'600028.XSHG', u'600029.XSHG', u'600030.XSHG', u'600031.XSHG', u'600036.XSHG', u'600037.XSHG', u'600038.XSHG', u'600039.XSHG', 
u'600048.XSHG', u'600050.XSHG', u'600053.XSHG', u'600056.XSHG', u'600058.XSHG', u'600060.XSHG', u'600061.XSHG', u'600062.XSHG', u'600064.XSHG', u'600066.XSHG', u'600068.XSHG', u'600073.XSHG', u'600079.XSHG', u'600085.XSHG', u'600089.XSHG', u'600094.XSHG', u'600098.XSHG', u'600100.XSHG', u'600104.XSHG', u'600109.XSHG', u'600111.XSHG', u'600115.XSHG', u'600118.XSHG', u'600120.XSHG', u'600125.XSHG', u'600126.XSHG', u'600138.XSHG', u'600141.XSHG', u'600143.XSHG', u'600151.XSHG', u'600153.XSHG', u'600155.XSHG', u'600158.XSHG', u'600160.XSHG', u'600161.XSHG', u'600166.XSHG', u'600167.XSHG', u'600169.XSHG', u'600170.XSHG', u'600171.XSHG', u'600176.XSHG', u'600177.XSHG', u'600183.XSHG', u'600188.XSHG', u'600195.XSHG', u'600196.XSHG', u'600201.XSHG', u'600208.XSHG', u'600216.XSHG', u'600219.XSHG', u'600221.XSHG', u'600233.XSHG', u'600258.XSHG', u'600259.XSHG', u'600260.XSHG', u'600266.XSHG', u'600267.XSHG', u'600271.XSHG', u'600276.XSHG', u'600277.XSHG', u'600280.XSHG', u'600282.XSHG', u'600291.XSHG', u'600297.XSHG', u'600298.XSHG', u'600299.XSHG', u'600307.XSHG', u'600309.XSHG', u'600312.XSHG', u'600315.XSHG', u'600316.XSHG', u'600317.XSHG', u'600325.XSHG', u'600329.XSHG', u'600332.XSHG', u'600335.XSHG', u'600338.XSHG', u'600340.XSHG', u'600348.XSHG', u'600350.XSHG', u'600352.XSHG', u'600362.XSHG', u'600369.XSHG', u'600372.XSHG', u'600373.XSHG', u'600376.XSHG', u'600380.XSHG', u'600383.XSHG', u'600388.XSHG', u'600392.XSHG', u'600393.XSHG', u'600398.XSHG', u'600406.XSHG', u'600409.XSHG', u'600410.XSHG', u'600415.XSHG', u'600416.XSHG', u'600418.XSHG', u'600426.XSHG', u'600428.XSHG', u'600435.XSHG', u'600436.XSHG', u'600438.XSHG', u'600458.XSHG', u'600460.XSHG', u'600466.XSHG', u'600478.XSHG', u'600482.XSHG', u'600486.XSHG', u'600487.XSHG', u'600489.XSHG', u'600497.XSHG', u'600498.XSHG', u'600499.XSHG', u'600500.XSHG', u'600507.XSHG', u'600511.XSHG', u'600515.XSHG', u'600516.XSHG', u'600519.XSHG', u'600521.XSHG', u'600522.XSHG', u'600525.XSHG', u'600528.XSHG', u'600535.XSHG', 
u'600536.XSHG', u'600545.XSHG', u'600547.XSHG', u'600549.XSHG', u'600557.XSHG', u'600563.XSHG', u'600565.XSHG', u'600566.XSHG', u'600567.XSHG', u'600570.XSHG', u'600572.XSHG', u'600575.XSHG', u'600580.XSHG', u'600582.XSHG', u'600583.XSHG', u'600584.XSHG', u'600585.XSHG', u'600588.XSHG', u'600597.XSHG', u'600598.XSHG', u'600606.XSHG', u'600611.XSHG', u'600623.XSHG', u'600633.XSHG', u'600637.XSHG', u'600639.XSHG', u'600640.XSHG', u'600642.XSHG', u'600643.XSHG', u'600645.XSHG', u'600648.XSHG', u'600649.XSHG', u'600657.XSHG', u'600660.XSHG', u'600663.XSHG', u'600664.XSHG', u'600673.XSHG', u'600674.XSHG', u'600688.XSHG', u'600690.XSHG', u'600694.XSHG', u'600699.XSHG', u'600703.XSHG', u'600704.XSHG', u'600705.XSHG', u'600707.XSHG', u'600717.XSHG', u'600718.XSHG', u'600729.XSHG', u'600733.XSHG', u'600737.XSHG', u'600739.XSHG', u'600741.XSHG', u'600745.XSHG', u'600748.XSHG', u'600750.XSHG', u'600751.XSHG', u'600754.XSHG', u'600755.XSHG', u'600757.XSHG', u'600759.XSHG', u'600763.XSHG', u'600765.XSHG', u'600770.XSHG', u'600777.XSHG', u'600779.XSHG', u'600782.XSHG', u'600787.XSHG', u'600795.XSHG', u'600801.XSHG', u'600804.XSHG', u'600808.XSHG', u'600809.XSHG', u'600811.XSHG', u'600816.XSHG', u'600820.XSHG', u'600823.XSHG', u'600827.XSHG', u'600835.XSHG', u'600837.XSHG', u'600839.XSHG', u'600845.XSHG', u'600848.XSHG', u'600859.XSHG', u'600862.XSHG', u'600863.XSHG', u'600867.XSHG', u'600869.XSHG', u'600872.XSHG', u'600874.XSHG', u'600875.XSHG', u'600879.XSHG', u'600881.XSHG', u'600884.XSHG', u'600885.XSHG', u'600886.XSHG', u'600887.XSHG', u'600893.XSHG', u'600895.XSHG', u'600900.XSHG', u'600917.XSHG', u'600958.XSHG', u'600959.XSHG', u'600967.XSHG', u'600970.XSHG', u'600971.XSHG', u'600978.XSHG', u'600985.XSHG', u'600993.XSHG', u'600998.XSHG', u'600999.XSHG', u'601000.XSHG', u'601001.XSHG', u'601003.XSHG', u'601005.XSHG', u'601006.XSHG', u'601009.XSHG', u'601012.XSHG', u'601016.XSHG', u'601018.XSHG', u'601021.XSHG', u'601088.XSHG', u'601098.XSHG', u'601100.XSHG', u'601106.XSHG', 
u'601111.XSHG', u'601117.XSHG', u'601118.XSHG', u'601139.XSHG', u'601155.XSHG', u'601166.XSHG', u'601168.XSHG', u'601169.XSHG', u'601179.XSHG', u'601186.XSHG', u'601198.XSHG', u'601211.XSHG', u'601216.XSHG', u'601225.XSHG', u'601231.XSHG', u'601238.XSHG', u'601288.XSHG', u'601311.XSHG', u'601318.XSHG', u'601328.XSHG', u'601333.XSHG', u'601336.XSHG', u'601360.XSHG', u'601377.XSHG', u'601390.XSHG', u'601398.XSHG', u'601555.XSHG', u'601600.XSHG', u'601601.XSHG', u'601607.XSHG', u'601608.XSHG', u'601618.XSHG', u'601628.XSHG', u'601633.XSHG', u'601668.XSHG', u'601669.XSHG', u'601678.XSHG', u'601688.XSHG', u'601689.XSHG', u'601699.XSHG', u'601717.XSHG', u'601718.XSHG', u'601727.XSHG', u'601766.XSHG', u'601777.XSHG', u'601788.XSHG', u'601800.XSHG', u'601801.XSHG', u'601808.XSHG', u'601818.XSHG', u'601857.XSHG', u'601866.XSHG', u'601872.XSHG', u'601877.XSHG', u'601880.XSHG', u'601888.XSHG', u'601898.XSHG', u'601899.XSHG', u'601901.XSHG', u'601919.XSHG', u'601928.XSHG', u'601933.XSHG', u'601939.XSHG', u'601958.XSHG', u'601969.XSHG', u'601985.XSHG', u'601988.XSHG', u'601989.XSHG', u'601992.XSHG', u'601998.XSHG', u'603000.XSHG', u'603019.XSHG', u'603025.XSHG', u'603077.XSHG', u'603198.XSHG', u'603288.XSHG', u'603328.XSHG', u'603355.XSHG', u'603369.XSHG', u'603377.XSHG', u'603568.XSHG', u'603766.XSHG', u'603799.XSHG', u'603806.XSHG', u'603866.XSHG', u'603868.XSHG', u'603883.XSHG', u'603885.XSHG', u'603939.XSHG', u'603993.XSHG']
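The listing-age filter can be illustrated in isolation. This is a minimal Python 3 sketch with made-up tickers and listing dates (all hypothetical); `listed_long_enough` is an illustrative name and does not call `get_security_info`.

```python
import datetime


def listed_long_enough(listing_dates, as_of, min_days=90):
    """Keep codes whose listing date is more than min_days before as_of."""
    as_of_date = datetime.datetime.strptime(as_of, "%Y-%m-%d").date()
    cutoff = as_of_date - datetime.timedelta(days=min_days)
    return [code for code, listed in sorted(listing_dates.items()) if listed < cutoff]


# Hypothetical listing dates, for illustration only
sample = {
    'AAA': datetime.date(2010, 1, 4),
    'BBB': datetime.date(2016, 7, 15),   # listed about 6 weeks before the as-of date
    'CCC': datetime.date(2016, 6, 1),    # listed just over 90 days before
}
kept = listed_long_enough(sample, '2016-08-31')
print(kept)
```

With a 90-day window the cutoff for 2016-08-31 is 2016-06-02, so only the stock listed in mid-July is dropped.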
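This section analyzes the backtest performance of equal factor weighting. The stock universe is ZZ800, the backtest period runs from 2013-01-01 to 2018-01-01, and the factors are market cap and year-on-year operating profit growth. On the last calendar day of each month, the latest factor data and the corresponding stock excess returns are collected. The code below shows how the data are obtained.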
start = time.clock()
begin_date = '2013-01-01'
end_date = '2018-01-01'
dateList = get_period_date('M', begin_date, end_date)
factor = {}
for date in dateList[:-1]:
    stockList = get_stock_ZZ800(date)
    df_data = get_fundamentals(query(valuation.code, valuation.market_cap, indicator.inc_operation_profit_year_on_year).filter(valuation.code.in_(stockList)), date = date)
    df_data = df_data.set_index(['code'])
    # Negate market cap so that smaller caps get higher factor values
    df_data['market_cap'] = -1 * df_data['market_cap']
    # Winsorize
    df_data = winsorize_med(df_data, scale=1, inclusive=True, inf2nan=True, axis=0)
    # Standardize
    df_data = standardlize(df_data, inf2nan=True, axis=0)
    # Composite factors: equal weights and 60/40 weights
    df_data['total_55'] = 0.5 * df_data['market_cap'] + 0.5 * df_data['inc_operation_profit_year_on_year']
    df_data['total_64'] = 0.6 * df_data['market_cap'] + 0.4 * df_data['inc_operation_profit_year_on_year']
    # Cross-sectional stock returns up to the next date in dateList
    df_close = get_price(stockList, date, dateList[dateList.index(date)+1], 'daily', ['close'])
    if df_close.empty:
        continue
    df_pchg = df_close['close'].iloc[-1,:]/df_close['close'].iloc[0,:]-1
    # ZZ800 (000906.XSHG) index return over the same period
    index_close = get_price('000906.XSHG', date, dateList[dateList.index(date)+1], 'daily', ['close'])
    index_pchg = index_close['close'].iloc[-1]/index_close['close'].iloc[0]-1
    df_data['excess_pchg'] = df_pchg - index_pchg
    factor[date] = df_data
end = time.clock()
print "time: ", end-start
time: 25.545685
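The preprocessing steps (winsorize, standardize, combine) can be sketched on a toy cross-section. The code below is Python 3 with the standard library only and is an assumption about what median-based winsorization and z-score standardization do in general; it is not the jqfactor implementation, whose details (for example the standard-deviation convention) may differ. All function names are hypothetical.

```python
import statistics


def winsorize_mad(values, scale=1.0):
    """Clip values to median +/- scale * MAD (median absolute deviation)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    lower, upper = med - scale * mad, med + scale * mad
    return [min(max(v, lower), upper) for v in values]


def zscore(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def combine(factor_columns, weights):
    """Weighted sum of factor columns, one composite value per stock."""
    n_stocks = len(factor_columns[0])
    return [sum(w * col[k] for w, col in zip(weights, factor_columns))
            for k in range(n_stocks)]


raw = [1, 2, 3, 4, 100]            # one extreme value
clipped = winsorize_mad(raw)       # median 3, MAD 1, so bounds are [2, 4]
scores = zscore(clipped)
composite = combine([scores, scores[::-1]], [0.5, 0.5])
print(clipped, scores, composite)
```

In this toy case the outlier 100 is pulled back to 4 before standardization, so it no longer dominates the composite score.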
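Consider a two-factor model with market cap and year-on-year operating profit growth. This section computes the composite factor under two weighting schemes and, for each, builds a portfolio from the 100 stocks with the highest composite factor values. Portfolio 1 is the equal-weight portfolio, in which the two factors are summed with equal weights. Portfolio 2 is the 60/40 portfolio, in which market cap and year-on-year operating profit growth are weighted 60% and 40% respectively.
Stock universe: ZZ800, excluding ST stocks and stocks listed for less than 3 months
Benchmark: ZZ800
Transaction cost: 0.15% (1.5 per mille)
Rebalancing period: monthly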
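The selection rule described above (rank by composite factor, hold the top names, average their excess returns) can be sketched as follows. This is Python 3 with toy data; the tickers, scores and returns are hypothetical, and `top_n_mean_excess` is an illustrative helper rather than code from the notebook.

```python
def top_n_mean_excess(scores, excess_returns, n):
    """Average excess return of the n stocks with the highest composite score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    chosen = ranked[:n]
    return sum(excess_returns[code] for code in chosen) / float(len(chosen))


# Hypothetical composite scores and next-month excess returns
scores = {'A': 1.2, 'B': -0.3, 'C': 0.8, 'D': 0.1}
excess = {'A': 0.04, 'B': -0.02, 'C': 0.01, 'D': 0.00}

period_return = top_n_mean_excess(scores, excess, 2)   # holds A and C
print(period_return)
```

The notebook applies the same idea with n = 100 on each monthly cross-section.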
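# Net-value calculation
def NetValue(temPchg):
    # Transaction cost deducted every period
    cost = 0.0015
    netValue = []
    netValue.append(1)
    for i in range(len(temPchg)):
        netValue.append(netValue[i]*(1+temPchg[i] - cost))
    # Drop the initial value of 1 so the series has one entry per period
    netValue.pop(0)
    return netValue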
# Plot net-value curves
def plotNetValue(dates, netValue):
    # figsize sets the width and height of the figure in inches
    fig = plt.figure(figsize=(20,8))
    ax = fig.add_subplot(111)
    xticks = list(range(len(netValue[list(netValue.keys())[0]])))
    for key in netValue.keys():
        ax.plot(xticks, netValue[key], label=key)
        # Offset the x positions slightly for the next curve
        for i in range(len(xticks)):
            xticks[i] = xticks[i] + 0.1
    # Legend style
    ax.legend(loc = 2, fontsize = 10)
    # y-axis label style
    ax.set_ylabel('NetValue',fontsize=20)
    ax.set_title("Strategy's NetValue Performances", fontsize=20)
    plt.show()

# Plot a bar chart of monthly returns
def plotBar(dates, pchg, label):
    # figsize sets the width and height of the figure in inches
    fig = plt.figure(figsize=(20,8))
    ax = fig.add_subplot(111)
    xticks = range(len(pchg))
    ax.bar(xticks, pchg, label = label)
    # Show 11 evenly spaced date labels on the x axis
    ticks = [int(x) for x in np.linspace(0, len(dates)-1, 11)]
    plt.xticks(ticks, [dates[i] for i in ticks])
    # Legend style
    ax.legend(loc='best',fontsize=15)
    # y-axis label style
    ax.set_ylabel('Algorithm_return', fontsize=15)
    # Title style
    ax.set_title("Monthly Return Bar Chart", fontsize=15)
    plt.show()
def MaxDrawdown(return_list):
    '''Maximum drawdown of a net-value series'''
    i = np.argmax((np.maximum.accumulate(return_list) - return_list) / np.maximum.accumulate(return_list))  # trough position
    if i == 0:
        return 0
    j = np.argmax(return_list[:i])  # peak position
    return (return_list[j] - return_list[i]) / (return_list[j])

# Return and risk statistics
def cal_indictor(netValue):
    result = {}
    total_return = netValue[-1] / netValue[0] - 1
    # 240 trading days per year and 20 per month: the exponent equals 12 / number of months
    ann_return = pow((1+total_return), 240/float(len(netValue) * 20))-1
    pchg = []
    # Period-over-period returns
    for i in range(1, len(netValue)):
        pchg.append(netValue[i]/netValue[i-1] - 1)
    temp = 0
    for i in pchg:
        temp += pow(i-np.mean(pchg), 2)
    annualVolatility = np.sqrt(240/float((len(pchg) * 20-1))*temp)
    # Sharpe ratio with a 4% annual risk-free rate
    sharpe_ratio = (ann_return - 0.04)/annualVolatility
    result['TotalReturn'] = total_return
    result['AnnReturn'] = ann_return
    result['AnnVol'] = annualVolatility
    result['SR'] = sharpe_ratio
    result['MaxDown'] = MaxDrawdown(netValue)
    return result
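As a small worked example of the two core calculations above, the sketch below compounds period returns net of a fixed cost and measures the largest peak-to-trough decline. It is written in Python 3 without numpy, the input returns are made up, and the function names are hypothetical; it illustrates the idea rather than reproducing `NetValue` and `MaxDrawdown` line by line.

```python
def compound_net_value(period_returns, cost=0.0015):
    """Net-value series after each period, deducting a fixed cost per period."""
    values = []
    level = 1.0
    for r in period_returns:
        level *= 1.0 + r - cost
        values.append(level)
    return values


def max_drawdown(values):
    """Largest relative decline from a running peak."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)
        worst = max(worst, (peak - v) / peak)
    return worst


# Hypothetical monthly excess returns
nv = compound_net_value([0.10, -0.05, 0.02])
dd = max_drawdown(nv)
print(nv, dd)
```

Here the first value is 1 + 0.10 - 0.0015 = 1.0985, and the only decline from that peak is the second month's 5.15% (5% loss plus cost).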
pchg_55 = []
pchg_64 = []
tradingDays = sorted(factor.keys())
for date in tradingDays:
    temp = factor[date]
    # DataFrame.sort is the legacy pandas API (sort_values from pandas 0.17 on)
    temp_55 = temp.sort(['total_55'], ascending = False)
    temp_64 = temp.sort(['total_64'], ascending = False)
    # Average excess return of the 100 stocks with the highest composite factor
    pchg_55.append(np.mean(temp_55['excess_pchg'].iloc[:100]))
    pchg_64.append(np.mean(temp_64['excess_pchg'].iloc[:100]))
# Plot net-value curves
netValue = {}
netValue['等权组合'] = NetValue(pchg_55)
netValue['60/40 组合'] = NetValue(pchg_64)
plotNetValue(tradingDays, netValue)
plotBar(tradingDays, pchg_55, label = '等权组合')
plotBar(tradingDays, pchg_64, label = '60/40 组合')
# Monthly win rate: share of months with a positive excess return
MonthWinRate1 = len([i for i in pchg_55 if i >0])/float(len(pchg_55))
MonthWinRate2 = len([i for i in pchg_64 if i >0])/float(len(pchg_64))
# Return and risk statistics
result55 = cal_indictor(netValue['等权组合'])
result64 = cal_indictor(netValue['60/40 组合'])
result = pd.DataFrame()
result['等权组合'] = pd.Series(result55)
result['60/40 组合'] = pd.Series(result64)
result_output = result.T
result_output = result_output[['TotalReturn', 'AnnReturn', 'AnnVol', 'SR', 'MaxDown']]
result_output['MonthWinRate'] = [MonthWinRate1, MonthWinRate2]
result_output
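Portfolio | TotalReturn | AnnReturn | AnnVol | SR | MaxDown | MonthWinRate |
---|---|---|---|---|---|---|
等权组合 (equal weight) | 4.283126 | 0.395013 | 0.177879 | 1.995809 | 0.191042 | 0.833333 |
60/40 组合 (60/40) | 4.487443 | 0.405640 | 0.187984 | 1.945055 | 0.210825 | 0.816667 |
The first figure shows the net-value curves of the two-factor equal-weight portfolio and the 60/40 portfolio, and the table above compares their excess-return metrics. The 60/40 portfolio, which raises the market-cap weight, has a higher return than the equal-weight portfolio, but its risk (maximum drawdown, annualized volatility) is also higher, and its Sharpe ratio is slightly lower (1.95 versus 2.00). On a monthly basis both portfolios are very stable, with monthly win rates above 80%.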
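The AnnReturn and SR columns follow from the other columns by simple arithmetic. The sketch below (Python 3, hypothetical function names) reproduces the equal-weight row, assuming 60 monthly observations and the 4% risk-free rate used in `cal_indictor`.

```python
def annualized_return(total_return, n_months):
    """Geometric annualization over n_months monthly periods."""
    return (1.0 + total_return) ** (12.0 / n_months) - 1.0


def sharpe_ratio(ann_return, ann_vol, risk_free=0.04):
    """Excess annual return per unit of annualized volatility."""
    return (ann_return - risk_free) / ann_vol


# Equal-weight row of the table: TotalReturn 4.283126, AnnVol 0.177879
ann = annualized_return(4.283126, 60)
sr = sharpe_ratio(0.395013, 0.177879)
print(round(ann, 4), round(sr, 3))
```

Both results agree with the table to the displayed precision (about 0.395 and 1.996).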
IC_marketcap = []
IC_iop = []
for date in tradingDays:
    temp = factor[date]
    # IC: cross-sectional Pearson correlation between factor value and excess return
    IC_marketcap.append(st.pearsonr(temp['market_cap'], temp['excess_pchg'])[0])
    IC_iop.append(st.pearsonr(temp['inc_operation_profit_year_on_year'], temp['excess_pchg'])[0])
IC = pd.DataFrame(index = ['Market cap', 'Operating profit YoY growth'])
IC['Mean'] = [np.nanmean(IC_marketcap), np.nanmean(IC_iop)]
IC['Std'] = [np.nanstd(IC_marketcap), np.nanstd(IC_iop)]
# IR: each factor's mean IC divided by the standard deviation of its own IC series
IC['IR'] = [np.nanmean(IC_marketcap) / np.nanstd(IC_marketcap), np.nanmean(IC_iop) / np.nanstd(IC_iop)]
IC
Factor | Mean | Std | IR |
---|---|---|---|
Market cap | 0.094206 | 0.132315 | 0.71198 |
Operating profit YoY growth | 0.028299 | 0.077249 | 0.366341 |
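Comparing the statistics of the two IC series, the mean IC of the market-cap factor (0.09) is clearly better than that of year-on-year operating profit growth (0.028), while its IC series is also more volatile. In terms of IR the market-cap factor is still the more effective of the two. When effectiveness differs this much, simple equal weighting cannot reflect the strong stock-selection power of the market-cap factor, and it drags down the performance of the multi-factor portfolio.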
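For readers who want the two statistics spelled out, here is a minimal Python 3 sketch with no third-party dependencies. The IC is taken as the Pearson correlation between factor values and subsequent excess returns, and the IR as the mean of an IC series divided by its population standard deviation (the default convention of `np.nanstd`). The sample numbers and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def information_ratio(ic_series):
    """Mean IC divided by the population standard deviation of the IC series."""
    n = len(ic_series)
    mean = sum(ic_series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in ic_series) / n)
    return mean / std


# Hypothetical factor values and excess returns for one cross-section
ic_one_period = pearson([1.0, 2.0, 3.0, 4.0], [0.02, 0.01, 0.04, 0.03])
# Hypothetical monthly IC series
ir = information_ratio([0.1, 0.2, 0.3])
print(ic_one_period, ir)
```

A factor with a high mean IC but a very unstable IC series can therefore end up with a modest IR.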
# IC weight of the market-cap factor: its mean IC over the sum of both mean ICs
ratio_marketcap = np.nanmean(IC_marketcap) / (abs(np.nanmean(IC_marketcap)) + np.nanmean(IC_iop))
print 'Market-cap factor weight:', ratio_marketcap
print 'Operating profit YoY growth factor weight:', 1-ratio_marketcap
pchg_IC_avg = []
for date in tradingDays:
    temp = factor[date]
    temp['IC_avg'] = ratio_marketcap * temp['market_cap'] + (1 - ratio_marketcap) * temp['inc_operation_profit_year_on_year']
    temp = temp.sort(['IC_avg'], ascending = False)
    pchg_IC_avg.append(np.mean(temp['excess_pchg'].iloc[:100]))
netValue['IC 加权组合'] = NetValue(pchg_IC_avg)
plotNetValue(tradingDays, netValue)
plotBar(tradingDays, pchg_IC_avg, label = 'IC 加权组合')
MonthWinRate3 = len([i for i in pchg_IC_avg if i >0])/float(len(pchg_IC_avg))
# Return and risk statistics
result_IC_avg = cal_indictor(netValue['IC 加权组合'])
result['IC 加权组合'] = pd.Series(result_IC_avg)
result_output = result.T
result_output = result_output[['TotalReturn', 'AnnReturn', 'AnnVol', 'SR', 'MaxDown']]
result_output['MonthWinRate'] = [MonthWinRate1, MonthWinRate2, MonthWinRate3]
result_output
Market-cap factor weight: 0.768994608996
Operating profit YoY growth factor weight: 0.231005391004
Portfolio | TotalReturn | AnnReturn | AnnVol | SR | MaxDown | MonthWinRate |
---|---|---|---|---|---|---|
等权组合 (equal weight) | 4.283126 | 0.395013 | 0.177879 | 1.995809 | 0.191042 | 0.833333 |
60/40 组合 (60/40) | 4.487443 | 0.405640 | 0.187984 | 1.945055 | 0.210825 | 0.816667 |
IC 加权组合 (IC weighted) | 4.845176 | 0.423507 | 0.180512 | 2.124549 | 0.197202 | 0.850000 |
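The first figure shows the net-value curves of the two-factor equal-weight, 60/40 and IC-weighted portfolios, and the table above compares the excess-return metrics of the three. The IC-weighted portfolio has a clearly higher return than both the equal-weight and the 60/40 portfolios. Its risk (maximum drawdown, annualized volatility) is higher than that of the equal-weight portfolio but lower than that of the 60/40 portfolio. In terms of the Sharpe ratio, the IC-weighted portfolio has the best return-to-risk ratio of the three. Its monthly win rate is 85%, so the stability of returns improves further.
Looking closer, in this example the IC weights of market cap and year-on-year operating profit growth are 76.90% and 23.10%. The market-cap weight is higher than under equal weighting and also higher than in the subjective 60/40 portfolio. In other words, IC weighting increases the weight of the high-return, high-volatility market-cap factor and reduces the weight of the low-return, low-volatility operating profit growth factor, so both the return and the volatility of the IC-weighted portfolio end up above those of the equal-weight portfolio.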
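The 76.90% / 23.10% split can be checked directly from the rounded mean ICs in the IC table. A minimal Python 3 sketch follows; `ic_weights` is a hypothetical helper, and normalizing by the sum of absolute mean ICs is an assumption that coincides with the notebook's formula here because both mean ICs are positive.

```python
def ic_weights(mean_ics):
    """Weights proportional to each factor's mean IC, scaled by the sum of absolute ICs."""
    total = sum(abs(ic) for ic in mean_ics)
    return [ic / total for ic in mean_ics]


# Mean ICs of market cap and operating profit YoY growth (rounded values from the table)
weights = ic_weights([0.094206, 0.028299])
print([round(w, 3) for w in weights])
```

Because the inputs are rounded to six decimals, the result agrees with the printed weights only to about five decimal places.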
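The preceding analysis shows that factor weighting has to take each factor's own effectiveness (IC) into account, but IC weighting does not beat equal weighting in every case. So, in theory, if the objective is to maximize the composite factor's single-period IC, which quantities does the optimal weighting depend on?
Suppose there are M factors, $F_1, F_2, \dots, F_M$, which are combined with the weight vector $W=(w_1,\dots,w_M)$ into a composite factor $F_C$, that is, $F_C = \sum_{i=1}^{M} w_i F_i$.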
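The optimization itself can be sketched as follows. This is a reconstruction under stated assumptions, not the report's own notation: each factor is standardized in the cross-section (unit variance), $\Sigma$ denotes the covariance (equivalently correlation) matrix of the factors, $r$ is the next-period excess return with standard deviation $\sigma_r$, and $\mathbf{IC}$ is the vector of single-factor ICs.

```latex
F_C = \sum_{i=1}^{M} w_i F_i = W^{\top} F

IC_C = \operatorname{corr}(F_C, r)
     = \frac{\operatorname{cov}(W^{\top} F,\, r)}{\sqrt{W^{\top} \Sigma W}\;\sigma_r}
     = \frac{W^{\top}\,\mathbf{IC}}{\sqrt{W^{\top} \Sigma W}},
\qquad
\mathbf{IC} = (IC_1, \dots, IC_M)^{\top},\quad IC_i = \operatorname{corr}(F_i, r)

\frac{\partial\, IC_C}{\partial W} = 0
\;\Longrightarrow\;
W^{*} \propto \Sigma^{-1}\,\mathbf{IC},
\qquad
IC_C^{\max} = \sqrt{\mathbf{IC}^{\top}\,\Sigma^{-1}\,\mathbf{IC}}
```

The second equality uses $\operatorname{cov}(F_i, r) = IC_i\,\sigma_r$ for unit-variance factors. The solution depends only on the factor ICs and on the correlations between factors, and it reduces to plain IC weighting when the factors are uncorrelated ($\Sigma = I$).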
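Next, the optimal IC weighting derived in the previous part is applied to a multi-factor model with 7 factors: price-to-earnings (PE), price-to-book (PB), price-to-sales (PS), year-on-year operating profit growth, the debt-to-asset ratio, reversal (cumulative return over the previous month) and turnover (average daily turnover over the previous 10 trading days). On the last calendar day of each month, the data for these 7 factors and the next month's excess return relative to ZZ800 are collected. To compare the three weighting schemes discussed above, this part again builds 3 portfolios: the equal-weight portfolio, the IC-weighted portfolio and the optimal IC-weighted portfolio.
Stock universe: ZZ800, excluding ST stocks and stocks listed for less than 3 months
Benchmark: ZZ800
Transaction cost: 0.15% (1.5 per mille)
Rebalancing period: monthly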
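Before the full 7-factor run, the effect of correlation on the weights can be seen in a toy two-factor case. The sketch below is Python 3 with the standard library only; it solves $\Sigma w = \mathbf{IC}$ by Gaussian elimination and scales the result by the sum of absolute weights. The ICs, the correlation of 0.2, the normalization and all function names are assumptions for illustration, not values or code from the report.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            f = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def max_ic_weights(cov, ic):
    """Weights proportional to inverse(cov) * ic, scaled so absolute weights sum to 1."""
    raw = solve_linear(cov, ic)
    total = sum(abs(w) for w in raw)
    return [w / total for w in raw]


def composite_ic(weights, cov, ic):
    """IC of the weighted composite: w'IC / sqrt(w' cov w)."""
    n = len(weights)
    num = sum(w * c for w, c in zip(weights, ic))
    var = sum(weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n))
    return num / math.sqrt(var)


cov = [[1.0, 0.2], [0.2, 1.0]]      # hypothetical factor correlation of 0.2
ic = [0.09, 0.03]                   # hypothetical mean ICs
w_opt = max_ic_weights(cov, ic)
w_uncorrelated = max_ic_weights([[1.0, 0.0], [0.0, 1.0]], ic)
print(w_opt, composite_ic(w_opt, cov, ic), composite_ic([0.5, 0.5], cov, ic))
```

With zero correlation the weights equal the IC weights (0.75 / 0.25); with a correlation of 0.2 the stronger factor's weight rises to 0.875, and the composite IC exceeds that of equal weighting. Only the direction of the weight vector matters for ranking stocks, which is why the notebook can use inverse(cov) times IC without any normalization.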
start = time.clock()
begin_date = '2013-01-01'
end_date = '2018-01-01'
dateList = get_period_date('M', begin_date, end_date)
factor = {}
for date in dateList[:-1]:
    stockList = get_stock_ZZ800(date)
    df_data = get_fundamentals(query(valuation.code, valuation.pb_ratio, valuation.pcf_ratio, valuation.pe_ratio, indicator.inc_operation_profit_year_on_year, balance.total_liability, balance.total_assets).filter(valuation.code.in_(stockList)), date = date)
    df_data = df_data.set_index(['code'])
    # Daily turnover over the previous 10 trading days
    df_turnover = get_fundamentals_continuously(query(valuation.code, valuation.turnover_ratio).filter(valuation.code.in_(stockList)), end_date = date, count = 10)
    # Negate turnover, PE and PCF so that lower raw values score higher; pb_ratio keeps its sign
    df_data['turnover_ratio'] = -1 * np.mean(df_turnover['turnover_ratio'])
    df_data['pe_ratio'] = -1 * df_data['pe_ratio']
    df_data['pcf_ratio'] = -1 * df_data['pcf_ratio']
    # Reversal: negated cumulative return over the last 21 daily closes
    df_close = get_price(stockList, count = 21, end_date=date, frequency='daily', fields=['close'])['close']
    df_data['pchg'] = -1 * (df_close.iloc[-1] / df_close.iloc[0] - 1)
    # Total assets over total liabilities, the inverse of the debt-to-asset ratio
    df_data['AssetsLiab'] = df_data['total_assets'] / df_data['total_liability']
    # Winsorize
    df_data = winsorize_med(df_data, scale=1, inclusive=True, inf2nan=True, axis=0)
    # Standardize
    df_data = standardlize(df_data, inf2nan=True, axis=0)
    # Cross-sectional stock returns up to the next date in dateList
    df_close = get_price(stockList, date, dateList[dateList.index(date)+1], 'daily', ['close'])
    if df_close.empty:
        continue
    df_pchg = df_close['close'].iloc[-1,:]/df_close['close'].iloc[0,:]-1
    # ZZ800 (000906.XSHG) index return over the same period
    index_close = get_price('000906.XSHG', date, dateList[dateList.index(date)+1], 'daily', ['close'])
    index_pchg = index_close['close'].iloc[-1]/index_close['close'].iloc[0]-1
    df_data['excess_pchg'] = df_pchg - index_pchg
    factor[date] = df_data
end = time.clock()
print "time: ", end-start
time: 50.025179
# Equal-weight portfolio: each of the 7 factors gets a weight of 1/7
pchg_mean = []
tradingDays = sorted(factor.keys())
for date in tradingDays:
    temp = factor[date]
    temp['total_mean'] = 1/float(7) * temp['pb_ratio'] + 1/float(7) * temp['pcf_ratio'] + 1/float(7) * temp['turnover_ratio'] + 1/float(7) * temp['pchg'] + \
        1/float(7) * temp['pe_ratio'] + 1/float(7) * temp['AssetsLiab'] + 1/float(7) * temp['inc_operation_profit_year_on_year']
    temp_mean = temp.sort(['total_mean'], ascending = False)
    pchg_mean.append(np.mean(temp_mean['excess_pchg'].iloc[:100]))
IC_pb_ratio = []
IC_turnover = []
IC_pe_ratio = []
IC_pcf_ratio = []
IC_pchg = []
IC_AssetsLiab = []
IC_iop = []
# Monthly IC of each factor against the excess return
for date in tradingDays:
    temp = factor[date]
    IC_pb_ratio.append(st.pearsonr(temp['pb_ratio'], temp['excess_pchg'])[0])
    IC_turnover.append(st.pearsonr(temp['turnover_ratio'], temp['excess_pchg'])[0])
    IC_pe_ratio.append(st.pearsonr(temp['pe_ratio'], temp['excess_pchg'])[0])
    IC_pcf_ratio.append(st.pearsonr(temp['pcf_ratio'], temp['excess_pchg'])[0])
    IC_pchg.append(st.pearsonr(temp['pchg'], temp['excess_pchg'])[0])
    IC_AssetsLiab.append(st.pearsonr(temp['AssetsLiab'], temp['excess_pchg'])[0])
    IC_iop.append(st.pearsonr(temp['inc_operation_profit_year_on_year'], temp['excess_pchg'])[0])
# IC weight of each factor: its mean IC divided by the sum of all mean ICs
totalWeight = np.nanmean(IC_pb_ratio) + np.nanmean(IC_turnover) + np.nanmean(IC_pe_ratio) + np.nanmean(IC_pchg) + \
    np.nanmean(IC_AssetsLiab) + np.nanmean(IC_iop) + np.nanmean(IC_pcf_ratio)
ratio_pb_ratio = np.nanmean(IC_pb_ratio) / totalWeight
ratio_turnover = np.nanmean(IC_turnover) / totalWeight
ratio_pe_ratio = np.nanmean(IC_pe_ratio) / totalWeight
ratio_pcf_ratio = np.nanmean(IC_pcf_ratio) / totalWeight
ratio_pchg = np.nanmean(IC_pchg) / totalWeight
ratio_AssetsLiab = np.nanmean(IC_AssetsLiab) / totalWeight
ratio_iop = np.nanmean(IC_iop) / totalWeight
# IC-weighted portfolio
pchg_IC_avg = []
tradingDays = sorted(factor.keys())
for date in tradingDays:
    temp = factor[date]
    temp['total_IC_avg'] = ratio_pb_ratio * temp['pb_ratio'] + ratio_turnover * temp['turnover_ratio'] + ratio_pchg * temp['pchg'] + \
        ratio_pe_ratio * temp['pe_ratio'] + ratio_AssetsLiab * temp['AssetsLiab'] + ratio_iop * temp['inc_operation_profit_year_on_year'] + ratio_pcf_ratio * temp['pcf_ratio']
    temp_IC_avg = temp.sort(['total_IC_avg'], ascending = False)
    pchg_IC_avg.append(np.mean(temp_IC_avg['excess_pchg'].iloc[:100]))
# Pool the monthly cross-sections of the 7 factors
tempData = pd.DataFrame()
for date in tradingDays[:-1]:
    temp = factor[date][['pb_ratio', 'turnover_ratio', 'pe_ratio', 'inc_operation_profit_year_on_year', 'pchg', 'AssetsLiab', 'pcf_ratio']]
    if tempData.empty:
        tempData = temp
    else:
        tempData = tempData.append(temp)
# Factor covariance matrix. Note that temp is the last cross-section visited by the
# loop above, so this uses that single month; tempData.cov() would use the pooled sample
a = temp.cov()
# Optimal IC weights: inverse covariance matrix times the vector of mean ICs
# (the IC order matches the column order selected above)
weight = np.dot(np.linalg.inv(np.array(a)), np.array([np.nanmean(IC_pb_ratio), np.nanmean(IC_turnover), np.nanmean(IC_pe_ratio), np.nanmean(IC_iop), np.nanmean(IC_pchg), np.nanmean(IC_AssetsLiab), np.nanmean(IC_pcf_ratio)]))
# Optimal IC-weighted portfolio
pchg_IC_max = []
tradingDays = sorted(factor.keys())
for date in tradingDays:
    temp = factor[date]
    temp['total_IC_max'] = weight[0] * temp['pb_ratio'] + weight[1] * temp['turnover_ratio'] + weight[4] * temp['pchg'] + \
        weight[2] * temp['pe_ratio'] + weight[5] * temp['AssetsLiab'] + weight[3] * temp['inc_operation_profit_year_on_year'] + weight[6] * temp['pcf_ratio']
    temp_IC_max = temp.sort(['total_IC_max'], ascending = False)
    pchg_IC_max.append(np.mean(temp_IC_max['excess_pchg'].iloc[:100]))
# Plot net-value curves
netValue = {}
netValue['等权组合'] = NetValue(pchg_mean)
netValue['IC 加权组合'] = NetValue(pchg_IC_avg)
netValue['最优 IC 加权组合'] = NetValue(pchg_IC_max)
plotNetValue(tradingDays, netValue)
# Monthly win rate: share of months with a positive excess return
MonthWinRate1 = len([i for i in pchg_mean if i >0])/float(len(pchg_mean))
MonthWinRate2 = len([i for i in pchg_IC_avg if i >0])/float(len(pchg_IC_avg))
MonthWinRate3 = len([i for i in pchg_IC_max if i >0])/float(len(pchg_IC_max))
# Return and risk statistics
result_temp = cal_indictor(netValue['等权组合'])
result = pd.DataFrame()
result['等权组合'] = pd.Series(result_temp)
result_temp = cal_indictor(netValue['IC 加权组合'])
result['IC 加权组合'] = pd.Series(result_temp)
result_temp = cal_indictor(netValue['最优 IC 加权组合'])
result['最优 IC 加权组合'] = pd.Series(result_temp)
result_output = result.T
result_output = result_output[['TotalReturn', 'AnnReturn', 'AnnVol', 'SR', 'MaxDown']]
result_output['MonthWinRate'] = [MonthWinRate1, MonthWinRate2, MonthWinRate3]
result_output
Portfolio | TotalReturn | AnnReturn | AnnVol | SR | MaxDown | MonthWinRate |
---|---|---|---|---|---|---|
等权组合 (equal weight) | 1.475105 | 0.198723 | 0.158053 | 1.004238 | 0.250984 | 0.75 |
IC 加权组合 (IC weighted) | 1.732261 | 0.222656 | 0.166839 | 1.094807 | 0.238932 | 0.75 |
最优 IC 加权组合 (optimal IC weighted) | 1.885302 | 0.236056 | 0.167213 | 1.172492 | 0.240355 | 0.75 |
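In this 7-factor example, with a backtest period of 2013 to 2018, the equal-weight portfolio has an annualized excess return of 19.87% and the lowest Sharpe ratio of the three, 1.00. The IC-weighted portfolio performs somewhat better: its annualized excess return rises to 22.27% and its Sharpe ratio to 1.09. The optimal IC-weighted portfolio performs best, with an annualized excess return of 23.61% and the highest Sharpe ratio of the three, 1.17. All three portfolios have the same monthly win rate of 75%.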
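At present, the factor-weighting methods in common use are equal weighting, IC weighting, IC_IR weighting, and optimized IC_IR weighting. Equal weighting is the most traditional method, and its results are visibly affected by differences in factor effectiveness and by linear correlation between factors. IC weighting addresses equal weighting's neglect of differences in factor effectiveness and outperforms it in most cases.
In the book "Quantitative Equity Portfolio Management", Qian proposes obtaining factor weights by maximizing the composite factor's IC_IR, which takes into account both the size of each factor's IC and the stability of its IC time series. Many articles have already tested this weighting method.
This article adopts Qian's optimization framework to obtain factor weights, but changes the objective from maximizing the composite factor's IC_IR to maximizing its single-period IC. The closed-form solution shows that the resulting weights depend on two things: the effectiveness of each factor, that is, its IC, and the correlation coefficients between factors. The empirical results also show that maximizing the single-period IC effectively corrects the allocation bias of equal weighting: across most of the factor space, the portfolio built with optimal IC weighting outperforms the portfolio built with equal weighting.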