
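Short-Horizon Futures Factors from Intraday High-Frequency Data (High-Frequency Factor Research Series, Part 1)

Posted by 不做外汇索罗斯 on July 31 at 16:00 · Replies (1)

Applied to futures. I took apart @一梦春秋's code, and I was too lazy to copy over the final conclusions.

References: GF Securities, 《基于日内高频数据的短周期选股因子研究-高频数据因子研究系列一》 (Short-Horizon Stock-Selection Factors from Intraday High-Frequency Data, High-Frequency Factor Research Series, Part 1);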

@一梦春秋 https://www.joinquant.com/view/community/detail/417356f8e6a03f2b42952844e0c587b8?type=1

I. Factor Construction

The construction is taken from the research report. The factors are built as follows:

  1. For each stock on trading day t, first compute the return over the i-th interval at a given minute frequency, $r_{t,i} = p_{t,i} - p_{t,i-1}$, where $p_{t,i}$ is the stock's log price at the end of the i-th interval on day t and $p_{t,i-1}$ is its log price at the end of interval i-1.

  2. For each stock, use $r_{t,i}$ to compute the realized variance $RDVar_t$, realized skewness $RDSkew_t$ and realized kurtosis $RDKurt_t$ on trading day t:

$$RDVar_t = \sum\limits_{i=1}^{N}r_{t,i}^2$$

$$RDSkew_t = \frac {\sqrt N\sum\limits_{i=1}^{N}r_{t,i}^3}{RDVar_t^{3/2}}$$

$$RDKurt_t = \frac {N \sum\limits_{i=1}^{N}r_{t,i}^4}{RDVar_t^2}$$

Here N is the number of intraday observations at the chosen frequency on trading day t. With 1-minute bars N = 60*4 = 240; with 5-minute bars N = 240/5 = 48.

  3. For each stock on trading day t, compute the cumulative realized volatility $RVol_t$, realized skewness $RSkew_t$ and realized kurtosis $RKurt_t$ over the most recent n trading days:

$$RVol_t = \left(\frac{242}{n} {\sum\limits_{i=0}^{n-1}}RDVar_{t-i}\right)^{1/2}$$

$$RSkew_t = \frac{1}{n}{\sum\limits_{i=0}^{n-1}}RDSkew_{t-i}$$

$$RKurt_t = \frac{1}{n}{\sum\limits_{i=0}^{n-1}}RDKurt_{t-i}$$

  4. On each rebalancing date, compute $RVol_t$, $RSkew_t$ and $RKurt_t$ for every stock in the cross-section using the formulas above, then examine the historical performance of portfolios sorted into quantile groups on each of these high-frequency factors, in order to identify the factors that are relatively effective.
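The daily realized moments and their n-day aggregation described above can be sketched with NumPy and pandas. This is a minimal illustration, not the report's or the author's code: the function names `realized_moments` and `aggregate` are hypothetical, N is taken to be the number of intraday returns, and the window `n=5` and the annualisation constant 242 are exposed as parameters purely for the example.

```python
import numpy as np
import pandas as pd


def realized_moments(close):
    """Return (RDVar, RDSkew, RDKurt) for one day of intraday closes."""
    log_p = np.log(np.asarray(close, dtype=float))
    r = np.diff(log_p)              # r_{t,i} = p_{t,i} - p_{t,i-1}
    n_obs = len(r)                  # N: number of intraday returns (assumption)
    rd_var = np.sum(r ** 2)
    if rd_var == 0:                 # flat prices: report zeros instead of dividing by 0
        return 0.0, 0.0, 0.0
    rd_skew = np.sqrt(n_obs) * np.sum(r ** 3) / rd_var ** 1.5
    rd_kurt = n_obs * np.sum(r ** 4) / rd_var ** 2
    return rd_var, rd_skew, rd_kurt


def aggregate(daily, n=5, periods_per_year=242):
    """Average daily realized moments over a rolling window of n days."""
    daily = pd.DataFrame(daily, columns=['rd_var', 'rd_skew', 'rd_kurt'])
    out = pd.DataFrame(index=daily.index)
    out['rvol'] = np.sqrt(periods_per_year * daily['rd_var'].rolling(n).mean())
    out['rskew'] = daily['rd_skew'].rolling(n).mean()
    out['rkurt'] = daily['rd_kurt'].rolling(n).mean()
    return out
```

The first n-1 rows of the aggregated table are NaN because the rolling window is not yet full. To use the result as a tradable signal it would additionally need to be lagged by one day so that it only uses information available before the return it is compared with.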

II. Building the Factor Data

import numpy as np
import pandas as pd
import math
from jqdata import *
import matplotlib.pyplot as plt
from datetime import date, timedelta
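# Shanghai Futures Exchange (SHFE)

shang = {'AG8888.XSGE':'Silver futures index', 'PB8888.XSGE':'Lead futures index',
'AU8888.XSGE':'Gold futures index', 'RB8888.XSGE':'Rebar futures index',
'AL8888.XSGE':'Aluminium futures index',   'RU8888.XSGE':'Natural rubber futures index',
'BU8888.XSGE':'Bitumen futures index', 'SN8888.XSGE':'Tin futures index',
'CU8888.XSGE':'Copper futures index', 
'FU8888.XSGE':'Fuel oil futures index', 'ZN8888.XSGE':'Zinc futures index',
'HC8888.XSGE':'Hot-rolled coil futures index', 'NI8888.XSGE':'Nickel futures index',
'SP8888.XSGE':'Pulp main contract',}

# Zhengzhou Commodity Exchange (CZCE)

zheng = {'RM8888.XZCE':'Rapeseed meal futures index',
'CF8888.XZCE':'Cotton futures index',  'FG8888.XZCE':'Glass futures index',
'SF8888.XZCE':'Ferrosilicon futures index', 
'SM8888.XZCE':'Silicomanganese futures index',  'MA8888.XZCE':'Methanol futures index',
'SR8888.XZCE':'White sugar futures index',  
'TA8888.XZCE':'PTA futures index',   'OI8888.XZCE':'Rapeseed oil futures index', 
'ZC8888.XZCE':'Thermal coal futures index',  
'AP8888.XZCE':'Apple futures index',  'CJ8888.XZCE':'Jujube contract',}

# Dalian Commodity Exchange (DCE)

da = {'A8888.XDCE':'No.1 soybean futures index', 'JD8888.XDCE':'Egg futures index',
'B8888.XDCE':'No.2 soybean futures index', 'JM8888.XDCE':'Coking coal futures index',
'L8888.XDCE':'Polyethylene futures index',
'C8888.XDCE':'Corn futures index', 'M8888.XDCE':'Soybean meal futures index',
'CS8888.XDCE':'Corn starch futures index', 'P8888.XDCE':'Palm oil futures index',
'PP8888.XDCE':'Polypropylene futures index',
'I8888.XDCE':'Iron ore futures index', 'V8888.XDCE':'PVC futures index',
'J8888.XDCE':'Coke futures index', 'Y8888.XDCE':'Soybean oil futures index',
'EG8888.XDCE':'Ethylene glycol futures index',}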

futures = list(shang.keys()) + list(zheng.keys()) + list(da.keys())
futures[:5]
['AG8888.XSGE', 'PB8888.XSGE', 'AU8888.XSGE', 'RB8888.XSGE', 'AL8888.XSGE']
future_list = []
date_ = date(2014,1,1)
for future in futures:
    start_date = get_security_info(future).start_date
    if start_date < (date_ - timedelta(days=365)):
        future_list.append(future)
print(len(future_list))
future_list[:5]
24
['AG8888.XSGE', 'PB8888.XSGE', 'AU8888.XSGE', 'RB8888.XSGE', 'AL8888.XSGE']
n=5
trade_days = get_trade_days(start_date='2018-01-01', end_date='2019-06-01')
panel_dict = {}

for i in range(1, len(trade_days)):
    # Data window: 21:31 on the previous trading day to 15:05 on trading day i
    daily_start = str(trade_days[i - 1])+' 21:31:00'
    daily_end = str(trade_days[i])+' 15:05:00'
    
    factor_df_index = []
    factor_df_data = []

    for future in future_list:
        price = get_price(future,start_date=daily_start,end_date=daily_end,frequency='5m',fields=['close'],
                          fq='pre')
        # Accumulate the 2nd, 3rd and 4th powers of the 5-minute log returns
        sum_rt2 = 0.0
        sum_rt3 = 0.0
        sum_rt4 = 0.0
        for j in range(1, len(price)):
            pi = math.log(price.iloc[j]['close'])
            pi_1 = math.log(price.iloc[j - 1]['close'])
            rt = pi - pi_1
            sum_rt2 += math.pow(rt, 2)
            sum_rt3 += math.pow(rt, 3)
            sum_rt4 += math.pow(rt, 4)
            
        rd_var = sum_rt2
        # len(price) is used as N; the zero checks avoid dividing by rd_var == 0
        if sum_rt3 == 0:
            rd_skew = 0
        else:
            rd_skew = math.sqrt(len(price)) * sum_rt3 / (math.pow(rd_var, 3 / 2))
        
        if sum_rt4 == 0:
            rd_kurt = 0
        else:
            rd_kurt = len(price) * sum_rt4 / (math.pow(rd_var, 2))
            
        factor_df_index.append(future)
        factor_df_data.append([price.close.iloc[-1], rd_var, rd_skew, rd_kurt])
    # Build the day's cross-section once, after all contracts have been processed
    factor_df = pd.DataFrame(data=factor_df_data, index=factor_df_index, 
                             columns=['close', 'rd_var', 'rd_skew', 'rd_kurt'])
    panel_dict[trade_days[i]] = factor_df
# pd.Panel was removed in pandas 0.25, so this cell needs an older pandas
panel = pd.Panel(panel_dict)
panel
/opt/conda/lib/python3.6/site-packages/IPython/core/interactiveshell.py:3267: FutureWarning: 
Panel is deprecated and will be removed in a future version.
The recommended way to represent these types of 3-dimensional data are with a MultiIndex on a DataFrame, via the Panel.to_frame() method
Alternatively, you can use the xarray package http://xarray.pydata.org/en/stable/.
Pandas provides a `.to_xarray()` method to help automate this conversion.

  exec(code_obj, self.user_global_ns, self.user_ns)
<class 'pandas.core.panel.Panel'>
Dimensions: 341 (items) x 24 (major_axis) x 4 (minor_axis)
Items axis: 2018-01-03 to 2019-05-31
Major_axis axis: AG8888.XSGE to Y8888.XDCE
Minor_axis axis: close to rd_kurt
panel.major_xs('AG8888.XSGE')
2018-01-03 2018-01-04 2018-01-05 2018-01-08 2018-01-09 2018-01-10 2018-01-11 2018-01-12 2018-01-15 2018-01-16 2018-01-17 2018-01-18 2018-01-19 2018-01-22 2018-01-23 2018-01-24 2018-01-25 2018-01-26 2018-01-29 2018-01-30 2018-01-31 2018-02-01 2018-02-02 2018-02-05 2018-02-06 2018-02-07 2018-02-08 2018-02-09 2018-02-12 2018-02-13 2018-02-14 2018-02-22 2018-02-23 2018-02-26 2018-02-27 2018-02-28 2018-03-01 2018-03-02 2018-03-05 2018-03-06 ... 2019-04-02 2019-04-03 2019-04-04 2019-04-08 2019-04-09 2019-04-10 2019-04-11 2019-04-12 2019-04-15 2019-04-16 2019-04-17 2019-04-18 2019-04-19 2019-04-22 2019-04-23 2019-04-24 2019-04-25 2019-04-26 2019-04-29 2019-04-30 2019-05-06 2019-05-07 2019-05-08 2019-05-09 2019-05-10 2019-05-13 2019-05-14 2019-05-15 2019-05-16 2019-05-17 2019-05-20 2019-05-21 2019-05-22 2019-05-23 2019-05-24 2019-05-27 2019-05-28 2019-05-29 2019-05-30 2019-05-31
close 3896.929000 3884.829000 3889.934000 3891.012000 3895.981000 3873.110000 3885.140000 3890.233000 3926.186000 3907.923000 3877.015000 3853.076000 3835.058000 3839.049000 3848.178000 3837.149000 3917.340000 3890.659000 3876.701000 3824.738000 3833.709000 3831.796000 3826.264000 3731.234000 3769.407000 3733.506000 3676.579000 3689.624000 3692.757000 3714.779000 3720.807000 3677.940000 3699.089000 3729.238000 3722.266000 3687.433000 3679.198000 3689.655000 3704.325000 3695.468000 ... 3565.935000 3584.237000 3586.444000 3594.315000 3607.918000 3603.018000 3593.508000 3557.054000 3536.476000 3548.285000 3549.532000 3539.680000 3548.146000 3565.847000 3543.072000 3520.911000 3548.540000 3569.960000 3564.559000 3557.919000 3558.014000 3570.186000 3581.351000 3577.475000 3570.274000 3575.036000 3611.352000 3595.858000 3606.801000 3565.411000 3545.635000 3550.679000 3549.994000 3564.392000 3577.204000 3577.151000 3570.683000 3554.734000 3551.236000 3572.920000
rd_var 0.000028 0.000051 0.000015 0.000020 0.000013 0.000014 0.000010 0.000051 0.000038 0.000015 0.000024 0.000039 0.000026 0.000017 0.000011 0.000045 0.000051 0.000071 0.000021 0.000027 0.000024 0.000019 0.000028 0.000098 0.000112 0.000033 0.000049 0.000049 0.000091 0.000032 0.000049 0.000015 0.000023 0.000029 0.000026 0.000061 0.000021 0.000063 0.000028 0.000015 ... 0.000019 0.000017 0.000015 0.000009 0.000012 0.000012 0.000017 0.000022 0.000026 0.000018 0.000019 0.000015 0.000008 0.000011 0.000010 0.000020 0.000020 0.000017 0.000015 0.000017 0.000008 0.000016 0.000012 0.000015 0.000016 0.000013 0.000019 0.000026 0.000010 0.000032 0.000012 0.000013 0.000018 0.000013 0.000017 0.000011 0.000010 0.000017 0.000037 0.000019
rd_skew -0.095043 -4.612178 -0.602814 -1.070272 -0.511970 -1.511876 -0.426560 -1.922123 3.371732 -1.203940 -0.831105 -4.053758 -0.198022 2.548430 0.557565 -2.177223 0.646749 -5.753084 -0.471969 -0.922546 1.094639 1.014053 1.290568 -1.661635 2.478196 1.192341 -0.773851 0.292206 5.248286 0.789771 2.318899 0.367297 0.160604 1.241179 1.280568 -1.161667 -0.382821 3.096876 0.958635 0.284492 ... -0.687768 1.530637 1.065321 0.487290 -0.497250 0.392057 -0.222383 -1.309289 -2.145156 1.165266 0.694311 -0.565016 -0.425492 0.539512 -1.331245 -0.412768 0.525464 0.700912 -0.358176 1.481828 -0.240613 0.484989 1.365373 -0.824448 0.745331 -0.408234 0.303964 -1.572724 -0.171120 -1.217455 -0.296289 -1.535254 0.219072 0.311837 0.425111 0.546481 -0.176504 1.168907 -1.742897 1.595885
rd_kurt 3.628721 34.083203 4.045513 6.473492 4.336460 7.774328 3.220249 15.008555 21.212018 6.424850 5.267951 27.223348 5.279877 18.254578 4.824025 17.177066 4.211245 47.769682 3.741693 3.477501 7.859363 9.918721 7.878246 6.548559 15.890135 9.007872 5.369693 4.543329 43.210418 5.970432 14.208564 4.585190 4.178374 5.324524 11.035961 14.480319 5.628787 18.476909 4.583459 5.970725 ... 3.671092 5.198868 4.020347 3.710831 6.239751 4.043122 4.190775 11.694857 13.463961 6.641241 6.581335 3.466956 3.202595 3.486636 5.666463 5.929159 5.626616 4.646697 4.673658 7.751242 4.040043 4.143220 7.647355 7.431755 5.194859 3.527151 3.473310 12.271466 2.912957 4.486629 2.771105 12.777991 5.051282 4.585726 3.388234 3.775833 3.487026 5.840408 12.939675 8.240106
rvol = np.sqrt(panel.minor_xs('rd_var').T.rolling(5).mean()*242).shift(1)
rvol.head(10)
AG8888.XSGE PB8888.XSGE AU8888.XSGE RB8888.XSGE AL8888.XSGE RU8888.XSGE CU8888.XSGE FU8888.XSGE ZN8888.XSGE RM8888.XZCE CF8888.XZCE FG8888.XZCE SR8888.XZCE TA8888.XZCE OI8888.XZCE A8888.XDCE B8888.XDCE L8888.XDCE C8888.XDCE M8888.XDCE P8888.XDCE V8888.XDCE J8888.XDCE Y8888.XDCE
2018-01-03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-05 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-08 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-09 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-10 0.078532 0.105625 0.060853 0.163671 0.116236 0.161929 0.082414 0.0 0.098106 0.086055 0.078949 0.177890 0.061341 0.122266 0.078103 0.090980 0.098327 0.120537 0.076204 0.076046 0.082018 0.145638 0.246453 0.081067
2018-01-11 0.074320 0.106632 0.056683 0.168901 0.111546 0.180640 0.080016 0.0 0.098854 0.087240 0.078779 0.171566 0.062342 0.127703 0.082621 0.087136 0.159171 0.127870 0.073437 0.075131 0.087239 0.145208 0.255219 0.088806
2018-01-12 0.059529 0.102069 0.046859 0.168358 0.098163 0.168535 0.074416 0.0 0.100184 0.087271 0.076613 0.168290 0.058443 0.108304 0.082248 0.089587 0.169835 0.127178 0.073047 0.076681 0.084758 0.142930 0.238597 0.085892
2018-01-15 0.072649 0.105241 0.046285 0.166876 0.099854 0.159496 0.076432 0.0 0.100922 0.085507 0.076038 0.162798 0.054206 0.104343 0.079189 0.083143 0.153919 0.119289 0.073983 0.073661 0.091048 0.143651 0.224717 0.084393
2018-01-16 0.078394 0.103543 0.043652 0.149066 0.094901 0.166233 0.082110 0.0 0.098571 0.106478 0.081554 0.129357 0.055094 0.098014 0.085089 0.093870 0.167715 0.121203 0.063419 0.103468 0.098195 0.149519 0.247190 0.088007
rskew = panel.minor_xs('rd_skew').T.rolling(5).mean().shift(1)
rskew.tail()
AG8888.XSGE PB8888.XSGE AU8888.XSGE RB8888.XSGE AL8888.XSGE RU8888.XSGE CU8888.XSGE FU8888.XSGE ZN8888.XSGE RM8888.XZCE CF8888.XZCE FG8888.XZCE SR8888.XZCE TA8888.XZCE OI8888.XZCE A8888.XDCE B8888.XDCE L8888.XDCE C8888.XDCE M8888.XDCE P8888.XDCE V8888.XDCE J8888.XDCE Y8888.XDCE
2019-05-27 -0.175104 -0.098253 -0.067989 -0.126786 -0.776771 -1.223528 -0.050564 -0.043689 0.022579 0.120305 -0.535873 0.367370 -0.696847 -0.097802 0.075797 -0.119445 -0.372945 0.637237 -0.393494 -0.163280 0.075275 0.714072 -0.196096 -0.483185
2019-05-28 -0.006550 -0.748347 0.133832 -0.461609 -0.317813 -0.210288 0.293991 0.105746 0.190589 -0.014780 -0.014979 0.281588 -0.558965 1.058356 0.351417 -0.046069 -0.363897 0.709613 -0.570543 -0.067322 0.241694 0.908246 -0.372345 -0.349835
2019-05-29 0.265200 -1.010981 0.356060 -0.850822 -0.205528 -0.665941 -0.010998 0.466548 0.012581 -0.387606 -0.203280 0.113810 -0.508173 0.844013 0.261749 -0.162345 -0.402435 0.893345 -0.300750 0.088379 0.113034 1.102814 -0.733780 -0.499796
2019-05-30 0.455167 -1.282686 0.772734 -0.400378 0.020745 -0.301800 0.082120 0.375143 -0.143340 0.849013 0.335597 -0.236364 0.012732 0.914475 0.508935 0.807184 1.057709 0.906341 0.358698 1.797040 0.954387 1.186858 -0.458667 0.227020
2019-05-31 0.044220 -1.042543 0.236909 -0.673350 0.108086 -0.186491 0.338462 1.008258 0.145217 0.793839 0.125658 -0.559639 0.429029 1.009303 0.470637 0.880507 1.265849 0.284821 -0.002156 1.864890 1.071081 1.222379 -0.505347 0.292943
rkurt = panel.minor_xs('rd_kurt').T.rolling(5).mean().shift(1)
rkurt.tail()
AG8888.XSGE PB8888.XSGE AU8888.XSGE RB8888.XSGE AL8888.XSGE RU8888.XSGE CU8888.XSGE FU8888.XSGE ZN8888.XSGE RM8888.XZCE CF8888.XZCE FG8888.XZCE SR8888.XZCE TA8888.XZCE OI8888.XZCE A8888.XDCE B8888.XDCE L8888.XDCE C8888.XDCE M8888.XDCE P8888.XDCE V8888.XDCE J8888.XDCE Y8888.XDCE
2019-05-27 5.714868 5.263076 4.175594 5.434553 8.984689 14.605471 6.469306 8.652241 6.597865 7.190487 8.400903 4.350366 8.539037 5.570203 5.104087 5.356956 4.898469 4.481038 7.210029 5.263679 4.323041 5.571098 8.871790 4.862249
2019-05-28 5.915813 8.190477 4.475250 5.714088 9.321683 8.728391 6.623149 9.779292 6.630368 6.719982 9.439344 4.459139 9.884842 12.135892 5.551983 5.146355 4.721424 4.640307 7.243167 5.504183 4.582042 5.952238 8.419125 4.833600
2019-05-29 4.057620 8.103784 5.011199 5.429023 9.786781 8.017176 5.610571 9.621175 6.497516 6.578347 8.446911 4.381520 10.637737 11.701365 5.238671 5.357968 5.186984 5.361547 4.999149 6.607914 4.152856 7.096729 9.038384 4.691304
2019-05-30 4.215445 8.806066 6.570760 4.096090 7.248866 6.798045 4.945815 8.878251 6.544527 7.979803 9.368163 3.938108 9.928262 10.666351 5.783691 6.469350 9.963528 5.673635 5.970576 13.926214 6.608399 7.063728 7.509983 4.911892
2019-05-31 5.886235 8.139018 8.681616 5.786130 7.281216 5.842953 4.987463 7.730391 7.101016 8.029187 9.445432 4.831540 7.433998 10.217890 5.842374 6.384113 9.910771 7.673921 6.090323 13.627367 7.032325 6.679847 7.705261 4.456813

III. Factor Characteristics

import matplotlib.dates as mdate
# Font setting, needed only when labels contain Chinese characters
plt.rcParams['font.sans-serif'] = ['Arial Unicode MS']

# Render minus signs correctly
plt.rcParams['axes.unicode_minus'] = False
"""
Histogram parameters
data: required, the data to plot
bins: number of bars, optional, default 10
density: whether to normalise the histogram, optional; False (the default) shows counts, True shows a probability density
facecolor: fill colour of the bars
edgecolor: edge colour of the bars
alpha: opacity
"""

# I took this colour from a screenshot of the report's chart with a colour picker; why does the plot still look slightly different?
color = "#1F77B4"

hist_inputs = [(rvol, "Realized volatility distribution"),
               (rskew, "Realized skewness distribution"),
               (rkurt, "Realized kurtosis distribution")]
for factor, title in hist_inputs:
    # Pool every (date, contract) value into one sample and drop NaN; passing the
    # DataFrame itself makes matplotlib split it into separate datasets
    values = factor.stack().dropna().values
    plt.hist(values, bins=40, density=False, facecolor=color, edgecolor=None, alpha=1)
    # x-axis label
    plt.xlabel("bin")
    # y-axis label
    plt.ylabel("count")
    # Figure title
    plt.title(title)
    plt.show()

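The three histograms above show that realized volatility across futures contracts is right-skewed overall. Realized skewness is centred near zero and has clearly fat tails. Realized kurtosis, like realized volatility, is right-skewed, and most in-sample values are above 3, which again indicates fat tails.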
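The statements about the shape of these distributions can also be checked numerically. The sketch below pools a date-by-contract factor table and reports a few summary statistics. The helper name `pooled_summary` is hypothetical, and the share of values above 3 is only meaningful when the input is the kurtosis factor.

```python
import numpy as np
import pandas as pd


def pooled_summary(factor_df):
    """Pool a (date x contract) factor table and summarise its distribution."""
    values = factor_df.stack().dropna()
    return {
        'mean': values.mean(),
        'median': values.median(),
        'skew': values.skew(),                  # > 0 suggests a right-skewed distribution
        'share_above_3': (values > 3).mean(),   # relevant for the kurtosis factor only
    }
```

For example, `pooled_summary(rkurt)` would return the pooled mean, median, sample skewness and the fraction of kurtosis values above 3 for the table built earlier.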

# Colours of the five percentile lines: blue, orange, green, red, purple
color_list = ['#5698c6', '#ff9e4a', '#60b760', '#e05c5d', '#ae8ccd']

label_list = ['10', '25', 'median', '75', '90']
q_list = [0.10, 0.25, 0.50, 0.75, 0.90]

all_df = [rvol, rskew, rkurt]
title_name = ['rvol', 'rskew', 'rkurt']

for df, title in zip(all_df, title_name):
    # Cross-sectional percentiles of the factor on each day, one line per percentile,
    # computed over all contracts (rows = dates, columns = percentiles)
    quantiles = df.quantile(q_list, axis=1).T

    # The figure size can be set here
    fig = plt.figure(figsize=(12, 8))
    plt.title(title + ' percentile trend')
    for k in range(len(q_list)):
        plt.plot(quantiles.iloc[:, k].values, color_list[k], label=label_list[k])

    # Label every 50th trading day on the x-axis
    x = np.arange(0, len(df), 50)
    x_label = [str(df.index[k]) for k in x]
    plt.xticks(x, x_label)
    plt.xlabel("TRADE_DT")
    plt.ylabel("factor value")
    plt.legend()

IV. Empirical Analysis

all_df = [rvol, rskew, rkurt]
Q_list = ['Q1', 'Q2', 'Q3', 'Q4', 'Q5']
df_list = []

# Daily close-to-close returns, starting on the first day the factors are available
price_close = panel.minor_xs('close').T
price_pct = price_close.pct_change().dropna().iloc[4:]

for num, factor_df in enumerate(all_df):
    factor = factor_df.dropna()
    groups = [[], [], [], [], []]
    for j in range(len(price_pct)):
        # Sort the 24 contracts by factor value (ascending) and split them into five groups
        index_list = list(factor.iloc[j].sort_values().index)
        daily_price = price_pct[index_list].iloc[j]
        groups[0].append(daily_price.iloc[0:5].mean())
        groups[1].append(daily_price.iloc[5:9].mean())
        groups[2].append(daily_price.iloc[9:14].mean())
        groups[3].append(daily_price.iloc[14:19].mean())
        groups[4].append(daily_price.iloc[19:].mean())
    df_group = pd.DataFrame(groups).T
    df_group.index = price_pct.index
    # Cumulative return of each group as the running sum of daily mean returns
    df_group = df_group.cumsum()
    df_group.columns = Q_list
    df_list.append(df_group)
    
    fig = plt.figure(figsize=(12, 8))
    plt.title(title_name[num]+' cumulative return')
    
    for q in Q_list:
        plt.plot(df_group[q], label=q)
    plt.legend()
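The group boundaries above are hard-coded for 24 contracts. A hedged alternative is to split the ranked contracts into near-equal groups programmatically, as sketched below. The function name `bucket_returns` is hypothetical, and `np.array_split` places any remainder in the leading groups, so for 24 contracts the group sizes would be 5, 5, 5, 5, 4 rather than the 5, 4, 5, 5, 5 used in the cell above.

```python
import numpy as np
import pandas as pd


def bucket_returns(factor_row, return_row, n_groups=5):
    """Mean return of each factor group for one day (Q1 = lowest factor values)."""
    ranked = factor_row.dropna().sort_values().index
    buckets = np.array_split(np.asarray(ranked), n_groups)
    return pd.Series(
        [return_row[list(b)].mean() for b in buckets],
        index=['Q%d' % (k + 1) for k in range(n_groups)],
    )
```

Applied day by day to one row of a factor table and the matching row of returns, this yields the same kind of Q1 to Q5 series that the loop accumulates.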
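for num, df in enumerate(df_list):
    # Difference between the cumulative returns of groups Q3 and Q2
    spread = df['Q3'] - df['Q2']
    
    plt.figure(figsize=(12, 8))
    plt.title(title_name[num]+' cumulative return spread (Q3 - Q2)')
    plt.plot(spread, label=title_name[num])
    plt.legend()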
import scipy.stats as st
all_df = [rvol, rskew, rkurt]
name = ['rvol', 'rskew', 'rkurt']
color_list = ['#2B4C80', '#B00004']
label_list = ['IC', 'IC mean (12 periods)']

for i in range(len(all_df)):
    # Daily IC: cross-sectional Pearson correlation between factor and same-day return
    ic_list = []
    # Mean of the previous 12 ICs
    ic_ma_list = []
    factor = all_df[i].iloc[5:]
    for j in range(len(factor)):
        ic = st.pearsonr(price_pct.iloc[j].values, factor.iloc[j].values)[0]
        ic_list.append(ic)
    ic_list = np.array(ic_list)
    print("%s share of days with IC < 0: %s" % (name[i], np.sum(ic_list < 0) / len(ic_list)))
    for z in range(len(ic_list)):
        if z < 12:
            ic_ma_list.append(np.nan)
            continue
        ic_ma = ic_list[z - 12:z].mean()
        ic_ma_list.append(ic_ma)
        

    fig = plt.figure(figsize=(12, 8))
    ax = fig.add_subplot(1, 1, 1)
    ax.set_title(name[i] + " factor IC")

    for k, yi in enumerate([ic_list, ic_ma_list]):
        ax.plot(yi, color_list[k], label=label_list[k])
    ax.legend()

    # Draw horizontal grid lines to make the IC easier to read
    plt.grid(axis='y')
    plt.show()
rvol share of days with IC < 0: 0.48214285714285715
rskew share of days with IC < 0: 0.5327380952380952
rkurt share of days with IC < 0: 0.5178571428571429
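The daily IC loop can also be written without explicit iteration. The sketch below is one possible vectorised form using pandas, not the code that produced the numbers above. The function names `daily_ic` and `trailing_mean` are hypothetical, and the `method` argument of `DataFrame.corrwith` requires pandas 0.24 or later. Passing `method='spearman'` would give a rank IC, which is a different statistic from the Pearson IC used in this post.

```python
import numpy as np
import pandas as pd


def daily_ic(factor, returns, method='pearson'):
    """Cross-sectional correlation between factor values and same-day returns."""
    # Keep only the dates and contracts present in both tables
    factor, returns = factor.align(returns, join='inner')
    return factor.corrwith(returns, axis=1, method=method)


def trailing_mean(ic, window=12):
    """Mean of the previous `window` ICs, excluding the current one."""
    return ic.rolling(window).mean().shift(1)
```

With the tables from this post, `daily_ic(rvol.iloc[5:], price_pct)` would be expected to reproduce the Pearson IC series, and `(ic < 0).mean()` the share of negative days.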
 
