Forward adjustment / dividends / bonus shares: data handling and strategy discussion

Posted by 外汇工厂 on May 10, 02:45 | Replies (1)

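Forward adjustment / dividends / bonus shares: data handling and strategy

Problem:

  • JQData forward-adjusted (前复权) prices do not match Google's historical prices, so the handling of dividends and bonus shares needs to be verified
    • Example date: 2005-03-04
    • JQData gives a forward-adjustment factor of 0.2311320
    • Google gives a forward-adjustment factor of 0.322327
  • Hypothesis
    • The factor returned by JQData's get_price with fq='pre' includes both dividends and splits
    • Google's prices are adjusted for splits only; dividends are treated as cash paid out, so no adjustment is made for them
  • Conclusions
      Google's adjustment factor omits some events, so its data is wrong.
      JQData adjusts the factor for both dividends and splits; cash dividends are assumed to be reinvested and are not tracked separately.
      Suggestion: JQData should list the dividend adjustment separately to make factor analysis easier.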
    

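Basic concepts

  • Cash dividend: shareholders receive cash (dividend), the share price drops, and total market cap drops
      price_post * shares_post + dividend_amount * shares_post = price_pre * shares_pre
      (for a pure cash dividend, shares_post = shares_pre)

  • Bonus shares (派股): shareholders hold more shares, the share price drops, and total market cap is unchanged
      price_post * shares_post = price_pre * shares_pre
      shares_post = shares_pre * (1 + split_ratio)
    

Notation:

  • price: actual (unadjusted) closing price
  • shares: actual number of shares at the close
  • _post = ex_date, the ex-rights / ex-dividend date
  • _pre = ex_date-1, the last trading day before the ex-date
  • dividend_amount: cash dividend per share
  • dividend_yield: dividend yield = dividend_amount / price_post
  • split_ratio: new shares issued per existing share (e.g. 0.3 for 3 bonus shares per 10 held)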
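
The two identities above can be combined into a single ex-date price. The sketch below assumes an event that pays cash and issues bonus shares at the same time, with the cash paid per pre-event share; the function names and the sample numbers are hypothetical, not taken from JQData. It also checks that the relative jump in the adjustment factor equals split_ratio + dividend_yield exactly when the yield is measured against the value-conserving ex-date price.

```python
def ex_reference_price(price_pre, dividend_amount, split_ratio):
    """Ex-date price that conserves shareholder value.

    price_pre:        close on the last trading day before the ex-date
    dividend_amount:  cash paid per pre-event share
    split_ratio:      new shares issued per pre-event share
    """
    return (price_pre - dividend_amount) / (1.0 + split_ratio)


def factor_step(price_pre, dividend_amount, split_ratio):
    """Relative jump in the adjustment factor across the event."""
    price_post = ex_reference_price(price_pre, dividend_amount, split_ratio)
    return price_pre / price_post - 1.0


# Hypothetical event: 1.70 cash and 6 bonus shares per 10 shares held
price_pre, dividend_amount, split_ratio = 18.00, 0.17, 0.6
price_post = ex_reference_price(price_pre, dividend_amount, split_ratio)
step = factor_step(price_pre, dividend_amount, split_ratio)

# Additive form: split_ratio + dividend_yield, yield taken on price_post
additive = split_ratio + dividend_amount / price_post
print(price_post, step, additive)
```

With a yield measured on the actual ex-date close instead of this reference price, the additive form is only approximate, which is one possible source of small per-event differences.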

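Potential strategy: event analysis that combines dividend expectations with the actual announcements.

  • Does a surprise factor (payout above expectations) predict medium-term (3-month) price moves?
  • If the implemented payout differs from the announced plan, does that predict long-term (6-month) price moves?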
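
A minimal sketch of the measurement step behind these two questions, run on synthetic prices. The helper forward_returns and the 63 / 126 trading-day horizons (roughly 3 and 6 months) are illustrative assumptions of mine, not something from the post, and the surprise factor itself is not modelled here.

```python
import numpy as np
import pandas as pd


def forward_returns(close, event_dates, horizon):
    """Return close[t + horizon] / close[t] - 1 for each event date t.

    horizon is counted in trading days (rows of `close`). Events that are
    not in the index, or that lack enough future data, get NaN.
    """
    positions = close.index.get_indexer(pd.DatetimeIndex(event_dates))
    out = {}
    for date, i in zip(event_dates, positions):
        if i < 0 or i + horizon >= len(close):
            out[date] = np.nan
        else:
            out[date] = close.iloc[i + horizon] / close.iloc[i] - 1.0
    return pd.Series(out)


# Synthetic data: a price that rises 0.1% per trading day
dates = pd.bdate_range('2018-01-01', periods=300)
close = pd.Series(100.0 * 1.001 ** np.arange(300), index=dates)
events = [dates[10], dates[250]]

ret_3m = forward_returns(close, events, horizon=63)
ret_6m = forward_returns(close, events, horizon=126)
print(ret_3m)
print(ret_6m)
```

In a real study the input would be an adjusted close series, and the resulting returns would be regressed on (or bucketed by) the surprise factor.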

# imports
import logging

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import jqdata as jq

# get_price and get_all_securities are used below without an import,
# as in the original JoinQuant research notebook.

logger = logging.getLogger()
logger.setLevel("INFO")

%matplotlib inline
today = pd.to_datetime('2019-04-12').date()
date0 = '1990-01-01'
sec1= '000001.XSHE'
date_valid = '2005-03-04'
df_sec = get_all_securities()
df_sec.loc[sec1]
display_name          平安银行
name                  PAYH
start_date      1991-04-03
end_date        2200-01-01
type                 stock
Name: 000001.XSHE, dtype: object
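# Fetch prices under each adjustment mode: forward-adjusted ('pre'), unadjusted (None), backward-adjusted ('post')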
list_field = ['open', 'close', 'low', 'high', 'volume', 'money', 'factor', 'high_limit','low_limit', 'avg', 'pre_close', 'paused']
df_px_pre = get_price(sec1,start_date=date0, end_date=today, fields=list_field, fq='pre')
df_px_none = get_price(sec1,start_date=date0, end_date=today, fields=list_field, fq=None)
df_px_post = get_price(sec1,start_date=date0, end_date=today, fields=list_field, fq='post')

Example

On 2005-03-04 the JQData close is 1.47 and the Google close is 2.05.

# Compute the forward-adjustment factor implied by each source
print(date_valid)
px_none = df_px_none.close.loc[date_valid]
px_pre = df_px_pre.close.loc[date_valid]
adj_valid = px_pre / px_none
print('jqdata forward-adjustment factor=' + str(adj_valid))
px_google = 2.05
adj_google = px_google / px_none
print('google forward-adjustment factor=' + str(adj_google))
2005-03-04
jqdata forward-adjustment factor=0.2311320754716981
google forward-adjustment factor=0.32232704402515716
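
One caveat about backing a factor out of a price ratio: if the adjusted close is rounded to two decimals (an assumption, the post does not confirm how either source rounds), the ratio only pins the factor down to a band. The sketch below computes that band; factor_interval is a hypothetical helper and the raw close is recovered from the printed ratio.

```python
def factor_interval(adjusted_close, raw_close, tick=0.01):
    """Range of factors for which raw_close * factor rounds to adjusted_close."""
    half = tick / 2.0
    return (adjusted_close - half) / raw_close, (adjusted_close + half) / raw_close


# Raw close implied by the printed JQData ratio (1.47 / raw_close)
raw_close = 1.47 / 0.2311320754716981

jq_low, jq_high = factor_interval(1.47, raw_close)
google_low, google_high = factor_interval(2.05, raw_close)
print(raw_close)
print(jq_low, jq_high)
print(google_low, google_high)
```

Under that assumption, any JQData factor between about 0.2303 and 0.2319 would show the same 1.47 close, so reading the `factor` column of df_px_pre directly is a tighter check than dividing two printed prices.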
# Plot the source data
df_px_pre.close.plot(title=sec1 + '_fq=pre')
px_valid = df_px_pre.loc[date_valid].close
print("{t} close price={x}".format(t=date_valid, x=px_valid))
2005-03-04 close price=1.47

[Figure SHE000001.jpg: forward-adjusted close of 000001.XSHE]

# Plot the JQData forward-adjustment factor
df_px_pre.factor.plot()
[Figure: JQData forward-adjustment factor over time]

Verification approach

  • Extract all dividend and split announcements
  • Combine dividends and splits to verify JQData
  • Use splits only to verify Google
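
The approach above can be sketched without the JoinQuant API by hard-coding the post-2005 events for 000001.XSHE, with the div_yield and split_ratio values copied from the event tables in this post. The function forward_factor is an illustrative helper of mine, and it assumes the factor is normalized to 1 after the last event.

```python
# (ex-date, div_yield, split_ratio) copied from the event tables in this post
events = [
    ('2007-06-20', 0.000289, 0.1),
    ('2008-10-31', 0.004002, 0.3),
    ('2012-10-19', 0.007446, 0.0),
    ('2013-06-20', 0.015206, 0.6),
    ('2014-06-12', 0.016478, 0.2),
    ('2015-04-13', 0.010520, 0.2),
    ('2016-06-16', 0.017853, 0.2),
    ('2017-07-21', 0.014509, 0.0),
    ('2018-07-12', 0.015315, 0.0),
]


def forward_factor(events, as_of, include_dividends=True):
    """Forward-adjustment factor on `as_of`, equal to 1 after the last event.

    Every event with an ex-date after `as_of` scales earlier prices down
    by 1 / (1 + adj_ratio).
    """
    factor = 1.0
    for ex_date, div_yield, split_ratio in events:
        if ex_date > as_of:  # ISO date strings compare chronologically
            adj_ratio = split_ratio + (div_yield if include_dividends else 0.0)
            factor /= 1.0 + adj_ratio
    return factor


both = forward_factor(events, '2005-03-04', include_dividends=True)
splits_only = forward_factor(events, '2005-03-04', include_dividends=False)
print(round(both, 4), round(splits_only, 4))  # 0.2319 0.2529
```

The first number is the one to compare with JQData, the second with Google.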
# Dates on which the JQData forward-adjustment factor changes
df_adj = df_px_pre.factor.pct_change().replace(0, np.nan).dropna()
df_adj
2007-06-20    0.100286
2008-10-31    0.303997
2012-10-19    0.007445
2013-06-20    0.615207
2014-06-12    0.216487
2015-04-13    0.210518
2016-06-16    0.217843
2017-07-21    0.014501
2018-07-12    0.015313
Name: factor, dtype: float64
# Load the ex-rights / ex-dividend events
q = jq.query(jq.finance.STK_XR_XD).filter(jq.finance.STK_XR_XD.code == sec1)
df_div_split_data = jq.finance.run_query(q)
# Combine a_xr_date and dividend_arrival_date; where both are set, dividend_arrival_date wins
df_div_split_data['a_xr_date'] = df_div_split_data['dividend_arrival_date'].combine_first(df_div_split_data['a_xr_date'])
list_field_basic = ['code', 'board_plan_pub_date', 'board_plan_bonusnote', 'implementation_bonusnote']
list_field_div = ['a_bonus_date', 'bonus_ratio_rmb']
list_field_split = ['a_xr_date', 'dividend_ratio', 'transfer_ratio', 'dividend_arrival_date']
df_div_data = df_div_split_data[list_field_basic + list_field_div]
df_split_data = df_div_split_data[list_field_basic + list_field_split]
print('Dividend events')
df_div = df_div_data[df_div_data.a_bonus_date.notnull()].set_index('a_bonus_date')
df_div = df_div.join(df_px_none[['close']])
# NOTE: shift(-1) aligns the NEXT trading day's close to each date, despite the column name;
# shift(1) would give the previous trading day's close. The column is not used below.
df_div = df_div.join(df_px_none['close'].shift(-1).to_frame('close_preday'))
#df_div['div_yield'] = (df_div['bonus_ratio_rmb']/(df_div['bonus_ratio_rmb']+df_div['close']*10.))
# bonus_ratio_rmb is cash (RMB) per 10 shares, so yield = cash per share / ex-date close
df_div['div_yield'] = df_div['bonus_ratio_rmb'] / (df_div['close'] * 10.)
df_div.sort_index(ascending=True)
Dividend events
(In the bonus notes, 送 = bonus shares, 转增 = shares converted from capital reserve, 派 = cash payout, all per 10 shares held; 含税 = before tax.)
a_bonus_date code board_plan_pub_date board_plan_bonusnote implementation_bonusnote bonus_ratio_rmb close close_preday div_yield
1993-05-24 000001.XSHE None None 10送3.5转增5股派3元 3.000 NaN NaN NaN
1994-07-14 000001.XSHE 1994-05-14 10送3转增2股派5元 10送3转增2股派5元 5.000 NaN NaN NaN
1997-08-29 000001.XSHE 1997-04-21 10送5股派2.00元(含税) 10送5股派2.00元(含税) 2.000 NaN NaN NaN
1999-10-22 000001.XSHE 1999-07-17 10派6元(含税) 10派6元(含税) 6.000 NaN NaN NaN
2002-07-23 000001.XSHE 2002-04-19 10派1.5元(含税) 10派1.5元(含税) 1.500 NaN NaN NaN
2003-09-29 000001.XSHE 2003-04-24 10股派1.5元(含税) 10股派1.5元(含税) 1.500 NaN NaN NaN
2007-06-20 000001.XSHE 2007-05-24 10送1股派0.09元(含税) 10送1股派0.09元(含税) 0.090 31.19 34.31 0.000289
2008-10-31 000001.XSHE 2008-08-21 10送3股派0.335元(含税) 10送3股派0.335元(含税) 0.335 8.37 8.40 0.004002
2012-10-19 000001.XSHE 2012-08-16 10派1元(含税) 10派1元(含税) 1.000 13.43 13.47 0.007446
2013-06-20 000001.XSHE 2013-03-08 10送6股派1.7元(含税) 10送6股派1.7元(含税) 1.700 11.18 11.28 0.015206
2014-06-12 000001.XSHE 2014-03-07 10转增2股派1.6元(含税) 10转增2股派1.6元(含税) 1.600 9.71 10.12 0.016478
2015-04-13 000001.XSHE 2015-03-13 10转增2股派1.74元(含税) 10转增2股派1.74元(含税) 1.740 16.54 16.30 0.010520
2016-06-16 000001.XSHE 2016-03-10 10转增2股派1.53元(含税) 10转增2股派1.53元(含税) 1.530 8.57 8.58 0.017853
2017-07-21 000001.XSHE 2017-03-17 10派1.58元(含税) 10派1.58元(含税) 1.580 10.89 10.95 0.014509
2018-07-12 000001.XSHE 2018-03-15 10派1.36元(含税) 10派1.36元(含税) 1.360 8.88 8.88 0.015315
print('Split events')
df_split = df_split_data[df_split_data.a_xr_date.notnull()].set_index('a_xr_date')
df_split = df_split.join(df_px_none[['close']])
df_split['dividend_ratio'] = df_split['dividend_ratio'].fillna(0)
df_split['transfer_ratio'] = df_split['transfer_ratio'].fillna(0)
# dividend_ratio (bonus shares) and transfer_ratio (converted shares) are per 10 shares held
df_split['split_ratio'] = (df_split['dividend_ratio'] + df_split['transfer_ratio']) / 10.
df_split.drop('dividend_arrival_date', axis=1).sort_index(ascending=True)
Split events
a_xr_date code board_plan_pub_date board_plan_bonusnote implementation_bonusnote dividend_ratio transfer_ratio close split_ratio
1991-05-02 000001.XSHE None None 10送4股派3元 4.0 0.0 NaN 0.40
1992-03-23 000001.XSHE None None 10派2元送5股 5.0 0.0 NaN 0.50
1993-05-24 000001.XSHE None None 10送3.5转增5股派3元 3.5 5.0 NaN 0.85
1994-07-11 000001.XSHE 1994-05-14 10送3转增2股派5元 10送3转增2股派5元 3.0 2.0 NaN 0.50
1995-09-25 000001.XSHE 1995-03-10 10送2股派3.00元 10送2股派3.00元 2.0 0.0 NaN 0.20
1996-05-27 000001.XSHE 1996-03-14 10送5转增5股 10送5转增5股 5.0 5.0 NaN 1.00
1997-08-25 000001.XSHE 1997-04-21 10送5股派2.00元(含税) 10送5股派2.00元(含税) 5.0 0.0 NaN 0.50
1999-10-18 000001.XSHE 1999-07-17 10派6元(含税) 10派6元(含税) 0.0 0.0 NaN 0.00
2002-07-23 000001.XSHE 2002-04-19 10派1.5元(含税) 10派1.5元(含税) 0.0 0.0 NaN 0.00
2003-09-29 000001.XSHE 2003-04-24 10股派1.5元(含税) 10股派1.5元(含税) 0.0 0.0 NaN 0.00
2007-06-20 000001.XSHE 2007-05-24 10送1股派0.09元(含税) 10送1股派0.09元(含税) 1.0 0.0 31.19 0.10
2008-10-31 000001.XSHE 2008-08-21 10送3股派0.335元(含税) 10送3股派0.335元(含税) 3.0 0.0 8.37 0.30
2012-10-19 000001.XSHE 2012-08-16 10派1元(含税) 10派1元(含税) 0.0 0.0 13.43 0.00
2013-06-20 000001.XSHE 2013-03-08 10送6股派1.7元(含税) 10送6股派1.7元(含税) 6.0 0.0 11.18 0.60
2014-06-12 000001.XSHE 2014-03-07 10转增2股派1.6元(含税) 10转增2股派1.6元(含税) 0.0 2.0 9.71 0.20
2015-04-13 000001.XSHE 2015-03-13 10转增2股派1.74元(含税) 10转增2股派1.74元(含税) 0.0 2.0 16.54 0.20
2016-06-16 000001.XSHE 2016-03-10 10转增2股派1.53元(含税) 10转增2股派1.53元(含税) 0.0 2.0 8.57 0.20
2017-07-21 000001.XSHE 2017-03-17 10派1.58元(含税) 10派1.58元(含税) 0.0 0.0 10.89 0.00
2018-07-12 000001.XSHE 2018-03-15 10派1.36元(含税) 10派1.36元(含税) 0.0 0.0 8.88 0.00
# Combined adjustment: div_yield + split_ratio
df_adj_data = pd.concat([df_div[['div_yield']], df_split[['split_ratio']]], axis=1).fillna(0).sort_index(ascending=True)
df_adj_valid = df_adj_data.copy()
df_adj_valid['adj_ratio'] = df_adj_valid['div_yield'] + df_adj_valid['split_ratio']
#df_adj_valid = df_adj_valid['2005-01-01':]
# Cumulative factor, normalized so that the latest value is 1 (forward adjustment)
df_adj_valid['factor'] = (df_adj_valid.adj_ratio + 1).cumprod()
df_adj_valid['factor'] = df_adj_valid['factor'] / df_adj_valid['factor'].iloc[-1]
print(date_valid + ': computed factor is 0.2318, which verifies jqdata')
print(df_adj_valid[:date_valid].iloc[-1].factor)
df_adj_valid['2000-01-01':]
2005-03-04: computed factor is 0.2318, which verifies jqdata
0.2318663063660144
date div_yield split_ratio adj_ratio factor
2002-07-23 0.000000 0.0 0.000000 0.231866
2003-09-29 0.000000 0.0 0.000000 0.231866
2007-06-20 0.000289 0.1 0.100289 0.255120
2008-10-31 0.004002 0.3 0.304002 0.332677
2012-10-19 0.007446 0.0 0.007446 0.335154
2013-06-20 0.015206 0.6 0.615206 0.541343
2014-06-12 0.016478 0.2 0.216478 0.658531
2015-04-13 0.010520 0.2 0.210520 0.797165
2016-06-16 0.017853 0.2 0.217853 0.970830
2017-07-21 0.014509 0.0 0.014509 0.984916
2018-07-12 0.015315 0.0 0.015315 1.000000
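
As a per-event cross-check, the adj_ratio values in this table can be compared with the JQData factor changes listed earlier (the pct_change output). The snippet below only re-uses numbers printed in this post; the variable names are mine.

```python
# Relative factor changes reported by JQData (pct_change output)
jq_steps = {
    '2007-06-20': 0.100286, '2008-10-31': 0.303997, '2012-10-19': 0.007445,
    '2013-06-20': 0.615207, '2014-06-12': 0.216487, '2015-04-13': 0.210518,
    '2016-06-16': 0.217843, '2017-07-21': 0.014501, '2018-07-12': 0.015313,
}

# div_yield + split_ratio computed from the announcement data
computed = {
    '2007-06-20': 0.000289 + 0.1, '2008-10-31': 0.004002 + 0.3,
    '2012-10-19': 0.007446 + 0.0, '2013-06-20': 0.015206 + 0.6,
    '2014-06-12': 0.016478 + 0.2, '2015-04-13': 0.010520 + 0.2,
    '2016-06-16': 0.017853 + 0.2, '2017-07-21': 0.014509 + 0.0,
    '2018-07-12': 0.015315 + 0.0,
}

diffs = {date: jq_steps[date] - computed[date] for date in jq_steps}
worst = max(diffs, key=lambda date: abs(diffs[date]))
for date in sorted(diffs):
    print(date, format(diffs[date], '+.6f'))
print('largest gap:', worst, format(diffs[worst], '+.6f'))
```

Every event agrees to within about 1e-5, which supports the reading that the JQData factor folds cash dividends and share issuance into one ratio.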
# Splits only
df_adj_data = pd.concat([df_div[['div_yield']], df_split[['split_ratio']]], axis=1).fillna(0).sort_index(ascending=True)
df_adj_google = df_adj_data.copy()
df_adj_google['adj_ratio'] = df_adj_google['split_ratio']
df_adj_google['factor'] = (df_adj_google.adj_ratio + 1).cumprod()
df_adj_google['factor'] = df_adj_google['factor'] / df_adj_google['factor'].iloc[-1]
print(date_valid + ': ignoring cash dividends, the factor is 0.25292')
print(df_adj_google[:date_valid].iloc[-1].factor)
print('The Google factor (0.32) is still about 27% above this (about 40% above the jqdata factor); Google probably omits some events, so its data is wrong')
df_adj_google['2000-01-01':]
2005-03-04: ignoring cash dividends, the factor is 0.25292
0.25292994042994044
The Google factor (0.32) is still about 27% above this (about 40% above the jqdata factor); Google probably omits some events, so its data is wrong
date div_yield split_ratio adj_ratio factor
2002-07-23 0.000000 0.0 0.0 0.252930
2003-09-29 0.000000 0.0 0.0 0.252930
2007-06-20 0.000289 0.1 0.1 0.278223
2008-10-31 0.004002 0.3 0.3 0.361690
2012-10-19 0.007446 0.0 0.0 0.361690
2013-06-20 0.015206 0.6 0.6 0.578704
2014-06-12 0.016478 0.2 0.2 0.694444
2015-04-13 0.010520 0.2 0.2 0.833333
2016-06-16 0.017853 0.2 0.2 1.000000
2017-07-21 0.014509 0.0 0.0 1.000000
2018-07-12 0.015315 0.0 0.0 1.000000
 
