请 [注册] 或 [登录]  | 返回主站

量化交易吧 /  量化平台 帖子:3364677 新帖:0

jqdata本地化及开源项目zvt

蜡笔小新炒外汇发表于:5 月 13 日 16:37回复(1)

数据是最宝贵的资源,首先感谢joinquant团队提供的优质数据服务.    


这篇帖子主要介绍项目数据处理方面的思路及如何使用jqdata数据来做扩展.

获取数据,目前无非两种方式,爬取和购买.

而无论哪种方式,在本地化方面都需要做不少工作,这个工作的好坏从某种程度上决定了后面整个分析和交易系统的成败.

根据自己的经验,本地化在数据结构上至少要做到:

  •  1.数据schema的稳定性  
    整个金融市场的属性是稳定的,你需要关注的东西是稳定的,比如k线,比如财务报表,比如资金流.

  •  2.schema字段的确定性  
    举个栗子,资产负债表里面无形资产字段,joinquant为intangible_assets,eastmoney为Intangibleasset,如果你直接依赖provider提供的字段来进行后续的计算,将有极大的风险,一家provider不可用,将使整个程序不可用。因此必须将相应的provider的数据转为自己定义的格式。

  • 3.多provider支持  
    前面的两个稳定性决定了多provider支持的可行,并为数据的完整性提供了保障,当某家provider数据不能满足需求时,可以多provider互补。当然,还可以利用多provider对数据进行交叉验证。

将各provider提供(或者自己爬取)的数据变成符合data schema的数据需要做好以下几点:

  • 1.初始化要抓取的标的
    可抓取单标的来调试,然后抓取全量标的

  • 2.能够从上次抓取的地方接着抓
    减少不必要的请求,增量抓取

  • 3.封装常用的请求方式
    对时间序列数据的请求,无非start,end,size,time list的组合

  • 4.能够自动去重

  • 5.能够设置抓取速率

  • 6.提供抓取完成的回调函数
    方便数据校验和多provider数据补全

流程图如下:
Img

下面介绍zvt项目的数据结构及如何添加jqdata数据支持

1. 数据结构

相关概念及关系

1.1 provider

代表数据提供商,比如joinquant,eastmoney,sina,netease

1.2 store category

数据的逻辑分类,物理上代表一个db,其下面一般有多个data schema,schema间可能有关系

1.3 data schema

数据的结构描述,物理上代表一个table

逻辑视图

Img

物理视图

Img

一般来说,data schema是稳定的,有些数据需要多个provider来一起生成,这时也认为数据只属于某个provider;某类数据有多个provider时,可以相互验证,api上只需要指定相应的provider即可

2. 如何添加provider支持

下面以joinquant来举个栗子

2.1 添加joinquant provider

代码

class Provider(enum.Enum):EASTMONEY = 'eastmoney'SINA = 'sina'NETEASE = 'netease'EXCHANGE = 'exchange'JOINQUANT = 'joinquant'

2.2 添加store category

代码
目前定义的类别,需要扩展的在其基础上添加

class StoreCategory(enum.Enum):meta = 'meta'#个股日线行情数据stock_day_kdata = 'stock_day_kdata'index_day_kdata = 'index_day_kdata'finance = 'finance'dividend_financing = 'dividend_financing'holder = 'holder'trading = 'trading'money_flow = 'money_flow'macro = 'macro'business = 'business'category_map_db = {
    StoreCategory.meta: MetaBase,#个股日线行情数据StoreCategory.stock_day_kdata: StockDayKdataBase,
    StoreCategory.index_day_kdata: IndexDayKdataBase,
    StoreCategory.finance: FinanceBase,
    StoreCategory.dividend_financing: DividendFinancingBase,
    StoreCategory.holder: HolderBase,
    StoreCategory.trading: TradingBase,
    StoreCategory.money_flow: MoneyFlowBase,
    StoreCategory.macro: MacroBase,
    StoreCategory.business: BusinessBase,
}

2.3 添加data schema

代码
个股日线行情数据结构

class StockDayKdata(StockDayKdataBase):__tablename__ = 'stock_day_kdata'id = Column(String(length=128), primary_key=True)provider = Column(String(length=32))timestamp = Column(DateTime)security_id = Column(String(length=128))code = Column(String(length=32))name = Column(String(length=32))    # level = Column(Enum(TradingLevel, values_callable=enum_value))level = Column(String(length=32))open = Column(Float)hfq_open = Column(Float)qfq_open = Column(Float)close = Column(Float)hfq_close = Column(Float)qfq_close = Column(Float)high = Column(Float)hfq_high = Column(Float)qfq_high = Column(Float)low = Column(Float)hfq_low = Column(Float)qfq_low = Column(Float)volume = Column(Float)turnover = Column(Float)change_pct = Column(Float)turnover_rate = Column(Float)factor = Column(Float)

2.4 关联provider和相应的store category

代码
支持其他数据时,在此扩展

provider_map_category = {
    Provider.JOINQUANT:[StoreCategory.stock_day_kdata],
    Provider.JOINQUANT.value:[StoreCategory.stock_day_kdata]
}

2.5 实现相应的recorder

代码
核心代码

#将聚宽数据转换为标准zvt数据class MyApiWrapper(ApiWrapper):def request(self, url=None, method='get', param=None, path_fields=None):
        security_item = param['security_item']
        start_timestamp = param['start_timestamp']# 不复权df = get_price(to_jq_security_id(security_item), start_date=to_time_str(start_timestamp),
                       end_date=now_time_str(),
                       frequency='daily',
                       fields=['open', 'close', 'low', 'high', 'volume', 'money'],
                       skip_paused=True, fq=None)
        df.index.name = 'timestamp'df.reset_index(inplace=True)
        df['name'] = security_item.name
        df.rename(columns={'money': 'turnover'}, inplace=True)

        df['timestamp'] = pd.to_datetime(df['timestamp'])
        df['provider'] = Provider.JOINQUANT.value
        df['level'] = param['level']return df.to_dict(orient='records')#补全复权数据def on_finish(self, security_item):
    kdatas = get_kdata(security_id=security_item.id, level=self.level.value, order=StockDayKdata.timestamp.asc(),
                       return_type='domain',
                       session=self.session,
                       filters=[StockDayKdata.hfq_close.is_(None),
                                StockDayKdata.timestamp >= to_pd_timestamp('2005-01-01')])if kdatas:start = kdatas[0].timestampend = kdatas[-1].timestamp# get hfq from joinquantdf = get_price(to_jq_security_id(security_item), start_date=to_time_str(start), end_date=now_time_str(),
                       frequency='daily',
                       fields=['factor', 'open', 'close', 'low', 'high'],
                       skip_paused=True, fq='post')if df is not None and not df.empty:# fill hfq datafor kdata in kdatas:if kdata.timestamp in df.index:kdata.hfq_open = df.loc[kdata.timestamp, 'open']
                    kdata.hfq_close = df.loc[kdata.timestamp, 'close']
                    kdata.hfq_high = df.loc[kdata.timestamp, 'high']
                    kdata.hfq_low = df.loc[kdata.timestamp, 'low']
                    kdata.factor = df.loc[kdata.timestamp, 'factor']self.session.commit()

            latest_factor = df.factor[-1]# factor not change yet, no need to reset the qfq pastif latest_factor == self.current_factors.get(security_item.id):
                sql = 'UPDATE stock_day_kdata SET qfq_close=hfq_close/{},qfq_high=hfq_high/{}, qfq_open= hfq_open/{}, qfq_low= hfq_low/{} where ' \                      'security_id=\'{}\' and level=\'{}\' and (qfq_close isnull or qfq_high isnull or qfq_low isnull or qfq_open isnull)'.format(
                    latest_factor, latest_factor, latest_factor, latest_factor, security_item.id, self.level.value)else:sql = 'UPDATE stock_day_kdata SET qfq_close=hfq_close/{},qfq_high=hfq_high/{}, qfq_open= hfq_open/{}, qfq_low= hfq_low/{} where ' \                      'security_id=\'{}\' and level=\'{}\''.format(latest_factor,
                                                                   latest_factor,
                                                                   latest_factor,
                                                                   latest_factor,
                                                                   security_item.id,                                                                   self.level.value)self.logger.info(sql)self.session.execute(sql)self.session.commit()# TODO:use netease provider to get turnover_rateself.logger.info('use netease provider to get turnover_rate')

这里留了一个练习,由于聚宽的数据没有提供换手率和当日涨跌幅,可以通过其他数据源补全或者自己计算的方式来完成.

网易的数据没有复权信息,通过聚宽的factor来补全,同理,可以用网易的换手率,涨跌幅数据来补全聚宽数据.
参考代码

2.6 运行recorder

在settings设置自己的jqdata账户和密码

jqdata目前免费使用一年,注册地址如下
https://www.joinquant.com/default/index/sdk?f=home&m=banner

if __name__ == '__main__':
    init_process_log('jq_china_stock_day_kdata.log')
    ChinaStockDayKdataRecorder(level=TradingLevel.LEVEL_1DAY, codes=['300027']).run()

这里codes填写需要抓取的标的,如果不设置codes就是全市场抓取。

3. 获得的能力

添加一种数据源后,天然就获得相应的api,factor,selector和trader的能力,这里展示使用聚宽的数据的能力

In [1]: from zvt.api.technical import * 
In [2]: from zvt.api.domain import * 
In [3]: df1=get_kdata(security_id='stock_sz_300027', provider='joinquant',start_timestamp='2019-01-01',limit=10)
In [4]: df1                                                                     
                           id   provider  timestamp      security_id    code  name level  open  hfq_open  qfq_open  close  hfq_close  qfq_close  high  hfq_high  qfq_high   low  hfq_low   qfq_low      volume      turnover change_pct turnover_rate  factor0  stock_sz_300027_2019-01-02  joinquant 2019-01-02  stock_sz_300027  300027  华谊兄弟    1d  4.54     68.58  4.539918   4.40      66.47   4.400238  4.58     69.19  4.580299  4.35    65.71  4.349927  29554330.0  1.306117e 08       None          None  15.1061  stock_sz_300027_2019-01-03  joinquant 2019-01-03  stock_sz_300027  300027  华谊兄弟    1d  4.40     66.47  4.400238   4.42      66.77   4.420098  4.45     67.22  4.449887  4.36    65.86  4.359857  15981569.0  7.052363e 07       None          None  15.1062  stock_sz_300027_2019-01-04  joinquant 2019-01-04  stock_sz_300027  300027  华谊兄弟    1d  4.36     65.86  4.359857   4.52      68.28   4.520058  4.54     68.58  4.539918  4.33    65.41  4.330068  17103081.0  7.657399e 07       None          None  15.1063  stock_sz_300027_2019-01-07  joinquant 2019-01-07  stock_sz_300027  300027  华谊兄弟    1d  4.54     68.58  4.539918   4.59      69.34   4.590229  4.63     69.94  4.629948  4.48    67.67  4.479677  16163938.0  7.383168e 07       None          None  15.1064  stock_sz_300027_2019-01-08  joinquant 2019-01-08  stock_sz_300027  300027  华谊兄弟    1d  4.59     69.34  4.590229   4.60      69.49   4.600159  4.66     70.39  4.659738  4.56    68.88  4.559778  10908603.0  5.034655e 07       None          None  15.1065  stock_sz_300027_2019-01-09  joinquant 2019-01-09  stock_sz_300027  300027  华谊兄弟    1d  4.63     69.94  4.629948   4.58      69.19   4.580299  4.73     71.45  4.729909  4.58    69.19  4.580299  16901976.0  7.881876e 07       None          None  15.1066  stock_sz_300027_2019-01-10  joinquant 2019-01-10  stock_sz_300027  300027  华谊兄弟    1d  4.63     69.94  4.629948   4.61      69.64   4.610089  4.76     71.90  4.759698  4.59    69.34  4.590229  20855469.0  9.717176e 07       None          None  15.1067  stock_sz_300027_2019-01-11  joinquant 2019-01-11  stock_sz_300027  300027  华谊兄弟    1d  4.60     69.49  4.600159   4.67      70.55   4.670330  4.67     70.55  4.670330  4.56    68.88  4.559778  13216260.0  6.089670e 07       None          None  15.1068  stock_sz_300027_2019-01-14  joinquant 2019-01-14  stock_sz_300027  300027  华谊兄弟    1d  4.63     69.94  4.629948   4.57      69.03   4.569707  4.65     70.24  4.649808  4.55    68.73  4.549848  12421993.0  5.705187e 07       None          None  15.1069  stock_sz_300027_2019-01-15  joinquant 2019-01-15  stock_sz_300027  300027  华谊兄弟    1d  4.56     68.88  4.559778   4.64      70.09   4.639878  4.66     70.39  4.659738  4.54    68.58  4.539918  14403671.0  6.637258e 07       None          None  15.106#跟网易的数据比较In [24]: df2=get_kdata(security_id='stock_sz_300027', provider='netease',start_timestamp='2019-01-01',limit=10)                                                                          

In [25]: df2                                                                                                                                                                             
Out[25]: 
                           id provider  timestamp      security_id    code  name level  open  hfq_open  qfq_open  close  hfq_close  qfq_close  high  hfq_high  qfq_high   low  hfq_low   qfq_low      volume      turnover  change_pct  turnover_rate  factor0  stock_sz_300027_2019-01-02  netease 2019-01-02  stock_sz_300027  300027  华谊兄弟    1d  4.54     68.58  4.539918   4.40      66.47   4.400238  4.58     69.19  4.580299  4.35    65.71  4.349927  29554330.0  1.306117e 08     -6.1834         1.0652  15.1061  stock_sz_300027_2019-01-03  netease 2019-01-03  stock_sz_300027  300027  华谊兄弟    1d  4.40     66.47  4.400238   4.42      66.77   4.420098  4.45     67.22  4.449887  4.36    65.86  4.359857  15981569.0  7.052363e 07      0.4545         0.5760  15.1062  stock_sz_300027_2019-01-04  netease 2019-01-04  stock_sz_300027  300027  华谊兄弟    1d  4.36     65.86  4.359857   4.52      68.28   4.520058  4.54     68.58  4.539918  4.33    65.41  4.330068  17103081.0  7.657399e 07      2.2624         0.6164  15.1063  stock_sz_300027_2019-01-07  netease 2019-01-07  stock_sz_300027  300027  华谊兄弟    1d  4.54     68.58  4.539918   4.59      69.34   4.590229  4.63     69.94  4.629948  4.48    67.67  4.479677  16163938.0  7.383168e 07      1.5487         0.5826  15.1064  stock_sz_300027_2019-01-08  netease 2019-01-08  stock_sz_300027  300027  华谊兄弟    1d  4.59     69.34  4.590229   4.60      69.49   4.600159  4.66     70.39  4.659738  4.56    68.88  4.559778  10908603.0  5.034655e 07      0.2179         0.3932  15.1065  stock_sz_300027_2019-01-09  netease 2019-01-09  stock_sz_300027  300027  华谊兄弟    1d  4.63     69.94  4.629948   4.58      69.19   4.580299  4.73     71.45  4.729909  4.58    69.19  4.580299  16901976.0  7.881876e 07     -0.4348         0.6092  15.1066  stock_sz_300027_2019-01-10  netease 2019-01-10  stock_sz_300027  300027  华谊兄弟    1d  4.63     69.94  4.629948   4.61      69.64   4.610089  4.76     71.90  4.759698  4.59    69.34  4.590229  20855469.0  9.717176e 07      0.6550         0.7517  15.1067  stock_sz_300027_2019-01-11  netease 2019-01-11  stock_sz_300027  300027  华谊兄弟    1d  4.60     69.49  4.600159   4.67      70.55   4.670330  4.67     70.55  4.670330  4.56    68.88  4.559778  13216260.0  6.089670e 07      1.3015         0.4763  15.1068  stock_sz_300027_2019-01-14  netease 2019-01-14  stock_sz_300027  300027  华谊兄弟    1d  4.63     69.94  4.629948   4.57      69.03   4.569707  4.65     70.24  4.649808  4.55    68.73  4.549848  12421993.0  5.705187e 07     -2.1413         0.4477  15.1069  stock_sz_300027_2019-01-15  netease 2019-01-15  stock_sz_300027  300027  华谊兄弟    1d  4.56     68.88  4.559778   4.64      70.09   4.639878  4.66     70.39  4.659738  4.54    68.58  4.539918  14403671.0  6.637258e 07      1.5317         0.5191  15.106

比较两家数据

In [26]: df1.loc[:,['open','close','high','low','volume']]-df2.loc[:,['open','close','high','low','volume']]                                                                             
Out[26]: 
   open  close  high  low  volume0   0.0    0.0   0.0  0.0     0.01   0.0    0.0   0.0  0.0     0.02   0.0    0.0   0.0  0.0     0.03   0.0    0.0   0.0  0.0     0.04   0.0    0.0   0.0  0.0     0.05   0.0    0.0   0.0  0.0     0.06   0.0    0.0   0.0  0.0     0.07   0.0    0.0   0.0  0.0     0.08   0.0    0.0   0.0  0.0     0.09   0.0    0.0   0.0  0.0     0.0

嗯,两家的数据是一致的,数据的准确性得到进一步的确认.

全部回复

0/140

量化课程

    移动端课程