大家刚接触平台可能对于一些数据的提取方式不甚明了,尤其是对query对象的使用。可能将简单问题复杂化,以下整理了大家使用过程中一般能用到的操作方式以及一些常用的数据获取方式。如果大家对于query对象的使用或者某些数据的提取存在疑问,可以在此贴下留言讨论。
涉及到使用数据库操作的数据有:
get_fundamentals (股票单季度财务数据)
finance (股票数据,基金数据等)
opt (期权数据)
macro (宏观数据)
jy (聚源数据)
bond (债券数据)
基本的查询方式
query() 填写需要查询的对象,可以是整张表,也可以是表中的多个字段或计算出的结果
filter 填写过滤条件,多个过滤条件可以用逗号隔开,或者用and,or这样的语法
order_by 填写排序条件
.desc() 降序排列
.asc() 升序排列
limit 限制返回的个数
order_by 分组统计
获取截至目前 前N季度的单季度财务指标
可以通过因子分析模块取到数据如何理解dependencies中的财务因子
相关代码参考TTM计算方法(推荐直接在factor类中计算合并获取到的数据)
计算/获取TTM数据(示例代码放在研究后半部分)
各种指标算法多式多样,我们提供的财务数据中最主要的还是单季度和报告期的财务指标,具体算法以及含义在文档中有详细说明研究与回测中计算因子
对于数据的查询过滤,一般通过filter函数进行实现,filter可以对多个条件进行过滤,中间用,
隔开。
比如查询某只股票某一个时间段,满足特定值的数据,可以用到 in_ ,and_,> , < , == ,!=
等符号进行基本的查询
涉及到使用数据库操作的数据有
get_fundamentals (股票单季度财务数据)
finance (股票数据,基金数据等)
opt (期权数据)
macro (宏观数据)
jy (聚源数据)
基本的查询方式¶
query() 填写需要查询的对象,可以是整张表,也可以是表中的多个字段或计算出的结果
filter 填写过滤条件,多个过滤条件可以用逗号隔开,或者用and,or这样的语法
order_by 填写排序条件
.desc() 降序排列
.asc() 升序排列
limit 限制返回的个数
order_by 分组统计
# 查询 000001 2015 - 2017 年的整张合并利润表,且满足 net_profit(净利润) < 0 的条件from jqdata import financeq = query(finance.STK_INCOME_STATEMENT).filter(finance.STK_INCOME_STATEMENT.code=='000783.XSHE', #选定股票 000783.XSHEfinance.STK_INCOME_STATEMENT.end_date > '2005-01-01', #指定查询时间段大于2005年1月1日finance.STK_INCOME_STATEMENT.end_date < '2018-01-01', #指定查询时间段小于2018年1月1日finance.STK_INCOME_STATEMENT.net_profit <0, #指定查询到的数据中net_profit为负finance.STK_INCOME_STATEMENT.report_type == 0, #指定报告期类型为本期).order_by(finance.STK_INCOME_STATEMENT.end_date.desc() ).limit(5) #根据end_date降序排序,并返回前5条数据 finance.run_query(q)
.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }
id | company_id | company_name | code | a_code | b_code | h_code | pub_date | start_date | end_date | ... | commission_income | interest_expense | commission_expense | refunded_premiums | net_pay_insurance_claims | withdraw_insurance_contract_reserve | policy_dividend_payout | reinsurance_cost | non_current_asset_disposed | other_earnings | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 257078 | 430000783 | 长江证券股份有限公司 | 000783.XSHE | 000783 | None | None | 2007-10-22 | 2007-01-01 | 2007-09-30 | ... | None | None | None | None | None | None | None | None | None | None |
1 | 257070 | 430000783 | 长江证券股份有限公司 | 000783.XSHE | 000783 | None | None | 2006-10-25 | 2006-01-01 | 2006-09-30 | ... | None | None | None | None | None | None | None | None | None | None |
2 | 257068 | 430000783 | 长江证券股份有限公司 | 000783.XSHE | 000783 | None | None | 2006-08-17 | 2006-01-01 | 2006-06-30 | ... | None | None | None | None | None | None | None | None | None | None |
3 | 257066 | 430000783 | 长江证券股份有限公司 | 000783.XSHE | 000783 | None | None | 2006-04-29 | 2006-01-01 | 2006-03-31 | ... | None | None | None | None | None | None | None | None | None | None |
4 | 257064 | 430000783 | 长江证券股份有限公司 | 000783.XSHE | 000783 | None | None | 2006-04-29 | 2005-01-01 | 2005-12-31 | ... | None | None | None | None | None | None | None | None | None | None |
5 rows × 60 columns
get_fundamentals 获取某个/多个季度/年度的数据(指定参数statDate)¶
# 查询平安银行2014年3-6月份的单季度报表q = query( income.statDate, income.code, income.basic_eps, cash_flow.goods_sale_and_service_render_cash ).filter( income.code == '000001.XSHE', )ret = get_fundamentals(q, statDate='2014q2')ret
.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }
statDate | code | basic_eps | goods_sale_and_service_render_cash | |
---|---|---|---|---|
0 | 2014-06-30 | 000001.XSHE | 0.35 | NaN |
# 查询平安银行2014年的年报q = query( income.statDate, income.code, income.basic_eps, cash_flow.goods_sale_and_service_render_cash ).filter( income.code == '000001.XSHE', )ret = get_fundamentals(q, statDate='2014')ret
.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }
statDate | code | basic_eps | goods_sale_and_service_render_cash | |
---|---|---|---|---|
0 | 2014-12-31 | 000001.XSHE | 1.73 | NaN |
# 查询平安银行2014年四个季度的季报, 放到数组中并拼接为dataframeq = query( income.statDate, income.code, income.basic_eps, balance.cash_equivalents, cash_flow.goods_sale_and_service_render_cash ).filter( income.code == '000001.XSHE', )rets = [get_fundamentals(q, statDate='2014q'+str(i)) for i in range(1, 5)]import pandas as pdpd.concat(rets)
.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }
statDate | code | basic_eps | cash_equivalents | goods_sale_and_service_render_cash | |
---|---|---|---|---|---|
0 | 2014-03-31 | 000001.XSHE | 0.53 | 2.581100e+11 | NaN |
0 | 2014-06-30 | 000001.XSHE | 0.35 | 2.596040e+11 | NaN |
0 | 2014-09-30 | 000001.XSHE | 0.49 | 2.773250e+11 | NaN |
0 | 2014-12-31 | 000001.XSHE | 0.36 | 3.062980e+11 | NaN |
in_ 判断某个字段的值是否在列表之中(一般用于查询多个标的)¶
stocks = ['000001.XSHE','600741.XSHG','600507.XSHG']q = query(balance.code,balance.pubDate,balance.statDate, # 指定返回的字段只包括code,pubDate,statDate,total_assets及total_sheet_owner_equities balance.total_assets,balance.total_sheet_owner_equities).filter( balance.code.in_(stocks) #指定查询到的数据只包括code在 stocks中的数据)get_fundamentals(q,date='2018-01-05') #查询单季度数据中在2018-01-05之前发布的数据,没有未来函数
.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }
code | pubDate | statDate | total_assets | total_sheet_owner_equities | |
---|---|---|---|---|---|
0 | 000001.XSHE | 2017-10-21 | 2017-09-30 | 3.137481e+12 | 3.137481e+12 |
1 | 600507.XSHG | 2017-10-24 | 2017-09-30 | 1.041599e+10 | 1.041599e+10 |
2 | 600741.XSHG | 2017-10-27 | 2017-09-30 | 1.153612e+11 | 1.153612e+11 |
distinct 去重,用于查看数据库中某个字段都存在哪些值¶
# 查看十大流通股东中都有哪些类别from jqdata import financeq = query( finance.STK_SHAREHOLDER_FLOATING_TOP10.shareholder_class_id.distinct(), #提取ID不同的数据finance.STK_SHAREHOLDER_FLOATING_TOP10.shareholder_class, ).order_by(finance.STK_SHAREHOLDER_FLOATING_TOP10.change_reason_id) df = finance.run_query(q)df.tail()
anon_1 | shareholder_class | |
---|---|---|
23 | 307012 | 银行 |
24 | 307024 | 银行和上市公司 |
25 | 307025 | 信托投资管理公司和上市公司 |
26 | 307024 | 上市公司和银行 |
27 | 307027 | 银行和QFII |
~ 反向查询符号¶
q = query( finance.STK_SHAREHOLDER_FLOATING_TOP10.shareholder_class_id.distinct(),finance.STK_SHAREHOLDER_FLOATING_TOP10.shareholder_class, ).filter(~finance.STK_SHAREHOLDER_FLOATING_TOP10.shareholder_class_id.in_(['307007','307001']),#等同于notin_)finance.run_query(q)
.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }
anon_1 | shareholder_class | |
---|---|---|
0 | 307002 | 券商 |
1 | 307003 | 证券投资基金 |
2 | 307004 | 社保基金 |
3 | 307005 | 企业年金 |
4 | 307006 | 保险公司 |
5 | 307008 | 基金管理公司 |
6 | 307009 | 信托投资管理公司 |
7 | 307011 | 上市公司 |
8 | 307012 | 银行 |
9 | 307013 | 风险投资 |
10 | 307014 | 保险投资组合 |
11 | 307015 | 券商资产管理计划 |
12 | 307016 | 信托资产管理计划 |
13 | 307017 | 基金资产管理计划 |
14 | 307018 | 资产管理公司资产管理计划 |
15 | 307019 | 国有资产经营公司 |
16 | 307020 | 期货资产管理计划 |
17 | 307021 | 地方国资委 |
18 | 307022 | 券商和上市公司 |
19 | 307023 | 保险公司和上市公司 |
20 | 307024 | 银行和上市公司 |
21 | 307024 | 上市公司和银行 |
22 | 307025 | 信托投资管理公司和上市公司 |
23 | 307027 | 银行和QFII |
24 | 307099 | 其他机构 |
运算和命名(label)¶
# label的作用是命名获得数据的标签,一般用于直接运算后的重命名q = query(indicator.code.label('股票代码'),indicator.operating_profit.label('get_operating_profit'), (income.total_operating_revenue - income.total_operating_cost).label('my_operating_profit') ).filter(indicator.code=='600507.XSHG')get_fundamentals(q)
.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }
股票代码 | get_operating_profit | my_operating_profit | |
---|---|---|---|
0 | 600507.XSHG | 1.329762e+09 | 1.329762e+09 |
or_ 或查询¶
from sqlalchemy.sql.expression import or_df = get_fundamentals(query( valuation.code,valuation.market_cap,valuation.pe_ratio ).filter( or_(valuation.market_cap < 10,valuation.pe_ratio > 1000))) # 查询当天总市值小于1000亿或pe大于10亿的所有股票df.tail()
.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }
code | market_cap | pe_ratio | |
---|---|---|---|
23 | 300336.XSHE | 32.8136 | 2772.8604 |
24 | 600127.XSHG | 30.1638 | 1195.8562 |
25 | 600462.XSHG | 8.3803 | -3.0123 |
26 | 601008.XSHG | 46.6999 | 3504.9221 |
27 | 603718.XSHG | 87.9060 | 1225.4053 |
contains/like/ilike 数据库中的字符串模糊匹配¶
% 百分号通配符: 表示任何字符出现任意次数(可以是0次).
_ 下划线通配符:表示只能匹配单个字符,不能多也不能少,就是一个字符.
from jqdata import finance# 获取000001.XSHG每一年的年报 df = finance.run_query(query(finance.FINANCE_INCOME_STATEMENT ).filter(finance.FINANCE_INCOME_STATEMENT.code.in_(['000001.XSHE']),# finance.FINANCE_INCOME_STATEMENT.end_date.like('%-12-31%'), #sqlalchemy的版本,数据库的构建等问题可能导致查询报错,可以换以下两种方法尝试# finance.FINANCE_INCOME_STATEMENT.end_date.contains('12-31') , finance.FINANCE_INCOME_STATEMENT.end_date.ilike('_____12-31'),finance.FINANCE_INCOME_STATEMENT.report_type==0 ))df.tail()
.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }
id | company_id | company_name | code | a_code | b_code | h_code | pub_date | start_date | end_date | report_date | report_type | source_id | source | operating_revenue | interest_net_revenue | interest_income | interest_expense | commission_net_income | commission_income | commission_expense | agent_security_income | sell_security_income | manage_income | premiums_earned | assurance_income | premiums_income | premiums_expense | prepare_money | investment_income | invest_income_associates | fair_value_variable_income | exchange_income | other_income | operation_expense | refunded_premiums | compensate_loss | compensation_back | insurance_reserve | insurance_reserve_back | policy_dividend_payout | reinsurance_cost | operating_tax_surcharges | commission_expense2 | operation_manage_fee | separate_fee | asset_impairment_loss | other_cost | operating_profit | subsidy_income | non_operating_revenue | non_operating_expense | other_items_influenced_profit | total_profit | income_tax_expense | other_influence_net_profit | net_profit | np_parent_company_owners | minority_profit | eps | basic_eps | diluted_eps | other_composite_income | total_composite_income | ci_parent_company_owners | ci_minority_owners | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
21 | 2543 | 430000001 | 平安银行股份有限公司 | 000001.XSHE | 000001 | None | None | 2015-03-13 | 2014-01-01 | 2014-12-31 | 2014-12-31 | 0 | 321003 | 定期报告 | 7.340700e+10 | 5.304600e+10 | 1.192020e+11 | 6.615600e+10 | 1.737800e+10 | 1.970600e+10 | 2.328000e+09 | None | NaN | None | None | None | None | None | None | 3.168000e+09 | 28000000.0 | -10000000.0 | -388000000.0 | 213000000.0 | 4.716100e+10 | NaN | None | None | None | None | None | None | 5.482000e+09 | None | 2.666800e+10 | None | 1.501100e+10 | 0.000000e+00 | 2.624600e+10 | None | 40000000.0 | 92000000.0 | NaN | 2.619400e+10 | 6.392000e+09 | None | 1.980200e+10 | 1.980200e+10 | NaN | None | 1.73 | 1.73 | 586000000.0 | 2.038800e+10 | NaN | NaN |
22 | 2551 | 430000001 | 平安银行股份有限公司 | 000001.XSHE | 000001 | None | None | 2016-03-10 | 2015-01-01 | 2015-12-31 | 2015-12-31 | 0 | 321003 | 定期报告 | 9.616300e+10 | 6.609900e+10 | 1.316490e+11 | 6.555000e+10 | 2.644500e+10 | 2.918500e+10 | 2.740000e+09 | None | NaN | None | None | None | None | None | None | 3.924000e+09 | 28000000.0 | 107000000.0 | -573000000.0 | 161000000.0 | 6.726800e+10 | NaN | None | None | None | None | None | None | 6.671000e+09 | None | 3.011200e+10 | None | 3.048500e+10 | 0.000000e+00 | 2.889500e+10 | None | 40000000.0 | 89000000.0 | NaN | 2.884600e+10 | 6.981000e+09 | None | 2.186500e+10 | 2.186500e+10 | NaN | None | 1.56 | 1.56 | NaN | 2.259900e+10 | NaN | NaN |
23 | 2559 | 430000001 | 平安银行股份有限公司 | 000001.XSHE | 000001 | None | None | 2017-03-17 | 2016-01-01 | 2016-12-31 | 2016-12-31 | 0 | 321003 | 定期报告 | 1.077150e+11 | 7.641100e+10 | 1.311190e+11 | 5.470800e+10 | 2.785900e+10 | 3.130900e+10 | 3.450000e+09 | None | NaN | None | None | None | None | None | None | 2.368000e+09 | -141000000.0 | 49000000.0 | 882000000.0 | 146000000.0 | 7.793600e+10 | NaN | None | None | None | None | None | None | 3.445000e+09 | None | 2.797300e+10 | None | 4.651800e+10 | 0.000000e+00 | 2.977900e+10 | None | 221000000.0 | 65000000.0 | NaN | 2.993500e+10 | 7.336000e+09 | None | 2.259900e+10 | 2.259900e+10 | NaN | None | 1.32 | 1.32 | 308000000.0 | 2.290700e+10 | 2.290700e+10 | NaN |
24 | 2564 | 430000001 | 平安银行股份有限公司 | 000001.XSHE | 000001 | None | None | 2018-03-15 | 2017-01-01 | 2017-12-31 | 2017-12-31 | 0 | 321003 | 定期报告 | 1.057860e+11 | 7.400900e+10 | 1.480680e+11 | 7.405900e+10 | 3.067400e+10 | 3.572500e+10 | 5.051000e+09 | None | NaN | None | None | None | None | None | None | 6.320000e+08 | NaN | -61000000.0 | 166000000.0 | 186000000.0 | 7.556300e+10 | NaN | None | None | None | None | None | None | 1.022000e+09 | None | 3.161600e+10 | None | 4.292500e+10 | NaN | 3.022300e+10 | None | 38000000.0 | 104000000.0 | NaN | 3.015700e+10 | 6.968000e+09 | None | 2.318900e+10 | 2.318900e+10 | NaN | None | 1.30 | 1.30 | 281000000.0 | 2.347000e+10 | 2.347000e+10 | NaN |
25 | 4931 | 430000001 | 平安银行股份有限公司 | 000001.XSHE | 000001 | None | None | 2019-03-07 | 2018-01-01 | 2018-12-31 | 2018-12-31 | 0 | 321003 | 定期报告 | 1.167160e+11 | 7.474500e+10 | 1.628880e+11 | 8.814300e+10 | 3.129700e+10 | 3.936200e+10 | 8.065000e+09 | None | NaN | None | None | None | None | None | None | 9.186000e+09 | NaN | 892000000.0 | 209000000.0 | 170000000.0 | 8.441100e+10 | NaN | None | None | None | None | None | None | 1.149000e+09 | None | 3.539100e+10 | None | NaN | 4.787100e+10 | 3.230500e+10 | None | 28000000.0 | 102000000.0 | NaN | 3.223100e+10 | 7.413000e+09 | None | 2.481800e+10 | 2.481800e+10 | NaN | None | 1.39 | 1.39 | 912000000.0 | 2.573000e+10 | 2.573000e+10 | NaN |
简化计算的方法(sqlalchemy.sql.func) 和 group_by¶
func中的函数可以完成常用的一些计算,配合group_by完成简单的统计
下边以简单计算股票池/全市场每天的的等权重pe,pb为例
(当然,这个计算中一些细节并没有考虑到,实际还是以官方数据为准,可以通过聚源数据获取,具体下边会有演示)
from sqlalchemy.sql import funcfrom jqdata import financestocks = get_index_stocks('000016.XSHG')q = query(valuation.day,# func.count('*'), # 信息条数(也就是股票数量) (func.count()/func.sum(1/valuation.pe_ratio)).label('*g_pe'), # 等权重pe (func.count()/func.sum(1/valuation.pb_ratio)).label('*g_pb'), # 等权重pb ).group_by(valuation.day ).filter(valuation.pe_ratio>0, #排除亏损的valuation.pb_ratio>0, valuation.day>'2009-12-01', #设置一个时间valuation.code.in_(stocks) #设置股票池,注释即为全市场)df = finance.run_query(q)df.set_index('day',inplace=True)df.tail()
.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }
*g_pe | *g_pb | |
---|---|---|
day | ||
2019-01-16 | 10.081409 | 1.295044 |
2019-01-17 | 10.036275 | 1.289927 |
2019-01-18 | 10.201209 | 1.309428 |
2019-01-21 | 10.262107 | 1.314957 |
2019-01-22 | 10.262107 | 1.314957 |
更多关于query的可导入函数,可以使用以下方法查看,配合官网文档使用:
import sqlalchemy dir(sqlalchemy.sql.expression)
其他常用数据获取方式整理¶
获取旧的申万指数列表和新的申万指数列表(14年有过改动)¶
from jqdata import *def get_sw_quote(day=None):'''获取指定日期的申万指数列表'''day = get_trade_days(end_date=day,count=1)[-1]df = jy.run_query(query( jy.QT_SYWGIndexQuote.InnerCode.distinct().label('InnerCode')).filter(jy.QT_SYWGIndexQuote.TradingDay==day,))code_df = jy.run_query(query( jy.SecuMain.SecuCode,jy.SecuMain.ChiName).filter(jy.SecuMain.InnerCode.in_(df.InnerCode)))return code_dfdf = get_sw_quote('2013-02-05')df.tail()
.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }
SecuCode | ChiName | |
---|---|---|
466 | 852041 | 申银万国指数-专业连锁 |
467 | 852051 | 申银万国指数-一般物业经营 |
468 | 852052 | 申银万国指数-专业市场 |
469 | 852225 | 申银万国指数-软件开发 |
470 | 852226 | 申银万国指数-IT服务 |
查询申万行情( 包含行业pe/pb)¶
from jqdata import jyfrom jqdata import *import pandas as pd#注意申万指数在2014年有一次大改,聚源使用的是为改变之前的代码,官网包含更改前和更改后的代码,如果遇到找不到的标的可以根据需求自行查找#如801124 >>801121食品加工IIdef get_sw_quote(code,end_date=None,count=None,start_date=None):'''获取申万指数行情,返回panel结构'''if isinstance(code,str):code=[code]days = get_trade_days(start_date,end_date,count)code_df = jy.run_query(query( jy.SecuMain.InnerCode,jy.SecuMain.SecuCode,jy.SecuMain.ChiName).filter(jy.SecuMain.SecuCode.in_(code)))df = jy.run_query(query( jy.QT_SYWGIndexQuote).filter(jy.QT_SYWGIndexQuote.InnerCode.in_(code_df.InnerCode),jy.QT_SYWGIndexQuote.TradingDay.in_(days),))df2 = pd.merge(code_df, df, on='InnerCode').set_index(['TradingDay','SecuCode'])df2.drop(['InnerCode','ID','UpdateTime','JSID'],axis=1,inplace=True)return df2.to_panel()code = get_industries(name='sw_l2').index[:5]df = get_sw_quote(code,end_date='2018-01-01',count=10)df.to_frame(False).tail()
/opt/conda/envs/python2new/lib/python2.7/site-packages/ipykernel_launcher.py:23: DeprecationWarning: Panel is deprecated and will be removed in a future version. The recommended way to represent these types of 3-dimensional data are with a MultiIndex on a DataFrame, via the Panel.to_frame() method Alternatively, you can use the xarray package http://xarray.pydata.org/en/stable/. Pandas provides a `.to_xarray()` method to help automate this conversion.
.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }
ChiName | PrevClosePrice | OpenPrice | HighPrice | LowPrice | ClosePrice | TurnoverVolume | TurnoverValue | TurnoverDeals | ChangePCT | RightLevel | IndexPE | IndexPB | TotalMarketValue | AShareTotalMV | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
TradingDay | SecuCode | |||||||||||||||
2017-12-29 | 801011 | 申银万国指数-林业 | 1800.092 | 1795.480 | 1845.60 | 1784.41 | 1802.798 | 2853232.0 | 3.554701e+07 | None | 0.0015 | 4 | 777.37 | 3.13 | 2036129.0 | 1121708.0 |
801012 | 申银万国指数-农产品加工 | 2551.776 | 2546.600 | 2573.80 | 2515.61 | 2556.512 | 93670965.0 | 1.234257e+09 | None | 0.0019 | 4 | 35.58 | 3.12 | 14529320.0 | 7936090.0 | |
801013 | 申银万国指数-农业综合 | 2269.610 | 2260.530 | 2278.20 | 2254.30 | 2272.480 | 9084945.0 | 4.234125e+07 | None | 0.0013 | 4 | 125.71 | 2.92 | 867493.0 | 464852.0 | |
801014 | 申银万国指数-饲料 | 3644.520 | 3646.568 | 3709.26 | 3610.05 | 3685.350 | 133928157.0 | 1.452713e+09 | None | 0.0112 | 4 | 23.78 | 2.71 | 21027565.0 | 8124732.0 | |
801015 | 申银万国指数-渔业 | 1540.178 | 1535.748 | 1566.71 | 1528.66 | 1555.760 | 34618204.0 | 3.012524e+08 | None | 0.0101 | 4 | 43.95 | 2.52 | 5217722.0 | 2511131.0 |
查询中证行情( 包含指数市值/市盈率/股息率以及红利等指数)¶
def get_zz_quote(code,end_date=None,count=None,start_date=None):'''获取中证指数行情,包含市值,市盈率,股息率,返回panel结构'''if isinstance(code,str):code=[code]code.sort()code = [x[:6] for x in code]days = get_trade_days(start_date,end_date,count)code_df = jy.run_query(query( jy.SecuMain.InnerCode,jy.SecuMain.SecuCode,jy.SecuMain.ChiName).filter(jy.SecuMain.SecuCode.in_(code)).order_by(jy.SecuMain.SecuCode))df = jy.run_query(query( jy.QT_CSIIndexQuote).filter(jy.QT_CSIIndexQuote.IndexCode.in_(code_df.InnerCode),jy.QT_CSIIndexQuote.TradingDay.in_(days),))df2 = pd.merge(code_df, df, left_on='InnerCode',right_on='IndexCode').set_index(['TradingDay','SecuCode'])df2.drop(['InnerCode','IndexCode','ID','UpdateTime','JSID','OpenInterest','SettleValue','IndexCSIType'],axis=1,inplace=True)return df2.to_panel()panel = get_zz_quote(['000016.XSHG','000001.XSHG'],end_date='2019-01-21',count=10)panel.ClosePrice.tail()
/opt/conda/envs/python2new/lib/python2.7/site-packages/ipykernel_launcher.py:19: DeprecationWarning: Panel is deprecated and will be removed in a future version. The recommended way to represent these types of 3-dimensional data are with a MultiIndex on a DataFrame, via the Panel.to_frame() method Alternatively, you can use the xarray package http://xarray.pydata.org/en/stable/. Pandas provides a `.to_xarray()` method to help automate this conversion.
.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }
SecuCode | 000001 | 000016 |
---|---|---|
TradingDay | ||
2019-01-15 | 2570.3449 | 2378.3696 |
2019-01-16 | 2570.4221 | 2381.2178 |
2019-01-17 | 2559.6370 | 2371.3481 |
2019-01-18 | 2596.0056 | 2417.3630 |
2019-01-21 | 2610.5094 | 2432.4870 |
查询股息率(近12个月)¶
from jqdata import *def DividendRatio(security_list,end_date,count=1):'''查询股息率(日更新) 输入:股票池,截止日期,获取数量 输出:panel结构,单位:1'''trade_days = get_trade_days(end_date=end_date,count = count)security_list.sort()secu_list = [x[:6] for x in security_list]code_df = jy.run_query(query( jy.SecuMain.InnerCode,jy.SecuMain.SecuCode,# jy.SecuMain.ChiName,jy.SecuMain.CompanyCode).filter(jy.SecuMain.SecuCode.in_(secu_list),jy.SecuMain.SecuCategory==1).order_by(jy.SecuMain.SecuCode))code_df['code'] = security_listdf = jy.run_query(query(# jy.LC_DIndicesForValuation #得到整表jy.LC_DIndicesForValuation.InnerCode,jy.LC_DIndicesForValuation.TradingDay, jy.LC_DIndicesForValuation.DividendRatio,).filter(jy.LC_DIndicesForValuation.InnerCode.in_(code_df.InnerCode),jy.LC_DIndicesForValuation.TradingDay.in_(trade_days)))f_df = df.merge(code_df,on='InnerCode').set_index(['TradingDay','code']).drop(['InnerCode','SecuCode'],axis=1)panel = f_df.to_panel()return panelsecuritys =get_index_stocks('399015.XSHE')[:10]date='2018-01-01'DividendRatio(securitys,date,count=5)
/opt/conda/envs/python2new/lib/python2.7/site-packages/ipykernel_launcher.py:24: DeprecationWarning: Panel is deprecated and will be removed in a future version. The recommended way to represent these types of 3-dimensional data are with a MultiIndex on a DataFrame, via the Panel.to_frame() method Alternatively, you can use the xarray package http://xarray.pydata.org/en/stable/. Pandas provides a `.to_xarray()` method to help automate this conversion.
<class 'pandas.core.panel.Panel'> Dimensions: 1 (items) x 5 (major_axis) x 10 (minor_axis) Items axis: DividendRatio to DividendRatio Major_axis axis: 2017-12-25 00:00:00 to 2017-12-29 00:00:00 Minor_axis axis: 000007.XSHE to 000049.XSHE
获取期货合约的基本信息(合约乘数、商品报价的计数单位、最小变动单位)¶
# lru缓存可以不要,加上后查询相同的合约可以避免重复数据库请求from fastcache import clru_cache as lru_cache@lru_cache(maxsize=128)def future_basic_info(future):from jqdata import jyfrom numpy import nanimport reif "9999" in future or "8888" in future:match = re.match(r"(?P<underlying_symbol>[A-Z]{1,})", future)if not match:raise ValueError("未知期货标的:{}".format(future))else:future = get_dominant_future(match.groupdict()["underlying_symbol"])q = query(jy.Fut_ContractMain).filter(jy.Fut_ContractMain.ContractCode == future.split(".")[0])result = jy.run_query(query_object=q).to_dict("record")if result:result = result.pop()min_point = re.match("(?P<value>^[0-9]+([.]{1}[0-9]+){0,1})", result["LittlestChangeUnit"]).groupdict(nan)["value"]return {"ContractUnit": result["CMValue"], "PriceScale": float(str(min_point)[:-1] + "1") if float(min_point) < 1 else 1, "MinPoint": float(min_point)}else:return {"ContractUnit": nan,"PriceScale": nan,"MinPoint": nan}future_basic_info('IF1801.CCFX')
{'ContractUnit': 300, 'MinPoint': 0.2, 'PriceScale': 0.1}
多个dataframe的合并 (数据拼接) 获取多份财务报表¶
#获取多年的季度度数据import pandas as pddef get_more_state_fund(q_object,year_list):df_list = []for year in year_list:rets = [get_fundamentals(q, statDate=str(year)+'q'+str(i)) for i in range(1, 5)]df = pd.concat(rets).set_index('statDate') #个人习惯df_list.append(df)df_ = pd.concat(df_list,keys=year_list,axis=0) #axis=1或axis=0,依据个人习惯return df_q = query(indicator.code, indicator.statDate,indicator.roe,indicator.inc_return,indicator.pubDate, ).filter( income.code.in_(['000001.XSHE','600507.XSHG']))df = get_more_state_fund(q,['2017','2018'])# df.loc[('2014',slice(None))]df
.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }
code | roe | inc_return | pubDate | ||
---|---|---|---|---|---|
statDate | |||||
2017 | 2017-03-31 | 000001.XSHE | 3.0319 | 3.0309 | 2017-04-22 |
2017-03-31 | 600507.XSHG | 10.2024 | 10.0997 | 2017-04-22 | |
2017-06-30 | 000001.XSHE | 3.0249 | 3.0058 | 2017-08-11 | |
2017-06-30 | 600507.XSHG | 14.0929 | 14.2133 | 2017-08-19 | |
2017-09-30 | 000001.XSHE | 3.0724 | 3.0813 | 2017-10-21 | |
2017-09-30 | 600507.XSHG | 24.2866 | 23.9243 | 2017-10-24 | |
2017-12-31 | 000001.XSHE | 1.8339 | 1.8320 | 2018-03-15 | |
2017-12-31 | 600507.XSHG | 22.9191 | 19.8184 | 2018-02-09 | |
2018 | 2018-03-31 | 000001.XSHE | 2.9611 | 2.9431 | 2018-04-20 |
2018-03-31 | 600507.XSHG | 13.6345 | 13.6067 | 2018-04-21 | |
2018-06-30 | 000001.XSHE | 3.0000 | 3.0000 | 2018-08-16 | |
2018-06-30 | 600507.XSHG | 19.9900 | 19.7200 | 2018-08-18 | |
2018-09-30 | 000001.XSHE | 3.0600 | 3.0300 | 2018-10-24 | |
2018-09-30 | 600507.XSHG | 21.2400 | 21.1700 | 2018-10-25 | |
2018-12-31 | 000001.XSHE | 1.8400 | 1.8300 | 2019-03-07 | |
2018-12-31 | 600507.XSHG | 10.4100 | 10.2300 | 2019-02-22 |
#h获取多个年度财务数据import pandas as pdstock_list=['000001.XSHE','000527.XSHE','600970.XSHG','600875.XSHG','600880.XSHG']years = range(2005,2018)df_list=[]for year in years:df_year= get_fundamentals(query( indicator.code, indicator.inc_return,indicator.roe ).filter( indicator.code.in_(stock_list) ),statDate=year).set_index('code')df_list.append(df_year)name = [str(x) for x in range(2005,2017)]df=pd.concat(df_list,axis=1,keys=name)df
.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }
2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
inc_return | roe | inc_return | roe | inc_return | roe | inc_return | roe | inc_return | roe | inc_return | roe | inc_return | roe | inc_return | roe | inc_return | roe | inc_return | roe | inc_return | roe | inc_return | roe | |
000001.XSHE | 6.8734 | 6.3868 | 21.2849 | 22.6025 | 26.7708 | 27.2057 | 4.2435 | 4.1761 | 26.7942 | 27.2887 | 22.7620 | 23.2809 | 19.0581 | 19.2441 | 16.9316 | 16.9537 | 15.4064 | 15.4724 | 16.3280 | 16.2959 | 14.9783 | 14.9530 | 12.4321 | 12.4283 |
000527.XSHE | 13.3490 | 13.3300 | 15.7667 | 16.3908 | 26.3179 | 29.8230 | 24.7264 | 22.3681 | 26.2619 | 26.6314 | 27.1383 | 28.7052 | 19.3616 | 22.8784 | 16.2383 | 16.5337 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
600875.XSHG | 34.1418 | 34.0314 | 22.7201 | 57.1331 | 16.2666 | 47.2531 | 56.7490 | 6.7020 | 30.2382 | 28.6972 | 24.8096 | 26.0992 | 23.2666 | 24.6060 | 13.7332 | 14.8649 | 13.4276 | 14.0803 | 6.2436 | 6.8654 | 1.4852 | 2.0616 | -8.9718 | -8.0745 |
600880.XSHG | 18.1740 | 18.5634 | 21.3198 | 12.3548 | 19.8543 | 20.2199 | 18.9755 | 19.9203 | 20.9028 | 22.6681 | 20.9537 | 21.7254 | 16.5323 | 21.3662 | 13.1731 | 13.2258 | 11.4463 | 11.7852 | 8.0076 | 7.9088 | 0.7765 | 2.0698 | -0.7119 | 1.6685 |
600970.XSHG | 15.8088 | 17.5163 | 14.8404 | 30.9030 | 25.1281 | 28.2336 | 43.4164 | 30.6846 | 48.4384 | 47.1297 | 49.0580 | 52.5023 | 39.2612 | 39.8747 | 16.3465 | 17.0652 | -0.5751 | 2.0248 | 2.2335 | 3.4041 | 8.9119 | 12.2016 | 4.3695 | 7.6902 |
需要注意的: finance,jy,opt,macro有一次性获取数据不超过3000条的限制(主要是为了提速,jqdatasdk上限是4000条,官网后续也会增加到4000条,get_fundamentals的限制为10000条),
所以当获取的数据超过这个限制时,需要对股票池或者时间段进行分割,多次获取,也可以通过offset。
利用offset多次查询以跳过限制¶
from jqdata import financefrom sqlalchemy.sql import funcimport pandas as pdimport mathsum_count = finance.run_query(query(func.count('*') ).filter(finance.CCTV_NEWS.day<'2012-01-01')).iloc[0,0] #先查询总共有多少条数据print ('总共有{}条数据,需要获取{}次'.format(sum_count,int(math.ceil(sum_count/3000.0))))l = []for i in range(0,14909,3000): #以3000为步长循环offset的参数q = query(finance.CCTV_NEWS).filter(finance.CCTV_NEWS.day<'2012-01-01' ).order_by(finance.CCTV_NEWS.day #可以先按照一定规律排序 ).offset(i) #自第i条数据之后进行获取df=finance.run_query(q)l.append(df)df = pd.concat(l).reset_index() #数据拼接print(df.shape)df.tail()
总共有14909条数据,需要获取5次 (14909, 5)
.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }
index | id | day | title | content | |
---|---|---|---|---|---|
14904 | 2904 | 14715 | 2011-12-31 | 国内联播快讯 | “十二五”期间工业领域重点行业淘汰落后产能目标任务日前下达,与“十一五”相比,新增了铜冶... |
14905 | 2905 | 14702 | 2011-12-31 | 胡锦涛主席发表2012年新年贺词 | 2012年新年来临之际,国家主席胡锦涛通过中国国际广播电台、中央人民广播电台和中央电视台... |
14906 | 2906 | 14706 | 2011-12-31 | 胡锦涛签署第五十二号、五十三号主席令 | 本台消息,国家主席胡锦涛31号在北京签署了第五十二号和第五十三号主席令。\n 第五... |
14907 | 2907 | 14704 | 2011-12-31 | 胡锦涛致电祝贺金正恩担任朝鲜人民军最高司令官 | 本台消息,12月31日,中华人民共和国中央军事委员会主席胡锦涛致电祝贺金正恩担任朝鲜... |
14908 | 2908 | 14717 | 2011-12-31 | 金正恩成为朝鲜人民军最高司令官 | 据朝中社今天报道,朝鲜劳动党中央政治局会议30号在平壤召开会议宣布,根据已故最高领导人金... |
获取前N季度的财务数据¶
相关说明: https://www.joinquant.com/help/api/help?name=factor#如何理解dependencies中的财务因子
一般获取的前N季度数据可以直接在因子分析中直接处理了,如果有特殊需求,可以使用下边的方法获得一个panel(不推荐,推荐直接在factor类中直接计算获取计算完成后的dataframe)
from jqfactor import Factor, calc_factorsimport pandas as pddef get_more_data(data_name,securities,start_date,end_date):class get_N_data(Factor):name = 'more_data'max_window = 1# 设置依赖的数据,即前四季度的营业收入dependencies = data_namedef calc(self, data):# 计算 ttm , 为前四季度相加all_data = pd.concat(data.values())all_data.index = data_name# 将 ttm 转换成 seriesall_data = pd.Series(all_data.to_dict())return all_data# 计算因子值data1 = calc_factors(securities, [get_N_data()], start_date=start_date, end_date=end_date)all_data = pd.Panel(data1['more_data'].to_dict()) return all_data #处理成panel结构data_name = ['net_profit','net_profit_1','net_profit_2','net_profit_3']securities = ['600741.XSHG','000001.XSHG']get_more_data(data_name,securities,start_date='2019-01-01',end_date='2019-02-01')
/opt/conda/envs/python2new/lib/python2.7/site-packages/ipykernel_launcher.py:25: DeprecationWarning: Panel is deprecated and will be removed in a future version. The recommended way to represent these types of 3-dimensional data are with a MultiIndex on a DataFrame, via the Panel.to_frame() method Alternatively, you can use the xarray package http://xarray.pydata.org/en/stable/. Pandas provides a `.to_xarray()` method to help automate this conversion.
<class 'pandas.core.panel.Panel'> Dimensions: 2 (items) x 4 (major_axis) x 23 (minor_axis) Items axis: 000001.XSHG to 600741.XSHG Major_axis axis: net_profit to net_profit_3 Minor_axis axis: 2019-01-02 00:00:00 to 2019-02-01 00:00:00
计算TTM数据¶
各种指标算法多式多样,我们提供的财务数据中最主要的还是单季度和报告期的财务指标,具体算法以及含义在文档中有详细说明
https://www.joinquant.com/help/api/help?name=Stock#获取单季度年度财务数据
关于两者的区别可以参看这篇帖子理解:
https://www.joinquant.com/view/community/detail/fcb3baa6f926259*caac3bce7c12b1c?type=2
TTM数据有些在聚宽因子库中可以直接获取(对于一些TTM的算法以及口径有多种方式,所以各平台可能存在不一致)
https://www.joinquant.com/help/api/help?name=factor_values
没有提供的可以借助因子分析进行计算
https://www.joinquant.com/help/api/help?name=factor#示例-计算TTM数据
https://www.joinquant.com/help/api/help?name=factor#在研究与回测中计算因子
下边以计算/获取营业收入TTM及ROE TTM为例:
# 使用因子分析进行计算from jqfactor import Factor, calc_factors# 计算营业收入TTMclass OR_TTM(Factor):# 设置因子名称name = 'operating_revenue_ttm'# 设置获取数据的时间窗口长度max_window = 1# 设置依赖的数据,即前四季度的营业收入dependencies = ['operating_revenue','operating_revenue_1','operating_revenue_2','operating_revenue_3']# 计算因子的函数, 需要返回一个 pandas.Series, index 是股票代码,value 是因子值def calc(self, data):# 计算 ttm , 为前四季度相加ttm = data['operating_revenue'] + data['operating_revenue_1'] + data['operating_revenue_2'] + data['operating_revenue_3']# 将 ttm 转换成 seriesreturn ttm.mean()securities = ['000001.XSHE','600000.XSHG']# 计算因子值data1 = calc_factors(securities, [OR_TTM()], start_date='2019-01-01', end_date='2019-06-01', use_real_price=False, skip_paused=False)data1['operating_revenue_ttm'].tail()
.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }
000001.XSHE | 600000.XSHG | |
---|---|---|
2019-05-27 | 1.211660e+11 | 1.819970e+11 |
2019-05-28 | 1.211660e+11 | 1.819970e+11 |
2019-05-29 | 1.211660e+11 | 1.819970e+11 |
2019-05-30 | 1.211660e+11 | 1.819970e+11 |
2019-05-31 | 1.211660e+11 | 1.819970e+11 |
# 通过聚宽因子库直接获取from jqfactor import get_factor_valuesdata2 = get_factor_values(securities,factors='operating_revenue_ttm',start_date='2019-01-01', end_date='2019-06-01')data2['operating_revenue_ttm'].tail()
.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }
code | 000001.XSHE | 600000.XSHG |
---|---|---|
2019-05-27 | 1.211660e+11 | 1.819970e+11 |
2019-05-28 | 1.211660e+11 | 1.819970e+11 |
2019-05-29 | 1.211660e+11 | 1.819970e+11 |
2019-05-30 | 1.211660e+11 | 1.819970e+11 |
2019-05-31 | 1.211660e+11 | 1.819970e+11 |
# 使用因子分析进行计算roefrom jqfactor import Factor, calc_factors# 计算营业收入TTM# roe_ttm = np_parent_company_owners_ttm/equities_parent_company_owners(最近一期) wind算法class ROE_TTM(Factor):# 设置因子名称name = 'roe_ttm'# 设置获取数据的时间窗口长度max_window = 1# 设置依赖的数据dependencies = ['np_parent_company_owners', 'equities_parent_company_owners','np_parent_company_owners_1', #'np_parent_company_owners_ttm''np_parent_company_owners_2', 'np_parent_company_owners_3',] # 计算因子的函数, 需要返回一个 pandas.Series, index 是股票代码,value 是因子值def calc(self, data):# 计算 ttm ttm_1 = (data['np_parent_company_owners'] + data['np_parent_company_owners_1'] + data['np_parent_company_owners_2'] + data['np_parent_company_owners_3'])# ttm_1 = (data['np_parent_company_owners_ttm'] #ttm1 = 归属于母公司股东的净利润TTM,和上边代码结果一致,这里也可以直接从聚宽因子库获取数据ttm_2 = data['equities_parent_company_owners']return (ttm_1/ttm_2).mean()securities = ['000001.XSHE','600000.XSHG','000002.XSHE']# 计算因子值data1 = calc_factors(securities, [ROE_TTM()], start_date='2019-01-01', end_date='2019-06-01', use_real_price=False, skip_paused=False)data1['roe_ttm'].tail()
.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }
000001.XSHE | 600000.XSHG | 000002.XSHE | |
---|---|---|---|
2019-05-27 | 0.102292 | 0.118846 | 0.215792 |
2019-05-28 | 0.102292 | 0.118846 | 0.215792 |
2019-05-29 | 0.102292 | 0.118846 | 0.215792 |
2019-05-30 | 0.102292 | 0.118846 | 0.215792 |
2019-05-31 | 0.102292 | 0.118846 | 0.215792 |
# 通过聚宽因子库直接获取roe_ttmfrom jqfactor import get_factor_valuesdata2 = get_factor_values(securities,factors='roe_ttm',start_date='2019-01-01', end_date='2019-06-01')data2['roe_ttm'].tail()
.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }
code | 000001.XSHE | 600000.XSHG | 000002.XSHE |
---|---|---|---|
2019-05-27 | 0.102292 | 0.118409 | 0.214653 |
2019-05-28 | 0.102292 | 0.118409 | 0.214653 |
2019-05-29 | 0.102292 | 0.118409 | 0.214653 |
2019-05-30 | 0.102292 | 0.118409 | 0.214653 |
2019-05-31 | 0.102292 | 0.118409 | 0.214653 |