量化交易吧 / 量化平台帖子：3369986 新帖：6

【有用功】Query的简单教程及TTM/同比/环比算法

只求稳定发表于：7 月 1 日 12：49回复(1)

对于数据的查询过滤,一般通过filter函数进行实现,filter可以对多个条件进行过滤,中间用,隔开。
比如查询某只股票某一个时间段,满足特定值的数据,可以用到 in_ ,and_,> , < , == ,!=等符号进行基本的查询

涉及到使用数据库操作的数据有:

get_fundamentals (股票单季度财务数据)
finance (股票数据,基金数据等)
opt (期权数据)
macro (宏观数据)
jy (聚源数据)

基本的查询方式¶

query() 填写需要查询的对象,可以是整张表,也可以是表中的多个字段或计算出的结果
filter 填写过滤条件,多个过滤条件可以用逗号隔开,或者用and,or这样的语法
order_by 填写排序条件

.desc() 降序排列
.asc() 升序排列

limit 限制返回的个数
order_by 分组统计

# 查询 000001 2015 - 2017 年的整张合并利润表,且满足 net_profit(净利润) < 0 的条件from jqdata import financeq = query(finance.STK_INCOME_STATEMENT).filter(finance.STK_INCOME_STATEMENT.code=='000783.XSHE',              #选定股票  000783.XSHEfinance.STK_INCOME_STATEMENT.end_date > '2005-01-01',          #指定查询时间段大于2005年1月1日finance.STK_INCOME_STATEMENT.end_date < '2018-01-01',          #指定查询时间段小于2018年1月1日finance.STK_INCOME_STATEMENT.net_profit <0,                    #指定查询到的数据中net_profit为负finance.STK_INCOME_STATEMENT.report_type == 0,                 #指定报告期类型为本期).order_by(finance.STK_INCOME_STATEMENT.end_date.desc()  ).limit(5)   #根据end_date降序排序,并返回前5条数据 finance.run_query(q)

.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }

	id	company_id	company_name	code	a_code	b_code	h_code	pub_date	start_date	end_date	...	commission_income	interest_expense	commission_expense	refunded_premiums	net_pay_insurance_claims	withdraw_insurance_contract_reserve	policy_dividend_payout	reinsurance_cost	non_current_asset_disposed	other_earnings
0	257078	430000783	长江证券股份有限公司	000783.XSHE	000783	None	None	2007-10-22	2007-01-01	2007-09-30	...	None	None	None	None	None	None	None	None	None	None
1	257070	430000783	长江证券股份有限公司	000783.XSHE	000783	None	None	2006-10-25	2006-01-01	2006-09-30	...	None	None	None	None	None	None	None	None	None	None
2	257068	430000783	长江证券股份有限公司	000783.XSHE	000783	None	None	2006-08-17	2006-01-01	2006-06-30	...	None	None	None	None	None	None	None	None	None	None
3	257066	430000783	长江证券股份有限公司	000783.XSHE	000783	None	None	2006-04-29	2006-01-01	2006-03-31	...	None	None	None	None	None	None	None	None	None	None
4	257064	430000783	长江证券股份有限公司	000783.XSHE	000783	None	None	2006-04-29	2005-01-01	2005-12-31	...	None	None	None	None	None	None	None	None	None	None

5 rows × 60 columns

get_fundamentals 获取某个/多个季度/年度的数据(指定参数statDate)¶

# 查询平安银行2014年3-6月份的单季度报表q = query(  income.statDate,  income.code,  income.basic_eps,  cash_flow.goods_sale_and_service_render_cash
  ).filter(  income.code == '000001.XSHE',
  )ret = get_fundamentals(q, statDate='2014q2')ret

.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }

	statDate	code	basic_eps	goods_sale_and_service_render_cash
0	2014-06-30	000001.XSHE	0.35	NaN

# 查询平安银行2014年的年报q = query(  income.statDate,  income.code,  income.basic_eps,  cash_flow.goods_sale_and_service_render_cash
  ).filter(  income.code == '000001.XSHE',
  )ret = get_fundamentals(q, statDate='2014')ret

.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }

	statDate	code	basic_eps	goods_sale_and_service_render_cash
0	2014-12-31	000001.XSHE	1.73	NaN

# 查询平安银行2014年四个季度的季报, 放到数组中并拼接为dataframeq = query(  income.statDate,  income.code,  income.basic_eps,  balance.cash_equivalents,  cash_flow.goods_sale_and_service_render_cash
  ).filter(  income.code == '000001.XSHE',
  )rets = [get_fundamentals(q, statDate='2014q'+str(i)) for i in range(1, 5)]import pandas as pdpd.concat(rets)

.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }

statDate	code	basic_eps	cash_equivalents	goods_sale_and_service_render_cash
2014-03-31	000001.XSHE	0.53	2.581100e+11	NaN
2014-06-30	000001.XSHE	0.35	2.596040e+11	NaN
2014-09-30	000001.XSHE	0.49	2.773250e+11	NaN
2014-12-31	000001.XSHE	0.36	3.062980e+11	NaN

in_ 判断某个字段的值是否在列表之中(一般用于查询多个标的)¶

stocks = ['000001.XSHE','600741.XSHG','600507.XSHG']q = query(balance.code,balance.pubDate,balance.statDate,    # 指定返回的字段只包括code,pubDate,statDate,total_assets及total_sheet_owner_equities  balance.total_assets,balance.total_sheet_owner_equities).filter( balance.code.in_(stocks)   #指定查询到的数据只包括code在 stocks中的数据)get_fundamentals(q,date='2018-01-05')    #查询单季度数据中在2018-01-05之前发布的数据,没有未来函数

.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }

	code	pubDate	statDate	total_assets	total_sheet_owner_equities
0	000001.XSHE	2017-10-21	2017-09-30	3.137481e+12	3.137481e+12
1	600507.XSHG	2017-10-24	2017-09-30	1.041599e+10	1.041599e+10
2	600741.XSHG	2017-10-27	2017-09-30	1.153612e+11	1.153612e+11

distinct 去重,用于查看数据库中某个字段都存在哪些值¶

#  查看十大流通股东中都有哪些类别from jqdata import financeq = query( finance.STK_SHAREHOLDER_FLOATING_TOP10.shareholder_class_id.distinct(),  #提取ID不同的数据finance.STK_SHAREHOLDER_FLOATING_TOP10.shareholder_class, ).order_by(finance.STK_SHAREHOLDER_FLOATING_TOP10.change_reason_id)  df = finance.run_query(q)df.tail()

	anon_1	shareholder_class
23	307012	银行
24	307024	银行和上市公司
25	307025	信托投资管理公司和上市公司
26	307024	上市公司和银行
27	307027	银行和QFII

~ 反向查询符号¶

q = query( finance.STK_SHAREHOLDER_FLOATING_TOP10.shareholder_class_id.distinct(),finance.STK_SHAREHOLDER_FLOATING_TOP10.shareholder_class, ).filter(~finance.STK_SHAREHOLDER_FLOATING_TOP10.shareholder_class_id.in_(['307007','307001']),#等同于notin_)finance.run_query(q)

.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }

	anon_1	shareholder_class
0	307002	券商
1	307003	证券投资基金
2	307004	社保基金
3	307005	企业年金
4	307006	保险公司
5	307008	基金管理公司
6	307009	信托投资管理公司
7	307011	上市公司
8	307012	银行
9	307013	风险投资
10	307014	保险投资组合
11	307015	券商资产管理计划
12	307016	信托资产管理计划
13	307017	基金资产管理计划
14	307018	资产管理公司资产管理计划
15	307019	国有资产经营公司
16	307020	期货资产管理计划
17	307021	地方国资委
18	307022	券商和上市公司
19	307023	保险公司和上市公司
20	307024	银行和上市公司
21	307024	上市公司和银行
22	307025	信托投资管理公司和上市公司
23	307027	银行和QFII
24	307099	其他机构

运算和命名(label)¶

# label的作用是命名获得数据的标签,一般用于直接运算后的重命名q = query(indicator.code.label('股票代码'),indicator.operating_profit.label('get_operating_profit'), (income.total_operating_revenue - income.total_operating_cost).label('my_operating_profit') ).filter(indicator.code=='600507.XSHG')get_fundamentals(q)

.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }

	股票代码	get_operating_profit	my_operating_profit
0	600507.XSHG	1.329762e+09	1.329762e+09

or_ 或查询¶

from sqlalchemy.sql.expression import or_df = get_fundamentals(query(  valuation.code,valuation.market_cap,valuation.pe_ratio  ).filter(  or_(valuation.market_cap < 10,valuation.pe_ratio > 1000)))  # 查询当天总市值小于1000亿或pe大于10亿的所有股票df.tail()

.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }

	code	market_cap	pe_ratio
23	300336.XSHE	32.8136	2772.8604
24	600127.XSHG	30.1638	1195.8562
25	600462.XSHG	8.3803	-3.0123
26	601008.XSHG	46.6999	3504.9221
27	603718.XSHG	87.9060	1225.4053

contains/like/ilike 数据库中的字符串模糊匹配¶

% 百分号通配符: 表示任何字符出现任意次数(可以是0次).
_ 下划线通配符:表示只能匹配单个字符,不能多也不能少,就是一个字符.

from jqdata import finance# 获取000001.XSHG每一年的年报   df = finance.run_query(query(finance.FINANCE_INCOME_STATEMENT   ).filter(finance.FINANCE_INCOME_STATEMENT.code.in_(['000001.XSHE']),#                 finance.FINANCE_INCOME_STATEMENT.end_date.like('%-12-31%'),    #sqlalchemy的版本,数据库的构建等问题可能导致查询报错,可以换以下两种方法尝试#               finance.FINANCE_INCOME_STATEMENT.end_date.contains('12-31') ,      finance.FINANCE_INCOME_STATEMENT.end_date.ilike('_____12-31'),finance.FINANCE_INCOME_STATEMENT.report_type==0   ))df.tail()

.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }

	id	company_id	company_name	code	a_code	b_code	h_code	pub_date	start_date	end_date	report_date	source_id	source	operating_revenue	interest_net_revenue	interest_income	interest_expense	commission_net_income	commission_income	commission_expense	agent_security_income	sell_security_income	manage_income	premiums_earned	assurance_income	premiums_income	premiums_expense	prepare_money	investment_income	invest_income_associates	fair_value_variable_income	exchange_income	other_income	operation_expense	refunded_premiums	compensate_loss	compensation_back	insurance_reserve	insurance_reserve_back	policy_dividend_payout	reinsurance_cost	operating_tax_surcharges	commission_expense2	operation_manage_fee	separate_fee	asset_impairment_loss	other_cost	operating_profit	subsidy_income	non_operating_revenue	non_operating_expense	other_items_influenced_profit	total_profit	income_tax_expense	other_influence_net_profit	net_profit	np_parent_company_owners	minority_profit	eps	basic_eps	diluted_eps	other_composite_income	total_composite_income	ci_parent_company_owners	ci_minority_owners
21	2543	430000001	平安银行股份有限公司	000001.XSHE	000001	None	None	2015-03-13	2014-01-01	2014-12-31	2014-12-31	321003	定期报告	7.340700e+10	5.304600e+10	1.192020e+11	6.615600e+10	1.737800e+10	1.970600e+10	2.328000e+09	None	NaN	None	None	None	None	None	None	3.168000e+09	28000000.0	-10000000.0	-388000000.0	213000000.0	4.716100e+10	NaN	None	None	None	None	None	None	5.482000e+09	None	2.666800e+10	None	1.501100e+10	0.000000e+00	2.624600e+10	None	40000000.0	92000000.0	NaN	2.619400e+10	6.392000e+09	None	1.980200e+10	1.980200e+10	NaN	None	1.73	1.73	586000000.0	2.038800e+10	NaN	NaN
22	2551	430000001	平安银行股份有限公司	000001.XSHE	000001	None	None	2016-03-10	2015-01-01	2015-12-31	2015-12-31	321003	定期报告	9.616300e+10	6.609900e+10	1.316490e+11	6.555000e+10	2.644500e+10	2.918500e+10	2.740000e+09	None	NaN	None	None	None	None	None	None	3.924000e+09	28000000.0	107000000.0	-573000000.0	161000000.0	6.726800e+10	NaN	None	None	None	None	None	None	6.671000e+09	None	3.011200e+10	None	3.048500e+10	0.000000e+00	2.889500e+10	None	40000000.0	89000000.0	NaN	2.884600e+10	6.981000e+09	None	2.186500e+10	2.186500e+10	NaN	None	1.56	1.56	NaN	2.259900e+10	NaN	NaN
23	2559	430000001	平安银行股份有限公司	000001.XSHE	000001	None	None	2017-03-17	2016-01-01	2016-12-31	2016-12-31	321003	定期报告	1.077150e+11	7.641100e+10	1.311190e+11	5.470800e+10	2.785900e+10	3.130900e+10	3.450000e+09	None	NaN	None	None	None	None	None	None	2.368000e+09	-141000000.0	49000000.0	882000000.0	146000000.0	7.793600e+10	NaN	None	None	None	None	None	None	3.445000e+09	None	2.797300e+10	None	4.651800e+10	0.000000e+00	2.977900e+10	None	221000000.0	65000000.0	NaN	2.993500e+10	7.336000e+09	None	2.259900e+10	2.259900e+10	NaN	None	1.32	1.32	308000000.0	2.290700e+10	2.290700e+10	NaN
24	2564	430000001	平安银行股份有限公司	000001.XSHE	000001	None	None	2018-03-15	2017-01-01	2017-12-31	2017-12-31	321003	定期报告	1.057860e+11	7.400900e+10	1.480680e+11	7.405900e+10	3.067400e+10	3.572500e+10	5.051000e+09	None	NaN	None	None	None	None	None	None	6.320000e+08	NaN	-61000000.0	166000000.0	186000000.0	7.556300e+10	NaN	None	None	None	None	None	None	1.022000e+09	None	3.161600e+10	None	4.292500e+10	NaN	3.022300e+10	None	38000000.0	104000000.0	NaN	3.015700e+10	6.968000e+09	None	2.318900e+10	2.318900e+10	NaN	None	1.30	1.30	281000000.0	2.347000e+10	2.347000e+10	NaN
25	4931	430000001	平安银行股份有限公司	000001.XSHE	000001	None	None	2019-03-07	2018-01-01	2018-12-31	2018-12-31	321003	定期报告	1.167160e+11	7.474500e+10	1.628880e+11	8.814300e+10	3.129700e+10	3.936200e+10	8.065000e+09	None	NaN	None	None	None	None	None	None	9.186000e+09	NaN	892000000.0	209000000.0	170000000.0	8.441100e+10	NaN	None	None	None	None	None	None	1.149000e+09	None	3.539100e+10	None	NaN	4.787100e+10	3.230500e+10	None	28000000.0	102000000.0	NaN	3.223100e+10	7.413000e+09	None	2.481800e+10	2.481800e+10	NaN	None	1.39	1.39	912000000.0	2.573000e+10	2.573000e+10	NaN

简化计算的方法(sqlalchemy.sql.func) 和 group_by¶

func中的函数可以完成常用的一些计算,配合group_by完成简单的统计
下边以简单计算股票池/全市场每天的的等权重pe,pb为例
(当然,这个计算中一些细节并没有考虑到,实际还是以官方数据为准,可以通过聚源数据获取,具体下边会有演示)

from sqlalchemy.sql import funcfrom jqdata import financestocks = get_index_stocks('000016.XSHG')q = query(valuation.day,#           func.count('*'),       # 信息条数(也就是股票数量)  (func.count()/func.sum(1/valuation.pe_ratio)).label('*g_pe'),  # 等权重pe  (func.count()/func.sum(1/valuation.pb_ratio)).label('*g_pb'),  # 等权重pb ).group_by(valuation.day   ).filter(valuation.pe_ratio>0,   #排除亏损的valuation.pb_ratio>0,   valuation.day>'2009-12-01',  #设置一个时间valuation.code.in_(stocks)   #设置股票池,注释即为全市场)df = finance.run_query(q)df.set_index('day',inplace=True)df.tail()

.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }

	*g_pe	*g_pb
day
2019-01-16	10.081409	1.295044
2019-01-17	10.036275	1.289927
2019-01-18	10.201209	1.309428
2019-01-21	10.262107	1.314957
2019-01-22	10.262107	1.314957

更多关于query的可导入函数,可以使用以下方法查看,配合官网文档使用:

import sqlalchemy
dir(sqlalchemy.sql.expression)

其他常用数据获取方式整理¶

获取旧的申万指数列表和新的申万指数列表(14年有过改动)¶

from jqdata import *def get_sw_quote(day=None):'''获取指定日期的申万指数列表'''day = get_trade_days(end_date=day,count=1)[-1]df = jy.run_query(query( jy.QT_SYWGIndexQuote.InnerCode.distinct().label('InnerCode')).filter(jy.QT_SYWGIndexQuote.TradingDay==day,))code_df = jy.run_query(query( jy.SecuMain.SecuCode,jy.SecuMain.ChiName).filter(jy.SecuMain.InnerCode.in_(df.InnerCode)))return code_dfdf = get_sw_quote('2013-02-05')df.tail()

.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }

	SecuCode	ChiName
466	852041	申银万国指数-专业连锁
467	852051	申银万国指数-一般物业经营
468	852052	申银万国指数-专业市场
469	852225	申银万国指数-软件开发
470	852226	申银万国指数-IT服务

查询申万行情( 包含行业pe/pb)¶

from jqdata import jyfrom jqdata import *import pandas as pd#注意申万指数在2014年有一次大改,聚源使用的是为改变之前的代码,官网包含更改前和更改后的代码,如果遇到找不到的标的可以根据需求自行查找#如801124 >>801121食品加工IIdef get_sw_quote(code,end_date=None,count=None,start_date=None):'''获取申万指数行情,返回panel结构'''if isinstance(code,str):code=[code]days = get_trade_days(start_date,end_date,count)code_df = jy.run_query(query( jy.SecuMain.InnerCode,jy.SecuMain.SecuCode,jy.SecuMain.ChiName).filter(jy.SecuMain.SecuCode.in_(code)))df = jy.run_query(query( jy.QT_SYWGIndexQuote).filter(jy.QT_SYWGIndexQuote.InnerCode.in_(code_df.InnerCode),jy.QT_SYWGIndexQuote.TradingDay.in_(days),))df2  = pd.merge(code_df, df, on='InnerCode').set_index(['TradingDay','SecuCode'])df2.drop(['InnerCode','ID','UpdateTime','JSID'],axis=1,inplace=True)return df2.to_panel()code = get_industries(name='sw_l2').index[:5]df = get_sw_quote(code,end_date='2018-01-01',count=10)df.to_frame(False).tail()

/opt/conda/envs/python2new/lib/python2.7/site-packages/ipykernel_launcher.py:23: DeprecationWarning: 
Panel is deprecated and will be removed in a future version.
The recommended way to represent these types of 3-dimensional data are with a MultiIndex on a DataFrame, via the Panel.to_frame() method
Alternatively, you can use the xarray package http://xarray.pydata.org/en/stable/.
Pandas provides a `.to_xarray()` method to help automate this conversion.

.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }

		ChiName	PrevClosePrice	OpenPrice	HighPrice	LowPrice	ClosePrice	TurnoverVolume	TurnoverValue	TurnoverDeals	ChangePCT	RightLevel	IndexPE	IndexPB	TotalMarketValue	AShareTotalMV
TradingDay	SecuCode
2017-12-29	801011	申银万国指数-林业	1800.092	1795.480	1845.60	1784.41	1802.798	2853232.0	3.554701e+07	None	0.0015	4	777.37	3.13	2036129.0	1121708.0
801012	申银万国指数-农产品加工	2551.776	2546.600	2573.80	2515.61	2556.512	93670965.0	1.234257e+09	None	0.0019	4	35.58	3.12	14529320.0	7936090.0
801013	申银万国指数-农业综合	2269.610	2260.530	2278.20	2254.30	2272.480	9084945.0	4.234125e+07	None	0.0013	4	125.71	2.92	867493.0	464852.0
801014	申银万国指数-饲料	3644.520	3646.568	3709.26	3610.05	3685.350	133928157.0	1.452713e+09	None	0.0112	4	23.78	2.71	21027565.0	8124732.0
801015	申银万国指数-渔业	1540.178	1535.748	1566.71	1528.66	1555.760	34618204.0	3.012524e+08	None	0.0101	4	43.95	2.52	5217722.0	2511131.0

查询中证行情( 包含指数市值/市盈率/股息率以及红利等指数)¶

def get_zz_quote(code,end_date=None,count=None,start_date=None):'''获取中证指数行情,包含市值,市盈率,股息率,返回panel结构'''if isinstance(code,str):code=[code]code.sort()code = [x[:6] for x in code]days = get_trade_days(start_date,end_date,count)code_df = jy.run_query(query( jy.SecuMain.InnerCode,jy.SecuMain.SecuCode,jy.SecuMain.ChiName).filter(jy.SecuMain.SecuCode.in_(code)).order_by(jy.SecuMain.SecuCode))df = jy.run_query(query( jy.QT_CSIIndexQuote).filter(jy.QT_CSIIndexQuote.IndexCode.in_(code_df.InnerCode),jy.QT_CSIIndexQuote.TradingDay.in_(days),))df2  = pd.merge(code_df, df, left_on='InnerCode',right_on='IndexCode').set_index(['TradingDay','SecuCode'])df2.drop(['InnerCode','IndexCode','ID','UpdateTime','JSID','OpenInterest','SettleValue','IndexCSIType'],axis=1,inplace=True)return df2.to_panel()panel = get_zz_quote(['000016.XSHG','000001.XSHG'],end_date='2019-01-21',count=10)panel.ClosePrice.tail()

/opt/conda/envs/python2new/lib/python2.7/site-packages/ipykernel_launcher.py:19: DeprecationWarning: 
Panel is deprecated and will be removed in a future version.
The recommended way to represent these types of 3-dimensional data are with a MultiIndex on a DataFrame, via the Panel.to_frame() method
Alternatively, you can use the xarray package http://xarray.pydata.org/en/stable/.
Pandas provides a `.to_xarray()` method to help automate this conversion.

.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }

SecuCode	000001	000016
TradingDay
2019-01-15	2570.3449	2378.3696
2019-01-16	2570.4221	2381.2178
2019-01-17	2559.6370	2371.3481
2019-01-18	2596.0056	2417.3630
2019-01-21	2610.5094	2432.4870

查询股息率(近12个月)¶

from jqdata import *def DividendRatio(security_list,end_date,count=1):'''查询股息率(日更新)    输入:股票池,截止日期,获取数量    输出:panel结构,单位:1'''trade_days = get_trade_days(end_date=end_date,count = count)security_list.sort()secu_list = [x[:6] for x in security_list]code_df = jy.run_query(query( jy.SecuMain.InnerCode,jy.SecuMain.SecuCode,#     jy.SecuMain.ChiName,jy.SecuMain.CompanyCode).filter(jy.SecuMain.SecuCode.in_(secu_list),jy.SecuMain.SecuCategory==1).order_by(jy.SecuMain.SecuCode))code_df['code'] = security_listdf = jy.run_query(query(#         jy.LC_DIndicesForValuation    #得到整表jy.LC_DIndicesForValuation.InnerCode,jy.LC_DIndicesForValuation.TradingDay, jy.LC_DIndicesForValuation.DividendRatio,).filter(jy.LC_DIndicesForValuation.InnerCode.in_(code_df.InnerCode),jy.LC_DIndicesForValuation.TradingDay.in_(trade_days)))f_df = df.merge(code_df,on='InnerCode').set_index(['TradingDay','code']).drop(['InnerCode','SecuCode'],axis=1)panel = f_df.to_panel()return panelsecuritys =get_index_stocks('399015.XSHE')[:10]date='2018-01-01'DividendRatio(securitys,date,count=5)

/opt/conda/envs/python2new/lib/python2.7/site-packages/ipykernel_launcher.py:24: DeprecationWarning: 
Panel is deprecated and will be removed in a future version.
The recommended way to represent these types of 3-dimensional data are with a MultiIndex on a DataFrame, via the Panel.to_frame() method
Alternatively, you can use the xarray package http://xarray.pydata.org/en/stable/.
Pandas provides a `.to_xarray()` method to help automate this conversion.

<class 'pandas.core.panel.Panel'>
Dimensions: 1 (items) x 5 (major_axis) x 10 (minor_axis)
Items axis: DividendRatio to DividendRatio
Major_axis axis: 2017-12-25 00:00:00 to 2017-12-29 00:00:00
Minor_axis axis: 000007.XSHE to 000049.XSHE

获取期货合约的基本信息（合约乘数、商品报价的计数单位、最小变动单位）¶

# lru缓存可以不要，加上后查询相同的合约可以避免重复数据库请求from fastcache import clru_cache as lru_cache@lru_cache(maxsize=128)def future_basic_info(future):from jqdata import jyfrom numpy import nanimport reif "9999" in future or "8888" in future:match = re.match(r"(?P<underlying_symbol>[A-Z]{1,})", future)if not match:raise ValueError("未知期货标的：{}".format(future))else:future = get_dominant_future(match.groupdict()["underlying_symbol"])q = query(jy.Fut_ContractMain).filter(jy.Fut_ContractMain.ContractCode == future.split(".")[0])result = jy.run_query(query_object=q).to_dict("record")if result:result = result.pop()min_point = re.match("(?P<value>^[0-9]+([.]{1}[0-9]+){0,1})", result["LittlestChangeUnit"]).groupdict(nan)["value"]return {"ContractUnit": result["CMValue"],   "PriceScale": float(str(min_point)[:-1] + "1") if float(min_point) < 1 else 1,   "MinPoint": float(min_point)}else:return {"ContractUnit": nan,"PriceScale": nan,"MinPoint": nan}future_basic_info('IF1801.CCFX')

{'ContractUnit': 300, 'MinPoint': 0.2, 'PriceScale': 0.1}

多个dataframe的合并 (数据拼接) 获取多份财务报表¶

#获取多年的季度度数据import pandas as pddef get_more_state_fund(q_object,year_list):df_list = []for year in year_list:rets = [get_fundamentals(q, statDate=str(year)+'q'+str(i)) for i in range(1, 5)]df = pd.concat(rets).set_index('statDate')  #个人习惯df_list.append(df)df_ = pd.concat(df_list,keys=year_list,axis=0) #axis=1或axis=0,依据个人习惯return df_q = query(indicator.code,  indicator.statDate,indicator.roe,indicator.inc_return,indicator.pubDate,
  ).filter(  income.code.in_(['000001.XSHE','600507.XSHG']))df = get_more_state_fund(q,['2017','2018'])# df.loc[('2014',slice(None))]df

.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }

		code	roe	inc_return	pubDate
	statDate
2017	2017-03-31	000001.XSHE	3.0319	3.0309	2017-04-22
2017-03-31	600507.XSHG	10.2024	10.0997	2017-04-22
2017-06-30	000001.XSHE	3.0249	3.0058	2017-08-11
2017-06-30	600507.XSHG	14.0929	14.2133	2017-08-19
2017-09-30	000001.XSHE	3.0724	3.0813	2017-10-21
2017-09-30	600507.XSHG	24.2866	23.9243	2017-10-24
2017-12-31	000001.XSHE	1.8339	1.8320	2018-03-15
2017-12-31	600507.XSHG	22.9191	19.8184	2018-02-09
2018	2018-03-31	000001.XSHE	2.9611	2.9431	2018-04-20
2018-03-31	600507.XSHG	13.6345	13.6067	2018-04-21
2018-06-30	000001.XSHE	3.0000	3.0000	2018-08-16
2018-06-30	600507.XSHG	19.9900	19.7200	2018-08-18
2018-09-30	000001.XSHE	3.0600	3.0300	2018-10-24
2018-09-30	600507.XSHG	21.2400	21.1700	2018-10-25
2018-12-31	000001.XSHE	1.8400	1.8300	2019-03-07
2018-12-31	600507.XSHG	10.4100	10.2300	2019-02-22

#h获取多个年度财务数据import pandas as pdstock_list=['000001.XSHE','000527.XSHE','600970.XSHG','600875.XSHG','600880.XSHG']years = range(2005,2018)df_list=[]for year in years:df_year= get_fundamentals(query( indicator.code, indicator.inc_return,indicator.roe   ).filter( indicator.code.in_(stock_list)   ),statDate=year).set_index('code')df_list.append(df_year)name = [str(x) for x in range(2005,2017)]df=pd.concat(df_list,axis=1,keys=name)df

.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }

	2005	2006	2007	2008	2009	2010	2011	2012	2013	2014	2015	2016
	inc_return	roe	inc_return	roe	inc_return	roe	inc_return	roe	inc_return	roe	inc_return	roe	inc_return	roe	inc_return	roe	inc_return	roe	inc_return	roe	inc_return	roe	inc_return	roe
000001.XSHE	6.8734	6.3868	21.2849	22.6025	26.7708	27.2057	4.2435	4.1761	26.7942	27.2887	22.7620	23.2809	19.0581	19.2441	16.9316	16.9537	15.4064	15.4724	16.3280	16.2959	14.9783	14.9530	12.4321	12.4283
000527.XSHE	13.3490	13.3300	15.7667	16.3908	26.3179	29.8230	24.7264	22.3681	26.2619	26.6314	27.1383	28.7052	19.3616	22.8784	16.2383	16.5337	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
600875.XSHG	34.1418	34.0314	22.7201	57.1331	16.2666	47.2531	56.7490	6.7020	30.2382	28.6972	24.8096	26.0992	23.2666	24.6060	13.7332	14.8649	13.4276	14.0803	6.2436	6.8654	1.4852	2.0616	-8.9718	-8.0745
600880.XSHG	18.1740	18.5634	21.3198	12.3548	19.8543	20.2199	18.9755	19.9203	20.9028	22.6681	20.9537	21.7254	16.5323	21.3662	13.1731	13.2258	11.4463	11.7852	8.0076	7.9088	0.7765	2.0698	-0.7119	1.6685
600970.XSHG	15.8088	17.5163	14.8404	30.9030	25.1281	28.2336	43.4164	30.6846	48.4384	47.1297	49.0580	52.5023	39.2612	39.8747	16.3465	17.0652	-0.5751	2.0248	2.2335	3.4041	8.9119	12.2016	4.3695	7.6902

需要注意的: finance,jy,opt,macro有一次性获取数据不超过3000条的限制(主要是为了提速,jqdatasdk上限是4000条,官网后续也会增加到4000条,get_fundamentals的限制为10000条),

所以当获取的数据超过这个限制时,需要对股票池或者时间段进行分割,多次获取,也可以通过offset。

利用offset多次查询以跳过限制¶

from jqdata import financefrom sqlalchemy.sql import funcimport pandas as pdimport mathsum_count = finance.run_query(query(func.count('*')   ).filter(finance.CCTV_NEWS.day<'2012-01-01')).iloc[0,0]  #先查询总共有多少条数据print ('总共有{}条数据,需要获取{}次'.format(sum_count,int(math.ceil(sum_count/3000.0))))l = []for i in range(0,14909,3000): #以3000为步长循环offset的参数q = query(finance.CCTV_NEWS).filter(finance.CCTV_NEWS.day<'2012-01-01'   ).order_by(finance.CCTV_NEWS.day  #可以先按照一定规律排序 ).offset(i)   #自第i条数据之后进行获取df=finance.run_query(q)l.append(df)df = pd.concat(l).reset_index()  #数据拼接print(df.shape)df.tail()

总共有14909条数据,需要获取5次
(14909, 5)

.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }

	index	id	day	title	content
14904	2904	14715	2011-12-31	国内联播快讯	“十二五”期间工业领域重点行业淘汰落后产能目标任务日前下达，与“十一五”相比，新增了铜冶...
14905	2905	14702	2011-12-31	胡锦涛主席发表2012年新年贺词	2012年新年来临之际，国家主席胡锦涛通过中国国际广播电台、中央人民广播电台和中央电视台...
14906	2906	14706	2011-12-31	胡锦涛签署第五十二号、五十三号主席令	本台消息，国家主席胡锦涛31号在北京签署了第五十二号和第五十三号主席令。\n　　第五...
14907	2907	14704	2011-12-31	胡锦涛致电祝贺金正恩担任朝鲜人民军最高司令官	本台消息，12月31日，中华人民共和国中央军事委员会主席胡锦涛致电祝贺金正恩担任朝鲜...
14908	2908	14717	2011-12-31	金正恩成为朝鲜人民军最高司令官	据朝中社今天报道，朝鲜劳动党中央政治局会议30号在平壤召开会议宣布，根据已故最高领导人金...

获取前N季度的财务数据¶

相关说明: https://www.joinquant.com/help/api/help?name=factor#如何理解dependencies中的财务因子
一般获取的前N季度数据可以直接在因子分析中直接处理了,如果有特殊需求,可以使用下边的方法获得一个panel(不推荐,推荐直接在factor类中直接计算获取计算完成后的dataframe)

from jqfactor import Factor, calc_factorsimport pandas as pddef get_more_data(data_name,securities,start_date,end_date):class get_N_data(Factor):name = 'more_data'max_window = 1dependencies = data_namedef calc(self, data):all_data = pd.concat(data.values())all_data.index = data_nameall_data =  pd.Series(all_data.to_dict())return all_datadata1 = calc_factors(securities, [get_N_data()], start_date=start_date, end_date=end_date)all_data = pd.Panel(data1['more_data'].to_dict()) return all_data   #处理成panel结构data_name = ['net_profit','net_profit_1','net_profit_2','net_profit_3']securities = ['600741.XSHG','000001.XSHG']get_more_data(data_name,securities,start_date='2019-01-01',end_date='2019-02-01')

/opt/conda/envs/python2new/lib/python2.7/site-packages/ipykernel_launcher.py:25: DeprecationWarning: 
Panel is deprecated and will be removed in a future version.
The recommended way to represent these types of 3-dimensional data are with a MultiIndex on a DataFrame, via the Panel.to_frame() method
Alternatively, you can use the xarray package http://xarray.pydata.org/en/stable/.
Pandas provides a `.to_xarray()` method to help automate this conversion.

<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 4 (major_axis) x 23 (minor_axis)
Items axis: 000001.XSHG to 600741.XSHG
Major_axis axis: net_profit to net_profit_3
Minor_axis axis: 2019-01-02 00:00:00 to 2019-02-01 00:00:00

计算TTM数据¶

各种指标算法多式多样,我们提供的财务数据中最主要的还是单季度和报告期的财务指标,具体算法以及含义在文档中有详细说明 https://www.joinquant.com/help/api/help?name=Stock#获取单季度年度财务数据
关于两者的区别可以参看这篇帖子理解:
https://www.joinquant.com/view/community/detail/fcb3baa6f926259*caac3bce7c12b1c?type=2

TTM数据有些在聚宽因子库中可以直接获取(对于一些TTM的算法以及口径有多种方式,所以各平台可能存在不一致)
https://www.joinquant.com/help/api/help?name=factor_values
没有提供的可以借助因子分析进行计算
https://www.joinquant.com/help/api/help?name=factor#示例-计算TTM数据
https://www.joinquant.com/help/api/help?name=factor#在研究与回测中计算因子
下边以计算/获取营业收入TTM及ROE TTM为例:

# 使用因子分析进行计算from jqfactor import Factor, calc_factors# 计算营业收入TTMclass OR_TTM(Factor):# 设置因子名称name = 'operating_revenue_ttm'# 设置获取数据的时间窗口长度max_window = 1# 设置依赖的数据，即前四季度的营业收入dependencies = ['operating_revenue','operating_revenue_1','operating_revenue_2','operating_revenue_3']# 计算因子的函数， 需要返回一个 pandas.Series, index 是股票代码，value 是因子值def calc(self, data):# 计算 ttm ， 为前四季度相加ttm = data['operating_revenue'] + data['operating_revenue_1'] + data['operating_revenue_2'] + data['operating_revenue_3']# 将 ttm 转换成 seriesreturn ttm.mean()securities = ['000001.XSHE','600000.XSHG']# 计算因子值data1 = calc_factors(securities, [OR_TTM()], start_date='2019-01-01', end_date='2019-06-01', 
                                       use_real_price=False, skip_paused=False)data1['operating_revenue_ttm'].tail()

.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }

	000001.XSHE	600000.XSHG
2019-05-27	1.211660e+11	1.819970e+11
2019-05-28	1.211660e+11	1.819970e+11
2019-05-29	1.211660e+11	1.819970e+11
2019-05-30	1.211660e+11	1.819970e+11
2019-05-31	1.211660e+11	1.819970e+11

# 通过聚宽因子库直接获取from jqfactor import get_factor_valuesdata2 = get_factor_values(securities,factors='operating_revenue_ttm',start_date='2019-01-01', end_date='2019-06-01')data2['operating_revenue_ttm'].tail()

.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }

code	000001.XSHE	600000.XSHG
2019-05-27	1.211660e+11	1.819970e+11
2019-05-28	1.211660e+11	1.819970e+11
2019-05-29	1.211660e+11	1.819970e+11
2019-05-30	1.211660e+11	1.819970e+11
2019-05-31	1.211660e+11	1.819970e+11

# 使用因子分析进行计算roefrom jqfactor import Factor, calc_factors# 计算营业收入TTM# roe_ttm = np_parent_company_owners_ttm/equities_parent_company_owners(最近一期)  wind算法class ROE_TTM(Factor):# 设置因子名称name = 'roe_ttm'# 设置获取数据的时间窗口长度max_window = 1# 设置依赖的数据dependencies = ['np_parent_company_owners',   'equities_parent_company_owners','np_parent_company_owners_1', #'np_parent_company_owners_ttm''np_parent_company_owners_2', 'np_parent_company_owners_3',] # 计算因子的函数， 需要返回一个 pandas.Series, index 是股票代码，value 是因子值def calc(self, data):# 计算 ttm ttm_1 = (data['np_parent_company_owners'] + data['np_parent_company_owners_1'] + data['np_parent_company_owners_2'] + data['np_parent_company_owners_3'])#         ttm_1 = (data['np_parent_company_owners_ttm']  #ttm1 = 归属于母公司股东的净利润TTM,和上边代码结果一致,这里也可以直接从聚宽因子库获取数据ttm_2 = data['equities_parent_company_owners']return (ttm_1/ttm_2).mean()securities = ['000001.XSHE','600000.XSHG','000002.XSHE']# 计算因子值data1 = calc_factors(securities, [ROE_TTM()], start_date='2019-01-01', end_date='2019-06-01', 
                                       use_real_price=False, skip_paused=False)data1['roe_ttm'].tail()

.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }

	000001.XSHE	600000.XSHG	000002.XSHE
2019-05-27	0.102292	0.118846	0.215792
2019-05-28	0.102292	0.118846	0.215792
2019-05-29	0.102292	0.118846	0.215792
2019-05-30	0.102292	0.118846	0.215792
2019-05-31	0.102292	0.118846	0.215792

# 通过聚宽因子库直接获取roe_ttmfrom jqfactor import get_factor_valuesdata2 = get_factor_values(securities,factors='roe_ttm',start_date='2019-01-01', end_date='2019-06-01')data2['roe_ttm'].tail()

.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }

code	000001.XSHE	600000.XSHG	000002.XSHE
2019-05-27	0.102292	0.118846	0.215792
2019-05-28	0.102292	0.118846	0.215792
2019-05-29	0.102292	0.118846	0.215792
2019-05-30	0.102292	0.118846	0.215792
2019-05-31	0.102292	0.118846	0.215792

计算同比/环比指标¶

import pandas as pdfrom jqfactor import Factor, calc_factorsdef get_growth_rate(security_list,field,start_date=None,end_date=None,type_ = 'q_to_q'):"""    field: 计算的指标名,填写财务数据中单季度指标名 ,如adjusted_profit 扣非净利润    type_: 计算类型q_to_q代表环比,y_to_y    """type_dict = {'y_to_y':field+'_4','q_to_q':field+'_1'}class Get_Annual(Factor):name = field+type_max_window = 1dependencies = [field, type_dict[type_]] def calc(self, data):df = pd.Panel(data).iloc[:,0]res = (df.iloc[:,0]-df.iloc[:,1])/df.iloc[:,1]extra_df = df[(df.iloc[:,1]<0)]res[extra_df.index] = res[extra_df.index]*-1   #上期为负的数据乘以-1,如果要表示亏损增长率可以用1去减res[extra_df.index]return resdata1 = calc_factors(security_list, [Get_Annual()], start_date=start_date,end_date=end_date, skip_paused=False)return data1[field+type_]type_ = 'q_to_q'security_list = ["000001.XSHE","600741.XSHG"]field = 'adjusted_profit'   # 计算扣费净利润环比增长率df = get_growth_rate(security_list,field,start_date='2015-01-01',end_date='2019-06-25',type_ = 'y_to_y')df.tail()

.dataframe thead tr:only-child th { text-align: right; } .dataframe thead th { text-align: left; } .dataframe tbody tr th { vertical-align: top; }

	000001.XSHE	600741.XSHG
2019-06-19	0.132265	-0.151126
2019-06-20	0.132265	-0.151126
2019-06-21	0.132265	-0.151126
2019-06-24	0.132265	-0.151126
2019-06-25	0.132265	-0.151126

全部回复

0/140

本社区仅针对特定人员开放

查看需注册登录并通过风险意识测评

5秒后跳转登录页面...

达人推荐

关注
律孚

粉丝:734

帖子数:0

律孚

0帖子2关注734粉丝

关注拉黑私信
关注
时间都去哪了

粉丝:555

帖子数:0

时间都去哪了

0帖子5关注555粉丝

关注拉黑私信
关注
李强bruce

粉丝:676

帖子数:391

交易匠人：震荡行情多耐心，黄金早间有望延续...

交易匠人：黄金震荡回踩1708，有波段机会

李强bruce

391帖子0关注676粉丝

关注拉黑私信

量化课程

移动端课程

量化交易吧 / 量化平台 帖子：3369986 新帖：6

【有用功】Query的简单教程及TTM/同比/环比算法

只求稳定发表于：7 月 1 日 12：49回复(1)

基本的查询方式¶

get_fundamentals 获取某个/多个季度/年度的数据(指定参数statDate)¶

in_ 判断某个字段的值是否在列表之中(一般用于查询多个标的)¶

distinct 去重,用于查看数据库中某个字段都存在哪些值¶

~ 反向查询符号¶

运算和命名(label)¶

or_ 或查询¶

contains/like/ilike 数据库中的字符串模糊匹配¶

简化计算的方法(sqlalchemy.sql.func) 和 group_by¶

其他常用数据获取方式整理¶

获取旧的申万指数列表和新的申万指数列表(14年有过改动)¶

查询申万行情( 包含行业pe/pb)¶

查询中证行情( 包含指数市值/市盈率/股息率以及红利等指数)¶

查询股息率(近12个月)¶

获取期货合约的基本信息（合约乘数、商品报价的计数单位、最小变动单位）¶

多个dataframe的合并 (数据拼接) 获取多份财务报表¶

需要注意的: finance,jy,opt,macro有一次性获取数据不超过3000条的限制(主要是为了提速,jqdatasdk上限是4000条,官网后续也会增加到4000条,get_fundamentals的限制为10000条),

所以当获取的数据超过这个限制时,需要对股票池或者时间段进行分割,多次获取,也可以通过offset。

利用offset多次查询以跳过限制¶

获取前N季度的财务数据¶

计算TTM数据¶

计算同比/环比指标¶

全部回复

0/140

粉丝:734

帖子数:0

粉丝:555

帖子数:0

粉丝:676

帖子数:391

量化课程

热门标签

删除回复

确认要删除这篇文章么？

举报用户

信息提示

该文章已删除

设置置顶

完成设置【置顶】！

设置置顶

已取消设置【置顶】！

设置精华

完成设置【精华】！

设置精华

已取消设置【精华】！

审核信息

该文章已审核通过

审核信息

您已设置该文章审核不通过

举报成功

您已举报成功

用户登录

移动帖子

创建私信

屏蔽提示

确认要屏蔽该用户么？

屏蔽回复

您已对该用户实现屏蔽

信息回复

已发送成功

量化交易吧 / 量化平台帖子：3369986 新帖：6