
Shared Functions | External Data Retrieval

Posted by 交易资深人士 on May 9 at 20:27 | Replies (1)

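1. Overview

This post collects data-retrieval methods shared by community members. It is for learning and discussion only; no commercial interests are involved.
The series will be updated continuously; posts in it are titled 【共享函数】 (Shared Functions).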


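2. Functions included:

Scraping Sina hot stocks (by 股票疯赢)
Scraping limit-up reasons from Xuangubao (by 包希仁)
Scraping Hong Kong IPO data to compute IPO subscription returns (by 止一之路)
Scraping industry quote/valuation data from the official Shenwan (SW) website (by ssk)
Scraping government bond yield data (by tinysnowing)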

Scraping Sina hot stocks

Author: 股票疯赢

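import json

import pandas as pd
import requests


def get_hot_stock_from_sina():
    '''Fetch the hot-stock list from Sina.'''
    html = requests.get(
        'https://ssl-data.sina.com.cn/api/openapi.php/WeiboReferService.getListSymbol'
        '?code=CNHOUR6&callback=var%20AHM='
    ).content.decode()
    # The response wraps the JSON in parentheses; slice from the first '('
    # to the last ')' so that a ')' inside the payload does not truncate it.
    n = html[html.index('(') + 1:html.rindex(')')]
    # The standard json module replaces the unmaintained anyjson package.
    h = json.loads(n)
    data = pd.DataFrame(h['result']['data'])
    # normalize_code is not defined in this post; it is presumably the helper
    # provided by the JoinQuant research environment.
    data.SYMBOL = data.SYMBOL.apply(normalize_code)
    return data


get_hot_stock_from_sina().head()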



   NAME      REF     SYMBOL
0  东方通信  891768  600776.XSHG
1  银之杰    735779  300085.XSHE
2  东方财富  654869  300059.XSHE
3  网宿科技  592289  300017.XSHE
4  安控科技  498015  300370.XSHE
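
The function above extracts the JSON payload from a wrapped response by slicing between parentheses. The following is a minimal offline sketch of that one step, not the author's code: the helper name `strip_jsonp` and the sample response are hypothetical, and the real Sina response may look different.

```python
import json


def strip_jsonp(text):
    """Return the JSON payload of a parenthesis-wrapped response as a Python object."""
    start = text.index('(') + 1   # position right after the first '('
    end = text.rindex(')')        # position of the last ')'
    return json.loads(text[start:end])


# Hypothetical sample response, made up for illustration only.
sample = 'var AHM=({"result": {"data": [{"NAME": "Demo (A)", "SYMBOL": "sh600000"}]}});'
payload = strip_jsonp(sample)
print(payload['result']['data'][0]['NAME'])
```

Searching for the last closing parenthesis keeps the slice intact even when a string value inside the payload contains a `)`, as the sample name does.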

Scraping limit-up reasons from Xuangubao

Author: 包希仁

import datetime
import json
import urllib.request

import pandas as pd


def Xuangubao():
    url = 'https://flash-api.xuangubao.cn/api/pool/detail?pool_name=limit_up'  # limit-up pool
    # url = 'https://flash-api.xuangubao.cn/api/pool/detail?pool_name=limit_up_broken'  # broken limit-up pool
    header_dict = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko'}
    # Python 2 equivalent:
    # req = urllib2.Request(url=url, headers=header_dict)
    # df = pd.DataFrame(json.loads(urllib2.urlopen(req).read())['data'])
    req = urllib.request.Request(url, headers=header_dict)
    df = pd.DataFrame(json.loads(urllib.request.urlopen(req).read())['data'])
    df['stock_reason'] = df.surge_reason.apply(lambda x: x['stock_reason'])
    df['plate_name'] = df.surge_reason.apply(lambda x: x['related_plates'][0]['plate_name'])

    def get_plate_reason(x):
        # Not every record carries a plate reason; return None when it is missing.
        try:
            return x['related_plates'][0]['plate_reason']
        except (KeyError, IndexError, TypeError):
            return None

    df['plate_reason'] = df.surge_reason.apply(get_plate_reason)
    # Time of the first limit-up event, converted from a Unix timestamp (local time).
    df['limit_timeline'] = df.limit_timeline.apply(
        lambda x: datetime.datetime.fromtimestamp(x['items'][0]['timestamp']))
    # normalize_code is not defined in this post; it is presumably the helper
    # provided by the JoinQuant research environment.
    df.index = df.surge_reason.apply(lambda x: normalize_code(x['symbol']))
    df.index.name = None
    return df.drop('surge_reason', axis=1)


Xuangubao().head()



Output of Xuangubao().head(), shown with one column per stock:

                                        002450.XSHE          300538.XSHE          002207.XSHE          002552.XSHE          000727.XSHE
break_limit_down_times                  0                    0                    0                    0                    0
break_limit_up_times                    0                    0                    8                    4                    2
buy_lock_volume_ratio                   0.009174             0.187240             0.000931             0.001281             0.007109
change_percent                          0.050382             0.100110             0.050325             0.050649             0.099585
first_break_limit_down                  0                    0                    0                    0                    0
first_break_limit_up                    0                    0                    1551679479           1551681207           1551663813
first_limit_down                        0                    0                    0                    0                    0
first_limit_up                          1551662703           1551662703           1551679467           1551681189           1551663003
is_new_stock                            False                False                False                False                False
issue_price                             14.20                15.85                7.85                 20.00                6.16
last_break_limit_down                   0                    0                    0                    0                    0
last_break_limit_up                     0                    0                    1551681900           1551681444           1551666750
last_limit_down                         0                    0                    0                    0                    0
last_limit_up                           1551662703           1551662703           1551681981           1551682110           1551667500
limit_down_days                         0                    0                    0                    0                    0
limit_timeline                          2019-03-04 09:25:03  2019-03-04 09:25:03  2019-03-04 14:04:27  2019-03-04 14:33:09  2019-03-04 09:30:03
limit_up_days                           5                    3                    1                    1                    1
listed_date                             1279209600           1472140800           1201449600           1298563200           864057600
m_days_n_boards_boards                  9                    3                    3                    0                    4
m_days_n_boards_days                    15                   3                    6                    0                    8
mtm                                     0.0                  0.0                  0.0                  0.0                  0.0
nearly_new_acc_*                        0.0                  0.0                  0.0                  0.0                  0.0
nearly_new_break_days                   0                    0                    0                    0                    0
new_stock_acc_*                         -0.515493            0.899685             -0.175796            -0.595500            -0.569805
new_stock_break_limit_up                0                    0                    0                    0                    0
new_stock_limit_up_days                 0                    0                    0                    0                    0
new_stock_limit_up_price_before_broken  0.0                  0.0                  0.0                  0.0                  0.0
non_restricted_capital                  2.224831e+10         7.225360e+08         1.536120e+09         1.619205e+09         7.766236e+09
price                                   6.88                 30.11                6.47                 8.09                 2.65
sell_lock_volume_ratio                  0                    0                    0                    0                    0
stock_chi_name                          ST康得新             同益股份             ST准油               *ST宝鼎              华东科技
symbol                                  002450.SZ            300538.SZ            002207.SZ            002552.SZ            000727.SZ
total_capital                           2.436139e+10         2.538039e+09         1.547478e+09         2.477420e+09         1.200335e+10
turnover_ratio                          0.000768             0.046596             0.044471             0.030213             0.125476
volume_bias_ratio                       0.015875             0.630724             1.641923             1.326435             1.183715
yesterday_break_limit_up_times          0                    0                    3                    0                    0
yesterday_first_limit_up                1551403503           1551403503           1551405519           0                    0
yesterday_last_limit_up                 1551403503           1551403503           1551406431           0                    0
yesterday_limit_down_days               0                    0                    0                    0                    0
yesterday_limit_up_days                 4                    2                    0                    0                    0

Text columns (stock_reason / plate_name / plate_reason):

002450.XSHE  stock_reason: 2018年度实现净利润4.02亿元
             plate_name: ST股
             plate_reason: 年报披露高峰期,扭亏个股有望摘帽
300538.XSHE  stock_reason: 18年年报10转8
             plate_name: 高送转
             plate_reason: None
002207.XSHE  stock_reason: 主营石油技术服务、建筑*、运输服务和化工产品销售,属于上游石油天然气采掘服务业
             plate_name: ST股
             plate_reason: 年报披露高峰期,扭亏个股有望摘帽
002552.XSHE  stock_reason: 三季报扭亏,主营大型铸锻件
             plate_name: ST股
             plate_reason: 年报披露高峰期,扭亏个股有望摘帽
000727.XSHE  stock_reason: 广东聚华印刷显示技术有限公司为参股公司,目前聚华公司建成了“国家印刷及柔性显示创新中心”,开...
             plate_name: 柔性屏
             plate_reason: 华为发布MATE X折叠屏手机
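
The Xuangubao function pulls fields out of a nested `surge_reason` record, and the output shows that `plate_reason` can be missing. Below is a minimal sketch of a guarded lookup that could be passed to `Series.apply` for any plate field. The helper name `first_plate_field` and the sample records are hypothetical; only the key names `related_plates`, `plate_name` and `plate_reason` come from the code above.

```python
def first_plate_field(surge_reason, field):
    """Return `field` from the first related plate, or None if it is unavailable."""
    try:
        return surge_reason['related_plates'][0][field]
    except (KeyError, IndexError, TypeError):
        # KeyError: key missing; IndexError: empty plate list;
        # TypeError: related_plates is None instead of a list.
        return None


# Hypothetical records, made up for illustration only.
records = [
    {'related_plates': [{'plate_name': 'P1', 'plate_reason': 'why'}]},
    {'related_plates': [{'plate_name': 'P2'}]},   # no plate_reason
    {'related_plates': []},                       # empty list
    {'related_plates': None},                     # missing list
]

names = [first_plate_field(r, 'plate_name') for r in records]
reasons = [first_plate_field(r, 'plate_reason') for r in records]
print(names)
print(reasons)
```

Using the same guard for `plate_name` would keep one record with an empty plate list from aborting the whole DataFrame construction.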

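Scraping Hong Kong IPO data to compute IPO subscription returns

Author: 止一之路

The content is long; please follow the link to the original post to view it.

Scraping industry quote/valuation data from the official Shenwan (SW) website

Author: ssk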

# Fetch Shenwan (SW) industry index data from the official Shenwan website
import json
from datetime import date

import pandas as pd
import requests


# code: industry code, a str or a list of str
#       https://www.joinquant.com/help/api/help?name=plateData#申万行业
# frequency: day/week/month
# start_date: None means the earliest available date
# end_date: None means today
# fields: None means all fields
def get_sw_data(code=None, start_date=None, end_date=None, frequency='day', fields=None):
    # Request headers
    header = {
        'HOST': 'www.swsindex.com',
        'Referer': 'http://www.swsindex.com/idx0200.aspx?columnid=8838&type=Day',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4482.400 QQBrowser/9.7.13001.400',
    }
    # Column names of the result table
    sw_columns_list = ['SwIndexCode', 'SwIndexName', 'BargainDate', 'OpenIndex', 'CloseIndex',
                       'MaxIndex', 'MinIndex', 'BargainAmount', 'BargainSum', 'Markup',
                       'TurnoverRate', 'PE', 'PB', 'MeanPrice', 'BargainSumRate',
                       'NegotiablesShareSum', 'NegotiablesShareSum2', 'DP']
    # Request parameters
    param = {
        'tablename': 'V_Report',
        'key': 'id',
        # Page number; each page returns 20 rows
        'p': '1',
        # Query: codes, date range and data type (placeholder, overwritten below)
        'where': "swindexcode in ('801020') and   BargainDate>='2018-04-02' and  BargainDate<='2018-04-24' and type='Day'",
        # Sort order (swindexcode asc: by code ascending;
        # BargainDate_1: by date descending; _2: by date ascending)
        'orderby': 'swindexcode asc,BargainDate_2',
        # Fields to return (placeholder, overwritten below)
        'fieldlist': ','.join(sw_columns_list),
        'pagecount': '1',
        'timed': '1524497094532',
    }
    # Supported data types (daily, weekly, monthly)
    frequency_list = ['day', 'week', 'month']

    # Build the query: codes
    if code is None:
        # No code given: fall back to the default code
        code = '801010'
    if isinstance(code, list):
        code_str = str(code).replace('[', '').replace(']', '')
    else:
        code_str = "'" + code + "'"
    where = 'swindexcode in (' + code_str

    # Build the query: dates
    today_str = date.today().strftime('%Y-%m-%d')
    if (start_date is None) or (start_date < '1999-12-30') or (start_date > today_str):
        start_date = '1999-12-30'
    where += ") and BargainDate>='" + start_date
    if (end_date is None) or (end_date > today_str) or (end_date < '1999-12-30'):
        end_date = today_str
    where += "' and BargainDate<='" + end_date

    # Build the query: data type
    if frequency not in frequency_list:
        frequency = 'day'
    where += "' and type='" + frequency + "'"
    param['where'] = where

    # Select the fields
    columns = sw_columns_list
    if fields is not None and set(fields).issubset(set(sw_columns_list)):
        # Always include the key fields, without duplicating them
        key_fields = ['SwIndexCode', 'SwIndexName', 'BargainDate']
        columns = key_fields + [f for f in fields if f not in key_fields]
    param['fieldlist'] = ','.join(columns)

    frames = []
    # URL
    url = 'http://www.swsindex.com/handler.aspx'
    # Page counter
    page = 1
    while True:
        # Fetch one page
        ret = requests.get(url, data=param, headers=header)
        if not ret.ok:
            break
        # Normalize quotes and the date format
        data = ret.text.replace("'", '"').replace(' 0:00:00', '').replace('/', '-')
        # Parse the data
        data = json.loads(data).get('root')
        if len(data) == 0:
            break
        # Collect the page
        frames.append(pd.DataFrame(data, columns=columns))
        # Advance the page counter
        page += 1
        param['p'] = str(page)

    # DataFrame.append was removed in pandas 2.0, so concatenate once at the end
    df = pd.concat(frames) if frames else pd.DataFrame()
    if len(df) != 0:
        df.BargainDate = pd.to_datetime(df.BargainDate, format='%Y-%m-%d')
    # Return the data
    return df


df = get_sw_data('850111', start_date='2019-02-23')
df



    SwIndexCode  SwIndexName  BargainDate  OpenIndex  CloseIndex  MaxIndex  MinIndex  BargainAmount  BargainSum  Markup  TurnoverRate  PE     PB    MeanPrice  BargainSumRate  NegotiablesShareSum  NegotiablesShareSum2  DP
0   850111       种子生产     2019-02-25   2493.65    2603.85     2612.97   2469.67   18544          109029      4.57    3.6043        45.49  2.68  6.70       0.10            3496540.23           437067.53             0.57
1   850111       种子生产     2019-02-26   2601.41    2577.02     2643.98   2534.83   20089          115323      -1.03   3.9047        45.02  2.65  6.65       0.11            3470405.45           433800.68             0.58
2   850111       种子生产     2019-02-27   2571.52    2547.74     2603.46   2530.19   13651          78331       -1.14   2.6533        44.51  2.62  6.58       0.09            3430769.77           428846.22             0.59
3   850111       种子生产     2019-02-28   2550.00    2559.18     2584.13   2523.73   8255           50326       0.45    1.6044        44.71  2.64  6.62       0.08            3449990.37           431248.80             0.58
4   850111       种子生产     2019-03-01   2567.25    2570.26     2590.56   2519.63   9037           53291       0.43    1.7564        44.91  2.65  6.64       0.08            3462555.34           432819.42             0.58
5   850111       种子生产     2019-03-04   2581.01    2590.01     2636.93   2559.27   14772          91984       0.77    2.8711        45.25  2.67  6.70       0.09            3489184.93           436148.12             0.58
6   850111       种子生产     2019-03-05   2588.81    2651.59     2662.00   2560.38   16510          94081       2.38    3.2090        46.33  2.73  6.89       0.11            3577580.69           447197.59             0.56
7   850111       种子生产     2019-03-06   2669.46    2688.30     2715.86   2620.91   19228          115049      1.38    3.7373        46.97  2.77  6.98       0.10            3627549.76           453443.72             0.56
8   850111       种子生产     2019-03-07   2691.75    2723.26     2791.49   2645.81   18851          115497      1.30    3.6640        47.58  2.81  7.12       0.10            3680740.95           460092.62             0.55
9   850111       种子生产     2019-03-08   2672.42    2600.62     2751.95   2574.58   18283          105621      -4.50   3.5536        45.44  2.68  6.73       0.09            3506135.99           438267.00             0.58
10  850111       种子生产     2019-03-11   2602.07    2717.11     2721.98   2593.20   15197          91143       4.48    2.9538        47.47  2.80  7.14       0.10            3682245.16           460280.65             0.55
11  850111       种子生产     2019-03-12   2738.21    2723.11     2783.13   2673.28   17973          115338      0.22    3.4933        47.58  2.81  7.15       0.10            3686361.67           460795.21             0.55
12  850111       种子生产     2019-03-13   2755.56    2686.94     2826.68   2650.85   20313          126599      -1.33   3.9482        46.95  2.77  7.05       0.12            3631513.24           453939.15             0.56
13  850111       种子生产     2019-03-14   2658.12    2567.43     2679.73   2527.97   13592          78529       -4.45   2.6419        44.68  2.65  6.72       0.10            3469352.20           433669.02             0.59
14  850111       种子生产     2019-03-15   2578.16    2594.87     2640.55   2554.47   9537           61640       1.07    1.8536        45.16  2.67  6.83       0.08            3508430.54           438553.82             0.58
15  850111       种子生产     2019-03-18   2617.85    2751.26     2759.71   2587.89   12125          98633       6.03    2.3567        47.88  2.83  7.10       0.12            3701939.95           462742.49             0.55
16  850111       种子生产     2019-03-19   2776.05    2799.75     2830.07   2727.96   11911          108091      1.76    2.3151        48.72  2.88  7.17       0.14            3752415.20           469051.90             0.54
17  850111       种子生产     2019-03-20   2792.86    2796.13     2858.79   2746.40   12165          90645       -0.13   2.3645        48.66  2.88  7.17       0.12            3751792.49           468974.06             0.54
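
Most of get_sw_data is spent assembling the `where` filter string. As a rough sketch, that step could be isolated in a small pure function that can be checked without any network access. The name `build_where`, the constant `EARLIEST` and the `today` parameter (added here only to make the result reproducible) are hypothetical; the filter syntax and the 1999-12-30 lower bound are taken from the code above.

```python
from datetime import date

EARLIEST = '1999-12-30'  # lower date bound used by get_sw_data


def build_where(code, start_date=None, end_date=None, frequency='day', today=None):
    """Build the `where` filter string for one code or a list of codes."""
    today_str = (today or date.today()).strftime('%Y-%m-%d')
    codes = code if isinstance(code, (list, tuple)) else [code]
    code_str = ','.join("'{}'".format(c) for c in codes)
    # ISO-formatted date strings compare correctly as plain strings.
    if start_date is None or not (EARLIEST <= start_date <= today_str):
        start_date = EARLIEST
    if end_date is None or not (EARLIEST <= end_date <= today_str):
        end_date = today_str
    if frequency not in ('day', 'week', 'month'):
        frequency = 'day'
    return ("swindexcode in ({}) and BargainDate>='{}' and BargainDate<='{}' and type='{}'"
            .format(code_str, start_date, end_date, frequency))


print(build_where('850111', start_date='2019-02-23', today=date(2019, 3, 20)))
# prints: swindexcode in ('850111') and BargainDate>='2019-02-23' and BargainDate<='2019-03-20' and type='day'
```

Keeping the string assembly separate from the request loop makes out-of-range dates and unknown frequencies easy to test in isolation.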

Scraping government bond yield data

Author: tinysnowing

import json
import time

import pandas as pd
import requests


def get_bnd_yield(year=10):
    # investing.com instrument ids for the 10-, 5- and 1-year yields
    ids = {10: '29227', 5: '29234', 1: '29231'}
    url = ('https://cn.investing.com/common/modules/js_instrument_chart/api/data.php?'
           'pair_id={0}&pair_id_for_news={0}'
           '&chart_type=area&pair_interval=month&candle_count=120'
           '&events=yes&volume_series=yes&period=5-years').format(ids[year])
    headers = {
        'X-Requested-With': 'XMLHttpRequest',
        'Host': 'cn.investing.com',
        'Referer': 'https://cn.investing.com/rates-bonds/china-{}-year-bond-yield'.format(year),
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)',
    }
    res = requests.get(url, headers=headers)
    res = json.loads(res.content.decode('utf-8').replace("'", '"'))
    data = pd.DataFrame(res['candles'])
    # Keep the first two columns: timestamp and yield
    data = data.iloc[:, :2]
    data.columns = ['date', 'y' + str(year)]
    # The first 10 digits of the timestamp are the seconds; format them in local time
    data['date'] = data['date'].map(
        lambda x: time.strftime('%Y-%m-%d', time.localtime(int(str(x)[:10]))))
    data.set_index('date', inplace=True)
    return data


def get_bnd_yields(years=(1, 5, 10)):
    bag = pd.DataFrame()
    for yr in years:
        bag = pd.concat([bag, get_bnd_yield(year=yr)], axis=1)
    # print(bag.head())
    return bag


df = get_bnd_yields()
df



date        y1     y5     y10
2014-04-01  3.650  4.160  4.330
2014-05-01  3.360  4.010  4.160
2014-06-01  3.370  3.860  4.060
2014-07-01  3.763  4.031  4.298
2014-08-01  3.799  3.998  4.248
2014-09-01  3.767  3.931  4.028
2014-10-01  3.406  3.565  3.786
2014-11-01  3.070  3.413  3.546
2014-12-01  3.263  3.538  3.648
2015-01-01  3.151  3.409  3.514
2015-02-01  3.077  3.257  3.379
2015-03-01  3.190  3.479  3.623
2015-04-01  2.869  3.288  3.422
2015-05-01  1.960  3.253  3.591
2015-06-01  1.767  3.247  3.629
2015-07-01  2.246  3.178  3.474
2015-08-01  2.307  3.165  3.394
2015-09-01  2.360  3.087  3.276
2015-10-01  2.379  2.903  3.087
2015-11-01  2.599  2.926  3.088
2015-12-01  2.329  2.713  2.862
2016-01-01  2.393  2.789  2.909
2016-02-01  2.279  2.655  2.909
2016-03-01  2.163  2.531  2.886
2016-04-01  2.218  2.769  2.946
2016-05-01  2.338  2.767  2.995
2016-06-01  2.390  2.700  2.875
2016-07-01  2.240  2.606  2.805
2016-08-01  2.150  2.594  2.805
2016-09-01  2.185  2.565  2.769
2016-10-01  2.190  2.480  2.744
2016-11-01  2.300  2.765  2.943
2016-12-01  2.751  2.883  3.066
2017-01-01  2.770  3.037  3.363
2017-02-01  2.783  3.000  3.358
2017-03-01  2.885  3.085  3.310
2017-04-01  3.160  3.347  3.477
2017-05-01  3.475  3.653  3.670
2017-06-01  3.453  3.502  3.578
2017-07-01  3.428  3.574  3.629
2017-08-01  3.428  3.635  3.675
2017-09-01  3.460  3.630  3.638
2017-10-01  3.583  3.963  3.916
2017-11-01  3.700  3.876  3.917
2017-12-01  3.803  3.860  3.915
2018-01-01  3.583  3.845  3.944
2018-02-01  3.313  3.759  3.857
2018-03-01  3.350  3.690  3.778
2018-04-01  3.007  3.175  3.653
2018-05-01  3.185  3.452  3.646
2018-06-01  3.210  3.410  3.543
2018-07-01  2.893  3.227  3.533
2018-08-01  2.836  3.386  3.600
2018-09-01  2.990  3.470  3.655
2018-10-01  2.837  3.364  3.533
2018-11-01  2.645  3.168  3.398
2018-12-01  2.575  3.014  3.270
2019-01-01  2.415  2.923  3.130
2019-02-01  2.409  3.033  3.208
2019-03-01  2.445  3.040  3.148
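
The bond-yield code turns each candle's timestamp into a date string and then lines up the 1-, 5- and 10-year series side by side. Here is a small offline sketch of those two steps under stated assumptions: the candle values are made up, the timestamps are assumed to be in milliseconds (the original keeps the first 10 digits, which suggests this), the helper name `candles_to_series` is hypothetical, and the conversion uses UTC rather than local time so the result does not depend on the machine's time zone.

```python
import pandas as pd


def candles_to_series(candles, name):
    """Turn [[timestamp_ms, value, ...], ...] into a Series indexed by date string."""
    frame = pd.DataFrame(candles).iloc[:, :2]
    frame.columns = ['date', name]
    # Interpret the millisecond timestamps as UTC and keep only the date part.
    frame['date'] = pd.to_datetime(frame['date'], unit='ms').dt.strftime('%Y-%m-%d')
    return frame.set_index('date')[name]


# Hypothetical candle data, made up for illustration only.
y1 = candles_to_series([[1396310400000, 3.65], [1398902400000, 3.36]], 'y1')
y10 = candles_to_series([[1398902400000, 4.16]], 'y10')

# Concatenating along axis=1 aligns the series on the date index;
# dates missing from one series show up as NaN.
table = pd.concat([y1, y10], axis=1)
print(table)
```

Collecting the series in a list and calling `pd.concat` once avoids growing a DataFrame inside the loop.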