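Common DataFrame statistical functions¶
Summary¶
Common statistical functions
describe: summary statistics for a Series or for each column of a DataFrame
count: number of non-NA values
min, max: minimum and maximum values
idxmin, idxmax: index labels at which the minimum and maximum values occur
quantile: sample quantile (q between 0 and 1)
sum: sum of the values
mean: mean of the values
median: median of the values (50% quantile)
mad: mean absolute deviation from the mean (deprecated in pandas 1.5 and removed in 2.0)
var: sample variance
std: sample standard deviation
skew: sample skewness (third moment)
kurt: sample kurtosis (fourth moment)
cumsum: cumulative sum
cummin, cummax: cumulative minimum and cumulative maximum
cumprod: cumulative product
diff: first-order difference
pct_change: fractional change between consecutive rows (percent change)
Viewing detailed information about a function
More functions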
Getting a sample DataFrame¶
df = get_price('000001.XSHE', start_date='2016-02-01', end_date='2016-02-04', frequency='daily', fields=['open', 'high', 'low', 'close'])
df
| | open | high | low | close |
---|---|---|---|---|
2016-02-01 | 8.08 | 8.10 | 7.88 | 7.93 |
2016-02-02 | 7.93 | 8.12 | 7.92 | 8.05 |
2016-02-03 | 7.97 | 8.00 | 7.91 | 7.97 |
2016-02-04 | 8.00 | 8.09 | 8.00 | 8.05 |
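`get_price` appears to be a data-platform function that is not available in a plain Python environment. As a minimal sketch for following along elsewhere, the same four rows can be typed in by hand with pandas. The values are copied from the table above; building the frame this way is an assumption for illustration, not how the original data was loaded.

```python
import pandas as pd

# Rebuild the sample table by hand (values copied from the output above).
df = pd.DataFrame(
    {
        "open": [8.08, 7.93, 7.97, 8.00],
        "high": [8.10, 8.12, 8.00, 8.09],
        "low": [7.88, 7.92, 7.91, 8.00],
        "close": [7.93, 8.05, 7.97, 8.05],
    },
    index=pd.to_datetime(
        ["2016-02-01", "2016-02-02", "2016-02-03", "2016-02-04"]
    ),
)
print(df)
```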
df.describe()¶
# describe: summary statistics for a Series or for each column of a DataFrame
df.describe()
| | open | high | low | close |
---|---|---|---|---|
count | 4.000000 | 4.000000 | 4.000000 | 4.00 |
mean | 7.995000 | 8.077500 | 7.927500 | 8.00 |
std | 0.063509 | 0.053151 | 0.051235 | 0.06 |
min | 7.930000 | 8.000000 | 7.880000 | 7.93 |
25% | 7.960000 | 8.067500 | 7.902500 | 7.96 |
50% | 7.985000 | 8.095000 | 7.915000 | 8.01 |
75% | 8.020000 | 8.105000 | 7.940000 | 8.05 |
max | 8.080000 | 8.120000 | 8.000000 | 8.05 |
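The percentile rows that `describe` reports can be changed with its `percentiles` argument. The sketch below rebuilds the sample data by hand (an assumption, so that it runs without the data platform) and asks for the 10% and 90% percentiles. pandas always includes the 50% row as well.

```python
import pandas as pd

# Hand-built copy of the sample data shown earlier.
df = pd.DataFrame(
    {
        "open": [8.08, 7.93, 7.97, 8.00],
        "high": [8.10, 8.12, 8.00, 8.09],
        "low": [7.88, 7.92, 7.91, 8.00],
        "close": [7.93, 8.05, 7.97, 8.05],
    },
    index=pd.to_datetime(
        ["2016-02-01", "2016-02-02", "2016-02-03", "2016-02-04"]
    ),
)

# Request custom percentiles instead of the default 25% / 50% / 75%.
summary = df.describe(percentiles=[0.1, 0.9])
print(summary)

# describe also works on a single column (a Series).
close_summary = df["close"].describe()
print(close_summary)
```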
df.count()¶
# count: number of non-NA values
df.count()
open     4
high     4
low      4
close    4
dtype: int64
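The sample has no missing values, so every count is 4. To see what "non-NA" means in practice, the sketch below blanks out one cell in a hand-built copy of the data (the copy and the name `df_missing` are illustrative assumptions) and counts again.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the sample data shown earlier.
df = pd.DataFrame(
    {
        "open": [8.08, 7.93, 7.97, 8.00],
        "high": [8.10, 8.12, 8.00, 8.09],
        "low": [7.88, 7.92, 7.91, 8.00],
        "close": [7.93, 8.05, 7.97, 8.05],
    },
    index=pd.to_datetime(
        ["2016-02-01", "2016-02-02", "2016-02-03", "2016-02-04"]
    ),
)

# Replace the first "open" value with NaN in a copy of the frame.
df_missing = df.copy()
df_missing.loc[df_missing.index[0], "open"] = np.nan

# count skips the NaN, so "open" drops to 3 while the others stay at 4.
counts = df_missing.count()
print(counts)
```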
df.min() df.max()¶
# min, max: minimum and maximum values
df.min()
open     7.93
high     8.00
low      7.88
close    7.93
dtype: float64
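Only `df.min()` is shown above. As a short sketch on a hand-built copy of the sample data (an assumption, so the snippet runs on its own), `df.max()` works the same way, and passing `axis=1` reduces across the columns of each row instead of down each column.

```python
import pandas as pd

# Hand-built copy of the sample data shown earlier.
df = pd.DataFrame(
    {
        "open": [8.08, 7.93, 7.97, 8.00],
        "high": [8.10, 8.12, 8.00, 8.09],
        "low": [7.88, 7.92, 7.91, 8.00],
        "close": [7.93, 8.05, 7.97, 8.05],
    },
    index=pd.to_datetime(
        ["2016-02-01", "2016-02-02", "2016-02-03", "2016-02-04"]
    ),
)

# Column-wise maximum (the default, axis=0).
col_max = df.max()
print(col_max)

# Row-wise minimum: the smallest of open/high/low/close on each date.
row_min = df.min(axis=1)
print(row_min)
```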
df.idxmin() df.idxmax()¶
# idxmin, idxmax: index labels at which the minimum and maximum values occur
df.idxmin()
open    2016-02-02
high    2016-02-03
low     2016-02-01
close   2016-02-01
dtype: datetime64[ns]
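For completeness, here is a sketch of `idxmax` on a hand-built copy of the sample data (an assumption, so that it runs standalone). Note that `close` reaches its maximum of 8.05 on two dates; `idxmax` reports the first one.

```python
import pandas as pd

# Hand-built copy of the sample data shown earlier.
df = pd.DataFrame(
    {
        "open": [8.08, 7.93, 7.97, 8.00],
        "high": [8.10, 8.12, 8.00, 8.09],
        "low": [7.88, 7.92, 7.91, 8.00],
        "close": [7.93, 8.05, 7.97, 8.05],
    },
    index=pd.to_datetime(
        ["2016-02-01", "2016-02-02", "2016-02-03", "2016-02-04"]
    ),
)

# Date on which each column reaches its maximum.
# Ties are resolved in favour of the first occurrence.
peak_dates = df.idxmax()
print(peak_dates)
```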
df.quantile()¶
# quantile: sample quantile (q between 0 and 1; the default is q=0.5)
df.quantile()
open     7.985
high     8.095
low      7.915
close    8.010
dtype: float64
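With no argument, `quantile` uses `q=0.5`, which is why the result above matches the median. As a sketch on a hand-built copy of the sample data (an assumption for a standalone run), passing a list of quantiles returns one row per requested `q`. The 25% and 75% values agree with the `describe` output.

```python
import pandas as pd

# Hand-built copy of the sample data shown earlier.
df = pd.DataFrame(
    {
        "open": [8.08, 7.93, 7.97, 8.00],
        "high": [8.10, 8.12, 8.00, 8.09],
        "low": [7.88, 7.92, 7.91, 8.00],
        "close": [7.93, 8.05, 7.97, 8.05],
    },
    index=pd.to_datetime(
        ["2016-02-01", "2016-02-02", "2016-02-03", "2016-02-04"]
    ),
)

# A list of quantiles yields a DataFrame indexed by q.
quartiles = df.quantile([0.25, 0.75])
print(quartiles)
```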
df.sum()¶
# sum: sum of the values
df.sum()
open     31.98
high     32.31
low      31.71
close    32.00
dtype: float64
df.mean()¶
# mean: mean of the values
df.mean()
open     7.9950
high     8.0775
low      7.9275
close    8.0000
dtype: float64
df.median()¶
# median: median of the values (50% quantile)
df.median()
open     7.985
high     8.095
low      7.915
close    8.010
dtype: float64
df.mad()¶
# mad: mean absolute deviation from the mean
df.mad()
open     0.04500
high     0.03875
low      0.03625
close    0.05000
dtype: float64
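`DataFrame.mad` was deprecated in pandas 1.5 and removed in pandas 2.0, so the call above fails on current versions. A minimal equivalent, sketched on a hand-built copy of the sample data (an assumption for a standalone run), computes the same quantity directly from its definition.

```python
import pandas as pd

# Hand-built copy of the sample data shown earlier.
df = pd.DataFrame(
    {
        "open": [8.08, 7.93, 7.97, 8.00],
        "high": [8.10, 8.12, 8.00, 8.09],
        "low": [7.88, 7.92, 7.91, 8.00],
        "close": [7.93, 8.05, 7.97, 8.05],
    },
    index=pd.to_datetime(
        ["2016-02-01", "2016-02-02", "2016-02-03", "2016-02-04"]
    ),
)

# Mean absolute deviation: average distance of each value
# from its column mean.
mad = (df - df.mean()).abs().mean()
print(mad)
```

The result reproduces the figures shown above (0.045 for `open`, 0.05 for `close`).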
df.var()¶
# var: sample variance
df.var()
open     0.004033
high     0.002825
low      0.002625
close    0.003600
dtype: float64
df.std()¶
# std: sample standard deviation
df.std()
open     0.063509
high     0.053151
low      0.051235
close    0.060000
dtype: float64
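By default pandas computes the sample variance and standard deviation, dividing by n - 1 (`ddof=1`). The sketch below, on a hand-built copy of the sample data (an assumption for a standalone run), shows the population versions with `ddof=0` and checks that the standard deviation is the square root of the variance.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the sample data shown earlier.
df = pd.DataFrame(
    {
        "open": [8.08, 7.93, 7.97, 8.00],
        "high": [8.10, 8.12, 8.00, 8.09],
        "low": [7.88, 7.92, 7.91, 8.00],
        "close": [7.93, 8.05, 7.97, 8.05],
    },
    index=pd.to_datetime(
        ["2016-02-01", "2016-02-02", "2016-02-03", "2016-02-04"]
    ),
)

sample_var = df.var()            # divides by n - 1 (default ddof=1)
population_var = df.var(ddof=0)  # divides by n
sample_std = df.std()

print(sample_var)
print(population_var)

# The standard deviation is the square root of the variance.
print(np.allclose(sample_std, np.sqrt(sample_var)))
```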
df.skew()¶
# skew: sample skewness (third moment)
df.skew()
open     0.843252
high    -1.666658
low      1.329083
close   -0.370370
dtype: float64
df.kurt()¶
# kurt: sample kurtosis (fourth moment)
df.kurt()
open     0.933953
high     3.047698
low      2.374596
close   -3.901230
dtype: float64
df.cumsum()¶
# cumsum: cumulative sum
df.cumsum()
| | open | high | low | close |
---|---|---|---|---|
2016-02-01 | 8.08 | 8.10 | 7.88 | 7.93 |
2016-02-02 | 16.01 | 16.22 | 15.80 | 15.98 |
2016-02-03 | 23.98 | 24.22 | 23.71 | 23.95 |
2016-02-04 | 31.98 | 32.31 | 31.71 | 32.00 |
df.cummin()¶
# cummin, cummax: cumulative minimum and cumulative maximum
df.cummin()
| | open | high | low | close |
---|---|---|---|---|
2016-02-01 | 8.08 | 8.1 | 7.88 | 7.93 |
2016-02-02 | 7.93 | 8.1 | 7.88 | 7.93 |
2016-02-03 | 7.93 | 8.0 | 7.88 | 7.93 |
2016-02-04 | 7.93 | 8.0 | 7.88 | 7.93 |
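The counterpart `cummax` is not shown above. As a sketch on a hand-built copy of the sample data (an assumption for a standalone run), it tracks the running maximum of each column from the first row onward.

```python
import pandas as pd

# Hand-built copy of the sample data shown earlier.
df = pd.DataFrame(
    {
        "open": [8.08, 7.93, 7.97, 8.00],
        "high": [8.10, 8.12, 8.00, 8.09],
        "low": [7.88, 7.92, 7.91, 8.00],
        "close": [7.93, 8.05, 7.97, 8.05],
    },
    index=pd.to_datetime(
        ["2016-02-01", "2016-02-02", "2016-02-03", "2016-02-04"]
    ),
)

# Running maximum: each row holds the largest value seen so far.
running_max = df.cummax()
print(running_max)
```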
df.cumprod()¶
# cumprod: cumulative product
df.cumprod()
| | open | high | low | close |
---|---|---|---|---|
2016-02-01 | 8.080000 | 8.10000 | 7.880000 | 7.930000 |
2016-02-02 | 64.074400 | 65.77200 | 62.409600 | 63.836500 |
2016-02-03 | 510.672968 | 526.17600 | 493.659936 | 508.776905 |
2016-02-04 | 4085.383744 | 4256.76384 | 3949.279488 | 4095.654085 |
df.diff()¶
# diff: first-order difference
df.diff()
| | open | high | low | close |
---|---|---|---|---|
2016-02-01 | NaN | NaN | NaN | NaN |
2016-02-02 | -0.15 | 0.02 | 0.04 | 0.12 |
2016-02-03 | 0.04 | -0.12 | -0.01 | -0.08 |
2016-02-04 | 0.03 | 0.09 | 0.09 | 0.08 |
df.pct_change()¶
# pct_change: fractional change from the previous row
df.pct_change()
| | open | high | low | close |
---|---|---|---|---|
2016-02-01 | NaN | NaN | NaN | NaN |
2016-02-02 | -0.018564 | 0.002469 | 0.005076 | 0.015132 |
2016-02-03 | 0.005044 | -0.014778 | -0.001263 | -0.009938 |
2016-02-04 | 0.003764 | 0.011250 | 0.011378 | 0.010038 |
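The values above are fractions, not percentages: -0.018564 means a drop of about 1.86%. As a sketch on a hand-built copy of the sample data (an assumption for a standalone run), `pct_change` can be reproduced by dividing the first difference by the previous row's value, which ties it back to `diff`.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the sample data shown earlier.
df = pd.DataFrame(
    {
        "open": [8.08, 7.93, 7.97, 8.00],
        "high": [8.10, 8.12, 8.00, 8.09],
        "low": [7.88, 7.92, 7.91, 8.00],
        "close": [7.93, 8.05, 7.97, 8.05],
    },
    index=pd.to_datetime(
        ["2016-02-01", "2016-02-02", "2016-02-03", "2016-02-04"]
    ),
)

builtin = df.pct_change()

# Manual version: (current - previous) / previous.
manual = df.diff() / df.shift(1)

# Both have NaN in the first row, so compare with equal_nan=True.
same = np.allclose(builtin.to_numpy(), manual.to_numpy(), equal_nan=True)
print(same)

# Multiply by 100 to express the change in percent.
print((builtin * 100).round(2))
```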
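Viewing detailed information about a function¶
Append a question mark (the ASCII `?`) to the function name and run it in the Research environment. A window pops up with more detailed information, including a fairly complete description of the available parameters. The documentation is in English. For example: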
df.pct_change?
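The trailing `?` is IPython/Jupyter syntax and is not valid in a plain Python script. As a hedged alternative for that setting, the same docstring can be read with the standard library; the sketch below prints only its first few lines.

```python
import inspect

import pandas as pd

# Fetch the docstring that `df.pct_change?` would display.
doc = inspect.getdoc(pd.DataFrame.pct_change)

# Print just the first few lines to keep the output short.
for line in doc.splitlines()[:5]:
    print(line)
```

Calling `help(pd.DataFrame.pct_change)` in an interactive session shows the full text.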