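Common DataFrame statistical functions
Summary
Common statistical functions
describe: compute summary statistics for a Series or for each column of a DataFrame
count: number of non-NA values
min, max: minimum and maximum values
idxmin, idxmax: index labels at which the minimum and maximum values occur
quantile: sample quantile (q between 0 and 1)
sum: sum of the values
mean: arithmetic mean of the values
median: arithmetic median of the values (the 50% quantile)
mad: mean absolute deviation around the mean
var: sample variance
std: sample standard deviation
skew: sample skewness (third moment)
kurt: sample kurtosis (fourth moment)
cumsum: cumulative sum
cummin, cummax: cumulative minimum and cumulative maximum
cumprod: cumulative product
diff: first-order difference
pct_change: percentage change between consecutive rows
Viewing detailed information about a function
More functions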
Getting a sample DataFrame
# Fetch daily open/high/low/close prices for 000001.XSHE, 2016-02-01 to 2016-02-04
df = get_price('000001.XSHE', start_date='2016-02-01', end_date='2016-02-04', frequency='daily', fields=['open', 'high', 'low', 'close'])
df
|  | open | high | low | close |
|---|---|---|---|---|
| 2016-02-01 | 8.08 | 8.10 | 7.88 | 7.93 |
| 2016-02-02 | 7.93 | 8.12 | 7.92 | 8.05 |
| 2016-02-03 | 7.97 | 8.00 | 7.91 | 7.97 |
| 2016-02-04 | 8.00 | 8.09 | 8.00 | 8.05 |
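`get_price` is supplied by the research platform and is not part of pandas. To follow along elsewhere, the same table can be rebuilt by hand. The sketch below is only an assumption about how one might do that: it copies the four rows printed above into a plain pandas DataFrame with a `DatetimeIndex`.

```python
import pandas as pd

# Rebuild the sample table by hand from the values printed above
# (a stand-in for the platform's get_price call).
df = pd.DataFrame(
    {
        "open": [8.08, 7.93, 7.97, 8.00],
        "high": [8.10, 8.12, 8.00, 8.09],
        "low": [7.88, 7.92, 7.91, 8.00],
        "close": [7.93, 8.05, 7.97, 8.05],
    },
    index=pd.to_datetime(["2016-02-01", "2016-02-02", "2016-02-03", "2016-02-04"]),
)
print(df)
```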
df.describe()
# describe: compute summary statistics for a Series or for each column of a DataFrame
df.describe()
|  | open | high | low | close |
|---|---|---|---|---|
| count | 4.000000 | 4.000000 | 4.000000 | 4.00 |
| mean | 7.995000 | 8.077500 | 7.927500 | 8.00 |
| std | 0.063509 | 0.053151 | 0.051235 | 0.06 |
| min | 7.930000 | 8.000000 | 7.880000 | 7.93 |
| 25% | 7.960000 | 8.067500 | 7.902500 | 7.96 |
| 50% | 7.985000 | 8.095000 | 7.915000 | 8.01 |
| 75% | 8.020000 | 8.105000 | 7.940000 | 8.05 |
| max | 8.080000 | 8.120000 | 8.000000 | 8.05 |
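The quartile rows are only the default. As a small sketch (the table is rebuilt by hand here, which is an assumption standing in for `get_price`), the `percentiles` argument of `describe` selects other cut points; the median row is still included.

```python
import pandas as pd

# Hand-built copy of the sample table (stand-in for get_price).
df = pd.DataFrame(
    {"open": [8.08, 7.93, 7.97, 8.00], "high": [8.10, 8.12, 8.00, 8.09],
     "low": [7.88, 7.92, 7.91, 8.00], "close": [7.93, 8.05, 7.97, 8.05]},
    index=pd.to_datetime(["2016-02-01", "2016-02-02", "2016-02-03", "2016-02-04"]),
)

# Ask for the 10% and 90% percentiles instead of the default quartiles.
summary = df.describe(percentiles=[0.1, 0.9])
print(summary)

# Individual statistics can be read back by row label.
print(summary.loc["mean"])
```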
df.count()
# count: number of non-NA values
df.count()
open 4
high 4
low 4
close 4
dtype: int64
df.min(), df.max()
# min, max: minimum and maximum values
df.min()
open 7.93
high 8.00
low 7.88
close 7.93
dtype: float64
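Only `df.min()` is shown above. A short sketch of the counterpart follows, using a hand-built copy of the sample table (an assumption standing in for `get_price`); it also shows `axis=1`, which reduces across the columns of each row instead of down each column.

```python
import pandas as pd

# Hand-built copy of the sample table (stand-in for get_price).
df = pd.DataFrame(
    {"open": [8.08, 7.93, 7.97, 8.00], "high": [8.10, 8.12, 8.00, 8.09],
     "low": [7.88, 7.92, 7.91, 8.00], "close": [7.93, 8.05, 7.97, 8.05]},
    index=pd.to_datetime(["2016-02-01", "2016-02-02", "2016-02-03", "2016-02-04"]),
)

col_max = df.max()        # one value per column
row_max = df.max(axis=1)  # one value per date, taken across the four columns
print(col_max)
print(row_max)
```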
df.idxmin(), df.idxmax()
# idxmin, idxmax: index labels at which the minimum and maximum values occur
df.idxmin()
open 2016-02-02
high 2016-02-03
low 2016-02-01
close 2016-02-01
dtype: datetime64[ns]
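For completeness, here is a sketch of `idxmax` on a hand-built copy of the sample table (an assumption standing in for `get_price`). The `close` column reaches its maximum of 8.05 on two dates; pandas returns the first matching label.

```python
import pandas as pd

# Hand-built copy of the sample table (stand-in for get_price).
df = pd.DataFrame(
    {"open": [8.08, 7.93, 7.97, 8.00], "high": [8.10, 8.12, 8.00, 8.09],
     "low": [7.88, 7.92, 7.91, 8.00], "close": [7.93, 8.05, 7.97, 8.05]},
    index=pd.to_datetime(["2016-02-01", "2016-02-02", "2016-02-03", "2016-02-04"]),
)

# Date on which each column reaches its maximum.
peak_dates = df.idxmax()
print(peak_dates)

# The label can be fed back into .loc to fetch the whole row for that date.
print(df.loc[peak_dates["low"]])
```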
df.quantile()
# quantile: sample quantile (q between 0 and 1; the default q=0.5 gives the median)
df.quantile()
open 7.985
high 8.095
low 7.915
close 8.010
dtype: float64
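With no argument, `quantile` returns the 50% quantile, which is why the output above matches the median. The sketch below, on a hand-built copy of the sample table (an assumption standing in for `get_price`), passes a single `q` and then a list of `q` values.

```python
import pandas as pd

# Hand-built copy of the sample table (stand-in for get_price).
df = pd.DataFrame(
    {"open": [8.08, 7.93, 7.97, 8.00], "high": [8.10, 8.12, 8.00, 8.09],
     "low": [7.88, 7.92, 7.91, 8.00], "close": [7.93, 8.05, 7.97, 8.05]},
    index=pd.to_datetime(["2016-02-01", "2016-02-02", "2016-02-03", "2016-02-04"]),
)

# A single q returns a Series with one value per column.
q25 = df.quantile(0.25)
print(q25)

# A list of q values returns a DataFrame indexed by q.
q_range = df.quantile([0.25, 0.75])
print(q_range)
```

The 25% and 75% values agree with the corresponding rows of the `describe` output.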
df.sum()
# sum: sum of the values
df.sum()
open 31.98
high 32.31
low 31.71
close 32.00
dtype: float64
df.mean()
# mean: arithmetic mean of the values
df.mean()
open 7.9950
high 8.0775
low 7.9275
close 8.0000
dtype: float64
df.median()
# median: arithmetic median of the values (the 50% quantile)
df.median()
open 7.985
high 8.095
low 7.915
close 8.010
dtype: float64
df.mad()
# mad: mean absolute deviation around the mean
df.mad()
open 0.04500
high 0.03875
low 0.03625
close 0.05000
dtype: float64
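`DataFrame.mad` was deprecated in pandas 1.5 and removed in pandas 2.0, so the call above fails on recent versions. A minimal replacement, sketched on a hand-built copy of the sample table (an assumption standing in for `get_price`), computes the same quantity directly from its definition.

```python
import pandas as pd

# Hand-built copy of the sample table (stand-in for get_price).
df = pd.DataFrame(
    {"open": [8.08, 7.93, 7.97, 8.00], "high": [8.10, 8.12, 8.00, 8.09],
     "low": [7.88, 7.92, 7.91, 8.00], "close": [7.93, 8.05, 7.97, 8.05]},
    index=pd.to_datetime(["2016-02-01", "2016-02-02", "2016-02-03", "2016-02-04"]),
)

# Mean absolute deviation: average distance of each value from its column mean.
mean_abs_dev = (df - df.mean()).abs().mean()
print(mean_abs_dev)
```

The result reproduces the values printed above (0.045, 0.03875, 0.03625, 0.05) up to floating-point rounding.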
df.var()
# var: sample variance
df.var()
open 0.004033
high 0.002825
low 0.002625
close 0.003600
dtype: float64
df.std()
# std: sample standard deviation
df.std()
open 0.063509
high 0.053151
low 0.051235
close 0.060000
dtype: float64
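Both `var` and `std` default to `ddof=1`, that is, they divide by n - 1 (the sample estimate). The sketch below, on a hand-built copy of the sample table (an assumption standing in for `get_price`), contrasts that with the population variance obtained with `ddof=0` and checks that `std` is the square root of `var`.

```python
import pandas as pd

# Hand-built copy of the sample table (stand-in for get_price).
df = pd.DataFrame(
    {"open": [8.08, 7.93, 7.97, 8.00], "high": [8.10, 8.12, 8.00, 8.09],
     "low": [7.88, 7.92, 7.91, 8.00], "close": [7.93, 8.05, 7.97, 8.05]},
    index=pd.to_datetime(["2016-02-01", "2016-02-02", "2016-02-03", "2016-02-04"]),
)

sample_var = df.var()            # divides by n - 1 (default ddof=1)
population_var = df.var(ddof=0)  # divides by n
print(sample_var)
print(population_var)

# std is the square root of var computed with the same ddof.
largest_gap = (df.std() ** 2 - sample_var).abs().max()
print(largest_gap)
```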
df.skew()
# skew: sample skewness (third moment)
df.skew()
open 0.843252
high -1.666658
low 1.329083
close -0.370370
dtype: float64
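The values above appear to be the bias-adjusted (Fisher-Pearson) sample skewness rather than the raw third standardized moment. The sketch below is a check under that assumption, on a hand-built copy of the sample table (standing in for `get_price`): it computes the adjusted estimate by hand and compares it with `df.skew()`.

```python
import pandas as pd

# Hand-built copy of the sample table (stand-in for get_price).
df = pd.DataFrame(
    {"open": [8.08, 7.93, 7.97, 8.00], "high": [8.10, 8.12, 8.00, 8.09],
     "low": [7.88, 7.92, 7.91, 8.00], "close": [7.93, 8.05, 7.97, 8.05]},
    index=pd.to_datetime(["2016-02-01", "2016-02-02", "2016-02-03", "2016-02-04"]),
)

n = len(df)
dev = df - df.mean()
m2 = (dev ** 2).sum()  # sum of squared deviations per column
m3 = (dev ** 3).sum()  # sum of cubed deviations per column

# Adjusted Fisher-Pearson skewness: n * sqrt(n - 1) / (n - 2) * m3 / m2^1.5
manual_skew = n * (n - 1) ** 0.5 / (n - 2) * m3 / m2 ** 1.5
print(manual_skew)
print(df.skew())
```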
df.kurt()
# kurt: sample kurtosis (fourth moment)
df.kurt()
open 0.933953
high 3.047698
low 2.374596
close -3.901230
dtype: float64
df.cumsum()
# cumsum: cumulative sum
df.cumsum()
|  | open | high | low | close |
|---|---|---|---|---|
| 2016-02-01 | 8.08 | 8.10 | 7.88 | 7.93 |
| 2016-02-02 | 16.01 | 16.22 | 15.80 | 15.98 |
| 2016-02-03 | 23.98 | 24.22 | 23.71 | 23.95 |
| 2016-02-04 | 31.98 | 32.31 | 31.71 | 32.00 |
df.cummin(), df.cummax()
# cummin, cummax: cumulative minimum and cumulative maximum
df.cummin()
|  | open | high | low | close |
|---|---|---|---|---|
| 2016-02-01 | 8.08 | 8.1 | 7.88 | 7.93 |
| 2016-02-02 | 7.93 | 8.1 | 7.88 | 7.93 |
| 2016-02-03 | 7.93 | 8.0 | 7.88 | 7.93 |
| 2016-02-04 | 7.93 | 8.0 | 7.88 | 7.93 |
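`cummax` is the mirror image of the table above: each row holds the largest value seen so far in its column. A short sketch on a hand-built copy of the sample table (an assumption standing in for `get_price`):

```python
import pandas as pd

# Hand-built copy of the sample table (stand-in for get_price).
df = pd.DataFrame(
    {"open": [8.08, 7.93, 7.97, 8.00], "high": [8.10, 8.12, 8.00, 8.09],
     "low": [7.88, 7.92, 7.91, 8.00], "close": [7.93, 8.05, 7.97, 8.05]},
    index=pd.to_datetime(["2016-02-01", "2016-02-02", "2016-02-03", "2016-02-04"]),
)

# Running maximum down each column.
running_max = df.cummax()
print(running_max)

# The last row of the running maximum equals the overall column maximum.
print(running_max.iloc[-1].equals(df.max()))
```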
df.cumprod()
# cumprod: cumulative product
df.cumprod()
|  | open | high | low | close |
|---|---|---|---|---|
| 2016-02-01 | 8.080000 | 8.10000 | 7.880000 | 7.930000 |
| 2016-02-02 | 64.074400 | 65.77200 | 62.409600 | 63.836500 |
| 2016-02-03 | 510.672968 | 526.17600 | 493.659936 | 508.776905 |
| 2016-02-04 | 4085.383744 | 4256.76384 | 3949.279488 | 4095.654085 |
df.diff()
# diff: first-order difference (each row minus the previous row)
df.diff()
|  | open | high | low | close |
|---|---|---|---|---|
| 2016-02-01 | NaN | NaN | NaN | NaN |
| 2016-02-02 | -0.15 | 0.02 | 0.04 | 0.12 |
| 2016-02-03 | 0.04 | -0.12 | -0.01 | -0.08 |
| 2016-02-04 | 0.03 | 0.09 | 0.09 | 0.08 |
df.pct_change()
# pct_change: percentage change from the previous row (returned as a fraction, not multiplied by 100)
df.pct_change()
|  | open | high | low | close |
|---|---|---|---|---|
| 2016-02-01 | NaN | NaN | NaN | NaN |
| 2016-02-02 | -0.018564 | 0.002469 | 0.005076 | 0.015132 |
| 2016-02-03 | 0.005044 | -0.014778 | -0.001263 | -0.009938 |
| 2016-02-04 | 0.003764 | 0.011250 | 0.011378 | 0.010038 |
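`diff` and `pct_change` are closely related: the fractional change is the first difference divided by the previous row's value. The sketch below checks this on a hand-built copy of the sample table (an assumption standing in for `get_price`).

```python
import pandas as pd

# Hand-built copy of the sample table (stand-in for get_price).
df = pd.DataFrame(
    {"open": [8.08, 7.93, 7.97, 8.00], "high": [8.10, 8.12, 8.00, 8.09],
     "low": [7.88, 7.92, 7.91, 8.00], "close": [7.93, 8.05, 7.97, 8.05]},
    index=pd.to_datetime(["2016-02-01", "2016-02-02", "2016-02-03", "2016-02-04"]),
)

# First difference divided by the previous row's value.
manual_change = df.diff() / df.shift(1)

# Largest absolute gap to pct_change; the all-NaN first row is skipped by max().
largest_gap = (df.pct_change() - manual_change).abs().max().max()
print(manual_change)
print(largest_gap)
```

For example, the first `open` change is -0.15 / 8.08, which rounds to -0.018564 as in the table.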
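Viewing detailed information about a function
Append a question mark (the ASCII `?`) to a function name and run it in the research environment. A window pops up with more detailed information, including a fairly complete list of the available parameters and their descriptions. The help text is in English. For example: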
df.pct_change?
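The trailing `?` is IPython/Jupyter syntax and is not valid in a plain Python script. As a rough equivalent outside a notebook, the same docstring can be read with the standard library; the sketch below uses `inspect.getdoc`, and the built-in `help(pd.DataFrame.pct_change)` would page through the same text interactively.

```python
import inspect

import pandas as pd

# Fetch the docstring that the notebook's "?" would display.
doc = inspect.getdoc(pd.DataFrame.pct_change)

# Print only the beginning; the full text lists every parameter.
print(doc[:300])
```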