Time Series Forecasting Models: Paper Walkthrough

Author: qwdkjhd, 2019-05-22 13:47. Source: FX168财经网人物频道

This week's summary:
[1] Close reading of "Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks"
This week's work in detail:
[1] Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks
Guokun Lai, Wei-Cheng Chang, Yiming Yang, and Hanxiao Liu. 2018. Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks.
In SIGIR ’18: The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, July 8–12, 2018, Ann Arbor, MI, USA. ACM,
New York, NY, USA, 10 pages.

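Paper summary
Multivariate time series forecasting arises in many domains, for example predicting the output of solar power plants or electricity consumption. Such data mix long-term and short-term patterns, which traditional approaches such as autoregressive models and Gaussian processes may fail to capture. The paper proposes a new framework, LSTNet, which uses a CNN and an RNN to extract short-term local dependency patterns among variables and long-term patterns in the time series trends. In addition, it adds an autoregressive model to address the scale insensitivity of neural network models. In the paper's experiments it improves substantially over the baseline models.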

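(1) Problem formulation
Input: $Y = \{y_1, y_2, \ldots, y_T\}$ where $y_t \in \mathbb{R}^n$ and $n$ is the variable dimension; that is, the input is a multivariate time series with n variables.
Output: predict $y_{T+h}$, where $h$ is the desirable horizon ahead of the current time stamp.
Choosing the horizon: it depends on the dataset. For traffic conditions the horizon may range from hours to a day; for stock markets, even a forecast a few seconds or minutes ahead can be valuable.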
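To make the input/output setup concrete, here is a minimal sketch of how a series could be cut into training pairs. The function name `make_windows` and the fixed-length input `window` are assumptions for illustration; the post only specifies that the target lies h steps past the last observed time stamp.

```python
import numpy as np

def make_windows(Y, window, horizon):
    """Build (input window, target) pairs from a (T, n) series.

    Each input is `window` consecutive rows; the target is the row
    `horizon` steps after the last row of that input.
    """
    Y = np.asarray(Y, dtype=float)
    T = Y.shape[0]
    inputs, targets = [], []
    for end in range(window, T - horizon + 1):
        inputs.append(Y[end - window:end])       # rows end-window .. end-1
        targets.append(Y[end - 1 + horizon])     # h steps past the last input row
    return np.stack(inputs), np.stack(targets)

Y = np.arange(20, dtype=float).reshape(10, 2)    # toy series: T = 10, n = 2
X, y = make_windows(Y, window=3, horizon=2)
```

With T = 10, a window of 3 and a horizon of 2, this yields 6 pairs; the first input covers rows 0 to 2 and its target is row 4.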
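(2) Model

To address multivariate time series forecasting, the paper proposes LSTNet, which combines a convolutional layer, recurrent layers, and a traditional autoregressive model. Each component is explained below.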
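Convolutional Component:
A CNN without pooling, used to extract short-term patterns along the time dimension and dependencies among variables. The layer consists of multiple filters of width $w$ and height $n$ (the number of variables). The k-th filter sweeps over the input matrix $X$ and produces:
$h_k = \mathrm{RELU}(W_k * X + b_k)$
Here $*$ denotes the convolution operation and $h_k$ is a vector. $X$ is zero-padded on the left so that each $h_k$ has length $T$; the output matrix of the convolutional layer has size $d_c \times T$, where $d_c$ is the number of filters.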
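The convolutional step can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' code: the function name `conv_component`, the array layout (X as n × T, filters as d_c × n × w), and the use of cross-correlation (the usual deep learning convention) are assumptions.

```python
import numpy as np

def conv_component(X, W, b):
    """X: (n, T) input, W: (d_c, n, w) filters, b: (d_c,) biases.

    Returns a (d_c, T) matrix. X is left-padded with w - 1 zero columns
    so that every filter yields one value per time step.
    """
    d_c, n, w = W.shape
    T = X.shape[1]
    X_pad = np.concatenate([np.zeros((n, w - 1)), X], axis=1)
    out = np.empty((d_c, T))
    for t in range(T):
        patch = X_pad[:, t:t + w]                 # (n, w) slice ending at time t
        # Sum over variables and filter width for all filters at once.
        out[:, t] = np.tensordot(W, patch, axes=([1, 2], [0, 1])) + b
    return np.maximum(out, 0.0)                   # RELU

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 12))                      # n = 4 variables, T = 12 steps
W = rng.normal(size=(5, 4, 3))                    # d_c = 5 filters of width 3
b = np.zeros(5)
H = conv_component(X, W, b)                       # shape (5, 12)
```

Because the padding is on the left only, column t of the output depends on inputs up to time t and never on later ones.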
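Recurrent Component:
The RNN layer uses a GRU with RELU as the hidden update activation. The hidden state at time t is computed as:
$r_t = \sigma(x_t W_{xr} + h_{t-1} W_{hr} + b_r)$
$u_t = \sigma(x_t W_{xu} + h_{t-1} W_{hu} + b_u)$
$c_t = \mathrm{RELU}(x_t W_{xc} + r_t \odot (h_{t-1} W_{hc}) + b_c)$
$h_t = (1 - u_t) \odot h_{t-1} + u_t \odot c_t$
$r_t$ is the reset gate, which controls how much of the previous state is ignored when forming the candidate $c_t$. $u_t$ is the update gate, which controls how the new state mixes the previous state $h_{t-1}$ (weight $1 - u_t$) with the candidate $c_t$ (weight $u_t$).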
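The four GRU equations map directly onto a few lines of NumPy. The sketch below is illustrative only: the helper `init_params`, the dictionary layout of the weights, and the initialisation scale are assumptions, not details from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(d_in, d_h, rng):
    """Hypothetical helper: small random weights for the r, u, c blocks."""
    p = {}
    for gate in "ruc":
        p["Wx" + gate] = rng.normal(scale=0.1, size=(d_in, d_h))
        p["Wh" + gate] = rng.normal(scale=0.1, size=(d_h, d_h))
        p["b" + gate] = np.zeros(d_h)
    return p

def gru_step(x_t, h_prev, p):
    """One GRU update with a RELU candidate state."""
    r = sigmoid(x_t @ p["Wxr"] + h_prev @ p["Whr"] + p["br"])    # reset gate
    u = sigmoid(x_t @ p["Wxu"] + h_prev @ p["Whu"] + p["bu"])    # update gate
    c = np.maximum(x_t @ p["Wxc"] + r * (h_prev @ p["Whc"]) + p["bc"], 0.0)
    return (1.0 - u) * h_prev + u * c

rng = np.random.default_rng(0)
params = init_params(d_in=5, d_h=8, rng=rng)
xs = rng.normal(size=(12, 5))                    # 12 time steps of 5 features
h = np.zeros(8)
for x_t in xs:
    h = gru_step(x_t, h, params)
```

When the update gate is close to 0 the state is copied forward almost unchanged, which is how the cell retains information over many steps.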
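Recurrent-skip Component:
Similar to the previous layer, except that the recurrent input is the hidden state from p time steps earlier ($h_{t-p}$) rather than from the previous step ($h_{t-1}$). The value of p depends on the dataset; for the hourly electricity consumption dataset, for example, p can be set to 24.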
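The effect of the skip connection is easiest to see with a toy recurrence. In this sketch the function `run_skip_recurrence` and the linear `toy_step` are hypothetical stand-ins for the skip-GRU cell; the zero state used before the series starts is also an assumption.

```python
import numpy as np

def run_skip_recurrence(xs, step, d_h, p):
    """Run step(x_t, h_skip), where h_skip is the state from p steps earlier."""
    T = len(xs)
    hs = np.zeros((T, d_h))
    for t in range(T):
        # Before p steps have elapsed there is no earlier state; use zeros.
        h_skip = hs[t - p] if t >= p else np.zeros(d_h)
        hs[t] = step(xs[t], h_skip)
    return hs

def toy_step(x_t, h_skip):
    # Stand-in for a GRU cell: decay the skipped state and add the input.
    return 0.5 * h_skip + x_t

xs = np.array([[1.0], [10.0], [0.0], [0.0], [0.0], [0.0]])
hs = run_skip_recurrence(xs, toy_step, d_h=1, p=2)
# hs[:, 0] is [1, 10, 0.5, 5, 0.25, 2.5]: two interleaved chains, one per phase.
```

With p = 2 the even and odd time steps never interact, so a pattern that repeats every p steps is carried forward in very few recurrent hops.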
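Temporal Attention Layer:
For non-seasonal data a suitable p is hard to determine, so an attention mechanism can instead assign a weight to each position in the window:
$\alpha_t = \mathrm{AttnScore}(H_t^R, h_{t-1}^R)$
$H_t^R = [h_{t-q}^R, \ldots, h_{t-1}^R]$
AttnScore is a similarity function such as the dot product, cosine similarity, or a simple multilayer perceptron.
The final output of this layer is:
$h_t^D = W[c_t; h_{t-1}^R] + b$
$c_t = H_t^R \alpha_t$
Here $c_t$ is the attention-weighted context vector, not the GRU candidate state from the recurrent component.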
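A minimal sketch of this layer with dot-product scoring is given below. The softmax normalisation of the scores is an assumption (the post names only the scoring function), and the function name `temporal_attention` and the array shapes are illustrative.

```python
import numpy as np

def temporal_attention(H, h_last, W, b):
    """H: (d, q) window of past hidden states as columns, h_last: (d,).

    Returns the layer output h_D and the attention weights alpha.
    """
    scores = H.T @ h_last                        # (q,) dot-product similarity
    scores = scores - scores.max()               # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()   # assumed softmax normalisation
    c = H @ alpha                                # (d,) weighted context vector
    h_D = W @ np.concatenate([c, h_last]) + b    # W: (d_out, 2d)
    return h_D, alpha

rng = np.random.default_rng(0)
d, q = 4, 6
H = rng.normal(size=(d, q))
h_last = H[:, -1]                                # most recent hidden state
W = rng.normal(size=(3, 2 * d))
b = np.zeros(3)
h_D, alpha = temporal_attention(H, h_last, W, b)
```

The weights in `alpha` play the role that the fixed skip length p plays in the recurrent-skip layer, but they are recomputed at every time step.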
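Autoregressive Component:

The autoregressive (AR) component is a linear model over the most recent observations, run in parallel with the neural network.
The final LSTNet prediction is the sum of the neural network part and the AR part.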
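As a rough illustration of how the two parts combine, the sketch below implements a linear AR forecast and adds it to a neural network output. Sharing one coefficient vector across all variables, and the names `ar_component` and `lstnet_output`, are assumptions made for this example.

```python
import numpy as np

def ar_component(Y_window, w_ar, b_ar):
    """Y_window: (q_ar, n) most recent observations, oldest first.

    w_ar: (q_ar,) coefficients shared by all variables (an assumption);
    b_ar: scalar bias. Returns one linear forecast per variable.
    """
    return w_ar @ Y_window + b_ar

def lstnet_output(h_D, h_L):
    """Final prediction: neural network part plus AR part."""
    return h_D + h_L

Y_window = np.array([[1.0, 100.0],
                     [2.0, 200.0],
                     [3.0, 300.0]])              # two variables on different scales
w_ar = np.array([0.0, 0.0, 1.0])                 # "repeat the last value"
h_L = ar_component(Y_window, w_ar, 0.0)
y_hat = lstnet_output(np.zeros(2), h_L)          # NN part set to zero here
```

Because the AR part is linear, its output scales with the input, which is what lets it compensate for the scale insensitivity of the nonlinear part.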

Two loss functions are considered:
(1) squared error
(2) absolute error
Optimization uses SGD (stochastic gradient descent) or Adam.
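The practical difference between the two losses shows up with a single large error. The following sketch sums each loss over all entries of a small prediction matrix; the exact normalisation used in the paper is not reproduced here, and the function names are illustrative.

```python
import numpy as np

def squared_error(Y_true, Y_pred):
    """Sum of squared differences over all time steps and variables."""
    return float(np.sum((Y_true - Y_pred) ** 2))

def absolute_error(Y_true, Y_pred):
    """Sum of absolute differences over all time steps and variables."""
    return float(np.sum(np.abs(Y_true - Y_pred)))

Y_true = np.array([[1.0, 2.0], [3.0, 4.0]])
Y_pred = np.array([[1.0, 2.0], [3.0, 14.0]])     # one entry is off by 10
l2 = squared_error(Y_true, Y_pred)               # 100.0
l1 = absolute_error(Y_true, Y_pred)              # 10.0
```

The squared error grows quadratically with the size of a miss, so it is more sensitive to spikes in the data than the absolute error.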
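(3) Experiments and results
Datasets
Four datasets are used: traffic, solar energy, electricity, and exchange rate. Each is split into 60% training, 20% validation, and 20% test.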
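A 60/20/20 split of a time series could look like the sketch below. It assumes the split is chronological (earliest data for training, latest for testing), which is the usual practice for forecasting; the function name and the integer-percent parameters are illustrative.

```python
import numpy as np

def chronological_split(Y, train_pct=60, valid_pct=20):
    """Split a series along time into train / validation / test parts."""
    T = len(Y)
    i = T * train_pct // 100                     # end of the training part
    j = T * (train_pct + valid_pct) // 100       # end of the validation part
    return Y[:i], Y[i:j], Y[j:]

Y = np.arange(100)
train, valid, test = chronological_split(Y)
```

Keeping the order intact means the model is always evaluated on data that comes after everything it was trained on.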
• Traffic: A collection of 48 months (2015-2016) hourly data from the California Department of Transportation. The data describes the road occupancy rates (between 0 and 1) measured by different sensors on San Francisco Bay area freeways.
• Solar-Energy: The solar power production records in the year of 2006, sampled every 10 minutes from 137 PV plants in Alabama State.
• Electricity: The electricity consumption in kWh was recorded every 15 minutes from 2012 to 2014, for n = 321 clients. We converted the data to reflect hourly consumption.
• Exchange-Rate: The collection of the daily exchange rates of eight foreign countries including Australia, Britain, Canada, Switzerland, China, Japan, New Zealand and Singapore, ranging from 1990 to 2016.

Evaluation metrics

Root relative squared error (RSE) and empirical correlation coefficient (CORR): lower RSE is better, higher CORR is better.
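Both metrics are short to compute. The sketch below uses their common definitions (RSE as root squared error relative to predicting the overall mean, CORR as the Pearson correlation per variable averaged over variables); treat these as assumed definitions, since the formulas themselves are not shown in the post.

```python
import numpy as np

def rse(Y_true, Y_pred):
    """Root squared error relative to always predicting the overall mean."""
    num = np.sqrt(np.sum((Y_true - Y_pred) ** 2))
    den = np.sqrt(np.sum((Y_true - Y_true.mean()) ** 2))
    return float(num / den)

def corr(Y_true, Y_pred):
    """Mean over variables (columns) of the Pearson correlation across time."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    per_var = (yt * yp).sum(axis=0) / np.sqrt(
        (yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0)
    )
    return float(per_var.mean())

rng = np.random.default_rng(0)
Y_true = rng.normal(size=(50, 3))                # 50 time steps, 3 variables
Y_pred = Y_true + 0.1 * rng.normal(size=(50, 3))
print(rse(Y_true, Y_pred), corr(Y_true, Y_pred))
```

An RSE of 1 corresponds to a forecast no better than the overall mean, which makes the metric comparable across datasets with different scales.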
Models compared
• AR stands for the autoregressive model, which is equivalent to the one dimensional VAR model.
• LRidge is the vector autoregression (VAR) model with L2-regularization, which has been most popular for multivariate time series forecasting.
• LSVR is the vector autoregression (VAR) model with Support Vector Regression objective function [31].
• TRMF is the autoregressive model using temporal regularized matrix factorization.
• GP is the Gaussian Process for time series modeling [11, 29].
• VAR-MLP is the model proposed in [36] that combines Multilayer Perceptron (MLP) and autoregressive model.
• RNN-GRU is the Recurrent Neural Network model using GRU cell.
• LSTNet-skip is our proposed LSTNet model with skip-RNN layer.
• LSTNet-Attn is our proposed LSTNet model with temporal attention layer.

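In the results table, the best model in each setting is shown in bold. The LSTNet variants perform best on the three datasets with strong autocorrelation; the fourth dataset (exchange rate) shows no such autocorrelation, and the model performs poorly on it.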
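(4) Ablation study
This test removes different components of LSTNet in turn to see the effect of each:

The full LSTNet turns out to be the most robust, especially as the horizon grows.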
(5) My thoughts
When proposing a new model, consider combining existing models and using experiments to find the combination that works best.
