
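Machine Learning: A Tutorial on Univariate Time Series Forecasting with a PyTorch LSTM

A time series is any quantifiable metric or event that takes place over a period of time. As trivial as that sounds, almost anything can be thought of as a time series: your average hourly heart rate over a month, a stock's daily closing price over a year, the weekly number of traffic accidents in a city over a year. Recording any of these over a period of time produces a time series. Each example has a frequency at which events are recorded (daily, weekly, hourly, and so on) and a length of time over which they are recorded (a month, a year, a day, and so on).

In this tutorial, we will use a PyTorch LSTM for deep-learning time series forecasting.

Our goal is to take a sequence of values and predict the next value in that sequence. The simplest approach is an autoregressive model; here we will instead focus on solving the problem with an LSTM.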
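For comparison, a plain autoregressive model can be fit by ordinary least squares in a few lines. This sketch is not part of the original article: fit_ar and predict_next are hypothetical helper names, and it assumes NumPy and an AR(p) model with an intercept, x_t = a_1*x_(t-1) + ... + a_p*x_(t-p) + c.

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares fit of x_t = a_1*x_(t-1) + ... + a_p*x_(t-p) + c (hypothetical helper)."""
    x = np.asarray(series, dtype=float)
    # Each row holds the p previous values (most recent first) plus a 1 for the intercept.
    X = np.vstack([np.r_[x[t - p:t][::-1], 1.0] for t in range(p, len(x))])
    y = x[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [a_1, ..., a_p, c]

def predict_next(series, coef):
    """One-step-ahead prediction from the last p values."""
    p = len(coef) - 1
    lags = np.asarray(series, dtype=float)[-p:][::-1]  # most recent value first
    return float(lags @ coef[:-1] + coef[-1])

# Toy data generated by an exact AR(1) process: x_t = 0.5 * x_(t-1) + 1
series = [0.0]
for _ in range(20):
    series.append(0.5 * series[-1] + 1.0)

coef = fit_ar(series, p=1)
next_value = predict_next(series, coef)
```

Because the toy series follows an exact AR(1) rule, the fit recovers a_1 = 0.5 and c = 1. Real data such as oil prices will not be that clean, but a baseline like this is a useful yardstick for the LSTM.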

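Data Preparation

Let's look at a sample time series. The figure below shows some oil price data from 2013 to 2018.

This is simply a plot of a single sequence of numbers against a date axis. The table below shows the first 10 entries of this time series. The dcoilwtico column is the West Texas Intermediate (WTI) crude oil price in US dollars per barrel. There is one price per trading day: weekends do not appear, and holidays such as 2013-01-01 have a missing value (NaN).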

 date        dcoilwtico
 2013-01-01  NaN
 2013-01-02  93.14
 2013-01-03  92.97
 2013-01-04  93.12
 2013-01-07  93.20
 2013-01-08  93.21
 2013-01-09  93.08
 2013-01-10  93.81
 2013-01-11  93.60
 2013-01-14  94.27

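Many machine learning models perform much better on standardized data. The standard way to standardize data is to transform it so that each column has a mean of 0 and a standard deviation of 1. The code below uses scikit-learn to do this: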

 from sklearn.preprocessing import StandardScaler

 # Fit one scaler per column
 scalers = {}
 for x in df.columns:
   scalers[x] = StandardScaler().fit(df[x].values.reshape(-1, 1))

 # Transform data via scalers
 norm_df = df.copy()
 for i, key in enumerate(scalers.keys()):
   norm = scalers[key].transform(norm_df.iloc[:, i].values.reshape(-1, 1))
   norm_df.iloc[:, i] = norm.ravel()  # transform returns shape (n, 1); flatten to one column
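The transform applied by each scaler is z = (x - mean) / std, and predictions made on the standardized scale can be mapped back with x = z * std + mean (what StandardScaler.inverse_transform does). The plain-Python sketch below, with hypothetical helper names, shows both directions; it uses the population standard deviation (ddof=0), which is the convention StandardScaler follows.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z-scores plus the (mean, std) needed to undo the transform."""
    mu = fmean(values)
    sigma = pstdev(values)  # population std (ddof=0), as StandardScaler uses
    return [(v - mu) / sigma for v in values], mu, sigma

def unstandardize(z_values, mu, sigma):
    """Invert z = (x - mu) / sigma, i.e. x = z * sigma + mu."""
    return [z * sigma + mu for z in z_values]

prices = [93.14, 92.97, 93.12, 93.20, 93.21]  # first non-missing prices from the table above
z, mu, sigma = standardize(prices)
restored = unstandardize(z, mu, sigma)
```

Keeping mu and sigma (or the fitted scaler) around matters later: the model's forecasts come out on the standardized scale and must be converted back to read them as dollar prices.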

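We also want the data to have a uniform frequency. In this example we have oil prices for every trading day over these 5 years. If that is not the case for your data, Pandas offers several ways to resample data to a uniform frequency (see our earlier articles on this topic).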
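As one possible approach, the sketch below reindexes a series to a regular business-day grid with DataFrame.asfreq("B") and fills the gaps. The business-day frequency is chosen because the table shows no weekend rows; filling interior gaps by linear interpolation and back-filling a leading NaN (such as 2013-01-01) are assumptions of this sketch, and simply dropping the first row would be equally reasonable. The function name is hypothetical.

```python
import numpy as np
import pandas as pd

def to_business_day_frequency(df: pd.DataFrame) -> pd.DataFrame:
    """Reindex to a regular business-day grid and fill the gaps (hypothetical helper)."""
    out = df.copy()
    out.index = pd.to_datetime(out.index)
    out = out.asfreq("B")                    # one row per weekday; missing days become NaN
    out = out.interpolate(method="linear")   # fill interior gaps linearly
    return out.bfill()                       # fill a leading NaN with the first valid value

# Small example: 2013-01-03 is removed to create a gap, and 2013-01-01 is NaN
raw = pd.DataFrame(
    {"dcoilwtico": [np.nan, 93.14, 93.12]},
    index=["2013-01-01", "2013-01-02", "2013-01-04"],
)
filled = to_business_day_frequency(raw)
```

After this step, every row of filled is one business day apart and contains no NaN values, so every training window spans the same number of steps.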

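For training, we need to cut the full time series into fixed-length sequences. Suppose we have the sequence [1, 2, 3, 4, 5, 6].

By choosing sequences of length 3, we can generate the following sequences and their associated targets: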

[Sequence] Target

[1, 2, 3] → 4

[2, 3, 4] → 5

[3, 4, 5] → 6

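In other words, we define how many steps to look back in order to predict the next value. We call this number of steps the training window, and the number of values to predict the prediction window; in this example they are 3 and 1 respectively. The function below shows how this is done.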

 import pandas as pd

 # Defining a function that creates sequences and targets as shown above
 def generate_sequences(df: pd.DataFrame, tw: int, pw: int, target_columns, drop_targets=False):
   '''
   df: Pandas DataFrame of the univariate time-series
   tw: Training Window - Integer defining how many steps to look back
   pw: Prediction Window - Integer defining how many steps forward to predict
   target_columns: column name(s) whose future values are the targets
   drop_targets: if True, exclude the target column(s) from the input sequences

   returns: dictionary of sequences and targets for all sequences
   '''
   data = dict()  # Store results into a dictionary
   targets = df[target_columns]
   # Option to drop target from the inputs (done once, on a copy, before the loop)
   inputs = df.drop(columns=target_columns) if drop_targets else df
   L = len(df)
   # Stop early enough that every target has exactly pw values
   for i in range(L - tw - pw + 1):
     # Get current sequence (positional slicing)
     sequence = inputs.iloc[i:i + tw].values
     # Get values right after the current sequence
     target = targets.iloc[i + tw:i + tw + pw].values
     data[i] = {'sequence': sequence, 'target': target}
   return data
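The same sliding-window logic can be checked on the toy sequence from the table above without pandas or PyTorch. make_windows is a hypothetical helper written for illustration; it uses the loop bound len(values) - tw - pw + 1 so that every target has exactly pw values (with pw = 1 this equals len(values) - tw).

```python
def make_windows(values, tw, pw):
    """Slide a window of length tw over values; the next pw values are the target."""
    windows = []
    for i in range(len(values) - tw - pw + 1):
        windows.append((values[i:i + tw], values[i + tw:i + tw + pw]))
    return windows

pairs = make_windows([1, 2, 3, 4, 5, 6], tw=3, pw=1)
# pairs == [([1, 2, 3], [4]), ([2, 3, 4], [5]), ([3, 4, 5], [6])]

pairs_pw2 = make_windows([1, 2, 3, 4, 5, 6], tw=3, pw=2)
# pairs_pw2 == [([1, 2, 3], [4, 5]), ([2, 3, 4], [5, 6])]
```

With pw = 2 the last window that still has two future values starts at index 1, which is why the bound shrinks as the prediction window grows.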

This lets us define a custom dataset in PyTorch by subclassing the Dataset class:

 import torch
 from torch.utils.data import Dataset

 class SequenceDataset(Dataset):

   def __init__(self, df):
     self.data = df  # dictionary produced by generate_sequences

   def __getitem__(self, idx):
     sample = self.data[idx]
     return torch.Tensor(sample['sequence']), torch.Tensor(sample['target'])

   def __len__(self):
     return len(self.data)

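We can then use a PyTorch DataLoader to iterate over the data. The advantage of a DataLoader is that it handles batching and shuffling internally, so we don't have to implement them ourselves. The code is as follows: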

 from torch.utils.data import DataLoader, random_split

 # Here we are defining properties for our model

 BATCH_SIZE = 16  # Training batch size
 split = 0.8  # Train/Test Split ratio

 # sequence_len (training window) and nout (prediction window) are defined in the
 # hyperparameter block of the "Model Architecture" section (180 and 1)
 sequences = generate_sequences(norm_df.dcoilwtico.to_frame(), sequence_len, nout, 'dcoilwtico')
 dataset = SequenceDataset(sequences)

 # Split the data according to our split ratio and load each subset into a
 # separate DataLoader object
 train_len = int(len(dataset) * split)
 lens = [train_len, len(dataset) - train_len]
 train_ds, test_ds = random_split(dataset, lens)
 trainloader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True, drop_last=True)
 testloader = DataLoader(test_ds, batch_size=BATCH_SIZE, shuffle=True, drop_last=True)

On each iteration, the DataLoader yields 16 (the batch size) sequences and their associated targets, which we pass to the model.
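Note that random_split assigns windows to the training and validation sets at random, so validation windows are interleaved in time with (and overlap) training windows. A common alternative for time series, not used in the original article, is a chronological split in which every validation window comes after every training window. The plain-Python sketch below computes such index lists and the number of batches a DataLoader yields with and without drop_last; the helper names and the sample count of 1000 are hypothetical. The index lists could then be wrapped with torch.utils.data.Subset(dataset, train_idx) instead of calling random_split.

```python
def chronological_split(n_samples, split=0.8):
    """Indices for a time-ordered split: all test windows come after all training windows."""
    train_len = int(n_samples * split)
    return list(range(train_len)), list(range(train_len, n_samples))

def batches_per_epoch(n_samples, batch_size, drop_last=True):
    """Batches per epoch; drop_last discards an incomplete final batch."""
    full, rest = divmod(n_samples, batch_size)
    return full if drop_last or rest == 0 else full + 1

train_idx, test_idx = chronological_split(1000)  # 1000 is a hypothetical sample count
n_train_batches = batches_per_epoch(len(train_idx), 16)               # 800 // 16 = 50
n_test_batches = batches_per_epoch(len(test_idx), 16)                 # 8 samples dropped
n_test_batches_all = batches_per_epoch(len(test_idx), 16, drop_last=False)
```

With drop_last=True, the final incomplete batch of 8 validation windows in this example is never evaluated; on a small validation set that can be a noticeable fraction of the data.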

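Model Architecture

We will use a single LSTM layer followed by several linear layers that form the regression part of the model, with dropout layers in between. The model outputs a single value for each training example.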

 import torch
 import torch.nn as nn

 class LSTMForecaster(nn.Module):

   def __init__(self, n_features, n_hidden, n_outputs, sequence_len, n_lstm_layers=1, n_deep_layers=10, use_cuda=False, dropout=0.2):
     '''
     n_features: number of input features (1 for univariate forecasting)
     n_hidden: number of neurons in each hidden layer
     n_outputs: number of outputs to predict for each training example
     sequence_len: number of steps to look back at for prediction
     n_lstm_layers: number of stacked LSTM layers
     n_deep_layers: number of hidden dense layers after the lstm layer
     use_cuda: kept for compatibility; hidden states are created on the input's device
     dropout: float (0 < dropout < 1) dropout ratio between dense layers
     '''
     super().__init__()

     self.n_lstm_layers = n_lstm_layers
     self.nhid = n_hidden
     self.use_cuda = use_cuda  # set option for device selection

     # LSTM Layer
     self.lstm = nn.LSTM(n_features,
                         n_hidden,
                         num_layers=n_lstm_layers,
                         batch_first=True)  # inputs are (batch, seq_len, features)

     # First dense layer after the LSTM: takes the flattened outputs of all time steps
     self.fc1 = nn.Linear(n_hidden * sequence_len, n_hidden)
     # Dropout layer
     self.dropout = nn.Dropout(p=dropout)

     # Create fully connected layers (n_hidden x n_deep_layers)
     dnn_layers = []
     for i in range(n_deep_layers):
       # Last layer (n_hidden x n_outputs)
       if i == n_deep_layers - 1:
         dnn_layers.append(nn.ReLU())
         dnn_layers.append(nn.Linear(n_hidden, n_outputs))
       # All other layers (n_hidden x n_hidden) with dropout option
       else:
         dnn_layers.append(nn.ReLU())
         dnn_layers.append(nn.Linear(n_hidden, n_hidden))
         if dropout:
           dnn_layers.append(nn.Dropout(p=dropout))
     # compile DNN layers
     self.dnn = nn.Sequential(*dnn_layers)

   def forward(self, x):

     # Initialize hidden and cell states with zeros on the same device as the input
     hidden_state = torch.zeros(self.n_lstm_layers, x.shape[0], self.nhid, device=x.device)
     cell_state = torch.zeros(self.n_lstm_layers, x.shape[0], self.nhid, device=x.device)
     self.hidden = (hidden_state, cell_state)

     # Forward Pass
     x, h = self.lstm(x, self.hidden)  # LSTM output: (batch, seq_len, n_hidden)
     x = self.dropout(x.contiguous().view(x.shape[0], -1))  # Flatten LSTM output
     x = self.fc1(x)  # First Dense
     return self.dnn(x)  # Pass forward through fully connected DNN.

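We expose two parameters that can be tuned freely: n_hidden and n_deep_layers (set through nhid and n_dnn_layers below). Larger values mean a more complex model and longer training time, so these two parameters give us a flexible way to adjust the model's capacity.

The remaining parameters are as follows: sequence_len is the training window, and nout defines how many steps to predict. Setting sequence_len to 180 and nout to 1 means the model looks back over the previous 180 observations to predict the next day's value. Because this series only contains trading days, 180 steps cover roughly eight to nine calendar months rather than exactly half a year.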

 nhid = 50  # Number of nodes in the hidden layer
 n_dnn_layers = 5  # Number of hidden fully connected layers
 nout = 1  # Prediction Window
 sequence_len = 180  # Training Window

 # Number of features (since this is a univariate timeseries we'll set
 # this to 1 -- multivariate analysis is coming in the future)
 ninp = 1

 # Device selection (CPU | GPU)
 USE_CUDA = torch.cuda.is_available()
 device = 'cuda' if USE_CUDA else 'cpu'

 # Initialize the model
 model = LSTMForecaster(ninp, nhid, nout, sequence_len, n_deep_layers=n_dnn_layers, use_cuda=USE_CUDA).to(device)
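To see how n_hidden, n_deep_layers and sequence_len drive model size, the sketch below counts the trainable parameters of LSTMForecaster by hand. It is an illustrative calculation, not from the original article, based on PyTorch's nn.LSTM having four gates and two bias vectors (b_ih and b_hh) per layer; ReLU and Dropout have no parameters. The function names are hypothetical, and the result can be cross-checked with sum(p.numel() for p in model.parameters()).

```python
def lstm_params(n_features, n_hidden, n_layers=1):
    """nn.LSTM: per layer, 4 gates x (input + hidden weights) plus two 4*n_hidden bias vectors."""
    total = 0
    for layer in range(n_layers):
        in_size = n_features if layer == 0 else n_hidden
        total += 4 * n_hidden * (in_size + n_hidden) + 2 * 4 * n_hidden
    return total

def linear_params(n_in, n_out):
    return n_in * n_out + n_out  # weight matrix + bias

def forecaster_params(n_features, n_hidden, n_outputs, sequence_len, n_deep_layers, n_lstm_layers=1):
    """Parameter count of the LSTMForecaster architecture."""
    total = lstm_params(n_features, n_hidden, n_lstm_layers)
    total += linear_params(n_hidden * sequence_len, n_hidden)         # fc1 on the flattened LSTM output
    total += (n_deep_layers - 1) * linear_params(n_hidden, n_hidden)  # hidden dense layers
    total += linear_params(n_hidden, n_outputs)                       # output layer
    return total

total = forecaster_params(n_features=1, n_hidden=50, n_outputs=1, sequence_len=180, n_deep_layers=5)
fc1 = linear_params(50 * 180, 50)
```

With ninp=1, nhid=50, nout=1, sequence_len=180 and n_dnn_layers=5, the total is 470,901 parameters, of which fc1 alone holds 450,050 because it sees all 180 LSTM outputs at once. Doubling sequence_len therefore roughly doubles the model size, while each extra hidden dense layer adds only 2,550 parameters.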

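Model Training

With the model defined, we can choose a loss function and an optimizer, set the learning rate and number of epochs, and start the training loop. Because this is a regression problem (we are predicting a continuous value), the simplest and safest loss function is mean squared error (MSE). It gives a simple, well-understood measure of the error between the actual values and the model's predictions.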
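With its default reduction='mean', nn.MSELoss averages the squared differences between predictions and targets. The plain-Python sketch below (mse is a hypothetical helper name) makes the squaring explicit: large errors are penalized quadratically, which is also why MSE is sensitive to outliers.

```python
def mse(preds, targets):
    """Mean squared error: the average of (prediction - target)**2 over all elements."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("preds and targets must be non-empty and the same length")
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

loss = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # 4/3: the single error of 2 contributes 2**2 = 4
```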

The optimizer and loss function are as follows:

 # Set learning rate and number of epochs to train over
 lr = 4e-4
 n_epochs = 20

 # Initialize the loss function and optimizer
 criterion = nn.MSELoss().to(device)
 optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

Below is the training loop. In each epoch, we compute the loss on both the training set and the validation set created earlier:

# Lists to store training and validation losses
t_losses, v_losses = [], []
# Loop over epochs
for epoch in range(n_epochs):
  train_loss, valid_loss = 0.0, 0.0

  # train step
  model.train()
  # Loop over train dataset
  for x, y in trainloader:
    optimizer.zero_grad()
    # move inputs to device
    x = x.to(device)
    y = y.squeeze().to(device)
    # Forward Pass
    preds = model(x).squeeze()
    loss = criterion(preds, y) # compute batch loss
    train_loss += loss.item()
    loss.backward()
    optimizer.step()
  epoch_loss = train_loss / len(trainloader)
  t_losses.append(epoch_loss)

  # validation step
  model.eval()
  # Loop over validation dataset
  for x, y in testloader:
    with torch.no_grad():
      x, y = x.to(device), y.squeeze().to(device)
      preds = model(x).squeeze()
      error = criterion(preds, y)
    valid_loss += error.item()
  valid_loss = valid_loss / len(testloader)
  v_losses.append(valid_loss)

  print(f'{epoch} - train: {epoch_loss}, valid: {valid_loss}')
plot_losses(t_losses, v_losses)  # plotting helper from the original code; not shown in this article

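The model is now trained, and we can evaluate its predictions.

Inference

We run the trained model on unshuffled data (for example, a DataLoader over the full dataset with shuffle=False) and compare how far the predictions are from the actual observations.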

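def make_predictions_from_dataloader(model, unshuffled_dataloader):
  model.eval()
  device = next(model.parameters()).device  # run inputs on the model's device
  predictions, actuals = [], []
  for x, y in unshuffled_dataloader:
    with torch.no_grad():
      p = model(x.to(device)).cpu()  # bring predictions back to the CPU for NumPy
      predictions.append(p)
      actuals.append(y.squeeze(-1))  # (batch, 1) -> (batch,), safe even for a batch of one
  predictions = torch.cat(predictions).numpy()
  actuals = torch.cat(actuals).numpy()
  return predictions.squeeze(), actuals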
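make_predictions_from_dataloader returns values on the standardized scale. One way to report the error in dollars per barrel, not part of the original article, is to compute the root mean squared error (RMSE) and scale it back: because x = z * sigma + mu, the mean cancels in each difference and every error is multiplied by sigma. With the fitted scaler, sigma would be scalers['dcoilwtico'].scale_[0]. The helper names and the numbers below are hypothetical toy values.

```python
import math

def rmse(preds, actuals):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(preds, actuals)) / len(preds))

def rmse_in_price_units(preds_z, actuals_z, sigma):
    """RMSE after undoing x = z * sigma + mu; mu cancels, so errors scale by sigma."""
    return rmse(preds_z, actuals_z) * sigma

preds_z, actuals_z = [0.0, 1.0], [0.0, 0.0]  # toy standardized values
mu, sigma = 50.0, 2.0                         # toy scaler statistics

# Same number computed two ways: unstandardize first, or scale the standardized RMSE
direct = rmse([z * sigma + mu for z in preds_z], [z * sigma + mu for z in actuals_z])
scaled = rmse_in_price_units(preds_z, actuals_z, sigma)
```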

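Our predictions look pretty good! They track the actual values reasonably well, which suggests we have not overfit the model. Let's see whether we can use it to forecast the future.

Forecasting

If we define history as the series up to the moment we forecast from, the algorithm is simple:

  1. Take the latest valid sequence (of training-window length) from the history.
  2. Feed that sequence into the model and predict the next value.
  3. Append the predicted value to the history.
  4. Repeat from step 1.

Note that, depending on the parameters chosen when training the model, the further ahead you forecast, the more the model tends to reveal its own bias and start predicting the mean. We therefore don't want to forecast further ahead than necessary, because doing so hurts forecast accuracy.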
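The four-step algorithm above can be sketched in plain Python with any one-step predictor. As a hypothetical stand-in for the trained LSTM, the "model" here is simply the mean of the window; this is not the article's model, but it makes the effect described in the previous paragraph easy to see: once the window contains only the model's own outputs, the forecast flattens out.

```python
from statistics import fmean

def recursive_forecast(predict_next, history, tw, n):
    """Forecast n steps by repeatedly feeding predictions back in as inputs.

    predict_next: any callable mapping the last tw values to the next value
    """
    values = list(history)
    forecasts = []
    for _ in range(n):
        window = values[-tw:]          # 1. latest valid sequence of length tw
        nxt = predict_next(window)     # 2. predict the next value
        values.append(nxt)             # 3. append the prediction to the history
        forecasts.append(nxt)          # 4. repeat from step 1
    return forecasts

# Toy stand-in for the trained model: the mean of the window (hypothetical)
forecast = recursive_forecast(fmean, [1, 2, 3, 4, 5, 6], tw=3, n=10)
```

The first forecasts are 5.0 and about 5.33, and by the last few steps consecutive forecasts differ by less than 0.01, even though the input series was rising steadily.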

This is implemented in the functions below:

import numpy as np
import pandas as pd
import torch

def one_step_forecast(model, history):
  '''
  model: PyTorch model object
  history: a sequence of values representing the latest values of the time
  series, requirement -> len(history.shape) == 2

  outputs a single value which is the prediction of the next value in the
  sequence.
  '''
  model.eval()
  device = next(model.parameters()).device  # keep the model where it is
  with torch.no_grad():
    pre = torch.Tensor(history).unsqueeze(0).to(device)  # shape (1, tw, 1)
    pred = model(pre)
  return pred.cpu().numpy().reshape(-1)

def n_step_forecast(model, data: pd.DataFrame, target: str, tw: int, n: int, forecast_from: int=None, plot=False):
  '''
  model: trained PyTorch model
  data: DataFrame containing the (normalized) series
  target: name of the column to forecast
  tw: training window used when training the model
  n: integer defining how many steps to forecast
  forecast_from: integer defining which index to forecast from. None if
  you want to forecast from the end.
  plot: True if you want to output a plot of the forecast, False if not
  (plotting is not implemented in this snippet).
  '''
  history = data[target].copy().to_frame()

  # Create initial sequence input based on where in the series to forecast
  # from.
  if forecast_from is not None:
    pre = list(history[target].iloc[forecast_from - tw : forecast_from].values)
  else:
    pre = list(history[target].values[-tw:])

  # Call one_step_forecast n times and append prediction to history
  for _ in range(n):
    pre_ = np.array(pre[-tw:], dtype=float).reshape(-1, 1)
    forecast = one_step_forecast(model, pre_).item()
    pre.append(forecast)

  # The rest of this is just to add the forecast to the correct time of
  # the history series
  res = history.copy()
  ls = [np.nan for _ in range(len(history))]

  # Note: I have not handled the edge case where the start index + n is
  # before the end of the dataset and crosses past it.
  if forecast_from is not None:
    ls[forecast_from : forecast_from + n] = list(np.array(pre[-n:]))
    res['forecast'] = ls
    res.columns = ['actual', 'forecast']
  else:
    fc = ls + list(np.array(pre[-n:]))
    ls = ls + [np.nan for _ in range(len(pre[-n:]))]
    ls[:len(history)] = history[target].values
    res = pd.DataFrame([ls, fc], index=['actual', 'forecast']).T
  return res

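Let's look at how it actually performs.

We forecast from several different points in the middle of this series so that we can compare the forecasts with what actually happened. Our forecasting routine can forecast any reasonable number of steps from any point in the series; the red line shows the forecast. (The charts show standardized prices on the y-axis.)

Forecasting 200 days from Q3 2013

Forecasting 200 days from 2014/15

Forecasting 200 days from Q1 2016

Forecasting 200 days from the last day of the data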

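Summary

Our model's performance is only middling! But this example walks through the complete time series forecasting process, and we can improve the model and make its forecasts more accurate by experimenting with the architecture and hyperparameters.

This article only deals with univariate time series, where there is a single sequence of values. There are also methods that use multiple series to make forecasts; this is called multivariate time series forecasting, which I will cover in a future article.

The code for this article is here:

https://avoid.overfit.cn/post/3c8a4160c79041ed8d89b18738f65058

Author: Zain Baquar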
