How do I create LSTM input samples from a pandas DataFrame?

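I am trying to build an LSTM model that gives a binary output: buy or don't buy. My data has the format [date_time, close, volume] and runs to millions of rows. I am having trouble shaping it into the 3-D form (samples, time steps, features).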

I have read the data in with pandas. I want to format it as 4000 samples, each with 400 time steps and two features (close and volume). Can someone tell me how to do this?
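As a minimal sketch of the shape being asked for, the block below cuts a DataFrame into non-overlapping windows with plain numpy. The column names `close` and `volume`, the helper `make_windows`, and the synthetic data are assumptions made for illustration; they are not taken from the asker's code. Overlapping (sliding) windows would need a different approach.

```python
import numpy as np
import pandas as pd


def make_windows(values, time_steps, n_samples=None):
    """Cut a (rows, features) array into non-overlapping windows.

    Returns an array of shape (samples, time_steps, features).
    """
    values = np.asarray(values, dtype="float32")
    max_samples = len(values) // time_steps
    if n_samples is None:
        n_samples = max_samples
    if n_samples > max_samples:
        raise ValueError("not enough rows for the requested number of samples")
    # Keep only the rows that fill whole windows, then fold them into 3-D.
    usable = values[: n_samples * time_steps]
    return usable.reshape(n_samples, time_steps, values.shape[1])


# Synthetic stand-in for the real data (column names are assumptions).
rng = np.random.default_rng(0)
n_rows = 1_700_000
df = pd.DataFrame({
    "close": rng.random(n_rows),
    "volume": rng.random(n_rows),
})

X = make_windows(df[["close", "volume"]].to_numpy(), time_steps=400, n_samples=4000)
print(X.shape)  # (4000, 400, 2)
```

With 4000 samples of 400 steps each, this layout consumes 1,600,000 rows; any rows beyond that are simply left unused.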

Edit: I used TimeseriesGenerator as suggested, but I am not sure how to inspect my sequences or how to replace the output Y with my own binary buy signal.

    df = normalize_data(df)
    print("Creating sequences for the neural network\n")
    targets = df.drop('date_time', axis=1)
    train = keras.preprocessing.sequence.TimeseriesGenerator(
        df, targets, 1, sampling_rate=1, stride=1,
        start_index=0, end_index=int(len(df.index) * 0.8),
        shuffle=True, reverse=False, batch_size=time_steps)

This code runs without errors, but the output is now the first close value that follows the input time series.

Edit 2: My code so far looks like this:

    df = data.normalize_data(df)
    targets = df.iloc[:, 3]  # buy-signal targets
    df.drop('y1', axis=1, inplace=True)
    df.drop('y2', axis=1, inplace=True)
    train = TimeseriesGenerator(
        df, targets, length=1, sampling_rate=1, stride=1,
        start_index=0, end_index=int(len(df.index) * 0.8),
        shuffle=True, reverse=False, batch_size=time_steps)
    # number of samples
    print("Number of samples: " + str(len(train)))
    x, y = train[0]
    print(str(x))

The output is:

    Number of samples: 8
    Traceback (most recent call last):
      File "/home/stian/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
        return self._engine.get_loc(key)
      File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
      File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
      File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
      File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
    KeyError: range(418, 419)

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "./main.py", line 94, in <module>
        data_menu()
      File "./main.py", line 42, in data_menu
        data_menu()
      File "./main.py", line 56, in data_menu
        nn_menu()
      File "./main.py", line 76, in nn_menu
        nn.nn_gen(pre_processed_data)
      File "/home/stian/git/stian9k/nn.py", line 33, in nn_gen
        x, y = train[0]
      File "/home/stian/.local/lib/python3.6/site-packages/keras_preprocessing/sequence.py", line 378, in __getitem__
        samples[j] = self.data[indices]
      File "/home/stian/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2688, in __getitem__
        return self._getitem_column(key)
      File "/home/stian/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
        return self._get_item_cache(key)
      File "/home/stian/.local/lib/python3.6/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache
        values = self._data.get(item)
      File "/home/stian/.local/lib/python3.6/site-packages/pandas/core/internals.py", line 4115, in get
        loc = self.items.get_loc(item)
      File "/home/stian/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
        return self._engine.get_loc(self._maybe_cast_indexer(key))
      File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
      File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
      File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
      File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
    KeyError: range(418, 419)

So although the generator gives me 8 objects, I cannot look them up. If I check the type with print(str(type(train))), I get a TimeseriesGenerator object. Thanks again for any advice.

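Edit 3: It turns out that TimeseriesGenerator does not like pandas DataFrames. Converting the data to a numpy array, and converting the pandas timestamp type to float, solved the problem.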
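The asker did not post the conversion itself, so the following is only a sketch of what it might look like. The tiny DataFrame, its column names, and the choice of "seconds since the Unix epoch" as the float representation are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame; column names are assumptions.
df = pd.DataFrame({
    "date_time": pd.to_datetime(
        ["2018-01-01 00:00", "2018-01-01 00:01", "2018-01-01 00:02"]
    ),
    "close": [10.0, 10.5, 10.2],
    "volume": [100, 150, 120],
})

# Convert timestamps to float seconds since the Unix epoch.
epoch = pd.Timestamp("1970-01-01")
df["date_time"] = (df["date_time"] - epoch) / pd.Timedelta(seconds=1)

# A plain float array can be indexed by position with range(...) objects,
# which is the indexing that failed on the DataFrame in the traceback above.
data = df.to_numpy(dtype="float64")
print(data.dtype, data.shape)
```

The KeyError in the traceback arises because indexing a DataFrame with `df[...]` looks up column labels, whereas a numpy array interprets the same index as row positions.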

Answer:

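You can simply use Keras's TimeseriesGenerator for this. It lets you set the number of time steps per sample (the length), as well as the sampling rate and the stride used to subsample the data.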

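It returns an instance of the Sequence class, which you can then pass to fit_generator (or, in newer Keras versions, directly to fit) to fit the model on the data it generates. I strongly recommend reading the documentation for more information about this class, its parameters, and how to use it.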
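To make the pairing of inputs and targets concrete, here is a pure-numpy sketch that mimics the alignment TimeseriesGenerator uses with sampling_rate=1 and stride=1: the window covers rows i-length up to i-1, and the target is taken from row i. The helper `windows_with_targets` and the "buy if the next close is higher than the last one seen" rule are hypothetical, chosen only for illustration; they are not the asker's labels and not part of Keras. Note that in the question's snippets `length` is 1 and `batch_size` is `time_steps`, so each window there holds a single row; for 400-step samples, `length` would be the parameter to set to 400.

```python
import numpy as np


def windows_with_targets(data, targets, length):
    """Pair each window data[i-length:i] with targets[i].

    This imitates the input/target alignment of TimeseriesGenerator
    for sampling_rate=1 and stride=1; it is not the Keras implementation.
    """
    data = np.asarray(data, dtype="float32")
    targets = np.asarray(targets)
    if len(data) != len(targets):
        raise ValueError("data and targets must have the same length")
    idx = np.arange(length, len(data))
    X = np.stack([data[i - length:i] for i in idx])
    y = targets[idx]
    return X, y


# Toy series with two features: close and volume.
close = np.array([1.0, 2.0, 1.5, 3.0, 2.5, 4.0])
volume = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
data = np.column_stack([close, volume])

# Hypothetical binary label: 1 if close[i] is above close[i - 1], else 0.
buy = np.zeros(len(close), dtype="int64")
buy[1:] = (close[1:] > close[:-1]).astype("int64")

X, y = windows_with_targets(data, buy, length=3)
print(X.shape, y)  # (3, 3, 2) [1 0 1]
```

Because the target array is supplied separately from the data, any binary buy signal of the same length can be substituted for `buy`; this is also why passing the price columns as targets yields the close value that follows each window.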
