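I'm new to machine learning and am trying to predict the lira exchange rate with Keras. I think the values are correct, but I can't plot them properly. It looks like this: [image]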
Here is my code (the CSV file is in German, so here is the translation: Datum -> Date, Erster -> Open, Hoch -> High, Tief -> Low, Schlusskurs -> Close):
import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

X_train = []
y_train = []

csv_file = "wkn_A0C32V_historic.csv"  # CSV file (path)
data = pd.read_csv(csv_file, sep=";")  # read the CSV file
data["Erster vorher"] = data["Erster"].shift(-1)  # shift the Erster (open) column by one step
data["Erster"] = data["Erster"].str.replace(",", ".")  # replace all commas with dots so the values can be used as floats
data["Erster vorher"] = data["Erster vorher"].str.replace(",", ".")  # same as above
data["Changes"] = (data["Erster"].astype(float) / data["Erster vorher"].astype(float)) - 1  # compute the changes
data = data.dropna()  # drop NaN values
changes = data["Changes"]

# X_train = (number of samples, sequence length, input dimension)
for i in range(len(changes) - 20):
    X_train.append(np.array(changes[i+1:i+21][::-1]))
    y_train.append(changes[i])

X_train = np.array(X_train).reshape(-1, 20, 1)
y_train = np.array(y_train)
print("X_train shape: " + str(X_train.shape))
print("y_train shape: " + str(y_train.shape))

# training
model = Sequential()
model.add(LSTM(1, input_shape=(20, 1)))
model.compile(optimizer="rmsprop", loss="mse", metrics=["accuracy"])
model.fit(X_train, y_train, batch_size=32, epochs=10)

preds = model.predict(X_train)
preds = preds.reshape(-1)
print("Shape of predictions: " + str(preds.shape))
preds = np.append(preds, np.zeros(20))
data["predictions"] = preds
data["Open_predicted"] = data["Erster vorher"].astype(float) * (1 + data["predictions"].astype(float))  # compute the new open price from the predicted values
print(data)

import matplotlib.pyplot as plt
dates = np.array(data["Datum"]).astype(np.datetime64)
# the problem starts here...
plt.plot(dates, data["Erster"], label="Erster")
plt.plot(dates, data["Open_predicted"], label="Erster (predicted)")
plt.legend()
plt.show()
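The loop above turns the series of changes into LSTM input: sample i holds the 20 rows after position i (in this newest-first file, the 20 days before day i), reversed into chronological order, with changes[i] as the target. It yields len(changes) - 20 samples, which is why preds is 20 elements shorter than data and gets padded with zeros. A small sketch of the same construction, assuming a toy series and a window of 3 instead of 20 (the helper name make_windows is hypothetical):

```python
import numpy as np
import pandas as pd

def make_windows(changes: pd.Series, window: int):
    """Build (samples, timesteps, 1) inputs and matching targets.

    Mirrors the question's loop: rows are assumed newest-first, so the
    `window` rows after position i are the days before day i; they are
    reversed so each sample runs oldest -> newest.
    """
    values = changes.to_numpy()  # positional access, independent of index labels
    X, y = [], []
    for i in range(len(values) - window):
        X.append(values[i + 1:i + 1 + window][::-1])
        y.append(values[i])
    X = np.array(X).reshape(-1, window, 1)
    return X, np.array(y)

# Toy series, newest first (hypothetical values)
toy = pd.Series([0.5, 0.4, 0.3, 0.2, 0.1])
X, y = make_windows(toy, window=3)
print(X.shape)       # (2, 3, 1)
print(X[0].ravel())  # [0.2 0.3 0.4]
print(y)             # [0.5 0.4]
```

Using to_numpy() here is a design choice: the question's changes[i] is a label lookup, which can misalign or fail once dropna() leaves gaps in the index, whereas the slice changes[i+1:i+21] is positional.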
Output:
Epoch 9/10
  32/3444 [..............................] - ETA: 0s - loss: 9.5072e-05 - accuracy: 0.1250
 448/3444 [==>...........................] - ETA: 0s - loss: 1.8344e-04 - accuracy: 0.0513
 960/3444 [=======>......................] - ETA: 0s - loss: 1.2734e-04 - accuracy: 0.0583
1472/3444 [===========>..................] - ETA: 0s - loss: 1.0480e-04 - accuracy: 0.0577
1984/3444 [================>.............] - ETA: 0s - loss: 9.7956e-05 - accuracy: 0.0600
2464/3444 [====================>.........] - ETA: 0s - loss: 9.0399e-05 - accuracy: 0.0621
2976/3444 [========================>.....] - ETA: 0s - loss: 8.5287e-05 - accuracy: 0.0649
3444/3444 [==============================] - 0s 122us/step - loss: 8.1555e-05 - accuracy: 0.0633
Epoch 10/10
  32/3444 [..............................] - ETA: 0s - loss: 5.5561e-05 - accuracy: 0.0312
 544/3444 [===>..........................] - ETA: 0s - loss: 6.1705e-05 - accuracy: 0.0662
1056/3444 [========>.....................] - ETA: 0s - loss: 1.2215e-04 - accuracy: 0.0644
1536/3444 [============>.................] - ETA: 0s - loss: 9.9676e-05 - accuracy: 0.0651
2048/3444 [================>.............] - ETA: 0s - loss: 9.2219e-05 - accuracy: 0.0625
2592/3444 [=====================>........] - ETA: 0s - loss: 8.8050e-05 - accuracy: 0.0625
3104/3444 [==========================>...] - ETA: 0s - loss: 8.1685e-05 - accuracy: 0.0651
3444/3444 [==============================] - 0s 118us/step - loss: 8.1349e-05 - accuracy: 0.0633
Shape of predictions: (3444,)
           Datum  Erster    Hoch  ...   Changes  predictions  Open_predicted
0     2020-09-04  8.8116  8,8226  ...  0.011816     0.000549        8.713479
1     2020-09-03  8.7087  8,8263  ... -0.006457     0.001141        8.775301
2     2020-09-02  8.7653  8,7751  ... -0.005051     0.001849        8.826093
3     2020-09-01  8.8098  8,8377  ...  0.009465     0.001102        8.736818
4     2020-08-31  8.7272  8,7993  ...  0.000069     0.001149        8.736630
...          ...     ...     ...  ...       ...          ...             ...
3459  2009-01-07  2.0449  2,1288  ... -0.021392     0.000000        2.089600
3460  2009-01-06  2.0896  2,0922  ... -0.020622     0.000000        2.133600
3461  2009-01-05  2.1336  2,1477  ...  0.002914     0.000000        2.127400
3462  2009-01-04  2.1274  2,1323  ... -0.005377     0.000000        2.138900
3463  2009-01-02  2.1389  2,1521  ...  0.000000     0.000000        2.138900

[3464 rows x 9 columns]
Answer:
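Two things stand out in the plot: (1) Erster and Erster (predicted) appear to be on different orders of magnitude, and (2) the large number of labels on the y-axis is reminiscent of what you get when plotting datetimes rather than numbers. My guess is that something is getting mixed up somewhere, but it isn't obvious where.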
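One plausible source of the mix-up, visible in the question's code: .str.replace(",", ".") returns strings, and data["Erster"] is only converted with .astype(float) inline, never stored back as floats. Matplotlib plots string values as categories, with one tick label per distinct value, which would match the crowded y-axis. A minimal check on made-up values in the CSV's decimal-comma format:

```python
import pandas as pd

# Hypothetical rows in the CSV's format (German decimal commas)
df = pd.DataFrame({"Erster": ["8,8116", "8,7087", "8,7653"]})

df["Erster"] = df["Erster"].str.replace(",", ".")
# Still strings such as "8.8116"; matplotlib would treat them as categories
print(pd.api.types.is_numeric_dtype(df["Erster"]))  # False

df["Erster"] = df["Erster"].astype(float)
# Now real floats that plot on a numeric axis
print(pd.api.types.is_numeric_dtype(df["Erster"]))  # True
```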
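My troubleshooting suggestions: (i) plot Erster against Erster (predicted) and check whether they are on a similar scale, and (ii) print the output of data.info() and check that the data types are what you expect.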
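A rough sketch of both checks on a small hypothetical frame that mimics two of the question's columns (values copied from the printed output, with Erster deliberately left as strings, as the question's code leaves it). info() lists each column's dtype, and describe() summarizes only numeric columns by default, so a column missing from its output is a hint that it isn't numeric:

```python
import pandas as pd

# Hypothetical frame mimicking the question's columns after processing
data = pd.DataFrame({
    "Erster": ["8.8116", "8.7087", "8.7653"],          # strings, as in the question
    "Open_predicted": [8.713479, 8.775301, 8.826093],  # floats
})

# (ii) dtype of every column: Erster shows up as non-numeric
data.info()

# Only numeric columns are summarized, so Erster is missing here
print(data.describe())
numeric_cols = data.select_dtypes("number").columns.tolist()
print(numeric_cols)  # ['Open_predicted']

# (i) after converting, compare the ranges of the two series
data["Erster"] = data["Erster"].astype(float)
print(data[["Erster", "Open_predicted"]].agg(["min", "max"]))
```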
P.S.: I'd suggest sorting the DataFrame so that the dates are in ascending order.
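A sketch of loading and sorting in one go, assuming the file uses ISO dates (as the printed Datum column suggests) and semicolons with decimal commas; the sample lines below are reconstructed from the output for illustration. read_csv's decimal="," parses the numbers as floats directly, so the str.replace step is no longer needed. Note that once the rows are oldest-first, the previous day's open becomes shift(1) rather than shift(-1), and the window loop would need to be adjusted accordingly.

```python
import io
import pandas as pd

# A few hypothetical lines in the CSV's format (semicolons, decimal commas,
# newest first); in the question this would be "wkn_A0C32V_historic.csv".
csv_text = """Datum;Erster;Hoch
2020-09-04;8,8116;8,8226
2020-09-03;8,7087;8,8263
2020-09-02;8,7653;8,7751
"""

# decimal="," yields float columns; parse_dates turns Datum into datetimes
data = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",", parse_dates=["Datum"])

# Sort oldest -> newest and renumber the index
data = data.sort_values("Datum").reset_index(drop=True)

# In ascending order the previous day's open is shift(1), not shift(-1)
data["Erster vorher"] = data["Erster"].shift(1)
data["Changes"] = data["Erster"] / data["Erster vorher"] - 1
print(data.dtypes)
print(data)
```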