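Two months ago I started using Keras to learn the patterns of a pump so that I could use them in other software.
I don't know why the patterns I get bear no resemblance to the real ones. I have tried the dataset with only a few features (inputs) and with more inputs, but it doesn't work either way. The results look like this:
Where:
- Blue: the dataset (the real data I am trying to "approximate")
- Orange: the prediction
The dataset is a time series.
Here is the csv file with the dataset
Here is the code: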
    import numpy
    import matplotlib.pyplot as plt
    import pandas
    import math
    from keras.models import Sequential
    from keras.layers import Dense, LSTM, Dropout
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.metrics import mean_squared_error
    from keras.regularizers import l2, activity_l2


    def create_dataset(dataset, look_back=1):
        dataX, dataY = [], []
        for i in range(len(dataset) - look_back - 1):
            a = dataset[i:(i + look_back), 0:4]
            dataX.append(a)
            dataY.append(dataset[i + look_back, 4])
        return numpy.array(dataX), numpy.array(dataY)


    # fix random seed for reproducibility
    seed = 7
    numpy.random.seed(seed)

    # load dataset
    dataframe = pandas.read_csv('datos_horarios.csv', engine='python')
    dataset = dataframe.values

    # normalize the dataset
    scaler = MinMaxScaler(feature_range=(0, 1))
    dataset = scaler.fit_transform(dataset)

    # split data into train data and test data
    train_size = int(len(dataset) * 0.67)
    test_size = len(dataset) - train_size
    train, test = dataset[0:train_size, :], dataset[train_size:len(dataset), :]

    # reshape to X=t and Y=t+1
    look_back = 1
    trainX, trainY = create_dataset(train, look_back)
    testX, testY = create_dataset(test, look_back)

    # reshape inputs to be [samples, time steps, features]
    trainX = numpy.reshape(trainX, (trainX.shape[0], look_back, 4))
    testX = numpy.reshape(testX, (testX.shape[0], look_back, 4))

    # create and fit the LSTM network
    model = Sequential()
    model.add(Dropout(0.3, input_shape=(look_back, 4)))
    model.add(LSTM(6, input_shape=(look_back, 4), W_regularizer=l2(0.001)))
    model.add(Dense(10))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', momentum=0.99)
    history = model.fit(trainX, trainY, validation_split=0.33, nb_epoch=250, batch_size=32)

    # plot the training and validation loss
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epochs')
    plt.legend(['training', 'validation'], loc='upper right')
    plt.show()

    # make predictions
    trainPredict = model.predict(trainX)
    testPredict = model.predict(testX)
    print(trainPredict)

    numero_inputs = 4
    inp = numero_inputs - 1

    # Get something which has as many features as dataset
    trainPredict_extended = numpy.zeros((len(trainPredict), numero_inputs + 1))
    # Put the predictions there
    trainPredict_extended[:, inp + 1] = trainPredict[:, 0]
    # Inverse transform it and select the output column (index inp + 1)
    trainPredict = scaler.inverse_transform(trainPredict_extended)[:, inp + 1]

    # Get something which has as many features as dataset
    testPredict_extended = numpy.zeros((len(testPredict), numero_inputs + 1))
    # Put the predictions there
    testPredict_extended[:, inp + 1] = testPredict[:, 0]
    # Inverse transform it and select the output column (index inp + 1)
    testPredict = scaler.inverse_transform(testPredict_extended)[:, inp + 1]

    trainY_extended = numpy.zeros((len(trainY), numero_inputs + 1))
    trainY_extended[:, inp + 1] = trainY
    trainY = scaler.inverse_transform(trainY_extended)[:, inp + 1]

    testY_extended = numpy.zeros((len(testY), numero_inputs + 1))
    testY_extended[:, inp + 1] = testY
    testY = scaler.inverse_transform(testY_extended)[:, inp + 1]

    # compute the root mean squared error
    trainScore = math.sqrt(mean_squared_error(trainY, trainPredict))
    print('Train Score: %.2f RMSE' % (trainScore))
    testScore = math.sqrt(mean_squared_error(testY, testPredict))
    print('Test Score: %.2f RMSE' % (testScore))

    # add train predictions to the plot
    trainPredictPlot = numpy.empty_like(dataset)
    trainPredictPlot[:, :] = numpy.nan
    trainPredictPlot[look_back:len(trainPredict) + look_back, 0] = trainPredict

    # add test predictions to the plot
    testPredictPlot = numpy.empty_like(dataset)
    testPredictPlot[:, :] = numpy.nan
    testPredictPlot[len(trainPredict) + (look_back * 2) + 1:len(dataset) - 1, 0] = testPredict

    # Plot real data and training and test predictions
    # convert the (0, 1)-scaled samples back to real values and plot them
    serie, = plt.plot(scaler.inverse_transform(dataset)[:, numero_inputs])
    # plot the training predictions
    entrenamiento, = plt.plot(trainPredictPlot[:, 0], linestyle='--')
    prediccion_test, = plt.plot(testPredictPlot[:, 0], linestyle='--')
    plt.ylabel(' (m3)')
    plt.xlabel('h')
    plt.legend([serie, entrenamiento, prediccion_test], ['Time series', 'Training', 'Prediction'], loc='upper right')
    plt.show()
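To make it easier to see what the network is actually fed, here is a minimal pure-Python sketch of the windowing that `create_dataset` performs. The name `make_windows` and the four toy rows are hypothetical and only for illustration; the loop bounds and column indices are copied from the code above.

```python
def make_windows(rows, look_back=1, n_features=4):
    """Pure-Python mirror of create_dataset (hypothetical helper name)."""
    xs, ys = [], []
    # Same loop bound as the original: len(rows) - look_back - 1
    for i in range(len(rows) - look_back - 1):
        # Inputs: the first n_features columns of look_back consecutive rows
        xs.append([row[:n_features] for row in rows[i:i + look_back]])
        # Target: the last column of the row right after the window
        ys.append(rows[i + look_back][n_features])
    return xs, ys


# Toy rows (made up): [time, weekday, month, day, pumped volume]
rows = [
    [0.0, 1, 1, 1, 10.0],
    [0.5, 1, 1, 1, 11.0],
    [1.0, 1, 1, 1, 12.0],
    [1.5, 1, 1, 1, 13.0],
]

xs, ys = make_windows(rows, look_back=1)
print(len(xs), ys)
```

With `look_back = 1`, each sample holds the four calendar columns of a single row, and the target is the pumped volume of the following row. The extra `- 1` in the loop bound also means the last possible input/target pair is never used.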
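Do you have any idea how to solve this? Or, at least, what the problem is?
Inputs, by column:
- Time of day (every half hour), converted to a decimal.
- Day of the week (1 = Monday … 7 = Sunday)
- Month of the year (1-12)
- Day of the month (1-31)
Output:
- Volume of water pumped (m3)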
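For reference, the four input columns could be derived from a timestamp roughly as follows. This is only a sketch: the function name `calendar_features` is hypothetical, and it assumes "converted to a decimal" means hours plus a fraction of an hour (13:30 becomes 13.5), which the csv file may or may not follow.

```python
from datetime import datetime


def calendar_features(ts):
    """Return [time of day, weekday, month, day of month] for one timestamp."""
    return [
        ts.hour + ts.minute / 60.0,  # assumed decimal encoding: 13:30 -> 13.5
        ts.isoweekday(),             # 1 = Monday ... 7 = Sunday
        ts.month,                    # 1-12
        ts.day,                      # 1-31
    ]


# Example timestamp (made up): Wednesday 15 March 2017, 13:30
sample = calendar_features(datetime(2017, 3, 15, 13, 30))
print(sample)  # [13.5, 3, 3, 15]
```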
Edit: Using the code from @ and changing some parameters, such as the number of epochs or the history value, the results are very good:
Answer: