How do I predict future data (rainfall, in my case) after training a model with scikit-learn and pandas?

I am training a model to predict future rainfall, and the training itself is done. The dataset I'm using is: https://www.kaggle.com/redikod/historical-rainfall-data-in-bangladesh

The dataset looks like this:

                StationIndex  Station  Year  Month  Day  Rainfall  dayofyear
    1970-01-01             1    Dhaka  1970      1    1         0          1
    1970-01-02             1    Dhaka  1970      1    2         0          2
    1970-01-03             1    Dhaka  1970      1    3         0          3
    1970-01-04             1    Dhaka  1970      1    4         0          4
    1970-01-05             1    Dhaka  1970      1    5         0          5

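Using reference code I found online, I have trained the model on a train/test split and compared the predicted values against the true values.

Here is the code: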

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    import tensorflow as tf

    # data is in local folder
    df = pd.read_csv("data.csv")
    df.head(5)

    # drop rows for calendar days that do not exist (e.g. Feb 30, Apr 31)
    df.drop(df[(df['Day']>28) & (df['Month']==2) & (df['Year']%4!=0)].index,inplace=True)
    df.drop(df[(df['Day']>29) & (df['Month']==2) & (df['Year']%4==0)].index,inplace=True)
    df.drop(df[(df['Day']>30) & ((df['Month']==4)|(df['Month']==6)|(df['Month']==9)|(df['Month']==11))].index,inplace=True)

    # build a datetime index and derive the day of the year from it
    date = [str(y)+'-'+str(m)+'-'+str(d) for y, m, d in zip(df.Year, df.Month, df.Day)]
    df.index = pd.to_datetime(date)
    df['date'] = df.index
    df['dayofyear']=df['date'].dt.dayofyear
    df.drop('date',axis=1,inplace=True)
    df.head()
    df.size  # an attribute, not a method: df.size() raises TypeError
    df.info()
    df.plot(x='Year',y='Rainfall',style='.', figsize=(15,5))

    # train on 1970-2015, test on 2016, Dhaka station only
    train = df.loc[df['Year'] <= 2015]
    test = df.loc[df['Year'] == 2016]
    train=train[train['Station']=='Dhaka']
    test=test[test['Station']=='Dhaka']

    X_train=train.drop(['Station','StationIndex','dayofyear'],axis=1)
    Y_train=train['Rainfall']
    X_test=test.drop(['Station','StationIndex','dayofyear'],axis=1)
    Y_test=test['Rainfall']

    from sklearn import svm
    from sklearn.svm import SVC
    model = svm.SVC(gamma='auto',kernel='linear')
    model.fit(X_train, Y_train)
    Y_pred = model.predict(X_test)

    df1 = pd.DataFrame({'Actual Rainfall': Y_test, 'Predicted Rainfall': Y_pred})
    df1[df1['Predicted Rainfall']!=0].head(10)

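After this, I tried to actually use the model to predict rainfall for the coming days, months, or years. I tried a few approaches, such as ones used for stock price prediction (after adapting the code), but none of them seemed to work. Since the model is already trained, I assumed predicting future days would be easy. For example, I trained on data from 1970 to 2015 and tested on 2016. Now I want to predict the rainfall for 2017, or something along those lines.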

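My question is: how can I do this in an intuitive way?

I would be very grateful if someone could answer this.

Edit @Mercury: This is the actual result after using that code. I suspect the model isn't running at all... Here is a picture of the actual result: https://i.sstatic.net/81Vk1.png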

Answer:

I noticed a very simple mistake here:

    X_train=train.drop(['Station','StationIndex','dayofyear'],axis=1)
    Y_train=train['Rainfall']
    X_test=test.drop(['Station','StationIndex','dayofyear'],axis=1)
    Y_test=test['Rainfall']

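You did not remove the Rainfall column from the training features.

I'd venture a guess that you are getting a perfect 100% accuracy on both the training and the test set, right? This is why. Your model learns that whatever is in the 'Rainfall' column of its input is the answer, so it does the same at test time and gets perfect results, but in reality it isn't predicting anything at all!

Try running it like this: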
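
To make the effect concrete, here is a minimal sketch on made-up data (not the Kaggle dataset). It is an assumption-laden toy: the rainfall values are random and unrelated to the date columns, and I use `LinearRegression` instead of `SVC` only because it is fast and makes the leak obvious. With the target left in the features the test score is essentially perfect; without it, the score collapses to roughly zero.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (an assumption for illustration, not the real dataset).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "Year": rng.integers(1970, 2017, n),
    "Month": rng.integers(1, 13, n),
    "Day": rng.integers(1, 29, n),
    "Rainfall": rng.gamma(2.0, 5.0, n),  # random, unrelated to the date columns
})
train, test = df.iloc[:300], df.iloc[300:]

leaky_cols = ["Year", "Month", "Day", "Rainfall"]  # target left in the features
clean_cols = ["Year", "Month", "Day"]              # target removed

leaky = LinearRegression().fit(train[leaky_cols], train["Rainfall"])
clean = LinearRegression().fit(train[clean_cols], train["Rainfall"])

# score() returns R^2 for regressors
leaky_r2 = leaky.score(test[leaky_cols], test["Rainfall"])
clean_r2 = clean.score(test[clean_cols], test["Rainfall"])

print("R^2 with Rainfall in X:    {:.3f}".format(leaky_r2))
print("R^2 without Rainfall in X: {:.3f}".format(clean_r2))
```

The "leaky" model simply copies its input column to the output, which is also why it cannot be used for future dates: for 2017 there is no Rainfall column to feed it.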

    X_train=train.drop(['Station','StationIndex','dayofyear','Rainfall'],axis=1)
    Y_train=train['Rainfall']
    X_test=test.drop(['Station','StationIndex','dayofyear','Rainfall'],axis=1)
    Y_test=test['Rainfall']

    from sklearn import svm
    model = svm.SVC(gamma='auto',kernel='linear')
    model.fit(X_train, Y_train)

    print('Accuracy on training set: {:.2f}%'.format(100*model.score(X_train, Y_train)))
    print('Accuracy on testing set: {:.2f}%'.format(100*model.score(X_test, Y_test)))
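
Once the features contain only Year, Month and Day, predicting 2017 comes down to building a frame with those same columns for the future dates and calling `predict` on it. The following is a rough sketch under several assumptions: the training data is synthetic (a made-up seasonal curve, not the Dhaka records), the helper `calendar_features` is a hypothetical name of my own, and I swapped in `RandomForestRegressor` because rainfall is a continuous quantity; the same `predict` call would work with your fitted `SVC`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

FEATURES = ["Year", "Month", "Day"]

def calendar_features(dates):
    """Hypothetical helper: turn a DatetimeIndex into Year/Month/Day columns."""
    return pd.DataFrame(
        {
            "Year": np.asarray(dates.year),
            "Month": np.asarray(dates.month),
            "Day": np.asarray(dates.day),
        },
        index=dates,
    )

# Synthetic history standing in for the Dhaka rows (NOT the real data).
rng = np.random.default_rng(0)
history = pd.date_range("2010-01-01", "2016-12-31", freq="D")
X_hist = calendar_features(history)
# Made-up rainfall: wetter in mid-year, clipped so it is never negative.
seasonal = 10 * np.sin(np.pi * np.asarray(history.dayofyear) / 366) ** 2
y_hist = np.clip(seasonal + rng.normal(0, 2, len(history)), 0, None)

# Regressor swapped in for the SVC from the question (an assumption).
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_hist[FEATURES], y_hist)

# Build the same feature columns for every day of 2017 and predict.
future = pd.date_range("2017-01-01", "2017-12-31", freq="D")
X_future = calendar_features(future)
forecast = pd.Series(
    model.predict(X_future[FEATURES]),
    index=future,
    name="Predicted Rainfall",
)
print(forecast.head())
```

Keep in mind that a model fed only calendar columns can at best reproduce the average seasonal pattern it saw in training; it has no information about the actual weather in 2017.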
