I'm using TensorFlow to predict prices on a Kaggle dataset. My neural network is learning, but the cost function is very high and my predictions are far from the actual outputs. I've tried changing the network by adding and removing layers, neurons, and activation functions, and I've experimented a lot with the hyperparameters, but none of that changed much. I don't think the problem is the data: I checked on Kaggle and it's the same dataset most people use.
If you have any idea why my cost is so high and how to reduce it, and could explain it to me, that would be great!
Here is my code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from sklearn.utils import shuffle

df = pd.read_csv(r"C:\Users\User\Documents\TENSORFLOW\Prediction prix\train2.csv", sep=';')
df.head()
df = df.loc[:, ['OverallQual', 'GrLivArea', 'GarageCars', 'TotalBsmtSF', 'FullBath', 'SalePrice']]
df = df.replace(np.nan, 0)
df

%matplotlib inline
plt = sns.pairplot(df)
plt

df = shuffle(df)
df_train = df[0:1000]
df_test = df[1001:1451]

inputX = df_train.drop('SalePrice', 1).as_matrix()
inputX = inputX.astype(int)
inputY = df_train.loc[:, ['SalePrice']].as_matrix()
inputY = inputY.astype(int)
inputX_test = df_test.drop('SalePrice', 1).as_matrix()
inputX_test = inputX_test.astype(int)
inputY_test = df_test.loc[:, ['SalePrice']].as_matrix()
inputY_test = inputY_test.astype(int)

# Parameters
learning_rate = 0.01
training_epochs = 1000
batch_size = 500
display_step = 50
n_samples = inputX.shape[0]

x = tf.placeholder(tf.float32, [None, 5])
y = tf.placeholder(tf.float32, [None, 1])

def add_layer(inputs, in_size, out_size, activation_function=None):
    Weights = tf.Variable(tf.random_normal([in_size, out_size], stddev=0.1))
    biases = tf.Variable(tf.zeros([1, out_size]) + 0.1)
    Wx_plus_b = tf.matmul(inputs, Weights) + biases
    if activation_function is None:
        output = Wx_plus_b
    else:
        output = activation_function(Wx_plus_b)
    return output

l1 = add_layer(x, 5, 3, activation_function=tf.nn.relu)
pred = add_layer(l1, 3, 1)

# Mean squared error
cost = tf.reduce_sum(tf.pow(pred-y, 2))/(2*n_samples)
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = batch_size
        # Loop over all batches
        for i in range(total_batch):
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={x: inputX, y: inputY})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(avg_cost))

    print("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(pred, y)
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy:", accuracy.eval({x: inputX, y: inputY}))
    print(sess.run(pred, feed_dict={x: inputX_test}))
Epoch: 0001 cost= 10142407502702304395526144.000000000
Epoch: 0051 cost= 3256106752.000019550
Epoch: 0101 cost= 3256106752.000019550
Epoch: 0151 cost= 3256106752.000019550
Epoch: 0201 cost= 3256106752.000019550
…
Thanks for your help!
Answer:
I see a few issues with the implementation:
- The inputs are not scaled. Use sklearn's StandardScaler to scale inputX and inputY (and likewise inputX_test and inputY_test) to zero mean and unit standard deviation. You can use inverse_transform to convert the outputs back to the original scale (see the sketch after this list).

    sc = StandardScaler().fit(inputX)
    inputX = sc.transform(inputX)
    inputX_test = sc.transform(inputX_test)
- The batch size is too large; you are passing the whole dataset as a single batch. This may not be the cause of your particular problem, but for better convergence try reducing the batch size. Implement a get_batch() generator function and do the following:

    for batch_X, batch_Y in get_batch(input_X, input_Y, batch_size):
        _, c = sess.run([optimizer, cost], feed_dict={x: batch_X, y: batch_Y})
- If the problem persists, try a smaller weight initialization (standard deviation).
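For the inverse_transform step mentioned in the first point, here is a minimal sketch, assuming sc1 is the StandardScaler fitted on the training targets (as in the working code below) and that it runs inside the training session:

    # The network is trained on sc1-transformed targets, so its predictions
    # come out in scaled (zero-mean, unit-std) units.
    pred_scaled = sess.run(pred, feed_dict={x: inputX_test})
    # Map the predictions back to the original SalePrice scale.
    pred_prices = sc1.inverse_transform(pred_scaled)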
Here is the working code:
from sklearn.preprocessing import StandardScaler  # needed for the scaling below

inputX = df_train.drop('SalePrice', 1).as_matrix()
inputX = inputX.astype(int)
sc = StandardScaler().fit(inputX)
inputX = sc.transform(inputX)
inputY = df_train.loc[:, ['SalePrice']].as_matrix()
inputY = inputY.astype(int)
sc1 = StandardScaler().fit(inputY)
inputY = sc1.transform(inputY)
inputX_test = df_test.drop('SalePrice', 1).as_matrix()
inputX_test = inputX_test.astype(int)
inputX_test = sc.transform(inputX_test)
inputY_test = df_test.loc[:, ['SalePrice']].as_matrix()
inputY_test = inputY_test.astype(int)
inputY_test = sc1.transform(inputY_test)

# Parameters
learning_rate = 0.01
training_epochs = 1000
batch_size = 50
display_step = 50
n_samples = inputX.shape[0]

x = tf.placeholder(tf.float32, [None, 5])
y = tf.placeholder(tf.float32, [None, 1])

def get_batch(inputX, inputY, batch_size):
    # Yield successive mini-batches; a trailing remainder smaller than
    # batch_size is dropped.
    duration = len(inputX)
    for i in range(0, duration // batch_size):
        idx = i * batch_size
        yield inputX[idx:idx + batch_size], inputY[idx:idx + batch_size]

def add_layer(inputs, in_size, out_size, activation_function=None):
    # Smaller weight initialization than before (stddev 0.005 instead of 0.1)
    Weights = tf.Variable(tf.random_normal([in_size, out_size], stddev=0.005))
    biases = tf.Variable(tf.zeros([1, out_size]))
    Wx_plus_b = tf.matmul(inputs, Weights) + biases
    if activation_function is None:
        output = Wx_plus_b
    else:
        output = activation_function(Wx_plus_b)
    return output

l1 = add_layer(x, 5, 3, activation_function=tf.nn.relu)
pred = add_layer(l1, 3, 1)

# Mean squared error
cost = tf.reduce_mean(tf.pow(tf.subtract(pred, y), 2))
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = n_samples // batch_size  # number of mini-batches per epoch
        # Loop over all batches
        for batch_x, batch_y in get_batch(inputX, inputY, batch_size):
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c, _l1, _pred = sess.run([optimizer, cost, l1, pred],
                                        feed_dict={x: batch_x, y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f} ".format(avg_cost))
            # print(_l1, _pred)

    print("Optimization Finished!")
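One more note: the exact-equality "accuracy" in your original code (tf.equal(pred, y)) is not meaningful for regression, since floating-point predictions will essentially never match the targets exactly. A minimal evaluation sketch you could run inside the session after training, assuming the variables defined above (it reports RMSE in the original SalePrice units):

    # Evaluate on the test set in original price units.
    pred_scaled = sess.run(pred, feed_dict={x: inputX_test})
    pred_prices = sc1.inverse_transform(pred_scaled)
    true_prices = sc1.inverse_transform(inputY_test)  # inputY_test was sc1-scaled above
    rmse = np.sqrt(np.mean((pred_prices - true_prices) ** 2))
    print("Test RMSE:", rmse)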