Multivariate linear regression with gradient descent

I am learning multivariate linear regression with gradient descent and wrote the following Python code:

    import pandas as pd
    import numpy as np

    x1 = np.array([1,2,3,4,5,6,7,8,9,10], dtype='float64')
    x2 = np.array([5,10,20,40,80,160,320,640,1280,2560], dtype='float64')
    y = np.array([350,700,1300,2400,4500,8600,16700,32800,64900,129000], dtype='float64')

    def multivar_gradient_descent(x1, x2, y):
        w1 = w2 = w0 = 0
        iteration = 500
        n = len(x1)
        learning_rate = 0.02

        for i in range(iteration):
            y_predicted = w1 * x1 + w2 * x2 + w0
            cost = (1*(2/n)) * float(sum((y_predicted-y)**2))  # cost function

            x1d = sum(x1*(y_predicted-y))/n   # derivative for feature x1
            x2d = sum(x2*(y_predicted-y))/n   # derivative for feature x2
            cd = sum(1*(y-y_predicted))/n     # derivative for bias

            w1 = w1 - learning_rate * x1d
            w2 = w2 - learning_rate * x2d
            w0 = w0 - learning_rate * cd
            print(f"Iteration {i}: a= {w1}, b = {w2}, c = {w0}, cost = {cost} ")
        return w1, w2, w0

    w1, w2, w0 = multivar_gradient_descent(x1, x2, y)
    w1, w2, w0

However, the cost keeps increasing on every iteration until it becomes infinity (output shown below). I have spent hours checking the formulas for the derivatives and the cost function, but I cannot find the mistake. I am very frustrated and would appreciate any help. Thank you.

    Iteration 0: a= 4685.5, b = 883029.5, c = -522.5, cost = 4462002500.0
    Iteration 1: a= -81383008.375, b = -15430704757.735, c = 9032851.74, cost = 1.3626144151911089e+18
    Iteration 2: a= 1422228350500.3176, b = 269662832866446.66, c = -157855848816.2755, cost = 4.161440004246925e+26
    Iteration 3: a= -2.4854478828631716e+16, b = -4.712554891970221e+18, c = 2758646212375989.0, cost = 1.2709085355243152e+35
    Iteration 4: a= 4.343501644116814e+20, b = 8.235533749226551e+22, c = -4.820935671838988e+19, cost = 3.881369199171854e+43
    Iteration 5: a= -7.590586253095058e+24, b = -1.4392196523846473e+27, c = 8.424937075201089e+23, cost = 1.1853745914189544e+52
    Iteration 6: a= 1.326510368511469e+29, b = 2.5151414235959125e+31, c = -1.472319266480111e+28, cost = 3.620147555871397e+60
    Iteration 7: a= -2.3181737208386835e+33, b = -4.3953932745475034e+35, c = 2.5729854159139745e+32, cost = 1.105597202871857e+69
    Iteration 8: a= 4.051177832870898e+37, b = 7.681270666011396e+39, c = -4.496479874458965e+36, cost = 3.37650649906685e+77
    Iteration 9: a= -7.079729049644685e+41, b = -1.3423581317783506e+44, c = 7.857926879944079e+40, cost = 1.0311889455424087e+86
    Iteration 10: a= 1.2372343423113349e+46, b = 2.3458688442326932e+48, c = -1.3732300949746233e+45, cost = 3.1492628303921182e+94
    Iteration 11: a= -2.1621573467862958e+50, b = -4.099577083092681e+52, c = 2.3998198539580117e+49, cost = 9.617884692967256e+102
    Iteration 12: a= 3.7785278280657085e+54, b = 7.164310273158479e+56, c = -4.193860411686855e+53, cost = 2.937312982406619e+111
    Iteration 13: a= -6.603253259383672e+58, b = -1.2520155286691985e+61, c = 7.32907727374022e+57, cost = 8.970587433766233e+119
    Iteration 14: a= 1.1539667190934036e+63, b = 2.187988549158328e+65, c = -1.280809765026251e+62, cost = 2.739627659321216e+128
    Iteration 15: a= -2.0166410956339498e+67, b = -3.823669740212017e+69, c = 2.238308579532037e+66, cost = 8.366854196711946e+136
    Iteration 16: a= 3.524227554668779e+71, b = 6.682142046784112e+73, c = -3.9116076672823015e+70, cost = 2.5552468384109146e+145
    Iteration 17: a= -6.158844964518726e+75, b = -1.1677531106785476e+78, c = 6.835819994909099e+74, cost = 7.80375306142527e+153
    Iteration 18: a= 1.0763031248287995e+80, b = 2.0407338215081817e+82, c = -1.194609454154816e+79, cost = 2.3832751078395456e+162
    Iteration 19: a= -1.8809182942418207e+84, b = -3.5663313522046286e+86, c = 2.0876672425822773e+83, cost = 7.278549429920333e+170
    Iteration 20: a= 3.287042049772272e+88, b = 6.232424424816986e+90, c = -3.648350932258958e+87, cost = 2.2228773182554595e+179
    Iteration 21: a= -5.744345977200645e+92, b = -1.0891616727381027e+95, c = 6.375759629418162e+91, cost = 6.788692746528022e+187
    Iteration 22: a= 1.0038664004334024e+97, b = 1.9033895455483145e+99, c = -1.1142105462686083e+96, cost = 2.0732745270409844e+196
    Iteration 23: a= -1.7543298295730705e+101, b = -3.326312202113057e+103, c = 1.9471642809242535e+100, cost = 6.331804111587467e+204
    Iteration 24: a= 3.065819465220816e+105, b = 5.812973435628952e+107, c = -3.402811748286256e+104, cost = 1.9337402155196325e+213
    Iteration 25: a= -5.357743358678581e+109, b = -1.0158595498601174e+112, c = 5.946661977991267e+108, cost = 5.905664728753603e+221
    Iteration 26: a= 9.363047701635277e+113, b = 1.7752887338463183e+116, c = -1.0392225987316703e+113, cost = 1.8035967607506306e+230
    Iteration 27: a= -1.6362609478315793e+118, b = -3.102446680700735e+120, c = 1.816117367544431e+117, cost = 5.508205129817299e+238
    Iteration 28: a= 2.8594854738709632e+122, b = 5.421752091975047e+124, c = -3.1737976990896245e+121, cost = 1.6822121447766637e+247
    Iteration 29: a= -4.997159643830032e+126, b = -9.474907636509772e+128, c = 5.546443206127292e+125, cost = 5.13749512471037e+255
    Iteration 30: a= 8.732901332811723e+130, b = 1.655809288168471e+133, c = -9.692814462503292e+129, cost = 1.5689968853439082e+264
    Iteration 31: a= -1.5261382690222234e+135, b = -2.8936476258832726e+137, c = 1.6938900970034892e+134, cost = 4.791734427889445e+272
    Iteration 32: a= 2.667038052317318e+139, b = 5.056860498736353e+141, c = -2.960196619698286e+138, cost = 1.46340117318896e+281
    Iteration 33: a= -4.660843723593812e+143, b = -8.837232935670386e+145, c = 5.173159724337836e+142, cost = 4.4692439155775235e+289
    Iteration 34: a= 8.145164706926056e+147, b = 1.5443709783730996e+150, c = -9.040474323708519e+146, cost = 1.364912201990395e+298
    Iteration 35: a= -1.4234270024354842e+152, b = -2.698901043124031e+154, c = 1.5798888948493553e+151, cost = 4.168457471405497e+306
    Iteration 36: a= 2.487542614748579e+156, b = 4.716526626425798e+158, c = -2.760971195418877e+155, cost = inf
    Iteration 37: a= -4.347162341028204e+160, b = -8.24247464517401e+162, c = 4.824998749459281e+159, cost = inf
    Iteration 38: a= 7.596983588224419e+164, b = 1.4404326246286964e+167, c = -8.432037599998082e+163, cost = inf
    Iteration 39: a= -1.3276283495338805e+169, b = -2.517261181154549e+171, c = 1.473560135031107e+168, cost = inf
    Iteration 40: a= 2.32012747430196e+173, b = 4.399097705650062e+175, c = -2.5751539243057795e+172, cost = inf

Answer:

The problem is that you initialize the weights to zero, as in w1=w2=w0=0.

If all the weights are initialized to 0, the derivative of the loss function is the same for every w in W[l], so all the weights end up with the same value in subsequent iterations.

We therefore need to initialize the weights to random values.

On initializing the weights with large random values:

When the weights are initialized to very large values, the term np.dot(W, X) + b becomes correspondingly large. If an activation function such as sigmoid() is then applied, it maps these values to a region very close to 1, where the gradient is almost flat, so learning takes a very long time.
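A quick illustration of this saturation effect (a minimal sketch, not part of your code; the value z = 20 is just an arbitrarily large pre-activation):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    z = 20.0                  # large pre-activation, e.g. from large initial weights
    a = sigmoid(z)            # mapped very close to 1
    grad = a * (1.0 - a)      # sigmoid'(z) = a * (1 - a), practically 0 here
    print(a, grad)            # ~0.99999998, ~2e-09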

There are many ways to initialize weights. In Keras, for example, the Dense, LSTM and convolutional layers all default to glorot_uniform initialization, otherwise known as Xavier initialization.
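That default can also be requested explicitly; a small sketch assuming TensorFlow/Keras is installed (the layer size and activation are arbitrary):

    from tensorflow import keras

    # glorot_uniform (Xavier) is the default kernel_initializer for Dense,
    # but it can be spelled out explicitly as well.
    layer = keras.layers.Dense(16, activation="relu",
                               kernel_initializer="glorot_uniform")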

For your purposes, you can use numpy's random.randn to initialize the weights randomly following the formula below, where l denotes a particular layer. np.random.randn draws the weights from a standard normal distribution (mean 0, standard deviation 1):

    # set a random seed so the results are reproducible
    np.random.seed(3)
    W[l] = np.random.randn(l, l-1)
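Adapted to the three scalar parameters in your function, the line w1=w2=w0=0 could be replaced with something like the following (a hedged sketch; the seed value is arbitrary and the variable names are the ones from your code):

    np.random.seed(3)                  # reproducible runs
    w1, w2, w0 = np.random.randn(3)    # small random starting values instead of 0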

You should also add feature normalization as a preprocessing step, returning a version of the data in which each feature has mean 0 and standard deviation 1. This is generally a good preprocessing step when working with learning algorithms.

    def featureNormalize(X):
        """
        X : dataset of shape (m x n)
        """
        mu = np.mean(X, axis=0)       # per-feature mean
        sigma = np.std(X, axis=0)     # per-feature standard deviation
        X_norm = (X - mu) / sigma
        return X_norm
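As a hedged sketch of how this could be wired into your code (stacking x1 and x2 into a two-column matrix is my addition): the two features span very different ranges, 1-10 versus 5-2560, which is exactly the situation where unscaled gradient-descent steps blow up.

    # Stack the two features into an (m x 2) matrix, normalize, and split back out
    X = np.column_stack((x1, x2))
    X_norm = featureNormalize(X)
    x1_scaled, x2_scaled = X_norm[:, 0], X_norm[:, 1]

    # Run gradient descent on the scaled features
    w1, w2, w0 = multivar_gradient_descent(x1_scaled, x2_scaled, y)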

