Hi, I'm learning some machine learning algorithms. To understand them, I tried to implement linear regression with a single feature, using the residual sum of squares (RSS) as the cost function for gradient descent, as shown below.
My pseudocode is:
while not converged: w <- w - step * gradient
Python code, Linear.py:
```python
import math
import numpy as num

def get_regression_predictions(input_feature, intercept, slope):
    predicted_output = [intercept + xi*slope for xi in input_feature]
    return predicted_output

def rss(input_feature, output, intercept, slope):
    return sum([(output.iloc[i] - (intercept + slope*input_feature.iloc[i]))**2
                for i in range(len(output))])

def train(input_feature, output, intercept, slope):
    file = open("train.csv", "w")
    file.write("ID,intercept,slope,RSS\n")
    i = 0
    while True:
        print("RSS:", rss(input_feature, output, intercept, slope))
        file.write(str(i)+","+str(intercept)+","+str(slope)+","
                   +str(rss(input_feature, output, intercept, slope))+"\n")
        i += 1
        gradient = [derivative(input_feature, output, intercept, slope, n)
                    for n in range(0, 2)]
        step = 0.05
        intercept -= step*gradient[0]
        slope -= step*gradient[1]
    return intercept, slope

def derivative(input_feature, output, intercept, slope, n):
    if n == 0:
        return sum([-2*(output.iloc[i] - (intercept + slope*input_feature.iloc[i]))
                    for i in range(0, len(output))])
    return sum([-2*(output.iloc[i] - (intercept + slope*input_feature.iloc[i]))*input_feature.iloc[i]
                for i in range(0, len(output))])
```
The main program is:
```python
import Linear as lin
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

df = pd.read_csv("test2.csv")
train = df
lin.train(train["X"], train["Y"], 0, 0)
```
The contents of test2.csv are:
```
X,Y
0,1
1,3
2,7
3,13
4,21
```
I log the RSS value to a file and noticed that it gets worse on every iteration, as shown below:
```
ID,intercept,slope,RSS
0,0,0,669
1,4.5,14.0,3585.25
2,-7.25,-18.5,19714.3125
3,19.375,58.25,108855.953125
```
Mathematically this makes no sense to me. I have checked my code several times and I believe it is correct. Am I doing something wrong somewhere else?
Answer:
If your cost is not decreasing, it usually means the step size in your gradient descent is too large, so each update overshoots the minimum.
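You can see this on the data from the question. The following is only a minimal sketch (the name `fit_line` and the step value `0.01` are my own illustrative choices, not from your code): it runs the same update rule with the same RSS derivatives, just with a smaller fixed step, and converges instead of blowing up.

```python
# Plain gradient descent on RSS for y = intercept + slope * x.
# Same derivatives as in the question, only with a smaller fixed step.
def fit_line(xs, ys, step=0.01, iters=2000):
    intercept, slope = 0.0, 0.0
    for _ in range(iters):
        # partial derivatives of RSS with respect to intercept and slope
        d_intercept = sum(-2 * (y - (intercept + slope * x)) for x, y in zip(xs, ys))
        d_slope = sum(-2 * (y - (intercept + slope * x)) * x for x, y in zip(xs, ys))
        intercept -= step * d_intercept
        slope -= step * d_slope
    return intercept, slope

xs, ys = [0, 1, 2, 3, 4], [1, 3, 7, 13, 21]  # the test2.csv data
intercept, slope = fit_line(xs, ys)
print(intercept, slope)  # close to the least-squares solution (-1, 5)
```

With `step=0.05` this same loop produces exactly the diverging values in your logged table (4.5 and 14.0 on the first update); with `step=0.01` the iterates contract toward the least-squares fit y = 5x - 1.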
Using a smaller step size should help. You can also consider variable-step methods, which change the step size at each iteration to get good convergence behavior and speed; typically, they adjust the step size based on the gradient. The right choice, of course, depends on the specific problem.
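One simple variable-step scheme along those lines is backtracking: start from your original step and halve it whenever the proposed update would increase the cost. This is only a sketch under my own naming (`fit_backtracking`, `rss`), not your code:

```python
def rss(xs, ys, intercept, slope):
    # residual sum of squares, same cost as in the question
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def fit_backtracking(xs, ys, step=0.05, iters=300):
    intercept, slope = 0.0, 0.0
    for _ in range(iters):
        g_i = sum(-2 * (y - (intercept + slope * x)) for x, y in zip(xs, ys))
        g_s = sum(-2 * (y - (intercept + slope * x)) * x for x, y in zip(xs, ys))
        # halve the step until the update actually lowers the cost
        while rss(xs, ys, intercept - step * g_i, slope - step * g_s) > rss(xs, ys, intercept, slope):
            step /= 2
        intercept -= step * g_i
        slope -= step * g_s
    return intercept, slope

xs, ys = [0, 1, 2, 3, 4], [1, 3, 7, 13, 21]  # the test2.csv data
intercept, slope = fit_backtracking(xs, ys)
```

Starting from your step of 0.05, the very first check fails (the move to (4.5, 14.0) raises RSS from 669 to 3585.25), so the step is halved to 0.025, after which the cost decreases monotonically toward the same least-squares fit.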