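I am currently trying to run batch gradient descent on a toy dataset called load_boston, which I obtained through the scikit-learn library. The dataset has dimensions 506 x 13, and the data values are on the order of 100. Below are my Python script and the error message I get when I run it.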
boston_data_regression.py

    import scipy
    import numpy
    from sklearn.datasets import load_boston

    def generateGradient(X, Y, m, alpha, theta, num_iterations):
        X_transpose = X.transpose()
        for i in range(0, num_iterations):
            hypothesis = numpy.dot(X, theta)
            delta = hypothesis - Y
            cost = numpy.sum(delta ** 2) / (2 * m)
            print("No. iteration : %d | Cost : %ld" % ((i + 1), cost))
            gradient = numpy.dot(X_transpose, delta) / m
            theta = theta - alpha * gradient
        return theta

    if __name__ == '__main__':
        boston_data = load_boston()
        X = boston_data.data[:, 0:11]
        Y = boston_data.data[:, 12]
        print(boston_data.data)
        print(numpy.shape(X))
        print(numpy.shape(Y))
        num_iterations = 100000
        alpha = 0.0005
        m, n = numpy.shape(X)
        theta = numpy.ones(n)
        theta = generateGradient(X, Y, m, alpha, theta, num_iterations)
        print(theta)

Error message:

    No. iteration : 75 | Cost : 5107568749643583921695342267251134617186569132604666005559083886757991071451800270203896531093730395389956630990780914914913406418422174358389131741568461360913005557192743665544540413282512755425657295941969706284629047517505070375172805106443882740219842668724638239205198801815953626988648840822784
    No. iteration : 76 | Cost : 50304231336916560424319335120140228744355885776376593114754676052001428477104842266241766923801372402675185672996149747402542290566577918714034301765248577735574592772115140169849029676464020678156657455729204985429508262045621361912203426365153327346440580108502094724090338985744326599309593512431845376
    boston_data_regression.py:13: RuntimeWarning: overflow encountered in square
      cost = numpy.sum(delta ** 2) / (2 * m)
    Traceback (most recent call last):
      File "boston_data_regression.py", line 38, in <module>
        theta = generateGradient(X, Y, m, alpha, theta, num_iterations)
      File "boston_data_regression.py", line 15, in generateGradient
        print ("No. iteration : %d | Cost : %ld" % ((i + 1), cost))
    TypeError: %d format: a number is required, not numpy.float64

How can I resolve this error, and is there a better or more optimized way to perform batch gradient descent?
Answer:
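Your problem comes from the magnitude of your data values. In the output above the cost grows by roughly a factor of ten per iteration and reaches about 5e+304 at iteration 76, so the very next step raises the overflow warning.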
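As a small illustration of that failure mode (the residual values below are made up for the demonstration, not taken from the run above), squaring a sufficiently large float64 no longer fits in the type, and NumPy returns `inf`:

```python
import numpy

# Hypothetical residuals that have already grown very large.
delta = numpy.array([1e200, -3e180])

# Silence the RuntimeWarning for this demo; the script in the question
# prints it as "overflow encountered in square".
with numpy.errstate(over="ignore"):
    squared = delta ** 2  # both squares exceed the float64 range

print(squared)

# The cost inherits the infinity, so it can no longer be formatted as an integer.
cost = numpy.sum(squared) / (2 * len(delta))
print(numpy.isinf(cost))
```

Once the cost is infinite, every later update is meaningless, so the run has to be fixed at the source rather than by catching the warning.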
You can check the limits of numpy.float64 values like this:

    >>> import numpy
    >>> numpy.finfo('d')
    finfo(resolution=1e-15, min=-1.7976931348623157e+308, max=1.7976931348623157e+308, dtype=float64)

As you can see, the maximum value is about 1.8e+308. The way to fix this is to scale your data values down.
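One common way to scale the data is to standardize each feature column to zero mean and unit standard deviation before running gradient descent. The sketch below is only an illustration under stated assumptions. It uses synthetic data shaped like the question's 506 x 11 feature matrix rather than the Boston data. The function name `batch_gradient_descent`, the added intercept column, and the step size `alpha=0.1` with 2000 iterations are choices made for this example, not part of the original script.

```python
import numpy


def batch_gradient_descent(X, y, alpha, num_iterations):
    """Batch gradient descent for least-squares linear regression."""
    m, n = X.shape
    theta = numpy.zeros(n)
    for _ in range(num_iterations):
        delta = X @ theta - y                      # residuals, shape (m,)
        theta = theta - alpha * (X.T @ delta) / m  # one full-batch update
    cost = numpy.sum((X @ theta - y) ** 2) / (2 * m)
    return theta, cost


# Synthetic stand-in for a 506 x 11 feature matrix with values up to ~100.
# This is NOT the Boston data; it only mimics its shape and scale.
rng = numpy.random.default_rng(0)
X_raw = rng.uniform(0.0, 100.0, size=(506, 11))
y = X_raw @ rng.uniform(-1.0, 1.0, size=11) + rng.normal(0.0, 1.0, size=506)

# Unscaled features with the question's step size: the iterates blow up.
with numpy.errstate(over="ignore", invalid="ignore"):
    _, raw_cost = batch_gradient_descent(X_raw, y, alpha=0.0005, num_iterations=200)
print("unscaled cost is finite:", numpy.isfinite(raw_cost))

# Standardize each column to zero mean and unit standard deviation.
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
# Prepend a column of ones so the model can fit an intercept.
X_design = numpy.column_stack([numpy.ones(len(X_scaled)), X_scaled])

theta, scaled_cost = batch_gradient_descent(X_design, y, alpha=0.1, num_iterations=2000)
print("scaled cost: %e" % scaled_cost)
```

Note that the cost is printed with `%e`, which accepts floating-point values. The learned `theta` applies to the standardized features, so new inputs would need the same mean and standard deviation applied before prediction.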