I'm trying to minimize a function and have SciPy display its progress as it runs. The first message displayed is...
Optimization terminated successfully.
         Current function value: 0.000113
         Iterations: 32
         Function evaluations: 13299
         Gradient evaluations: 33
That looks promising. The problem is that the process doesn't actually terminate there. In fact, it goes on to display messages like the following:
Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.023312
         Iterations: 50
         Function evaluations: 20553
         Gradient evaluations: 51
Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.068360
         Iterations: 50
         Function evaluations: 20553
         Gradient evaluations: 51
Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.071812
         Iterations: 50
         Function evaluations: 20553
         Gradient evaluations: 51
Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.050061
         Iterations: 50
         Function evaluations: 20553
         Gradient evaluations: 51
Here is the code containing the call to minimize:
def one_vs_all(X, y, num_labels, lmbda):
    # store dimensions of X that will be reused
    m = X.shape[0]
    n = X.shape[1]

    # append ones vector to X matrix
    X = np.column_stack((np.ones((X.shape[0], 1)), X))

    # create vector in which thetas will be returned
    all_theta = np.zeros((num_labels, n+1))

    # choose initial thetas
    #init_theta = np.zeros((n+1, 1))

    for i in np.arange(num_labels):
        # note theta should be first arg in objective func signature followed by X and y
        init_theta = np.zeros((n+1, 1))
        theta = minimize(lrCostFunctionReg, x0=init_theta,
                         args=(X, (y == i)*1, lmbda),
                         options={'disp': True, 'maxiter': 50})
        all_theta[i] = theta.x

    return all_theta
I've tried changing the minimization method and varying the iteration count from as low as 30 to as high as 1000. I've also tried supplying my own gradient function. In every case the program eventually does produce an answer, but the answer is completely wrong. Does anyone know what's going on here?
EDIT: The function is differentiable. Below is the cost function, followed by its gradient (unregularized, then regularized).
def lrCostFunctionReg(theta, X, y, lmbda):
    m = X.shape[0]

    # unregularized cost
    h = sigmoid(X @ theta)

    # calculate regularization term
    reg_term = ((lmbda / (2*m)) * (theta[1:,].T @ theta[1:,]))

    cost_reg = (1/m) * (-(y.T @ np.log(h)) - ((1 - y).T @ np.log(1 - h))) + reg_term

    return cost_reg

def gradFunction(theta, X, y):
    m = X.shape[0]
    theta = np.reshape(theta, (theta.size, 1))

    # hypothesis as generated in cost function
    h = sigmoid(X @ theta)

    # unregularized gradient
    grad = (1/m) * np.dot(X.T, (h-y))

    return grad

def lrGradFunctionReg(theta, X, y, lmbda):
    m = X.shape[0]

    # theta reshaped to ensure proper operation
    theta = np.reshape(theta, (theta.size, 1))

    # generate unregularized gradient
    grad = gradFunction(theta, X, y)

    # calc regularized gradient w/o touching intercept; essential that only 1 index used
    grad[1:,] = ((lmbda / m) * theta[1:,]) + grad[1:,]

    return grad.flatten()
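As a sanity check, scipy.optimize.check_grad can compare an analytic gradient against a finite-difference estimate. The sketch below uses small, deliberately 1-D stand-in implementations (cost_1d and grad_1d are illustrative names, not the author's functions above), so no shape can broadcast unexpectedly:

import numpy as np
from scipy.optimize import check_grad

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_1d(theta, X, y, lmbda):
    # theta and y are flat 1-D arrays throughout
    m = X.shape[0]
    h = sigmoid(X @ theta)
    reg = (lmbda / (2 * m)) * (theta[1:] @ theta[1:])
    return (1 / m) * (-(y @ np.log(h)) - ((1 - y) @ np.log(1 - h))) + reg

def grad_1d(theta, X, y, lmbda):
    m = X.shape[0]
    grad = (1 / m) * (X.T @ (sigmoid(X @ theta) - y))
    grad[1:] += (lmbda / m) * theta[1:]
    return grad                           # flat, shape (n+1,)

# Tiny synthetic problem: 5 examples, 3 features plus an intercept column.
rng = np.random.default_rng(0)
X = np.column_stack((np.ones(5), rng.normal(size=(5, 3))))
y = rng.integers(0, 2, size=5).astype(float)
theta0 = rng.normal(size=4) * 0.1

# check_grad returns the norm of the difference between the analytic
# gradient and a finite-difference estimate; it should be tiny (~1e-7).
print(check_grad(cost_1d, grad_1d, theta0, X, y, 1.0))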
Answer:
To answer my own question, the problem ultimately turned out to be one of vector shapes. I like to program in 2-D, but SciPy's optimization routines work only with column and row vectors that have been "flattened" into 1-D arrays. Multidimensional matrices were fine, but column and row vectors were a bridge too far.
For example, if y is a vector of labels and y.shape is (400, 1), you need to call y.flatten(), which makes y.shape (400,). SciPy can then handle your data, assuming all your other dimensions make sense.
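A minimal illustration of that reshape (the array here is just a stand-in):

import numpy as np

y = np.zeros((400, 1))   # a 2-D column vector
print(y.shape)           # (400, 1) -- the shape that trips up the optimizer
y = y.flatten()          # y.ravel() also works and avoids a copy when it can
print(y.shape)           # (400,)   -- a flat 1-D array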
So if your effort to translate MATLAB machine-learning code into Python has stalled, check that you have flattened your row and column vectors, especially any vector returned by a gradient function.
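Putting this together, here is a sketch of how the loop from the question might look once everything handed to SciPy is flat. It reuses the shape-consistent cost_1d and grad_1d stand-ins from the check_grad sketch above, so it reflects the fix described here rather than the author's exact final code:

import numpy as np
from scipy.optimize import minimize

def one_vs_all_flat(X, y, num_labels, lmbda):
    # Assumes the 1-D cost_1d and grad_1d from the check_grad sketch are in scope.
    m, n = X.shape
    X = np.column_stack((np.ones(m), X))      # prepend the intercept column
    y = y.flatten()                           # (m, 1) -> (m,): the crucial fix
    all_theta = np.zeros((num_labels, n + 1))
    for i in range(num_labels):
        init_theta = np.zeros(n + 1)          # flat, not (n+1, 1)
        res = minimize(cost_1d, x0=init_theta,
                       args=(X, (y == i).astype(float), lmbda),
                       jac=grad_1d,           # gradient returns a flat (n+1,) array
                       options={'disp': True, 'maxiter': 50})
        all_theta[i] = res.x
    return all_theta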