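I want to create a custom evaluation scorer in scikit-learn that I can use to get evaluation scores for different regressors.
This is the custom evaluation function: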
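from sklearn.metrics import make_scorer, r2_score

def eval_func(y_true, y_pred):
    # R^2 scaled to a 0-100 range, clipped at 0 for negative R^2
    return float(max(0, 100 * r2_score(y_true, y_pred)))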
Definition of the custom scorer:
cust_scorer = make_scorer(eval_func, greater_is_better=True)
I use the code above in the following function:
def regression_model(model, data, predictors, outcome):
    # Fit the model:
    # features = data.drop(columns=[outcome])
    features = data[predictors]
    print(features.shape)
    target = data[outcome].to_numpy().reshape(-1, 1)
    print(target.shape)
    model.fit(features, target)

    # Make predictions on the training set:
    predictions = model.predict(features)
    print(predictions.shape)

    # Print accuracy
    accuracy = cust_scorer(model,predictions,target)
    print("Eval_metric : %s" % "{0:.3%}".format(accuracy))

    # Perform k-fold cross-validation with 5 folds
    kf = KFold(data.shape[0], n_folds=5)
    error = []
    for train, test in kf:
        # Filter the training data
        train_predictors = (features.iloc[train, :])
        # The target used to train the algorithm.
        train_target = target.iloc[train]
        # Train the algorithm using the predictors and target.
        model.fit(train_predictors, train_target)
        # Record the error from each cross-validation run
        error.append(cust_scorer(model, features.iloc[test, :], target.iloc[test]))

    print("Cross-Validation Score : %s" % "{0:.3%}".format(np.mean(error)))
I get the following error on the line that calls the custom scorer to compute the accuracy:
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 57 is different from 1)
Error traceback:
ValueError                                Traceback (most recent call last)
<ipython-input-206-fdf6c0030d91> in <module>()
     76
     77 predictors = list(set(train_data.columns).difference({target_col}))
---> 78 regression_model(lr,train_data,predictors,target_col)
     79 # features = train_data[predictors].values
     80 # # print(features.shape)

<ipython-input-206-fdf6c0030d91> in regression_model(model, data, predictors, outcome)
     34
     35     #Print accuracy
---> 36     accuracy = cust_scorer(model,predictions,target)
     37     print ("Eval_metric : %s" % "{0:.3%}".format(accuracy))
     38     #Perform k-fold cross-validation with 5 folds

/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_scorer.py in __call__(self, estimator, X, y_true, sample_weight)
    167                           stacklevel=2)
    168         return self._score(partial(_cached_call, None), estimator, X, y_true,
--> 169                            sample_weight=sample_weight)
    170
    171     def _factory_args(self):

/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_scorer.py in _score(self, method_caller, estimator, X, y_true, sample_weight)
    203         """
    204
--> 205         y_pred = method_caller(estimator, "predict", X)
    206         if sample_weight is not None:
    207             return self._sign * self._score_func(y_true, y_pred,

/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_scorer.py in _cached_call(cache, estimator, method, *args, **kwargs)
     50     """Call estimator with method and args and kwargs."""
     51     if cache is None:
---> 52         return getattr(estimator, method)(*args, **kwargs)
     53
     54     try:

/usr/local/lib/python3.7/dist-packages/sklearn/linear_model/_base.py in predict(self, X)
    223             Returns predicted values.
    224         """
--> 225         return self._decision_function(X)
    226
    227     _preprocess_data = staticmethod(_preprocess_data)

/usr/local/lib/python3.7/dist-packages/sklearn/linear_model/_base.py in _decision_function(self, X)
    207         X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
    208         return safe_sparse_dot(X, self.coef_.T,
--> 209                                dense_output=True) + self.intercept_
    210
    211     def predict(self, X):

/usr/local/lib/python3.7/dist-packages/sklearn/utils/extmath.py in safe_sparse_dot(a, b, dense_output)
    149             ret = np.dot(a, b)
    150     else:
--> 151         ret = a @ b
    152
    153     if (sparse.issparse(a) and sparse.issparse(b)

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 57 is different from 1)
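According to the error message, at least one of the inputs to the scorer should have a dimension of size 57 (57 is the number of input features in the data, which has a single output feature).
However, both inputs I pass to the scorer have shape (22939, 1).
When I pass those same two inputs directly to my evaluation function, the result is correct; the problem only occurs when I go through the scorer.
I don't understand why 57 shows up as a dimension when the inputs have a different shape, or how to deal with it.
Any help would be appreciated.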
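Edit 1: To generate data for reproducing this problem, you can create a random NumPy array of shape (22939, 58), convert it to a DataFrame, and use the last column as the outcome column and the remaining columns as predictors.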
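The data described in the edit could be generated roughly as follows. This is only a sketch: the column names (`x0` ... `x56`, `outcome`) and the fixed seed are assumptions made for illustration, while `train_data`, `target_col`, and `predictors` mirror the names visible in the traceback.

```python
import numpy as np
import pandas as pd

# Hypothetical reproduction data: 57 feature columns plus one outcome column.
rng = np.random.default_rng(0)  # seed chosen arbitrarily for repeatability
n_rows, n_cols = 22939, 58

# Column names are made up for this sketch.
columns = [f"x{i}" for i in range(n_cols - 1)] + ["outcome"]
train_data = pd.DataFrame(rng.random((n_rows, n_cols)), columns=columns)

target_col = "outcome"
predictors = [c for c in train_data.columns if c != target_col]

print(train_data.shape, len(predictors))
```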
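Answer:
The problem is solved.
I was passing the predictions as the first input to the scorer object, when it should have been the feature matrix.
So Scorer(model, feature, label) is the correct format. Internally, sklearn runs the features through the model to get predictions and then passes those predictions to the custom evaluation function.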
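This also explains the 57 in the error. As the traceback shows, the scorer calls `model.predict(X)`, and for a linear model that computes `X @ coef_.T`, where `coef_` has one entry per feature (57 here). Passing the `(n, 1)` predictions as `X` makes the inner dimensions 1 and 57, which do not match. A minimal sketch of the corrected call, using a small synthetic dataset (the sizes, seed, and noise level are assumptions for illustration) and `cross_val_score` as one possible replacement for the manual fold loop:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer, r2_score
from sklearn.model_selection import KFold, cross_val_score


def eval_func(y_true, y_pred):
    # Same metric as in the question: R^2 scaled to 0-100, clipped at 0.
    return float(max(0, 100 * r2_score(y_true, y_pred)))


cust_scorer = make_scorer(eval_func, greater_is_better=True)

# Synthetic data for illustration only: 200 rows, 57 features.
rng = np.random.default_rng(0)
X = rng.random((200, 57))
y = X @ rng.random(57) + 0.1 * rng.standard_normal(200)

model = LinearRegression().fit(X, y)

# Correct: pass the features; the scorer calls model.predict(X) itself.
train_score = cust_scorer(model, X, y)

# Equivalent direct call to the metric, which takes (y_true, y_pred).
direct_score = eval_func(y, model.predict(X))

# The same scorer can be handed to cross_val_score for 5-fold CV.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X, y, scoring=cust_scorer, cv=kf)

print(train_score, cv_scores.mean())
```

Note the difference in argument order: the scorer takes `(estimator, X, y_true)`, while the underlying metric function takes `(y_true, y_pred)`.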