GPRegression (GPy) and GaussianProcessRegressor (scikit-learn) are given similar initial values and use the same optimizer (lbfgs). Why do the results differ so significantly?
```python
#!pip -qq install pods
#!pip -qq install GPy
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C
from sklearn.preprocessing import StandardScaler
import pods

data = pods.datasets.olympic_marathon_men()
X = StandardScaler().fit_transform(data['X'])
y = data['Y']

# scikit-learn
model = GaussianProcessRegressor(C()*RBF(), n_restarts_optimizer=20, random_state=0)
model.fit(X, y)
print(model.kernel_)

# GPy
from GPy.models import GPRegression
from GPy.kern import RBF as GPyRBF
model = GPRegression(X, y, GPyRBF(1))
model.optimize_restarts(20, verbose=0)
print(model.kern)
```
Result:

```
2.89**2 * RBF(length_scale=0.173)
  rbf.         |               value  |  constraints  |  priors
  variance     |  25.399509298957504  |      +ve      |
  lengthscale  |   4.279767394389103  |      +ve      |
```
Answer:

Using GPy's RBF() kernel is equivalent to using scikit-learn's ConstantKernel()*RBF() + WhiteKernel(), because GPy adds likelihood (observation) noise internally through its Gaussian likelihood. With the kernels matched this way, I was able to get comparable results between the two libraries.
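A minimal sketch of the fix on the scikit-learn side, using synthetic data (an assumption, so the example runs without the pods package): adding WhiteKernel() gives the model an explicit noise variance to optimize, mirroring the noise term GPy's GPRegression fits via its Gaussian likelihood.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel, ConstantKernel as C

# Synthetic noisy data (assumption: stands in for the olympic_marathon_men dataset)
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 30).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(30)

# ConstantKernel()*RBF() alone forces the RBF to absorb the noise;
# WhiteKernel() adds an explicit, optimized noise level, which is what
# GPy's built-in Gaussian likelihood provides by default.
kernel = C() * RBF() + WhiteKernel()
model = GaussianProcessRegressor(kernel, n_restarts_optimizer=10, random_state=0)
model.fit(X, y)
print(model.kernel_)  # fitted hyperparameters now include a noise_level term
```

The fitted noise level is exposed on the learned kernel (here `model.kernel_.k2.noise_level`), which can be compared directly against GPy's `Gaussian_noise.variance`.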