I have a numpy.ndarray, something like np.array([(1, 1), (2, 3), (3, 5), (4, 8), (5, 9), (6, 9), (7, 9)]), and I would like a curve-fitting method that can do the following two things:
- It can fit the scattered points with a curve. That part is not hard, and I found an existing question for it: python numpy/scipy curve fitting
- It can follow the trend of the fitted curve to return y values beyond the range of x values in the numpy.ndarray. For example, given an x value of 8, it could return a value of 9.
Which approach should I take? Can KNN or SVM (SVR) solve this kind of problem?
I am not sure whether I have explained this clearly; I will edit my question if needed.
Answer:
I got a good fit to the sigmoidal equation "y = a / (1.0 + exp(-(x-b)/c))" with parameters a = 9.25160014, b = 2.70654566, and c = 0.80626597, giving RMSE = 0.2661 and R-squared = 0.9924. Below is the Python graphical fitter I used; it calls scipy's differential_evolution genetic algorithm to find initial parameter estimates. The scipy implementation of that module uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, and it requires bounds within which to search. In this example, those search bounds are taken from the data's maximum and minimum values.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings

data = [(1, 1), (2, 3), (3, 5), (4, 8), (5, 9), (6, 9), (7, 9)]

# data to float arrays
xData = []
yData = []
for d in data:
    xData.append(float(d[0]))
    yData.append(float(d[1]))
xData = numpy.array(xData)  # convert to numpy arrays so func() can use vectorized math
yData = numpy.array(yData)


def func(x, a, b, c):  # sigmoidal curve fitting function
    return a / (1.0 + numpy.exp(-1.0 * (x - b) / c))


# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore")  # do not print warnings by genetic algorithm
    val = func(xData, *parameterTuple)
    return numpy.sum((yData - val) ** 2.0)


def generate_Initial_Parameters():
    # min and max used for bounds
    maxX = max(xData)
    minX = min(xData)
    maxY = max(yData)
    minY = min(yData)
    minXY = min(minX, minY)
    maxXY = max(maxX, maxY)

    parameterBounds = []
    parameterBounds.append([minXY, maxXY])  # search bounds for a
    parameterBounds.append([minXY, maxXY])  # search bounds for b
    parameterBounds.append([minXY, maxXY])  # search bounds for c

    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x


# by default, differential_evolution polishes its best result within the parameter bounds
geneticParameters = generate_Initial_Parameters()

# now call curve_fit without passing bounds from the genetic algorithm,
# just in case the best fit parameters are outside those bounds
fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)
print('Fitted parameters:', fittedParameters)
print()

modelPredictions = func(xData, *fittedParameters)

absError = modelPredictions - yData
SE = numpy.square(absError)  # squared errors
MSE = numpy.mean(SE)  # mean squared errors
RMSE = numpy.sqrt(MSE)  # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))

print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()


##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth / 100.0, graphHeight / 100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData, 'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data')  # X axis data label
    axes.set_ylabel('Y Data')  # Y axis data label

    plt.show()
    plt.close('all')  # clean up after using pyplot


graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
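
As for the extrapolation part of the question: once fittedParameters has been found, the same func() can simply be evaluated at x values outside the original data range, since the curve's trend is encoded in the fitted sigmoid parameters. A minimal sketch, assuming the script above has already run (the xNew values below are just illustrative):

# evaluate the fitted sigmoid beyond the data's x range, e.g. at x = 8
xNew = numpy.array([8.0, 10.0, 12.0])
yNew = func(xNew, *fittedParameters)  # predictions follow the fitted curve's trend
print('Extrapolated predictions:', yNew)  # about 9.2 at x = 8 with the quoted parameters, matching the expected value of roughly 9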