Linear regression in Apache Spark with Scala is not even a straight line

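I apologize for the silly question, but I have run into a problem with linear regression and have been struggling with it a lot. Could you help me?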

Here is my main code. I currently use some external libraries to plot the data.

```scala
import com.fundtrml.config.ConfigSetUp
import org.apache.spark.ml.feature.LabeledPoint
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

object SimpleLinearRegression {

  def main(args: Array[String]): Unit = {
    ConfigSetUp.HadoopBinariesConfig()

    val ss = SparkSession.builder().appName("DataSet Test")
      .master("local[*]").getOrCreate()
    import ss.implicits._

    val listOfData = List(40, 41, 45, 43, 42, 60, 61, 59, 50, 49, 47, 39, 41, 37, 36, 34, 33, 37)

    val data = listOfData  //(1 to 21 by 1)
      .map(_.toDouble)     // create a collection of Doubles
      .map(n => (n, n))    // make it pairs: label == feature
      .map { case (label, features) =>
        LabeledPoint(label, Vectors.dense(features)) } // create labeled points of dense vectors
      .toDF                // make it a DataFrame

    val splittedData = data.randomSplit(Array(0.6, 0.4))
    val trainingData = splittedData(0)
    val testSetData = splittedData(1)
    trainingData.show()

    val lr = new LinearRegression()
      .setMaxIter(10)
      .setRegParam(0.3)
      .setElasticNetParam(0.8)

    // train
    val model = lr.fit(trainingData)
    println(s"model.intercept: ${model.intercept}")
    println(s"model.coefficients : ${model.coefficients}")

    // Summarize the model over the training set and print out some metrics
    val trainingSummary = model.summary
    println(s"numIterations: ${trainingSummary.totalIterations}")
    println(s"objectiveHistory: [${trainingSummary.objectiveHistory.mkString(",")}]")
    trainingSummary.residuals.show()
    println(s"RMSE: ${trainingSummary.rootMeanSquaredError}")
    println(s"r2: ${trainingSummary.r2}")

    val predictions = model.transform(testSetData)
    predictions.show()

    // Display the data
    import com.quantifind.charts.Highcharts._
    regression(listOfData) // using this external library with embedded functionality about regression

    val currentPredictions = predictions.select("prediction").rdd.map(r => r(0)).collect.toList
    println(currentPredictions)
    // regression(currentPredictions.map(_.toString.toDouble))
  }
}
```

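My training set is shown below; the label column holds the value to predict, and the features column holds the value used to make the prediction: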

```
+-----+--------+
|label|features|
+-----+--------+
| 43.0|  [43.0]|
| 45.0|  [45.0]|
| 42.0|  [42.0]|
| 60.0|  [60.0]|
| 50.0|  [50.0]|
| 59.0|  [59.0]|
| 61.0|  [61.0]|
| 47.0|  [47.0]|
| 49.0|  [49.0]|
| 41.0|  [41.0]|
| 34.0|  [34.0]|
+-----+--------+
```

Evaluating the regression model gives the following output:

```
model.intercept: 1.7363839862169372
model.coefficients : [0.9640297102666925]
numIterations: 3
objectiveHistory: [0.5,0.406233822167566,0.031956224821402285]
RMSE: 0.29784178261548705
r2: 0.9987061382565019
```

The r2 is extremely high, very close to 1.
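The slope below 1 and the nonzero intercept are most likely a consequence of the regularization settings (`setRegParam(0.3)`, `setElasticNetParam(0.8)`) rather than a sign that the predictions leave a line. As a rough sketch, ignoring the internal feature standardization that Spark applies by default (which changes the scale the penalty acts on), the elastic-net objective for one feature with weight w and intercept b is:

```latex
$$
\min_{w,\,b}\ \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - (w\,x_i + b)\bigr)^2
  + \lambda\Bigl(\alpha\,\lvert w\rvert + \frac{1-\alpha}{2}\,w^2\Bigr),
\qquad \lambda = 0.3,\ \alpha = 0.8
$$

$$
b = \bar{y} - w\,\bar{x} \quad\text{and, since } y_i = x_i,\quad
b = \bar{x}\,(1 - w) = \frac{531}{11}\,(1 - 0.96403) \approx 1.7364
$$
```

The penalty pulls w toward 0, and because the intercept is not penalized it absorbs the difference, which is why the fit is about 1.7364 + 0.9640x instead of exactly y = x. The value 531/11 is the mean of the 11 training features shown above, and the result agrees with the printed `model.intercept`.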

Finally, I get the following predictions:

```
+-----+--------+------------------+
|label|features|        prediction|
+-----+--------+------------------+
| 40.0|  [40.0]| 40.29757239688463|
| 41.0|  [41.0]|41.261602107151326|
| 39.0|  [39.0]|39.333542686617946|
| 36.0|  [36.0]|36.441453555817866|
| 37.0|  [37.0]| 37.40548326608456|
| 33.0|  [33.0]| 33.54936442501779|
| 37.0|  [37.0]| 37.40548326608456|
+-----+--------+------------------+
```
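One way to check the predictions without any plotting library is to recompute the prediction column by hand: a model with a single feature computes prediction = intercept + coefficient × feature, so every prediction must fall on that one line. This is a minimal sketch in plain Scala; the object and method names are hypothetical, and the intercept, coefficient, and (feature, prediction) pairs are copied from the output above.

```scala
// Recomputes the "prediction" column from the printed model parameters (hypothetical helper).
object PredictionCheck {
  // Copied from the model.intercept / model.coefficients output
  val intercept: Double = 1.7363839862169372
  val coefficient: Double = 0.9640297102666925

  // Single-feature linear model: every prediction lies on this line
  def predict(feature: Double): Double = intercept + coefficient * feature

  // (feature, prediction) pairs copied from the predictions table
  val reported: Seq[(Double, Double)] = Seq(
    40.0 -> 40.29757239688463,
    41.0 -> 41.261602107151326,
    39.0 -> 39.333542686617946,
    36.0 -> 36.441453555817866,
    37.0 -> 37.40548326608456,
    33.0 -> 33.54936442501779,
    37.0 -> 37.40548326608456
  )

  def main(args: Array[String]): Unit =
    reported.foreach { case (x, p) =>
      println(f"feature=$x%.1f reported=$p%.6f recomputed=${predict(x)}%.6f")
    }
}
```

Every reported value matches the recomputed one to floating-point precision, so any apparent scatter in a plot of these predictions comes from the choice of axes, not from the model.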

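It is easy to see that the predictions are not on a single line; they cannot possibly lie on a straight line. Here is the whole dataset plotted with the Scala library WISP: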

[Figure: predicted data]

[Figure: expected result, but produced with WISP]

Answer:

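What you have plotted appears to be the labels on the Y axis against the list index on the X axis, rather than against the feature values on the X axis.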

When the predictions are plotted against the features, they do lie on a single line. This is what I got when I did so: link
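The two plots differ only in what goes on the X axis. The sketch below, in plain Scala with hypothetical names (`PlotAxes`, `isCollinear`), builds both point sets from the data in the question and tests each for collinearity. The commented Spark call shows one possible way, assumed rather than taken from the question, to collect the (feature, prediction) pairs from the `predictions` DataFrame.

```scala
// Compares "label vs. list index" with "prediction vs. feature" (hypothetical helper).
object PlotAxes {
  // The question's listOfData; in this dataset each label equals its feature
  val listOfData: List[Double] =
    List(40, 41, 45, 43, 42, 60, 61, 59, 50, 49, 47, 39, 41, 37, 36, 34, 33, 37).map(_.toDouble)

  // Per the answer, what was actually plotted: x = position in the list, y = label
  val indexVsLabel: Seq[(Double, Double)] =
    listOfData.zipWithIndex.map { case (y, i) => (i.toDouble, y) }

  // What should be plotted: x = feature, y = prediction (rows from the predictions table).
  // In the Spark program these pairs could be collected with, e.g.:
  //   predictions.select("features", "prediction").rdd
  //     .map(r => (r.getAs[org.apache.spark.ml.linalg.Vector](0).apply(0), r.getDouble(1)))
  //     .collect().toSeq
  val featureVsPrediction: Seq[(Double, Double)] = Seq(
    40.0 -> 40.29757239688463,
    41.0 -> 41.261602107151326,
    39.0 -> 39.333542686617946,
    36.0 -> 36.441453555817866,
    37.0 -> 37.40548326608456,
    33.0 -> 33.54936442501779,
    37.0 -> 37.40548326608456
  )

  // True if every point lies on the line through the leftmost and rightmost points:
  // the 2-D cross product of the two direction vectors is (numerically) zero.
  def isCollinear(points: Seq[(Double, Double)], tol: Double = 1e-6): Boolean = {
    val (x0, y0) = points.minBy(_._1)
    val (x1, y1) = points.maxBy(_._1)
    points.forall { case (x, y) =>
      math.abs((x - x0) * (y1 - y0) - (y - y0) * (x1 - x0)) < tol
    }
  }

  def main(args: Array[String]): Unit = {
    println(s"index vs label collinear: ${isCollinear(indexVsLabel)}")
    println(s"feature vs prediction collinear: ${isCollinear(featureVsPrediction)}")
  }
}
```

The label-by-index series zigzags because the list is unordered, while the feature-by-prediction series is exactly a line, which is the answer's point.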

