PySpark: How do I get the weights from a CrossValidatorModel?

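I trained a logistic regression model with cross-validation, using the following code from https://spark.apache.org/docs/2.1.0/ml-tuning.html

Now I want to get the weights and the intercept, but I get the following error:

AttributeError: 'CrossValidatorModel' object has no attribute 'weights'

How can I get these attributes?

*The same problem occurs with (trainingSummary = cvModel.summary)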

    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.feature import HashingTF, Tokenizer
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
    from pyspark.sql import SparkSession

    # Reuse the active session (already defined as `spark` in the pyspark shell).
    spark = SparkSession.builder.getOrCreate()

    # Prepare training documents, which are labeled.
    training = spark.createDataFrame([
        (0, "a b c d e spark", 1.0),
        (1, "b d", 0.0),
        (2, "spark f g h", 1.0),
        (3, "hadoop mapreduce", 0.0),
        (4, "b spark who", 1.0),
        (5, "g d a y", 0.0),
        (6, "spark fly", 1.0),
        (7, "was mapreduce", 0.0),
        (8, "e spark program", 1.0),
        (9, "a e c l", 0.0),
        (10, "spark compile", 1.0),
        (11, "hadoop software", 0.0)
    ], ["id", "text", "label"])

    # Configure an ML pipeline, which consists of three stages: tokenizer, hashingTF, and lr.
    tokenizer = Tokenizer(inputCol="text", outputCol="words")
    hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
    lr = LogisticRegression(maxIter=10)
    pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])

    # We now treat the Pipeline as an Estimator, wrapping it in a CrossValidator instance.
    # This will allow us to jointly choose parameters for all Pipeline stages.
    # A CrossValidator requires an Estimator, a set of Estimator ParamMaps, and an Evaluator.
    # We use a ParamGridBuilder to construct a grid of parameters to search over.
    # With 3 values for hashingTF.numFeatures and 2 values for lr.regParam,
    # this grid will have 3 x 2 = 6 parameter settings for CrossValidator to choose from.
    paramGrid = ParamGridBuilder() \
        .addGrid(hashingTF.numFeatures, [10, 100, 1000]) \
        .addGrid(lr.regParam, [0.1, 0.01]) \
        .build()

    crossval = CrossValidator(estimator=pipeline,
                              estimatorParamMaps=paramGrid,
                              evaluator=BinaryClassificationEvaluator(),
                              numFolds=2)  # use 3+ folds in practice

    # Run cross-validation, and choose the best set of parameters.
    cvModel = crossval.fit(training)

    # Prepare test documents, which are unlabeled.
    test = spark.createDataFrame([
        (4, "spark i j k"),
        (5, "l m n"),
        (6, "mapreduce spark"),
        (7, "apache hadoop")
    ], ["id", "text"])

    # Make predictions on test documents. cvModel uses the best model found (lrModel).
    prediction = cvModel.transform(test)
    selected = prediction.select("id", "text", "probability", "prediction")
    for row in selected.collect():
        print(row)

Answer:

A LogisticRegression model has coefficients, not weights. Beyond that, you can get them like this:

    (cvModel
        # Get the best model from the CrossValidator
        .bestModel
        # Get the last stage in the Pipeline
        .stages[-1]
        .coefficients)
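The question also asks for the intercept and the training summary, which can probably be reached through the same chain. Here is a minimal sketch, assuming the last pipeline stage is the fitted logistic regression model as in the question's code. The helper names extract_lr_params and extract_training_summary are made up for illustration and are not part of PySpark; coefficients, intercept, hasSummary and summary are the attributes of the fitted model.

```python
def extract_lr_params(cv_model):
    """Return (coefficients, intercept) from the last stage of the best pipeline.

    Hypothetical helper: assumes cv_model.bestModel is a PipelineModel whose
    final stage is a fitted LogisticRegressionModel.
    """
    # bestModel is the pipeline fitted with the best parameter combination.
    best_pipeline = cv_model.bestModel
    # Stages follow the order given to Pipeline: tokenizer, hashingTF, lr.
    lr_model = best_pipeline.stages[-1]
    return lr_model.coefficients, lr_model.intercept


def extract_training_summary(cv_model):
    """Return the training summary of the last stage, or None if there is none."""
    lr_model = cv_model.bestModel.stages[-1]
    # hasSummary guards against models that carry no training summary.
    return lr_model.summary if lr_model.hasSummary else None
```

With the cvModel from the question, this would be called as coefficients, intercept = extract_lr_params(cvModel) and trainingSummary = extract_training_summary(cvModel). Both helpers only read attributes, so cvModel.summary itself is never touched.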
