While cross-validating a decision tree with Spark's ML library, calling cv.fit(train_dataset)
raises this error:
pyspark.sql.utils.IllegalArgumentException: u'requirement failed: Invalid initial capacity'
Other than the DataFrame being empty, I haven't found much information on what this could mean, and my DataFrame is not empty. Here is my code:
import pandas as pd
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator

# abalone.data has no header row, so keep pandas from consuming the first record as one
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data', header=None)
df.columns = ['Sex', 'Length', 'Diameter', 'Height', 'Whole weight', 'Shucked weight', 'Viscera weight', 'Shell weight', 'Rings']
train_dataset = sqlContext.createDataFrame(df)
column_types = train_dataset.dtypes
categoricalCols = []
numericCols = []
for ct in column_types:
    if ct[1] == 'string':
        categoricalCols += [ct[0]]
    else:
        numericCols += [ct[0]]
stages = []
for categoricalCol in categoricalCols:
    stringIndexer = StringIndexer(inputCol=categoricalCol, outputCol=categoricalCol + "Index")
    stages += [stringIndexer]
# use a list comprehension rather than map(): on Python 3, map() returns an iterator that cannot be concatenated to a list
assemblerInputs = [c + "Index" for c in categoricalCols] + numericCols
assembler = VectorAssembler(inputCols=assemblerInputs, outputCol="features")
stages += [assembler]
labelIndexer = StringIndexer(inputCol='Rings', outputCol='indexedLabel')
stages += [labelIndexer]
dt = DecisionTreeClassifier(labelCol="indexedLabel", featuresCol="features")
evaluator = MulticlassClassificationEvaluator(labelCol='indexedLabel', predictionCol='prediction', metricName='f1')
paramGrid = (ParamGridBuilder()
             .addGrid(dt.maxDepth, [1, 2, 6])
             .addGrid(dt.maxBins, [20, 40])
             .build())
stages += [dt]
pipeline = Pipeline(stages=stages)
cv = CrossValidator(estimator=pipeline, estimatorParamMaps=paramGrid, evaluator=evaluator, numFolds=1)  # numFolds=1 turns out to be the culprit (see answer below)
cvModel = cv.fit(train_dataset)
train_dataset = cvModel.transform(train_dataset)
I'm running standalone Spark locally. What could be going wrong?
Thanks!
Answer:
So, the problem was setting CrossValidator's numFolds parameter to 1. CrossValidator performs k-fold cross-validation, so it needs at least two folds to form train/validation pairs; with a single fold there is no held-out validation set, and the fold-splitting code fails with this unhelpful error. If I want to tune parameters with a ParamGrid using just one train-test split, apparently I need to use TrainValidationSplit instead.
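For reference, a minimal sketch of that fix, assuming the pipeline, paramGrid, and evaluator objects from the question are reused (the trainRatio value is an illustrative choice, not something from the original post):

from pyspark.ml.tuning import TrainValidationSplit

# TrainValidationSplit evaluates each parameter combination on a single
# random train/validation split instead of k cross-validation folds
tvs = TrainValidationSplit(estimator=pipeline,
                           estimatorParamMaps=paramGrid,
                           evaluator=evaluator,
                           trainRatio=0.8)  # 80% train / 20% validation; assumed ratio
tvsModel = tvs.fit(train_dataset)
train_dataset = tvsModel.transform(train_dataset)

Alternatively, keeping CrossValidator but raising numFolds to 2 or more also makes the error go away; TrainValidationSplit is simply the cheaper option when a single split is enough.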