pyspark – Page 2

IT Technology

Estimating numeric values with Spark MLlib regression

xiaolong · May 25, 2025

I am training a Spark MLlib linear regression model, but I think…

Faster K-means clustering of high-dimensional data with GPU support

xiaolong · May 25, 2025

We have been using K-means to cluster our logs. One…

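Developing a Python/PySpark program to show words of similar types

xiaolong · May 25, 2025

Closed. This question needs to be more focused. It is not currently accepting answers. Want to improve…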

Mapping the id column after pyspark ml model prediction

xiaolong · May 25, 2025

I am using pyspark.ml.classificatio…

PySpark logistic regression fit: RDD object has no attribute _jdf error

xiaolong · May 24, 2025

I am building a logistic regression in Python and switched to mllib to get…

How do I convert a list of JSON objects into a single PySpark DataFrame?

xiaolong · May 24, 2025

I am new to PySpark, and I have fetched from an API a series of JS…

Spark 2.1.1: How do I predict topics for unseen documents with a trained Spark 2.1.1 LDA model?

xiaolong · May 24, 2025

In pyspark (Spark 2.1.1), I am using customer…

PySpark AttributeError: type object 'ALS' has no attribute 'trainImplicit'

xiaolong · May 24, 2025

I am trying to use ALS to train on my dataset to find latent factors. My data…

How do I pass multiple columns as features to a logistic regression classifier in Spark? [duplicate]

xiaolong · May 23, 2025

This question already has an answer: In Spark ML / pyspa…

How do I export my predictions (an array) in Azure Databricks?

xiaolong · May 22, 2025

I cannot export my DataFrame as a CSV file. The message shown is "CS…

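PySpark out-of-memory problem: how to make sure a table is overwritten

xiaolong · May 22, 2025

I am currently trying to understand the Spark computation process and its effect on memory consumption…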

'CrossValidatorModel' object has no attribute 'featureImportances'

xiaolong · May 22, 2025

For a random forest classifier model trained with PySpark, I am trying to extract…

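Why does the Spark ML perceptron classifier get a high F1 score while the same model performs very poorly in TensorFlow?

xiaolong · May 22, 2025

Our team is working on a natural language processing problem. We have a set with…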

Feature selection in PySpark

xiaolong · May 22, 2025

I am working with a 1,456,354 X 53 machine…

PySpark error: "Field rawPrediction does not exist" when using cross-validation

xiaolong · May 22, 2025

I tried to use CrossValidator on my training data, but…

How do Spark models handle vector columns?

xiaolong · May 22, 2025

In Spark, how do methods handle vector-assembled columns? For example, if I have…

Sparse vs. dense vectors in PySpark

xiaolong · May 1, 2025

In PySpark, how do I decide whether to use a sparse representation or a dense…

Finding the best pipeline model with CrossValidator and ParamGridBuilder

xiaolong · April 16, 2025

I already have an acceptable model, but I am hoping that, in Spark …

VectorIndexer's maxCategories does not work as expected when using RandomForestClassifier in pyspark.ml

xiaolong · April 16, 2025

Background: I am doing a simple binary classification, drawing on pyspa…

Classifying text in a column using keywords

xiaolong · April 16, 2025

I have a table column containing descriptions of how problems were resolved, and these texts contain…

Training a non-linear SVC model with PySpark

xiaolong · April 15, 2025

Is there a way to train a non-linear SVC model using PySpark…

pyspark.ml: type error when computing precision and recall

xiaolong · April 15, 2025

I am trying to use pyspark.ml to compute a classifier's precision…

pyspark.ml pipelines: are custom transformers needed for basic preprocessing tasks?

xiaolong · April 15, 2025

When getting started with pyspark.ml and the pipelines API, I found myself…

What should the type of a dense vector be when using a UDF in PySpark? [duplicate]

xiaolong · April 15, 2025

This question already has an answer: How, in a PySpark DataFr…

Creating a LabeledPoint from MongoDB with Python

xiaolong · April 15, 2025

Using Python, I want to create, from MongoDB, Labeled…