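R FeatureHashing: extra value in hashed.model.matrix

Summary

Why does the hashed.model.matrix produced by FeatureHashing always have a "mark" in the first column (that is, an entry such as 1, 2, or more)?

Details

While studying how feature hashing handles some simple data, I found something I cannot explain: why does every record in the resulting matrix contain an extra value? (It always appears in the first column.)

The data: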

library(FeatureHashing)
df=data.frame( soup=c('broth','pea','tomato','pea','broth'),
               main=c( 'fries', 'potato', 'fries', 'rice','rice') )
> df
    soup   main
1  broth  fries
2    pea potato
3 tomato  fries
4    pea   rice
5  broth   rice

Generating the hashed matrix:

m=hashed.model.matrix(~.,data=df,hash.size=16,signed.hash=FALSE,
                      create.mapping=TRUE)
5 x 16 sparse Matrix of class "dgCMatrix"
   [[ suppressing 16 column names ‘1’, ‘2’, ‘3’ ... ]]

[1,] 1 . . . . . 1 . . . . . . . 1 .
[2,] 2 . . . . . . . . . . . 1 . . .
[3,] 1 . 1 . . . . . . . . . . . 1 .
[4,] 1 . . . . . . . 1 . . . 1 . . .
[5,] 1 . . . . . 1 . 1 . . . . . . .

Showing the mapping:

hash.mapping(m)
  mainrice mainpotato  mainfries    souppea  soupbroth souptomato
         9          1         15         13          7          3

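Now translate the first row of the data frame df by hand, using the mapping above: the first row has soupbroth->7 and mainfries->15. So we expect a mark in columns 7 and 15.

Looking at the first row of the matrix: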

[1,] 1 . . . . . 1 . . . . . . . 1 .

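We do find a mark in columns 7 and 15, but there is also an extra mark in column 1. In fact, column 1 has a mark in every row. Where does it come from, and what is it for?

P.S. For the record, this was run with "R version 3.2.1 (2015-06-18)" / FeatureHashing_0.9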

Answer:

The first column is the intercept term, which many machine learning packages also call the bias term.

For example:

m1 = model.matrix(~., df)

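As you can see, the first column of m1 is named (Intercept), and all of its values are 1. The hashed matrix carries the same term in column 1. The 2 in row 2 is a hash collision: according to the mapping, mainpotato also lands in column 1, so its count is added to the intercept's 1.

If you want to remove the intercept column, try: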
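To check this without FeatureHashing, the sketch below rebuilds the question's data frame and inspects the ordinary model matrix using base R only. The `stringsAsFactors = TRUE` argument is my addition so that the result does not depend on the R version.

```r
# Same data as in the question; factors are requested explicitly
# so the behaviour is the same on old and new R versions.
df <- data.frame(soup = c("broth", "pea", "tomato", "pea", "broth"),
                 main = c("fries", "potato", "fries", "rice", "rice"),
                 stringsAsFactors = TRUE)

# A formula without "- 1" or "+ 0" includes an intercept by default.
m1 <- model.matrix(~ ., df)

colnames(m1)[1]               # name of the first column
all(m1[, "(Intercept)"] == 1) # TRUE if every row has a 1 there
```

The intercept is a constant column, which is why it shows up in every row regardless of the values of soup and main.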

m = hashed.model.matrix(~ . -1, ...)
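As a rough illustration of what dropping the intercept changes, the following sketch rebuilds the 5 x 16 matrix by hand from the mapping printed in the question. It does not call FeatureHashing at all: `rebuild` is a hypothetical helper, and it assumes (as the output in the question suggests) that the intercept simply adds 1 to column 1 of every row and that unsigned collisions are summed.

```r
# Mapping copied from the hash.mapping(m) output in the question.
mapping <- c(mainrice = 9, mainpotato = 1, mainfries = 15,
             souppea = 13, soupbroth = 7, souptomato = 3)

df <- data.frame(soup = c("broth", "pea", "tomato", "pea", "broth"),
                 main = c("fries", "potato", "fries", "rice", "rice"),
                 stringsAsFactors = FALSE)

# Hypothetical helper, not part of FeatureHashing.
# Assumption: the intercept, when present, adds 1 to column 1 of every row.
rebuild <- function(df, mapping, hash.size = 16, intercept = TRUE) {
  out <- matrix(0, nrow = nrow(df), ncol = hash.size)
  if (intercept) out[, 1] <- 1
  for (i in seq_len(nrow(df))) {
    for (col in names(df)) {
      # Feature name is the column name pasted to the level, e.g. "soupbroth".
      j <- mapping[[paste0(col, df[i, col])]]
      out[i, j] <- out[i, j] + 1  # unsigned hashing: collisions add up
    }
  }
  out
}

with_int <- rebuild(df, mapping)
no_int   <- rebuild(df, mapping, intercept = FALSE)

with_int[, 1]  # 1 2 1 1 1, matching column 1 in the question
no_int[, 1]    # 0 1 0 0 0
```

Note that column 1 is not empty even without the intercept: row 2 keeps a 1 there because mainpotato hashes to the same column. A larger hash.size would make such collisions less likely.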
