How to compare a list of words (chr) against the values in multiple data frame columns in R and output a binary response on a match

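I want to compare each word in the words column against the values in columns V1 to V576, row by row. If any word in the words column matches the word in a V column, the word in that V column should be replaced with 1; if there is no match, it should be replaced with 0. Any idea how to do this? I am not sure how to loop over all the rows and columns.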

The data frame is called Data, and the words column is a list ($ words :List of 42201). There are 42201 rows and about 576 columns of words to compare (V1 to V576).

Here is the dput output for the first 3 rows and the first 20 columns.

structure(list(id = c("Te-1", "Te-2", "Te-3"),
    category = c("Fabric Care", "Fabric Care", "Home Care"),
    brand = c("Tide", "Tide", "Cascade"),
    sub_category = c("Laundry", "Laundry", "Auto Dishwashing"),
    market = c("US", "US", "US"),
    review_title = c("the best in a very crowded market",
        "first time", "i have been using another well known brand and did not expect    "),
    review_text = c("the best general wash detergent  convenient container that keeps the product driy ",
        "this helped to clean our washing machine after getting it from someone else   this review was collected as part of a promotion  ",
        "i have been using another well known brand and did not expect much difference  wow  was i ever mistaken  i will never go back "),
    review_rating = c(5L, 5L, 5L),
    words = list(
        c("the", "best", "general", "wash", "deterg", "conveni", "contain", "that",
          "keep", "the", "product", "driy"),
        c("this", "help", "to", "clean", "our", "wash", "machin", "after", "get", "it", "from",
          "someon", "els", "this", "review", "was", "collect", "as", "part", "of", "a", "promot"),
        c("i", "have", "been", "use", "anoth", "well", "known", "brand", "and", "did", "not", "expect",
          "much", "differ", "wow", "was", "i", "ever", "mistaken", "i", "will", "never", "go", "back")),
    V1 = c("absolut", "absolut", "absolut"),
    V2 = c("action", "action", "action"),
    V3 = c("actionpac", "actionpac", "actionpac"),
    V4 = c("actual", "actual", "actual"),
    V5 = c("addit", "addit", "addit"),
    V6 = c("adverti", "adverti", "adverti"),
    V7 = c("afford", "afford", "afford"),
    V8 = c("agent", "agent", "agent"),
    V9 = c("allerg", "allerg", "allerg"),
    V10 = c("allergi", "allergi", "allergi"),
    V11 = c("alon", "alon", "alon")),
    row.names = c(NA, -3L), class = c("data.table", "data.frame"),
    .internal.selfref = <pointer: 0x0000023d166a1ef0>)

Please see the data table snippet (linked in the original post) to get a better understanding of my problem.

Thank you very much for your help!


Answer:

I have created an example data frame.

Data

data <- data.frame(words = c("the, best, general", "i, have, been"),
                   v1 = c("best", "no"),
                   v2 = c("have", "nothing"),
                   stringsAsFactors = FALSE)

Inside a for loop I call grepl on each cell, which gives 1 wherever there is a match and 0 otherwise.

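# Loop over every comparison column (everything after `words`) and every row
for (i in 2:ncol(data)) {
  for (j in seq_len(nrow(data))) {
    # grepl() checks whether the cell's word occurs as a substring of that row's `words` string;
    # fixed = TRUE treats the word literally rather than as a regular expression
    matched <- grepl(data[j, i], data$words[j], fixed = TRUE)
    # Write 1 on a match and 0 otherwise (the column is character, so these are stored as "1"/"0")
    data[j, i] <- ifelse(matched, 1, 0)
  }
}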

Result

print(data)

               words  v1  v2
  the, best, general   1   0
       i, have, been   0   0
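The example above stores words as one comma-separated string and relies on grepl, which matches substrings (for instance, "no" would match inside "nothing"). In the question, words is a list column, so a whole-word test with %in% may be a closer fit. The following is only a minimal sketch under some assumptions: the toy values are invented for illustration, the comparison columns are assumed to be named V1, V2, ... so they can be selected with a regular expression, and Data is assumed to be a plain data.frame (the dput shows a data.table, which could be converted with as.data.frame first).

```r
# Toy data shaped like the question: V1..V3 hold candidate words,
# `words` is a list column. All values here are made up for illustration.
Data <- data.frame(
  id = c("Te-1", "Te-2"),
  V1 = c("best", "best"),
  V2 = c("wash", "wash"),
  V3 = c("clean", "clean"),
  stringsAsFactors = FALSE
)
Data$words <- list(
  c("the", "best", "general", "wash"),
  c("this", "help", "to", "clean", "our", "wash")
)

# Pick the comparison columns by name (assumes they are called V1, V2, ...)
v_cols <- grep("^V[0-9]+$", names(Data), value = TRUE)

# For each V column, test row by row whether its word is one of that row's words,
# then store the result as integer 1/0
Data[v_cols] <- lapply(Data[v_cols], function(col) {
  as.integer(mapply(function(word, row_words) word %in% row_words,
                    col, Data$words))
})

print(Data[c("id", v_cols)])
```

Because only the outer step over columns is an explicit lapply and each column is handled in a single mapply call, this avoids indexing individual cells of the data frame, which tends to be slow on tens of thousands of rows. The V columns end up as integer vectors rather than character strings.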
