Loading WEKA string-to-vector files with Rattle

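I have been using WEKA for text classification, and now I would like to try R.

The problem is that I cannot load the string-to-vector ARFF file created by WEKA's string parser into Rattle.

When I look at the log, I see something like this:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,: scan() expected 'a real', got '2281}'

My ARFF data file looks roughly like this: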

    @relation 'reviewData'
    @attribute polarity {0,2}
    .....
    @attribute $$ numeric
    @attribute we numeric
    @attribute wer numeric
    @attribute win numeric
    @attribute work numeric
    @data
    {0 2,63 1,71 1,100 1,112 1,140 1,186 1,228 1}
    {14 1,40 1,48 1,52 1,61 1,146 1}
    {2 1,41 1,43 1,57 1,71 1,79 1,106 1,108 1,133 1,146 1,149 1,158 1,201 1}
    {0 2,6 1,25 1,29 1,42 1,49 1,69 1,82 1,108 1,116 1,138 1,140 1,155 1}
    ....

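Is there a way to convert this file into a format that R can read?

Thanks!

Answer:

When you save the output of the StringToWordVector attribute filter, it is written as a sparse ARFF file, in which each data row lists only its nonzero values as "index value" pairs inside braces.

You need to check whether Rattle supports reading this format. The scan() error in the question, which trips over a token ending in '}', suggests that the sparse rows are not being parsed as such. If the format is not supported, you can apply the SparseToNonSparse instance filter to convert the data to a dense matrix format (the file will be much larger).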

Example: if the sparse data looks like this:

sparse.arff

    @relation name
    @attribute word1 numeric
    @attribute word2 numeric
    ..
    @attribute word10 numeric
    @data
    {0 1,3 3,8 1,9 1}
    {2 2,5 1,8 1,9 1}

it will be converted to:

nonsparse.arff

    @relation name
    @attribute word1 numeric
    @attribute word2 numeric
    ..
    @attribute word10 numeric
    @data
    1,0,0,3,0,0,0,0,1,1
    0,0,2,0,0,1,0,0,1,1
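
If running the WEKA filter is not convenient, the same expansion can be sketched in a few lines of Python. This is only an illustration of what the conversion does, not WEKA's implementation. The function names `sparse_row_to_dense` and `sparse_arff_to_dense` are made up for this example. The sketch assumes that every omitted value should be written as 0, which holds for numeric attributes and for a nominal attribute whose first label is 0, such as polarity {0,2} above. It also assumes that values contain no quoted commas or spaces.

```python
def sparse_row_to_dense(line, n_attributes, default="0"):
    """Expand one sparse ARFF row such as '{0 1,3 3}' into a list of strings."""
    body = line.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError(f"not a sparse row: {line!r}")
    # Start with the default in every column, then fill in the listed ones.
    dense = [default] * n_attributes
    inner = body[1:-1].strip()
    if inner:
        for pair in inner.split(","):
            # Each pair is '<zero-based attribute index> <value>'.
            index, value = pair.strip().split(None, 1)
            dense[int(index)] = value
    return dense


def sparse_arff_to_dense(text):
    """Return ARFF text in which every sparse data row is written densely."""
    out = []
    n_attributes = 0
    in_data = False
    for line in text.splitlines():
        stripped = line.strip()
        lower = stripped.lower()
        if not in_data:
            # Count attribute declarations so each row gets the right width.
            if lower.startswith("@attribute"):
                n_attributes += 1
            elif lower.startswith("@data"):
                in_data = True
            out.append(line)
        elif stripped.startswith("{"):
            out.append(",".join(sparse_row_to_dense(stripped, n_attributes)))
        else:
            # Blank lines, comments and rows that are already dense pass through.
            out.append(line)
    return "\n".join(out) + "\n"
```

Calling `sparse_arff_to_dense` on the contents of a complete sparse file and writing the result to a new file gives a dense ARFF of the kind shown in nonsparse.arff. For large vocabularies the output grows quickly, since every row gets one value per attribute.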
