C50 code called exit with value 1 (using a factor decision variable and non-empty values)

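I have read a thread related to this problem, but I suspect the error code has a different cause in my case. I have a CSV file with 8 observations and 10 variables: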

    > str(rorIn)
    'data.frame':   8 obs. of  10 variables:
     $ Acuity             : Factor w/ 3 levels "Elective  ","Emergency ",..: 1 1 2 2 1 2 2 3
     $ AgeInYears         : int  49 56 77 65 51 79 67 63
     $ IsPriority         : int  0 0 1 0 0 1 0 1
     $ AuthorizationStatus: Factor w/ 1 level "APPROVED  ": 1 1 1 1 1 1 1 1
     $ iscasemanagement   : Factor w/ 2 levels "N","Y": 1 1 2 1 1 2 2 2
     $ iseligible         : Factor w/ 1 level "Y": 1 1 1 1 1 1 1 1
     $ referralservicecode: Factor w/ 4 levels "12345","278",..: 4 1 3 1 1 2 3 1
     $ IsHighlight        : Factor w/ 1 level "N": 1 1 1 1 1 1 1 1
     $ RealLengthOfStay   : int  25 1 1 1 2 2 1 3
     $ Readmit            : Factor w/ 2 levels "0","1": 2 1 2 1 2 1 2 1

I call the algorithm like this:

    library("C50")
    rorIn <- read.csv(file = "RoRdataInputData_v1.6.csv", header = TRUE, quote = "\"")
    rorIn$Readmit <- factor(rorIn$Readmit)
    fit <- C5.0(Readmit ~ ., data = rorIn)

Then I get:

    > source("~/R-workspace/src/RoR/RoR/testing.R")
    c50 code called exit with value 1
    >

I have followed other recommendations, such as:
- using a factor as the decision variable
- avoiding empty data

Any help with this? I have read that this is one of the best algorithms in machine learning, but I get this error every time.

Here is the original data set:

    Acuity,AgeInYears,IsPriority,AuthorizationStatus,iscasemanagement,iseligible,referralservicecode,IsHighlight,RealLengthOfStay,Readmit
    Elective  ,49,0,APPROVED  ,N,Y,SNF            ,N,25,1
    Elective  ,56,0,APPROVED  ,N,Y,12345,N,1,0
    Emergency ,77,1,APPROVED  ,Y,Y,OBSERVE        ,N,1,1
    Emergency ,65,0,APPROVED  ,N,Y,12345,N,1,0
    Elective  ,51,0,APPROVED  ,N,Y,12345,N,2,1
    Emergency ,79,1,APPROVED  ,Y,Y,278,N,2,0
    Emergency ,67,0,APPROVED  ,Y,Y,OBSERVE        ,N,1,1
    Urgent    ,63,1,APPROVED  ,Y,Y,12345,N,3,0

Thanks in advance for any help,

David

Answer:

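You need to clean your data in a few ways.

- Remove the unnecessary columns that have only one level. They carry no information and cause problems.
- Convert the class of the target variable rorIn$Readmit to a factor.
- Separate the target variable from the data set that you supply for training.

This should work: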

    rorIn <- read.csv("RoRdataInputData_v1.6.csv", header=TRUE)
    rorIn$Readmit <- as.factor(rorIn$Readmit)

    # Find the factor columns that have only one level
    library(Hmisc)
    singleLevelVars <- names(rorIn)[contents(rorIn)$contents$Levels == 1]

    # Predictors: everything except the target and the single-level columns
    trainvars <- setdiff(colnames(rorIn), c("Readmit", singleLevelVars))

    library(C50)
    RoRmodel <- C5.0(rorIn[,trainvars], rorIn$Readmit, trials = 10)
    predict(RoRmodel, rorIn[,trainvars])
    #[1] 1 0 1 0 0 0 1 0
    #Levels: 0 1
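The three cleaning steps can also be sketched in base R without Hmisc. This is only an illustration: the data frame below is a hand-typed stand-in holding a few columns from the question (with padding spaces dropped), and it counts distinct values per column rather than factor levels, which is an assumption about what "single level" should mean for non-factor columns.

```r
# Stand-in for rorIn: a few columns typed in from the question's data
rorIn <- data.frame(
  Acuity = factor(c("Elective", "Elective", "Emergency", "Emergency",
                    "Elective", "Emergency", "Emergency", "Urgent")),
  AgeInYears = c(49L, 56L, 77L, 65L, 51L, 79L, 67L, 63L),
  AuthorizationStatus = factor(rep("APPROVED", 8)),
  iseligible = factor(rep("Y", 8)),
  Readmit = c(1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L)
)

# Convert the target variable to a factor
rorIn$Readmit <- factor(rorIn$Readmit)

# Count distinct values per column; columns with one value carry no information
nDistinct <- vapply(rorIn, function(col) length(unique(col)), integer(1))
singleLevelVars <- names(nDistinct)[nDistinct == 1]

# Predictor names: everything except the target and the constant columns
trainvars <- setdiff(names(rorIn), c("Readmit", singleLevelVars))

print(singleLevelVars)
print(trainvars)
```

The resulting `trainvars` can be passed to `C5.0()` in the same way as in the answer's code.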

You can then evaluate accuracy, recall, and other statistics by comparing the output of predict() with the actual values of the target variable:

    rorIn$Readmit
    #[1] 1 0 1 0 1 0 1 0
    #Levels: 0 1

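The usual approach is to set up a confusion matrix that compares the actual and predicted values in a binary classification problem. With a data set this small, it is easy to see that there is only one false negative. So the code appears to work fine, but this encouraging result may be deceptive because the number of observations is very small.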

    library(gmodels)
    actual <- rorIn$Readmit
    predicted <- predict(RoRmodel, rorIn[,trainvars])
    CrossTable(actual, predicted, prop.chisq=FALSE, prop.r=FALSE)
    # Total Observations in Table:  8
    #
    #
    #              | predicted
    #       actual |         0 |         1 | Row Total |
    #--------------|-----------|-----------|-----------|
    #            0 |         4 |         0 |         4 |
    #              |     0.800 |     0.000 |           |
    #              |     0.500 |     0.000 |           |
    #--------------|-----------|-----------|-----------|
    #            1 |         1 |         3 |         4 |
    #              |     0.200 |     1.000 |           |
    #              |     0.125 |     0.375 |           |
    #--------------|-----------|-----------|-----------|
    # Column Total |         5 |         3 |         8 |
    #              |     0.625 |     0.375 |           |
    #--------------|-----------|-----------|-----------|
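Accuracy, recall, and precision can be computed from the same counts with base R. The following is a minimal sketch: the two vectors are typed in from the actual and predicted values printed in this answer, and treating level "1" as the positive class is an assumption.

```r
# Actual and predicted labels, copied from the printed output in this answer
actual    <- factor(c(1, 0, 1, 0, 1, 0, 1, 0), levels = c(0, 1))
predicted <- factor(c(1, 0, 1, 0, 0, 0, 1, 0), levels = c(0, 1))

# Confusion matrix: rows are actual classes, columns are predicted classes
cm <- table(actual, predicted)

# Cell counts, assuming "1" is the positive class
tp <- cm["1", "1"]  # true positives
tn <- cm["0", "0"]  # true negatives
fp <- cm["0", "1"]  # false positives
fn <- cm["1", "0"]  # false negatives

accuracy  <- (tp + tn) / sum(cm)  # 7 of 8 correct
recall    <- tp / (tp + fn)       # 3 of 4 actual positives found
precision <- tp / (tp + fp)       # 3 of 3 predicted positives correct

print(c(accuracy = accuracy, recall = recall, precision = precision))
```

Because these figures are measured on the same 8 rows the model was trained on, they say little about how the model would perform on new data.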

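For larger data sets it is useful, if not necessary, to split the data into training and test sets. There is a lot of good literature on machine learning that can help you fine-tune the model and its predictions.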
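A train/test split of this kind might look as follows in base R. This is a sketch under stated assumptions: the built-in `iris` data set stands in for a larger data set, the 75/25 ratio and the seed are arbitrary choices, and the C50 calls are left as comments so the snippet runs without the package.

```r
# Stand-in data: the built-in iris data set (not the question's data)
dat <- iris

set.seed(42)  # arbitrary seed so the random split is reproducible
n <- nrow(dat)
trainIdx <- sample(n, size = floor(0.75 * n))  # 75% of the rows for training

trainSet <- dat[trainIdx, ]   # rows used to fit the model
testSet  <- dat[-trainIdx, ]  # held-out rows used only for evaluation

# With C50 installed, the model would be fit on trainSet and
# evaluated on testSet, for example:
# library(C50)
# model <- C5.0(trainSet[, 1:4], trainSet$Species)
# table(actual = testSet$Species, predicted = predict(model, testSet[, 1:4]))

print(c(train = nrow(trainSet), test = nrow(testSet)))
```

Evaluating on rows the model has never seen gives a more honest estimate of accuracy than comparing predictions against the training data.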
