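C5.0 decision tree – c50 code called exit with value 1

I am getting the following error

c50 code called exit with value 1

I am working with the Titanic data available from Kaggle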

    # Import the dataset
    train <- read.csv("train.csv", sep=",")
    # This is the structure of the data
    str(train)

Output:

    'data.frame':   891 obs. of  12 variables:
     $ PassengerId: int  1 2 3 4 5 6 7 8 9 10 ...
     $ Survived   : int  0 1 1 1 0 0 0 0 1 1 ...
     $ Pclass     : int  3 1 3 1 3 3 1 3 3 2 ...
     $ Name       : Factor w/ 891 levels "Abbing, Mr. Anthony",..: 109 191 358 277 16 559 520 629 417 581 ...
     $ Sex        : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
     $ Age        : num  22 38 26 35 35 NA 54 2 27 14 ...
     $ SibSp      : int  1 1 0 1 0 0 0 3 0 1 ...
     $ Parch      : int  0 0 0 0 0 0 0 1 2 0 ...
     $ Ticket     : Factor w/ 681 levels "110152","110413",..: 524 597 670 50 473 276 86 396 345 133 ...
     $ Fare       : num  7.25 71.28 7.92 53.1 8.05 ...
     $ Cabin      : Factor w/ 148 levels "","A10","A14",..: 1 83 1 57 1 1 131 1 1 1 ...
     $ Embarked   : Factor w/ 4 levels "","C","Q","S": 4 2 4 4 4 3 4 4 4 2 ...

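Then I tried using a C5.0 decision tree

    # Try a C5.0 decision tree
    library(C50)
    # C5.0 models require a factor outcome, otherwise they throw an error
    train$Survived <- factor(train$Survived)
    new_model <- C5.0(train[-2], train$Survived)

After running the code above, I get this error

c50 code called exit with value 1

I cannot figure out what is going wrong. I have used similar code on a different dataset and it ran fine. Any suggestions on how to debug my code?

-Thanks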

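Answer:

For anyone interested, the data can be found here: http://www.kaggle.com/c/titanic-gettingStarted/data. I think you need to register in order to download it.

Regarding your problem, first of all I think you meant to write

    new_model <- C5.0(train[,-2], train$Survived)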
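As a side note, a quick check on a toy data frame (hypothetical values, not the Kaggle file) suggests that for a data frame with more than two columns both indexing forms select the same columns, so this change mainly makes the intent explicit rather than affecting the error:

```r
# Toy data frame (hypothetical values) with the outcome in column 2.
df <- data.frame(
  PassengerId = 1:3,
  Survived    = factor(c(0, 1, 1)),
  Pclass      = c(3L, 1L, 3L)
)

list_style   <- df[-2]    # list-style indexing: drop the second column
matrix_style <- df[, -2]  # matrix-style indexing: drop the second column

# Compare which columns each form keeps.
same_columns <- identical(names(list_style), names(matrix_style))
print(names(matrix_style))
print(same_columns)
```

One difference worth knowing: if only a single column remains, the comma form returns a plain vector unless `drop = FALSE` is passed.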

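Second, note the structure of the Cabin and Embarked columns. These two factors have an empty string as a level name (check with levels(train$Embarked)). This is where C50 fails. If you modify your data so that

    levels(train$Cabin)[1] <- "missing"
    levels(train$Embarked)[1] <- "missing"

your algorithm will now run without an error.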
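The same idea can be applied without hard-coding column names or level positions. The following is a minimal sketch on a toy data frame with hypothetical values. The helper name `relabel_empty_levels` and the `"missing"` label are illustrative choices, not part of the C50 package:

```r
# Toy data frame standing in for the Titanic data (hypothetical values).
toy <- data.frame(
  Survived = factor(c(0, 1, 1, 0)),
  Cabin    = factor(c("", "C85", "", "E46")),
  Embarked = factor(c("S", "C", "", "Q"))
)

# Find the factor columns that have an empty string among their levels.
has_empty_level <- function(df) {
  vapply(df, function(x) is.factor(x) && any(levels(x) == ""), logical(1))
}
empty_before <- names(toy)[has_empty_level(toy)]
print(empty_before)

# Hypothetical helper: rename the empty-string level in every factor column.
relabel_empty_levels <- function(df, label = "missing") {
  for (col in names(df)) {
    if (is.factor(df[[col]])) {
      lv <- levels(df[[col]])
      lv[lv == ""] <- label
      levels(df[[col]]) <- lv
    }
  }
  df
}

toy <- relabel_empty_levels(toy)
empty_after <- names(toy)[has_empty_level(toy)]
print(levels(toy$Embarked))
```

Assuming the real `train` data frame has its text columns stored as factors (as in the `str` output above), it could be passed through the same helper before calling `C5.0`.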
