I am running a random forest model. Most variables run fine without errors; however, when I include one variable, duration_in_program, with the following code:
```{r Random Forest Model}
## Run the random forest model
mod_rf <- train(
  left_school ~ job_title + gender + marital_status + age_at_enrollment +
    monthly_wage + educational_qualification + cityD + cityC. + cityB +
    cityA + duration_in_program, # formula (outcome and all other variables)
  data = train_data,             # training data
  method = "ranger",             # random forest (ranger is much faster than rf)
  metric = "ROC",                # area under the curve
  trControl = control_conditions,
  tuneGrid = tune_mtry
)
mod_rf
```
I get the following error:
Error in na.fail.default(list(left_welfare = c(1L, 2L, 2L, 2L, 2L, 2L, : missing values in object
Answer:
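Assuming `train()` is from the caret package, you can use the `na.action` argument to specify a function that handles missing values. The default is `na.fail`, which stops with an error as soon as any value is missing. A very common option is `na.omit`, which drops every row that contains a missing value. The randomForest library has an `na.roughfix` function, which will "impute missing values by median/mode".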
```r
mod_rf <- train(
  left_school ~ job_title + gender + marital_status + age_at_enrollment +
    monthly_wage + educational_qualification + cityD + cityC. + cityB +
    cityA + duration_in_program, # formula (outcome and all other variables)
  data = train_data,             # training data
  method = "ranger",             # random forest (ranger is much faster than rf)
  metric = "ROC",                # area under the curve
  trControl = control_conditions,
  tuneGrid = tune_mtry,
  na.action = na.omit            # drop rows with missing values instead of failing
)
mod_rf
```
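Before choosing an `na.action`, it may help to check where the missing values are and how many rows each option would affect. The following is a minimal sketch in base R, not the asker's actual data: the `toy` data frame and its values are made up for illustration, and the median fill is only a hand-written approximation of what `na.roughfix` does for a numeric column.

```r
# Hypothetical stand-in for train_data; the values are invented for illustration
toy <- data.frame(
  left_school = factor(c("yes", "no", "no", "yes", "no")),
  monthly_wage = c(1200, 1500, 900, 1100, 1300),
  duration_in_program = c(6, NA, 12, NA, 3)
)

# Count missing values per column to see which predictor trips na.fail
na_counts <- colSums(is.na(toy))
print(na_counts)

# na.omit keeps only the rows that have no missing value in any column
toy_complete <- na.omit(toy)
print(nrow(toy_complete))

# Median fill for one numeric column (an approximation of the na.roughfix idea)
toy_imputed <- toy
fill_value <- median(toy$duration_in_program, na.rm = TRUE)
missing_rows <- is.na(toy_imputed$duration_in_program)
toy_imputed$duration_in_program[missing_rows] <- fill_value
print(toy_imputed)
```

If `duration_in_program` is missing for a large share of rows, `na.omit` can shrink the training set considerably, so imputation may be the better trade-off in that case.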