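I have built a decision tree model in R. The target variable is salary (column INCOME): we are trying to predict whether a person's salary is above or below 50K from the other input variables.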
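library(rpart)

df <- salary.data

# Random split: 20% of the rows for training, the rest for testing
train <- sample(1:nrow(df), size = 0.2 * nrow(df))
test <- -train
training_data <- df[train, ]
testing_data <- df[test, ]

# Generate the tree; use the bare column name, because with training_data$INCOME ~ .
# the "." expands to every column of data, including INCOME itself
fit <- rpart(INCOME ~ ., method = "class", data = training_data)

# Make predictions (type = "class" returns a factor of predicted labels)
testing_data$predictionsOutput <- predict(fit, newdata = testing_data, type = "class")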
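The predict() call above asks for type = "class", which returns hard labels. A minimal sketch of how that differs from type = "prob", using the kyphosis data that ships with rpart as a stand-in for salary.data (which isn't available here):

```r
library(rpart)

# kyphosis ships with rpart; it stands in for salary.data in this sketch
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# type = "class": a factor with one hard label per row
class_pred <- predict(fit, newdata = kyphosis, type = "class")
print(class(class_pred))

# type = "prob": a numeric matrix with one column per class level
prob_pred <- predict(fit, newdata = kyphosis, type = "prob")
print(colnames(prob_pred))
print(head(prob_pred[, "present"]))
```

Only the second form gives a continuous score per row, which is what ranking-based charts work from.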
After that I tried to create a gain chart as follows:
# Gain Chart
pred <- prediction(testing_data$predictionsOutput, testing_data$INCOME)
gain <- performance(pred, "tpr", "fpr")
plot(gain, col = "orange", lwd = 2)
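I couldn't work out from the documentation how to build the chart with the prediction() function from the ROCR package. Does it only work with binary target variables? I get the error message "Format of predictions is invalid".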
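On the binary question: as far as I know, ROCR only evaluates two-class problems (prediction() stops when the labels contain more than two classes), and INCOME has exactly two levels, so that is not the cause here. The error comes from the predictions argument: predict(..., type = "class") returns a factor, and prediction() only accepts plain vectors, matrices, data frames or lists. A small sketch with made-up toy data reproduces this (lev, labels and class_pred are hypothetical names; labels are an ordered factor so that ">50K" is the positive class):

```r
library(ROCR)

# Toy stand-ins for testing_data$INCOME and the type = "class" predictions
lev <- c("<=50K", ">50K")
labels <- factor(c("<=50K", ">50K", ">50K", "<=50K", ">50K"), levels = lev, ordered = TRUE)
class_pred <- factor(c("<=50K", ">50K", "<=50K", "<=50K", ">50K"), levels = lev)

# A factor carries extra attributes, so it is not a plain vector
print(is.vector(class_pred))  # FALSE

# prediction() rejects it; capture the error message instead of stopping
msg <- tryCatch(prediction(class_pred, labels),
                error = function(e) conditionMessage(e))
print(msg)

# The integer level codes (1 = "<=50K", 2 = ">50K") are accepted
pred <- prediction(as.integer(class_pred), labels)
print(class(pred))
```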
Any help building a gain chart for the model above would be much appreciated. Thanks!
Sample of the data:

  AGE         EMPLOYER    DEGREE            MSTATUS           JOBTYPE  SEX C.GAIN C.LOSS HOURS
1  39        State-gov Bachelors      Never-married      Adm-clerical Male   2174      0    40
2  50 Self-emp-not-inc Bachelors Married-civ-spouse   Exec-managerial Male      0      0    13
3  38          Private   HS-grad           Divorced Handlers-cleaners Male      0      0    40
        COUNTRY INCOME
1 United-States  <=50K
2 United-States  <=50K
3 United-States  <=50K
Answer:
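Convert the predictions to a plain vector with c() before passing them to prediction(); ROCR does not accept the factor that predict(..., type = "class") returns: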
library('rpart')
library('ROCR')

setwd('C:\\Users\\John\\Google Drive\\working\\R\\questions')
df <- read.csv(file = 'salary-class.csv', header = TRUE)

# Random split: 20% of the rows for training, the rest for testing
train <- sample(1:nrow(df), size = 0.2 * nrow(df))
test <- -train
training_data <- df[train, ]
testing_data <- df[test, ]

# Generate the tree (bare column name, so "." does not include INCOME as a predictor)
fit <- rpart(INCOME ~ ., method = "class", data = training_data)

# Make predictions
testing_data$predictionsOutput <- predict(fit, newdata = testing_data, type = "class")

# Doesn't work
# pred <- prediction(testing_data$predictionsOutput, testing_data$INCOME)

# In R < 4.1.0, c() drops the factor class and returns the integer level codes.
# In R >= 4.1.0, c() on a factor returns a factor, so use as.integer() instead.
v <- c(pred = testing_data$predictionsOutput)
pred <- prediction(v, testing_data$INCOME)
gain <- performance(pred, "tpr", "fpr")
plot(gain, col = "orange", lwd = 2)
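Two further points may be worth adding. First, performance(pred, "tpr", "fpr") draws an ROC curve; a cumulative gains chart plots the true positive rate against the rate of positive predictions, i.e. performance(pred, "tpr", "rpp"). Second, hard class codes give ROCR only two distinct scores, so the curve has a single interior point; the class probabilities from type = "prob" give one point per distinct score. A minimal sketch of that approach, again with rpart's kyphosis data standing in for the salary data ("present" plays the role of ">50K"; the seed and 50/50 split are arbitrary choices):

```r
library(rpart)
library(ROCR)

# kyphosis (shipped with rpart) stands in for salary.data
set.seed(42)
train <- sample(seq_len(nrow(kyphosis)), size = floor(nrow(kyphosis) / 2))
training_data <- kyphosis[train, ]
testing_data <- kyphosis[-train, ]

fit <- rpart(Kyphosis ~ ., method = "class", data = training_data)

# Score each test row with the predicted probability of the positive class
scores <- predict(fit, newdata = testing_data, type = "prob")[, "present"]

# label.ordering lists the negative class first, the positive class second
pred <- prediction(scores, testing_data$Kyphosis,
                   label.ordering = c("absent", "present"))

# Cumulative gains: share of positives captured (tpr) vs. share of rows targeted (rpp)
gain <- performance(pred, "tpr", "rpp")
plot(gain, col = "orange", lwd = 2)
abline(a = 0, b = 1, lty = 2)  # baseline: targeting rows at random
```

For the salary data the corresponding calls would presumably be predict(fit, newdata = testing_data, type = "prob")[, ">50K"] and label.ordering = c("<=50K", ">50K"), assuming the two INCOME levels are "<=50K" and ">50K".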