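For an assignment, I am fitting mixture models in R with the mixtools package. When I try to determine the optimal number of components with a bootstrap, I get the following error: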
Error in boot.comp(y, x, N = NULL, max.comp = 2, B = 5, sig = 0.05, arbmean = TRUE, : Number of trials must be specified!
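From the documentation, I gather that I need to supply N: for logistic regression of type logisregmix, it is an n-vector of the number of trials. If NULL, N is an n-vector of 1s for binary logistic regression.
However, I don't know how to choose the value of N so that my bootstrap will run.
Link to the dataset: https://www.kaggle.com/blastchar/telco-customer-churn
My code is as follows: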
data <- read.csv("Desktop/WA_Fn-UseC_-Telco-Customer-Churn.csv",
                 stringsAsFactors = FALSE,
                 na.strings = c("NA", "N/A", "Unknown*", "NULL", ".P"))
data <- droplevels(na.omit(data))
data <- data[c(1:5032),]
testdf <- data[c(5033:7032),]
data <- subset(data, select = -customerID)
set.seed(100)
library(plyr)
library(mixtools)
data$Churn <- revalue(data$Churn, c("Yes"=1, "No"=0))
y <- as.numeric(data$Churn)
x <- model.matrix(Churn ~ . , data = data)
x <- x[, -1] #remove intercept
x <-x[,-c(7, 11, 13, 15, 17, 19, 21)] #multicollinearity
a <- boot.comp(y, x, N = NULL, max.comp = 2, B = 100, sig = 0.05, arbmean = TRUE, arbvar = TRUE, mix.type = "logisregmix", hist = TRUE)
Here is some more information about my predictors:
dput(x[1:4,])
structure(c(0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 34, 2, 45, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 29.85, 56.95, 53.85, 42.3, 29.85, 1889.5, 108.15, 1840.75), .Dim = c(4L, 23L), .Dimnames = list(c("1", "2", "3", "4"), c("genderMale", "SeniorCitizen", "PartnerYes", "DependentsYes", "tenure", "PhoneServiceYes", "MultipleLinesYes", "InternetServiceFiber optic", "InternetServiceNo", "OnlineSecurityYes", "OnlineBackupYes", "DeviceProtectionYes", "TechSupportYes", "StreamingTVYes", "StreamingMoviesYes", "ContractOne year", "ContractTwo year", "PaperlessBillingYes", "PaymentMethodCredit card (automatic)", "PaymentMethodElectronic check", "PaymentMethodMailed check", "MonthlyCharges", "TotalCharges")))
My response variable is binary.
I hope someone can help me!
Answer:
Looking at the source of mixtools::boot.comp, which runs to more than 800 lines and is in dire need of refactoring, the lines that raise the error are:
if (mix.type == "logisregmix") {
    if (is.null(N))
        stop("Number of trials must be specified!")
So, despite what the documentation says, N must be specified.
Try setting it to a vector of 1s: N = rep(1, length(y)) or N = rep(1, nrow(x)).
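To see why a vector of 1s is a reasonable choice for a binary response, here is a minimal base R sketch. It uses made-up data (the names n, x, y and the simulated coefficients are illustrative assumptions, not the churn data) and glm rather than mixtools. It only illustrates that "one trial per observation" is the binomial way of writing a 0/1 outcome.

```r
set.seed(1)
n <- 200
x <- rnorm(n)                                     # made-up predictor
y <- rbinom(n, size = 1, prob = plogis(0.5 + x))  # made-up 0/1 response

# One trial per observation: the value the documentation describes for N = NULL
N <- rep(1, length(y))

# Ordinary binary logistic regression
fit_binary <- glm(y ~ x, family = binomial)

# The same model written as successes/failures out of N trials
fit_binomial <- glm(cbind(y, N - y) ~ x, family = binomial)

# Compare the two sets of coefficients
same_coefs <- isTRUE(all.equal(coef(fit_binary), coef(fit_binomial)))
```

With N set to all 1s, the two formulations describe the same model, so passing N = rep(1, length(y)) to boot.comp should correspond to plain binary logistic regression.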
In fact, if you look at mixtools::logisregmixEM, the function that boot.comp calls internally, you can see how N is set when it is NULL:
n <- length(y)
if (is.null(N)) {
    N = rep(1, n)
}
Unfortunately, when N is NULL this code is never reached, because boot.comp has already stopped with the error by then. This is a bug.