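I keep running into this problem when trying to do PCR (principal component regression) with tidymodels. I know there is a similar post, but the solution there did not work for me.
My data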
library(tidymodels)  # attaches dplyr, recipes, rsample, parsnip, workflows, tune
library(AppliedPredictiveModeling)

data(solubility)

train = solTrainY %>%
  bind_cols(solTrainXtrans) %>%
  rename(solubility = ...1)
My PCR analysis
library(magrittr)  # provides the %<>% assignment pipe

train %<>% mutate_all(., as.numeric) %>% glimpse()

tidy_rec = recipe(solubility ~ ., data = train) %>%
  step_corr(all_predictors(), threshold = 0.9) %>%
  step_pca(all_predictors(), num_comp = ncol(train)-1) %>%
  prep()

tidy_rec %>% tidy(2) %>% select(terms) %>% distinct()

tidy_predata = tidy_rec %>% juice()

# Re-sampling
tidy_folds = vfold_cv(train, v = 10)

# Set model
tidy_rlm = linear_reg() %>%
  set_mode("regression") %>%
  set_engine("lm")

# Set workflow
tidy_wf = workflow() %>%
  add_recipe(tidy_rec) %>%
  add_model(tidy_rlm)

# Fit model
tidy_fit = tidy_wf %>% fit_resamples(tidy_folds)

tidy_fit %>% collect_metrics()
Error
x Fold01: recipe: Error: Can't subset columns that don't exist.
x Columns `PC1`, `PC2`, `PC3`, `PC4`, and `PC5` don't exist.
x Fold02: recipe: Error: Can't subset columns that don't exist.
x Columns `PC1`, `PC2`, `PC3`, `PC4`, and `PC5` don't exist.
x Fold03: recipe: Error: Can't subset columns that don't exist.
x Columns `PC1`, `PC2`, `PC3`, `PC4`, and `PC5` don't exist.
x Fold04: recipe: Error: Can't subset columns that don't exist.
x Columns `PC1`, `PC2`, `PC3`, `PC4`, and `PC5` don't exist.
x Fold05: recipe: Error: Can't subset columns that don't exist.
x Columns `PC1`, `PC2`, `PC3`, `PC4`, and `PC5` don't exist.
x Fold06: recipe: Error: Can't subset columns that don't exist.
...
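Answer:
This happens because a workflow expects a recipe specification that has not been prepped. The workflow preps the recipe itself when it is fit, and fit_resamples() does that once per fold, so passing an already prepped recipe to add_recipe() leads to this error.
Removing prep() from the recipe specification in your code will eliminate the error.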
tidy_rec <- recipe(solubility ~ ., data = train) %>%
  step_corr(all_predictors(), threshold = 0.9) %>%
  step_pca(all_predictors(), num_comp = ncol(train)-1)
  # prep() removed
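
For reference, here is a minimal sketch of how the whole pipeline might look with the unprepped recipe. The object names (pcr_rec, pcr_mod, pcr_wf, pcr_res), the set.seed() call, and building the outcome column with tibble() instead of renaming `...1` are my own choices for illustration and are not from the question. It assumes the tidymodels and AppliedPredictiveModeling packages are installed.

```r
library(tidymodels)
library(AppliedPredictiveModeling)

data(solubility)

# Assumption: name the outcome column up front rather than renaming `...1`
train <- bind_cols(tibble(solubility = solTrainY), solTrainXtrans)

# Unprepped recipe: the workflow estimates it on each analysis set
pcr_rec <- recipe(solubility ~ ., data = train) %>%
  step_corr(all_predictors(), threshold = 0.9) %>%
  step_pca(all_predictors(), num_comp = ncol(train) - 1)

pcr_mod <- linear_reg() %>%
  set_mode("regression") %>%
  set_engine("lm")

pcr_wf <- workflow() %>%
  add_recipe(pcr_rec) %>%
  add_model(pcr_mod)

# Seed is arbitrary; it only makes the folds reproducible
set.seed(123)
folds <- vfold_cv(train, v = 10)

pcr_res <- fit_resamples(pcr_wf, folds)
metrics <- collect_metrics(pcr_res)
metrics

# To inspect the processed training data, prep a separate copy
# and leave the recipe inside the workflow untouched
pcr_data <- pcr_rec %>% prep() %>% bake(new_data = NULL)
```

Keeping the prepped copy separate means you can still look at the components (as the question does with juice()) without handing a prepped recipe to the workflow.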