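I am trying to do some fake news analysis on a data frame called news_df. I fitted a very simple model and tried to run cross-validation to find the best value of n, but R says it cannot prune a single-node tree. Do you know why this happens?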
library(tree)

news_df <- structure(list(
  title = c("China's Xi says will support Interpol raising its profile",
            "Clinton says Trump may have violated U.S. law on Cuba",
            "House Oversight head Chaffetz to leave Congress after 2018"),
  text = c("BEIJING (Reuters) - China will support Interpol, raising the profile and leadership of the global police cooperation agency, Chinese President Xi Jinping said on Tuesday at the opening of Interpol s general assembly in Beijing, state media reported. Last year, Interpol elected a senior Chinese public security official, Vice Public Security Minister Meng Hongwei, as its president, prompting rights groups to ask whether Beijing could try and use the position to go after dissidents abroad.",
           "CHICAGO (Reuters) - U.S. Democratic presidential nominee Hillary Clinton said on Thursday that Republican opponent Donald Trump may have violated U.S. law, following a news report that one of his companies attempted to do business in Cuba. Newsweek said on Thursday that a hotel and casino company controlled by Trump secretly conducted business with Cuba that was illegal under U.S. sanctions in force during Fidel Castro’s presidency of the Communist-ruled island.",
           "WASHINGTON (Reuters) - U.S. Representative Jason Chaffetz, who chairs a House committee with broad investigative powers, on Wednesday announced his plans to leave Congress after the 2018 midterm elections, saying he had no intention of running for any political office. "),
  type = c("Real", "Real", "Fake")),
  row.names = c(NA, -3L),
  class = c("tbl_df", "tbl", "data.frame"),
  na.action = structure(c(`8971` = 8971L), class = "omit"))

fit1 <- tree(type ~ ., data = news_df)
cv.trees <- cv.tree(fit1) # error here
plot(cv.trees$size, cv.trees$dev, type = "b")
Answer:
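I think the tree package's default settings require at least ten observations in a node (the minsize default in tree.control) before that node can be split. This data has only three observations, so the fitted tree never gets past the root node and cv.tree has nothing to prune. In addition, tree allows at most 32 levels per factor predictor, so once you add more observations it probably will not accept the title and text variables as they are.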
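As a rough illustration of the node-size point, the sketch below fits a tree to a small made-up data frame, first with the default controls and then with relaxed ones. The data frame toy and its numeric column x are hypothetical and not part of the question; the mincut and minsize values are chosen only to allow splits on a very small sample, not as a recommendation.

```r
library(tree)

# Hypothetical toy data (not from the question): 8 rows, one numeric predictor.
toy <- data.frame(
  x    = c(1, 2, 3, 4, 5, 6, 7, 8),
  type = factor(c("Real", "Real", "Real", "Real",
                  "Fake", "Fake", "Fake", "Fake"))
)

# Default controls (mincut = 5, minsize = 10): with only 8 rows the root
# is smaller than minsize, so it is expected to stay unsplit.
fit_default <- tree(type ~ x, data = toy)

# Relaxed controls so that small nodes may be split.
fit_small <- tree(
  type ~ x,
  data    = toy,
  control = tree.control(nobs = nrow(toy), mincut = 1, minsize = 2)
)

# Count terminal nodes in each fit.
n_leaves_default <- sum(fit_default$frame$var == "<leaf>")
n_leaves_small   <- sum(fit_small$frame$var == "<leaf>")
print(c(default = n_leaves_default, relaxed = n_leaves_small))
```

On real data it is usually better to use many more rows and to replace the raw title and text columns with derived numeric features than to relax the controls this far.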