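I am using the C50 package in R and need to export the model for use in production.
I used the boosting option. I know the trials are weighted, but my output does not show the weights.
I did not use the misclassification costs option; I only need the weights of the trials.
Is there a way, from R, to find the weight of each trial in my C5.0 model?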
Answer:
> fit <- C5.0(credit[,-24], credit[,24])
> summary(fit)

Call:
C5.0.default(x = credit[, -24], y = credit[, 24])

C5.0 [Release 2.07 GPL Edition]     Thu Nov 23 09:36:14 2017
-------------------------------

Class specified by attribute `outcome'

Read 30000 cases (24 attributes) from undefined.data

Decision tree:

PAY_0 > 1:
:...EDUCATION > 3: 0 (29/7)
:   EDUCATION <= 3:
:   :...PAY_3 <= -1: 0 (187/86)
:       PAY_3 > -1: 1 (2914/830)
PAY_0 <= 1:
:...PAY_2 <= 1: 0 (24599/3514)
    PAY_2 > 1:
    :...PAY_6 <= 0: 0 (1625/605)
        PAY_6 > 0:
        :...PAY_6 > 2: 1 (58/21)
            PAY_6 <= 2:
            :...PAY_5 <= 0: 0 (132/52)
                PAY_5 > 0:
                :...SEX <= 1: 1 (215/82)
                    SEX > 1:
                    :...PAY_3 <= 1: 1 (40/13)
                        PAY_3 > 1: 0 (201/91)

Evaluation on training data (30000 cases):

        Decision Tree
      ----------------
      Size      Errors

        10 5301(17.7%)   <<

       (a)   (b)    <-classified as
      ----  ----
     22418   946    (a): class 0
      4355  2281    (b): class 1

    Attribute usage:

    100.00% PAY_0
     89.57% PAY_2
     11.14% PAY_3
     10.43% EDUCATION
      7.57% PAY_6
      1.96% PAY_5
      1.52% SEX

Time: 2.5 secs
The weight (importance) of every variable used can be obtained as follows:
> C5imp(fit, metric = "splits")
           Overall
PAY_3     22.22222
PAY_6     22.22222
EDUCATION 11.11111
PAY_0     11.11111
PAY_2     11.11111
PAY_5     11.11111
SEX       11.11111
LIMIT_BAL  0.00000
MARRIAGE   0.00000
AGE        0.00000
PAY_4      0.00000
BILL_AMT1  0.00000
BILL_AMT2  0.00000
BILL_AMT3  0.00000
BILL_AMT4  0.00000
BILL_AMT5  0.00000
BILL_AMT6  0.00000
PAY_AMT1   0.00000
PAY_AMT2   0.00000
PAY_AMT3   0.00000
PAY_AMT4   0.00000
PAY_AMT5   0.00000
PAY_AMT6   0.00000
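Since the question is about a boosted model, here is a minimal sketch of the same calls with the `trials` argument set. It assumes the built-in `iris` data as a stand-in for the `credit` data, and the object name `boosted` is only illustrative. Note that `C5imp` reports scores per predictor, not per boosting trial, so this shows what can be inspected from R rather than a way to read per-trial weights.

```r
library(C50)

# iris is a stand-in for the credit data (assumption for illustration only)
boosted <- C5.0(iris[, -5], iris[, 5], trials = 10)

# Number of boosting trials requested and actually built
boosted$trials

# Predictor importance as the percentage of training cases
# that reach a split on each predictor
C5imp(boosted, metric = "usage")

# Predictor importance as the percentage of splits using each predictor
C5imp(boosted, metric = "splits")

# Predictions can be restricted to the first few boosting trials
predict(boosted, iris[1:5, -5], trials = 3)
```

Calling `summary(boosted)` prints the tree built in each trial, which may be the closest view of the individual trials available from the fitted object.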