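scikit-learn: Incorporating Naive Bayes model predictions into a logistic regression model?

I have data on various customer attributes (a self-description and an age), along with a binary outcome indicating whether each customer would buy a particular product: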

    {"would_buy": "No", "self_description": "I'm a college student studying biology", "Age": 19},

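I want to use MultinomialNB on self_description to predict would_buy, and then feed those predictions into a logistic regression model that uses not only the predicted would_buy but also Age as a covariate.

Here is my code so far for the text model (I'm new to scikit-learn!), using a simplified dataset: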

    from sklearn.naive_bayes import MultinomialNB
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

    # Customer data: whether they would buy the product (what I care about),
    # their self-description, and their age.
    data = [
        {"would_buy": "No", "self_description": "I'm a college student studying biology", "Age": 19},
        {"would_buy": "Yes", "self_description": "I'm a blue-collar worker", "Age": 20},
        {"would_buy": "No", "self_description": "I'm a Stack Overflow denzien", "Age": 56},
        {"would_buy": "No", "self_description": "I'm a college student studying economics", "Age": 20},
        {"would_buy": "Yes", "self_description": "I'm a UPS worker", "Age": 35},
        {"would_buy": "No", "self_description": "I'm a Stack Overflow denzien", "Age": 56},
    ]

    def naive_bayes_model(customer_data):
        self_descriptions = [customer['self_description'] for customer in customer_data]
        decisions = [customer['would_buy'] for customer in customer_data]

        vectorizer = TfidfVectorizer(stop_words='english', ngram_range=(1, 2))
        X = vectorizer.fit_transform(self_descriptions, decisions)
        naive_bayes = MultinomialNB(alpha=0.01)
        naive_bayes.fit(X, decisions)
        train(naive_bayes, X, decisions)

    def train(classifier, X, y):
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=22)
        classifier.fit(X_train, y_train)
        # classification_report expects (y_true, y_pred), in that order.
        print(classification_report(y_test, classifier.predict(X_test)))

    def main():
        naive_bayes_model(data)

    main()

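Answer:

The short answer is to use the predict_proba or predict_log_proba method of the trained naive_bayes model to create the inputs for the logistic regression model. These outputs can be concatenated with the Age values to build the training and test sets for the logistic regression model.

However, I'll point out that the code as you've written it gives you no access to the naive_bayes model after training: it is a local variable inside naive_bayes_model and is never returned. So you will definitely need to restructure your code.

That issue aside, this is how I would incorporate the output of naive_bayes into a logistic regression: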
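One possible way to do that restructuring is to have the function return the fitted objects. The sketch below is only an illustration: the function name fit_text_model is hypothetical, and it fits on all six records from the question without a train/test split.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Same simplified records as in the question.
data = [
    {"would_buy": "No", "self_description": "I'm a college student studying biology", "Age": 19},
    {"would_buy": "Yes", "self_description": "I'm a blue-collar worker", "Age": 20},
    {"would_buy": "No", "self_description": "I'm a Stack Overflow denzien", "Age": 56},
    {"would_buy": "No", "self_description": "I'm a college student studying economics", "Age": 20},
    {"would_buy": "Yes", "self_description": "I'm a UPS worker", "Age": 35},
    {"would_buy": "No", "self_description": "I'm a Stack Overflow denzien", "Age": 56},
]


def fit_text_model(customer_data):
    """Fit the TF-IDF vectorizer and the Naive Bayes model, and return both.

    Hypothetical helper: returning the fitted objects is what makes them
    usable by later steps such as the logistic regression.
    """
    descriptions = [c["self_description"] for c in customer_data]
    decisions = [c["would_buy"] for c in customer_data]

    vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
    X = vectorizer.fit_transform(descriptions)

    naive_bayes = MultinomialNB(alpha=0.01)
    naive_bayes.fit(X, decisions)
    return vectorizer, naive_bayes


vectorizer, naive_bayes = fit_text_model(data)

# The caller can now ask the fitted model for class probabilities.
nb_probs = naive_bayes.predict_proba(
    vectorizer.transform([c["self_description"] for c in data])
)
print(naive_bayes.classes_)  # column order of nb_probs
print(nb_probs.shape)        # one row per customer, one column per class
```

Returning the vectorizer together with the model matters because any new text has to go through the same fitted vectorizer before it reaches predict_proba.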

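    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import MultinomialNB

    # `data` is the list of customer records from the question.
    descriptions = np.array([customer['self_description'] for customer in data])
    decisions = np.array([customer['would_buy'] for customer in data])
    ages = np.array([customer['Age'] for customer in data])

    vectorizer = TfidfVectorizer(stop_words='english', ngram_range=(1, 2))
    desc_vec = vectorizer.fit_transform(descriptions, decisions)
    naive_bayes = MultinomialNB(alpha=0.01)

    # Split the text features, ages, and labels together so the rows stay aligned.
    desc_train, desc_test, age_train, age_test, dec_train, dec_test = train_test_split(
        desc_vec, ages, decisions, test_size=0.25, random_state=22)

    naive_bayes.fit(desc_train, dec_train)
    nb_train_preds = naive_bayes.predict_proba(desc_train)

    # Logistic regression inputs: the Naive Bayes class probabilities plus Age as a column.
    lr = LogisticRegression()
    lr_X_train = np.hstack((nb_train_preds, age_train.reshape(-1, 1)))
    lr.fit(lr_X_train, dec_train)

    lr_X_test = np.hstack((naive_bayes.predict_proba(desc_test), age_test.reshape(-1, 1)))
    lr.score(lr_X_test, dec_test)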
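As a follow-up, here is a minimal sketch of how the two fitted models might be applied to a new customer. The helper predict_would_buy and the example customer are hypothetical. For brevity, both models are fitted on all six records from the question with no hold-out set, so the result only illustrates the mechanics, not model quality.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

# Same simplified records as in the question.
data = [
    {"would_buy": "No", "self_description": "I'm a college student studying biology", "Age": 19},
    {"would_buy": "Yes", "self_description": "I'm a blue-collar worker", "Age": 20},
    {"would_buy": "No", "self_description": "I'm a Stack Overflow denzien", "Age": 56},
    {"would_buy": "No", "self_description": "I'm a college student studying economics", "Age": 20},
    {"would_buy": "Yes", "self_description": "I'm a UPS worker", "Age": 35},
    {"would_buy": "No", "self_description": "I'm a Stack Overflow denzien", "Age": 56},
]
descriptions = [c["self_description"] for c in data]
decisions = np.array([c["would_buy"] for c in data])
ages = np.array([c["Age"] for c in data])

# Stage 1: text -> Naive Bayes class probabilities.
vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
desc_vec = vectorizer.fit_transform(descriptions)
naive_bayes = MultinomialNB(alpha=0.01).fit(desc_vec, decisions)

# Stage 2: [P(No), P(Yes), Age] -> logistic regression.
lr_X = np.hstack((naive_bayes.predict_proba(desc_vec), ages.reshape(-1, 1)))
lr = LogisticRegression().fit(lr_X, decisions)


def predict_would_buy(description, age):
    """Hypothetical helper: run one new customer through both stages."""
    # Reuse the fitted vectorizer; calling fit_transform here would be wrong.
    nb_probs = naive_bayes.predict_proba(vectorizer.transform([description]))
    features = np.hstack((nb_probs, [[age]]))  # shape (1, 3)
    return lr.predict(features)[0], lr.predict_proba(features)[0]


# Made-up customer, not part of the original data.
label, probs = predict_would_buy("I'm a warehouse worker", 30)
print(label, dict(zip(lr.classes_, probs)))
```

The key point is that a new record must pass through the same fitted vectorizer and the same fitted naive_bayes model, in the same column order, before it reaches the logistic regression.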
