Scikit-learn: incorporating Naive Bayes model predictions into a logistic regression model?

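I have data on various customer attributes (a self-description and age), along with a binary outcome indicating whether each customer would buy a particular product: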

  {"would_buy": "No",   "self_description": "I'm a college student studying biology",   "Age": 19},

I want to use MultinomialNB on self_description to predict would_buy, and then incorporate those predictions into a logistic regression model of would_buy that also takes Age as a covariate.

Here is my code so far for the text model (I'm new to scikit-learn!), using a simplified dataset:

    from sklearn.naive_bayes import MultinomialNB
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

    # Customer data: whether they would buy the product (the outcome I care about),
    # their self-description, and their age.
    data = [
        {"would_buy": "No", "self_description": "I'm a college student studying biology", "Age": 19},
        {"would_buy": "Yes", "self_description": "I'm a blue-collar worker", "Age": 20},
        {"would_buy": "No", "self_description": "I'm a Stack Overflow denzien", "Age": 56},
        {"would_buy": "No", "self_description": "I'm a college student studying economics", "Age": 20},
        {"would_buy": "Yes", "self_description": "I'm a UPS worker", "Age": 35},
        {"would_buy": "No", "self_description": "I'm a Stack Overflow denzien", "Age": 56},
    ]

    def naive_bayes_model(customer_data):
        self_descriptions = [customer['self_description'] for customer in customer_data]
        decisions = [customer['would_buy'] for customer in customer_data]

        vectorizer = TfidfVectorizer(stop_words='english', ngram_range=(1, 2))
        X = vectorizer.fit_transform(self_descriptions, decisions)
        naive_bayes = MultinomialNB(alpha=0.01)
        naive_bayes.fit(X, decisions)
        train(naive_bayes, X, decisions)

    def train(classifier, X, y):
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=22)
        classifier.fit(X_train, y_train)
        # classification_report expects the true labels first, then the predictions
        print(classification_report(y_test, classifier.predict(X_test)))

    def main():
        naive_bayes_model(data)

    main()

Answer:

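The short answer is to use the predict_proba or predict_log_proba method of the trained naive_bayes model to create the inputs for the logistic regression model. These can be concatenated with the Age values to build the training and test sets for the logistic regression.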

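However, I'll point out that the code as you've written it gives you no access to the naive_bayes model after training: it exists only inside naive_bayes_model and is never returned. So you will definitely need to restructure your code.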
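One way to restructure it is to have the fitting function return both the vectorizer and the classifier, so that later steps can call predict_proba on new text. This is only a minimal sketch: the function name fit_text_model and the three-record sample are illustrative assumptions, not part of the original code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def fit_text_model(customer_data):
    """Fit the TF-IDF vectorizer and the Naive Bayes model, and return both.

    fit_text_model is a hypothetical helper name used for illustration.
    """
    descriptions = [c["self_description"] for c in customer_data]
    decisions = [c["would_buy"] for c in customer_data]

    vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
    X = vectorizer.fit_transform(descriptions)

    model = MultinomialNB(alpha=0.01)
    model.fit(X, decisions)
    return vectorizer, model

# A few records taken from the question's dataset
sample = [
    {"would_buy": "No", "self_description": "I'm a college student studying biology", "Age": 19},
    {"would_buy": "Yes", "self_description": "I'm a blue-collar worker", "Age": 20},
    {"would_buy": "Yes", "self_description": "I'm a UPS worker", "Age": 35},
]

vectorizer, naive_bayes = fit_text_model(sample)

# Both objects are now available outside the function.
# Columns of probs follow the order of naive_bayes.classes_.
probs = naive_bayes.predict_proba(vectorizer.transform(["I'm a college student"]))
```

Returning the vectorizer matters as much as returning the model, since any new text has to be transformed with the same fitted vocabulary before it is passed to predict_proba.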

Setting that issue aside, this is how I would incorporate the output of naive_bayes into a logistic regression:

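    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import MultinomialNB

    # `data` is the list of customer records from the question
    descriptions = np.array([customer['self_description'] for customer in data])
    decisions = np.array([customer['would_buy'] for customer in data])
    ages = np.array([customer['Age'] for customer in data])

    vectorizer = TfidfVectorizer(stop_words='english', ngram_range=(1, 2))
    desc_vec = vectorizer.fit_transform(descriptions, decisions)
    naive_bayes = MultinomialNB(alpha=0.01)

    # Split text features, ages, and labels together so the rows stay aligned
    desc_train, desc_test, age_train, age_test, dec_train, dec_test = train_test_split(
        desc_vec, ages, decisions, test_size=0.25, random_state=22)

    naive_bayes.fit(desc_train, dec_train)
    nb_train_preds = naive_bayes.predict_proba(desc_train)

    # Logistic regression input: Naive Bayes class probabilities plus age
    lr = LogisticRegression()
    lr_X_train = np.hstack((nb_train_preds, age_train.reshape(-1, 1)))
    lr.fit(lr_X_train, dec_train)

    lr_X_test = np.hstack((naive_bayes.predict_proba(desc_test), age_test.reshape(-1, 1)))
    lr.score(lr_X_test, dec_test)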
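Once both models are fitted, scoring a new customer follows the same two steps: get the Naive Bayes probabilities for the description, then append the age and pass the row to the logistic regression. The sketch below is self-contained and, for brevity, fits on all six records from the question with no train/test split. The helper name predict_new_customer and the example customer are assumptions made for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

# The simplified dataset from the question
data = [
    {"would_buy": "No", "self_description": "I'm a college student studying biology", "Age": 19},
    {"would_buy": "Yes", "self_description": "I'm a blue-collar worker", "Age": 20},
    {"would_buy": "No", "self_description": "I'm a Stack Overflow denzien", "Age": 56},
    {"would_buy": "No", "self_description": "I'm a college student studying economics", "Age": 20},
    {"would_buy": "Yes", "self_description": "I'm a UPS worker", "Age": 35},
    {"would_buy": "No", "self_description": "I'm a Stack Overflow denzien", "Age": 56},
]

descriptions = [c["self_description"] for c in data]
decisions = np.array([c["would_buy"] for c in data])
ages = np.array([c["Age"] for c in data])

# Stage 1: text -> Naive Bayes class probabilities
vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
desc_vec = vectorizer.fit_transform(descriptions)
naive_bayes = MultinomialNB(alpha=0.01).fit(desc_vec, decisions)

# Stage 2: [P(No), P(Yes), Age] -> logistic regression
lr_X = np.hstack((naive_bayes.predict_proba(desc_vec), ages.reshape(-1, 1)))
lr = LogisticRegression().fit(lr_X, decisions)

def predict_new_customer(description, age):
    """Hypothetical helper: run one new customer through both stages."""
    nb_probs = naive_bayes.predict_proba(vectorizer.transform([description]))
    features = np.hstack((nb_probs, np.array([[age]])))
    return lr.predict(features)[0]

prediction = predict_new_customer("I'm a warehouse worker", 30)
```

Note that the new description goes through vectorizer.transform, not fit_transform, so that it is encoded with the vocabulary learned at training time.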
