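I have built a working classification model by following this tutorial.
The tutorial only outputs the predicted class names. I would like it to output each class name together with its probability, and only for classes whose probability exceeds a given threshold. For example, I only want classes with a probability above 0.5.
This is the function used to access the model: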
    import pickle
    import numpy as np

    category_model_path = "categorymodel.pkl"
    category_transformer_path = "categorytransformer.pkl"
    sentiment_model_path = "sentimentmodel.pkl"
    sentiment_transformer_path = "sentimenttransformer.pkl"

    def get_top_k_predictions(model, X_test, k):
        # get probabilities instead of predicted labels, since we want to collect top 3
        np.set_printoptions(suppress=True)
        probs = model.predict_proba(X_test)

        # GET TOP K PREDICTIONS BY PROB - note these are just index
        best_n = np.argsort(probs, axis=1)[:, -k:]

        # GET CATEGORY OF PREDICTIONS
        preds = [[model.classes_[predicted_cat] for predicted_cat in prediction] for prediction in best_n]
        preds = [item[::-1] for item in preds]
        return preds

    category_loaded_model = pickle.load(open(category_model_path, 'rb'))
    category_loaded_transformer = pickle.load(open(category_transformer_path, 'rb'))
    sentiment_loaded_model = pickle.load(open(sentiment_model_path, 'rb'))
    sentiment_loaded_transformer = pickle.load(open(sentiment_transformer_path, 'rb'))
The function is then called with the following code:
    category_test_features = category_loaded_transformer.transform(["I absolutley loved the organization "])
    get_top_k_predictions(category_loaded_model, category_test_features, 2)
This is the current output:

    [['Course Structure', 'Learning Materials']]
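Inside the function, the probabilities are computed into the `probs` variable. I don't know how to keep only the probabilities above 0.5 and include them in the `preds` output.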
Answer:
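The `best_n` array holds indices into the probability array `probs`, so you can use it to look up probabilities the same way you look up labels. You can get (label, probability) tuples like this: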
    preds = [
        [(model.classes_[predicted_cat], distribution[predicted_cat])
         for predicted_cat in prediction]
        for distribution, prediction in zip(probs, best_n)
    ]
If you don't want to return the probabilities and only want to filter on them, you can do this:
    preds = [
        [model.classes_[predicted_cat]
         for predicted_cat in prediction
         if distribution[predicted_cat] > 0.5]
        for distribution, prediction in zip(probs, best_n)
    ]
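The two snippets can also be combined to return both the label and its probability while dropping anything at or below the threshold. The following is only a sketch under some assumptions: the function name `get_top_k_above_threshold` and its `threshold` parameter are made up for illustration, and the `classes` and `probs` arrays are hypothetical stand-ins for `model.classes_` and `model.predict_proba(X_test)` so that the example runs without the pickled model.

```python
import numpy as np


def get_top_k_above_threshold(probs, classes, k, threshold=0.5):
    """For each sample, return up to k (label, probability) pairs whose
    probability is strictly greater than threshold, most likely first."""
    probs = np.asarray(probs)
    # Indices of the k largest probabilities per row, reversed so the
    # highest probability comes first (same idea as item[::-1] above).
    best_n = np.argsort(probs, axis=1)[:, -k:][:, ::-1]
    return [
        [(str(classes[i]), float(row[i])) for i in idx if row[i] > threshold]
        for row, idx in zip(probs, best_n)
    ]


# Hypothetical stand-ins for model.classes_ and model.predict_proba(X_test).
classes = np.array(["Course Structure", "Learning Materials", "Instructor"])
probs = np.array([
    [0.62, 0.30, 0.08],
    [0.40, 0.35, 0.25],
])

result = get_top_k_above_threshold(probs, classes, k=2)
print(result)
```

With a real model, the call would presumably look like `get_top_k_above_threshold(model.predict_proba(X_test), model.classes_, 2)`. Note that when the probabilities in each row sum to 1, at most one class can exceed 0.5, so a sample may come back with a single pair or an empty list; a lower threshold may be more useful if several labels per sample are expected.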