Confidence of multiclass classification in ML.NET

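I found an excellent introduction to ML.NET: https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net . It helped me solve several ML.NET-related problems.

But one problem remains:

When I send some text to the language detector (the LanguageDetection sample), I always receive a result, even for very short text fragments where the classification is not reliable. Can I get confidence information for the multiclass classification? Or can I get the probability of belonging to each class, so that a second pass of the algorithm could use the language of neighboring sentences?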

Answer:

Following a commenter's hint, I modified the original sample from CodeProject. The code is available at: https://github.com/sotnyk/LanguageDetector/tree/Code-for-stackoverflow-52536943

The main change (as suggested) is adding the field:

public float[] Score;

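to the ClassPrediction class.

If this field is present, we receive the probability/confidence of each class for the multiclass classification.

But the original sample has another complication. It uses float values as class labels, and those values are not indices into the Score array. To map score indices to classes, we should use the TryGetScoreLabelNames method: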

if (!model.TryGetScoreLabelNames(out var scoreClassNames))
    throw new Exception("Can't get score classes");

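However, this method does not work when the class labels are float values. So I changed the original .tsv files, as well as the ClassificationData.LanguageClass and ClassPrediction.Class fields, to use string labels as class names.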
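Once the label names are available, each score can be paired with the name at the same index. The following is only a minimal sketch of that pairing and does not depend on ML.NET: `ScoreMapping` and `Rank` are hypothetical names that are not part of the library or of the original sample, and the sketch assumes the label names and scores arrive as two arrays of equal length.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: not part of ML.NET or of the original sample.
public static class ScoreMapping
{
    // Pairs each score with the label name at the same index and
    // returns the pairs ordered from highest to lowest score.
    public static List<(string Label, float Score)> Rank(
        IReadOnlyList<string> labelNames, IReadOnlyList<float> scores)
    {
        if (labelNames.Count != scores.Count)
            throw new ArgumentException("Label names and scores must have the same length.");

        return labelNames
            .Zip(scores, (label, score) => (Label: label, Score: score))
            .OrderByDescending(pair => pair.Score)
            .ToList();
    }
}
```

With the names used in this answer, a call would look like `ScoreMapping.Rank(scoreClassNames, prediction.Score)`; the first element of the result is then the most likely language together with its score.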

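Other changes, not directly related to the topic of the question:

Updated the versions of the NuGet packages.
I was interested in using the LightGBM classifier (for me, it gives the best quality). But the current version of its NuGet package has a bug in non-.NET Core applications. So I changed the platform of the sample to .NET Core 2.0 / .NET Standard.
Uncommented the model that uses the LightGBM classifier.

The scores for each language are printed in the application named Prediction. That part of the code now looks like this: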

internal static async Task<PredictionModel<ClassificationData, ClassPrediction>> PredictAsync(
    string modelPath,
    IEnumerable<ClassificationData> predicts = null,
    PredictionModel<ClassificationData, ClassPrediction> model = null)
{
    if (model == null)
    {
        new LightGbmArguments();
        model = await PredictionModel.ReadAsync<ClassificationData, ClassPrediction>(modelPath);
    }

    if (predicts == null) // do we have input to predict a result?
        return model;

    // Use the model to predict the language of each input text.
    IEnumerable<ClassPrediction> predictions = model.Predict(predicts);

    Console.WriteLine();
    Console.WriteLine("Classification Predictions");
    Console.WriteLine("--------------------------");

    // Pair each input text with its prediction.
    IEnumerable<(ClassificationData sentiment, ClassPrediction prediction)> sentimentsAndPredictions =
        predicts.Zip(predictions, (sentiment, prediction) => (sentiment, prediction));

    // Label names, in the same order as the entries of Score.
    if (!model.TryGetScoreLabelNames(out var scoreClassNames))
        throw new Exception("Can't get score classes");

    foreach (var (sentiment, prediction) in sentimentsAndPredictions)
    {
        // Shorten long texts for display.
        string textDisplay = sentiment.Text;
        if (textDisplay.Length > 80)
            textDisplay = textDisplay.Substring(0, 75) + "...";

        string predictedClass = prediction.Class;
        Console.WriteLine("Prediction: {0}-{1} | Test: '{2}', Scores:",
            prediction.Class, predictedClass, textDisplay);

        // Print every class score next to its index and label name.
        for (var l = 0; l < prediction.Score.Length; ++l)
        {
            Console.Write($"  {l}({scoreClassNames[l]})={prediction.Score[l]}");
        }
        Console.WriteLine();
        Console.WriteLine();
    }
    Console.WriteLine();

    return model;
}
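With per-class scores available, the second pass mentioned in the question could look roughly like the sketch below. It is only an illustration under stated assumptions, not part of the linked repository: `NeighbourSmoothing`, `Smooth`, and the `minConfidence` threshold are hypothetical, and the input is assumed to be the top label and top score of each sentence in document order. A sentence whose top score falls below the threshold borrows the label of the nearest sentence that passes it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical second pass: not part of ML.NET or of the original sample.
public static class NeighbourSmoothing
{
    // Returns one label per sentence. A sentence whose top score is below
    // minConfidence takes the label of the nearest confident sentence
    // (the preceding one wins ties); if none exists, it keeps its own label.
    public static string[] Smooth(
        IReadOnlyList<(string Label, float Score)> top, float minConfidence)
    {
        var result = new string[top.Count];
        for (var i = 0; i < top.Count; ++i)
        {
            result[i] = top[i].Label;
            if (top[i].Score >= minConfidence)
                continue;

            // Search outwards at distance 1, 2, ..., checking left before right.
            for (var d = 1; d < top.Count; ++d)
            {
                var left = i - d;
                var right = i + d;
                if (left >= 0 && top[left].Score >= minConfidence)
                {
                    result[i] = top[left].Label;
                    break;
                }
                if (right < top.Count && top[right].Score >= minConfidence)
                {
                    result[i] = top[right].Label;
                    break;
                }
            }
        }
        return result;
    }
}
```

A suitable threshold depends on the classifier and the data, so it would have to be chosen empirically, for example by checking on a labelled sample which scores the misclassified short fragments receive.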
