I have the following pipeline:
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    pipeline = Pipeline([
        ("kmeans", KMeans(n_clusters=50)),
        ("log_reg", LogisticRegression()),
    ])
    pipeline.fit(X_train, y_train)
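I want to access the labels of the kmeans step (or any other attribute of KMeans), but I don't know how. I tried print(kmeans.labels_) and even print(pipeline.labels_), but neither works: I get an error saying the variable is not defined. How can I access the results of a specific stage in the pipeline?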
Answer:
With a recent version of scikit-learn (0.21.2 at the time of writing), you can index the steps of a pipeline by name through its __getitem__ method.
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    # Generate some data to test with
    X, y = make_classification(
        n_informative=5, n_redundant=0, random_state=42
    )

    pipeline = Pipeline([
        ("kmeans", KMeans(n_clusters=50)),
        ("log_reg", LogisticRegression(solver='lbfgs')),
    ])
    pipeline.fit(X, y)

    # Index the fitted step by name, then read its attributes
    pipeline['kmeans'].labels_
    # array([ 2, 42, 40, 38, ...])
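Since the question also asks about other KMeans attributes, here is a minimal sketch of reading a few more of them from the fitted step. The `n_init`, `random_state`, and `max_iter` values are assumptions added only to make the run reproducible and quiet; they are not part of the original answer. The last line uses pipeline slicing, which is available in the same scikit-learn versions as name indexing, to recover what the kmeans stage passes on to log_reg.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Same toy data as in the answer above
X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)

pipeline = Pipeline([
    # n_init and random_state are assumptions for reproducibility
    ("kmeans", KMeans(n_clusters=50, n_init=10, random_state=0)),
    # max_iter is raised (assumption) to reduce convergence warnings
    ("log_reg", LogisticRegression(solver="lbfgs", max_iter=1000)),
])
pipeline.fit(X, y)

kmeans = pipeline["kmeans"]          # the fitted KMeans instance
centers = kmeans.cluster_centers_    # one row per cluster
inertia = kmeans.inertia_            # sum of squared distances to the closest center

# Output of the kmeans stage, i.e. the features that log_reg is trained on:
# the distance from each sample to each cluster center.
distances = pipeline[:-1].transform(X)
```

The object returned by the lookup is the very estimator that was fitted inside the pipeline, so any fitted attribute documented for KMeans can be read from it the same way.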
For earlier versions of scikit-learn, which lack step indexing, use pipeline.named_steps['kmeans'] instead.
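As a rough illustration of the named_steps route, the sketch below fits a pipeline and fetches the kmeans step both by name and by position in the `steps` list. The variable names and the `n_init`, `random_state`, and `max_iter` settings are assumptions chosen for this example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)

pipe = Pipeline([
    # n_init, random_state and max_iter are assumptions for a reproducible run
    ("kmeans", KMeans(n_clusters=50, n_init=10, random_state=0)),
    ("log_reg", LogisticRegression(solver="lbfgs", max_iter=1000)),
])
pipe.fit(X, y)

# Lookup by step name through named_steps
by_name = pipe.named_steps["kmeans"]

# Lookup by position: steps is a list of (name, estimator) tuples
step_name, by_position = pipe.steps[0]

# Both routes return the same fitted object, not a copy
same_object = by_name is by_position

labels = by_name.labels_  # cluster index assigned to each training sample
```

Because both lookups return the same fitted estimator, whichever one is more readable in context can be used.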