I've studied the sklearn example for the Local Outlier Factor (LOF) and tried to apply it to my own sample dataset, but the results don't make sense to me.
My implementation is as follows:
```python
import numpy as np
import matplotlib.pyplot as plt
import pandas
from sklearn.neighbors import LocalOutlierFactor

# import file
url = ".../Python/outliner.csv"
names = ['R1', 'P1', 'T1', 'P2', 'Flag']
dataset = pandas.read_csv(url, names=names)
array = dataset.values
X = array[:, 0:2]
rng = np.random.RandomState(42)

# fit the model
clf = LocalOutlierFactor(n_neighbors=50, algorithm='auto', leaf_size=30)
y_pred = clf.fit_predict(X)
y_pred_outliers = y_pred[500:]

# plot the level sets of the decision function
xx, yy = np.meshgrid(np.linspace(0, 1000, 50), np.linspace(0, 200, 50))
Z = clf._decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.title("Local Outlier Factor (LOF)")
plt.contourf(xx, yy, Z, cmap=plt.cm.Blues_r)

a = plt.scatter(X[:200, 0], X[:200, 1], c='white', edgecolor='k', s=20)
b = plt.scatter(X[200:, 0], X[200:, 1], c='red', edgecolor='k', s=20)
plt.axis('tight')
plt.xlim((0, 1000))
plt.ylim((0, 200))
plt.legend([a, b], ["normal observations", "abnormal observations"], loc="upper left")
plt.show()
```
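Can someone tell me why the detection fails?
I've tried tweaking the parameters and the plot ranges, but it barely changes which points are detected as outliers.
Any guidance on solving this would be greatly appreciated. Thanks.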
Edit: added the input data file: file
Answer:
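I assume you followed this example. That example compares the actual/observed data (the scatter plot) with the decision function learned from it (the contour plot). Because its data is known/synthetic (200 normal points followed by 20 outliers), it can simply select the outliers with X[200:] (index 200 onward) and the normal points with X[:200] (indices 0 to 199). Your CSV has no such ordering, so slicing it at row 200 says nothing about which points are outliers.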
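A minimal sketch of why index slicing is valid there, assuming synthetic data built in the same layout as that example (two Gaussian blobs of 100 points each stacked first, then 20 uniform outliers). The seed, `n_neighbors=20`, and variable names are illustrative choices, not taken from your data.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(42)

# Synthetic data in the same order as the sklearn LOF example:
# rows 0-199 are inliers (two Gaussian blobs), rows 200-219 are uniform outliers.
X_blob = 0.3 * rng.randn(100, 2)
X_inliers = np.r_[X_blob + 2, X_blob - 2]
X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))
X = np.r_[X_inliers, X_outliers]

# Only because of how X was stacked is the true label known from the row index.
y_true = np.ones(len(X), dtype=int)
y_true[200:] = -1

clf = LocalOutlierFactor(n_neighbors=20)
y_pred = clf.fit_predict(X)  # 1 = inlier, -1 = outlier

# Compare the model's predictions with the index-based ground truth.
print("inliers flagged as outliers:", np.sum(y_pred[:200] == -1))
print("outliers caught:", np.sum(y_pred[200:] == -1), "of 20")
```

For a file like outliner.csv nothing guarantees this ordering, so `y_pred` (not the row index) has to decide the colors.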
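So if you want to plot the predictions (as a scatter plot) instead of the actual/observed data, you can do something like the code below. The basic idea is to split X according to y_pred (1: normal, -1: outlier) and use the two subsets in the scatter plot: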
```python
import numpy as np
import matplotlib.pyplot as plt
import pandas
from sklearn.neighbors import LocalOutlierFactor

# import file
url = ".../Python/outliner.csv"
names = ['R1', 'P1', 'T1', 'P2', 'Flag']
dataset = pandas.read_csv(url, names=names)
X = dataset.values[:, 0:2]

# fit the model
clf = LocalOutlierFactor(n_neighbors=50, algorithm='auto', leaf_size=30)
y_pred = clf.fit_predict(X)

# map results: boolean masks split X by the predicted label
X_normals = X[y_pred == 1]
X_outliers = X[y_pred == -1]

# plot the level sets of the decision function
# (_decision_function is a private method of older scikit-learn releases;
# newer releases only expose decision_function when novelty=True)
xx, yy = np.meshgrid(np.linspace(0, 1000, 50), np.linspace(0, 200, 50))
Z = clf._decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.title("Local Outlier Factor (LOF)")
plt.contourf(xx, yy, Z, cmap=plt.cm.Blues_r)

a = plt.scatter(X_normals[:, 0], X_normals[:, 1], c='white', edgecolor='k', s=20)
b = plt.scatter(X_outliers[:, 0], X_outliers[:, 1], c='red', edgecolor='k', s=20)
plt.axis('tight')
plt.xlim((0, 1000))
plt.ylim((0, 200))
plt.legend([a, b], ["normal predictions", "abnormal predictions"], loc="upper left")
plt.show()
```
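On newer scikit-learn releases the private `_decision_function` is gone, and the public `decision_function` is only available when the estimator is created with `novelty=True`. A sketch under that assumption: keep `fit_predict` (default `novelty=False`) for the labels of the training points, and fit a second model with `novelty=True` only to draw the contour. Since outliner.csv isn't available here, hypothetical random points in the same plot ranges stand in for `dataset.values[:, 0:2]`.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import LocalOutlierFactor

# Hypothetical stand-in for dataset.values[:, 0:2] (NOT the real outliner.csv):
# a dense cluster plus a few points scattered over the plot area.
rng = np.random.RandomState(0)
X = np.c_[rng.normal(500, 80, 300), rng.normal(100, 15, 300)]
X = np.r_[X, np.c_[rng.uniform(0, 1000, 15), rng.uniform(0, 200, 15)]]

# Labels for the training points themselves (novelty=False is the default).
clf = LocalOutlierFactor(n_neighbors=50)
y_pred = clf.fit_predict(X)
X_normals = X[y_pred == 1]
X_outliers = X[y_pred == -1]

# A separate novelty=True model exposes the public decision_function,
# which can be evaluated on a grid of new points for the contour plot.
clf_grid = LocalOutlierFactor(n_neighbors=50, novelty=True).fit(X)
xx, yy = np.meshgrid(np.linspace(0, 1000, 50), np.linspace(0, 200, 50))
Z = clf_grid.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.title("Local Outlier Factor (LOF)")
plt.contourf(xx, yy, Z, cmap="Blues_r")
a = plt.scatter(X_normals[:, 0], X_normals[:, 1], c="white", edgecolor="k", s=20)
b = plt.scatter(X_outliers[:, 0], X_outliers[:, 1], c="red", edgecolor="k", s=20)
plt.xlim((0, 1000))
plt.ylim((0, 200))
plt.legend([a, b], ["normal predictions", "abnormal predictions"], loc="upper left")
plt.show()
```

Using two estimators keeps the scatter colors identical to what `fit_predict` reports for your data, while the contour comes from the only public scoring method newer releases offer for unseen points.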
As you can see, the scatter plot of the points predicted as normal now follows the contour plot.
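The agreement is not a coincidence: in scikit-learn's LOF (releases that have the `novelty` parameter) the predicted label is derived from the same score the contour shows, with 0 as the boundary. A small check on hypothetical random data (not outliner.csv) illustrating this for both modes:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(1)
X = np.r_[rng.randn(200, 2), rng.uniform(-6, 6, size=(20, 2))]  # hypothetical data

# Training labels: a point is an outlier when its score falls below offset_.
clf = LocalOutlierFactor(n_neighbors=20)
y_pred = clf.fit_predict(X)
train_scores = clf.negative_outlier_factor_ - clf.offset_
print(np.array_equal(y_pred == -1, train_scores < 0))

# Novelty model: predict() returns -1 exactly where decision_function() < 0,
# i.e. on the low side of the contour's 0 level.
clf_nov = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X)
X_new = rng.uniform(-6, 6, size=(500, 2))
scores = clf_nov.decision_function(X_new)
labels = clf_nov.predict(X_new)
print(np.array_equal(labels == -1, scores < 0))
```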