The Inner Workings of K-Means

When I use K-means clustering in MATLAB, I am unsure about the details of how the clustering is done. I will use an example to explain:

My data has already been normalized, and the output looks like this:

[image: screenshot of the normalized data matrix]

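Each row represents one normalized network packet. So the first row would represent a packet from computer A.

What I want to know is: when I run K-means in MATLAB, does it cluster by column or by row?

In other words, does column A belong to cluster 1, column B to cluster 2, and so on?

The reason I ask is that I need each packet (row) to stay intact and to be clustered according to its own intrinsic properties. I worry that clustering by column would seriously weaken the method's usefulness here, but I hope some form of aggregation can solve this.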

Code:

        %% generate sample data
        K = 4;
        numObservations = 5000;
        dimensions = 42;
        % Placeholder data (assumed, not in the original post): replace with
        % your own matrix, one observation per row.
        data = rand(numObservations, dimensions);

        %% cluster
        opts = statset('MaxIter', 500, 'Display', 'iter');
        [clustIDX, clusters, interClustSum, Dist] = kmeans(data, K, 'options', opts, ...
            'distance', 'sqEuclidean', 'EmptyAction', 'singleton', 'replicates', 3);

        %% plot data+clusters (first three features only)
        figure, hold on
        scatter3(data(:,1), data(:,2), data(:,3), 5, clustIDX, 'filled')
        scatter3(clusters(:,1), clusters(:,2), clusters(:,3), 100, (1:K)', 'filled')
        hold off, xlabel('x'), ylabel('y'), zlabel('z')

        %% plot clusters quality
        figure
        [silh, h] = silhouette(data, clustIDX);
        avrgScore = mean(silh);

        %% Assign data to clusters
        % calculate distance (squared) of all instances to each cluster centroid
        D = zeros(numObservations, K);     % init distances
        for k = 1:K
            % Euclidean distance would be d = sum((x-y).^2).^0.5;
            % the square root is omitted here, so D holds squared distances
            D(:,k) = sum((data - repmat(clusters(k,:), numObservations, 1)).^2, 2);
        end

        % find for all instances the cluster closest to it
        [minDists, clusterIndices] = min(D, [], 2);

        % compare it with what you expect it to be
        sum(clusterIndices == clustIDX)

Result:

[image: plot output from the code above]

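This is based on 5000 rows of data. Unfortunately, not being able to reconstruct the data after clustering limits my understanding of what is happening. (See the related question: MATLAB – Classification output)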

Answer:

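In MATLAB, the standard input format for clustering and classification data is:

one sample (observation) per row;
the different features of that sample in the columns.

So kmeans clusters by row: each packet stays intact and is assigned to exactly one cluster, and the returned index vector (clustIDX in your code) has one entry per row of data.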
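
To make the row convention concrete, here is a minimal sketch. It assumes the Statistics and Machine Learning Toolbox is installed; the matrix X, its size, the value of K, and the variable names are made up for illustration and are not taken from the question.

```matlab
% Minimal sketch: kmeans treats each ROW of X as one observation.
% X, K and the sizes below are illustrative assumptions.
rng(0);                          % fix the seed so the run is repeatable
X = rand(200, 5);                % 200 packets (rows) x 5 features (columns)
K = 3;

[idx, C] = kmeans(X, K);         % idx: 200x1 cluster label, one per row
                                 % C:   3x5 centroids, one row per cluster

% Recover the intact packets that belong to each cluster.
for k = 1:K
    members = X(idx == k, :);    % all original rows assigned to cluster k
    fprintf('cluster %d: %d packets\n', k, size(members, 1));
end

% To cluster the columns (features) instead, pass the transpose.
featIdx = kmeans(X', 2);         % 5x1: one label per original column
```

Because idx has one entry per row of X, the expression X(idx == k, :) returns the untouched rows of cluster k, which is one way to get back from cluster labels to the original packets. Clustering columns only happens if you transpose the matrix yourself.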
