Finding the minimum and maximum of clusters in circular data

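How can I determine the minimum and maximum of a cluster in circular data, here ranging from 0 to 24, given that a cluster may wrap past the end of the value range?

Looking at the blue cluster, I would like to identify 22 and 2 as its boundaries. Which algorithm can solve this problem?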

Answer:

I have found a solution to this problem. Assume the data is in the following format:

#!/usr/bin/env python3
import numpy as np

data = np.array([0, 1, 2, 12, 13, 14, 15, 21, 22, 23])
labels = np.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 0])

bounds = get_cluster_bounds(data, labels)
print(bounds)  # {0: array([21,  2]), 1: array([12, 15])}

The function is defined as follows:

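#!/usr/bin/env python3
import numpy as np


def get_cluster_bounds(data: np.ndarray, labels: np.ndarray) -> dict:
    """
    There are five cases to consider for cluster points on a circular
    range. The points to be determined are marked with arrows.

    In the first case, the cluster wraps around the edge of the range:
         ↓B           ↓A
    |#####____________#####|

    In the second case, the data sits exactly at the start of the value
    range but does not wrap around.
     ↓A       ↓B
    |##########____________|

    In the third case, the data sits exactly at the end of the value
    range but does not wrap around.
                 ↓A       ↓B
    |____________##########|

    In the fourth case, the data lies inside the value range and does
    not touch either edge.
            ↓A       ↓B
    |_______##########_____|

    In the fifth and simplest case, the data covers the whole range and
    no other label is present.
     ↓A                   ↓B
    |######################|

    Args:
        data:      (n,) numpy array containing all data points.
        labels:    (n,) numpy array containing the label of each point.

    Returns:
        bounds:    A dict whose keys are the cluster indices and whose
                   values give the start and end point of each cluster.
    """
    # Sort the data in ascending order and reorder the labels to match.
    shuffle = data.argsort()
    data = data[shuffle]
    labels = labels[shuffle]

    # Get the number of unique clusters.
    labels_unique = np.unique(labels)
    num_clusters = labels_unique.size

    bounds = {}

    # Note: this loop assumes the labels are the integers
    # 0, 1, ..., num_clusters - 1.
    for c_index in range(num_clusters):
        mask = labels == c_index

        # Case 1 or 5
        if mask[0] and mask[-1]:
            # Case 5
            if np.all(mask):
                start = data[0]
                end = data[-1]
            # Case 1
            else:
                edges = np.where(np.invert(mask))[0]
                start = data[edges[-1] + 1]
                end = data[edges[0] - 1]

        # Case 2
        elif mask[0] and not mask[-1]:
            edges = np.where(np.invert(mask))[0]
            start = data[0]
            end = data[edges[0] - 1]

        # Case 3
        elif not mask[0] and mask[-1]:
            edges = np.where(np.invert(mask))[0]
            start = data[edges[-1] + 1]
            end = data[-1]

        # Case 4
        elif not mask[0] and not mask[-1]:
            edges = np.where(mask)[0]
            start = data[edges[0]]
            end = data[edges[-1]]

        else:
            raise ValueError('This should not happen.')

        bounds[c_index] = np.array([start, end])

    return bounds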
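As a cross-check on the case analysis above, here is a minimal sketch of a different technique, the "largest gap" approach, which looks at one cluster at a time instead of at the neighbouring labels. It is not part of the original answer. The names `circular_bounds` and `in_circular_range` are hypothetical, and the sketch assumes the range is periodic with period 24, so that 0 and 24 are the same point.

```python
import numpy as np


def circular_bounds(values, period=24.0):
    """Return (start, end) of one cluster on a circle of the given period.

    Assumption: the cluster is the arc left over after removing the
    largest empty gap between consecutive points on the circle.
    """
    values = np.sort(np.asarray(values, dtype=float) % period)
    # Gap from each point to the next one, including the wrap-around
    # gap from the last point back to the first.
    gaps = np.diff(np.append(values, values[0] + period))
    i = int(np.argmax(gaps))  # index of the point just before the largest gap
    start = float(values[(i + 1) % values.size])
    end = float(values[i])
    return start, end


def in_circular_range(x, start, end):
    """Check whether x lies in [start, end], allowing start > end (wrap)."""
    if start <= end:
        return start <= x <= end
    return x >= start or x <= end


if __name__ == "__main__":
    print(circular_bounds([0, 1, 2, 21, 22, 23]))  # (21.0, 2.0)
    print(circular_bounds([12, 13, 14, 15]))       # (12.0, 15.0)
    print(in_circular_range(23, 21.0, 2.0))        # True
```

For the example data this gives the same bounds as `get_cluster_bounds`. A start value larger than the end value signals that the cluster wraps past the edge of the range. If several gaps tie for the largest, `np.argmax` picks the first one, so the result is ambiguous for evenly spread points.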
