TypeError: unhashable type: 'numpy.ndarray' – how do I get data from a DataFrame by querying a BallTree with a radius?

How can I get data by querying a BallTree with a radius? For example:

    from sklearn.neighbors import BallTree
    import pandas as pd

    bt = BallTree(df[['lat','lng']], metric="haversine")
    for idx, row in df.iterrows():
        res = df[bt.query_radius(row[['lat','lng']], r=1)]

I want to get the rows of df that lie within radius r=1, but it throws a TypeError:

TypeError: unhashable type: 'numpy.ndarray'

Following the first answer, I get an index-out-of-bounds error while iterating over the rows:

    5183
    (5219, 25)
    5205
    (5219, 25)
    5205
    (5219, 25)
    5221
    (5219, 25)
    Traceback (most recent call last):
      File "/Users/Chu/Documents/dssg2018/sa4.py", line 45, in <module>
        df.loc[idx,word]=len(df.iloc[indices[idx]][df[word]==1])/\
    IndexError: index 5221 is out of bounds for axis 0 with size 5219

Here is the code:

    bag_of_words = ['beautiful','love','fun','sunrise','sunset','waterfall','relax']
    for idx,row in df.iterrows():
        for word in bag_of_words:
            if word in row['caption']:
                df.loc[idx, word] = 1
            else:
                df.loc[idx, word] = 0
    bt = BallTree(df[['lat','lng']], metric="haversine")
    indices = bt.query_radius(df[['lat','lng']],r=(float(10)/40000)*360)
    for idx,row in df.iterrows():
        for word in bag_of_words:
            if word in row['caption']:
                print(idx)
                print(df.shape)
                df.loc[idx,word]=len(df.iloc[indices[idx]][df[word]==1])/\
                                 np.max([1,len(df.iloc[indices[idx]][df[word]!=1])])

Answer:

The error is not in BallTree; the indices it returns are not being used correctly to index the DataFrame.
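To see what goes wrong, here is a minimal sketch on a hypothetical four-point DataFrame (the coordinates are made up and converted to radians, since scikit-learn's haversine metric works in radians). `query_radius` returns a 1-D NumPy array of `dtype=object` with one entry per query point, each entry an integer array of row positions. Passing that object array to `df[...]` makes pandas treat it as a list of column labels and try to hash each inner array, which is the likely source of the `unhashable type` error; `df.iloc[...]` with an inner array selects rows instead.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

# Hypothetical toy data: three nearby points in Berlin and one in Paris (degrees)
df = pd.DataFrame({
    "lat": [52.5200, 52.5205, 48.8566, 52.5190],
    "lng": [13.4050, 13.4060, 2.3522, 13.4040],
})

# The haversine metric expects [lat, lng] in radians
coords = np.radians(df[["lat", "lng"]].to_numpy())
bt = BallTree(coords, metric="haversine")

# Query one point; the query array must be 2-D, here shape (1, 2)
ind = bt.query_radius(coords[:1], r=0.001)

print(type(ind), ind.dtype, ind.shape)  # object array, one entry per query point
print(ind[0])                           # integer row positions of the neighbours (unsorted)

# Select the neighbouring rows by position
res = df.iloc[ind[0]]
print(res)
```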

You can do it like this:

    for idx, row in df.iterrows():
        indices = bt.query_radius(row[['lat','lng']].values.reshape(1,-1), r=1)
        res = df.iloc[[x for b in indices for x in b]]
        # do whatever you want with res

This also works (since we send only one point at a time):

    res = df.iloc[indices[0]]
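A sketch of the per-row loop, again on hypothetical toy coordinates converted to radians, assuming a radius of 0.001 rad (about 6 km on a sphere of radius 6371 km). Because each call queries exactly one point, flattening the nested result and taking `indices[0]` give the same positions; the loop checks this and records how many rows fall inside the radius of each point.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

# Hypothetical toy data in degrees; two close points and one far away
df = pd.DataFrame({"lat": [52.5200, 52.5205, 48.8566],
                   "lng": [13.4050, 13.4060, 2.3522]})
rad = df[["lat", "lng"]].apply(np.radians)   # haversine needs radians
bt = BallTree(rad.to_numpy(), metric="haversine")

counts = []
for idx, row in rad.iterrows():
    # One query point, reshaped to a 2-D array of shape (1, 2)
    indices = bt.query_radius(row[["lat", "lng"]].to_numpy().reshape(1, -1), r=0.001)
    # With a single query point, the flattened list and indices[0] hold the same positions
    flat = [x for b in indices for x in b]
    assert list(flat) == list(indices[0])
    res = df.iloc[indices[0]]
    counts.append(len(res))

print(counts)
```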

Explanation:

I'm using scikit-learn 0.20, so the code you wrote above:

    df[bt.query_radius(row[['lat','lng']],r=1)]

did not work for me. I had to turn the point into a 2-D array using reshape().
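scikit-learn's tree queries expect a 2-D array of shape `(n_samples, n_features)`, while a single DataFrame row is a 1-D Series. The sketch below, on a hypothetical frame that also has a text column, shows the shapes involved and a few equivalent ways to build the `(1, 2)` query array; `to_numpy(dtype=float)` is used because a row taken from a frame with mixed column types has `object` dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame with a text column, like the question's 'caption'
df = pd.DataFrame({"lat": [0.9166, 0.8527], "lng": [0.2340, 0.0411],
                   "caption": ["sunset", "sunrise"]})

row = df.iloc[0]                                  # a Series: 1-D, object dtype
point = row[["lat", "lng"]].to_numpy(dtype=float)
print(point.shape)                                # (2,)

# Three equivalent ways to get the (1, 2) array a tree query expects
a = point.reshape(1, -1)
b = np.atleast_2d(point)
c = df.iloc[[0]][["lat", "lng"]].to_numpy()       # double brackets keep a 1-row DataFrame
print(a.shape, b.shape, c.shape)                  # (1, 2) (1, 2) (1, 2)
```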

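Now, bt.query_radius() returns an array of arrays (a NumPy object array) holding the indices of the points within the specified radius r, as described in the documentation: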

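ind : array of objects, shape = X.shape[:-1]

each element is a numpy integer array listing the indices of neighbors of the corresponding point. Note that unlike the results of a k-neighbors query, the returned neighbors are not sorted by distance by default.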
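If the neighbours are needed in distance order, recent scikit-learn versions let `query_radius` take `return_distance=True` together with `sort_results=True`, and `count_only=True` returns just the number of neighbours per query point. A minimal sketch on hypothetical coordinates, converted to radians, with a radius of 0.01 rad:

```python
import numpy as np
from sklearn.neighbors import BallTree

# Hypothetical coordinates (degrees) converted to radians for haversine
X = np.radians([[52.5200, 13.4050],
                [52.5300, 13.4050],
                [52.5250, 13.4050],
                [48.8566, 2.3522]])
bt = BallTree(X, metric="haversine")

# Default: neighbour positions in no particular order
ind = bt.query_radius(X[:1], r=0.01)

# Distances too, sorted ascending (sort_results requires return_distance=True)
ind_sorted, dist_sorted = bt.query_radius(X[:1], r=0.01,
                                          return_distance=True, sort_results=True)
print(ind_sorted[0])   # [0 2 1]: nearest first
print(dist_sorted[0])  # matching angular distances in radians

# Only the number of neighbours of every point
counts = bt.query_radius(X, r=0.01, count_only=True)
print(counts)
```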

So we need to iterate over both levels of the nested array to reach the actual row indices.
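The two-level iteration can be written as a nested comprehension or, equivalently, with `np.concatenate`. The sketch below builds a hypothetical `query_radius`-style result by hand (an object array holding two integer arrays of different lengths) so the flattening is easy to follow.

```python
import numpy as np

# Hypothetical query_radius-style result for two query points:
# filled element by element so NumPy keeps it as a 1-D object array
indices = np.empty(2, dtype=object)
indices[0] = np.array([0, 3, 1])
indices[1] = np.array([2, 1])

# Outer loop over query points, inner loop over the positions of each
flat = [x for b in indices for x in b]

# Same result with NumPy
flat_np = np.concatenate(list(indices))

# Positions can repeat across query points; np.unique drops duplicates and sorts
unique_positions = np.unique(flat_np)
print(flat_np, unique_positions)
```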

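Once we have the indices, iloc is the way to access those rows of a pandas DataFrame. Note that they are integer positions (0 to len(df) - 1), not index labels.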
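Because `query_radius` works on positions, the positions it returns only line up with index labels when the DataFrame has a default `0..n-1` index. The traceback in the question fits that pattern: the frame has 5219 rows but a row labelled 5221, so using the label from `iterrows()` as a position into `indices` goes out of bounds. A small sketch with a hypothetical non-default index:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose labels are no longer 0..n-1 (e.g. after dropping rows)
df = pd.DataFrame({"lat": [1.0, 2.0, 3.0], "lng": [4.0, 5.0, 6.0]},
                  index=[10, 20, 5221])

positions = np.array([0, 2])   # query_radius-style result: positions, not labels
print(df.iloc[positions])      # rows labelled 10 and 5221
print(df.index[positions])     # the matching labels, if they are needed

# query_radius returns one entry per row, addressed by position
indices = np.empty(len(df), dtype=object)
caught = False
try:
    indices[5221]              # a row *label* used as a position
except IndexError as err:
    caught = True
    print(err)                 # index 5221 is out of bounds for axis 0 with size 3

# Alternatively, make labels and positions coincide
df_reset = df.reset_index(drop=True)
```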

Update:

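You don't need to query bt once per point. You can pass all the coordinates (df[['lat','lng']]) at once; the result holds, for every point, an array with the indices of the points within the given radius.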

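    indices = bt.query_radius(df[['lat','lng']], r=1)
    for pos, (idx, row) in enumerate(df.iterrows()):
        # indices is positional (0..len(df)-1), so look it up with pos, not the label idx
        nearest_points_index = indices[pos]
        res = df.iloc[nearest_points_index]
        # do whatever you want with res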
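Putting it together, here is a minimal end-to-end sketch of the question's bag-of-words ratio, assuming a small hypothetical DataFrame with a non-default index. Beyond the positional lookup above, it makes two assumptions of its own: coordinates are converted to radians because scikit-learn's haversine metric works in radians (so the radius is 10 km divided by an assumed mean Earth radius of 6371 km, rather than the degree value `(10/40000)*360`), and the ratios go into a separate, hypothetically named `ratios` frame so the 0/1 indicator columns are not overwritten while later rows still read them.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius

# Hypothetical stand-in for the question's DataFrame, with a non-default index
df = pd.DataFrame(
    {
        "lat": [52.5200, 52.5205, 52.5210, 48.8566],
        "lng": [13.4050, 13.4060, 13.4045, 2.3522],
        "caption": ["sunset over the river", "love this view", "sunset again", "sunrise"],
    },
    index=[3, 7, 5221, 9000],
)
bag_of_words = ["sunset", "sunrise", "love"]

# 0/1 indicator column per word (plain substring test, as in the question)
for word in bag_of_words:
    df[word] = df["caption"].str.contains(word, regex=False).astype(int)

# Haversine works on radians; the radius is an angle: distance / Earth radius
coords = np.radians(df[["lat", "lng"]].to_numpy())
bt = BallTree(coords, metric="haversine")
indices = bt.query_radius(coords, r=10.0 / EARTH_RADIUS_KM)  # 10 km

# Results stored separately so the indicator columns stay intact
ratios = pd.DataFrame(0.0, index=df.index, columns=bag_of_words)
for pos, (idx, row) in enumerate(df.iterrows()):
    neighbours = indices[pos]                 # positions, so use pos, not the label idx
    for word in bag_of_words:
        if word in row["caption"]:
            flags = df[word].to_numpy()[neighbours]
            # Same ratio as the question: neighbours with the word / max(1, neighbours without it)
            ratios.loc[idx, word] = (flags == 1).sum() / max(1, (flags != 1).sum())

print(ratios)
```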
