I am trying to predict the sale price from the median price for each combination of AREA and MZZONE. These are the values:
    combo = pd.pivot_table(train, values=['SALES_PRICE'], index=['MZZONE', 'AREA'], aggfunc='median')
    combo
Output:
                       SALES_PRICE
    MZZONE AREA
    A      Adyar         7144042.5
           Karapakkam    5468500.0
           Velachery     8428745.0
    C      Adyar         7877645.0
           Karapakkam    6443000.0
           Velachery     9170660.0
    I      Adyar         8785350.0
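But when I try to create a new column in the test data, the whole column is filled with NaN. This is the code I use to fill in the median values in the test data: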
    test['super_mean'] = 0
    s2 = 'MZZONE'
    s1 = 'AREA'
    for i in test[s1].unique():
        for j in test[s2].unique():
            test['super_mean'][(test[s1] == str(i)) & (test[s2] == str(j))] = train['SALES_PRICE'][(train[s1] == str(i)) & (train[s2] == str(i))].median()
Why does this happen?
Answer:
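You made a mistake in the code inside the 'j' loop: you used 'i' where you should have used 'j'. As a result, the filter on train compares both AREA and MZZONE against the same AREA value, so it never matches any row, and the median of an empty selection is NaN. Here is the corrected for loop: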
    test['super_mean'] = 0
    s2 = 'MZZONE'
    s1 = 'AREA'
    for i in test[s1].unique():
        for j in test[s2].unique():
            test['super_mean'][(test[s1] == str(i)) & (test[s2] == str(j))] = train['SALES_PRICE'][(train[s1] == str(i)) & (train[s2] == str(j))].median()
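As a side note, the nested loop can probably be avoided altogether. The sketch below is not the asker's code; it swaps the loop for a groupby followed by a left merge, using small made-up frames in place of the real train and test data. It also reproduces the empty selection behind the NaN values. Assigning through test['super_mean'][mask] is chained indexing, which can trigger a SettingWithCopyWarning in pandas, so a merge sidesteps that as well.

```python
import pandas as pd

# Toy data standing in for the asker's train/test frames (values are made up).
train = pd.DataFrame({
    "MZZONE": ["A", "A", "A", "C", "C"],
    "AREA": ["Adyar", "Adyar", "Velachery", "Adyar", "Adyar"],
    "SALES_PRICE": [100.0, 300.0, 500.0, 700.0, 900.0],
})
test = pd.DataFrame({
    "MZZONE": ["A", "C", "A", "I"],
    "AREA": ["Adyar", "Adyar", "Velachery", "Adyar"],
})

# The original filter compared both columns against the same AREA value,
# so no row matched and the median of the empty selection is NaN.
empty = train["SALES_PRICE"][(train["AREA"] == "Adyar") & (train["MZZONE"] == "Adyar")]
buggy_median = empty.median()

# One median per (MZZONE, AREA) pair, flattened into a lookup table.
medians = (
    train.groupby(["MZZONE", "AREA"], as_index=False)["SALES_PRICE"]
    .median()
    .rename(columns={"SALES_PRICE": "super_mean"})
)

# A left merge keeps every test row; pairs never seen in train get NaN.
test = test.merge(medians, on=["MZZONE", "AREA"], how="left")
print(test)
```

Any (MZZONE, AREA) pair that appears in test but not in train still ends up as NaN after the merge, so those rows may need a fallback such as the overall median.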