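I'm working with some data and trying to understand the association between two variables, so I ran a chi-square test with the SciPy package in Python.
Here is the cross-tabulation of the two variables: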
pd.crosstab(data['loan_default'],data['id_proofs'])
Output:
id_proofs          2      3    4  5
loan_default
0             167035  15232  273  3
1              46354   4202   54  1
When I apply the chi-square test to the same data, I get an error: ValueError: The internally computed table of expected frequencies has a zero element at (0,).
Code:
from scipy.stats import chi2_contingency
stat, p, dof, expec = chi2_contingency(data['loan_default'], data['id_proofs'])
print(stat, p, dof, expec)
Traceback:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-154-63c6f49aec48> in <module>()
      1 from scipy.stats import chi2_contingency
----> 2 stat,p,dof,expec = chi2_contingency(data['loan_default'],data['id_proofs'])
      3 print(stat,p,dof,expec)

~/anaconda3/lib/python3.6/site-packages/scipy/stats/contingency.py in chi2_contingency(observed, correction, lambda_)
    251         zeropos = list(zip(*np.where(expected == 0)))[0]
    252         raise ValueError("The internally computed table of expected "
--> 253                          "frequencies has a zero element at %s." % (zeropos,))
    254 
    255     # The degrees of freedom

ValueError: The internally computed table of expected frequencies has a zero element at (0,).
What could be causing this, and how can I fix it?
Answer:
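Take another look at the docstring of chi2_contingency. Its first argument, observed, must be the contingency table itself. In your call, the raw loan_default column was taken as observed, and data['id_proofs'] landed in the second positional parameter, correction. For a 1-D input the expected frequencies are simply the input values, so the first 0 in loan_default triggers the error at (0,). Compute the contingency table first (as you already did with pd.crosstab(data['loan_default'], data['id_proofs'])) and then pass that table to chi2_contingency.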
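A minimal sketch of the fix, assuming pandas, NumPy and SciPy are installed. The small data DataFrame below is a hypothetical toy stand-in for the asker's data; the second call uses the counts from the crosstab output shown in the question.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical toy stand-in for the question's `data` DataFrame.
data = pd.DataFrame({
    "loan_default": [0, 0, 1, 0, 1, 0, 0, 1, 0, 0],
    "id_proofs":    [2, 3, 2, 2, 3, 4, 2, 2, 3, 2],
})

# Build the contingency table first (rows: loan_default, columns: id_proofs) ...
table = pd.crosstab(data["loan_default"], data["id_proofs"])

# ... then pass the table itself as `observed`, not the two raw columns.
stat, p, dof, expected = chi2_contingency(table)
print(f"toy data: chi2={stat:.3f}, p={p:.3f}, dof={dof}")

# The same call on the counts reported in the question's crosstab output.
observed = np.array([
    [167035, 15232, 273, 3],  # loan_default = 0
    [46354,  4202,  54,  1],  # loan_default = 1
])
stat2, p2, dof2, expected2 = chi2_contingency(observed)
print(f"question counts: chi2={stat2:.3f}, p={p2:.4g}, dof={dof2}")
print(expected2.round(2))
```

With the question's counts, the id_proofs = 5 column holds only 4 observations, so its expected counts (roughly 3.1 and 0.9) fall below 5, a common rule of thumb for trusting the chi-square approximation; merging that sparse category with a neighbouring one is one option worth considering.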