When parsing a JSON file with pandas' read_json() function, is it advisable to wrap the file's contents in square brackets?

I have a JSON file that looks like this:

{   "fruit": "Apple",   "size": "Large",    "color": "Red",    "grade": null,    "bool": true}

When I try to read this JSON file with pandas' read_json() function, like so:

import pandas as pd
data = pd.read_json("example_1.json")
print(data.to_json(orient="records"))

I get the error "If using all scalar values, you must pass an index". But if I wrap the JSON above in square brackets, like this:

[{   "fruit": "Apple",   "size": "Large",    "color": "Red",    "grade": null,    "bool": true}]

then read_json() works and my program runs fine. Why do the square brackets make such a difference?

Answer:

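It all comes down to how pandas handles broadcasting. When you have a dictionary (which is what a JSON object amounts to), you are essentially saying that the keys are column names and the values are what goes into the DataFrame. However, when the values are scalar-like objects (i.e., not iterables), pandas cannot tell how many items each column should have, because a scalar can be broadcast to any length.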
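To make the ambiguity concrete, here is a minimal sketch (the column values are made up for illustration). When at least one value is list-like, pandas has a length to broadcast the scalars to; when every value is a scalar, it has none and raises the error from the question.

```python
import pandas as pd

# A scalar next to a list: the list fixes the length at 3,
# so the scalar "Apple" is repeated on every row.
mixed = pd.DataFrame({"fruit": "Apple", "size": ["Small", "Medium", "Large"]})
print(mixed)

# Only scalars: nothing fixes the length, so pandas raises ValueError.
error_message = None
try:
    pd.DataFrame({"fruit": "Apple", "size": "Large"})
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```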

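If each value is instead a list holding a single item, it is clear that every column has exactly one item, and there is no ambiguity.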

So, for example, the following should work, because each column clearly has exactly one value.

pd.DataFrame({   "fruit": ["Apple"],   "size": ["Large"],    "color": ["Red"],    "grade": [None],    "bool": [True]})

The following should also work, because a list of dictionaries is interpreted as one row per list item.

pd.DataFrame([{   "fruit": "Apple",   "size": "Large",    "color": "Red",    "grade": None,    "bool": True}])

But the following will not work, because it is unclear how many items you want:

pd.DataFrame({   "fruit": "Apple",   "size": "Large",    "color": "Red",    "grade": None,    "bool": True})

To remove the ambiguity, you have to specify an index, which directly fixes the total number of items. For example:

pd.DataFrame({   "fruit": "Apple",   "size": "Large",    "color": "Red",    "grade": None,    "bool": True}, index=[0])

   fruit   size color grade  bool
0  Apple  Large   Red  None  True

And:

pd.DataFrame({   "fruit": "Apple",   "size": "Large",    "color": "Red",    "grade": None,    "bool": True}, index=[0,1])

   fruit   size color grade  bool
0  Apple  Large   Red  None  True
1  Apple  Large   Red  None  True

These examples use the default DataFrame constructor, but the same logic applies to read_json.
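As a rough sketch of how this carries over to read_json, the snippet below feeds the question's JSON through io.StringIO instead of a file (an assumption made only to keep the example self-contained). It shows the flat object failing, the bracket-wrapped version succeeding, and one possible way to avoid editing the file: parse it with the standard json module and wrap the resulting dict in a list yourself.

```python
import io
import json

import pandas as pd

raw = '{"fruit": "Apple", "size": "Large", "color": "Red", "grade": null, "bool": true}'

# A flat JSON object is all scalars, so read_json hits the same ValueError.
flat_failed = False
try:
    pd.read_json(io.StringIO(raw))
except ValueError:
    flat_failed = True

# Wrapping the object in brackets turns it into a list of one record (one row).
df_wrapped = pd.read_json(io.StringIO("[" + raw + "]"))

# Alternative that leaves the JSON untouched: parse it, then wrap the dict in a list.
record = json.loads(raw)
df_from_list = pd.DataFrame([record])

print(df_from_list.to_json(orient="records"))
```

Whether to change the file or the loading code depends on who owns the file; wrapping the parsed dict in a list keeps the file as a plain JSON object.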
