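I have a large JSON file with a very complex structure.
You can view it here: https://drive.google.com/file/d/1tBVJ2xYSCpTTUGPJegvAz2ZXbeN0bteX/view?usp=sharing
It contains more than 7 million lines, and I only want to extract the "text" field.
I wrote some Python code to extract every value of the "text" key from the whole file, but it extracted only 12 values! When I open the JSON file in Visual Studio, I can see more than 19,000 of them!
Here is the code: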
import json
import csv

with open("/Users/zahraa-maher/rasa-init-demo/venv/Tickie/external_data/frames2.json") as file:
    data = json.load(file)

fname = "outputText8.csv"

with open(fname, "w") as file:
    csv_file = csv.writer(file,lineterminator='\n')
    csv_file.writerow(["text"])
    for item in data[i]["turns"]:
        csv_file.writerow([item['text']])
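Please take a look at the JSON file itself. It is so large and deeply nested that pasting all of it here would make it unreadable.
Here is a fragment of the JSON file: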
[ { "user_id": "U22HTHYNP", "turns": [ { "text": "I'd like to book a trip from Caprica to Atlantis on Saturday, August 13, 2016, for 8 adults. I have a tight budget of 1700.", "labels": { "acts": [ { "args": [ { "val": "book", "key": "intent" } ], "name": "inform" }, { "args": [ { "val": "Atlantis", "key": "dst_city" }, { "val": "Caprica", "key": "or_city" }, { "val": "Saturday, August 13, 2016", "key": "str_date" }, { "val": "8", "key": "n_adults" }, { "val": "1700", "key": "budget" } ], "name": "inform" } ], "acts_without_refs": [ { "args": [ { "val": "book", "key": "intent" } ], "name": "inform" }, { "args": [ { "val": "Atlantis", "key": "dst_city" }, { "val": "Caprica", "key": "or_city" }, { "val": "Saturday, August 13, 2016", "key": "str_date" }, { "val": "8", "key": "n_adults" }, { "val": "1700", "key": "budget" } ], "name": "inform" } ], "active_frame": 1, "frames": [ { "info": { "intent": [ { "val": "book", "negated": false } ], "budget": [ { "val": "1700.0", "negated": false } ], "dst_city": [ { "val": "Atlantis", "negated": false } ], "or_city": [ { "val": "Caprica", "negated": false } ], "str_date": [ { "val": "august 13", "negated": false } ], "n_adults": [ { "val": "8", "negated": false } ] }, "frame_id": 1, "requests": [], "frame_parent_id": null, "binary_questions": [], "compare_requests": [] } ] }, "author": "user", "timestamp": 1471272019730.0 }, { "db": { "result": [ [ { "trip": { "returning": { "duration": { "hours": 0, "min": 51 }, "arrival": { "hour": 10, "year": 2016, "day": 24, "min": 51, "month": 8 }, "departure": { "hour": 10, "year": 2016, "day": 24, "min": 0, "month": 8 } }, "seat": "ECONOMY", "leaving": { "duration": { "hours": 0, "min": 51 }, "arrival": { "hour": 0, "year": 2016, "day": 16, "min": 51, "month": 8 }, "departure": { "hour": 0, "year": 2016, "day": 16, "min": 0, "month": 8 } }, "or_city": "Porto Alegre", "duration_days": 9 }, "price": 2118.81, "hotel": { "gst_rating": 7.15, "vicinity": [], "name": "Scarlet Palms Resort", "country": "Brazil", "amenities": [ "FREE_BREAKFAST", "FREE_PARKING", "FREE_WIFI" ], "dst_city": "Goiania", "category": "3.5 star hotel" } }
How can I change the code so that it extracts all of the "text" values from the JSON file into a CSV file?
Answer:
Here is a potential solution using pandas:
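The code in the question reads data[i]["turns"] for a single index i, so it only ever writes the turns of one conversation, which would explain getting 12 values. A minimal sketch of one way around that, assuming the top level of the file is a list of conversations and each conversation has a "turns" list as in the fragment above. The helper names extract_texts and texts_to_csv are made up for this example, and turns without a "text" key are skipped as a precaution.

```python
import json

import pandas as pd


def extract_texts(conversations):
    """Collect the "text" value of every turn in every conversation."""
    texts = []
    for conversation in conversations:
        # Assumption: each top-level item holds its turns under "turns".
        for turn in conversation.get("turns", []):
            if "text" in turn:
                texts.append(turn["text"])
    return texts


def texts_to_csv(json_path, csv_path):
    """Load the JSON file and write all "text" values to a one-column CSV."""
    with open(json_path, encoding="utf-8") as file:
        data = json.load(file)
    texts = extract_texts(data)
    # One column named "text"; index=False leaves out the row numbers.
    pd.DataFrame({"text": texts}).to_csv(csv_path, index=False)
    return len(texts)
```

Calling texts_to_csv with the path to frames2.json and "outputText8.csv" returns the number of rows written, which you can compare against the count you see in the editor.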