Extracting data from a JSON file to CSV

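I have a large JSON file with a very complex structure.

You can view it here: https://drive.google.com/file/d/1tBVJ2xYSCpTTUGPJegvAz2ZXbeN0bteX/view?usp=sharing

It contains more than 7 million lines, and I only want to extract the "text" field.

I wrote Python code to extract every value of the "text" key from the whole file, but it extracted only 12 values. When I open the JSON file in Visual Studio, I can see more than 19,000 of them.

Here is the code: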

    import json
    import csv

    with open("/Users/zahraa-maher/rasa-init-demo/venv/Tickie/external_data/frames2.json") as file:
        data = json.load(file)

    fname = "outputText8.csv"
    with open(fname, "w") as file:
        csv_file = csv.writer(file, lineterminator='\n')
        csv_file.writerow(["text"])
        for item in data[i]["turns"]:
            csv_file.writerow([item['text']])

Please look at the JSON file itself: it is too large and too complex to paste here in full without becoming unreadable.

Here is part of the JSON file:

    [
      {
        "user_id": "U22HTHYNP",
        "turns": [
          {
            "text": "I want to book a trip from Caprica to Atlantis on Saturday, August 13, 2016, for 8 adults. My budget is tight, only 1700.",
            "labels": {
              "acts": [
                {
                  "args": [
                    {"val": "book", "key": "intent"}
                  ],
                  "name": "inform"
                },
                {
                  "args": [
                    {"val": "Atlantis", "key": "dst_city"},
                    {"val": "Caprica", "key": "or_city"},
                    {"val": "Saturday, August 13, 2016", "key": "str_date"},
                    {"val": "8", "key": "n_adults"},
                    {"val": "1700", "key": "budget"}
                  ],
                  "name": "inform"
                }
              ],
              "acts_without_refs": [
                {
                  "args": [
                    {"val": "book", "key": "intent"}
                  ],
                  "name": "inform"
                },
                {
                  "args": [
                    {"val": "Atlantis", "key": "dst_city"},
                    {"val": "Caprica", "key": "or_city"},
                    {"val": "Saturday, August 13, 2016", "key": "str_date"},
                    {"val": "8", "key": "n_adults"},
                    {"val": "1700", "key": "budget"}
                  ],
                  "name": "inform"
                }
              ],
              "active_frame": 1,
              "frames": [
                {
                  "info": {
                    "intent": [{"val": "book", "negated": false}],
                    "budget": [{"val": "1700.0", "negated": false}],
                    "dst_city": [{"val": "Atlantis", "negated": false}],
                    "or_city": [{"val": "Caprica", "negated": false}],
                    "str_date": [{"val": "august 13", "negated": false}],
                    "n_adults": [{"val": "8", "negated": false}]
                  },
                  "frame_id": 1,
                  "requests": [],
                  "frame_parent_id": null,
                  "binary_questions": [],
                  "compare_requests": []
                }
              ]
            },
            "author": "user",
            "timestamp": 1471272019730.0
          },
          {
            "db": {
              "result": [
                [
                  {
                    "trip": {
                      "returning": {
                        "duration": {"hours": 0, "min": 51},
                        "arrival": {"hour": 10, "year": 2016, "day": 24, "min": 51, "month": 8},
                        "departure": {"hour": 10, "year": 2016, "day": 24, "min": 0, "month": 8}
                      },
                      "seat": "ECONOMY",
                      "leaving": {
                        "duration": {"hours": 0, "min": 51},
                        "arrival": {"hour": 0, "year": 2016, "day": 16, "min": 51, "month": 8},
                        "departure": {"hour": 0, "year": 2016, "day": 16, "min": 0, "month": 8}
                      },
                      "or_city": "Porto Alegre",
                      "duration_days": 9
                    },
                    "price": 2118.81,
                    "hotel": {
                      "gst_rating": 7.15,
                      "vicinity": [],
                      "name": "Scarlet Palms Resort",
                      "country": "Brazil",
                      "amenities": ["FREE_BREAKFAST", "FREE_PARKING", "FREE_WIFI"],
                      "dst_city": "Goiania",
                      "category": "3.5 star hotel"
                    }
                  }

How can I modify the code so that it extracts all "text" values from the JSON file into a CSV file?


Answer:

Here is a potential solution using pandas:
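
No code accompanies the answer here, so as a hedged alternative the following minimal sketch uses only the standard library (`json` and `csv`) rather than pandas. It assumes the file has the layout shown in the excerpt: a top-level list of dialogues, each with a "turns" list whose entries may or may not carry a "text" key. The helper names `iter_texts` and `write_texts` are hypothetical and do not come from the original post.

```python
import csv
import json


def iter_texts(dialogues):
    """Yield the "text" value of every turn in every dialogue.

    Assumes `dialogues` is a list of dicts, each with an optional
    "turns" list (an assumption based on the excerpt in the question).
    """
    for dialogue in dialogues:
        for turn in dialogue.get("turns", []):
            # Skip turns that have no "text" key instead of raising KeyError.
            if "text" in turn:
                yield turn["text"]


def write_texts(json_path, csv_path):
    """Write all "text" values from json_path to a one-column CSV file.

    Returns the number of data rows written (header excluded).
    """
    with open(json_path, encoding="utf-8") as src:
        dialogues = json.load(src)

    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst, lineterminator="\n")
        writer.writerow(["text"])
        for text in iter_texts(dialogues):
            writer.writerow([text])
            count += 1
    return count
```

The key difference from the question's code is the outer loop over every dialogue: indexing a single element with `data[i]` can only ever reach the turns of one dialogue. A call such as `write_texts("frames2.json", "outputText8.csv")` returns the number of rows written, which can be compared with the count seen in the editor. With pandas, `pd.json_normalize(data, record_path="turns")` should give a comparable table with a `text` column, assuming every dialogue has a "turns" list.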
