python – Speeding up JSON to DataFrame with a lot of data manipulation

2022/1/14 8:21:46 | python | Word count: 7669 | Views: 521 | Source: www.jb51.cc/python

Overview

I have a large amount of JSON data in the following format:

[
    [{
        "created_at": "2017-04-28T16:52:36Z","as_of": "2017-04-28T17:00:05Z","trends": [{
            "url": "http://twitter.com/search?q=%23ChavezSigueCandanga","query": "%23ChavezSigueCandanga","tweet_volume": 44587,"name": "#ChavezSigueCandanga","promoted_content": null
        },{
            "url": "http://twitter.com/search?q=%2327Abr","query": "%2327Abr","tweet_volume": 79781,"name": "#27Abr","promoted_content": null
        }],"locations": [{
            "woeid": 395277,"name": "Turmero"
        }]
    }],[{
        "created_at": "2017-04-28T16:57:35Z","as_of": "2017-04-28T17:00:03Z","trends": [{
            "url": "http://twitter.com/search?q=%23fyrefestival","query": "%23fyrefestival","tweet_volume": 141385,"name": "#fyrefestival","promoted_content": null
        },{
            "url": "http://twitter.com/search?q=%23HotDocs17","query": "%23HotDocs17","tweet_volume": null,"name": "#HotDocs17","promoted_content": null
        }],"locations": [{
            "woeid": 9807,"name": "Vancouver"
        }]
    }]
]...

I wrote a function that formats it into a pandas DataFrame of the following form:

+----+--------------------------------+------------------+----------------------------------+--------------+--------------------------------------------------------------+----------------------+----------------------+---------------+----------------+
|    |              name              | promoted_content |              query               | tweet_volume |                             url                              |        as_of         |      created_at      | location_name | location_woeid |
+----+--------------------------------+------------------+----------------------------------+--------------+--------------------------------------------------------------+----------------------+----------------------+---------------+----------------+
| 47 | #BatesMotel                    |                  | %23BatesMotel                    | 59748        | http://twitter.com/search?q=%23BatesMotel                    | 2017-04-25T17:00:05Z | 2017-04-25T16:53:43Z | Winnipeg      | 2972           |
| 48 | #AdviceForPeopleJoiningTwitter |                  | %23AdviceForPeopleJoiningTwitter | 51222        | http://twitter.com/search?q=%23AdviceForPeopleJoiningTwitter | 2017-04-25T17:00:05Z | 2017-04-25T16:53:43Z | Winnipeg      | 2972           |
| 49 | #CADTHSymp                     |                  | %23CADTHSymp                     |              | http://twitter.com/search?q=%23CADTHSymp                     | 2017-04-25T17:00:05Z | 2017-04-25T16:53:43Z | Winnipeg      | 2972           |
| 0  | #WorldPenguinDay               |                  | %23WorldPenguinDay               | 79006        | http://twitter.com/search?q=%23WorldPenguinDay               | 2017-04-25T17:00:05Z | 2017-04-25T16:58:22Z | Toronto       | 4118           |
| 1  | #TravelTuesday                 |                  | %23TravelTuesday                 |              | http://twitter.com/search?q=%23TravelTuesday                 | 2017-04-25T17:00:05Z | 2017-04-25T16:58:22Z | Toronto       | 4118           |
| 2  | #DigitalLeap                   |                  | %23DigitalLeap                   |              | http://twitter.com/search?q=%23DigitalLeap                   | 2017-04-25T17:00:05Z | 2017-04-25T16:58:22Z | Toronto       | 4118           |
| …  | …                              | …                | …                                | …            | …                                                            | …                    | …                    | …             | …              |
| 0  | #nusnc17                       |                  | %23nusnc17                       |              | http://twitter.com/search?q=%23nusnc17                       | 2017-04-25T17:00:05Z | 2017-04-25T16:58:24Z | Birmingham    | 12723          |
| 1  | #WorldPenguinDay               |                  | %23WorldPenguinDay               | 79006        | http://twitter.com/search?q=%23WorldPenguinDay               | 2017-04-25T17:00:05Z | 2017-04-25T16:58:24Z | Birmingham    | 12723          |
| 2  | #littleboyblue                 |                  | %23littleboyblue                 | 20772        | http://twitter.com/search?q=%23littleboyblue                 | 2017-04-25T17:00:05Z | 2017-04-25T16:58:24Z | Birmingham    | 12723          |
+----+--------------------------------+------------------+----------------------------------+--------------+--------------------------------------------------------------+----------------------+----------------------+---------------+----------------+

This is the function that writes the JSON into the DataFrame:

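import pandas as pd

# Note: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0,
# so this function runs as written only on older pandas versions.
def trends_to_dataframe(data):
    df = pd.DataFrame()

    for location in data:
        temp_df = pd.DataFrame()

        # Each append returns a new DataFrame, copying every row added so far.
        for trend in location[0]['trends']:
            temp_df = temp_df.append(pd.Series(trend),ignore_index=True)

        # Broadcast the per-location metadata to every trend row.
        temp_df['as_of'] = location[0]['as_of']
        temp_df['created_at'] = location[0]['created_at']
        temp_df['location_name'] = location[0]['locations'][0]['name']
        temp_df['location_woeid'] = location[0]['locations'][0]['woeid']

        df = df.append(temp_df)

    return df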

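Unfortunately, given the amount of data I have (and judging by some simple timers I ran), this will take about 4 hours to complete. Any ideas on how to speed it up?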
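A likely reason for the slowness is that every append builds a brand-new DataFrame and copies all existing rows, so the total work grows roughly quadratically with the number of rows. The sketch below is only an illustration under assumptions: the helper names and the synthetic `rows` are hypothetical, and `pd.concat` stands in for `DataFrame.append` so the code also runs on pandas 2.x. Absolute timings will vary by machine and pandas version.

```python
import time

import pandas as pd


def build_by_repeated_concat(rows):
    """Grow a DataFrame one row at a time (the slow pattern)."""
    df = pd.DataFrame([rows[0]])
    for row in rows[1:]:
        # Each concat allocates a new frame and copies all previous rows.
        df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)
    return df


def build_once(rows):
    """Collect plain dicts first, then construct the DataFrame a single time."""
    return pd.DataFrame.from_records(rows)


# Synthetic rows, purely for illustration.
rows = [{"name": f"#tag{i}", "tweet_volume": i} for i in range(500)]

start = time.perf_counter()
slow = build_by_repeated_concat(rows)
slow_seconds = time.perf_counter() - start

start = time.perf_counter()
fast = build_once(rows)
fast_seconds = time.perf_counter() - start

print(f"repeated concat: {slow_seconds:.4f}s, single construction: {fast_seconds:.4f}s")
```

Both functions produce the same rows; only the construction strategy differs.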

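One way to speed this up: flatten every trend into a plain dict first, then build the DataFrame once instead of appending row by row:

from concurrent.futures import ThreadPoolExecutor

import pandas as pd

def get_trends(location):
    # Return one dict per trend, each carrying its location's metadata.
    entry = location[0]
    place = entry['locations'][0]
    trends = []
    for trend in entry['trends']:
        row = dict(trend)  # copy so the input data is not mutated
        row['as_of'] = entry['as_of']
        row['created_at'] = entry['created_at']
        row['location_name'] = place['name']
        row['location_woeid'] = place['woeid']
        trends.append(row)
    return trends

flat_data = []
with ThreadPoolExecutor() as executor:
    # executor.map runs get_trends on worker threads and yields results in input order.
    # Threads are unlikely to help much with this CPU-bound work; most of the gain
    # comes from constructing the DataFrame only once.
    for trends in executor.map(get_trends, data):
        flat_data += trends

# Build the DataFrame a single time from the flat list of dicts.
df = pd.DataFrame.from_records(flat_data)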
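As an alternative sketch, the same flattening can be expressed with `pd.json_normalize`, which expands a nested list (`record_path`) and repeats the chosen parent fields (`meta`) on every row. The function name `trends_to_dataframe_normalized` and the abbreviated `sample` data are assumptions for illustration, not part of the original answer. Because `meta` cannot index into a list such as `locations[0]`, the first location's fields are lifted to the top level first, and they are renamed to avoid clashing with the trend's own `name` key.

```python
import pandas as pd


def trends_to_dataframe_normalized(data):
    """Flatten nested trend data with pd.json_normalize (hypothetical helper)."""
    records = []
    for location in data:
        entry = location[0]
        place = entry["locations"][0]
        # Lift the first location's fields to the top level so they can serve as meta columns.
        records.append({
            "trends": entry["trends"],
            "as_of": entry["as_of"],
            "created_at": entry["created_at"],
            "location_name": place["name"],
            "location_woeid": place["woeid"],
        })
    return pd.json_normalize(
        records,
        record_path="trends",
        meta=["as_of", "created_at", "location_name", "location_woeid"],
    )


# Small sample shaped like the data in the question (fields abbreviated).
sample = [
    [{
        "created_at": "2017-04-28T16:52:36Z",
        "as_of": "2017-04-28T17:00:05Z",
        "trends": [
            {"name": "#27Abr", "query": "%2327Abr", "tweet_volume": 79781},
        ],
        "locations": [{"woeid": 395277, "name": "Turmero"}],
    }],
    [{
        "created_at": "2017-04-28T16:57:35Z",
        "as_of": "2017-04-28T17:00:03Z",
        "trends": [
            {"name": "#fyrefestival", "query": "%23fyrefestival", "tweet_volume": 141385},
            {"name": "#HotDocs17", "query": "%23HotDocs17", "tweet_volume": None},
        ],
        "locations": [{"woeid": 9807, "name": "Vancouver"}],
    }],
]

df = trends_to_dataframe_normalized(sample)
print(df)
```

Whether this is faster than the plain list-of-dicts approach depends on the data; it is worth timing both on a representative subset.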
