网站首页 > 博客 > 正文

Python pandas 入门 05 JSON

小樱桃博客 2022-12-31 1 0

Pandas JSON

JSON（JavaScript Object Notation，JavaScript 对象表示法），是存储和交换文本信息的语法，类似 XML。

JSON 比 XML 更小、更快，更易解析，更多 JSON 内容可以参考 JSON 教程。

Pandas 可以很方便的处理 JSON 数据，本文以 sites.json 为例，内容如下：

实例

[

{

"id": "A001",

"name": "教程",

"url": "www.run.com",

"likes": 61

{

"id": "A002",

"name": "Google",

"url": "www.google.com",

"likes": 124

{

"id": "A003",

"name": "淘宝",

"url": "www.taobao.com",

"likes": 45

}

]

实例

import pandas as pd

df = pd.read_json('sites.json')

print(df.to_string())

to_string() 用于返回 DataFrame 类型的数据，我们也可以直接处理 JSON 字符串。

实例

import pandas as pd

data =[

{

"id": "A001",

"name": "教程",

"url": "www.run.com",

"likes": 61

{

"id": "A002",

"name": "Google",

"url": "www.google.com",

"likes": 124

{

"id": "A003",

"name": "淘宝",

"url": "www.taobao.com",

"likes": 45

}

]

df = pd.DataFrame(data)

print(df)

以上实例输出结果为：

id name url likes

0 A001 教程 www.run.com 61

1 A002 Google www.google.com 124

2 A003 淘宝 www.taobao.com 45

JSON 对象与 Python 字典具有相同的格式，所以我们可以直接将 Python 字典转化为 DataFrame 数据：

实例

import pandas as pd

# 字典格式的 JSON

s = {

"col1":{"row1":1,"row2":2,"row3":3},

"col2":{"row1":"x","row2":"y","row3":"z"}

}

# 读取 JSON 转为 DataFrame

df = pd.DataFrame(s)

print(df)

以上实例输出结果为：

col1 col2

row1 1 x

row2 2 y

row3 3 z

从 URL 中读取 JSON 数据：

实例

import pandas as pd

URL = 'https://static.runoob.com/download/sites.json'

df = pd.read_json(URL)

print(df)

以上实例输出结果为：

id name url likes

0 A001 教程 www.run.com 61

1 A002 Google www.google.com 124

2 A003 淘宝 www.taobao.com 45

内嵌的 JSON 数据

假设有一组内嵌的 JSON 数据文件 nested_list.json ：

nested_list.json 文件内容

{

"school_name": "ABC primary school",

"class": "Year 1",

"students": [

{

"id": "A001",

"name": "Tom",

"math": 60,

"physics": 66,

"chemistry": 61

{

"id": "A002",

"name": "James",

"math": 89,

"physics": 76,

"chemistry": 51

{

"id": "A003",

"name": "Jenny",

"math": 79,

"physics": 90,

"chemistry": 78

}]

}

使用以下代码格式化完整内容：

实例

import pandas as pd

df = pd.read_json('nested_list.json')

print(df)

以上实例输出结果为：

school_name class students

0 ABC primary school Year 1 {'id': 'A001', 'name': 'Tom', 'math': 60, 'phy...

1 ABC primary school Year 1 {'id': 'A002', 'name': 'James', 'math': 89, 'p...

2 ABC primary school Year 1 {'id': 'A003', 'name': 'Jenny', 'math': 79, 'p...

这时我们就需要使用到 json_normalize() 方法将内嵌的数据完整的解析出来：

实例

import pandas as pd

import json

# 使用 Python JSON 模块载入数据

with open('nested_list.json','r') as f:

data = json.loads(f.read())

# 展平数据

df_nested_list = pd.json_normalize(data, record_path =['students'])

print(df_nested_list)

以上实例输出结果为：

id name math physics chemistry

0 A001 Tom 60 66 61

1 A002 James 89 76 51

2 A003 Jenny 79 90 78

data = json.loads(f.read()) 使用 Python JSON 模块载入数据。

json_normalize() 使用了参数 record_path 并设置为 ['students'] 用于展开内嵌的 JSON 数据 students。

显示结果还没有包含 school_name 和 class 元素，如果需要展示出来可以使用 meta 参数来显示这些元数据：

实例

import pandas as pd

import json

# 使用 Python JSON 模块载入数据

with open('nested_list.json','r') as f:

data = json.loads(f.read())

# 展平数据

df_nested_list = pd.json_normalize(

data,

record_path =['students'],

meta=['school_name', 'class']

)

print(df_nested_list)

以上实例输出结果为：

id name math physics chemistry school_name class

0 A001 Tom 60 66 61 ABC primary school Year 1

1 A002 James 89 76 51 ABC primary school Year 1

2 A003 Jenny 79 90 78 ABC primary school Year 1

接下来，让我们尝试读取更复杂的 JSON 数据，该数据嵌套了列表和字典，数据文件 nested_mix.json 如下：

nested_mix.json 文件内容

{

"school_name": "local primary school",

"class": "Year 1",

"info": {

"president": "John Kasich",

"address": "ABC road, London, UK",

"contacts": {

"email": "admin@e.com",

"tel": "123456789"

}

"students": [

{

"id": "A001",

"name": "Tom",

"math": 60,

"physics": 66,

"chemistry": 61

{

"id": "A002",

"name": "James",

"math": 89,

"physics": 76,

"chemistry": 51

{

"id": "A003",

"name": "Jenny",

"math": 79,

"physics": 90,

"chemistry": 78

}]

}

nested_mix.json 文件转换为 DataFrame：

实例

import pandas as pd

import json

# 使用 Python JSON 模块载入数据

with open('nested_mix.json','r') as f:

data = json.loads(f.read())

df = pd.json_normalize(

data,

record_path =['students'],

meta=[

'class',

['info', 'president'],

['info', 'contacts', 'tel']

]

)

print(df)

以上实例输出结果为：

id name math physics chemistry class info.president info.contacts.tel

0 A001 Tom 60 66 61 Year 1 John Kasich 123456789

1 A002 James 89 76 51 Year 1 John Kasich 123456789

2 A003 Jenny 79 90 78 Year 1 John Kasich 123456789

读取内嵌数据中的一组数据

以下是实例文件 nested_deep.json，我们只读取内嵌中的 math 字段：

nested_deep.json 文件内容

{

"school_name": "local primary school",

"class": "Year 1",

"students": [

{

"id": "A001",

"name": "Tom",

"grade": {

"math": 60,

"physics": 66,

"chemistry": 61

}

{

"id": "A002",

"name": "James",

"grade": {

"math": 89,

"physics": 76,

"chemistry": 51

}

{

"id": "A003",

"name": "Jenny",

"grade": {

"math": 79,

"physics": 90,

"chemistry": 78

}

}]

}

这里我们需要使用到 glom 模块来处理数据套嵌，glom 模块允许我们使用 . 来访问内嵌对象的属性。

第一次使用我们需要安装 glom：

pip3 install glom

实例

import pandas as pd

from glom import glom

df = pd.read_json('nested_deep.json')

data = df['students'].apply(lambda row: glom(row, 'grade.math'))

print(data)

以上实例输出结果为：

0 60

1 89

2 79

Name: students, dtype: int64

REF

https://www.runoob.com/pandas/pandas-json.html

查看原文

本文由用户于 2022-12-31 发布在夸智网，如有疑问，请联系我们。
本文链接：https://www.kuazhi.com/post/336757.html

夸智网

Python pandas 入门 05 JSON

Python pandas 入门 06 数据清洗

Python pandas 入门 04 CSV 文件

发表评论取消回复

夸智网

Python pandas 入门 05 JSON

Python pandas 入门 06 数据清洗

Python pandas 入门 04 CSV 文件

相关文章

发表评论取消回复