Posted by 云龙湖轮廓分明的月亮 on 2022-9-2 01:11:16

Basic Pandas Operations (Study Notes)

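Pandas is a library that provides high-performance, easy-to-use data structures and data analysis tools. Its two main data structures are Series (one-dimensional data) and DataFrame (two-dimensional data).

The basic usage of Pandas is shown below: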
 
# Import modules
import pandas as pd
import numpy as np
# Read the file
stu = pd.read_excel('./stu_data.xlsx')
stu.head()
 
 
# Inspect the data (column dtypes, whether there are null values)
stu.info()
 
 
# Convert a column's data type
stu['日期'] = stu['日期'].astype('str')
stu.info()
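
The cast above can be tried without the Excel file. The sketch below uses a few made-up rows in place of stu_data.xlsx (the values are an assumption, not the author's data) and also shows that `pd.to_datetime` turns the strings back into datetimes when date arithmetic is needed again.

```python
import pandas as pd

# Made-up rows standing in for stu_data.xlsx (an assumption, not the author's data)
stu = pd.DataFrame({
    '学号': [1, 2, 3],
    '日期': pd.to_datetime(['2022-07-01', '2022-07-15', '2022-08-02']),
})

as_text = stu['日期'].astype('str')   # every value is now a Python str
restored = pd.to_datetime(as_text)    # parse the strings back into datetimes

print(type(as_text.iloc[0]).__name__)
print(restored.dtype)
```

Keeping the string version in a separate variable, as here, avoids losing the datetime column by accident.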
 
Slicing
# Slice with iloc or loc: 学号 (student ID), 身高 (height), 体重 (weight)
# iloc selects by integer position, so the positions are looked up from the column names
stu.iloc[:, stu.columns.get_indexer(['学号', '身高', '体重'])]
# loc selects by label: 学号, 身高, 体重 for all rows
stu.loc[:, ['学号', '身高', '体重']]
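
To make the difference between the two indexers concrete, here is a small sketch on made-up data. The column layout (and therefore the positions 0, 2, 3) is an assumption that holds for this sample frame only. It also shows that a row slice excludes its end point with `iloc` but includes it with `loc`.

```python
import pandas as pd

# Made-up sample; the column order here is an assumption for illustration
stu = pd.DataFrame({
    '学号': [1, 2, 3],
    '性别': ['女', '男', '女'],
    '身高': [172, 180, 165],
    '体重': [55, 70, 50],
})

by_position = stu.iloc[:, [0, 2, 3]]              # integer positions of the columns
by_label = stu.loc[:, ['学号', '身高', '体重']]    # column labels

rows_iloc = stu.iloc[0:2]   # positions 0 and 1 (end excluded)
rows_loc = stu.loc[0:2]     # labels 0, 1 and 2 (end included)

print(by_position.equals(by_label), len(rows_iloc), len(rows_loc))
```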
 
 
Querying
# SQL-like query string: 身高 (height) above 170 and 性别 (gender) is 女 (female)
stu.query('身高 > 170 and 性别 == "女"')
# The same filter with pandas boolean indexing
stu[(stu['身高'] > 170) & (stu['性别'] == "女")]
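
The two filters are meant to return the same rows. A quick check on made-up data (the values are assumptions) might look like this:

```python
import pandas as pd

# Made-up sample data for illustration
stu = pd.DataFrame({
    '学号': [1, 2, 3, 4],
    '性别': ['女', '男', '女', '女'],
    '身高': [172, 180, 165, 175],
})

via_query = stu.query('身高 > 170 and 性别 == "女"')
via_mask = stu[(stu['身高'] > 170) & (stu['性别'] == '女')]

print(via_query['学号'].tolist())
```

With boolean indexing, each condition needs its own parentheses because `&` binds more tightly than the comparison operators.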
 
 
# Get a row by its index label
stu.loc[10]
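
Looking a row up by index label is not the same as looking it up by position once the index is no longer 0, 1, 2, ... The sketch below uses a made-up frame with a custom index (an assumption) to show both forms.

```python
import pandas as pd

# Made-up sample with a non-default index, for illustration only
stu = pd.DataFrame(
    {'学号': [101, 102, 103], '身高': [172, 180, 165]},
    index=[10, 20, 30],
)

row_by_label = stu.loc[10]      # the row whose index label is 10
row_by_position = stu.iloc[0]   # the first row, whatever its label

print(row_by_label['学号'], row_by_position['学号'])
```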
 
 
Sorting
stu['身高'].sort_values() # ascending by default
stu['身高'].sort_values(ascending=False) # descending
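
Sorting a single column returns only that Series. To reorder whole rows, `DataFrame.sort_values` accepts one or more columns through `by`. A minimal sketch on made-up data (values are assumptions):

```python
import pandas as pd

# Made-up sample data for illustration
stu = pd.DataFrame({
    '学号': [1, 2, 3, 4],
    '身高': [172, 180, 165, 180],
    '体重': [55, 70, 50, 65],
})

# Tallest first; ties on 身高 are broken by the lighter 体重
ranked = stu.sort_values(by=['身高', '体重'], ascending=[False, True])

print(ranked['学号'].tolist())
```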
 
 
 
Grouping
# Group by 课程 (course) and view which rows fall into each group
# (from here on, stu refers to the GroupBy object rather than the DataFrame)
stu = stu.groupby('课程')
stu.groups
 
 
# View descriptive statistics for each group
stu.describe()
 
 
# Aggregate per group
# stu.agg(['mean','std']) # mean and standard deviation of every column within each group
print(stu['身高'].agg('max'))
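
As a variation on the aggregation above, named aggregation can compute several statistics in one call. The sketch uses made-up data and keeps the GroupBy object in a separate variable called `grouped` (a name chosen here for illustration), so the DataFrame itself stays available.

```python
import pandas as pd

# Made-up sample data for illustration
stu = pd.DataFrame({
    '课程': ['数学', '数学', '语文', '语文'],
    '身高': [172, 180, 165, 175],
    '体重': [55, 70, 50, 60],
})

grouped = stu.groupby('课程')

# One output column per (source column, aggregation) pair
summary = grouped.agg(
    max_height=('身高', 'max'),
    mean_weight=('体重', 'mean'),
)

print(summary)
```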
 
 
Binning a numeric variable
stu = pd.read_excel('./stu_data.xlsx')
stu['新体重'] = pd.cut(stu.体重,bins=,right=False)
stu.head()
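
The following sketch illustrates how `pd.cut` behaves with `right=False`. The bin edges and weights are hypothetical values picked for this example and are not the ones used in the original post. With `right=False` each interval is closed on the left and open on the right, so a value equal to an edge falls into the bin that starts at that edge.

```python
import pandas as pd

weights = pd.Series([45, 50, 59.9, 60, 75])   # made-up weights
bins = [40, 50, 60, 80]                       # hypothetical bin edges

# Intervals are [40, 50), [50, 60), [60, 80)
binned = pd.cut(weights, bins=bins, right=False)

print(binned.value_counts(sort=False))
```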
 
 
Splitting dates
# stu.日期
# 年份 = year, 月份 = month, 天数 = day
stu['年份'] = stu.日期.dt.year
stu['月份'] = stu.日期.dt.month
stu['天数'] = stu.日期.dt.day
stu.head()
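
The `.dt` accessor only works on datetime columns. If the dates arrive as text, they can be parsed first with `pd.to_datetime`. The sketch below assumes made-up date strings in `YYYY-MM-DD` form.

```python
import pandas as pd

# Made-up date strings for illustration
dates = pd.DataFrame({'日期': ['2022-07-26', '2021-12-03']})

# Parse the text into datetimes before using .dt
dates['日期'] = pd.to_datetime(dates['日期'], format='%Y-%m-%d')

dates['年份'] = dates['日期'].dt.year
dates['月份'] = dates['日期'].dt.month
dates['天数'] = dates['日期'].dt.day

print(dates)
```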
 
 
Joining tables
# Create a new Series object; 手机号 means phone number
stu1 = pd.Series(np.arange(12345678900,12345678952),name='手机号')
stu1
 
 
# Merge the tables
stu3 = pd.concat(,axis=1)
stu3.head()
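
As an illustration of column-wise concatenation, the sketch below joins a made-up DataFrame with a Series shaped like `stu1`. The exact objects passed to `pd.concat` in the original post are not shown, so the list used here is an assumption. With `axis=1`, rows are matched on the index, which is why a misaligned index produces missing values.

```python
import numpy as np
import pandas as pd

# Made-up data; the objects being joined are an assumption
stu = pd.DataFrame({'学号': [1, 2, 3]})
phones = pd.Series(np.arange(12345678900, 12345678903), name='手机号')

combined = pd.concat([stu, phones], axis=1)   # indexes 0..2 line up row by row

# The same Series with shifted index labels no longer lines up
shifted = phones.set_axis([1, 2, 3])
misaligned = pd.concat([stu, shifted], axis=1)

print(combined.shape, misaligned.shape)
```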
 
 
 
