PyMongoArrow
简介
PyMongoArrow
是 PyMongo 包含用于加载 MongoDB 的工具的扩展,以 Apache Arrow 形式查询结果集、 表格、 NumPy 数组和 Pandas 或 Polars DataFrames。 PyMongoArrow
是将 MongoDB
查询结果集物化为适合内存中分析处理应用程序的连续内存类型数组的推荐方法。[^1] 1. PyMongoArrow
使用手册[^2]
PyMongoArrow
简明教程[^3]
安装
1
| pip install pymongoarrow
|
或 1
| conda install --channel conda-forge pymongoarrow
|
使用
1 2 3 4 5 6 7 8 9 10 11
| import pymongoarrow import pymongo import pandas as pd import polars as pl from pymongoarrow.monkey import patch_all
client = pymongo.MongoClient('localhost', 27017) db = client['quantaxis'] collection = db['stock_day'] patch_all()
|
查询结果转化为Pandas
数据帧
1 2
| df_pandas = collection.find_pandas_all({}) df_pandas
|
输出:
|
_id
|
open
|
close
|
high
|
low
|
vol
|
amount
|
date
|
code
|
date_stamp
|
0
|
5f0fd6243530c3ea823c573e
|
49.00
|
49.00
|
49.00
|
49.00
|
32768.5
|
5.000000e+03
|
1991-04-03
|
000001
|
6.706080e+08
|
1
|
5f0fd6243530c3ea823c573f
|
48.76
|
48.76
|
48.76
|
48.76
|
4098.0
|
1.500000e+04
|
1991-04-04
|
000001
|
6.706944e+08
|
2
|
5f0fd6243530c3ea823c5740
|
48.52
|
48.52
|
48.52
|
48.52
|
2.0
|
1.000000e+04
|
1991-04-05
|
000001
|
6.707808e+08
|
3
|
5f0fd6243530c3ea823c5741
|
48.28
|
48.28
|
48.28
|
48.28
|
7.0
|
3.400000e+04
|
1991-04-06
|
000001
|
6.708672e+08
|
4
|
5f0fd6243530c3ea823c5742
|
48.04
|
48.04
|
48.04
|
48.04
|
2.0
|
1.000000e+04
|
1991-04-08
|
000001
|
6.710400e+08
|
...
|
...
|
...
|
...
|
...
|
...
|
...
|
...
|
...
|
...
|
...
|
14705153
|
665deedfcd3d51f9534dd7ca
|
45.10
|
44.85
|
45.49
|
44.69
|
314630.0
|
1.414729e+09
|
2024-05-31
|
688981
|
1.717085e+09
|
14705154
|
665deedfcd3d51f9534dd7cb
|
44.96
|
45.44
|
46.36
|
44.80
|
393681.0
|
1.796427e+09
|
2024-06-03
|
688981
|
1.717344e+09
|
14705155
|
665deedfcd3d51f9534dd7cc
|
37.45
|
37.77
|
37.90
|
37.02
|
48175.0
|
1.810881e+08
|
2024-05-30
|
689009
|
1.716998e+09
|
14705156
|
665deedfcd3d51f9534dd7cd
|
37.69
|
37.51
|
38.35
|
37.31
|
45055.0
|
1.706379e+08
|
2024-05-31
|
689009
|
1.717085e+09
|
14705157
|
665deedfcd3d51f9534dd7ce
|
37.73
|
38.11
|
39.26
|
37.29
|
95049.0
|
3.641072e+08
|
2024-06-03
|
689009
|
1.717344e+09
|
查询结果转化为Polars
数据帧
1 2
| df_polars = collection.find_polars_all({}) df_polars
|
输出:
_id
|
open
|
close
|
high
|
low
|
vol
|
amount
|
date
|
code
|
date_stamp
|
binary
|
f64
|
f64
|
f64
|
f64
|
f64
|
f64
|
str
|
str
|
f64
|
b"_0f$50<W>"
|
49.0
|
49.0
|
49.0
|
49.0
|
32768.5
|
5000.0
|
"1991-04-03"
|
"000001"
|
6.70608e8
|
b"_0f$50<W?"
|
48.76
|
48.76
|
48.76
|
48.76
|
4098.0
|
15000.0
|
"1991-04-04"
|
"000001"
|
6.706944e8
|
b"_0f$50<W@"
|
48.52
|
48.52
|
48.52
|
48.52
|
2.0
|
10000.0
|
"1991-04-05"
|
"000001"
|
6.707808e8
|
b"_0f$50<WA"
|
48.28
|
48.28
|
48.28
|
48.28
|
7.0
|
34000.0
|
"1991-04-06"
|
"000001"
|
6.708672e8
|
b"_0f$50<WB"
|
48.04
|
48.04
|
48.04
|
48.04
|
2.0
|
10000.0
|
"1991-04-08"
|
"000001"
|
6.7104e8
|
…
|
…
|
…
|
…
|
…
|
…
|
…
|
…
|
…
|
…
|
b"f]=Q9SM"
|
45.1
|
44.85
|
45.49
|
44.69
|
314630.0
|
1.4147e9
|
"2024-05-31"
|
"688981"
|
1.7171e9
|
b"f]=Q9SM"
|
44.96
|
45.44
|
46.36
|
44.8
|
393681.0
|
1.7964e9
|
"2024-06-03"
|
"688981"
|
1.7173e9
|
b"f]=Q9SM"
|
37.45
|
37.77
|
37.9
|
37.02
|
48175.0
|
1.81088112e8
|
"2024-05-30"
|
"689009"
|
1.7170e9
|
b"f]=Q9SM"
|
37.69
|
37.51
|
38.35
|
37.31
|
45055.0
|
1.70637904e8
|
"2024-05-31"
|
"689009"
|
1.7171e9
|
b"f]=Q9SM"
|
37.73
|
38.11
|
39.26
|
37.29
|
95049.0
|
3.64107232e8
|
"2024-06-03"
|
"689009"
|
1.7173e9
|
查询结果转化为Numpy
数组
1 2
| ndarrays = collection.find_numpy_all({}) ndarrays
|
输出: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
| {'_id': array([b'_\x0f\xd6$50\xc3\xea\x82<W>', b'_\x0f\xd6$50\xc3\xea\x82<W?', b'_\x0f\xd6$50\xc3\xea\x82<W@', ..., b'f]\xee\xdf\xcd=Q\xf9SM\xd7\xcc', b'f]\xee\xdf\xcd=Q\xf9SM\xd7\xcd', b'f]\xee\xdf\xcd=Q\xf9SM\xd7\xce'], dtype=object), 'open': array([49. , 48.76, 48.52, ..., 37.45, 37.69, 37.73]), 'close': array([49. , 48.76, 48.52, ..., 37.77, 37.51, 38.11]), 'high': array([49. , 48.76, 48.52, ..., 37.9 , 38.35, 39.26]), 'low': array([49. , 48.76, 48.52, ..., 37.02, 37.31, 37.29]), 'vol': array([3.27685e+04, 4.09800e+03, 2.00000e+00, ..., 4.81750e+04, 4.50550e+04, 9.50490e+04]), 'amount': array([5.00000000e+03, 1.50000000e+04, 1.00000000e+04, ..., 1.81088112e+08, 1.70637904e+08, 3.64107232e+08]), 'date': array(['1991-04-03', '1991-04-04', '1991-04-05', ..., '2024-05-30', '2024-05-31', '2024-06-03'], dtype='<U10'), 'code': array(['000001', '000001', '000001', ..., '689009', '689009', '689009'], dtype='<U6'), 'date_stamp': array([6.7060800e+08, 6.7069440e+08, 6.7078080e+08, ..., 1.7169984e+09, 1.7170848e+09, 1.7173440e+09])}
|
查询结果转化为Arrow
表
1 2
| arrow_table = collection.find_arrow_all({}) arrow_table
|
输出: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
| _id: extension<pymongoarrow.objectid<ObjectIdType>> open: double close: double high: double low: double vol: double amount: double date: string code: string date_stamp: double ---- _id: [[5F0FD6243530C3EA823C573E,5F0FD6243530C3EA823C573F,5F0FD6243530C3EA823C5740,5F0FD6243530C3EA823C5741,5F0FD6243530C3EA823C5742,...,665DEEDFCD3D51F9534DD7CA,665DEEDFCD3D51F9534DD7CB,665DEEDFCD3D51F9534DD7CC,665DEEDFCD3D51F9534DD7CD,665DEEDFCD3D51F9534DD7CE]] open: [[49,48.76,48.52,48.28,48.04,...,45.1,44.96,37.45,37.69,37.73]] close: [[49,48.76,48.52,48.28,48.04,...,44.85,45.44,37.77,37.51,38.11]] high: [[49,48.76,48.52,48.28,48.04,...,45.49,46.36,37.9,38.35,39.26]] low: [[49,48.76,48.52,48.28,48.04,...,44.69,44.8,37.02,37.31,37.29]] vol: [[32768.5,4098,2,7,2,...,314630,393681,48175,45055,95049]] amount: [[5000,15000,10000,34000,10000,...,1414729472,1796427392,181088112,170637904,364107232]] date: [["1991-04-03","1991-04-04","1991-04-05","1991-04-06","1991-04-08",...,"2024-05-31","2024-06-03","2024-05-30","2024-05-31","2024-06-03"]] code: [["000001","000001","000001","000001","000001",...,"688981","688981","689009","689009","689009"]] date_stamp: [[670608000,670694400,670780800,670867200,671040000,...,1717084800,1717344000,1716998400,1717084800,1717344000]]
|
# 参考 [^1]: PyMongoArrow: Bridging the Gap Between MongoDB and Your Data Analysis App | MongoDB [^2]: MongoDB PyMongoArrow - PyMongoArrow v 1.3 [^3]: PyMongoArrow Extension Course | MongoDB University