PostgreSQL空间检索性能测试

前言

前几天安装好了PostgreSQL，并且安装了插件PostGIS，想测试一下空间查询性能如何。

随机生成1亿条数据经纬度数据，插入表中。用psql做空间检索。

测试服务器

ip：192.168.100.93（本地机器）
配置：
    20核64G
    cpu：Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz
    硬盘：500G NVME硬盘
    
数据库：
    ssh终端登录，切换到postgres账户，用psql，可以直接连接数据库。
    # psql
    # \c gis

表

数据库为gis，表为airline，创建表的sql语句如下：

create table airline(
    no serial,
    ph varchar(10),
    gd numeric,
    hx numeric,
    sd numeric,
    jwd geometry(point,4326),
    sj timestamp
);

生成测试数据

随机生成1亿条数据，经纬度随机生成，每次插入5千万条，分两次插入。

经度取值范围：-180°~180°，保留小数点后5位数字。

纬度取值范围：-90°~90°，保留小数点后5位数字。

生成语句如下：

# 采用系统的random()函数生成经纬度
# 经度取值范围为-180°~180°，保留小数点后5位数字。
# 纬度取值范围为-90°~90°，保留小数点后5位数字。
# 采用insert语句插入到表中
# 其它列的数据为测试值，在此处无实际意义。

insert into airline(ph, gd, hx, sd, sj, jwd) select 'A600', 5000, 200, 200, clock_timestamp(), ST_GeomFromText('POINT('|| trunc((0.5 - random())::numeric * 360, 5)::varchar || ' ' || trunc((0.5 - random())::numeric * 180, 5)::varchar || ')', 4326) from generate_series(1, 50000000);

建立索引

如果先建立索引，然后再插入测试数据的话，数据插入的会比较慢，所以此处先插入了数据，然后再建立的索引。

no列，类型为serial，在该列上建立btree索引。
jwd列，类型为geometry(Point， 4326)，在该列上建立gist索引。
sj列，类型为timestamp，在该列上建立btree索引。

create index concurrently idx_airline_no on airline(no);
create index concurrently idx_airline_jwd on airline using gist(jwd);
create index concurrently idx_airline_sj on airline(sj);

查看表：

gis=# \d airline
                                      Table "public.airline"
 Column |            Type             | Collation | Nullable |               Default                
--------+-----------------------------+-----------+----------+--------------------------------------
 no     | integer                     |           | not null | nextval('airline_no_seq'::regclass)
 ph     | character varying(10)       |           |          | 
 gd     | numeric                     |           |          | 
 hx     | numeric                     |           |          | 
 sd     | numeric                     |           |          | 
 jwd    | geometry(Point,4326)        |           |          | 
 sj     | timestamp without time zone |           |          | 
Indexes:
    "idx_airline_jwd" gist (jwd)
    "idx_airline_no" btree (no)
    "idx_airline_sj" btree (sj)

优化

修改postgresql.conf配置文件，将其中shared_buffers字段的值修改为实际内存的四分之一，测试机器实际内存64G，所以此处修改为16384MB。

测试时间

等值查询

按主键no，做简单的等值查询：

select * from airline t where t.no = 954362;

耗时约为4ms

空间查询

查询坐标语句：

select t.no, point(t.jwd) from airline as t where st_contains(ST_PolygonFromText('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))', 4326), t.jwd);

查询记录条数语句：

select count(*) from airline as t where st_contains(ST_PolygonFromText('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))', 4326), t.jwd);

测试结果：

注：下面的测试区域中的数字为度，可以大概理解为1度约等于111公里。

第一行为1×1的矩形，即111×111公里。详细的计算方法可以参考最后注章节。

其他行依次类推。

序号	测试区域大小（度）	首次查询时间（秒）	第二次查询时间（秒）	记录条数（条）
1	POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))	0.156	0.78	1556
2	POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))	0.107	0.086	6175
3	POLYGON((0 0, 4 0, 4 4, 0 4, 0 0))	0.162	0.140	24790
4	POLYGON((0 0, 8 0, 8 8, 0 8, 0 0))	0.345	0.388	98798
5	POLYGON((0 0, 16 0, 16 16, 0 16, 0 0))	1.174	1.173	396439
6	POLYGON((0 0, 32 0, 32 32, 0 32, 0 0))	4.450	4.423	1582095
7	POLYGON((0 0, 64 0, 64 64, 0 64, 0 0))	17.441	17.519	6323985
8	POLYGON((21 21, 22 21, 22 22, 21 22, 21 21))	0.122	0.119	1504
9	POLYGON((21 21, 29 21, 29 29, 21 29, 21 21))	0.393	0.396	98746

st_intersects、st_contains、&&性能对比

分别测试了在sp-gist索引和gist索引下，三个函数或者运算符的查询性能，如下表所述。

为避免缓存影响测试结果，所以每次测试均重启操作系统。

索引	st_intersects(秒，首次查询/二次查询)	st_contains(秒，首次查询/二次查询)	&&(秒，首次查询/二次查询)
sp-gist	23.4 8.2	1:32.0 8.6	1:29.0 6.3
gist	18.7 8.1	1:28.4 8.5	1:24.4 6.4

结论

总体而言，gist索引性能优于sp-gist索引，首次查询st_intersects性能比较好，第二次查询&&性能比较好。

可应用的场景：查询附近的商家、附近的共享单车等……。按照测试这个数据量，均能在毫秒级别返回所需数据。

注

在经线上，纬度每差1度,实地距离大约为111千米。

在纬线上，经度每差1度,实际距离为111×cosθ千米。（其中θ表示该纬线的纬度.在不同纬线上,经度每差1度的实际距离是不相等的）。

中国的经纬度范围介于经度73.5°E（新疆帕米尔高原）至135°E（黑龙江黑瞎子岛），纬度4°N（南沙群岛中的曾母暗沙）至53.5°N（黑龙江漠河以北的黑龙江主航道的中心线）之间。

111 * cos4 ≈ 110.7 km
111 * cos53.5 ≈ 66 km