在用 flask-sqlalchemy 对一个千万级别表进行 count 操作时,出现了耗时严重、内存飙升的情况。
要统计出一天内车辆访问次数,原代码如下:
car_visit_counts = CarVisit.query.filter(
CarVisit.park == car_visit.park,
CarVisit.plate_number == car_visit.plate_number,
CarVisit.visited_at >= today_start_time(),).count()
发现代码运行特别慢,所以把生成的 sql 打印出来看一下:
SELECT
COUNT(*) AS count_1
FROM
(
SELECT
car_visits.id AS car_visits_id
, car_visits.park_id AS car_visits_park_id
, car_visits.store_id AS car_visits_store_id
, car_visits.car_id AS car_visits_car_id
, car_visits.brand_id AS car_visits_brand_id
,
...
FROM
car_visits
WHERE
%(param_1)s = car_visits.park_id
AND car_visits.plate_number = %(plate_number_1)s
AND car_visits.visited_at >= %(visited_at_1)s
)
AS anon_1
可以发现进行了一次子查询,这样的话会生成临时表,效率低下,将原语句改变一下:
car_visit_counts = db.session.query(func.count(CarVisit.id)).filter(
CarVisit.park == car_visit.park,
CarVisit.plate_number == car_visit.plate_number,
CarVisit.visited_at >= today_start_time(),).scalar()
此时在看一下打印的 sql 语句:
SELECT
COUNT(car_visits.id) AS count_1
FROM
car_visits
WHERE
%(param_1)s = car_visits.park_id
AND car_visits.plate_number = %(plate_number_1)s
AND car_visits.visited_at >= %(visited_at_1)s
子查询消失了,速度也快了好多。