在用flask-sqlalchemy对一个千万级别表进行count操作时,出现了耗时严重、内存飙升的情况。
要统计出一天内车辆访问次数,原代码如下:

car_visit_counts = CarVisit.query.filter(    CarVisit.park == car_visit.park,    CarVisit.plate_number == car_visit.plate_number,    CarVisit.visited_at >= today_start_time(),).count()

发现代码运行特别慢,所以把生成的sql打印出来看一下:

SELECT          COUNT(*) AS count_1FROM          (                    SELECT                              car_visits.id                            AS car_visits_id                            , car_visits.park_id                       AS car_visits_park_id                            , car_visits.store_id                      AS car_visits_store_id                            , car_visits.car_id                        AS car_visits_car_id                            , car_visits.brand_id                      AS car_visits_brand_id                            ,                             ...                    FROM                              car_visits                    WHERE                              %(param_1)s                 = car_visits.park_id                              AND car_visits.plate_number = %(plate_number_1)s                              AND car_visits.visited_at  >= %(visited_at_1)s          )          AS anon_1

可以发现进行了一次子查询,这样的话会生成临时表,效率低下,将原语句改变一下:

car_visit_counts = db.session.query(func.count(CarVisit.id)).filter(    CarVisit.park == car_visit.park,    CarVisit.plate_number == car_visit.plate_number,    CarVisit.visited_at >= today_start_time(),).scalar()

此时在看一下打印的sql语句:

SELECT          COUNT(car_visits.id) AS count_1FROM          car_visitsWHERE          %(param_1)s                 = car_visits.park_id          AND car_visits.plate_number = %(plate_number_1)s          AND car_visits.visited_at  >= %(visited_at_1)s

子查询消失了,速度也快了好多。