MongoDB: Troubleshooting Slow find/getmore Operations



This article is a finalist award-winning entry from the 2021 MongoDB Technology Practice and Application Case Collection campaign.

Author: 张家侨 (Zhang Jiaqiao)

Problem Description

This article describes, step by step, how we helped a business team troubleshoot a slow query in production, and how the problem was resolved.

The business team hit stalls when querying data with a find().limit() command; a single operation could stall for up to a minute. The query in question:

db.xxx_collection.find({timetag: { $gt: 1636513802167}, $or: [{ nto: "zhang"}, {nfrom:"zhang"} ], nid: 50, status: 0 }).sort({timetag: 1}).limit(200)

Feedback from the business team: the affected MongoDB version is 4.2.12. "According to the slow log this query takes 40 to 50 seconds, but when run directly the first batch returns quickly; it only gets slow on getmore. Please look into why getmore is slow."

From this description, getmore looks like the direct cause of the stall. The possible causes are:

A stall inside the server's getmore implementation itself: a server-internal bug is unlikely

A query plan that does not fit the data being queried: more likely

The investigation below starts from these two angles.

Reproducing the Problem

We restored the data onto a standby cluster and ran the business query there:

db.xxx_collection.find({timetag: { $gt: 1636513802167}, $or: [{ nto: "zhang"}, {nfrom: "zhang"} ], nid: 50, status: 0 }).sort({timetag:1}).limit(200)

The first batch returned immediately; the stall appeared when fetching the 102nd result, and the operation produced a slow-log entry showing that a getmore had been triggered. The log output:

2021-12-15T10:21:55.007+0800 I COMMAND [conn857] command xxx_db.xxx_collection appName: "MongoDB Shell" command: getMore { getMore: 1244654399487918694, collection: "xxx_collection" } planSummary: IXSCAN { timetag: 1 } cursorid:1244654399487918694 keysExamined:11338296 docsExamined:11338296 cursorExhausted:1 numYields:88583 nreturned:99 reslen:100170 locks:{ ReplicationStateTransition: { acquireCount: { w: 88584 } }, Global: { acquireCount: { r: 88584 } }, Database: { acquireCount: { r: 88584 } }, Collection: { acquireCount: { r: 88584 } }, Mutex: { acquireCount: { r: 1 } } } storage:{ data: { bytesRead: 15442700982, timeReadingMicros: 40865619 }, timeWaitingMicros: { cache: 33773 } } protocol:op_msg 65270ms
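Incidentally, operations only land in this log when they exceed the slowms threshold (100 ms by default). When reproducing an issue it can help to make sure the threshold is low enough to capture every relevant operation; a sketch, run against the shard's mongod:

db.setProfilingLevel(1, { slowms: 100 }) // profile and log operations slower than 100 ms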

Troubleshooting

Confirming whether getmore is at fault

In MongoDB, the default batch size for query results is 101: a query finds the first 101 results and returns them in one batch, and when more results are needed the next batch is fetched and returned through a getmore.

The number of documents MongoDB returns per batch, including by getmore, can be set with batchSize.
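As a minimal illustration in the mongo shell (for demonstration only; the collection name is assumed), the cursor API makes this batching visible:

var cur = db.xxx_collection.find().batchSize(150); // ask for 150 documents per batch
cur.objsLeftInBatch(); // documents still buffered from the current batch
cur.next(); // iterating past the buffered batch triggers a getMore for the next 150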

If getmore itself were the problem, adjusting its batchSize should in theory not change whether the stall appears, so we made the following adjustments.

With batchSize set to 150:

db.xxx_collection.find({timetag: { $gt: 1636513802167}, $or: [{ nto: "zhang"}, {nfrom: "zhang"} ], nid: 50, status: 0 }).sort({timetag:1}).batchSize(150).limit(200)

The first batch returned immediately; once we read past batchSize documents, getmore stalled again, so the problem remained:

2021-12-15T10:25:54.781+0800 I COMMAND [conn859] command xxx_db.xxx_collection appName: "MongoDB Shell" command: getMore { getMore: 8826588127480667691, collection: "xxx_collection", batchSize: 150 } planSummary: IXSCAN { timetag: 1 } cursorid:8826588127480667691 keysExamined:11338234 docsExamined:11338234 cursorExhausted:1 numYields:88582 nreturned:50 reslen:50818 locks:{ ReplicationStateTransition: { acquireCount: { w: 88583 } }, Global: { acquireCount: { r: 88583 } }, Database: { acquireCount: { r: 88583 } }, Collection: { acquireCount: { r: 88583 } }, Mutex: { acquireCount: { r: 1 } } } storage:{ data: { bytesRead: 16610295032, timeReadingMicros: 30201139 }, timeWaitingMicros: { cache: 17084 } } protocol:op_msg 53826ms

A batchSize of 199 behaved much the same, but with 200 (equal to the limit) the stall moved to the very first query:

db.xxx_collection.find({timetag: { $gt: 1636513802167}, $or: [{ nto: "zhang"}, {nfrom: "zhang"} ], nid: 50, status: 0 }).sort({timetag:1}).batchSize(200).limit(200)

The corresponding slow log:

2021-12-15T10:27:23.729+0800 I COMMAND [conn859] command xxx_db.xxx_collection appName: "MongoDB Shell" command: find { find: "xxx_collection", filter: { timetag: { $gt: 1636513802167.0 }, $or: [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50.0, status: 0.0 }, projection: { $sortKey: { $meta: "sortKey" } }, sort: { timetag: 1.0 }, limit: 200, batchSize: 200 } planSummary: IXSCAN { timetag: 1 } keysExamined:11338445 docsExamined:11338445 cursorExhausted:1 numYields:88582 nreturned:200 queryHash:ECA82717 planCacheKey:AC7EC9E3 reslen:202045 locks:{ ReplicationStateTransition: { acquireCount: { w: 88583 } }, Global: { acquireCount: { r: 88583 } }, Database: { acquireCount: { r: 88583 } }, Collection: { acquireCount: { r: 88583 } }, Mutex: { acquireCount: { r: 2 } } } storage:{ data: { bytesRead: 17688667726, timeReadingMicros: 14907251 }, timeWaitingMicros: { cache: 11784 } } protocol:op_msg 36654ms

This essentially rules out a problem in the getmore operation itself. The slow logs show the query plan scanning the timetag index to fetch and filter data, examining over 11 million index keys and documents. The problem therefore looks related to the query plan and the data distribution: the stall happens precisely when the query moves from the 199th to the 200th result, and traversing the timetag index there is too slow.

Our analysis thus turned to the query plan and the queried data, to confirm whether the plan was a poor fit for the target data.

Analyzing the Distribution of the Queried Data

First, we need to understand the layout of the business data and the intent of the query. The key fields are:

{"nto" : , "nfrom" : , "nid" : , "timetag" : , "status" :}

Judging from the collection name and the document format, the data records the delivery of some kind of message.

The target query:

db.xxx_collection.find({timetag: { $gt: 1636513802167}, $or: [{ nto: "zhang"}, {nfrom: "zhang"} ], nid: 50, status: 0 }).sort({timetag:1}).limit(200)

The query looks up the messages a given user sent or received after a given point in time, with nid = 50 and status = 0. The decisive conditions are therefore the timetag and the user name. A greater-than predicate on time can match an enormous amount of data, which we verify below.

Data distribution over timetag

mongos>db.xxx_collection.count()
538058824

The collection holds over 538 million documents in total, a sizable volume. Next we pull out the timetag of the 199th and 200th results, where the stall occurs.

timetag of the 199th result:

{"_id" : ObjectId("618b380a82010a586b4265e6"), "timetag" : NumberLong("1636513802759") }

timetag of the 200th result:

{"_id" : ObjectId("618ce26682010a586be027f9"), "timetag" : NumberLong("1636622950801") }

Let T0 be the query's time point 1636513802167, T1 the 199th result's time point 1636513802759, and T2 the 200th result's time point 1636622950801. Below we split the data at these three points and examine each segment.

Distribution of the queried data

Total data volume after T0:

mongos>db.xxx_collection.find({timetag: { $gt: 1636513802167}}).count()
191829505

The target time range contains over 191 million documents, a large amount.

Data volume in T0~T1:

mongos>db.xxx_collection.find({$and: [{timetag: {$gt: 1636513802167}}, {timetag: {$lte: 1636513802759}}]}).count()
287

Data volume in T1~T2:

mongos>db.xxx_collection.find({$and: [{timetag: {$gt: 1636513802759}}, {timetag: {$lt: 1636622950801}}]}).count()
11338157

Data volume after T2:

mongos>db.xxx_collection.find({timetag: {$gte: 1636622950801}}).count()
180491061
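As an aside, the same breakdown can be computed in a single pass with a $bucket aggregation instead of several count() calls; a sketch using the same collection and time points (bucket lower bounds are inclusive, so the edges differ slightly from the $gt/$lte counts above):

db.xxx_collection.aggregate([
    { $match: { timetag: { $gt: 1636513802167 } } },
    { $bucket: {
        groupBy: "$timetag",
        // segments T0~T1, T1~T2, and after T2
        boundaries: [ 1636513802167, 1636513802759, 1636622950801, Infinity ],
        output: { count: { $sum: 1 } }
    } }
])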

Distribution of the query results

Total query results:

mongos>db.xxx_collection.find({timetag: { $gt: 1636513802167}, $or: [{ nto: "zhang"}, {nfrom: "zhang"} ], nid: 50, status: 0 }).sort({timetag:1}).count()
428

T0~T1:

mongos>db.xxx_collection.find({$and:[{timetag: { $gt: 1636513802167}}, {timetag: {$lte: 1636513802759}}], $or: [{ nto: "zhang"}, {nfrom: "zhang"} ], nid: 50, status: 0 }).sort({timetag:1}).count()
199

T1~T2:

mongos>db.xxx_collection.find({$and:[{timetag: { $gt: 1636513802759}}, {timetag: {$lt: 1636622950801}}], $or: [{ nto: "zhang"}, {nfrom: "zhang"} ], nid: 50, status: 0 }).sort({timetag:1}).count()
0

After T2:

mongos>db.xxx_collection.find({timetag: { $gte: 1636622950801}, $or: [{ nto: "zhang"}, {nfrom: "zhang"} ], nid: 50, status: 0}).sort({timetag: 1}).count()
229

The distributions show that the matching results sit almost entirely in T0~T1 and after T2. T1~T2 contains no matching results at all, yet holds more than 11 million documents. In other words, over the timetag field the results cluster at the two ends, with a huge hole in the middle.

Analyzing the Executed Query Plan

The original query plan

db.xxx_collection.find({timetag: { $gt: 1636513802167}, $or: [{ nto: "zhang"}, {nfrom: "zhang"} ], nid: 50, status: 0 }).sort({timetag:1}).limit(200).explain("executionStats")

The resulting plan:


"parsedQuery" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {"$eq" : "zhang"}
},
{
"nto" : {"$eq" : "zhang"}
}
]
},
{
"nid" : {"$eq" : 50}
},
{
"status" : {"$eq" : 0}
},
{
"timetag" : {"$gt" : 1636513802167}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 200,
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {"$eq" : "zhang"}
},
{
"nto" : {"$eq" : "zhang"}
}
]
},
{
"nid" : {"$eq" : 50}
},
{
"status" : {"$eq" : 0}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {"timetag" : 1},
"indexName" : "timetag_1",
"isMultiKey" : false,
"multiKeyPaths" : {"timetag" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"timetag" : ["(1636513802167.0, inf.0]"
]
}
}
}
}

The plan above scans the timetag index and then filters each fetched record against the remaining conditions. From the distribution analysis we know that about 191 million documents satisfy the timetag predicate, and scanning them is very slow even through an index. Because the cursor used by getmore is created by the original query, the same plan is used for the lifetime of that query. Below we run the original plan over each segment separately to corroborate that the timetag scan is the bottleneck.
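For each segment below, the relevant numbers can be read straight out of explain("executionStats"); a sketch of the fields we rely on (the filter here is simplified for illustration):

var stats = db.xxx_collection.find({ timetag: { $gt: 1636513802167 } })
                             .limit(200).explain("executionStats").executionStats;
// totalKeysExamined / totalDocsExamined: how much the plan actually scanned
// executionTimeMillis: wall-clock cost of the plan
printjson({ keys: stats.totalKeysExamined,
            docs: stats.totalDocsExamined,
            ms: stats.executionTimeMillis });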

Running the original plan over T0~T1

Querying the T0~T1 segment with the plan above:

"parsedQuery" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {"$eq" : "zhang"}
},
{
"nto" : {"$eq" : "zhang"}
}
]
},
{
"nid" : {"$eq" : 50}
},
{
"status" : {"$eq" : 0}
},
{
"timetag" : {"$lte" : 1636513802759}
},
{
"timetag" : {"$gt" : 1636513802167}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 200,
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {"$eq" : "zhang"}
},
{
"nto" : {"$eq" : "zhang"}
}
]
},
{
"nid" : {"$eq" : 50}
},
{
"status" : {"$eq" : 0}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {"timetag" : 1},
"indexName" : "timetag_1",
"isMultiKey" : false,
"multiKeyPaths" : {"timetag" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"timetag" : ["(1636513802167.0, 1636513802759.0]"
]
}
}
}
},

The result comes back immediately, since only 287 documents in total need to be scanned.

Running the original plan over T1~T2

Querying the T1~T2 segment with the plan above:

"parsedQuery" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {"$eq" : "zhang"}
},
{
"nto" : {"$eq" : "zhang"}
}
]
},
{
"nid" : {"$eq" : 50}
},
{
"status" : {"$eq" : 0}
},
{
"timetag" : {"$lt" : 1636622950801}
},
{
"timetag" : {"$gt" : 1636513802759}
}
]
},
"winningPlan" : {
"stage" : "SORT",
"sortPattern" : {"timetag" : 1},
"limitAmount" : 200,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"nid" : {"$eq" : 50}
},
{
"status" : {"$eq" : 0}
}
]
},
"inputStage" : {
"stage" : "OR",
"inputStages" : [
{
"stage" : "IXSCAN",
"keyPattern" : {
"nto" : 1,
"validflag" : 1,
"timetag" : 1
},
"indexName" : "nto_1_validflag_1_timetag_1",
"isMultiKey" : false,
"multiKeyPaths" : {"nto" : [],
"validflag" : [ ],
"timetag" : []},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"nto" : ["[\"zhang\", \"zhang\"]"
],
"validflag" : ["[MinKey, MaxKey]"
],
"timetag" : ["(1636513802759.0, 1636622950801.0)"
]
}
},
{
"stage" : "IXSCAN",
"keyPattern" : {
"nfrom" : 1,
"timetag" : 1
},
"indexName" : "nfrom_1_timetag_1",
"isMultiKey" : false,
"multiKeyPaths" : {"nfrom" : [],
"timetag" : []},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"nfrom" : ["[\"zhang\", \"zhang\"]"
],
"timetag" : ["(1636513802759.0, 1636622950801.0)"
]
}
}
]
}
}
}
},

The optimizer picked a different plan here, so we force the timetag_1 index with hint() to reproduce the original plan:

mongos>db.xxx_collection.find({$and:[{timetag: { $gt: 1636513802759}}, {timetag:{$lt: 1636622950801}}], $or: [{ nto: "zhang"}, {nfrom:"zhang"} ], nid: 50, status: 0 }).sort({timetag: 1}).limit(200).hint("timetag_1").explain("executionStats")

The plan:

"parsedQuery" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {"$eq" : "zhang"}
},
{
"nto" : {"$eq" : "zhang"}
}
]
},
{
"nid" : {"$eq" : 50}
},
{
"status" : {"$eq" : 0}
},
{
"timetag" : {"$lt" : 1636622950801}
},
{
"timetag" : {"$gt" : 1636513802759}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 200,
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {"$eq" : "zhang"}
},
{
"nto" : {"$eq" : "zhang"}
}
]
},
{
"nid" : {"$eq" : 50}
},
{
"status" : {"$eq" : 0}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {"timetag" : 1},
"indexName" : "timetag_1",
"isMultiKey" : false,
"multiKeyPaths" : {"timetag" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"timetag" : ["(1636513802759.0, 1636622950801.0)"
]
}
}
}
},

Query time:

2021-12-15T11:18:43.650+0800 I COMMAND [conn913] command xxx_db.xxx_collection appName: "MongoDB Shell" command: find { find: "xxx_collection", filter: { $and: [ { timetag: { $gt: 1636513802759.0 } }, { timetag: { $lt: 1636622950801.0 } } ], $or: [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50.0, status: 0.0 }, projection: { $sortKey: { $meta: "sortKey" } }, sort: { timetag: 1.0 }, hint: { $hint: "timetag_1" }, limit: 200, runtimeConstants: { localNow: new Date(1639538294423), clusterTime: Timestamp(1639538291, 1) }, shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000') ] } planSummary: IXSCAN { timetag: 1 } keysExamined:11338157 docsExamined:11338157 cursorExhausted:1 numYields:88579 nreturned:0 reslen:386 locks:{ ReplicationStateTransition: { acquireCount: { w: 88580 } }, Global: { acquireCount: { r: 88580 } }, Database: { acquireCount: { r: 88580 } }, Collection: { acquireCount: { r: 88580 } }, Mutex: { acquireCount: { r: 2 } } } storage:{ data: { bytesRead: 16223299833, timeReadingMicros: 9431804 }, timeWaitingMicros: { cache: 14082 } } protocol:op_msg 29226ms

Scanning the empty T1~T2 hole is very slow: 29 seconds, because more than 11 million documents have to be examined.

Running the original plan over the data after T2

mongos> db.xxx_collection.find({timetag: { $gt: 1636622950801}, $or: [{ nto: "zhang"}, {nfrom: "zhang"} ], nid: 50, status: 0 }).sort({timetag:1}).limit(200).explain("executionStats")

The plan:

"parsedQuery" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {"$eq" : "zhang"}
},
{
"nto" : {"$eq" : "zhang"}
}
]
},
{
"nid" : {"$eq" : 50}
},
{
"status" : {"$eq" : 0}
},
{
"timetag" : {"$gt" : 1636622950801}
}
]
},
"winningPlan" : {
"stage" : "SORT",
"sortPattern" : {"timetag" : 1},
"limitAmount" : 200,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"nid" : {"$eq" : 50}
},
{
"status" : {"$eq" : 0}
}
]
},
"inputStage" : {
"stage" : "OR",
"inputStages" : [
{
"stage" : "IXSCAN",
"keyPattern" : {
"nto" : 1,
"validflag" : 1,
"timetag" : 1
},
"indexName" : "nto_1_validflag_1_timetag_1",
"isMultiKey" : false,
"multiKeyPaths" : {"nto" : [],
"validflag" : [ ],
"timetag" : []},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"nto" : ["[\"zhang\", \"zhang\"]"
],
"validflag" : ["[MinKey, MaxKey]"
],
"timetag" : ["(1636622950801.0, inf.0]"
]
}
},
{
"stage" : "IXSCAN",
"keyPattern" : {
"nfrom" : 1,
"timetag" : 1
},
"indexName" : "nfrom_1_timetag_1",
"isMultiKey" : false,
"multiKeyPaths" : {"nfrom" : [],
"timetag" : []},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"nfrom" : ["[\"zhang\", \"zhang\"]"
],
"timetag" : ["(1636622950801.0, inf.0]"
]
}
}
]
}
}
}
},

Again the optimizer picked a different plan, so we force the timetag_1 index to reproduce the original plan:

mongos>db.xxx_collection.find({timetag: { $gt: 1636622950801}, $or: [{ nto: "zhang"}, {nfrom: "zhang"} ], nid: 50, status: 0 }).sort({timetag:1}).limit(200).hint("timetag_1").explain("executionStats")

The plan:

"parsedQuery" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {"$eq" : "zhang"}
},
{
"nto" : {"$eq" : "zhang"}
}
]
},
{
"nid" : {"$eq" : 50}
},
{
"status" : {"$eq" : 0}
},
{
"timetag" : {"$gt" : 1636622950801}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 200,
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {"$eq" : "zhang"}
},
{
"nto" : {"$eq" : "zhang"}
}
]
},
{
"nid" : {"$eq" : 50}
},
{
"status" : {"$eq" : 0}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {"timetag" : 1},
"indexName" : "timetag_1",
"isMultiKey" : false,
"multiKeyPaths" : {"timetag" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"timetag" : ["(1636622950801.0, inf.0]"
]
}
}
}
},

Log output:

2021-12-15T11:36:34.388+0800 I COMMAND [conn918] command xxx_db.xxx_collection appName: "MongoDB Shell" command: explain { explain: { find: "xxx_collection", filter: { timetag: { $gt: 1636622950801.0 }, $or: [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50.0, status: 0.0 }, sort: { timetag: 1.0 }, hint: { $hint: "timetag_1" }, limit: 200, ****** $db: "xxx_db" } numYields:1109015 reslen:2691 locks:{ ReplicationStateTransition: { acquireCount: { w: 1109016 } }, Global: { acquireCount: { r: 1109016 } }, Database: { acquireCount: { r: 1109016 } }, Collection: { acquireCount: { r: 1109016 } }, Mutex: { acquireCount: { r: 2 } } } storage:{ data: { bytesRead: 195293544507, timeReadingMicros: 518472952 }, timeWaitingMicros: { cache: 272870 } } protocol:op_msg 801697ms

Scanning everything after T2 takes about 800 seconds, since roughly 180 million documents must be examined. In the original query, limit(200) caps the total number of results, and only one result (the 200th) lies in this segment, so the original query could return as soon as it found that one result instead of scanning the whole segment.

Root Cause Summary

In short, the query plan MongoDB chose was a poor fit for the target data.

The original plan scans the timetag index and filters each fetched record against the remaining conditions.

The results are split across the two ends of the timetag range, with a large stretch of irrelevant data in between: results 1~199 fall in T0~T1, and results from the 200th onward fall after T2.

Under the original plan, which walks the collection via the timetag index alone, fetching anything beyond the 199th result is extremely slow because those results are buried among 191 million candidate documents. This also explains what we observed: the first batch of 101 results was fast because it only had to scan within the 287 documents of the T0~T1 range, while the next fetch, for results 102~200, was slow because it first had to plow through more than 11 million irrelevant documents before producing anything. The original plan simply scans too much.

Solution

Traversing the data with the timetag index alone is inefficient; compound indexes let the scan filter as it goes and drastically cut the number of entries examined. The plan we want runs two index scans in parallel, one over a (nfrom, timetag) compound index and one over a (nto, timetag) compound index, and then merges the two sorted result streams. Because MongoDB's query optimizer cannot transform the original query into this plan on its own, we rewrite the query into an equivalent shape the optimizer can recognize.

The original query:

db.xxx_collection.find({timetag: { $gt: 1636513802167}, $or: [{ nto: "zhang"}, {nfrom: "zhang"} ], nid: 50, status: 0 }).sort({timetag:1}).limit(200)

is rewritten into the equivalent optimized query:

db.xxx_collection.find({$or: [ {$and: [{nto: "zhang"}, {timetag: {$gt: 1636513802167}}, {nid:50}, {status: 0}]}, {$and: [{nfrom: "zhang"}, {timetag: {$gt: 1636513802167}}, {nid:50}, {status: 0}]} ] }).sort({timetag:1}).limit(200)

with compound indexes built on (nto, timetag) and (nfrom, timetag) (note: the database already had the (nfrom, timetag) compound index):

{
"v" : 2,
"key" : {
"nfrom" : 1,
"timetag" : 1
},
"name" : "nfrom_1_timetag_1",
"ns" : "xxx_db.xxx_collection"
},
{
"v" : 2,
"key" : {
"nto" : 1,
"timetag" : 1
},
"name" : "nto_1_timetag_1",
"ns" : "xxx_db.xxx_collection"
},
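For reference, the missing (nto, timetag) index can be created in the shell as below (a sketch; default build options assumed):

db.xxx_collection.createIndex({ nto: 1, timetag: 1 }, { name: "nto_1_timetag_1" })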

The resulting query plan:

"parsedQuery" : {
"$or" : [
{
"$and" : [
{
"nfrom" : {"$eq" : "zhang"}
},
{
"nid" : {"$eq" : 50}
},
{
"status" : {"$eq" : 0}
},
{
"timetag" : {"$gt" : 1636513802167}
}
]
},
{
"$and" : [
{
"nid" : {"$eq" : 50}
},
{
"nto" : {"$eq" : "zhang"}
},
{
"status" : {"$eq" : 0}
},
{
"timetag" : {"$gt" : 1636513802167}
}
]
}
]
},
"winningPlan" : {
"stage" : "SUBPLAN",
"inputStage" : {
"stage" : "LIMIT",
"limitAmount" : 200,
"inputStage" : {
"stage" : "PROJECTION_SIMPLE",
"transformBy" : {"timetag" : 1},
"inputStage" : {
"stage" : "SORT_MERGE",
"sortPattern" : {"timetag" : 1},
"inputStages" : [
{
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"nid" : {"$eq" : 50}
},
{
"status" : {"$eq" : 0}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"nfrom" : 1,
"timetag" : 1
},
"indexName" : "nfrom_1_timetag_1",
"isMultiKey" : false,
"multiKeyPaths" : {"nfrom" : [],
"timetag" : []},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"nfrom" : ["[\"zhang\", \"zhang\"]"
],
"timetag" : ["(1636513802167.0, inf.0]"
]
}
}
},
{
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"nid" : {"$eq" : 50}
},
{
"status" : {"$eq" : 0}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"nto" : 1,
"timetag" : 1
},
"indexName" : "nto_1_timetag_1",
"isMultiKey" : false,
"multiKeyPaths" : {"nto" : [],
"timetag" : []},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"nto" : ["[\"zhang\", \"zhang\"]"
],
"timetag" : ["(1636513802167.0, inf.0]"
]
}
}
}
]
}
}
}
},

This plan matches our expectation: each $or branch scans its own compound index, and the SORT_MERGE stage combines the two streams in timetag order.

Proving the Queries Equivalent

The optimized query above can be derived from the original by Boolean algebra.

The original query:

db.xxx_collection.find({timetag: { $gt: 1636513802167}, $or: [{ nto: "zhang"}, {nfrom: "zhang"} ], nid: 50, status: 0 }).sort({timetag:1}).limit(200)

Abstract the find conditions as follows:

a: timetag > 1636513802167

b: nto = "zhang"

c: nfrom = "zhang"

d: nid = 50

e: status = 0

The original predicate is thus the conjunctive form:

$$
a \wedge (b \vee c) \wedge d \wedge e
$$

By the distributive law ($x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z)$), this converts to the disjunctive normal form:

$$
(a \wedge b \wedge d \wedge e) \vee (a \wedge c \wedge d \wedge e)
$$

which corresponds to the query:

db.xxx_collection.find({$or: [ {$and: [{nto: "zhang"}, {timetag: {$gt: 1636513802167}}, {nid:50}, {status: 0}]}, {$and: [{nfrom: "zhang"}, {timetag: {$gt: 1636513802167}}, {nid:50}, {status: 0}]} ] }).sort({timetag:1}).limit(200)

This proves the two queries equivalent.
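As a quick empirical sanity check of the rewrite (a sketch; note that when several documents share a timetag, the two plans may legitimately order those ties differently, so we compare _id sets rather than sequences):

var orig = db.xxx_collection.find({timetag: { $gt: 1636513802167}, $or: [{ nto: "zhang"}, {nfrom: "zhang"} ], nid: 50, status: 0 }).sort({timetag:1}).limit(200).toArray();
var opt = db.xxx_collection.find({$or: [ {$and: [{nto: "zhang"}, {timetag: {$gt: 1636513802167}}, {nid:50}, {status: 0}]}, {$and: [{nfrom: "zhang"}, {timetag: {$gt: 1636513802167}}, {nid:50}, {status: 0}]} ] }).sort({timetag:1}).limit(200).toArray();
// equal joined _id lists after sorting both => same result set
orig.map(d => String(d._id)).sort().join() === opt.map(d => String(d._id)).sort().join()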

About the author: 张家侨 (Zhang Jiaqiao), NetEase Games
