一:背景
1. 讲故事
七月中旬有一位敌人加wx求助,他的程序在生产上跑着跑着内存就飙起来了,貌似没有回头的趋势,询问如何解决,截图如下:
和这位敌人聊下来,感觉像是本人在小县城当了个小老板,法则的生存,有当地资源,各种小关系,有一股财务自在的滋味,这也是我始终向往的生存形式 。
既然敌人找到我了,我得想方法给他解决问题,既然是内存暴涨,我就赌一把在托管层面吧,嘿嘿,上windbg谈话。
二: windbg 剖析
1. 托管还是非托管
始终在追这个系列的敌人应该晓得,我无数次的用 !address -summary
和 !eeheap -gc
这两个命令来判断以后内存属于托管层还是非托管层。
0:000> !address -summary--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotalMEM_FREE 393 7dfe`f2105000 ( 125.996 TB) 98.43%MEM_RESERVE 1691 200`0f1e4000 ( 2.000 TB) 99.81% 1.56%MEM_COMMIT 6191 0`fed07000 ( 3.981 GB) 0.19% 0.00%0:000> !eeheap -gcNumber of GC Heaps: 1generation 0 starts at 0x000001D2E572BBC8generation 1 starts at 0x000001D2E54F70E0generation 2 starts at 0x000001D252051000ephemeral segment allocation context: none segment begin allocated size000001D252050000 000001D252051000 000001D26204FFE0 0xfffefe0(268431328)Large object heap starts at 0x000001D262051000 segment begin allocated size000001D262050000 000001D262051000 000001D2655F3F80 0x35a2f80(56242048)Total Size: Size: 0xbf4dbf80 (3209543552) bytes.------------------------------GC Heap Size: Size: 0xbf4dbf80 (3209543552) bytes.
卦象上过程指标为 3.98G
,GC堆指标为 3209543552 = 3G
,很显然,本次事变属于 托管层面
。
2. 寻找托管层上的大对象
咱们都晓得C#是托管语言,所以甭管有用没用的对象都逃不出GC堆,话中有话就是看GC堆准没错,挑几个大对象看看。
0:000> !dumpheap -statStatistics: MT Count TotalSize Class Name00007ff98a68f090 391475 43869284 System.Int32[]00007ff98b6adfa0 1902760 45666240 System.Collections.ObjectModel.ReadOnlyCollection`1[[System.Linq.Expressions.Expression, System.Linq.Expressions]]00007ff98b6ac3c0 1951470 46835280 System.Linq.Expressions.ConstantExpression00007ff98bc452e0 1681178 53797696 System.Linq.Expressions.TypedConstantExpression00007ff98eacb6b8 1902708 60886656 System.Dynamic.Utils.ListArgumentProvider00007ff98f236518 1774982 70999280 Microsoft.EntityFrameworkCore.Query.Expressions.ColumnExpression00007ff98c650c58 1681142 80694816 System.Linq.Expressions.MethodCallExpression300007ff98a82bc38 3414094 81938256 System.RuntimeMethodHandle00007ff98fd96fc0 17750 83936016 System.Collections.Generic.Dictionary`2+Entry[[System.Reflection.MemberInfo, System.Private.CoreLib],[System.Linq.Expressions.Expression, System.Linq.Expressions]][]00007ff98e5ed5d8 35493 101740504 System.Collections.Generic.Dictionary`2+Entry[[Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceCacheKey, Microsoft.Extensions.DependencyInjection],[System.Object, System.Private.CoreLib]][]00007ff98bcff6a8 3639389 116460448 System.Linq.Expressions.PropertyExpression00007ff98b85cf00 5028347 160907104 System.Reflection.Emit.GenericFieldInfo00007ff98a671e18 2178117 168395994 System.String00007ff98a5b6610 160565 171498416 System.Object[]00007ff98eaa8ab0 4981589 199263560 System.Linq.Expressions.MemberAssignment00007ff98a672360 398740 391928469 System.Byte[]00007ff98a746d68 181886 486150592 System.Char[]
从托管堆上看,System.Linq.Expressions.MemberAssignment
对象高达 498w ,很显著有问题,从类名看可能和 ExpressionTree 无关,那就抽几个对象看看它的援用链上是否有过大的对象。
0:000> !gcroot 000001d25399f690HandleTable: 000001D251B715A8 (pinned handle) -> 000001D262068CF0 System.Object[] -> 000001D2531C3B78 Microsoft.EntityFrameworkCore.Internal.ServiceProviderCache -> 000001D25399E3D0 Remotion.Linq.QueryModel -> 000001D25399E3B8 Remotion.Linq.Clauses.SelectClause -> 000001D25442C068 System.Linq.Expressions.MemberInitExpression -> 000001D25442C050 System.Runtime.CompilerServices.TrueReadOnlyCollection`1[[System.Linq.Expressions.MemberBinding, System.Linq.Expressions]] -> 000001D2539A0290 System.Linq.Expressions.MemberBinding[] -> 000001D25399F690 System.Linq.Expressions.MemberAssignment
援用链特地长,这里我就截取一下,通过一顿排查,我发现大对象竟然是 Remotion.Linq.Clauses.SelectClause
,objsize 这个对象间接爆掉了,真的很奇葩,如下代码所示:
0:000> !objsize 000001D25399E3B8sizeof(000001D25399E3B8) = -1187378032 (0xb93a0c90) bytes (Remotion.Linq.Clauses.SelectClause)
有点懵,这个对象竟然是罪魁祸首,从援用链看它是 EF 下的一个构建表达式树的小部件,能够必定的是,敌人在用EF的时候出了什么问题,不过还得硬着头皮持续挖 SelectClause,通过深挖,我发现这个类有大量这样的 char[]
数组,导出来后大略是上面这样。
Logistics.Text30), | Text31 = string TryReadValue(t1.Outer.Outer, 42, WmsOutboundConfirmLogistics.Text31), | Text32 = string TryReadValue(t1.Outer.Outer, 43, WmsOutboundConfirmLogistics.Text32), | Text33 = string TryReadValue(t1.Outer.Outer, 44, WmsOutboundConfirmLogistics.Text33), | Text34 = string TryReadValue(t1.Outer.Outer, 45, WmsOutboundConfirmLogistics.Text34), | Text35 = string TryReadValue(t1.Outer.Outer, 46, WmsOutboundConfirmLogistics.Text35), | IsQueue = Nullable<bool> TryReadValue(t1.Outer.Outer, 47, WmsOutboundConfirmLogistics.IsQueue), | IsStop = Nullable<bool> TryReadValue(t1.Outer.Outer, 48, WmsOutboundConfirmLogistics.IsStop), | CheckCode = string TryReadValue(t1.Outer.Outer, 49, WmsOutboundConfirmLogistics.CheckCode), | ClientCode = string TryReadValue(t1.Outer.Inner, 50, WmsOutboundOrder.ClientCode), | WarehouseCode = string TryReadValue(t1.Outer.Inner, 51, WmsOutboundOrder.WarehouseCode), | ErpNumber = string TryReadValue(t1.Outer.Inner, 52, WmsOutboundOrder.ErpNumber), | OrderCategory = string TryReadValue(t1.Outer.Inner, 53, WmsOutboundOrder.OrderCategory), | OrderStatus = string TryReadValue(t1.Outer.Inner, 54, WmsOutboundOrder.OrderStatus), | OrderType = string TryReadValue(t1.Outer.Inner, 55, WmsOutboundOrder.OrderType), | SendCompany = string TryReadValue(t1.Outer.Inner, 56, WmsOutboundOrder.SendCompany), | SendName = string TryReadValue(t1.Outer.Inner, 57, WmsOutboundOrder.SendName), | SendTel = string TryReadValue(t1.Outer.Inner, 58, WmsOutboundOrder.SendTel), | SendMobile = string TryReadValue(t1.Outer.Inner, 59, WmsOutboundOrder.SendMobile), | SendProvince = string TryReadValue(t1.Outer.Inner, 60, WmsOutboundOrder.SendProvince), | SendCity = string TryReadValue(t1.Outer.Inner, 61, WmsOutboundOrder.SendCity), | SendArea = string TryReadValue(t1.Outer.Inner, 62, WmsOutboundOrder.SendArea), | ...| CategoryName = string TryReadValue(t1.Outer.Inner, 88, WmsOutboundOrder.CategoryName), | SourcePlatformCode = string TryReadValue(t1.Outer.Inner, 89, WmsOutboundOrder.SourcePlatformCode), | PayMode = (string)string TryReadValue(t1.Outer.Outer, 90, null), | List = List<WmsOutboundConfirmLogisticsLinesDTO> WmsOutboundConfirmLogisticsBusiness.GetOrderLines(string TryReadValue(t1.Outer.Outer, 5, WmsOutboundConfirmLogistics.OrderNumber)), | ConfirmTime = DateTime TryReadValue(t1.Inner, 91, WmsOutboundOrderConfirmation.CreateTime), | ReturnUrl = (string)string TryReadValue(t1.Outer.Outer, 92, null) | }|__ ), |__ contextType: Core.DataRepository.BaseDbContext, |__ logger: DiagnosticsLogger<Query>, |__ queryContext: Unhandled parameter: queryContext)
从内容看,应该是 select 语句的ExpressionTree示意,问了下敌人,说大略是报表业务,不过这些信息给他貌似也没有多大帮忙,说实话到这里我其实也不晓得怎么持续往下排查了,陷入了失望。
3. 从失望中寻找心愿
我在想,既然EF构建了大量这样的 ExpressionTree,必定有问题,但也想不出是什么问题,隔了半天,我忽然灵光一现,EF既然构建了树,有可能sql也进去了,对,我何不间接在 heap 上搜寻 select 的sql语句。。。。
0:000> !strings /m:*select*Address Gen Length Value000001d2e4de64e0 2 1964 SELECT a."Id", a."CreateTime" AS "CreateTime0", a."CreatorId", a."CreatorRealName", a."Deleted", a."OrderNumber", a."CarrierId",...000001d2e4e11e78 2 1964 SELECT a."Id", a."CreateTime" AS "CreateTime0", a."CreatorId", a."CreatorRealName", a."Deleted", a."OrderNumber", a."CarrierId",...000001d2e4e3d1f0 2 1964 SELECT a."Id", a."CreateTime" AS "CreateTime0", a."CreatorId", a."CreatorRealName", a."Deleted", a."OrderNumber", a."CarrierId",...000001d2e4e673c8 2 1964 SELECT a."Id", a."CreateTime" AS "CreateTime0", a."CreatorId", a."CreatorRealName", a."Deleted", a."OrderNumber", a."CarrierId",...000001d2e4e91760 2 1964 SELECT a."Id", a."CreateTime" AS "CreateTime0", a."CreatorId", a."CreatorRealName", a."Deleted", a."OrderNumber", a."CarrierId",...000001d2e4ebb2e8 2 1964 SELECT a."Id", a."CreateTime" AS "CreateTime0", a."CreatorId", a."CreatorRealName", a."Deleted", a."OrderNumber", a."CarrierId",...000001d2e4ee54f8 2 1964 SELECT a."Id", a."CreateTime" AS "CreateTime0", a."CreatorId", a."CreatorRealName", a."Deleted", a."OrderNumber", a."CarrierId",...000001d2e4f10758 2 1964 SELECT a."Id", a."CreateTime" AS "CreateTime0", a."CreatorId", a."CreatorRealName", a."Deleted", a."OrderNumber", a."CarrierId",...000001d2e4f398d0 2 1964 SELECT a."Id", a."CreateTime" AS "CreateTime0", a."CreatorId", a."CreatorRealName", a."Deleted", a."OrderNumber", a."CarrierId",...---------------------------------------18128 matching strings
果然发现了大量反复的 select 语句,而且从最右边的内存地址看都是十分靠近的,也就阐明他们是在某一个操作中同时生成的,而后咱们导出几个sql语句。
SELECT a."Id", ....FROM "WmsOutboundConfirmLogistics" AS aINNER JOIN "WmsOutboundOrder" AS b ON a."OrderNumber" = b."OrderNumber"INNER JOIN "WmsOutboundOrderConfirmation" AS c ON a."OrderNumber" = c."OrderNumber"WHERE (a."OrderNumber" = @__pagination_OrderNumber_0) AND (b."FreezeStatus" = FALSE)ORDER BY a."Id"";SELECT a."Id", ....FROM "WmsOutboundConfirmLogistics" AS aINNER JOIN "WmsOutboundOrder" AS b ON a."OrderNumber" = b."OrderNumber"INNER JOIN "WmsOutboundOrderConfirmation" AS c ON a."OrderNumber" = c."OrderNumber"WHERE (a."OrderNumber" = @__pagination_OrderNumber_0) AND (b."FreezeStatus" = FALSE)ORDER BY a."Id""
拿到这 1.8w
反复的sql 给敌人看,敌人说这是查问报表的sql。
4. 所有线索整合买通
那这里就存在着很大问题,既然是查问报表,为什么会有 1.8w 雷同的sql,惟一不同的就是 a."OrderNumber" = @__pagination_OrderNumber_0
中的订单号,难道不应该是 a.OrderNumber in (xxxx)
或者是表关联查问吗??? 整顿一下就是上面这样的猜测:
-- 现实select * from a where a.id in (1,2,3)-- 事实select * from a where a.id=1;select * from a where a.id=2;select * from a where a.id=3;
加上每个sql内存地址相近,再联合爆表的 Remotion.Linq.Clauses.SelectClause
对象,整个流程大略就是:本该表关联或者in操作,后果变成了无数个单条sql语句查问,导致EF底层呈现内存爆炸式增长。
三:总结
看了下敌人查问ef的写法,猜想大多都是人肉构建 ExpressionTree 去查询数据库,大写的,比方上面的这张图:
解决方案就是让敌人查看下示意式树的写法问题,或者间接灌写好的sql得了,说实话这个dump还是费了九牛二虎之力,本认为很简略,实操起来还是碰到了一点小艰难,就当历练成长吧!
更多高质量干货:参见我的 GitHub: dotnetfly