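Go: Ten Common Go Mistakes, Part 2: Benchmark Pitfalls

Preface

This is the second article in the Ten Common Go Mistakes series: pitfalls of benchmark performance testing. The material comes from Teiva Harsanyi, a Go advocate and currently a senior engineer at Docker.

All source code used in this article is open source at: Ten Common Go Mistakes source code. You are welcome to follow the official account to get the latest updates to this series.

Scenario

go test supports benchmarks, but did you know there can be pitfalls here?

A common pitfall is the compiler's inlining optimization. Let's look at a concrete example: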

func add(a int, b int) int {
    return a + b
}

Now we want to benchmark the add function, and might write test code like this:

func BenchmarkWrong(b *testing.B) {
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        add(1000000000, 1000000001)
    }
}

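What is the pitfall here? To the compiler, add is a leaf function, that is, add itself calls no other functions, so the compiler inlines the call to add, and this makes the benchmark result inaccurate. Once the call is inlined, the arguments are constants and the result is unused, so the compiler can drop the addition altogether and the loop measures little more than its own overhead. We usually want to measure the execution efficiency of our own program rather than its efficiency after a particular compiler optimization, so that we get a correct picture of the program's performance. In addition, the optimizations the compiler applies under go test may differ from those it applies in the real production environment.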
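One rough way to see this effect without reading compiler output is to time the article's loop next to a loop with an empty body. The sketch below is only an illustration: benchmarkCall, benchmarkEmptyLoop and nsPerOp are hypothetical names that do not appear in the original source code, and the numbers will vary by machine and Go version.

```go
package main

import (
	"fmt"
	"testing"
)

// add matches the function from the article; it is small enough that the
// compiler will normally consider it for inlining.
func add(a int, b int) int {
	return a + b
}

// benchmarkCall repeats the article's loop: the result of add is discarded.
func benchmarkCall(b *testing.B) {
	for i := 0; i < b.N; i++ {
		add(1000000000, 1000000001)
	}
}

// benchmarkEmptyLoop (hypothetical) does no work in the loop body.
func benchmarkEmptyLoop(b *testing.B) {
	for i := 0; i < b.N; i++ {
	}
}

// nsPerOp (hypothetical helper) returns fractional nanoseconds per iteration.
// BenchmarkResult.NsPerOp returns an integer, which hides sub-nanosecond values.
func nsPerOp(r testing.BenchmarkResult) float64 {
	if r.N <= 0 {
		return 0
	}
	return float64(r.T.Nanoseconds()) / float64(r.N)
}

func main() {
	testing.Init() // register the testing flags, as go test normally does
	fmt.Printf("call to add: %.4f ns/op\n", nsPerOp(testing.Benchmark(benchmarkCall)))
	fmt.Printf("empty loop:  %.4f ns/op\n", nsPerOp(testing.Benchmark(benchmarkEmptyLoop)))
}
```

If the two lines report nearly the same time, the call to add has contributed almost nothing to the measurement.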

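How do we know whether the compiler inlined the call when running go test? It is simple: add the -gcflags="-m" argument to go test; -m tells the compiler to print the optimization decisions it makes.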

$ go test -gcflags="-m" -v -bench=BenchmarkWrong -count 1
# example.com/benchmark [example.com/benchmark.test]
./go_util.go:3:6: can inline add
./go_bench_test.go:19:6: inlining call to add
./go_bench_test.go:16:21: b does not escape
# example.com/benchmark.test
/var/folders/pv/_x849j6n22x37xxd9cstgwkr0000gn/T/go-build2365344599/b001/_testmain.go:33:6: can inline init.0
/var/folders/pv/_x849j6n22x37xxd9cstgwkr0000gn/T/go-build2365344599/b001/_testmain.go:41:24: inlining call to testing.MainStart
/var/folders/pv/_x849j6n22x37xxd9cstgwkr0000gn/T/go-build2365344599/b001/_testmain.go:41:42: testdeps.TestDeps{} escapes to heap
/var/folders/pv/_x849j6n22x37xxd9cstgwkr0000gn/T/go-build2365344599/b001/_testmain.go:41:24: &testing.M{...} escapes to heap
goos: darwin
goarch: amd64
pkg: example.com/benchmark
cpu: Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz
BenchmarkWrong
BenchmarkWrong-4        1000000000               0.4601 ns/op
PASS
ok      example.com/benchmark   0.605s

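In the output above, the line ./go_bench_test.go:19:6: inlining call to add shows that the compiler inlined the call to add inside BenchmarkWrong.

Note: all values accepted by -gcflags can be listed by running go tool compile --help.

Best practices

So how do we disable the compiler's inlining optimization when benchmarking? There are two options:

-gcflags="-l"

The first option is to add the -gcflags="-l" argument when running go test; -l disables the compiler's inlining optimization.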

$ go test -gcflags="-m -l" -v -bench=BenchmarkWrong -count 3
# example.com/benchmark [example.com/benchmark.test]
./go_bench_test.go:16:21: b does not escape
# example.com/benchmark.test
/var/folders/pv/_x849j6n22x37xxd9cstgwkr0000gn/T/go-build2785655381/b001/_testmain.go:41:42: testdeps.TestDeps{} escapes to heap
goos: darwin
goarch: amd64
pkg: example.com/benchmark
cpu: Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz
BenchmarkWrong
BenchmarkWrong-4        476215998                2.447 ns/op
BenchmarkWrong-4        492860170                2.404 ns/op
BenchmarkWrong-4        483547294                2.388 ns/op
PASS
ok      example.com/benchmark   4.568s

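The output above contains no inlining call lines, which confirms that with -gcflags="-l" the compiler no longer inlines.

Comparing the results before and after disabling inlining, the time per operation differs by roughly a factor of 5.

With inlining enabled: 0.4601 ns/op
With inlining disabled via -gcflags="-l": about 2.4 ns/op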
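The factor of roughly 5 can be checked from the figures printed above. The following is a small sketch of that arithmetic; mean is a hypothetical helper, and the numbers are copied from the two benchmark runs shown earlier.

```go
package main

import "fmt"

// mean returns the arithmetic mean of xs, or 0 for an empty slice.
func mean(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

func main() {
	inlined := 0.4601                          // ns/op with inlining enabled
	noInline := []float64{2.447, 2.404, 2.388} // ns/op from the three -l runs

	avg := mean(noInline)
	fmt.Printf("mean without inlining: %.3f ns/op\n", avg)
	fmt.Printf("ratio: %.1fx\n", avg/inlined)
}
```

The three runs average about 2.413 ns/op, which is roughly 5.2 times the inlined figure.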

go:noinline

The second option is the //go:noinline compiler directive. The compiler recognizes this directive at compile time and does not inline the function.

//go:noinline
func add(a int, b int) int {
    return a + b
}

With the code changed this way, we no longer need the -gcflags="-l" argument. Let's look at the benchmark results:

$ go test -gcflags="-m" -v -bench=BenchmarkWrong -count 3
# example.com/benchmark [example.com/benchmark.test]
./go_bench_test.go:16:21: b does not escape
# example.com/benchmark.test
/var/folders/pv/_x849j6n22x37xxd9cstgwkr0000gn/T/go-build1050705055/b001/_testmain.go:33:6: can inline init.0
/var/folders/pv/_x849j6n22x37xxd9cstgwkr0000gn/T/go-build1050705055/b001/_testmain.go:41:24: inlining call to testing.MainStart
/var/folders/pv/_x849j6n22x37xxd9cstgwkr0000gn/T/go-build1050705055/b001/_testmain.go:41:42: testdeps.TestDeps{} escapes to heap
/var/folders/pv/_x849j6n22x37xxd9cstgwkr0000gn/T/go-build1050705055/b001/_testmain.go:41:24: &testing.M{...} escapes to heap
goos: darwin
goarch: amd64
pkg: example.com/benchmark
cpu: Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz
BenchmarkWrong
BenchmarkWrong-4        482026485                2.422 ns/op
BenchmarkWrong-4        495307399                2.413 ns/op
BenchmarkWrong-4        407674614                2.613 ns/op
PASS
ok      example.com/benchmark   4.439s

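The output above again shows that the compiler did not inline the call to add, and the final timings are essentially the same as with the first option.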
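To compare the two behaviours side by side without changing compiler flags, both variants can live in one program. This is a minimal sketch under stated assumptions: addInline, addNoInline and nsPerOp are hypothetical names introduced for illustration, and testing.Benchmark is used so the program runs with go run rather than go test.

```go
package main

import (
	"fmt"
	"testing"
)

// addInline has no directive, so the compiler is free to inline it.
func addInline(a int, b int) int {
	return a + b
}

// addNoInline is the same function with inlining disabled by the directive.
//
//go:noinline
func addNoInline(a int, b int) int {
	return a + b
}

// nsPerOp (hypothetical helper) returns fractional nanoseconds per iteration.
func nsPerOp(r testing.BenchmarkResult) float64 {
	if r.N <= 0 {
		return 0
	}
	return float64(r.T.Nanoseconds()) / float64(r.N)
}

func main() {
	testing.Init() // register the testing flags, as go test normally does

	inlined := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			addInline(1000000000, 1000000001)
		}
	})
	notInlined := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			addNoInline(1000000000, 1000000001)
		}
	})

	fmt.Printf("addInline:   %.4f ns/op\n", nsPerOp(inlined))
	fmt.Printf("addNoInline: %.4f ns/op\n", nsPerOp(notInlined))
}
```

Building it with go build -gcflags="-m" should list which of the two calls the compiler chose to inline; the exact timings depend on the machine and Go version.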

Test source code: benchmark performance test source code. You can download it and run the tests locally.

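Note: some articles online claim that assigning the result of the function call to a local variable, and then storing that local variable in a global variable, prevents the compiler's inlining optimization. This claim is actually wrong, and the original author Teiva Harsanyi made the same mistake. To determine whether the compiler inlined a call, verify it in the way this article describes.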
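For reference, the pattern this note refers to looks roughly like the sketch below. The names sink and benchmarkSink are hypothetical and are not taken from the original source code. Storing the result in a package-level variable keeps the result in use, but it gives the compiler no instruction about inlining, so the -gcflags="-m" check is still needed to know what happened to the call.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a hypothetical package-level variable that receives the result.
var sink int

// add has no //go:noinline directive, so the compiler may still inline it.
func add(a int, b int) int {
	return a + b
}

// benchmarkSink stores each result in a local variable and copies the last
// one to the package-level sink after the loop.
func benchmarkSink(b *testing.B) {
	var r int
	for i := 0; i < b.N; i++ {
		r = add(1000000000, 1000000001)
	}
	sink = r
}

func main() {
	testing.Init() // register the testing flags, as go test normally does
	res := testing.Benchmark(benchmarkSink)
	fmt.Printf("%d iterations, sink = %d\n", res.N, sink)
}
```

Building this with go build -gcflags="-m" prints the compiler's inlining decisions for add, which is the check the article recommends.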

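Open source

The article and sample code are open source on GitHub: Go language beginner, intermediate, and advanced tutorials.

Official account: coding进阶. Follow it to get the latest Go interview questions and tech-stack content.

Personal website: Jincheng's Blog.

Zhihu: 无忌.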

References

https://itnext.io/the-top-10-…
https://codeantenna.com/a/xxY…
gcflags parameter documentation: https://pkg.go.dev/cmd/compile
https://dave.cheney.net/2018/…
