About Java: A JMH Guide



Link to the original post

This post is an introduction to JMH, a genuinely practical benchmarking tool. I actually studied it quite a while ago and, after much procrastination, have forgotten most of it by now, so consider this a refresher. Learning JMH also naturally brings you into contact with a fair amount of JVM-related knowledge, which is well worth picking up.

Introduction

JMH stands for Java Microbenchmark Harness. It is, plainly, a Java benchmarking tool, and the "micro" indicates the level at which it operates. Chasing code performance is something programmers do constantly, but how fast code really is cannot be settled by talk; it needs quantified numbers, and many open-source projects publish JMH comparison results to demonstrate how well they perform. On modern hardware a small block of code may execute in a few nanoseconds, so to get a "visible" result you have to run it in a loop. Before meeting JMH, I suspect most of us have at some point wrapped a method in a for loop, run it n times, and recorded the start and end timestamps to estimate how long the method takes. For a purely ahead-of-time compiled language that may be acceptable, but for Java and other JVM languages it does not yield accurate results: the JVM does a great deal of work we cannot see, so the same test run several times can produce wildly different numbers. JMH was built to solve exactly this problem. It is written by the JVM developers themselves, and writing such a harness is no easy task, since it requires intimate familiarity with how the JVM runs code.
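To make the contrast concrete, here is a minimal sketch of that naive hand-rolled approach (the class and the loop count are made up for illustration). Run it and the second measurement is usually much lower than the first, because the first run pays interpreter and JIT warm-up costs; that instability is precisely what JMH is designed to control for:

```java
public class NaiveTiming {
    static double work(double x) {
        return Math.log(x) + Math.sqrt(x);
    }

    static long timeOnce() {
        double sink = 0;                       // keep results alive so the loop is not trivially dead code
        long start = System.nanoTime();
        for (int i = 0; i < 1_000_000; i++) {
            sink += work(i + 1);
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42) System.out.println();  // defeat dead-code elimination; never true in practice
        return elapsed;
    }

    public static void main(String[] args) {
        System.out.println("run 1: " + timeOnce() + " ns");
        System.out.println("run 2: " + timeOnce() + " ns"); // typically much faster once the JIT kicks in
    }
}
```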

Usage

JDK 9 and later versions ship with JMH; on earlier versions you need to add the dependencies yourself. The JMH homepage is minimal, essentially just a link to the GitHub project, and the project README in turn only gives brief usage notes and points you at the samples for everything else. I actually think that works rather well, so the bulk of this post simply walks through the samples one by one.

The official project documentation recommends running benchmarks from the command line: first use Maven to generate the project skeleton, then write the benchmark code and package it, and finally run the resulting jar from the command line. It also recommends building the JMH benchmarks as a standalone project that depends on your application project; this helps ensure the benchmarks are initialized correctly and produce reliable results. You can of course also run benchmarks directly inside an IDE; popular IDEs such as IDEA offer plugins for this, which is a fine way to work while learning.
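For reference, the command-line workflow described above looks like this, following the official README (the groupId/artifactId here are placeholders to substitute with your own):

```shell
# Generate the benchmark project skeleton from the official JMH archetype
mvn archetype:generate \
  -DinteractiveMode=false \
  -DarchetypeGroupId=org.openjdk.jmh \
  -DarchetypeArtifactId=jmh-java-benchmark-archetype \
  -DgroupId=org.sample -DartifactId=test -Dversion=1.0

# Build the self-contained benchmark jar, then run it
cd test
mvn clean verify
java -jar target/benchmarks.jar
```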

Sample Walkthrough

01 HelloWorld

public class JMHSample_01_HelloWorld {

    @Benchmark
    public void wellHelloThere() {
        // this method was intentionally left blank.
    }
    
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_01_HelloWorld.class.getSimpleName())
                .forks(1)
                .build();

        new Runner(opt).run();
    }
    
}

JMH works as follows: you annotate methods with @Benchmark, and JMH executes generated wrapper code that runs the benchmark method as reliably as possible. Read the @Benchmark javadoc for the full semantics and restrictions. The method name does not matter; any method annotated with @Benchmark is treated as a benchmark, and a single class may contain several. Note that if a benchmark method never finishes, the JMH run never finishes either. If the method body throws an exception, JMH ends that benchmark immediately and moves on to the next one in the list. Although this benchmark does nothing, it nicely demonstrates the measurement overhead of the harness itself: no infrastructure comes without overhead, and it is important to know how large the baseline overhead you are dealing with is. In later examples you will see this idea developed by comparing results against such a "baseline" measurement.

02 BenchmarkModes

public class JMHSample_02_BenchmarkModes {

    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    @OutputTimeUnit(TimeUnit.SECONDS)
    public void measureThroughput() throws InterruptedException {
        TimeUnit.MILLISECONDS.sleep(100);
    }
    
    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public void measureAvgTime() throws InterruptedException {
        TimeUnit.MILLISECONDS.sleep(100);
    }
    
    @Benchmark
    @BenchmarkMode(Mode.SampleTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public void measureSamples() throws InterruptedException {
        TimeUnit.MILLISECONDS.sleep(100);
    }
    
    @Benchmark
    @BenchmarkMode(Mode.SingleShotTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public void measureSingleShot() throws InterruptedException {
        TimeUnit.MILLISECONDS.sleep(100);
    }

    @Benchmark
    @BenchmarkMode({Mode.Throughput, Mode.AverageTime, Mode.SampleTime, Mode.SingleShotTime})
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public void measureMultiple() throws InterruptedException {
        TimeUnit.MILLISECONDS.sleep(100);
    }
    
    @Benchmark
    @BenchmarkMode(Mode.All)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public void measureAll() throws InterruptedException {
        TimeUnit.MILLISECONDS.sleep(100);
    }

}

This example introduces the @BenchmarkMode annotation and its companion @OutputTimeUnit. @BenchmarkMode takes enum values that select the kind of measurement; note that it accepts an array, which means a single method can be measured in several modes at once.

  • Throughput: operations per unit of time. Measures raw throughput by continuously calling the benchmark method within a time-bounded iteration and counting how many calls complete.
  • AverageTime: average time per operation. Essentially the reciprocal view of Throughput; sometimes measuring time directly is simply more convenient.
  • SampleTime: samples the cost of individual calls. The method still runs in time-bounded iterations, but instead of the total time, the cost of a subset of the calls is recorded, mainly to infer the time distribution and percentiles. JMH tries to auto-tune the sampling frequency; if the method is slow enough, every call ends up being sampled.
  • SingleShotTime: measures a single invocation. Iteration time is meaningless in this mode, which makes it useful for measuring cold-start behavior.
  • All: all of the above together.
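Throughput and AverageTime are essentially reciprocal views of the same quantity, which is a handy sanity check when reading JMH output. For the sleep(100) benchmarks above the idealized numbers relate like this (real output also includes harness overhead):

```java
public class ModeMath {
    public static void main(String[] args) {
        double avgTimeUs = 100_000.0;                       // AverageTime: ~100 ms = 100,000 us per op
        double throughputPerSec = 1_000_000.0 / avgTimeUs;  // reciprocal view, in ops/s
        System.out.println(throughputPerSec);               // ~10 ops/s, matching Mode.Throughput
    }
}
```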

The sample's javadoc also notes that if some behavior puzzles you, try inspecting the generated code; you may discover the code is not doing what you expect at all.

03 States

public class JMHSample_03_States {

    @State(Scope.Benchmark)
    public static class BenchmarkState {
        volatile double x = Math.PI;
    }

    @State(Scope.Thread)
    public static class ThreadState {
        volatile double x = Math.PI;
    }
    
    @Benchmark
    public void measureUnshared(ThreadState state) {
        // All benchmark threads will call in this method.
        //
        // However, since ThreadState is the Scope.Thread, each thread
        // will have its own copy of the state, and this benchmark
        // will measure unshared case.
        state.x++;
    }
    
    @Benchmark
    public void measureShared(BenchmarkState state) {
        // All benchmark threads will call in this method.
        //
        // Since BenchmarkState is the Scope.Benchmark, all threads
        // will share the state instance, and we will end up measuring
        // shared case.
        state.x++;
    }
    
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_03_States.class.getSimpleName())
                .threads(4)
                .forks(1)
                .build();

        new Runner(opt).run();
    }

}

Benchmarks often need to maintain some state while running, and JMH is frequently used to build concurrent benchmarks, so it provides a marker annotation for state objects: @State. Objects annotated this way are constructed on demand and reused across the whole run within the given scope. Note that a State object is always instantiated by one of the threads that needs access to it, so you can initialize its fields just as you would inside a worker thread. Benchmark methods can reference State objects directly as method parameters, and JMH performs the injection automatically.

04 Default State

@State(Scope.Thread)
public class JMHSample_04_DefaultState {

    double x = Math.PI;

    @Benchmark
    public void measure() {x++;}
    
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_04_DefaultState.class.getSimpleName())
                .forks(1)
                .build();

        new Runner(opt).run();
    }

}

In many cases you only need a single state object; then you can mark the benchmark class itself with @State, which makes it convenient to reference its own fields.

05 State Fixtures

@State(Scope.Thread)
public class JMHSample_05_StateFixtures {

    double x;
    
    @Setup
    public void prepare() {x = Math.PI;}
    
    @TearDown
    public void check() {assert x > Math.PI : "Nothing changed?";}

    @Benchmark
    public void measureRight() {x++;}
    
    @Benchmark
    public void measureWrong() {
        double x = 0;
        x++;
    }
    
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_05_StateFixtures.class.getSimpleName())
                .forks(1)
                .jvmArgs("-ea")
                .build();

        new Runner(opt).run();
    }

}

Because State objects live for the whole benchmark lifecycle, state-management (fixture) methods come in handy, and JMH provides a set of them; anyone who has used JUnit or TestNG will find them familiar. Fixture methods are only valid on State objects, otherwise JMH fails to compile the benchmark. They are also invoked by one of the threads that uses the State object, which means the environment inside a fixture method is thread-local.

06 Fixture Level

@State(Scope.Thread)
public class JMHSample_06_FixtureLevel {

    double x;
    
    @TearDown(Level.Iteration)
    public void check() {assert x > Math.PI : "Nothing changed?";}
    
    @Benchmark
    public void measureRight() {x++;}

    @Benchmark
    public void measureWrong() {
        double x = 0;
        x++;
    }
    
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_06_FixtureLevel.class.getSimpleName())
                .forks(1)
                .jvmArgs("-ea")
                .shouldFailOnError(false) // switch to "true" to fail the complete run
                .build();

        new Runner(opt).run();
    }

}

Fixture methods can run at different levels; three are provided:

  1. Level.Trial: runs before/after the entire benchmark run
  2. Level.Iteration: runs before/after each iteration
  3. Level.Invocation: runs before/after every single method call. If you want this level, read the relevant javadoc carefully first to understand its restrictions.

Time spent in fixture methods is not counted toward the results, so it is fine to do relatively heavy work in them.

07 Fixture Level Invocation

@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class JMHSample_07_FixtureLevelInvocation {

    /*
     * Fixtures have different Levels to control when they are about to run.
     * Level.Invocation is useful sometimes to do some per-invocation work,
     * which should not count as payload. PLEASE NOTE the timestamping and
     * synchronization for Level.Invocation helpers might significantly offset
     * the measurement, use with care. See Level.Invocation javadoc for further
     * discussion.
     *
     * Consider this sample:
     */

    /*
     * This state handles the executor.
     * Note we create and shutdown executor with Level.Trial, so
     * it is kept around the same across all iterations.
     */

    @State(Scope.Benchmark)
    public static class NormalState {
        ExecutorService service;

        @Setup(Level.Trial)
        public void up() {
            service = Executors.newCachedThreadPool();
        }

        @TearDown(Level.Trial)
        public void down() {
            service.shutdown();
        }

    }

    /*
     * This is the *extension* of the basic state, which also
     * has the Level.Invocation fixture method, sleeping for some time.
     */

    public static class LaggingState extends NormalState {
        public static final int SLEEP_TIME = Integer.getInteger("sleepTime", 10);

        @Setup(Level.Invocation)
        public void lag() throws InterruptedException {
            TimeUnit.MILLISECONDS.sleep(SLEEP_TIME);
        }
    }

    /*
     * This allows us to formulate the task: measure the task turnaround in
     * "hot" mode when we are not sleeping between the submits, and "cold" mode,
     * when we are sleeping.
     */

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    public double measureHot(NormalState e, final Scratch s) throws ExecutionException, InterruptedException {
        return e.service.submit(new Task(s)).get();
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    public double measureCold(LaggingState e, final Scratch s) throws ExecutionException, InterruptedException {
        return e.service.submit(new Task(s)).get();
    }

    /*
     * This is our scratch state which will handle the work.
     */

    @State(Scope.Thread)
    public static class Scratch {
        private double p;
        public double doWork() {
            p = Math.log(p);
            return p;
        }
    }

    public static class Task implements Callable<Double> {
        private Scratch s;

        public Task(Scratch s) {this.s = s;}

        @Override
        public Double call() {
            return s.doWork();
        }
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_07_FixtureLevelInvocation.class.getSimpleName())
                .forks(1)
                .build();

        new Runner(opt).run();
    }


}

This is a usage example of Level.Invocation. Three State classes are defined, the first two related by inheritance, and note that their fixture methods use different levels. The two benchmark methods have identical bodies, but because they use different State objects, measureCold sleeps 10 ms before every call, simulating and comparing "hot" and "cold" submission patterns against the thread pool.

Level.Invocation is convenient when every invocation needs some setup or teardown work, but before using it you must read its javadoc carefully. The javadoc states it is mainly suitable for methods that take longer than 1 ms, and gives four warnings:

  1. Since time spent in Setup/TearDown methods must not count toward the results, this level forces JMH to time each invocation individually; if the method call is very short, the system-timestamp calls made for timing will distort the result or even become the bottleneck.
  2. For the same reason, timing each call separately and then summing can lose precision, producing a total that is shorter than reality.
  3. To preserve the same sharing behavior as the other levels, JMH sometimes has to synchronize access to the state objects, which can shift the measurement away from the true value.
  4. In the current implementation, fixture methods overlap with benchmark execution; in multi-threaded benchmarks this can matter, e.g. a thread executing the benchmark method may observe that another thread has already run the TearDown, causing an exception.

08 Dead Code

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class JMHSample_08_DeadCode {

    private double x = Math.PI;

    @Benchmark
    public void baseline() {
        // do nothing, this is a baseline
    }

    @Benchmark
    public void measureWrong() {
        // This is wrong: result is not used and the entire computation is optimized away.
        Math.log(x);
    }

    @Benchmark
    public double measureRight() {
        // This is correct: the result is being used.
        return Math.log(x);
    }
    
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_08_DeadCode.class.getSimpleName())
                .forks(1)
                .build();

        new Runner(opt).run();
    }

}

This example illustrates the dead-code trap: many benchmarks fail because they do not account for dead-code elimination (DCE). Compilers are smart enough to deduce that certain computations are redundant and eliminate them entirely; if the eliminated part is the code we meant to measure, the benchmark becomes meaningless. Fortunately, JMH provides the necessary infrastructure to cope: give the benchmark method a return value and return the computed result, and JMH will add the appropriate handling to defeat DCE.

09 Blackholes

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class JMHSample_09_Blackholes {    
    
    double x1 = Math.PI;
    double x2 = Math.PI * 2;
    
    @Benchmark
    public double baseline() {
        return Math.log(x1);
    }

    /*
     * While the Math.log(x2) computation is intact, Math.log(x1)
     * is redundant and optimized out.
     */

    @Benchmark
    public double measureWrong() {
        Math.log(x1);
        return Math.log(x2);
    }

    /*
     * This demonstrates Option A:
     *
     * Merge multiple results into one and return it.
     * This is OK when the computation is relatively heavyweight, and merging
     * the results does not offset the results much.
     */

    @Benchmark
    public double measureRight_1() {
        return Math.log(x1) + Math.log(x2);
    }

    /*
     * This demonstrates Option B:
     *
     * Use explicit Blackhole objects, and sink the values there.
     * (Background: Blackhole is just another @State object, bundled with JMH).
     */

    @Benchmark
    public void measureRight_2(Blackhole bh) {
        bh.consume(Math.log(x1));
        bh.consume(Math.log(x2));
    }
    
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_09_Blackholes.class.getSimpleName())
                .forks(1)
                .build();

        new Runner(opt).run();
    }

}

This example introduces the ultimate answer to DCE: the Blackhole. If a benchmark method produces a single result, just return it, and JMH implicitly feeds the return value into a Blackhole. If the method produces multiple results, inject a Blackhole object as a parameter and sink each value into it explicitly.

10 Constant Fold

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class JMHSample_10_ConstantFold {

    // IDEs will say "Oh, you can convert this field to local variable". Don't. Trust. Them.
    // (While this is normally fine advice, it does not work in the context of measuring correctly.)
    private double x = Math.PI;

    // IDEs will probably also say "Look, it could be final". Don't. Trust. Them. Either.
    // (While this is normally fine advice, it does not work in the context of measuring correctly.)
    private final double wrongX = Math.PI;
    
    @Benchmark
    public double baseline() {
        // simply return the value, this is a baseline
        return Math.PI;
    }

    @Benchmark
    public double measureWrong_1() {
        // This is wrong: the source is predictable, and computation is foldable.
        return Math.log(Math.PI);
    }

    @Benchmark
    public double measureWrong_2() {
        // This is wrong: the source is predictable, and computation is foldable.
        return Math.log(wrongX);
    }

    @Benchmark
    public double measureRight() {
        // This is correct: the source is not predictable.
        return Math.log(x);
    }
    
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_10_ConstantFold.class.getSimpleName())
                .forks(1)
                .build();

        new Runner(opt).run();
    }

}

This example concerns another JVM optimization: constant folding. If the JVM can prove that a computation always yields the same result, i.e. a constant, it can cleverly optimize it away; in this example that means the computation can be hoisted out of JMH's internal measurement loop. The usual defense is to read the input from a non-final field of a State object. Note that IDEs will sometimes suggest making such a field final or local; for ordinary code that is sound advice, but in a benchmark it deserves careful thought.

11 Loops

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class JMHSample_11_Loops {

    /*
     * Suppose we want to measure how much it takes to sum two integers:
     */

    int x = 1;
    int y = 2;

    /*
     * This is what you do with JMH.
     */

    @Benchmark
    public int measureRight() {
        return (x + y);
    }

    /*
     * The following tests emulate the naive looping.
     * This is the Caliper-style benchmark.
     */
    private int reps(int reps) {
        int s = 0;
        for (int i = 0; i < reps; i++) {
            s += (x + y);
        }
        return s;
    }

    /*
     * We would like to measure this with different repetitions count.
     * Special annotation is used to get the individual operation cost.
     */

    @Benchmark
    @OperationsPerInvocation(1)
    public int measureWrong_1() {
        return reps(1);
    }

    @Benchmark
    @OperationsPerInvocation(10)
    public int measureWrong_10() {
        return reps(10);
    }

    @Benchmark
    @OperationsPerInvocation(100)
    public int measureWrong_100() {
        return reps(100);
    }

    @Benchmark
    @OperationsPerInvocation(1_000)
    public int measureWrong_1000() {
        return reps(1_000);
    }

    @Benchmark
    @OperationsPerInvocation(10_000)
    public int measureWrong_10000() {
        return reps(10_000);
    }

    @Benchmark
    @OperationsPerInvocation(100_000)
    public int measureWrong_100000() {
        return reps(100_000);
    }
    
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_11_Loops.class.getSimpleName())
                .forks(1)
                .build();

        new Runner(opt).run();
    }
    
}

This example shows that you should not add your own loop inside a benchmark method to reduce the number of method invocations. The reasoning "loop internally instead of paying the method-call overhead on every operation" is incorrect: once the optimizer is allowed to merge loop iterations, you will see some rather unexpected results.

Running the code above, you can see that once the JVM optimizes the inner loop, the apparent per-operation cost improves by roughly 10x (it may vary by machine), i.e. the measured number is bogus.

12 Forking

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class JMHSample_12_Forking {
    
    /*
     * Suppose we have this simple counter interface, and two implementations.
     * Even though those are semantically the same, from the JVM standpoint,
     * those are distinct classes.
     */

    public interface Counter {
        int inc();
    }

    public static class Counter1 implements Counter {
        private int x;

        @Override
        public int inc() {return x++;}
    }

    public static class Counter2 implements Counter {
        private int x;

        @Override
        public int inc() {return x++;}
    }

    /*
     * And this is how we measure it.
     * Note this is susceptible for same issue with loops we mention in previous examples.
     */

    public int measure(Counter c) {
        int s = 0;
        for (int i = 0; i < 10; i++) {
            s += c.inc();
        }
        return s;
    }

    /*
     * These are two counters.
     */
    Counter c1 = new Counter1();
    Counter c2 = new Counter2();

    /*
     * We first measure the Counter1 alone...
     * Fork(0) helps to run in the same JVM.
     */

    @Benchmark
    @Fork(0)
    public int measure_1_c1() {
        return measure(c1);
    }

    /*
     * Then Counter2...
     */

    @Benchmark
    @Fork(0)
    public int measure_2_c2() {
        return measure(c2);
    }

    /*
     * Then Counter1 again...
     */

    @Benchmark
    @Fork(0)
    public int measure_3_c1_again() {
        return measure(c1);
    }
    
    /*
     * These two tests have explicit @Fork annotation.
     * JMH takes this annotation as the request to run the test in the forked JVM.
     * It's even simpler to force this behavior for all the tests via the command
     * line option "-f". The forking is default, but we still use the annotation
     * for the consistency.
     *
     * This is the test for Counter1.
     */

    @Benchmark
    @Fork(1)
    public int measure_4_forked_c1() {
        return measure(c1);
    }

    /*
     * ...and this is the test for Counter2.
     */

    @Benchmark
    @Fork(1)
    public int measure_5_forked_c2() {
        return measure(c2);
    }
    
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_12_Forking.class.getSimpleName())
                .build();

        new Runner(opt).run();
    }

}

The JVM is good at profile-guided optimizations. That is bad news for benchmarks: different tests can mix their profiles together, after which every test gets "uniformly bad" code. Forking (running each test in a separate process) sidesteps this problem, and JMH forks a fresh process per benchmark by default; you can watch the process list during a run to confirm.

In the sample above, Counter1 and Counter2 are logically equivalent, yet to the JVM they are still distinct classes. Running the two counters interleaved in the same process pollutes the profile at the shared call site and performance actually degrades: measure_3_c1_again will be noticeably worse than measure_1_c1.

13 Run To Run

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class JMHSample_13_RunToRun {

    /*
     * In order to introduce readily measurable run-to-run variance, we build
     * the workload which performance differs from run to run. Note that many workloads
     * will have the similar behavior, but we do that artificially to make a point.
     */

    @State(Scope.Thread)
    public static class SleepyState {
        public long sleepTime;

        @Setup
        public void setup() {
            sleepTime = (long) (Math.random() * 1000);
        }
    }

    /*
     * Now, we will run this different number of times.
     */

    @Benchmark
    @Fork(1)
    public void baseline(SleepyState s) throws InterruptedException {
        TimeUnit.MILLISECONDS.sleep(s.sleepTime);
    }

    @Benchmark
    @Fork(5)
    public void fork_1(SleepyState s) throws InterruptedException {
        TimeUnit.MILLISECONDS.sleep(s.sleepTime);
    }

    @Benchmark
    @Fork(20)
    public void fork_2(SleepyState s) throws InterruptedException {
        TimeUnit.MILLISECONDS.sleep(s.sleepTime);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_13_RunToRun.class.getSimpleName())
                .warmupIterations(0)
                .measurementIterations(3)
                .build();

        new Runner(opt).run();
    }

}

The JVM is a complex system, and that brings a lot of nondeterminism with it; sometimes we have to account for run-to-run variance. Conveniently, JMH's fork feature not only sidesteps profile pollution but also automatically folds the results of all forked processes into the statistics. In the sample code, sleepTime is computed from a random number precisely to simulate per-run variance.

14 N/A

This sample appears to have been removed.

15 Asymmetric

@State(Scope.Group)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class JMHSample_15_Asymmetric {

    private AtomicInteger counter;

    @Setup
    public void up() {
        counter = new AtomicInteger();
    }

    @Benchmark
    @Group("g")
    @GroupThreads(3)
    public int inc() {
        return counter.incrementAndGet();
    }

    @Benchmark
    @Group("g")
    @GroupThreads(1)
    public int get() {
        return counter.get();
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_15_Asymmetric.class.getSimpleName())
                .forks(1)
                .build();

        new Runner(opt).run();
    }
    
}

This example introduces the concept of groups. Up to this point all tests were symmetric: every thread ran the same code. With @Group you can run asymmetric tests, binding several methods together and prescribing how threads are distributed among them. In the code above, the two methods inc and get both belong to group g but are assigned different thread counts; running the test you can observe 3 threads executing inc and 1 thread executing get. With 4 threads you get exactly one execution group; with 4 * N threads, N execution groups are created.

Note that State scopes also include Scope.Group, which shares a State instance within each group.

16 Compiler Control

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class JMHSample_16_CompilerControl {

    /**
     * These are our targets:
     *   - first method is prohibited from inlining
     *   - second method is forced to inline
     *   - third method is prohibited from compiling
     *
     * We might even place the annotations directly to the benchmarked
     * methods, but this expresses the intent more clearly.
     */

    public void target_blank() {
        // this method was intentionally left blank
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void target_dontInline() {
        // this method was intentionally left blank
    }

    @CompilerControl(CompilerControl.Mode.INLINE)
    public void target_inline() {
        // this method was intentionally left blank
    }

    @CompilerControl(CompilerControl.Mode.EXCLUDE)
    public void target_exclude() {
        // this method was intentionally left blank
    }

    /*
     * These methods measure the call performance.
     */

    @Benchmark
    public void baseline() {
        // this method was intentionally left blank
    }

    @Benchmark
    public void blank() {
        target_blank();
    }

    @Benchmark
    public void dontinline() {
        target_dontInline();
    }

    @Benchmark
    public void inline() {
        target_inline();
    }

    @Benchmark
    public void exclude() {
        target_exclude();
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_16_CompilerControl.class.getSimpleName())
                .warmupIterations(0)
                .measurementIterations(3)
                .forks(1)
                .build();

        new Runner(opt).run();
    }

}

This example shows that annotations can instruct the compiler to handle particular methods in specific ways, for example forcing or forbidding method inlining. The code above is fairly self-explanatory.

17 Sync Iterations

@State(Scope.Thread)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class JMHSample_17_SyncIterations {

    /*
     * This is the another thing that is enabled in JMH by default.
     *
     * Suppose we have this simple benchmark.
     */

    private double src;

    @Benchmark
    public double test() {
        double s = src;
        for (int i = 0; i < 1000; i++) {
            s = Math.sin(s);
        }
        return s;
    }

    /*
     * It turns out if you run the benchmark with multiple threads,
     * the way you start and stop the worker threads seriously affects
     * performance.
     *
     * The natural way would be to park all the threads on some sort
     * of barrier, and the let them go "at once". However, that does
     * not work: there are no guarantees the worker threads will start
     * at the same time, meaning other worker threads are working
     * in better conditions, skewing the result.
     *
     * The better solution would be to introduce bogus iterations,
     * ramp up the threads executing the iterations, and then atomically
     * shift the system to measuring stuff. The same thing can be done
     * during the rampdown. This sounds complicated, but JMH already
     * handles that for you.
     *
     */

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_17_SyncIterations.class.getSimpleName())
                .warmupTime(TimeValue.seconds(1))
                .measurementTime(TimeValue.seconds(1))
                .threads(Runtime.getRuntime().availableProcessors() * 16)
                .forks(1)
                .syncIterations(true) // try to switch to "false"
                .build();

        new Runner(opt).run();
    }

}

The point of this sample is mostly in the comments. Experience shows that when a benchmark runs with multiple threads, the way the worker threads start and stop seriously affects the results. The natural approach, parking all threads on some kind of barrier and releasing them at once, does not actually work: there is no guarantee the workers will start at the same time, so some threads run under better conditions than others and skew the result. The better solution is to introduce bogus iterations, ramp up the threads while they execute those iterations, then atomically switch the system over to measuring, and do the same thing during ramp-down. This sounds complicated, but JMH already handles it for you via the syncIterations option.

18 Control

@State(Scope.Group)
public class JMHSample_18_Control {

    /*
     * In this example, we want to estimate the ping-pong speed for the simple
     * AtomicBoolean. Unfortunately, doing that in naive manner will livelock
     * one of the threads, because the executions of ping/pong are not paired
     * perfectly. We need the escape hatch to terminate the loop if threads
     * are about to leave the measurement.
     */

    public final AtomicBoolean flag = new AtomicBoolean();

    @Benchmark
    @Group("pingpong")
    public void ping(Control cnt) {
        while (!cnt.stopMeasurement && !flag.compareAndSet(false, true)) {
            // this body is intentionally left blank
        }
    }

    @Benchmark
    @Group("pingpong")
    public void pong(Control cnt) {
        while (!cnt.stopMeasurement && !flag.compareAndSet(true, false)) {
            // this body is intentionally left blank
        }
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_18_Control.class.getSimpleName())
                .threads(2)
                .forks(1)
                .build();

        new Runner(opt).run();
    }

}

This sample introduces an experimental helper class, Control, whose main purpose is to let a benchmark method bail out of a loop when the measurement is about to stop; if a benchmark method never returns, the whole run never finishes. In the example above, the two methods in the same group perform opposing CAS operations; without Control's intervention, when the measurement stops one of the two would spin forever.

19 N/A

This sample appears to have been removed.

20 Annotations

public class JMHSample_20_Annotations {

    double x1 = Math.PI;

    /*
     * In addition to all the command line options usable at run time,
     * we have the annotations which can provide the reasonable defaults
     * for the some of the benchmarks. This is very useful when you are
     * dealing with lots of benchmarks, and some of them require
     * special treatment.
     *
     * Annotation can also be placed on class, to have the effect over
     * all the benchmark methods in the same class. The rule is, the
     * annotation in the closest scope takes the precedence: i.e.
     * the method-based annotation overrides class-based annotation,
     * etc.
     */

    @Benchmark
    @Warmup(iterations = 5, time = 100, timeUnit = TimeUnit.MILLISECONDS)
    @Measurement(iterations = 5, time = 100, timeUnit = TimeUnit.MILLISECONDS)
    public double measure() {
        return Math.log(x1);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_20_Annotations.class.getSimpleName())
                .build();

        new Runner(opt).run();
    }

}

JMH supports configuring a run not only through the Options object at runtime but also through annotations such as @Measurement and @Warmup; most configuration parameters have an annotation counterpart. Annotations can also be placed on the class to apply to all benchmark methods in it, with the closest scope taking precedence: method-level annotations override class-level ones.
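Assuming the self-contained benchmarks.jar produced by the Maven archetype mentioned earlier, the same settings can also be supplied on the command line instead of via annotations (run the jar with -h for the full option list):

```shell
# -wi / -w: warmup iterations and warmup time; -i / -r: measurement iterations and time; -f: forks
java -jar target/benchmarks.jar JMHSample_20 -wi 5 -w 100ms -i 5 -r 100ms -f 1
```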

21 Consume CPU

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class JMHSample_21_ConsumeCPU {

    /*
     * At times you require the test to burn some of the cycles doing nothing.
     * In many cases, you *do* want to burn the cycles instead of waiting.
     *
     * For these occasions, we have the infrastructure support. Blackholes
     * can not only consume the values, but also the time! Run this test
     * to get familiar with this part of JMH.
     *
     * (Note we use static method because most of the use cases are deep
     * within the testing code, and propagating blackholes is tedious).
     */

    @Benchmark
    public void consume_0000() {
        Blackhole.consumeCPU(0);
    }

    @Benchmark
    public void consume_0001() {
        Blackhole.consumeCPU(1);
    }

    @Benchmark
    public void consume_0002() {
        Blackhole.consumeCPU(2);
    }

    @Benchmark
    public void consume_0004() {
        Blackhole.consumeCPU(4);
    }

    @Benchmark
    public void consume_0008() {
        Blackhole.consumeCPU(8);
    }

    @Benchmark
    public void consume_0016() {
        Blackhole.consumeCPU(16);
    }

    @Benchmark
    public void consume_0032() {
        Blackhole.consumeCPU(32);
    }

    @Benchmark
    public void consume_0064() {
        Blackhole.consumeCPU(64);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_21_ConsumeCPU.class.getSimpleName())
                .forks(1)
                .build();

        new Runner(opt).run();
    }

}

This sample introduces a way to burn cycles on purpose: sometimes you genuinely do want to consume CPU rather than just wait. Blackhole's static consumeCPU method is a quick way to achieve exactly that.

22 False Sharing

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(5)
public class JMHSample_22_FalseSharing {

    /*
     * Suppose we have two threads:
     *   a) innocuous reader which blindly reads its own field
     *   b) furious writer which updates its own field
     */

    /*
     * BASELINE EXPERIMENT:
     * Because of the false sharing, both reader and writer will experience
     * penalties.
     */

    @State(Scope.Group)
    public static class StateBaseline {
        int readOnly;
        int writeOnly;
    }

    @Benchmark
    @Group("baseline")
    public int reader(StateBaseline s) {return s.readOnly;}

    @Benchmark
    @Group("baseline")
    public void writer(StateBaseline s) {s.writeOnly++;}

    /*
     * APPROACH 1: PADDING
     *
     * We can try to alleviate some of the effects with padding.
     * This is not versatile because JVMs can freely rearrange the
     * field order, even of the same type.
     */

    @State(Scope.Group)
    public static class StatePadded {
        int readOnly;
        int p01, p02, p03, p04, p05, p06, p07, p08;
        int p11, p12, p13, p14, p15, p16, p17, p18;
        int writeOnly;
        int q01, q02, q03, q04, q05, q06, q07, q08;
        int q11, q12, q13, q14, q15, q16, q17, q18;
    }

    @Benchmark
    @Group("padded")
    public int reader(StatePadded s) {return s.readOnly;}

    @Benchmark
    @Group("padded")
    public void writer(StatePadded s) {s.writeOnly++;}

    /*
     * APPROACH 2: CLASS HIERARCHY TRICK
     *
     * We can alleviate false sharing with this convoluted hierarchy trick,
     * using the fact that superclass fields are usually laid out first.
     * In this construction, the protected field will be squashed between
     * paddings.
     * It is important to use the smallest data type, so that layouter would
     * not generate any gaps that can be taken by later protected subclasses
     * fields. Depending on the actual field layout of classes that bear the
     * protected fields, we might need more padding to account for "lost"
     * padding fields pulled into in their superclass gaps.
     */

    public static class StateHierarchy_1 {
        int readOnly;
    }

    public static class StateHierarchy_2 extends StateHierarchy_1 {
        byte p01, p02, p03, p04, p05, p06, p07, p08;
        byte p11, p12, p13, p14, p15, p16, p17, p18;
        byte p21, p22, p23, p24, p25, p26, p27, p28;
        byte p31, p32, p33, p34, p35, p36, p37, p38;
        byte p41, p42, p43, p44, p45, p46, p47, p48;
        byte p51, p52, p53, p54, p55, p56, p57, p58;
        byte p61, p62, p63, p64, p65, p66, p67, p68;
        byte p71, p72, p73, p74, p75, p76, p77, p78;
    }

    public static class StateHierarchy_3 extends StateHierarchy_2 {
        int writeOnly;
    }

    public static class StateHierarchy_4 extends StateHierarchy_3 {
        byte q01, q02, q03, q04, q05, q06, q07, q08;
        byte q11, q12, q13, q14, q15, q16, q17, q18;
        byte q21, q22, q23, q24, q25, q26, q27, q28;
        byte q31, q32, q33, q34, q35, q36, q37, q38;
        byte q41, q42, q43, q44, q45, q46, q47, q48;
        byte q51, q52, q53, q54, q55, q56, q57, q58;
        byte q61, q62, q63, q64, q65, q66, q67, q68;
        byte q71, q72, q73, q74, q75, q76, q77, q78;
    }

    @State(Scope.Group)
    public static class StateHierarchy extends StateHierarchy_4 { }

    @Benchmark
    @Group("hierarchy")
    public int reader(StateHierarchy s) {
        return s.readOnly;
    }

    @Benchmark
    @Group("hierarchy")
    public void writer(StateHierarchy s) {
        s.writeOnly++;
    }

    /*
     * APPROACH 3: ARRAY TRICK
     *
     * This trick relies on the contiguous allocation of an array.
     * Instead of placing the fields in the class, we mangle them
     * into the array at very sparse offsets.
     */

    @State(Scope.Group)
    public static class StateArray {
        int[] arr = new int[128];
    }

    @Benchmark
    @Group("sparse")
    public int reader(StateArray s) {
        return s.arr[0];
    }

    @Benchmark
    @Group("sparse")
    public void writer(StateArray s) {
        s.arr[64]++;
    }

    /*
     * APPROACH 4:
     *
     * @Contended (since JDK 8):
     *  Uncomment the annotation if building with JDK 8.
     *  Remember to flip -XX:-RestrictContended to enable.
     */

    @State(Scope.Group)
    public static class StateContended {
        int readOnly;

//        @sun.misc.Contended
        int writeOnly;
    }

    @Benchmark
    @Group("contended")
    public int reader(StateContended s) {
        return s.readOnly;
    }

    @Benchmark
    @Group("contended")
    public void writer(StateContended s) {
        s.writeOnly++;
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_22_FalseSharing.class.getSimpleName())
                .threads(Runtime.getRuntime().availableProcessors())
                .build();

        new Runner(opt).run();
    }

}

False sharing is a common problem in concurrent programming. Caches store data in units of cache lines, and when multiple threads modify independent variables that happen to share a cache line, the resulting cache invalidations degrade performance. The effect cannot be ignored in microbenchmarks either. This sample demonstrates several ways to work around it:

  1. Field padding: declare extra fields to fill out the cache line
  2. Class hierarchy: a variant of padding that places the padding fields in superclasses
  3. Array padding: use a long array and keep the live elements farther apart than a cache line
  4. Annotation: JDK 8's @Contended annotation tells the JVM that the annotated field should be padded
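To make the sparse-array trick (approach 3) concrete, here is a plain-Java sketch (hypothetical demo code, not part of the sample): two writer threads hammer two slots of the same long array. With gap = 1 the counters share a cache line; with a gap of 64 longs (512 bytes) they are guaranteed to land on different lines on common hardware. Informally timing the two variants usually shows the padded layout winning; the code itself only verifies the final counts.

```java
public class FalseSharingDemo {
    static final int ITERS = 5_000_000;

    // Two threads increment slot 0 and slot `gap` of the same array.
    // Since each thread owns its slot, the final counts are exact.
    static long[] hammer(int gap) throws InterruptedException {
        long[] slots = new long[128];
        Thread a = new Thread(() -> {
            for (int i = 0; i < ITERS; i++) slots[0]++;
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < ITERS; i++) slots[gap]++;
        });
        a.start(); b.start();
        a.join(); b.join();
        return slots;
    }

    public static void main(String[] args) throws InterruptedException {
        long t0 = System.nanoTime();
        hammer(1);   // same cache line: false sharing
        long t1 = System.nanoTime();
        hammer(64);  // 512 bytes apart: separate lines
        long t2 = System.nanoTime();
        System.out.printf("gap=1: %d ms, gap=64: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```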

23 Aux Counters

@OutputTimeUnit(TimeUnit.SECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class JMHSample_23_AuxCounters {

    /*
     * In some weird cases you need to get the separate throughput/time
     * metrics for the benchmarked code depending on the outcome of the
     * current code. Trying to accommodate the cases like this, JMH optionally
     * provides the special annotation which treats @State objects
     * as the object bearing user counters. See @AuxCounters javadoc for
     * the limitations.
     */

    @State(Scope.Thread)
    @AuxCounters(AuxCounters.Type.OPERATIONS)
    public static class OpCounters {
        // These fields would be counted as metrics
        public int case1;
        public int case2;

        // This accessor will also produce a metric
        public int total() {
            return case1 + case2;
        }
    }

    @State(Scope.Thread)
    @AuxCounters(AuxCounters.Type.EVENTS)
    public static class EventCounters {
        // This field would be counted as metric
        public int wows;
    }

    /*
     * This code measures the "throughput" in two parts of the branch.
     * The @AuxCounters state above holds the counters which we increment
     * ourselves, and then let JMH to use their values in the performance
     * calculations.
     */

    @Benchmark
    public void splitBranch(OpCounters counters) {
        if (Math.random() < 0.1) {
            counters.case1++;
        } else {
            counters.case2++;
        }
    }

    @Benchmark
    public void runSETI(EventCounters counters) {
        float random = (float) Math.random();
        float wowSignal = (float) Math.PI / 4;
        if (random == wowSignal) {
            // WOW, that's unusual.
            counters.wows++;
        }
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_23_AuxCounters.class.getSimpleName())
                .build();

        new Runner(opt).run();
    }

}

Auxiliary counters are not very common, so here is a direct translation of the sample's comment: in some odd cases you need separate throughput/time metrics for the benchmarked code depending on the outcome of the current code. To accommodate such cases, JMH provides a special annotation that treats @State objects as carriers of user counters. See the @AuxCounters javadoc for its limitations.

24 Inheritance

public class JMHSample_24_Inheritance {

    /*
     * In very special circumstances, you might want to provide the benchmark
     * body in the (abstract) superclass, and specialize it with the concrete
     * pieces in the subclasses.
     *
     * The rule of thumb is: if some class has @Benchmark method, then all the subclasses
     * are also having the "synthetic" @Benchmark method. The caveat is, because we only
     * know the type hierarchy during the compilation, it is only possible during
     * the same compilation session. That is, mixing in the subclass extending your
     * benchmark class *after* the JMH compilation would have no effect.
     *
     * Note how annotations now have two possible places. The closest annotation
     * in the hierarchy wins.
     */

    @BenchmarkMode(Mode.AverageTime)
    @Fork(1)
    @State(Scope.Thread)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public static abstract class AbstractBenchmark {
        int x;

        @Setup
        public void setup() {
            x = 42;
        }

        @Benchmark
        @Warmup(iterations = 5, time = 100, timeUnit = TimeUnit.MILLISECONDS)
        @Measurement(iterations = 5, time = 100, timeUnit = TimeUnit.MILLISECONDS)
        public double bench() {
            return doWork() * doWork();
        }

        protected abstract double doWork();
    }

    public static class BenchmarkLog extends AbstractBenchmark {
        @Override
        protected double doWork() {
            return Math.log(x);
        }
    }

    public static class BenchmarkSin extends AbstractBenchmark {
        @Override
        protected double doWork() {
            return Math.sin(x);
        }
    }

    public static class BenchmarkCos extends AbstractBenchmark {
        @Override
        protected double doWork() {
            return Math.cos(x);
        }
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_24_Inheritance.class.getSimpleName())
                .build();

        new Runner(opt).run();
    }

}

JMH supports inheritance: you can configure the benchmark with annotations on an abstract superclass and declare abstract methods for subclasses to implement. The @Benchmark annotation is effectively inherited, so every subclass gets a synthetic benchmark method from its parent. Note that since the type hierarchy is only known at compile time, subclasses added after the JMH compilation step have no effect. Also, when an annotation appears at multiple levels, the one closest in the hierarchy wins.

25 API GA

This sample (which drives the JMH Runner API programmatically) is fairly involved, and honestly I don't fully understand it yet, so let's skip it for now.

26 Batch Size

@State(Scope.Thread)
public class JMHSample_26_BatchSize {

    /*
     * Suppose we want to measure insertion in the middle of the list.
     */

    List<String> list = new LinkedList<>();

    @Benchmark
    @Warmup(iterations = 5, time = 1)
    @Measurement(iterations = 5, time = 1)
    @BenchmarkMode(Mode.AverageTime)
    public List<String> measureWrong_1() {
        list.add(list.size() / 2, "something");
        return list;
    }

    @Benchmark
    @Warmup(iterations = 5, time = 5)
    @Measurement(iterations = 5, time = 5)
    @BenchmarkMode(Mode.AverageTime)
    public List<String> measureWrong_5() {
        list.add(list.size() / 2, "something");
        return list;
    }

    /*
     * This is what you do with JMH.
     */
    @Benchmark
    @Warmup(iterations = 5, batchSize = 5000)
    @Measurement(iterations = 5, batchSize = 5000)
    @BenchmarkMode(Mode.SingleShotTime)
    public List<String> measureRight() {
        list.add(list.size() / 2, "something");
        return list;
    }

    @Setup(Level.Iteration)
    public void setup() {
        list.clear();
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_26_BatchSize.class.getSimpleName())
                .forks(1)
                .build();

        new Runner(opt).run();
    }

}

If the benchmarked operation does not run at a steady cost, i.e. each invocation differs significantly from the last, then measuring over a fixed time window is not viable, and Mode.SingleShotTime must be used. But a single invocation is too little work to yield a trustworthy measurement, and that is exactly where the batchSize parameter comes in.

In this example, the workload inserts an element into the middle of a linked list. The cost of that operation depends on the list length, so it is not steady. To keep every measurement's starting conditions equivalent, a fixed number of invocations must be run, so the correct benchmark measureRight behaves as: 5 iterations, one shot per iteration, each shot invoking the test method 5000 times.
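The arithmetic of that setup can be sketched in plain Java (hypothetical helper names, not from the sample): one invocation is a single mid-list insert, and one measured "shot" is a batch of 5000 invocations against a fresh list, mirroring the @Setup(Level.Iteration) clear() in the sample.

```java
import java.util.LinkedList;
import java.util.List;

public class BatchSketch {
    // One "operation", exactly as in the sample: insert into the middle.
    static void op(List<String> list) {
        list.add(list.size() / 2, "something");
    }

    // One measured shot: batchSize operations against a fresh list.
    // The cost of each op grows with the list length, which is why
    // a fixed batch size is needed for comparable measurements.
    static List<String> shot(int batchSize) {
        List<String> list = new LinkedList<>();
        for (int i = 0; i < batchSize; i++) {
            op(list);
        }
        return list;
    }
}
```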

27 Params

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@State(Scope.Benchmark)
public class JMHSample_27_Params {

    /**
     * In many cases, the experiments require walking the configuration space
     * for a benchmark. This is needed for additional control, or investigating
     * how the workload performance changes with different settings.
     */

    @Param({"1", "31", "65", "101", "103"})
    public int arg;

    @Param({"0", "1", "2", "4", "8", "16", "32"})
    public int certainty;

    @Benchmark
    public boolean bench() {
        return BigInteger.valueOf(arg).isProbablePrime(certainty);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_27_Params.class.getSimpleName())
//                .param("arg", "41", "42") // Use this to selectively constrain/override parameters
                .build();

        new Runner(opt).run();
    }

}

This example is easy to understand and very practical. You often want to compare results under different configuration settings, and JMH supports that through the @Param annotation and the param option, which let you supply candidate values. Note that when multiple parameters each have several candidates, JMH runs the full cross product of all combinations.
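Why sweep certainty at all? Per the BigInteger.isProbablePrime javadoc, a certainty of zero or less skips the test entirely and reports "probably prime", while larger values buy more Miller-Rabin rounds, so the cost of the call genuinely depends on both parameters. A tiny sketch of those semantics (the helper name is hypothetical):

```java
import java.math.BigInteger;

public class CertaintyDemo {
    // The sample's workload, factored out for illustration.
    static boolean probablyPrime(int n, int certainty) {
        return BigInteger.valueOf(n).isProbablePrime(certainty);
    }
}
```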

28 Blackhole Helpers

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@State(Scope.Thread)
public class JMHSample_28_BlackholeHelpers {

    /**
     * Sometimes you need the black hole not in @Benchmark method, but in
     * helper methods, because you want to pass it through to the concrete
     * implementation which is instantiated in helper methods. In this case,
     * you can request the black hole straight in the helper method signature.
     * This applies to both @Setup and @TearDown methods, and also to other
     * JMH infrastructure objects, like Control.
     *
     * Below is the variant of {@link org.openjdk.jmh.samples.JMHSample_08_DeadCode}
     * test, but wrapped in the anonymous classes.
     */

    public interface Worker {
        void work();
    }

    private Worker workerBaseline;
    private Worker workerRight;
    private Worker workerWrong;

    @Setup
    public void setup(final Blackhole bh) {
        workerBaseline = new Worker() {
            double x;

            @Override
            public void work() {
                // do nothing
            }
        };

        workerWrong = new Worker() {
            double x;

            @Override
            public void work() {
                Math.log(x);
            }
        };

        workerRight = new Worker() {
            double x;

            @Override
            public void work() {
                bh.consume(Math.log(x));
            }
        };

    }

    @Benchmark
    public void baseline() {
        workerBaseline.work();
    }

    @Benchmark
    public void measureWrong() {
        workerWrong.work();
    }

    @Benchmark
    public void measureRight() {
        workerRight.work();
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_28_BlackholeHelpers.class.getSimpleName())
                .build();

        new Runner(opt).run();
    }

}

This sample shows that you can receive and keep a Blackhole in helper methods. Here the @Setup method takes a Blackhole parameter and uses it inside the anonymous implementations of the interface. The same injection works for other JMH infrastructure objects, such as Control.

29 States DAG

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@State(Scope.Thread)
public class JMHSample_29_StatesDAG {

    /**
     * WARNING:
     * THIS IS AN EXPERIMENTAL FEATURE, BE READY FOR IT BECOME REMOVED WITHOUT NOTICE!
     */

    /*
     * This is a model case, and it might not be a good benchmark.
     * // TODO: Replace it with the benchmark which does something useful.
     */

    public static class Counter {
        int x;

        public int inc() {
            return x++;
        }

        public void dispose() {
            // pretend this is something really useful
        }
    }

    /*
     * Shared state maintains the set of Counters, and worker threads should
     * poll their own instances of Counter to work with. However, it should only
     * be done once, and therefore, Local state caches it after requesting the
     * counter from Shared state.
     */

    @State(Scope.Benchmark)
    public static class Shared {
        List<Counter> all;
        Queue<Counter> available;

        @Setup
        public synchronized void setup() {
            all = new ArrayList<>();
            for (int c = 0; c < 10; c++) {
                all.add(new Counter());
            }

            available = new LinkedList<>();
            available.addAll(all);
        }

        @TearDown
        public synchronized void tearDown() {
            for (Counter c : all) {
                c.dispose();
            }
        }

        public synchronized Counter getMine() {
            return available.poll();
        }
    }

    @State(Scope.Thread)
    public static class Local {
        Counter cnt;

        @Setup
        public void setup(Shared shared) {
            cnt = shared.getMine();
        }
    }

    @Benchmark
    public int test(Local local) {
        return local.cnt.inc();
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_29_StatesDAG.class.getSimpleName())
                .build();

        new Runner(opt).run();
    }


}

This example covers @State objects that depend on each other: JMH allows state objects to form a DAG (directed acyclic graph) of dependencies. Here the thread-scoped Local object depends on the benchmark-scoped Shared object, and each Local instance takes its own Counter from the Shared queue. This is an experimental feature and rarely needed; a passing familiarity is enough.

30 Interrupts

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Group)
public class JMHSample_30_Interrupts {

    /*
     * In this example, we want to measure the simple performance characteristics
     * of the ArrayBlockingQueue. Unfortunately, doing that without a harness
     * support will deadlock one of the threads, because the executions of
     * take/put are not paired perfectly. Fortunately for us, both methods react
     * to interrupts well, and therefore we can rely on JMH to terminate the
     * measurement for us. JMH will notify users about the interrupt actions
     * nevertheless, so users can see if those interrupts affected the measurement.
     * JMH will start issuing interrupts after the default or user-specified timeout
     * had been reached.
     *
     * This is a variant of org.openjdk.jmh.samples.JMHSample_18_Control, but without
     * the explicit control objects. This example is suitable for the methods which
     * react to interrupts gracefully.
     */

    private BlockingQueue<Integer> q;

    @Setup
    public void setup() {
        q = new ArrayBlockingQueue<>(1);
    }

    @Group("Q")
    @Benchmark
    public Integer take() throws InterruptedException {
        return q.take();
    }

    @Group("Q")
    @Benchmark
    public void put() throws InterruptedException {
        q.put(42);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_30_Interrupts.class.getSimpleName())
                .threads(2)
                .forks(5)
                .timeout(TimeValue.seconds(10))
                .build();

        new Runner(opt).run();
    }

}

JMH can put a timeout on benchmark methods and actively interrupt the call once it expires. The example above is similar to sample 18, but without a Control object to coordinate the threads, so one of the methods will block when the measurement winds down. JMH issues interrupts once the default or user-specified timeout is reached and reports the interrupt actions, so users can judge whether the interrupts affected the results.
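The mechanism JMH relies on can be shown with plain java.util.concurrent (a hypothetical demo, not JMH code): a thread parked in ArrayBlockingQueue.take() wakes up with an InterruptedException as soon as it is interrupted, which is exactly what lets JMH unstick the unpaired take()/put() at the end of a measurement.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class InterruptDemo {
    static boolean blockedTakeReactsToInterrupt() throws InterruptedException {
        BlockingQueue<Integer> q = new ArrayBlockingQueue<>(1);
        AtomicBoolean sawInterrupt = new AtomicBoolean(false);
        Thread taker = new Thread(() -> {
            try {
                q.take(); // blocks: the queue stays empty
            } catch (InterruptedException e) {
                sawInterrupt.set(true); // graceful reaction to the interrupt
            }
        });
        taker.start();
        Thread.sleep(200);  // let the thread park inside take()
        taker.interrupt();  // what JMH does once the timeout expires
        taker.join(2000);
        return sawInterrupt.get();
    }
}
```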

31 Infra Params

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
@State(Scope.Benchmark)
public class JMHSample_31_InfraParams {

    /*
     * There is a way to query JMH about the current running mode. This is
     * possible with three infrastructure objects we can request to be injected:
     *   - BenchmarkParams: covers the benchmark-global configuration
     *   - IterationParams: covers the current iteration configuration
     *   - ThreadParams: covers the specifics about threading
     *
     * Suppose we want to check how the ConcurrentHashMap scales under different
     * parallelism levels. We can put concurrencyLevel in @Param, but it sometimes
     * inconvenient if, say, we want it to follow the @Threads count. Here is
     * how we can query JMH about how many threads was requested for the current run,
     * and put that into concurrencyLevel argument for CHM constructor.
     */

    static final int THREAD_SLICE = 1000;

    private ConcurrentHashMap<String, String> mapSingle;
    private ConcurrentHashMap<String, String> mapFollowThreads;

    @Setup
    public void setup(BenchmarkParams params) {
        int capacity = 16 * THREAD_SLICE * params.getThreads();
        mapSingle        = new ConcurrentHashMap<>(capacity, 0.75f, 1);
        mapFollowThreads = new ConcurrentHashMap<>(capacity, 0.75f, params.getThreads());
    }

    /*
     * Here is another neat trick. Generate the distinct set of keys for all threads:
     */

    @State(Scope.Thread)
    public static class Ids {
        private List<String> ids;

        @Setup
        public void setup(ThreadParams threads) {
            ids = new ArrayList<>();
            for (int c = 0; c < THREAD_SLICE; c++) {
                ids.add("ID" + (THREAD_SLICE * threads.getThreadIndex() + c));
            }
        }
    }

    @Benchmark
    public void measureDefault(Ids ids) {
        for (String s : ids.ids) {
            mapSingle.remove(s);
            mapSingle.put(s, s);
        }
    }

    @Benchmark
    public void measureFollowThreads(Ids ids) {
        for (String s : ids.ids) {
            mapFollowThreads.remove(s);
            mapFollowThreads.put(s, s);
        }
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_31_InfraParams.class.getSimpleName())
                .threads(4)
                .forks(5)
                .build();

        new Runner(opt).run();
    }

}

JMH provides infrastructure classes for querying the current run configuration at runtime, so benchmark code can adapt to it. There are three parameter objects: BenchmarkParams, IterationParams, and ThreadParams. Their names say it all: they cover the benchmark-wide, per-iteration, and per-thread configuration respectively.
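The core idea can be sketched outside JMH (hypothetical helper; the real sample reads BenchmarkParams.getThreads() inside @Setup): derive the map's capacity and concurrencyLevel from the thread count instead of hard-coding them.

```java
import java.util.concurrent.ConcurrentHashMap;

public class SizedMap {
    static final int THREAD_SLICE = 1000;

    // Mirrors the sample's setup(): size the ConcurrentHashMap from the
    // number of benchmark threads rather than a fixed @Param value.
    static ConcurrentHashMap<String, String> forThreads(int threads) {
        int capacity = 16 * THREAD_SLICE * threads;
        return new ConcurrentHashMap<>(capacity, 0.75f, threads);
    }
}
```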

32 Bulk Warmup

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class JMHSample_32_BulkWarmup {

    /*
     * This is an addendum to JMHSample_12_Forking test.
     *
     * Sometimes you want an opposite configuration: instead of separating the profiles
     * for different benchmarks, you want to mix them together to test the worst-case
     * scenario.
     *
     * JMH has a bulk warmup feature for that: it does the warmups for all the tests
     * first, and then measures them. JMH still forks the JVM for each test, but once the
     * new JVM has started, all the warmups are being run there, before running the
     * measurement. This helps to dodge the type profile skews, as each test is still
     * executed in a different JVM, and we only "mix" the warmup code we want.
     */

    /*
     * These test classes are borrowed verbatim from JMHSample_12_Forking.
     */

    public interface Counter {
        int inc();
    }

    public static class Counter1 implements Counter {
        private int x;

        @Override
        public int inc() {
            return x++;
        }
    }

    public static class Counter2 implements Counter {
        private int x;

        @Override
        public int inc() {
            return x++;
        }
    }

    Counter c1 = new Counter1();
    Counter c2 = new Counter2();

    /*
     * And this is our test payload. Notice we have to break the inlining of the payload,
     * so that in could not be inlined in either measure_c1() or measure_c2() below, and
     * specialized for that only call.
     */

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public int measure(Counter c) {
        int s = 0;
        for (int i = 0; i < 10; i++) {
            s += c.inc();
        }
        return s;
    }

    @Benchmark
    public int measure_c1() {
        return measure(c1);
    }

    @Benchmark
    public int measure_c2() {
        return measure(c2);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_32_BulkWarmup.class.getSimpleName())
                // .includeWarmup(...) <-- this may include other benchmarks into warmup
                .warmupMode(WarmupMode.BULK) // see other WarmupMode.* as well
                .forks(1)
                .build();

        new Runner(opt).run();
    }

}

This is a follow-up to sample 12. From sample 12 we know that, to avoid skewing the JVM's profile-guided optimizations, JMH by default forks a process so that each benchmark method warms up and runs in its own JVM. But sometimes you specifically want to measure the mixed-profile worst case. Setting warmupMode to WarmupMode.BULK makes JMH run the warmups of all methods first, before measuring each benchmark. Note that JMH still forks a process per method; only the warmup performed at the start of each process changes.

33 Security Manager

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class JMHSample_33_SecurityManager {

    /*
     * Some targeted tests may care about SecurityManager being installed.
     * Since JMH itself needs to do privileged actions, it is not enough
     * to blindly install the SecurityManager, as JMH infrastructure will fail.
     */

    /*
     * In this example, we want to measure the performance of System.getProperty
     * with SecurityManager installed or not. To do this, we have two state classes
     * with helper methods. One that reads the default JMH security policy (we ship one
     * with JMH), and installs the security manager; another one that makes sure
     * the SecurityManager is not installed.
     *
     * If you need a restricted security policy for the tests, you are advised to
     * get /jmh-security-minimal.policy, that contains the minimal permissions
     * required for JMH benchmark to run, merge the new permissions there, produce new
     * policy file in a temporary location, and load that policy file instead.
     * There is also /jmh-security-minimal-runner.policy, that contains the minimal
     * permissions for the JMH harness to run, if you want to use JVM args to arm
     * the SecurityManager.
     */

    @State(Scope.Benchmark)
    public static class SecurityManagerInstalled {
        @Setup
        public void setup() throws IOException, NoSuchAlgorithmException, URISyntaxException {
            URI policyFile = JMHSample_33_SecurityManager.class.getResource("/jmh-security.policy").toURI();
            Policy.setPolicy(Policy.getInstance("JavaPolicy", new URIParameter(policyFile)));
            System.setSecurityManager(new SecurityManager());
        }

        @TearDown
        public void tearDown() {
            System.setSecurityManager(null);
        }
    }

    @State(Scope.Benchmark)
    public static class SecurityManagerEmpty {
        @Setup
        public void setup() throws IOException, NoSuchAlgorithmException, URISyntaxException {
            System.setSecurityManager(null);
        }
    }

    @Benchmark
    public String testWithSM(SecurityManagerInstalled s) throws InterruptedException {
        return System.getProperty("java.home");
    }

    @Benchmark
    public String testWithoutSM(SecurityManagerEmpty s) throws InterruptedException {
        return System.getProperty("java.home");
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_33_SecurityManager.class.getSimpleName())
                .warmupIterations(5)
                .measurementIterations(5)
                .forks(1)
                .build();

        new Runner(opt).run();
    }

}

This sample is about security: Java security historically relies on the SecurityManager. The example shows how to compare runs with a specified security policy installed against runs with no security manager at all.

34 Safe Looping

@State(Scope.Thread)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(3)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class JMHSample_34_SafeLooping {

    /*
     * JMHSample_11_Loops warns about the dangers of using loops in @Benchmark methods.
     * Sometimes, however, one needs to traverse through several elements in a dataset.
     * This is hard to do without loops, and therefore we need to devise a scheme for
     * safe looping.
     */

    /*
     * Suppose we want to measure how much it takes to execute work() with different
     * arguments. This mimics a frequent use case when multiple instances with the same
     * implementation, but different data, is measured.
     */

    static final int BASE = 42;

    static int work(int x) {
        return BASE + x;
    }

    /*
     * Every benchmark requires control. We do a trivial control for our benchmarks
     * by checking the benchmark costs are growing linearly with increased task size.
     * If it doesn't, then something wrong is happening.
     */

    @Param({"1", "10", "100", "1000"})
    int size;

    int[] xs;

    @Setup
    public void setup() {
        xs = new int[size];
        for (int c = 0; c < size; c++) {
            xs[c] = c;
        }
    }

    /*
     * First, the obviously wrong way: "saving" the result into a local variable would not
     * work. A sufficiently smart compiler will inline work(), and figure out only the last
     * work() call needs to be evaluated. Indeed, if you run it with varying $size, the score
     * will stay the same!
     */

    @Benchmark
    public int measureWrong_1() {
        int acc = 0;
        for (int x : xs) {
            acc = work(x);
        }
        return acc;
    }

    /*
     * Second, another wrong way: "accumulating" the result into a local variable. While
     * it would force the computation of each work() method, there are software pipelining
     * effects in action, that can merge the operations between two otherwise distinct work()
     * bodies. This will obliterate the benchmark setup.
     *
     * In this example, HotSpot does the unrolled loop, merges the $BASE operands into a single
     * addition to $acc, and then does a bunch of very tight stores of $x-s. The final performance
     * depends on how much of the loop unrolling happened *and* how much data is available to make
     * the large strides.
     */

    @Benchmark
    public int measureWrong_2() {
        int acc = 0;
        for (int x : xs) {
            acc += work(x);
        }
        return acc;
    }

    /*
     * Now, let's see how to measure these things properly. A very straight-forward way to
     * break the merging is to sink each result to Blackhole. This will force runtime to compute
     * every work() call in full. (We would normally like to care about several concurrent work()
     * computations at once, but the memory effects from Blackhole.consume() prevent those optimization
     * on most runtimes).
     */

    @Benchmark
    public void measureRight_1(Blackhole bh) {
        for (int x : xs) {
            bh.consume(work(x));
        }
    }

    /*
     * DANGEROUS AREA, PLEASE READ THE DESCRIPTION BELOW.
     *
     * Sometimes, the cost of sinking the value into a Blackhole is dominating the nano-benchmark score.
     * In these cases, one may try to do a make-shift "sinker" with non-inlineable method. This trick is
     * *very* VM-specific, and can only be used if you are verifying the generated code (that's a good
     * strategy when dealing with nano-benchmarks anyway).
     *
     * You SHOULD NOT use this trick in most cases. Apply only where needed.
     */

    @Benchmark
    public void measureRight_2() {
        for (int x : xs) {
            sink(work(x));
        }
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public static void sink(int v) {
        // IT IS VERY IMPORTANT TO MATCH THE SIGNATURE TO AVOID AUTOBOXING.
        // The method intentionally does nothing.
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_34_SafeLooping.class.getSimpleName())
                .forks(3)
                .build();

        new Runner(opt).run();
    }

}

This sample supplements sample 11. From sample 11 we know that benchmark methods should not loop by hand, leaving the looping to JMH at the method-invocation level. Sometimes, however, a loop cannot be avoided: for example, traversing a list of rows fetched from a database, where the iteration is an inseparable part of the operation. For this situation, the code above shows both wrong and right approaches. A naive loop is always wrong: the JVM will inline, constant-fold, and otherwise simplify it until the code block effectively does nothing. The most convenient fix is to sink each result into a Blackhole inside the loop. If the Blackhole call itself dominates the nano-benchmark's score, you can consider a non-inlineable empty "sinker" method instead, but that trick is very VM-specific and should only be used when strictly necessary; read the comments in the code above carefully.
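The difference between the two "wrong" accumulators is easy to see outside the harness (hypothetical demo methods mirroring measureWrong_1 and measureWrong_2): overwriting the accumulator leaves only the last work() result observable, which is precisely why a sufficiently smart compiler is allowed to throw the rest of the loop away.

```java
public class LoopPitfall {
    static final int BASE = 42;

    static int work(int x) {
        return BASE + x;
    }

    // Mirrors measureWrong_1: only the final work() result survives,
    // so all earlier calls are dead code from the compiler's viewpoint.
    static int lastOnly(int[] xs) {
        int acc = 0;
        for (int x : xs) acc = work(x);
        return acc;
    }

    // Mirrors measureWrong_2: every call contributes, but pipelining
    // can still merge adjacent iterations in a real benchmark.
    static int accumulated(int[] xs) {
        int acc = 0;
        for (int x : xs) acc += work(x);
        return acc;
    }
}
```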

35 Profilers

JMH ships with several very handy profilers that reveal the details of a benchmark run. They do not replace full-blown external profilers, but in many cases they let you drill into benchmark behavior quickly and conveniently, and fast feedback matters when you are iterating on the benchmark code itself. The sample walks through the output of many profilers; it is quite long, so please read it directly on GitHub.

36 Branch Prediction

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(5)
@State(Scope.Benchmark)
public class JMHSample_36_BranchPrediction {

    /*
     * This sample serves as a warning against regular data sets.
     *
     * It is very tempting to present a regular data set to benchmark, either due to
     * naive generation strategy, or just from feeling better about regular data sets.
     * Unfortunately, it frequently backfires: the regular datasets are known to be
     * optimized well by software and hardware. This example exploits one of these
     * optimizations: branch prediction.
     *
     * Imagine our benchmark selects the branch based on the array contents, as
     * we are streaming through it:
     */

    private static final int COUNT = 1024 * 1024;

    private byte[] sorted;
    private byte[] unsorted;

    @Setup
    public void setup() {
        sorted = new byte[COUNT];
        unsorted = new byte[COUNT];
        Random random = new Random(1234);
        random.nextBytes(sorted);
        random.nextBytes(unsorted);
        Arrays.sort(sorted);
    }

    @Benchmark
    @OperationsPerInvocation(COUNT)
    public void sorted(Blackhole bh1, Blackhole bh2) {
        for (byte v : sorted) {
            if (v > 0) {
                bh1.consume(v);
            } else {
                bh2.consume(v);
            }
        }
    }

    @Benchmark
    @OperationsPerInvocation(COUNT)
    public void unsorted(Blackhole bh1, Blackhole bh2) {
        for (byte v : unsorted) {
            if (v > 0) {
                bh1.consume(v);
            } else {
                bh2.consume(v);
            }
        }
    }

    /*
        There is a substantial difference in performance for these benchmarks!
        It is explained by good branch prediction in "sorted" case, and branch mispredicts in "unsorted"
        case. -prof perfnorm conveniently highlights that, with larger "branch-misses", and larger "CPI"
        for "unsorted" case:
        Benchmark                                                       Mode  Cnt   Score    Error  Units
        JMHSample_36_BranchPrediction.sorted                            avgt   25   2.160 ±  0.049  ns/op
        JMHSample_36_BranchPrediction.sorted:·CPI                       avgt    5   0.286 ±  0.025   #/op
        JMHSample_36_BranchPrediction.sorted:·branch-misses             avgt    5  ≈ 10⁻⁴            #/op
        JMHSample_36_BranchPrediction.sorted:·branches                  avgt    5   7.606 ±  1.742   #/op
        JMHSample_36_BranchPrediction.sorted:·cycles                    avgt    5   8.998 ±  1.081   #/op
        JMHSample_36_BranchPrediction.sorted:·instructions              avgt    5  31.442 ±  4.899   #/op
        JMHSample_36_BranchPrediction.unsorted                          avgt   25   5.943 ±  0.018  ns/op
        JMHSample_36_BranchPrediction.unsorted:·CPI                     avgt    5   0.775 ±  0.052   #/op
        JMHSample_36_BranchPrediction.unsorted:·branch-misses           avgt    5   0.529 ±  0.026   #/op  <--- OOPS
        JMHSample_36_BranchPrediction.unsorted:·branches                avgt    5   7.841 ±  0.046   #/op
        JMHSample_36_BranchPrediction.unsorted:·cycles                  avgt    5  24.793 ±  0.434   #/op
        JMHSample_36_BranchPrediction.unsorted:·instructions            avgt    5  31.994 ±  2.342   #/op
        It is an open question if you want to measure only one of these tests. In many cases, you have to measure
        both to get the proper best-case and worst-case estimate!
     */

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(".*" + JMHSample_36_BranchPrediction.class.getSimpleName() + ".*")
                .build();

        new Runner(opt).run();
    }

}

This sample is a warning against regular data sets, and it revolves around a hardware optimization: branch prediction. The problem typically creeps in because test data ends up very regular, whether from a naive generation strategy or simply because regular data "feels" nicer. Unfortunately this often backfires: regular data sets are known to be optimized well by both software and hardware, and branch prediction is one of those optimizations.

The code prepares two byte arrays filled with random values, one left unsorted and one sorted, and runs the same benchmark logic over each: iterate over the array and take one of two branches depending on whether the element is greater than 0. With the sorted array, the branch goes the same way on each side of the positive/negative boundary, so the CPU's branch predictor almost always guesses right; with the unsorted array the branch is essentially random, causing frequent mispredictions, which is why the unsorted case is markedly slower (note the much larger branch-misses and CPI figures in the perfnorm output above).
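One classic way to make such code robust against data regularity is to remove the data-dependent branch altogether and compute with the sign bit instead, so the cost no longer depends on whether the input is sorted. The sketch below is not part of the JMH samples; it is a hypothetical BranchlessSplit class illustrating the idea on the same kind of byte data (note it splits at v >= 0 rather than v > 0):

```java
// Hypothetical illustration, not from the JMH samples: a branch-free
// alternative to "if (v > 0) ... else ...", using the sign bit as a mask.
public class BranchlessSplit {

    // Sums non-negative and negative elements without a data-dependent branch.
    static long[] split(byte[] data) {
        long nonNeg = 0, neg = 0;
        for (byte v : data) {
            int mask = v >> 7;       // 0 when v >= 0, -1 when v < 0
            neg += v & mask;         // adds v only when v < 0
            nonNeg += v & ~mask;     // adds v only when v >= 0
        }
        return new long[]{nonNeg, neg};
    }

    public static void main(String[] args) {
        long[] r = split(new byte[]{5, -3, 7, 0, -8});
        System.out.println(r[0] + " " + r[1]); // prints "12 -11"
    }
}
```

Whether this actually beats the branching version is itself an empirical question that depends on the JIT and the hardware; sorted versus unsorted input would be a natural @Param axis for a JMH benchmark of it.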

37 Cache Access

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(5)
@State(Scope.Benchmark)
public class JMHSample_37_CacheAccess {

    /*
     * This sample serves as a warning against subtle differences in cache access patterns.
     *
     * Many performance differences may be explained by the way tests are accessing memory.
     * In the example below, we walk the matrix either row-first, or col-first:
     */

    private final static int COUNT = 4096;
    private final static int MATRIX_SIZE = COUNT * COUNT;

    private int[][] matrix;

    @Setup
    public void setup() {
        matrix = new int[COUNT][COUNT];
        Random random = new Random(1234);
        for (int i = 0; i < COUNT; i++) {
            for (int j = 0; j < COUNT; j++) {
                matrix[i][j] = random.nextInt();
            }
        }
    }

    @Benchmark
    @OperationsPerInvocation(MATRIX_SIZE)
    public void colFirst(Blackhole bh) {
        for (int c = 0; c < COUNT; c++) {
            for (int r = 0; r < COUNT; r++) {
                bh.consume(matrix[r][c]);
            }
        }
    }

    @Benchmark
    @OperationsPerInvocation(MATRIX_SIZE)
    public void rowFirst(Blackhole bh) {
        for (int r = 0; r < COUNT; r++) {
            for (int c = 0; c < COUNT; c++) {
                bh.consume(matrix[r][c]);
            }
        }
    }

    /*
        Notably, colFirst accesses are much slower, and that's not a surprise: Java's multidimensional
        arrays are actually rigged, being one-dimensional arrays of one-dimensional arrays. Therefore,
        pulling n-th element from each of the inner array induces more cache misses, when matrix is large.
        -prof perfnorm conveniently highlights that, with >2 cache misses per one benchmark op:
        Benchmark                                                 Mode  Cnt   Score    Error  Units
        JMHSample_37_MatrixCopy.colFirst                          avgt   25   5.306 ±  0.020  ns/op
        JMHSample_37_MatrixCopy.colFirst:·CPI                     avgt    5   0.621 ±  0.011   #/op
        JMHSample_37_MatrixCopy.colFirst:·L1-dcache-load-misses   avgt    5   2.177 ±  0.044   #/op <-- OOPS
        JMHSample_37_MatrixCopy.colFirst:·L1-dcache-loads         avgt    5  14.804 ±  0.261   #/op
        JMHSample_37_MatrixCopy.colFirst:·LLC-loads               avgt    5   2.165 ±  0.091   #/op
        JMHSample_37_MatrixCopy.colFirst:·cycles                  avgt    5  22.272 ±  0.372   #/op
        JMHSample_37_MatrixCopy.colFirst:·instructions            avgt    5  35.888 ±  1.215   #/op
        JMHSample_37_MatrixCopy.rowFirst                          avgt   25   2.662 ±  0.003  ns/op
        JMHSample_37_MatrixCopy.rowFirst:·CPI                     avgt    5   0.312 ±  0.003   #/op
        JMHSample_37_MatrixCopy.rowFirst:·L1-dcache-load-misses   avgt    5   0.066 ±  0.001   #/op
        JMHSample_37_MatrixCopy.rowFirst:·L1-dcache-loads         avgt    5  14.570 ±  0.400   #/op
        JMHSample_37_MatrixCopy.rowFirst:·LLC-loads               avgt    5   0.002 ±  0.001   #/op
        JMHSample_37_MatrixCopy.rowFirst:·cycles                  avgt    5  11.046 ±  0.343   #/op
        JMHSample_37_MatrixCopy.rowFirst:·instructions            avgt    5  35.416 ±  1.248   #/op
        So, when comparing two different benchmarks, you have to follow up if the difference is caused
        by the memory locality issues.
     */

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(".*" + JMHSample_37_CacheAccess.class.getSimpleName() + ".*")
                .build();

        new Runner(opt).run();
    }

}

This sample is not about JMH itself; it warns about subtle differences in cache access patterns, since many performance differences can be explained by how a test accesses memory. The sample makes the point by traversing a matrix: the two benchmark methods walk it row-first and column-first respectively, and the column-first traversal turns out to be noticeably slower. That is no surprise once you recall that Java's multidimensional arrays are really one-dimensional arrays of one-dimensional arrays: pulling the n-th element out of every inner array in turn causes many more cache misses once the matrix is large, as the L1-dcache-load-misses figures above show.
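Because each row of a Java int[][] is a separate heap object, a common locality fix is to flatten the matrix into a single int[] indexed as r * cols + c, so that a row-first walk touches memory strictly sequentially. The sketch below is a hypothetical FlatMatrix helper, not part of the JMH samples:

```java
// Hypothetical sketch, not from the JMH samples: flattening a 2D array
// into row-major 1D storage for better cache locality.
public class FlatMatrix {

    // Copies m into a single contiguous array, row by row.
    static int[] flatten(int[][] m) {
        int rows = m.length, cols = m[0].length;
        int[] flat = new int[rows * cols];
        for (int r = 0; r < rows; r++) {
            System.arraycopy(m[r], 0, flat, r * cols, cols);
        }
        return flat;
    }

    // Element (r, c) lives at index r * cols + c.
    static int get(int[] flat, int cols, int r, int c) {
        return flat[r * cols + c];
    }

    public static void main(String[] args) {
        int[][] m = {{1, 2}, {3, 4}};
        int[] flat = flatten(m);
        System.out.println(get(flat, 2, 1, 0)); // prints 3
    }
}
```

A column-first walk over the flat array would still stride by cols, so flattening helps most when you can also choose the traversal order; a JMH comparison against the int[][] version would show how much of the gap it closes.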

38 Per Invoke Setup

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(5)
public class JMHSample_38_PerInvokeSetup {

    /*
     * This example highlights the usual mistake in non-steady-state benchmarks.
     *
     * Suppose we want to test how long it takes to bubble sort an array. Naively,
     * we could make the test that populates an array with random (unsorted) values,
     * and calls sort on it over and over again:
     */

    private void bubbleSort(byte[] b) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int c = 0; c < b.length - 1; c++) {
                if (b[c] > b[c + 1]) {
                    byte t = b[c];
                    b[c] = b[c + 1];
                    b[c + 1] = t;
                    changed = true;
                }
            }
        }
    }

    // Could be an implicit State instead, but we are going to use it
    // as the dependency in one of the tests below
    @State(Scope.Benchmark)
    public static class Data {
        @Param({"1", "16", "256"})
        int count;

        byte[] arr;

        @Setup
        public void setup() {
            arr = new byte[count];
            Random random = new Random(1234);
            random.nextBytes(arr);
        }
    }

    @Benchmark
    public byte[] measureWrong(Data d) {
        bubbleSort(d.arr);
        return d.arr;
    }

    /*
     * The method above is subtly wrong: it sorts the random array on the first invocation
     * only. Every subsequent call will "sort" the already sorted array. With bubble sort,
     * that operation would be significantly faster!
     *
     * This is how we might *try* to measure it right by making a copy in Level.Invocation
     * setup. However, this is susceptible to the problems described in Level.Invocation
     * Javadocs, READ AND UNDERSTAND THOSE DOCS BEFORE USING THIS APPROACH.
     */

    @State(Scope.Thread)
    public static class DataCopy {
        byte[] copy;

        @Setup(Level.Invocation)
        public void setup2(Data d) {
            copy = Arrays.copyOf(d.arr, d.arr.length);
        }
    }

    @Benchmark
    public byte[] measureNeutral(DataCopy d) {
        bubbleSort(d.copy);
        return d.copy;
    }

    /*
     * In an overwhelming majority of cases, the only sensible thing to do is to suck up
     * the per-invocation setup costs into a benchmark itself. This work well in practice,
     * especially when the payload costs dominate the setup costs.
     */

    @Benchmark
    public byte[] measureRight(Data d) {
        byte[] c = Arrays.copyOf(d.arr, d.arr.length);
        bubbleSort(c);
        return c;
    }

    /*
        Benchmark                                   (count)  Mode  Cnt      Score     Error  Units
        JMHSample_38_PerInvokeSetup.measureWrong          1  avgt   25      2.408 ±   0.011  ns/op
        JMHSample_38_PerInvokeSetup.measureWrong         16  avgt   25      8.286 ±   0.023  ns/op
        JMHSample_38_PerInvokeSetup.measureWrong        256  avgt   25     73.405 ±   0.018  ns/op
        JMHSample_38_PerInvokeSetup.measureNeutral        1  avgt   25     15.835 ±   0.470  ns/op
        JMHSample_38_PerInvokeSetup.measureNeutral       16  avgt   25    112.552 ±   0.787  ns/op
        JMHSample_38_PerInvokeSetup.measureNeutral      256  avgt   25  58343.848 ± 991.202  ns/op
        JMHSample_38_PerInvokeSetup.measureRight          1  avgt   25      6.075 ±   0.018  ns/op
        JMHSample_38_PerInvokeSetup.measureRight         16  avgt   25    102.390 ±   0.676  ns/op
        JMHSample_38_PerInvokeSetup.measureRight        256  avgt   25  58812.411 ± 997.951  ns/op
        We can clearly see that "measureWrong" provides a very weird result: it "sorts" way too fast.
        "measureNeutral" is neither good or bad: while it prepares the data for each invocation correctly,
        the timing overheads are clearly visible. These overheads can be overwhelming, depending on
        the thread count and/or OS flavor.
     */


    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(".*" + JMHSample_38_PerInvokeSetup.class.getSimpleName() + ".*")
                .build();

        new Runner(opt).run();
    }

}

This last sample uses bubble sort as its example. Since sorting mutates the array in place and bubble sort's running time depends on how sorted the input already is, repeatedly sorting the same array is not a steady-state operation. The measureWrong method, which simply sorts the same array in a loop, is clearly wrong: only the first invocation sorts random data, and every subsequent call "sorts" an already sorted array, which bubble sort handles much faster. The obvious next idea is a Level.Invocation setup that copies the array before each invocation; that is logically correct, but Level.Invocation must be used with great care, as earlier samples also noted, and you must read and understand its javadoc before adopting it, since its timing overheads can be overwhelming. The final measureRight method simply copies the array inside the benchmark method itself; although that looks inelegant, it is the practical best choice whenever the payload cost dominates the setup cost. The sample closes with the measured results comparing the three approaches.
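The reason measureWrong looks so fast is easy to demonstrate outside JMH: with the early-exit flag, bubble sort needs only a single pass over input that is already sorted. The sketch below is a hypothetical BubbleSortPasses class (not from the samples) that mirrors the sample's bubbleSort but counts passes:

```java
// Hypothetical sketch, not from the JMH samples: the sample's bubbleSort
// with a pass counter, showing why re-sorting sorted data is so cheap.
public class BubbleSortPasses {

    // Sorts b in place and returns how many passes the early-exit loop made.
    static int sortCountingPasses(byte[] b) {
        int passes = 0;
        boolean changed = true;
        while (changed) {
            changed = false;
            passes++;
            for (int c = 0; c < b.length - 1; c++) {
                if (b[c] > b[c + 1]) {
                    byte t = b[c];
                    b[c] = b[c + 1];
                    b[c + 1] = t;
                    changed = true;
                }
            }
        }
        return passes;
    }

    public static void main(String[] args) {
        System.out.println(sortCountingPasses(new byte[]{9, 1, 7, 3})); // prints 3
        System.out.println(sortCountingPasses(new byte[]{1, 3, 7, 9})); // prints 1
    }
}
```

This is exactly the non-steady-state trap: after the first benchmark invocation, measureWrong only ever pays the single-pass cost.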
