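A couple of days ago, while browsing blogs, I came across a post claiming that ReentrantLock is slower than synchronized. That contradicted what I thought I knew, so I read the post and its test code closely, found that the test was not rigorous, and pointed out the problems in the comments. The result: the author simply deleted the post. Deleted it...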

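Many old-school programmers hold the stereotype that synchronized performs poorly, and so they strongly recommend the lock classes in the java.util.concurrent package instead. Ask them how large the performance gap between synchronized and those locks actually is, though, and I doubt many could answer. By now you probably want to see my test results too. synchronized and ReentrantLock provide much the same functionality and their use cases overlap heavily, so let's simply measure the performance difference between the two.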

Benchmark results

Test platform: JDK 11, MacBook Pro (13-inch, 2017), measured with JMH

The test code is as follows:
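    import java.util.concurrent.locks.ReentrantLock;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Fork;
    import org.openjdk.jmh.annotations.Measurement;
    import org.openjdk.jmh.annotations.Threads;
    import org.openjdk.jmh.annotations.Warmup;
    import org.openjdk.jmh.runner.Runner;
    import org.openjdk.jmh.runner.RunnerException;
    import org.openjdk.jmh.runner.options.Options;
    import org.openjdk.jmh.runner.options.OptionsBuilder;

    public class LockTest {

        // Shared state: every benchmark thread contends on the same lock and counter.
        private static Object lock = new Object();
        private static ReentrantLock reentrantLock = new ReentrantLock();
        private static long cnt = 0;

        // Baseline: the same work with no locking at all (racy by design).
        @Benchmark
        @Measurement(iterations = 2)
        @Threads(10)
        @Fork(0)
        @Warmup(iterations = 5, time = 10)
        public void testWithoutLock() {
            doSomething();
        }

        @Benchmark
        @Measurement(iterations = 2)
        @Threads(10)
        @Fork(0)
        @Warmup(iterations = 5, time = 10)
        public void testReentrantLock() {
            // doSomething() cannot throw here; in production code,
            // unlock() belongs in a finally block.
            reentrantLock.lock();
            doSomething();
            reentrantLock.unlock();
        }

        @Benchmark
        @Measurement(iterations = 2)
        @Threads(10)
        @Fork(0)
        @Warmup(iterations = 5, time = 10)
        public void testSynchronized() {
            synchronized (lock) {
                doSomething();
            }
        }

        // Trivial critical section: bump the counter and wrap it before it overflows.
        private void doSomething() {
            cnt += 1;
            if (cnt >= (Long.MAX_VALUE >> 1)) {
                cnt = 0;
            }
        }

        public static void main(String[] args) throws RunnerException {
            Options options = new OptionsBuilder()
                    .include(LockTest.class.getSimpleName())
                    .build();
            // Let failures propagate instead of swallowing them silently.
            new Runner(options).run();
        }
    }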

    Benchmark                    Mode  Cnt          Score   Error  Units
    LockTest.testReentrantLock  thrpt    2   32283819.289          ops/s
    LockTest.testSynchronized   thrpt    2   25325244.320          ops/s
    LockTest.testWithoutLock    thrpt    2  641215542.492          ops/s

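Sure enough, synchronized is slower, but only by about 20%. I was rather surprised the first time I ran the test: I knew synchronized would lose, but the gap of several orders of magnitude that I had expected never showed up. So I raised the @Threads count to increase the likelihood of contention between threads, and got the following results.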

    Benchmark                    Mode  Cnt          Score   Error  Units
    LockTest.testReentrantLock  thrpt    2   29464798.051          ops/s
    LockTest.testSynchronized   thrpt    2   22346035.066          ops/s
    LockTest.testWithoutLock    thrpt    2  383047064.795          ops/s

The gap widens a little, but the two are still in the same order of magnitude.
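As a quick check on those percentages, the relative gap can be computed directly from the two result tables. This is only a small sketch: the class name LockGap and its helper are made up for illustration, and the only real data are the scores copied from the JMH output.

```java
// Hypothetical helper for illustration; only the scores come from the tables in this post.
class LockGap {
    /** Fraction by which `slower` trails `faster`; 0.2 means 20% lower throughput. */
    static double relativeGap(double faster, double slower) {
        return (faster - slower) / faster;
    }

    public static void main(String[] args) {
        // First run: @Threads(10)
        double run1 = relativeGap(32283819.289, 25325244.320);
        // Second run: larger thread count
        double run2 = relativeGap(29464798.051, 22346035.066);
        System.out.printf("run 1: synchronized is %.1f%% slower%n", run1 * 100);
        System.out.printf("run 2: synchronized is %.1f%% slower%n", run2 * 100);
    }
}
```

That works out to roughly 21.6% for the first run and 24.2% for the second. Keep in mind that each figure rests on only two measurement iterations with no error estimate.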

Conclusion

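There is no denying it: in this test synchronized really is 20%-30% slower than ReentrantLock. Does that mean every synchronized in your code should be replaced with a lock? Not at all. Think about it: ReentrantLock can stand in for synchronized in almost every scenario, and it performs better, so why has the JDK kept this keyword all along, with no sign of wanting to deprecate it?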

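Hegel is often quoted as saying that what exists is reasonable. synchronized was born for multithreading, and its existence has greatly simplified multithreaded development in Java. Its strength is exactly that it is simple to use: you never have to lock and unlock explicitly. ReentrantLock is far more tedious by comparison. After acquiring the lock you must make sure it is released on every possible path, and one moment of carelessness plants a bug.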
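To make the difference concrete, here is a minimal sketch of the same counter guarded both ways. The class and method names are invented for this example and do not come from the benchmark.

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch; the class and method names are hypothetical.
class Counters {
    private final Object monitor = new Object();
    private final ReentrantLock lock = new ReentrantLock();
    private long syncCount = 0;
    private long lockCount = 0;

    // synchronized: the monitor is released automatically, even if the body throws.
    void incrementSynchronized() {
        synchronized (monitor) {
            syncCount++;
        }
    }

    // ReentrantLock: unlock() must be called explicitly, so it goes in finally.
    void incrementWithLock() {
        lock.lock();
        try {
            lockCount++;
        } finally {
            lock.unlock();
        }
    }

    long syncCount() {
        synchronized (monitor) {
            return syncCount;
        }
    }

    long lockCount() {
        lock.lock();
        try {
            return lockCount;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Counters c = new Counters();
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                c.incrementSynchronized();
                c.incrementWithLock();
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(c.syncCount() + " " + c.lockCount()); // 200000 200000
    }
}
```

Forgetting the try/finally in incrementWithLock would still compile and usually still work, which is exactly why this kind of bug is easy to plant.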

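But behind that tedium, ReentrantLock offers a richer API that can handle more, and more complex, requirements. For details, see my earlier post, ReentrantLock源码解析 (ReentrantLock source code analysis).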
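One example of what that richer API buys you is a timed acquisition, which synchronized cannot express. The sketch below is only an illustration: TryLockDemo and runIfAvailable are hypothetical names, while tryLock(long, TimeUnit) and the fairness constructor argument are standard parts of java.util.concurrent.locks.ReentrantLock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch; the class and helper names are hypothetical.
class TryLockDemo {
    /** Runs the task only if the lock is acquired within timeoutMillis; returns whether it ran. */
    static boolean runIfAvailable(ReentrantLock lock, long timeoutMillis, Runnable task)
            throws InterruptedException {
        if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false; // give up instead of blocking indefinitely
        }
        try {
            task.run();
            return true;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ReentrantLock lock = new ReentrantLock(true); // true requests fair (FIFO) ordering
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // A second thread takes the lock and holds it until told to let go.
        Thread holder = new Thread(() -> {
            lock.lock();
            try {
                held.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });
        holder.start();
        held.await();

        boolean whileHeld = runIfAvailable(lock, 50, () -> { });
        release.countDown();
        holder.join();
        boolean afterRelease = runIfAvailable(lock, 50, () -> { });

        System.out.println(whileHeld + " " + afterRelease); // false true
    }
}
```

Interruptible acquisition (lockInterruptibly) and multiple Condition objects per lock are other features in the same vein.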

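Today the performance gap between synchronized and ReentrantLock is no longer the main factor in choosing between them. What you should weigh instead is ease of use, functionality, and the maintainability of your code. A 30% difference in lock performance decides very little; if you really want to optimize your code, look for other angles of attack rather than haggling over this one. Don't pick up the sesame seed and drop the watermelon.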

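The article should have ended here, but I was still curious why synchronized left old-school Java programmers with the impression of poor performance. Unfortunately, material on JDK 1.5 and earlier is old and hard to find; what JDK 1.6 did to improve synchronized performance, however, is easy to find.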

What did the JDK optimize in synchronized?

When you wrap a block of code in synchronized, the Java compiler inserts a monitorenter instruction before it and a monitorexit instruction after it, as shown below:

    void onlyMe(Foo f) {
        synchronized(f) {
            doSomething();
        }
    }

After compilation:

    Method void onlyMe(Foo)
    0   aload_1             // Push f
    1   dup                 // Duplicate it on the stack
    2   astore_2            // Store duplicate in local variable 2
    3   monitorenter        // Enter the monitor associated with f
    4   aload_0             // Holding the monitor, pass this and...
    5   invokevirtual #5    // ...call Example.doSomething()V
    8   aload_2             // Push local variable 2 (f)
    9   monitorexit         // Exit the monitor associated with f
    10  goto 18             // Complete the method normally
    13  astore_3            // In case of any throw, end up here
    14  aload_2             // Push local variable 2 (f)
    15  monitorexit         // Be sure to exit the monitor!
    16  aload_3             // Push thrown value...
    17  athrow              // ...and rethrow value to the invoker
    18  return              // Return in the normal case
    Exception table:
    From    To      Target      Type
    4       10      13          any
    13      16      13          any
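The exception table in that listing is what guarantees the monitor is released when the body throws. A small sketch, assuming nothing beyond the standard Thread.holdsLock method, can confirm the behaviour from the Java side; the class and method names are invented for this example.

```java
// Illustrative sketch; the class and method names are hypothetical.
class MonitorReleaseDemo {
    private static final Object LOCK = new Object();

    /** Throws from inside a synchronized block, then reports whether the monitor is still held. */
    static boolean stillHeldAfterThrow() {
        try {
            synchronized (LOCK) {
                if (!Thread.holdsLock(LOCK)) {
                    throw new AssertionError("monitor should be held inside the block");
                }
                throw new IllegalStateException("boom");
            }
        } catch (IllegalStateException e) {
            // The compiler-generated handler (the counterpart of offsets 13-17 in the
            // listing) has already executed monitorexit before control arrives here.
            return Thread.holdsLock(LOCK);
        }
    }

    public static void main(String[] args) {
        System.out.println(stillHeldAfterThrow()); // false
    }
}
```

With ReentrantLock there is no such compiler-generated handler, which is why the unlock call has to sit in a finally block.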

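The cost of acquiring and releasing the lock shows up in the monitorenter and monitorexit instructions, so any performance optimization must have gone into how those two instructions are executed. According to 《Java并发编程的艺术》 (The Art of Java Concurrency Programming), Java 6 introduced a lock escalation strategy to reduce the cost of acquiring and releasing locks. A lock can be in one of four states: unlocked, biased, lightweight, and heavyweight, each more expensive than the one before. Fortunately, thanks to locality, a biased or lightweight lock is enough in most concurrent situations, and a lock is only escalated under heavy contention, so in most cases synchronized does not perform badly.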

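Finally, I dug the x86 implementation of monitorenter and monitorexit out of the jdk11u source (the assembly instructions are platform-specific) and present it here. Anyone interested is welcome to study it.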

    //-----------------------------------------------------------------------------
    // Synchronization
    //
    // Note: monitorenter & exit are symmetric routines; which is reflected
    //       in the assembly code structure as well
    //
    // Stack layout:
    //
    // [expressions  ] <--- rsp               = expression stack top
    // ..
    // [expressions  ]
    // [monitor entry] <--- monitor block top = expression stack bot
    // ..
    // [monitor entry]
    // [frame data   ] <--- monitor block bot
    // ...
    // [saved rbp    ] <--- rbp
    void TemplateTable::monitorenter() {
      transition(atos, vtos);

      // check for NULL object
      __ null_check(rax);

      const Address monitor_block_top(
            rbp, frame::interpreter_frame_monitor_block_top_offset * wordSize);
      const Address monitor_block_bot(
            rbp, frame::interpreter_frame_initial_sp_offset * wordSize);
      const int entry_size = frame::interpreter_frame_monitor_size() * wordSize;

      Label allocated;

      Register rtop = LP64_ONLY(c_rarg3) NOT_LP64(rcx);
      Register rbot = LP64_ONLY(c_rarg2) NOT_LP64(rbx);
      Register rmon = LP64_ONLY(c_rarg1) NOT_LP64(rdx);

      // initialize entry pointer
      __ xorl(rmon, rmon); // points to free slot or NULL

      // find a free slot in the monitor block (result in rmon)
      {
        Label entry, loop, exit;
        __ movptr(rtop, monitor_block_top); // points to current entry,
                                            // starting with top-most entry
        __ lea(rbot, monitor_block_bot);    // points to word before bottom
                                            // of monitor block
        __ jmpb(entry);

        __ bind(loop);
        // check if current entry is used
        __ cmpptr(Address(rtop, BasicObjectLock::obj_offset_in_bytes()), (int32_t) NULL_WORD);
        // if not used then remember entry in rmon
        __ cmovptr(Assembler::equal, rmon, rtop);   // cmov => cmovptr
        // check if current entry is for same object
        __ cmpptr(rax, Address(rtop, BasicObjectLock::obj_offset_in_bytes()));
        // if same object then stop searching
        __ jccb(Assembler::equal, exit);
        // otherwise advance to next entry
        __ addptr(rtop, entry_size);
        __ bind(entry);
        // check if bottom reached
        __ cmpptr(rtop, rbot);
        // if not at bottom then check this entry
        __ jcc(Assembler::notEqual, loop);
        __ bind(exit);
      }

      __ testptr(rmon, rmon); // check if a slot has been found
      __ jcc(Assembler::notZero, allocated); // if found, continue with that one

      // allocate one if there's no free slot
      {
        Label entry, loop;
        // 1. compute new pointers          // rsp: old expression stack top
        __ movptr(rmon, monitor_block_bot); // rmon: old expression stack bottom
        __ subptr(rsp, entry_size);         // move expression stack top
        __ subptr(rmon, entry_size);        // move expression stack bottom
        __ mov(rtop, rsp);                  // set start value for copy loop
        __ movptr(monitor_block_bot, rmon); // set new monitor block bottom
        __ jmp(entry);
        // 2. move expression stack contents
        __ bind(loop);
        __ movptr(rbot, Address(rtop, entry_size)); // load expression stack
                                                    // word from old location
        __ movptr(Address(rtop, 0), rbot);          // and store it at new location
        __ addptr(rtop, wordSize);                  // advance to next word
        __ bind(entry);
        __ cmpptr(rtop, rmon);                      // check if bottom reached
        __ jcc(Assembler::notEqual, loop);          // if not at bottom then
                                                    // copy next word
      }

      // call run-time routine
      // rmon: points to monitor entry
      __ bind(allocated);

      // Increment bcp to point to the next bytecode, so exception
      // handling for async. exceptions work correctly.
      // The object has already been poped from the stack, so the
      // expression stack looks correct.
      __ increment(rbcp);

      // store object
      __ movptr(Address(rmon, BasicObjectLock::obj_offset_in_bytes()), rax);
      __ lock_object(rmon);

      // check to make sure this monitor doesn't cause stack overflow after locking
      __ save_bcp();  // in case of exception
      __ generate_stack_overflow_check(0);

      // The bcp has already been incremented. Just need to dispatch to
      // next instruction.
      __ dispatch_next(vtos);
    }

    void TemplateTable::monitorexit() {
      transition(atos, vtos);

      // check for NULL object
      __ null_check(rax);

      const Address monitor_block_top(
            rbp, frame::interpreter_frame_monitor_block_top_offset * wordSize);
      const Address monitor_block_bot(
            rbp, frame::interpreter_frame_initial_sp_offset * wordSize);
      const int entry_size = frame::interpreter_frame_monitor_size() * wordSize;

      Register rtop = LP64_ONLY(c_rarg1) NOT_LP64(rdx);
      Register rbot = LP64_ONLY(c_rarg2) NOT_LP64(rbx);

      Label found;

      // find matching slot
      {
        Label entry, loop;
        __ movptr(rtop, monitor_block_top); // points to current entry,
                                            // starting with top-most entry
        __ lea(rbot, monitor_block_bot);    // points to word before bottom
                                            // of monitor block
        __ jmpb(entry);

        __ bind(loop);
        // check if current entry is for same object
        __ cmpptr(rax, Address(rtop, BasicObjectLock::obj_offset_in_bytes()));
        // if same object then stop searching
        __ jcc(Assembler::equal, found);
        // otherwise advance to next entry
        __ addptr(rtop, entry_size);
        __ bind(entry);
        // check if bottom reached
        __ cmpptr(rtop, rbot);
        // if not at bottom then check this entry
        __ jcc(Assembler::notEqual, loop);
      }
      // ... (the excerpt is cut off here)

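References

  1. Java Virtual Machine Specification, 3.14. Synchronization
  2. 《Java并发编程的艺术》 (The Art of Java Concurrency Programming), 2.2: the implementation principles and application of synchronized

This article comes from https://blog.csdn.net/xindoo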