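A couple of days ago, while browsing blogs, I came across a post claiming that ReentrantLock is slower than synchronized. That contradicted everything I knew, so I read the post and its test code closely, found that the test was not rigorous, and politely pointed out the problem in the comments. The author then simply deleted the post. Deleted it...
Many veteran programmers hold the stereotype that synchronized performs poorly, and so they strongly recommend the lock classes in java.util.concurrent.locks instead. Ask them how large the performance gap between synchronized and a lock actually is, though, and I'd guess few could answer. By now you probably want to see my results too. synchronized and ReentrantLock offer much the same functionality and their use cases overlap heavily, so let's just measure the performance difference between the two.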
Benchmark results
Test platform: JDK 11, MacBook Pro (13-inch, 2017), benchmarked with JMH
The test code is as follows:
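import java.util.concurrent.locks.ReentrantLock;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

// Note: @Fork(0) runs the benchmarks inside the launching JVM; JMH recommends forking for reliable numbers.
public class LockTest {
    private static final Object lock = new Object();
    private static final ReentrantLock reentrantLock = new ReentrantLock();
    private static long cnt = 0;

    // Baseline: no synchronization at all, so the counter update is racy on purpose.
    @Benchmark
    @Measurement(iterations = 2)
    @Threads(10)
    @Fork(0)
    @Warmup(iterations = 5, time = 10)
    public void testWithoutLock() {
        doSomething();
    }

    // doSomething() cannot throw, so the benchmark skips try/finally;
    // production code should always call unlock() in a finally block.
    @Benchmark
    @Measurement(iterations = 2)
    @Threads(10)
    @Fork(0)
    @Warmup(iterations = 5, time = 10)
    public void testReentrantLock() {
        reentrantLock.lock();
        doSomething();
        reentrantLock.unlock();
    }

    @Benchmark
    @Measurement(iterations = 2)
    @Threads(10)
    @Fork(0)
    @Warmup(iterations = 5, time = 10)
    public void testSynchronized() {
        synchronized (lock) {
            doSomething();
        }
    }

    // Shared work: bump the counter and reset it long before it could overflow.
    private void doSomething() {
        cnt += 1;
        if (cnt >= (Long.MAX_VALUE >> 1)) {
            cnt = 0;
        }
    }

    // Let a failed run surface instead of swallowing the exception.
    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include(LockTest.class.getSimpleName())
                .build();
        new Runner(options).run();
    }
}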
Benchmark                    Mode  Cnt          Score  Error  Units
LockTest.testReentrantLock  thrpt    2   32283819.289         ops/s
LockTest.testSynchronized   thrpt    2   25325244.320         ops/s
LockTest.testWithoutLock    thrpt    2  641215542.492         ops/s
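Sure enough, synchronized is slower, but only by about 20%. I was quite surprised the first time I ran this: I knew synchronized would lose, but the gap of several orders of magnitude that I had expected never showed up. So I raised the @Threads count to increase contention among the threads, and got the following results.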
Benchmark                    Mode  Cnt          Score  Error  Units
LockTest.testReentrantLock  thrpt    2   29464798.051         ops/s
LockTest.testSynchronized   thrpt    2   22346035.066         ops/s
LockTest.testWithoutLock    thrpt    2  383047064.795         ops/s
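The gap widens slightly, but the two are still in the same order of magnitude.
Conclusion
There is no doubt that, in this test, synchronized is about 20%-30% slower than ReentrantLock. Does that mean every use of synchronized should be replaced with a lock? Not at all. Think about it: ReentrantLock can replace synchronized in almost every scenario and performs better, so why has the JDK kept the keyword all along, with no sign of ever wanting to remove it?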
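Hegel said that what exists is reasonable. synchronized was born to serve multithreading, and its existence greatly simplifies multithreaded development in Java. Its advantage is simplicity: you never have to lock and unlock explicitly. ReentrantLock is far more tedious by comparison. After acquiring the lock you have to make sure it is released on every code path, and one slip plants a bug.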
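To make the contrast concrete, here is a minimal sketch of the same counter guarded both ways. The class and method names (LockIdiomDemo, incrementWithSynchronized, incrementWithLock, runConcurrently) are invented for illustration and are not part of the benchmark above; the point is only the try/finally shape that ReentrantLock needs.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockIdiomDemo {
    private final Object monitor = new Object();
    private final ReentrantLock lock = new ReentrantLock();
    private long syncCount = 0;
    private long lockCount = 0;

    // synchronized: the monitor is released on every exit path, exceptions included.
    public void incrementWithSynchronized() {
        synchronized (monitor) {
            syncCount++;
        }
    }

    // ReentrantLock: unlock() belongs in finally, otherwise an exception leaks the lock.
    public void incrementWithLock() {
        lock.lock();
        try {
            lockCount++;
        } finally {
            lock.unlock();
        }
    }

    public long syncCount() {
        synchronized (monitor) {
            return syncCount;
        }
    }

    public long lockCount() {
        lock.lock();
        try {
            return lockCount;
        } finally {
            lock.unlock();
        }
    }

    // Hypothetical helper: hammers both counters from several threads.
    public static LockIdiomDemo runConcurrently(int threads, int perThread) {
        LockIdiomDemo demo = new LockIdiomDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    demo.incrementWithSynchronized();
                    demo.incrementWithLock();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return demo;
    }

    public static void main(String[] args) {
        LockIdiomDemo demo = runConcurrently(4, 10_000);
        System.out.println("synchronized count:  " + demo.syncCount());
        System.out.println("ReentrantLock count: " + demo.lockCount());
    }
}
```

Both counters should end up equal to threads × perThread. The ReentrantLock version only stays correct as long as nobody forgets the finally block.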
But behind that tedium, ReentrantLock also offers a richer API that can handle more complex requirements. For details, see my earlier post, ReentrantLock source code analysis.
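As one example of what that richer API buys you, the sketch below uses tryLock with a timeout, which has no counterpart in synchronized. The class TryLockDemo and its helpers runIfAvailable and tryWhileHeldByOtherThread are hypothetical names made up for this illustration; only ReentrantLock, CountDownLatch and TimeUnit come from the JDK.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class TryLockDemo {
    // Runs the action only if the lock can be acquired within the timeout;
    // returns false instead of blocking forever.
    public static boolean runIfAvailable(ReentrantLock lock, long timeoutMillis, Runnable action) {
        boolean acquired;
        try {
            acquired = lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        if (!acquired) {
            return false;
        }
        try {
            action.run();
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Hypothetical scenario: another thread holds the lock while we try to take it.
    public static boolean tryWhileHeldByOtherThread(long timeoutMillis) {
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            lock.lock();
            try {
                held.countDown();   // signal that the lock is now taken
                release.await();    // keep holding it until told to let go
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });
        holder.start();
        try {
            held.await();
            return runIfAvailable(lock, timeoutMillis, () -> { });
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown();
            try {
                holder.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("free lock acquired: " + runIfAvailable(new ReentrantLock(), 50, () -> { }));
        System.out.println("held lock acquired: " + tryWhileHeldByOtherThread(50));
        System.out.println("fair lock requested: " + new ReentrantLock(true).isFair());
    }
}
```

With synchronized, the second call would simply block until the holder released the monitor. The constructor flag for fairness, lockInterruptibly() and Condition objects are further features of the same kind.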
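Today the performance difference between synchronized and ReentrantLock is no longer the main factor in choosing between them. What you should weigh instead is ease of use, functionality, and code maintainability. A 30% difference between the two decides very little; if you really want to optimize your code's performance, look for other places to start rather than haggling over this one. Don't pick up the sesame seeds and drop the watermelon.
The article should have ended here, but I was still curious why synchronized left veteran Java programmers with an impression of poor performance. Unfortunately, material on JDK 1.5 and earlier is old and hard to find, but what JDK 1.6 did to improve synchronized is easy to look up.
What did the JDK optimize in synchronized?
When you wrap a block of code in synchronized, the Java compiler inserts monitorenter and monitorexit instructions before and after it, as follows: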
void onlyMe(Foo f) {
    synchronized(f) {
        doSomething();
    }
}
After compilation:
Method void onlyMe(Foo)
0 aload_1 // Push f
1 dup // Duplicate it on the stack
2 astore_2 // Store duplicate in local variable 2
3 monitorenter // Enter the monitor associated with f
4 aload_0 // Holding the monitor, pass this and...
5 invokevirtual #5 // ...call Example.doSomething()V
8 aload_2 // Push local variable 2 (f)
9 monitorexit // Exit the monitor associated with f
10 goto 18 // Complete the method normally
13 astore_3 // In case of any throw, end up here
14 aload_2 // Push local variable 2 (f)
15 monitorexit // Be sure to exit the monitor!
16 aload_3 // Push thrown value...
17 athrow // ...and rethrow value to the invoker
18 return // Return in the normal case
Exception table:
From To Target Type
4 10 13 any
13 16 13 any
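The cost of acquiring and releasing the lock shows up in the monitorenter and monitorexit instructions, so any performance optimization must have targeted these two instructions. According to The Art of Java Concurrency Programming (《Java 并发编程的艺术》), Java 6 introduced lock escalation to reduce the cost of acquiring and releasing locks. A lock can be in one of four states: unlocked, biased, lightweight, and heavyweight, with performance decreasing in that order. Fortunately, thanks to locality, a biased or lightweight lock is enough in most concurrent situations, and a lock only escalates under heavy contention, so in most cases synchronized does not perform too badly.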
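The idea of a cheap path for the uncontended case and an expensive path under contention can be sketched with a toy lock. This is not how HotSpot implements monitors; ToyTieredLock and everything in it are invented for illustration, biased locking is left out entirely, and the lock is not reentrant.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

public class ToyTieredLock {
    private final AtomicReference<Thread> owner = new AtomicReference<>();
    private final AtomicLong fastPathHits = new AtomicLong();
    private final AtomicLong slowPathHits = new AtomicLong();

    // Not reentrant: a thread that calls lock() twice without unlock() spins forever.
    public void lock() {
        Thread me = Thread.currentThread();
        // Fast path: one CAS succeeds when nobody holds the lock,
        // loosely analogous to taking a lightweight lock.
        if (owner.compareAndSet(null, me)) {
            fastPathHits.incrementAndGet();
            return;
        }
        // Slow path: contention detected. A real JVM would escalate to a heavyweight
        // monitor and block the thread; this toy just yields and retries.
        slowPathHits.incrementAndGet();
        while (!owner.compareAndSet(null, me)) {
            Thread.yield();
        }
    }

    public void unlock() {
        if (!owner.compareAndSet(Thread.currentThread(), null)) {
            throw new IllegalMonitorStateException("current thread does not hold the lock");
        }
    }

    public long fastPathHits() {
        return fastPathHits.get();
    }

    public long slowPathHits() {
        return slowPathHits.get();
    }

    // Hypothetical helper: increments a plain counter under the toy lock from several threads.
    public static long countWithThreads(int threads, int perThread) {
        ToyTieredLock lock = new ToyTieredLock();
        long[] counter = new long[1];
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        ToyTieredLock solo = new ToyTieredLock();
        for (int i = 0; i < 1_000; i++) {
            solo.lock();
            solo.unlock();
        }
        System.out.println("single thread, fast path: " + solo.fastPathHits()
                + ", slow path: " + solo.slowPathHits());
        System.out.println("4 threads, total count: " + countWithThreads(4, 10_000));
    }
}
```

A single thread never leaves the fast path, which is the situation the lighter lock states are meant to keep cheap. The slow path is only reached once threads actually collide.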
Finally, I found the x86 implementation of monitorenter and monitorexit in the jdk11u source (the assembly is platform-specific) and include it here for anyone who wants to study it.
//-----------------------------------------------------------------------------
// Synchronization
//
// Note: monitorenter & exit are symmetric routines; which is reflected
// in the assembly code structure as well
//
// Stack layout:
//
// [expressions] <--- rsp = expression stack top
// ..
// [expressions]
// [monitor entry] <--- monitor block top = expression stack bot
// ..
// [monitor entry]
// [frame data] <--- monitor block bot
// ...
// [saved rbp] <--- rbp
void TemplateTable::monitorenter() {
transition(atos, vtos);
// check for NULL object
__ null_check(rax);
const Address monitor_block_top(rbp, frame::interpreter_frame_monitor_block_top_offset * wordSize);
const Address monitor_block_bot(rbp, frame::interpreter_frame_initial_sp_offset * wordSize);
const int entry_size = frame::interpreter_frame_monitor_size() * wordSize;
Label allocated;
Register rtop = LP64_ONLY(c_rarg3) NOT_LP64(rcx);
Register rbot = LP64_ONLY(c_rarg2) NOT_LP64(rbx);
Register rmon = LP64_ONLY(c_rarg1) NOT_LP64(rdx);
// initialize entry pointer
__ xorl(rmon, rmon); // points to free slot or NULL
// find a free slot in the monitor block (result in rmon)
{
Label entry, loop, exit;
__ movptr(rtop, monitor_block_top); // points to current entry,
// starting with top-most entry
__ lea(rbot, monitor_block_bot); // points to word before bottom
// of monitor block
__ jmpb(entry);
__ bind(loop);
// check if current entry is used
__ cmpptr(Address(rtop, BasicObjectLock::obj_offset_in_bytes()), (int32_t) NULL_WORD);
// if not used then remember entry in rmon
__ cmovptr(Assembler::equal, rmon, rtop); // cmov => cmovptr
// check if current entry is for same object
__ cmpptr(rax, Address(rtop, BasicObjectLock::obj_offset_in_bytes()));
// if same object then stop searching
__ jccb(Assembler::equal, exit);
// otherwise advance to next entry
__ addptr(rtop, entry_size);
__ bind(entry);
// check if bottom reached
__ cmpptr(rtop, rbot);
// if not at bottom then check this entry
__ jcc(Assembler::notEqual, loop);
__ bind(exit);
}
__ testptr(rmon, rmon); // check if a slot has been found
__ jcc(Assembler::notZero, allocated); // if found, continue with that one
// allocate one if there's no free slot
{
Label entry, loop;
// 1. compute new pointers // rsp: old expression stack top
__ movptr(rmon, monitor_block_bot); // rmon: old expression stack bottom
__ subptr(rsp, entry_size); // move expression stack top
__ subptr(rmon, entry_size); // move expression stack bottom
__ mov(rtop, rsp); // set start value for copy loop
__ movptr(monitor_block_bot, rmon); // set new monitor block bottom
__ jmp(entry);
// 2. move expression stack contents
__ bind(loop);
__ movptr(rbot, Address(rtop, entry_size)); // load expression stack
// word from old location
__ movptr(Address(rtop, 0), rbot); // and store it at new location
__ addptr(rtop, wordSize); // advance to next word
__ bind(entry);
__ cmpptr(rtop, rmon); // check if bottom reached
__ jcc(Assembler::notEqual, loop); // if not at bottom then
// copy next word
}
// call run-time routine
// rmon: points to monitor entry
__ bind(allocated);
// Increment bcp to point to the next bytecode, so exception
// handling for async. exceptions work correctly.
// The object has already been poped from the stack, so the
// expression stack looks correct.
__ increment(rbcp);
// store object
__ movptr(Address(rmon, BasicObjectLock::obj_offset_in_bytes()), rax);
__ lock_object(rmon);
// check to make sure this monitor doesn't cause stack overflow after locking
__ save_bcp(); // in case of exception
__ generate_stack_overflow_check(0);
// The bcp has already been incremented. Just need to dispatch to
// next instruction.
__ dispatch_next(vtos);
}
void TemplateTable::monitorexit() {
transition(atos, vtos);
// check for NULL object
__ null_check(rax);
const Address monitor_block_top(rbp, frame::interpreter_frame_monitor_block_top_offset * wordSize);
const Address monitor_block_bot(rbp, frame::interpreter_frame_initial_sp_offset * wordSize);
const int entry_size = frame::interpreter_frame_monitor_size() * wordSize;
Register rtop = LP64_ONLY(c_rarg1) NOT_LP64(rdx);
Register rbot = LP64_ONLY(c_rarg2) NOT_LP64(rbx);
Label found;
// find matching slot
{
Label entry, loop;
__ movptr(rtop, monitor_block_top); // points to current entry,
// starting with top-most entry
__ lea(rbot, monitor_block_bot); // points to word before bottom
// of monitor block
__ jmpb(entry);
__ bind(loop);
// check if current entry is for same object
__ cmpptr(rax, Address(rtop, BasicObjectLock::obj_offset_in_bytes()));
// if same object then stop searching
__ jcc(Assembler::equal, found);
// otherwise advance to next entry
__ addptr(rtop, entry_size);
__ bind(entry);
// check if bottom reached
__ cmpptr(rtop, rbot);
// if not at bottom then check this entry
__ jcc(Assembler::notEqual, loop);
}
References
- Java Virtual Machine Specification 3.14. Synchronization
- The Art of Java Concurrency Programming (《Java 并发编程的艺术》), 2.2 The implementation principle and application of synchronized
This article is from https://blog.csdn.net/xindoo