关于java:更新丢失当HashMap遇见CompletableFuture

HashMap是面试题中陈词滥调的话题，本人面试时也已经娓娓而谈。万万没想到，老司机在开发过程中也载到这上了。

具体场景是这样的。有一个列表查问的数据，某个字段的状态须要从其余接口中查问获取，为了优化查问速度，用了 CompletableFuture.runAsync() 的形式并行调用接口查问，查问后果放到一个 HashMap 中供前面应用。大抵简化后的代码如下：

private static void task() {Map<Long, Boolean> contractsMap = Maps.newHashMap();
    List<CompletableFuture> futures = Lists.newArrayList();
    Lists.newArrayList(123L, 456L).forEach(uid -> {futures.add(CompletableFuture.runAsync(() -> {
            Boolean effect = true;
            contractsMap.put(uid, effect);
        }));
    });
    CompletableFuture.allOf(futures.toArray(new CompletableFuture[]{})).join();
    System.out.println(contractsMap);
}

这个查问的列表多刷新几次，就会发现每次的状态值都不同。在查问之后打断点后发现，contractsMap有时候 size 都不一样，有些数据莫名其妙的失落了。

下面这个简化后的 task 办法，多运行几次后很容易就能复现出这个景象。

public static void main(String[] args) {for (int i = 0; i < 100; i++) {task();
    }
}

输入：

{456=false, 123=true}
{456=false, 123=true}
{456=false, 123=true}
{456=false, 123=true}
{123=true}
{456=false, 123=true}
{456=false, 123=true}
{123=true}
{456=false, 123=true}
{456=false, 123=true}
{456=false, 123=true}

确认了逻辑没问题后，猛然想起这个 HashMap 用到了多线程环境中，换成了 ConcurrentHashMap 之后果然不再呈现。

在网上搜寻了一番后，很多文章都提到了 JDK 8 中 HashMap 可能会呈现的数据失落问题，但根本都在说哈希抵触时的状况。但下面的例子，其实就两个 key，哈希值还绝不雷同，那么就不可能是哈希抵触时呈现的数据失落。

来看一下 HashMap 中的 putVal 办法：

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {for (int binCount = 0; ; ++binCount) {if ((e = p.next) == null) {p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

总共三个分支，第一个是首次应用 HashMap 时发现 tab 为空，于是调用 resize 来结构 tab；第二个分支是没有哈希抵触时，结构一个节点并间接放到tab 的槽位上；第三个简单的分支是用来解决哈希抵触的状况。

对于下面的例子而言，并不存在哈希抵触的状况，因而产生数据失落只可能在第一和第二个分支的处理过程。

集体猜想的可能是

第一个线程进到 putVal 办法时，发现 tab 为空，于是筹备 resize 办法；
但在结构 tab 前，切换到第二个线程执行，同样发现 tab 为空，于是进入到 resize 办法并结构了tab；而后进入第二个分支将键值对放入tab。
切换回第一个线程，结构了一个 tab 并赋值，于是将第二步中的 tab 笼罩；而后进入第二个分支将键值对放入tab。

为了验证猜测，须要明确 resize 办法和 newNode 办法的执行绝对程序。但并发条件又不好打断点，于是抉择了采纳 Java Agent 的形式，来批改 resize 和newNode办法，在办法进入前和进入后打印日志。

public class JvmAgentDemo {public static void premain(String args, Instrumentation instrumentation){instrumentation.addTransformer(new TestTransformer(), true);
        try{instrumentation.retransformClasses(Test.class);
            instrumentation.retransformClasses(HashMap.class);
            System.out.println("agent load done");
        }catch (Exception e){System.out.println("agent load failed");
        }
    }
}

public class TestTransformer implements ClassFileTransformer {
    
    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined, ProtectionDomain protectionDomain, byte[] classfileBuffer) throws IllegalClassFormatException {System.out.println("Transforming" + className);       
        if ("java/util/HashMap".equals(className)) {
            try {ClassPool cp = ClassPool.getDefault();
                CtClass cc = cp.get("java.util.HashMap");
                
                CtMethod m = cc.getDeclaredMethod("resize");
                m.insertBefore("{ System.out.println(Thread.currentThread().getName() + \": resize start\"); }");
                m.insertAfter("{ System.out.println(Thread.currentThread().getName() + \": resize end\");; }");
                
                CtMethod m2 = cc.getDeclaredMethod("newNode");
                m2.insertBefore("{ System.out.println(Thread.currentThread().getName() + \": newNode start\"); }");
                m2.insertAfter("{ System.out.println(Thread.currentThread().getName() + \": newNode end\"); }");
                return cc.toBytecode();} catch (Exception e) {e.printStackTrace();
            }
        }
        return null;
    }
}

将上述 Agent 打成 jar，运行后面 task 用例时加载这个 agent，察看控制台的输入：

ForkJoinPool.commonPool-worker-3: resize start
ForkJoinPool.commonPool-worker-3: resize end
ForkJoinPool.commonPool-worker-3: newNode start
ForkJoinPool.commonPool-worker-2: newNode start
ForkJoinPool.commonPool-worker-3: newNode end
ForkJoinPool.commonPool-worker-2: newNode end
{456=false, 123=true}
ForkJoinPool.commonPool-worker-3: resize start
ForkJoinPool.commonPool-worker-3: resize end
ForkJoinPool.commonPool-worker-3: newNode start
ForkJoinPool.commonPool-worker-3: newNode end
ForkJoinPool.commonPool-worker-2: resize start
ForkJoinPool.commonPool-worker-2: resize end
ForkJoinPool.commonPool-worker-2: newNode start
ForkJoinPool.commonPool-worker-2: newNode end
{456=false, 123=true}
ForkJoinPool.commonPool-worker-2: resize start
ForkJoinPool.commonPool-worker-3: resize start
ForkJoinPool.commonPool-worker-3: resize end
ForkJoinPool.commonPool-worker-3: newNode start
ForkJoinPool.commonPool-worker-3: newNode end
ForkJoinPool.commonPool-worker-2: resize end
ForkJoinPool.commonPool-worker-2: newNode start
ForkJoinPool.commonPool-worker-2: newNode end
{123=true}

截取了三次运行时的日志，第一次和第二次都是失常状况，resize和 newNode 的执行程序没什么问题，无非是第二次多了一次 resize 的，但 resize 外部在创立新的 tab 时会将老的 tab 内容复制过去，因而最终后果也是失常的。

但第三次的执行序列和后面的猜测是统一的，worker-2线程进入 resize 办法后、返回前，worker-3线程也进入到了 resize 办法中，从 resize 返回后进入 newNode 办法创立了一个节点；之后 worker-2 线程继续执行，笼罩了 worker-3 线程写入的数据。

还有一些执行序列也会导致数据失落，但我没太想明确可能的执行程序，比方：

ForkJoinPool.commonPool-worker-1: resize start
ForkJoinPool.commonPool-worker-2: resize start
ForkJoinPool.commonPool-worker-2: resize end
ForkJoinPool.commonPool-worker-1: resize end
ForkJoinPool.commonPool-worker-1: newNode start
ForkJoinPool.commonPool-worker-1: newNode end
ForkJoinPool.commonPool-worker-2: newNode start
ForkJoinPool.commonPool-worker-2: newNode end
{456=false}

以上便是整个剖析的过程。尽管大家都晓得 HashMap 是非线程平安的，但开发过程中还是很容易就在多线程环境下应用上了。面经看了很多，但和最终的利用还是有点间隔。

关于java:更新丢失当HashMap遇见CompletableFuture

Just My Socks（注册教程 内含优惠码）

Just My Socks（注册教程内含优惠码）