Java: Outclassing ThreadLocal, or why FastThreadLocal is so fast

1 Background and principle of FastThreadLocal

The JDK already has ThreadLocal, so why did Netty build its own FastThreadLocal, and where does its speed come from?

The answer starts with how the JDK's ThreadLocal works, which the original post illustrates with a figure.

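Every Java thread has a ThreadLocalMap instance field. The map is created lazily: a thread that never uses a ThreadLocal never gets one, and the map is created the first time the thread accesses some ThreadLocal variable.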

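This map resolves hash collisions by linear probing: if the target slot is occupied, it keeps trying the following slots until it finds a free one and inserts the entry there. When collisions are frequent, this hurts performance.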
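To make the cost of probing concrete, here is a minimal sketch of open addressing with linear probing. It is not the JDK's ThreadLocalMap: the class name LinearProbeTable, the probes counter and the fixed capacity are assumptions made for illustration, and the table never resizes.

```java
// Simplified sketch of open addressing with linear probing.
// NOT the JDK's ThreadLocalMap; all names here are illustrative.
class LinearProbeTable {
    private final Object[] keys;
    private final Object[] values;
    int probes; // number of extra slots inspected because of collisions

    // capacity must be a power of two; the table never grows,
    // so callers must keep it less than full
    LinearProbeTable(int capacity) {
        keys = new Object[capacity];
        values = new Object[capacity];
    }

    private int slot(Object key) {
        return key.hashCode() & (keys.length - 1);
    }

    void put(Object key, Object value) {
        int i = slot(key);
        // walk forward until we find the same key or a free slot
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) & (keys.length - 1);
            probes++;
        }
        keys[i] = key;
        values[i] = value;
    }

    Object get(Object key) {
        int i = slot(key);
        while (keys[i] != null) {
            if (keys[i].equals(key)) {
                return values[i];
            }
            i = (i + 1) & (keys.length - 1);
            probes++;
        }
        return null;
    }
}

class LinearProbeDemo {
    public static void main(String[] args) {
        LinearProbeTable table = new LinearProbeTable(16);
        // 1, 17 and 33 all hash to slot 1 in a 16-slot table
        table.put(1, "a");
        table.put(17, "b");
        table.put(33, "c");
        System.out.println("extra slots inspected: " + table.probes);
    }
}
```

Every colliding key pays for a walk along the array on both insert and lookup, which is the overhead the array-indexed design described next avoids.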

FastThreadLocal (ftl below) avoids hash collisions altogether by using a plain array. Each FastThreadLocal instance is assigned an index when it is created; the index comes from an AtomicInteger, so every FastThreadLocal gets a unique one.

Calling ftl.get() then reads the value straight from the array, in effect return array[index] (shown as a figure in the original post).
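The idea can be sketched in a few lines. This is only an illustration: IndexedLocal is a hypothetical class, and the per-thread array is passed in explicitly as slots so that the sketch stays small.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the "one index per variable" idea.
class IndexedLocal<T> {
    // shared counter: every new instance takes the next free index
    private static final AtomicInteger NEXT_INDEX = new AtomicInteger();

    private final int index = NEXT_INDEX.getAndIncrement();

    int index() {
        return index;
    }

    // slots plays the role of the array owned by the current thread
    @SuppressWarnings("unchecked")
    T get(Object[] slots) {
        return (T) slots[index]; // plain array read, no hashing, no probing
    }

    void set(Object[] slots, T value) {
        slots[index] = value;
    }
}

class IndexedLocalDemo {
    public static void main(String[] args) {
        Object[] slots = new Object[8];
        IndexedLocal<String> name = new IndexedLocal<>();
        IndexedLocal<Integer> count = new IndexedLocal<>();
        name.set(slots, "netty");
        count.set(slots, 3);
        System.out.println(name.get(slots) + " " + count.get(slots));
    }
}
```

Because two variables can never share an index, a lookup is a single bounds-checked array access.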

2 Source code analysis

As the figures above suggest, the implementation involves three classes: InternalThreadLocalMap, FastThreadLocalThread and FastThreadLocal. Working bottom-up, we start with InternalThreadLocalMap.

InternalThreadLocalMap extends UnpaddedInternalThreadLocalMap (the original post shows the class hierarchy as a diagram).

2.1 Main fields of UnpaddedInternalThreadLocalMap

    static final ThreadLocal<InternalThreadLocalMap> slowThreadLocalMap = new ThreadLocal<InternalThreadLocalMap>();
    static final AtomicInteger nextIndex = new AtomicInteger();
    Object[] indexedVariables;

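The indexedVariables array stores the ftl values and is accessed directly by index. nextIndex hands out an index to each ftl instance when it is created. slowThreadLocalMap is used when the current thread is not a FastThreadLocalThread (ftlt below).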

2.2 InternalThreadLocalMap

Main fields of InternalThreadLocalMap:

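    // Marks an array slot that has not been used yet
    public static final Object UNSET = new Object();

    /**
     * Records whether a cleaner has been registered for an ftl variable.
     * How BitSet works, briefly:
     * a BitSet is backed by a long[] that starts with length 1, i.e. only long[0], and each long has 64 bits.
     * BitSet.set(1) sets the second bit of long[0] to true, i.e. 0000 0000 ... 0010 (64 bits), so long[0] == 2.
     * BitSet.get(1) returns true if that second bit is 1 and false if it is 0.
     * BitSet.set(64) sets the 65th bit; long[0] is no longer enough, so the array grows to add long[1].
     *
     * It stores something like {index: boolean} pairs and prevents one FastThreadLocal from starting
     * the cleanup thread more than once. Setting the bit at position index to true means this
     * InternalThreadLocalMap has already started the cleanup thread for that FastThreadLocal.
     */
    private BitSet cleanerFlags;

    private InternalThreadLocalMap() {
        super(newIndexedVariableTable());
    }

    private static Object[] newIndexedVariableTable() {
        Object[] array = new Object[32];
        Arrays.fill(array, UNSET);
        return array;
    }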

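This is straightforward: newIndexedVariableTable() creates an array of length 32, fills it with UNSET and passes it to the parent class. From then on, ftl values are stored in this array.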

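Note that the array holds the values themselves, not entries, which differs from the JDK's ThreadLocal. That is all for InternalThreadLocalMap for now; its other methods are covered below together with ftl.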
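The BitSet behaviour described in the cleanerFlags comment can be checked with the JDK alone. The helper name wordsAfter is made up for this example; BitSet.set and BitSet.toLongArray are standard JDK methods.

```java
import java.util.Arrays;
import java.util.BitSet;

class BitSetDemo {
    // Sets the given bit positions and returns the backing words in use.
    static long[] wordsAfter(int... bits) {
        BitSet flags = new BitSet();
        for (int bit : bits) {
            flags.set(bit);
        }
        return flags.toLongArray();
    }

    public static void main(String[] args) {
        // bit 1 lives in the first word: binary ...0010
        System.out.println(Arrays.toString(wordsAfter(1)));     // [2]
        // bit 64 does not fit into the first word, so a second word appears
        System.out.println(Arrays.toString(wordsAfter(1, 64))); // [2, 1]
    }
}
```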

2.3 FastThreadLocalThread (ftlt)

To get ftl's performance advantage it must be used together with ftlt; otherwise it degrades to the JDK's ThreadLocal. ftlt is simple. The key code:

    public class FastThreadLocalThread extends Thread {
        // This will be set to true if we have a chance to wrap the Runnable.
        private final boolean cleanupFastThreadLocals;

        private InternalThreadLocalMap threadLocalMap;

        public final InternalThreadLocalMap threadLocalMap() {
            return threadLocalMap;
        }

        public final void setThreadLocalMap(InternalThreadLocalMap threadLocalMap) {
            this.threadLocalMap = threadLocalMap;
        }
    }

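The trick in ftlt is the threadLocalMap field: the class extends java.lang.Thread and holds its own InternalThreadLocalMap. When an ftlt thread later accesses an ftl variable, the value is read directly from that InternalThreadLocalMap.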

2.4 FastThreadLocal (ftl)

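This analysis is based on netty-4.1.34. The version is stated explicitly because, in the cleanup code, this version has commented out the call to ObjectCleaner, which differs from earlier versions.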

2.4.1 Fields and instantiation of ftl

    private final int index;

    public FastThreadLocal() {
        index = InternalThreadLocalMap.nextVariableIndex();
    }

Very simple: the constructor just assigns the index field, using a static method of InternalThreadLocalMap:

    public static int nextVariableIndex() {
        int index = nextIndex.getAndIncrement();
        if (index < 0) {
            nextIndex.decrementAndGet();
            throw new IllegalStateException("too many thread-local indexed variables");
        }
        return index;
    }

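Each ftl instance thus takes its index from a sequence that increases by 1, which ensures that the array in InternalThreadLocalMap never has to grow abruptly. The index < 0 check catches the case where the counter overflows.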
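The overflow guard is easier to see when the counter starts near its limit. The sketch below copies the logic of nextVariableIndex() into a hypothetical IndexAllocator class whose starting value can be chosen; the real counter is a static field that starts at 0.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper around the same logic, with a configurable start value.
class IndexAllocator {
    private final AtomicInteger nextIndex;

    IndexAllocator(int start) {
        nextIndex = new AtomicInteger(start);
    }

    int next() {
        int index = nextIndex.getAndIncrement();
        if (index < 0) {
            // the counter wrapped around; undo the increment and fail
            nextIndex.decrementAndGet();
            throw new IllegalStateException("too many thread-local indexed variables");
        }
        return index;
    }
}

class IndexAllocatorDemo {
    public static void main(String[] args) {
        IndexAllocator allocator = new IndexAllocator(Integer.MAX_VALUE);
        System.out.println(allocator.next()); // last valid index
        try {
            allocator.next(); // counter is now negative
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```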

2.4.2 The get() method

    public final V get() {
        InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.get(); // 1
        Object v = threadLocalMap.indexedVariable(index); // 2
        if (v != InternalThreadLocalMap.UNSET) {
            return (V) v;
        }
        V value = initialize(threadLocalMap); // 3
        registerCleaner(threadLocalMap);  // 4
        return value;
    }

1. First, how InternalThreadLocalMap.get() obtains threadLocalMap:

    // ======================= InternalThreadLocalMap =======================
    public static InternalThreadLocalMap get() {
        Thread thread = Thread.currentThread();
        if (thread instanceof FastThreadLocalThread) {
            return fastGet((FastThreadLocalThread) thread);
        } else {
            return slowGet();
        }
    }

    private static InternalThreadLocalMap fastGet(FastThreadLocalThread thread) {
        InternalThreadLocalMap threadLocalMap = thread.threadLocalMap();
        if (threadLocalMap == null) {
            thread.setThreadLocalMap(threadLocalMap = new InternalThreadLocalMap());
        }
        return threadLocalMap;
    }

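Because FastThreadLocal only shows its performance advantage together with FastThreadLocalThread, we focus on fastGet. It reads threadLocalMap directly from the ftlt thread; if there is none yet, it creates an InternalThreadLocalMap, sets it on the thread and returns it.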

2. threadLocalMap.indexedVariable(index) is simple: it reads the value from the array and returns it:

    public Object indexedVariable(int index) {
        Object[] lookup = indexedVariables;
        return index < lookup.length ? lookup[index] : UNSET;
    }

3. If the value is not UNSET it is a valid value and is returned directly. If it is UNSET, the variable is initialized.

The initialize(threadLocalMap) method:

    private V initialize(InternalThreadLocalMap threadLocalMap) {
        V v = null;
        try {
            v = initialValue();
        } catch (Exception e) {
            PlatformDependent.throwException(e);
        }
        threadLocalMap.setIndexedVariable(index, v); // 3-1
        addToVariablesToRemove(threadLocalMap, this); // 3-2
        return v;
    }

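3.1. Obtain the ftl's initial value and store it in the array of threadLocalMap, enlarging the array first if it is too short. The details are not expanded here.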

3.2. addToVariablesToRemove(threadLocalMap, this) stores the ftl instance in a Set held in element 0 of threadLocalMap's internal array.

The code is not reproduced here; the original post shows it as a figure.
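As a rough stand-in for the omitted code, steps 3.1 and 3.2 can be sketched together. SlotTable and its method names are hypothetical, the initial length of 4 is chosen only to keep the example short, and the growth policy (the smallest power of two above the index) is an assumption of this sketch rather than something taken from the text above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical, simplified model of the per-thread array.
class SlotTable {
    static final Object UNSET = new Object();
    // slot 0 is reserved for the Set of variables to remove later,
    // so ordinary variables are expected to use indexes >= 1
    static final int VARIABLES_TO_REMOVE_INDEX = 0;

    Object[] slots = newTable(4);

    private static Object[] newTable(int size) {
        Object[] array = new Object[size];
        Arrays.fill(array, UNSET);
        return array;
    }

    // Step 3.1: store a value, growing the array when the index is out of range.
    void set(int index, Object value) {
        if (index >= slots.length) {
            int oldLength = slots.length;
            // assumed policy: smallest power of two strictly greater than index
            // (indexes close to Integer.MAX_VALUE are ignored in this sketch)
            int newLength = Integer.highestOneBit(index) << 1;
            slots = Arrays.copyOf(slots, newLength);
            // new slots must read as "not set", not as null
            Arrays.fill(slots, oldLength, newLength, UNSET);
        }
        slots[index] = value;
    }

    // Step 3.2: remember the variable in an identity-based Set kept in slot 0.
    @SuppressWarnings("unchecked")
    void track(Object variable) {
        Object v = slots[VARIABLES_TO_REMOVE_INDEX];
        Set<Object> tracked;
        if (v == UNSET) {
            tracked = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
            slots[VARIABLES_TO_REMOVE_INDEX] = tracked;
        } else {
            tracked = (Set<Object>) v;
        }
        tracked.add(variable);
    }

    @SuppressWarnings("unchecked")
    int trackedCount() {
        Object v = slots[VARIABLES_TO_REMOVE_INDEX];
        return v == UNSET ? 0 : ((Set<Object>) v).size();
    }
}
```

Keeping the Set inside the same array means that a later cleanup pass can find every variable a thread has touched without any extra per-thread field.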

4. The implementation of registerCleaner(threadLocalMap) in netty-4.1.34:

    private void registerCleaner(final InternalThreadLocalMap threadLocalMap) {
        Thread current = Thread.currentThread();
        if (FastThreadLocalThread.willCleanupFastThreadLocals(current) || threadLocalMap.isCleanerFlagSet(index)) {
            return;
        }
        threadLocalMap.setCleanerFlag(index);

        // TODO: We need to find a better way to handle this.
        /*
        // We will need to ensure we will trigger remove(InternalThreadLocalMap) so everything will be released
        // and FastThreadLocal.onRemoval(...) will be called.
        ObjectCleaner.register(current, new Runnable() {
            @Override
            public void run() {
                remove(threadLocalMap);
                // It's fine to not call InternalThreadLocalMap.remove() here as this will only be triggered once
                // the Thread is collected by GC. In this case the ThreadLocal will be gone away already.
            }
        });
        */
    }

Since the ObjectCleaner.register block is commented out in this version and the remaining logic is simple, it is not analyzed further.

2.5 Performance degradation when ordinary threads use ftl

With get() analyzed, how set(value) works should be evident as well; for brevity it is not analyzed separately.

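As noted earlier, ftl reaches its full performance only together with ftlt. On any other, ordinary thread it degrades to the JDK ThreadLocal case, because an ordinary thread has no InternalThreadLocalMap field of its own. Let us see how the degradation happens.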

Start from InternalThreadLocalMap.get():

    // ======================= InternalThreadLocalMap =======================
    public static InternalThreadLocalMap get() {
        Thread thread = Thread.currentThread();
        if (thread instanceof FastThreadLocalThread) {
            return fastGet((FastThreadLocalThread) thread);
        } else {
            return slowGet();
        }
    }

    private static InternalThreadLocalMap slowGet() {
        // A static field of the parent class, of type JDK ThreadLocal;
        // the InternalThreadLocalMap is fetched from this ThreadLocal
        ThreadLocal<InternalThreadLocalMap> slowThreadLocalMap = UnpaddedInternalThreadLocalMap.slowThreadLocalMap;
        InternalThreadLocalMap ret = slowThreadLocalMap.get();
        if (ret == null) {
            ret = new InternalThreadLocalMap();
            slowThreadLocalMap.set(ret);
        }
        return ret;
    }

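From ftl's point of view the degraded path is: fetch the InternalThreadLocalMap from a JDK ThreadLocal, then read the value at the given array index from that InternalThreadLocalMap. The original post shows the object relationships as a diagram.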
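The two paths can be modelled without Netty on the classpath. In this sketch MapHolder, FastThread and MapLookup are hypothetical names standing in for InternalThreadLocalMap, FastThreadLocalThread and the static get() shown above; only the dispatch logic is meant to match.

```java
// Hypothetical stand-in for the per-thread map.
class MapHolder {
    final Object[] slots = new Object[32];
}

// Hypothetical stand-in for a thread that carries its map as a field.
class FastThread extends Thread {
    MapHolder map; // reached with a plain field read

    FastThread(Runnable task) {
        super(task);
    }
}

class MapLookup {
    // fallback storage for ordinary threads: one extra JDK ThreadLocal lookup
    private static final ThreadLocal<MapHolder> SLOW = new ThreadLocal<>();

    static MapHolder get() {
        Thread current = Thread.currentThread();
        if (current instanceof FastThread) {
            // fast path: the map hangs off the thread object itself
            FastThread fast = (FastThread) current;
            if (fast.map == null) {
                fast.map = new MapHolder();
            }
            return fast.map;
        }
        // slow path: go through the JDK ThreadLocal (and its hash lookup) first
        MapHolder holder = SLOW.get();
        if (holder == null) {
            holder = new MapHolder();
            SLOW.set(holder);
        }
        return holder;
    }
}
```

On an ordinary thread every access therefore costs one JDK ThreadLocal lookup plus one array read, so it cannot be faster than a plain ThreadLocal.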

3 Resource cleanup for ftl

Netty provides three cleanup mechanisms for ftl:

Automatic: when an ftlt runs a Runnable wrapped by FastThreadLocalRunnable, the ftl values are cleaned up automatically once the task finishes.

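Manual: both ftl and InternalThreadLocalMap provide a remove method that users can call at a suitable point to delete values explicitly. Sometimes this is mandatory, for example when a pool of ordinary threads uses ftl.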

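Automatic: a Cleaner is registered for every ftl of the current thread; when the thread object is no longer strongly reachable, the Cleaner thread reclaims that ftl for the thread. Netty recommends avoiding this mechanism whenever one of the other two can be used, because it needs an extra thread, which costs resources and introduces contention between threads. In netty-4.1.34 the code that calls ObjectCleaner is already commented out.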
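The first mechanism boils down to wrapping the task in a try/finally. The sketch below illustrates that shape with a plain JDK ThreadLocal so that it runs without Netty; CleanupDemo, STATE and wrap are hypothetical names, and the explicit STATE.remove() call doubles as an example of the manual mechanism.

```java
import java.util.HashMap;
import java.util.Map;

class CleanupDemo {
    // Stand-in for per-thread state; a JDK ThreadLocal keeps the sketch self-contained.
    static final ThreadLocal<Map<String, Object>> STATE = ThreadLocal.withInitial(HashMap::new);

    // Returns a task that always clears the thread's state when it finishes,
    // whether the wrapped task completes normally or throws.
    static Runnable wrap(Runnable task) {
        return () -> {
            try {
                task.run();
            } finally {
                STATE.remove();
            }
        };
    }

    public static void main(String[] args) {
        Runnable task = () -> STATE.get().put("requestId", 42);
        wrap(task).run();
        // a fresh, empty map is created on the next access
        System.out.println(STATE.get().isEmpty());
    }
}
```

With pooled threads this matters because a thread outlives its tasks; without the finally block, values set by one task would still be visible to the next task on the same thread.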

4 How Netty uses ftl

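The most important use of ftl inside Netty is ByteBuf allocation. The basic approach: each thread is assigned a memory region (a PoolArena); when a ByteBuf is needed, the thread first tries to allocate from its own PoolArena and falls back to global allocation only if that fails.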

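Because memory is limited, several threads may still end up sharing one PoolArena. Even so, this approach reduces contention between threads as far as possible and improves efficiency.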

The code is in PoolThreadLocalCache, an inner class of PooledByteBufAllocator:

    final class PoolThreadLocalCache extends FastThreadLocal<PoolThreadCache> {

        @Override
        protected synchronized PoolThreadCache initialValue() {
            final PoolArena<byte[]> heapArena = leastUsedArena(heapArenas);
            final PoolArena<ByteBuffer> directArena = leastUsedArena(directArenas);

            Thread current = Thread.currentThread();
            if (useCacheForAllThreads || current instanceof FastThreadLocalThread) {
                // PoolThreadCache wraps the memory regions held by each thread
                return new PoolThreadCache(
                        heapArena, directArena, tinyCacheSize, smallCacheSize, normalCacheSize,
                        DEFAULT_MAX_CACHED_BUFFER_CAPACITY, DEFAULT_CACHE_TRIM_INTERVAL);
            }
            // No caching so just use 0 as sizes.
            return new PoolThreadCache(heapArena, directArena, 0, 0, 0, 0, 0);
        }
    }
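How several threads come to share a limited number of arenas can be pictured with a small model. Everything here is hypothetical: Arena and ArenaCache are illustrative classes, a JDK ThreadLocal replaces FastThreadLocal, and the selection rule (pick the arena currently bound to the fewest threads) is an assumption about what a method named leastUsedArena does, since its body is not shown above.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical arena that only tracks how many threads are bound to it.
class Arena {
    final String name;
    final AtomicInteger numThreadCaches = new AtomicInteger();

    Arena(String name) {
        this.name = name;
    }
}

// Each thread is bound, on first access, to the arena with the fewest users.
class ArenaCache extends ThreadLocal<Arena> {
    private final Arena[] arenas;

    ArenaCache(Arena... arenas) {
        this.arenas = arenas;
    }

    // synchronized so that two threads initializing at once
    // cannot both pick the same "least used" arena
    @Override
    protected synchronized Arena initialValue() {
        Arena leastUsed = arenas[0];
        for (Arena arena : arenas) {
            if (arena.numThreadCaches.get() < leastUsed.numThreadCaches.get()) {
                leastUsed = arena;
            }
        }
        leastUsed.numThreadCaches.incrementAndGet();
        return leastUsed;
    }
}
```

With two arenas and four threads, each arena ends up serving two threads, which spreads contention evenly even though the arenas are not exclusive to one thread.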