Java: Where Is ThreadLocal Slower Than FastThreadLocal?

Outrunning ThreadLocal: why is FastThreadLocal so fast?

Since the JDK already provides ThreadLocal, why did Netty build its own FastThreadLocal? Where does FastThreadLocal gain its speed?

To answer that, we need to start with how the JDK's ThreadLocal itself works.

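In Java, every thread has a ThreadLocalMap instance field. The map is created lazily: if the thread never uses a ThreadLocal, it is never created; it is created the first time the thread accesses some ThreadLocal variable. This map resolves hash collisions with linear probing: if the target slot is occupied, it keeps trying the following slots until it finds a free one and inserts the entry there. When hash collisions are frequent, this hurts performance.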
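To make the cost of linear probing concrete, here is a minimal sketch of an open-addressing table. The class `LinearProbeTable` and its methods are hypothetical and heavily simplified; this is not the JDK's ThreadLocalMap, and it assumes the table never fills up.

```java
// Hypothetical, simplified open-addressing table; NOT the JDK's ThreadLocalMap.
final class LinearProbeTable {
    private final Object[] keys;
    private final Object[] values;

    LinearProbeTable(int capacity) {
        keys = new Object[capacity];
        values = new Object[capacity];
    }

    /** Inserts or overwrites a mapping and returns how many slots were inspected. */
    int put(Object key, Object value) {
        int i = Math.floorMod(key.hashCode(), keys.length);
        int probes = 1;
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) % keys.length; // slot taken by another key: try the next one
            probes++;
        }
        keys[i] = key;
        values[i] = value;
        return probes;
    }

    /** Walks the same probe sequence until the key or an empty slot is found. */
    Object get(Object key) {
        int i = Math.floorMod(key.hashCode(), keys.length);
        while (keys[i] != null) {
            if (keys[i].equals(key)) {
                return values[i];
            }
            i = (i + 1) % keys.length;
        }
        return null;
    }
}
```

With a capacity of 8, the Integer keys 0, 8 and 16 all hash to slot 0, so each later insertion has to inspect one more slot than the previous one. That growing probe count is the overhead the paragraph above describes.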

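FastThreadLocal (ftl for short below) uses a plain array, which avoids hash collisions altogether. Concretely, every FastThreadLocal instance is assigned an index when it is created. The index is allocated with an AtomicInteger, so each FastThreadLocal gets a unique index. When ftl.get() is called, the value is read straight from the array, as in return array[index].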

Figure: netty FastThreadLocal
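The idea can be sketched in a few lines. The class `IndexedSlot` below is hypothetical and only illustrates index-based access; in this sketch the per-thread array is passed in explicitly, whereas Netty keeps it inside InternalThreadLocalMap.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of index-based storage; the names are illustrative, not Netty's.
final class IndexedSlot<V> {
    // Shared counter: every new instance takes the next free index.
    private static final AtomicInteger NEXT_INDEX = new AtomicInteger();

    private final int index = NEXT_INDEX.getAndIncrement();

    int index() {
        return index;
    }

    @SuppressWarnings("unchecked")
    V get(Object[] perThreadArray) {
        return (V) perThreadArray[index]; // plain array read, no hashing, no probing
    }

    void set(Object[] perThreadArray, V value) {
        perThreadArray[index] = value;
    }
}
```

Because two instances never share an index, two variables can never collide in the array, so a lookup is a single array access.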

As the figure above shows, the ftl implementation involves three classes: InternalThreadLocalMap, FastThreadLocalThread (ftlt for short below) and FastThreadLocal. Working bottom-up, we start with InternalThreadLocalMap.

InternalThreadLocalMap extends UnpaddedInternalThreadLocalMap, whose main fields are:

static final ThreadLocal<InternalThreadLocalMap> slowThreadLocalMap = new ThreadLocal<InternalThreadLocalMap>();
static final AtomicInteger nextIndex = new AtomicInteger();
Object[] indexedVariables;

The indexedVariables array stores the ftl values and is accessed directly by index. nextIndex assigns an index to each ftl instance when it is created. slowThreadLocalMap is used when the current thread is not an ftlt.

The main fields of InternalThreadLocalMap:

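// Marks an array slot that has not been used yet
public static final Object UNSET = new Object();
/**
 * Marks whether an ftl variable has registered a cleaner.
 *
 * How BitSet works, briefly:
 * By default a BitSet is backed by a long[] array that starts with length 1, i.e. only long[0], and one long holds 64 bits.
 * BitSet.set(1) sets the second bit of long[0] to true, i.e. 0000 0000 ... 0010 (64 bits), so long[0] == 2.
 * BitSet.get(1) returns true if the second bit is 1 and false if it is 0.
 * BitSet.set(64) sets the 65th bit; long[0] is no longer large enough, so the array grows to include long[1].
 *
 * Stores {index: boolean} style pairs to prevent one FastThreadLocal from starting the cleaner thread more than once.
 * Setting the bit at position index to true means this InternalThreadLocalMap has already started the cleaner thread for that FastThreadLocal.
 */
private BitSet cleanerFlags;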
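The BitSet behaviour described in the comment above can be checked with the JDK's own java.util.BitSet. The helper class `BitSetDemo` is hypothetical; BitSet.set and BitSet.toLongArray are standard JDK methods.

```java
import java.util.BitSet;

// Small helper to inspect the words backing a BitSet.
final class BitSetDemo {
    /** Sets the given bit positions and returns the backing words in use. */
    static long[] wordsAfterSetting(int... bits) {
        BitSet flags = new BitSet();
        for (int bit : bits) {
            flags.set(bit);
        }
        return flags.toLongArray();
    }
}
```

Setting only bit 1 yields a single word with value 2. Setting bit 64 as well needs a second word, whose lowest bit is then set.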

private InternalThreadLocalMap() {
    super(newIndexedVariableTable());
}

private static Object[] newIndexedVariableTable() {
    Object[] array = new Object[32];
    Arrays.fill(array, UNSET);
    return array;
}

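This is straightforward: newIndexedVariableTable() creates an array of length 32, fills it with UNSET, and passes it to the parent class. From then on the ftl values are stored in this array. Note that the array holds the values themselves rather than entries, which is a difference from the JDK ThreadLocal. That is all for InternalThreadLocalMap for now; its other methods are covered later, in the ftl analysis.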
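One reason to fill the array with a sentinel instead of leaving it null is that null can then be a legitimate stored value. A minimal sketch, using a hypothetical class `SentinelTable` that only mirrors the idea and not Netty's code:

```java
import java.util.Arrays;

// Hypothetical table that separates "never set" from "set to null" with a sentinel.
final class SentinelTable {
    static final Object UNSET = new Object();

    private final Object[] slots = new Object[32];

    SentinelTable() {
        Arrays.fill(slots, UNSET); // every slot starts out as "never set"
    }

    boolean isSet(int index) {
        return slots[index] != UNSET; // identity comparison against the sentinel
    }

    void set(int index, Object value) {
        slots[index] = value; // value may be null
    }

    Object get(int index) {
        return slots[index];
    }
}
```

After set(3, null), isSet(3) reports true even though get(3) returns null, so an explicitly stored null does not trigger initialization again.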

To realize ftl's performance advantage, it must be used together with ftlt; otherwise it degrades to the JDK ThreadLocal. ftlt is simple. Its key code is:

public class FastThreadLocalThread extends Thread {
    // This will be set to true if we have a chance to wrap the Runnable.
    private final boolean cleanupFastThreadLocals;

    private InternalThreadLocalMap threadLocalMap;

    public final InternalThreadLocalMap threadLocalMap() {
        return threadLocalMap;
    }

    public final void setThreadLocalMap(InternalThreadLocalMap threadLocalMap) {
        this.threadLocalMap = threadLocalMap;
    }
}

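The trick of ftlt lies in its threadLocalMap field: the class extends java.lang.Thread and holds its own InternalThreadLocalMap. From then on, whenever an ftlt thread accesses an ftl variable, the value is read directly from that InternalThreadLocalMap.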

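The ftl analysis here is based on netty-4.1.34. The version is stated explicitly because, in the cleanup code, this version has commented out the call to ObjectCleaner, which differs from earlier versions.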

private final int index;

public FastThreadLocal() {
    index = InternalThreadLocalMap.nextVariableIndex();
}

Very simple: the constructor only assigns the index field. The static method that produces the value lives in InternalThreadLocalMap:

public static int nextVariableIndex() {
    int index = nextIndex.getAndIncrement();
    if (index < 0) {
        nextIndex.decrementAndGet();
        throw new IllegalStateException("too many thread-local indexed variables");
    }
    return index;
}

As you can see, each ftl instance takes its index from a sequence that increases by 1, which guarantees that the array in InternalThreadLocalMap never has to grow abruptly.
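The `index < 0` check guards against the int counter wrapping around. The sketch below reproduces the same logic in a hypothetical class `IndexAllocator` whose start value can be chosen, so the overflow case can be exercised without creating billions of instances.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical allocator with the same overflow guard as nextVariableIndex().
final class IndexAllocator {
    private final AtomicInteger nextIndex;

    IndexAllocator(int start) {
        nextIndex = new AtomicInteger(start);
    }

    int next() {
        int index = nextIndex.getAndIncrement();
        if (index < 0) {
            // The counter has wrapped past Integer.MAX_VALUE: undo the increment and fail.
            nextIndex.decrementAndGet();
            throw new IllegalStateException("too many thread-local indexed variables");
        }
        return index;
    }

    int peek() {
        return nextIndex.get();
    }
}
```

Starting at Integer.MAX_VALUE, the first call still succeeds; the counter then wraps to a negative value, so every later call throws and leaves the counter where it was.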

public final V get() {
    InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.get(); // 1
    Object v = threadLocalMap.indexedVariable(index); // 2
    if (v != InternalThreadLocalMap.UNSET) {
        return (V) v;
    }

    V value = initialize(threadLocalMap); // 3
    registerCleaner(threadLocalMap); // 4
    return value;
}

First, let's see how InternalThreadLocalMap.get() obtains the threadLocalMap:

=======================InternalThreadLocalMap=======================  
public static InternalThreadLocalMap get() {
    Thread thread = Thread.currentThread();
    if (thread instanceof FastThreadLocalThread) {
        return fastGet((FastThreadLocalThread) thread);
    } else {
        return slowGet();
    }
}

private static InternalThreadLocalMap fastGet(FastThreadLocalThread thread) {
    InternalThreadLocalMap threadLocalMap = thread.threadLocalMap();
    if (threadLocalMap == null) {
        thread.setThreadLocalMap(threadLocalMap = new InternalThreadLocalMap());
    }
    return threadLocalMap;
}

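Because FastThreadLocal only shows its performance advantage when combined with FastThreadLocalThread, we focus on fastGet. It reads threadLocalMap directly from the ftlt thread; if none exists yet, it creates an InternalThreadLocalMap instance, stores it on the thread, and returns it.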

threadLocalMap.indexedVariable(index) is simple: it reads the value straight from the array and returns it:

public Object indexedVariable(int index) {
    Object[] lookup = indexedVariables;
    return index < lookup.length ? lookup[index] : UNSET;
}

If the value read is not UNSET, it is a valid value and is returned directly. If it is UNSET, the variable is initialized.

The initialize(threadLocalMap) method:

private V initialize(InternalThreadLocalMap threadLocalMap) {
    V v = null;
    try {
        v = initialValue();
    } catch (Exception e) {
        PlatformDependent.throwException(e);
    }

    threadLocalMap.setIndexedVariable(index, v); // 3-1
    addToVariablesToRemove(threadLocalMap, this); // 3-2
    return v;
}

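"3-1": the initial value of the ftl is obtained and stored in the threadLocalMap's array. If the array is too short, it is expanded first and the value is stored afterwards. The details are not covered here.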
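The expansion step that the text skips can be sketched as follows. The class `GrowableTable` is hypothetical, and the growth policy (the smallest power of two strictly greater than the index) is an assumption made for this sketch rather than something taken from the article.

```java
import java.util.Arrays;

// Hypothetical table that grows on demand and keeps unused slots marked as UNSET.
final class GrowableTable {
    static final Object UNSET = new Object();

    private Object[] slots;

    GrowableTable(int initialCapacity) {
        slots = new Object[initialCapacity];
        Arrays.fill(slots, UNSET);
    }

    void set(int index, Object value) {
        if (index >= slots.length) {
            int oldCapacity = slots.length;
            // Assumed policy: smallest power of two strictly greater than index.
            // (This sketch ignores int overflow for very large indices.)
            int newCapacity = Math.max(1, Integer.highestOneBit(index) << 1);
            slots = Arrays.copyOf(slots, newCapacity);
            Arrays.fill(slots, oldCapacity, newCapacity, UNSET); // new slots start unset
        }
        slots[index] = value;
    }

    Object get(int index) {
        return index < slots.length ? slots[index] : UNSET;
    }

    int capacity() {
        return slots.length;
    }
}
```

Filling the new tail with UNSET matters: Arrays.copyOf pads with null, and a null slot would otherwise look like a value that had been set to null.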

"3-2": addToVariablesToRemove(threadLocalMap, this) stores the ftl instance in a Set that is held in element 0 of the threadLocalMap's internal array. The code is omitted here; the structure is shown in the figure:

Figure: ftl variablesToRemove

The implementation of registerCleaner(threadLocalMap), as found in the netty-4.1.34 source:

private void registerCleaner(final InternalThreadLocalMap threadLocalMap) {
    Thread current = Thread.currentThread();
    if (FastThreadLocalThread.willCleanupFastThreadLocals(current) || threadLocalMap.isCleanerFlagSet(index)) {
        return;
    }

    threadLocalMap.setCleanerFlag(index);

    // TODO: We need to find a better way to handle this.
    /*
    // We will need to ensure we will trigger remove(InternalThreadLocalMap) so everything will be released
    // and FastThreadLocal.onRemoval(...) will be called.
    ObjectCleaner.register(current, new Runnable() {
        @Override
        public void run() {
            remove(threadLocalMap);

            // It's fine to not call InternalThreadLocalMap.remove() here as this will only be triggered once
            // the Thread is collected by GC. In this case the ThreadLocal will be gone away already.
        }
    });
    */
}

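Because the ObjectCleaner.register block is commented out in this version and the remaining logic is simple, it is not analyzed further. ObjectCleaner itself is outside the scope of this article.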

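With **get()** analyzed, how **set(value)** works is also easy to see, so for reasons of space it is not analyzed separately. As noted earlier, ftl delivers its best performance only together with ftlt. On any other ordinary thread it degrades to the JDK ThreadLocal case, because an ordinary thread has no InternalThreadLocalMap field of its own. Next, let's look at how that degradation happens.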

We start from the get() method of InternalThreadLocalMap:

=======================InternalThreadLocalMap=======================  
public static InternalThreadLocalMap get() {
    Thread thread = Thread.currentThread();
    if (thread instanceof FastThreadLocalThread) {
        return fastGet((FastThreadLocalThread) thread);
    } else {
        return slowGet();
    }
}

private static InternalThreadLocalMap slowGet() {
    // The parent class has a static field of the JDK ThreadLocal type; the InternalThreadLocalMap is fetched from that ThreadLocal
    ThreadLocal<InternalThreadLocalMap> slowThreadLocalMap = UnpaddedInternalThreadLocalMap.slowThreadLocalMap;
    InternalThreadLocalMap ret = slowThreadLocalMap.get();
    if (ret == null) {
        ret = new InternalThreadLocalMap();
        slowThreadLocalMap.set(ret);
    }
    return ret;
}

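From the ftl's point of view, the whole degraded path is: fetch the InternalThreadLocalMap from a JDK ThreadLocal variable, then read the value at the given array index from that InternalThreadLocalMap. The object relationships are shown in the figure: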

Figure: ftl slowGet
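The two lookup paths can be put side by side in a self-contained sketch. `SlotMap`, `FastThread` and `SlotMaps` are hypothetical stand-ins for InternalThreadLocalMap, FastThreadLocalThread and the static get() logic; they only mirror the structure shown above.

```java
// Hypothetical stand-in for InternalThreadLocalMap.
final class SlotMap {
    final Object[] slots = new Object[32];
}

// Hypothetical stand-in for FastThreadLocalThread: the map is a plain field.
class FastThread extends Thread {
    SlotMap slotMap;

    FastThread(Runnable task) {
        super(task);
    }
}

final class SlotMaps {
    // Fallback for ordinary threads: the map itself lives in a JDK ThreadLocal.
    private static final ThreadLocal<SlotMap> SLOW = new ThreadLocal<>();

    static SlotMap get() {
        Thread thread = Thread.currentThread();
        if (thread instanceof FastThread) {
            FastThread fast = (FastThread) thread; // fast path: direct field access
            if (fast.slotMap == null) {
                fast.slotMap = new SlotMap();
            }
            return fast.slotMap;
        }
        SlotMap map = SLOW.get(); // slow path: one ThreadLocalMap lookup first
        if (map == null) {
            map = new SlotMap();
            SLOW.set(map);
        }
        return map;
    }

    /** Returns {get() is stable across calls, map is held in the thread's own field}. */
    static boolean[] probe(boolean fast) {
        boolean[] result = new boolean[2];
        Runnable task = () -> {
            SlotMap first = get();
            result[0] = first == get();
            Thread current = Thread.currentThread();
            result[1] = current instanceof FastThread && ((FastThread) current).slotMap == first;
        };
        Thread worker = fast ? new FastThread(task) : new Thread(task);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result;
    }
}
```

On an ordinary thread every access still pays for one JDK ThreadLocal lookup before the array read, which is why the advantage disappears there.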

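Netty provides three reclamation mechanisms for ftl:

Automatic: when an ftlt runs a Runnable task wrapped by FastThreadLocalRunnable, the ftl values are cleaned up automatically once the task finishes.
Manual: both ftl and InternalThreadLocalMap provide a remove method, which the user can call explicitly at an appropriate time (and sometimes must, for example when ftl is used in a thread pool of ordinary threads).
Automatic: a Cleaner is registered for each ftl of the current thread; when the thread object is no longer strongly reachable, the Cleaner thread reclaims that ftl for that thread. (Netty recommends avoiding this mechanism whenever one of the other two can be used, because it needs an extra thread, which costs resources and introduces contention between threads. In netty-4.1.34 the code that calls ObjectCleaner has already been commented out.)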
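Why removal matters on pooled threads can be shown with a plain JDK ThreadLocal standing in for an ftl. The class `CleanupDemo` and its methods are hypothetical; the wrapper only imitates the wrap-and-clean-up idea of the first mechanism and is not Netty's FastThreadLocalRunnable.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo: a JDK ThreadLocal stands in for an ftl.
final class CleanupDemo {
    static final ThreadLocal<String> LOCAL = new ThreadLocal<>();

    /** Wraps a task so that the thread-local is removed when the task ends. */
    static Runnable wrapWithCleanup(Runnable task) {
        return () -> {
            try {
                task.run();
            } finally {
                LOCAL.remove(); // runs even if the task throws
            }
        };
    }

    /** Sets LOCAL in one task and reports what the next task on the same pooled thread sees. */
    static String valueSeenByNextTask(boolean cleanup) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable setter = () -> LOCAL.set("leftover");
            pool.submit(cleanup ? wrapWithCleanup(setter) : setter).get();
            Callable<String> reader = LOCAL::get;
            return pool.submit(reader).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Without the wrapper, the second task sees the value left behind by the first one, because the pooled thread is reused. With the wrapper, it sees nothing.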

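The most important use of ftl inside Netty is ByteBuf allocation. The basic approach: each thread is assigned a block of memory (a PoolArena). When a ByteBuf is needed, the thread first allocates from the PoolArena it holds, and falls back to global allocation only if that fails. Because memory is limited, several threads may still end up sharing the same PoolArena. Even so, this approach reduces contention between threads as far as possible and improves efficiency.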

The code is in PoolThreadLocalCache, an inner class of PooledByteBufAllocator:

final class PoolThreadLocalCache extends FastThreadLocal<PoolThreadCache> {

    @Override
    protected synchronized PoolThreadCache initialValue() {
        final PoolArena<byte[]> heapArena = leastUsedArena(heapArenas);
        final PoolArena<ByteBuffer> directArena = leastUsedArena(directArenas);

        Thread current = Thread.currentThread();
        if (useCacheForAllThreads || current instanceof FastThreadLocalThread) {
            // PoolThreadCache wraps the memory blocks held by each thread
            return new PoolThreadCache(
                    heapArena, directArena, tinyCacheSize, smallCacheSize, normalCacheSize,
                    DEFAULT_MAX_CACHED_BUFFER_CAPACITY, DEFAULT_CACHE_TRIM_INTERVAL);
        }
        // No caching so just use 0 as sizes.
        return new PoolThreadCache(heapArena, directArena, 0, 0, 0, 0, 0);
    }
}
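The article does not show leastUsedArena. The sketch below is only a guess at the idea, under the assumption that "least used" means the arena currently serving the fewest threads; the classes `Arena` and `ArenaPicker` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical arena that counts how many threads currently use it.
final class Arena {
    final String name;
    final AtomicInteger threadCaches = new AtomicInteger();

    Arena(String name) {
        this.name = name;
    }
}

final class ArenaPicker {
    /** Returns the arena with the fewest users; the earliest one wins ties. */
    static Arena leastUsed(Arena[] arenas) {
        if (arenas == null || arenas.length == 0) {
            return null;
        }
        Arena min = arenas[0];
        for (int i = 1; i < arenas.length; i++) {
            if (arenas[i].threadCaches.get() < min.threadCaches.get()) {
                min = arenas[i];
            }
        }
        return min;
    }

    /** Picks the least used arena and records one more user on it. */
    static Arena assign(Arena[] arenas) {
        Arena arena = leastUsed(arenas);
        if (arena != null) {
            arena.threadCaches.incrementAndGet();
        }
        return arena;
    }
}
```

With two arenas and three threads, the third thread necessarily shares an arena with the first, which matches the remark above that several threads can hold the same PoolArena.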
