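Preface
While reading the Netty source code recently, I came across a class called FastThreadLocal. The JDK already ships its own ThreadLocal, so the name suggests that this class is faster than the JDK one. Where it is faster, and why, is what the short analysis below looks at.
Performance tests
ThreadLocal is mainly used in multithreaded code as a convenient way to get data that belongs to the current thread, without the caller having to worry about thread safety. To make the comparison concrete, two scenarios are tested: multiple threads operating on the same ThreadLocal, and a single thread operating on many ThreadLocals.
1. Multiple threads operating on the same ThreadLocal
The same test is run against ThreadLocal and FastThreadLocal. Part of the code is shown below: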
public static void test2() throws Exception {
    CountDownLatch cdl = new CountDownLatch(10000);
    ThreadLocal<String> threadLocal = new ThreadLocal<String>();
    long startTime = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
        new Thread(new Runnable() {
            @Override
            public void run() {
                threadLocal.set(Thread.currentThread().getName());
                for (int k = 0; k < 100000; k++) {
                    threadLocal.get();
                }
                cdl.countDown();
            }
        }, "Thread" + (i + 1)).start();
    }
    cdl.await();
    System.out.println(System.currentTimeMillis() - startTime + "ms");
}
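The code above creates 10,000 threads. Each thread sets a value on the ThreadLocal and then calls get 100,000 times; a CountDownLatch is used to measure the total elapsed time. The run takes about 1000ms.
Next, the same test for FastThreadLocal, with similar code: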
public static void test2() throws Exception {
    CountDownLatch cdl = new CountDownLatch(10000);
    FastThreadLocal<String> threadLocal = new FastThreadLocal<String>();
    long startTime = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
        new FastThreadLocalThread(new Runnable() {
            @Override
            public void run() {
                threadLocal.set(Thread.currentThread().getName());
                for (int k = 0; k < 100000; k++) {
                    threadLocal.get();
                }
                cdl.countDown();
            }
        }, "Thread" + (i + 1)).start();
    }
    cdl.await();
    System.out.println(System.currentTimeMillis() - startTime + "ms");
}
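This run also takes about 1000ms. In this scenario there is no real performance difference between the two kinds of ThreadLocal. Next, the second scenario.
2. A single thread operating on many ThreadLocals
The same test is run against ThreadLocal and FastThreadLocal. Part of the code is shown below: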
public static void test1() throws InterruptedException {
    int size = 10000;
    ThreadLocal<String> tls[] = new ThreadLocal[size];
    for (int i = 0; i < size; i++) {
        tls[i] = new ThreadLocal<String>();
    }
    new Thread(new Runnable() {
        @Override
        public void run() {
            long startTime = System.currentTimeMillis();
            for (int i = 0; i < size; i++) {
                tls[i].set("value" + i);
            }
            for (int i = 0; i < size; i++) {
                for (int k = 0; k < 100000; k++) {
                    tls[i].get();
                }
            }
            System.out.println(System.currentTimeMillis() - startTime + "ms");
        }
    }).start();
}
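The code above creates 10,000 ThreadLocals. A single thread then sets a value on each of them and calls get on each one 100,000 times. The run takes about 2000ms.
Next, the same test for FastThreadLocal, with similar code: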
public static void test1() {
    int size = 10000;
    FastThreadLocal<String> tls[] = new FastThreadLocal[size];
    for (int i = 0; i < size; i++) {
        tls[i] = new FastThreadLocal<String>();
    }
    new FastThreadLocalThread(new Runnable() {
        @Override
        public void run() {
            long startTime = System.currentTimeMillis();
            for (int i = 0; i < size; i++) {
                tls[i].set("value" + i);
            }
            for (int i = 0; i < size; i++) {
                for (int k = 0; k < 100000; k++) {
                    tls[i].get();
                }
            }
            System.out.println(System.currentTimeMillis() - startTime + "ms");
        }
    }).start();
}
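The run takes about 30ms, roughly 67 times faster and close to two orders of magnitude. Of course, a gap like this only shows up with a very large number of accesses. The following sections look at how ThreadLocal works and why FastThreadLocal is faster.
How ThreadLocal works
Since set and get are the methods we use most often, let's look at the source code of each: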
public void set(T value) {
    Thread t = Thread.currentThread();
    ThreadLocalMap map = getMap(t);
    if (map != null)
        map.set(this, value);
    else
        createMap(t, value);
}

ThreadLocalMap getMap(Thread t) {
    return t.threadLocals;
}
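Roughly, the code first gets the current thread and then reads the thread's threadLocals field, which is a ThreadLocalMap. If the map is null, a new map holding the value is created; otherwise the value is stored in the existing map with the current ThreadLocal as the key. The set method of ThreadLocalMap looks like this: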
private void set(ThreadLocal<?> key, Object value) {

    // We don't use a fast path as with get() because it is at
    // least as common to use set() to create new entries as
    // it is to replace existing ones, in which case, a fast
    // path would fail more often than not.

    Entry[] tab = table;
    int len = tab.length;
    int i = key.threadLocalHashCode & (len-1);

    for (Entry e = tab[i];
         e != null;
         e = tab[i = nextIndex(i, len)]) {
        ThreadLocal<?> k = e.get();

        if (k == key) {
            e.value = value;
            return;
        }

        if (k == null) {
            replaceStaleEntry(key, value, i);
            return;
        }
    }

    tab[i] = new Entry(key, value);
    int sz = ++size;
    if (!cleanSomeSlots(i, sz) && sz >= threshold)
        rehash();
}
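Roughly: ThreadLocalMap keeps its data in an array, similar to HashMap. Each ThreadLocal is assigned a threadLocalHashCode when it is created, and the slot is computed as threadLocalHashCode & (len-1), which is the same as taking the hash modulo the array length because the length is a power of two. Hash collisions are therefore possible. HashMap handles collisions with an array plus linked lists, whereas ThreadLocalMap uses nextIndex to step through the following slots (linear probing), so a colliding set has to walk the array until it finds its key or a free slot, which costs extra work. Next, the get method: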
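Before moving on to get, the slot computation and probing described above can be tried out in a small model. This is only a sketch, not the JDK code: ProbeModel and its methods are hypothetical names, the table holds bare int hash codes instead of Entry objects, and resizing and stale-entry cleanup are left out. The one detail taken from the JDK is the 0x61c88647 increment used to hand out threadLocalHashCode values.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of slot selection and linear probing (not the JDK class).
final class ProbeModel {
    // Increment the JDK uses when handing out ThreadLocal hash codes.
    static final int HASH_INCREMENT = 0x61c88647;
    static final AtomicInteger nextHashCode = new AtomicInteger();

    private final int[] keys;      // stored hash codes
    private final boolean[] used;  // true when the slot is occupied

    // Assumption of this model: capacity is a power of two and never grows.
    ProbeModel(int capacity) {
        keys = new int[capacity];
        used = new boolean[capacity];
    }

    static int nextHash() {
        return nextHashCode.getAndAdd(HASH_INCREMENT);
    }

    // With a power-of-two length, masking gives the same result as a modulo.
    static int slot(int hash, int len) {
        return hash & (len - 1);
    }

    // Stores the hash code and returns how many extra slots were probed.
    int put(int hash) {
        int len = keys.length;
        int i = slot(hash, len);
        int probes = 0;
        while (used[i] && keys[i] != hash) {
            if (++probes >= len) {
                throw new IllegalStateException("table is full");
            }
            i = (i + 1) & (len - 1);   // same idea as nextIndex(i, len)
        }
        used[i] = true;
        keys[i] = hash;
        return probes;
    }

    public static void main(String[] args) {
        ProbeModel consecutive = new ProbeModel(16);
        int total = 0;
        for (int n = 0; n < 16; n++) {
            total += consecutive.put(nextHash());
        }
        System.out.println("extra probes, 16 consecutive hash codes: " + total);

        ProbeModel colliding = new ProbeModel(16);
        colliding.put(5);
        System.out.println("extra probes, colliding key: " + colliding.put(5 + 16));
    }
}
```

Run as written, this should report 0 extra probes for the 16 consecutive hash codes and 1 for the colliding key: hash codes handed out one after another spread evenly over a power-of-two table, and probing only starts once two keys land on the same slot. With that model in mind, the JDK's get method is: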
public T get() {
    Thread t = Thread.currentThread();
    ThreadLocalMap map = getMap(t);
    if (map != null) {
        ThreadLocalMap.Entry e = map.getEntry(this);
        if (e != null) {
            @SuppressWarnings("unchecked")
            T result = (T)e.value;
            return result;
        }
    }
    return setInitialValue();
}
Again, it first gets the current thread and that thread's ThreadLocalMap, then looks up the value in the map using the current ThreadLocal as the key:
private Entry getEntry(ThreadLocal<?> key) {
    int i = key.threadLocalHashCode & (table.length - 1);
    Entry e = table[i];
    if (e != null && e.get() == key)
        return e;
    else
        return getEntryAfterMiss(key, i, e);
}

private Entry getEntryAfterMiss(ThreadLocal<?> key, int i, Entry e) {
    Entry[] tab = table;
    int len = tab.length;

    while (e != null) {
        ThreadLocal<?> k = e.get();
        if (k == key)
            return e;
        if (k == null)
            expungeStaleEntry(i);
        else
            i = nextIndex(i, len);
        e = tab[i];
    }
    return null;
}
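As with set, the array index is computed from the hash. If there is no collision the entry is returned directly; otherwise the following slots are probed one by one. The analysis so far suggests the following:
1. The ThreadLocalMap is stored in the Thread, with the ThreadLocal as the key. When multiple threads operate on the same ThreadLocal, each thread simply inserts one record into its own ThreadLocalMap, so the threads never contend with each other;
2. ThreadLocalMap resolves collisions by probing through the array, which costs performance whenever collisions occur;
3. FastThreadLocal sidesteps the collision problem in a different way, and that is where its speedup comes from;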
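Point 1 can be checked directly with the JDK alone. The sketch below is a minimal demo (IsolationDemo and its run method are names made up for this illustration): several threads write to the same ThreadLocal, each one reads back only its own value, and a thread that never called set sees null.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Demo: one shared ThreadLocal, one private value per thread.
final class IsolationDemo {
    static final ThreadLocal<String> NAME = new ThreadLocal<>();

    // Starts `count` threads; each stores its own name and reads it back.
    // Returns a map of thread name -> value that thread read from NAME.
    static Map<String, String> run(int count) {
        Map<String, String> seen = new ConcurrentHashMap<>();
        Thread[] threads = new Thread[count];
        for (int i = 0; i < count; i++) {
            threads[i] = new Thread(() -> {
                String me = Thread.currentThread().getName();
                NAME.set(me);               // lands in this thread's own map
                seen.put(me, NAME.get());   // reads from the same map
            }, "worker-" + i);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run(3));
        // The main thread never called set, so its own map has no entry.
        System.out.println("main thread sees: " + NAME.get());
    }
}
```

No locking is involved in set or get here, which fits the first benchmark: with one ThreadLocal per thread's map there is nothing to collide with and nothing to contend for.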
Next, let's see how FastThreadLocal achieves this.
Why Netty's FastThreadLocal is fast
Netty provides two classes, FastThreadLocal and FastThreadLocalThread; FastThreadLocalThread extends Thread. As before, we analyze the source of the commonly used set and get methods:
public final void set(V value) {
    if (value != InternalThreadLocalMap.UNSET) {
        set(InternalThreadLocalMap.get(), value);
    } else {
        remove();
    }
}

public final void set(InternalThreadLocalMap threadLocalMap, V value) {
    if (value != InternalThreadLocalMap.UNSET) {
        if (threadLocalMap.setIndexedVariable(index, value)) {
            addToVariablesToRemove(threadLocalMap, this);
        }
    } else {
        remove(threadLocalMap);
    }
}
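The code first checks whether value is InternalThreadLocalMap.UNSET (setting UNSET is treated as a remove), and then, much like the JDK version, uses an InternalThreadLocalMap to store the data: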
public static InternalThreadLocalMap get() {
    Thread thread = Thread.currentThread();
    if (thread instanceof FastThreadLocalThread) {
        return fastGet((FastThreadLocalThread) thread);
    } else {
        return slowGet();
    }
}

private static InternalThreadLocalMap fastGet(FastThreadLocalThread thread) {
    InternalThreadLocalMap threadLocalMap = thread.threadLocalMap();
    if (threadLocalMap == null) {
        thread.setThreadLocalMap(threadLocalMap = new InternalThreadLocalMap());
    }
    return threadLocalMap;
}
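As you can see, the InternalThreadLocalMap is likewise stored in the thread, here a FastThreadLocalThread. The difference is that the position is not derived from a hash value of the ThreadLocal; the index field of the FastThreadLocal is used directly. index is initialized when the instance is created: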
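As a side note on the two branches of InternalThreadLocalMap.get(): the fast path reads a field on the thread object, while slowGet(), which is not shown in this article, has to fall back to some other per-thread storage for ordinary threads; to my understanding Netty uses a regular JDK ThreadLocal for that. The sketch below models the idea with JDK classes only. PathDemo, FastThread, slots and path are hypothetical names and an Object[] stands in for InternalThreadLocalMap, so treat it as an illustration under those assumptions rather than Netty's implementation.

```java
// Model of a fast path (field on the thread) and a slow path (JDK ThreadLocal).
final class PathDemo {
    // Hypothetical stand-in for FastThreadLocalThread.
    static final class FastThread extends Thread {
        Object[] slots;   // per-thread storage held directly in a field

        FastThread(Runnable task) {
            super(task);
        }
    }

    // Fallback storage for threads that are not FastThread instances.
    private static final ThreadLocal<Object[]> SLOW_SLOTS = new ThreadLocal<>();

    // Returns the calling thread's slot array, creating it on first use.
    static Object[] slots() {
        Thread t = Thread.currentThread();
        if (t instanceof FastThread) {
            FastThread ft = (FastThread) t;
            if (ft.slots == null) {
                ft.slots = new Object[8];
            }
            return ft.slots;
        }
        Object[] s = SLOW_SLOTS.get();
        if (s == null) {
            s = new Object[8];
            SLOW_SLOTS.set(s);
        }
        return s;
    }

    // Names the branch that slots() takes on the calling thread.
    static String path() {
        return Thread.currentThread() instanceof FastThread ? "fast" : "slow";
    }

    // Runs path() on a FastThread and hands the answer back to the caller.
    static String pathOnFastThread() {
        String[] out = new String[1];
        FastThread t = new FastThread(() -> {
            slots();
            out[0] = path();
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out[0];
    }

    public static void main(String[] args) {
        System.out.println("main thread: " + path());
        System.out.println("FastThread:  " + pathOnFastThread());
    }
}
```

This is also why the benchmarks in this article start their work on a FastThreadLocalThread: on a plain Thread the lookup would go through the slow branch first. Back in FastThreadLocal, the index field is declared and initialized like this: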
private final int index;

public FastThreadLocal() {
    index = InternalThreadLocalMap.nextVariableIndex();
}
Now step into the nextVariableIndex method:
static final AtomicInteger nextIndex = new AtomicInteger();

public static int nextVariableIndex() {
    int index = nextIndex.getAndIncrement();
    if (index < 0) {
        nextIndex.decrementAndGet();
        throw new IllegalStateException("too many thread-local indexed variables");
    }
    return index;
}
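InternalThreadLocalMap has a static nextIndex object that generates array indexes. Because it is static, it is shared by all FastThreadLocal instances, so each new FastThreadLocal gets the next consecutive index and no two instances share one. Now look at how InternalThreadLocalMap implements setIndexedVariable: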
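The effect of a shared static counter is easy to see in isolation. In this minimal sketch, IndexedLocal is a hypothetical stand-in for FastThreadLocal that does nothing except claim an index; the overflow check from nextVariableIndex is omitted.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for FastThreadLocal: it only claims an index.
final class IndexedLocal {
    // One counter for the whole class, shared by every instance.
    private static final AtomicInteger NEXT_INDEX = new AtomicInteger();

    final int index;

    IndexedLocal() {
        index = NEXT_INDEX.getAndIncrement();
    }

    public static void main(String[] args) {
        IndexedLocal a = new IndexedLocal();
        IndexedLocal b = new IndexedLocal();
        IndexedLocal c = new IndexedLocal();
        System.out.println(a.index + ", " + b.index + ", " + c.index);
    }
}
```

Every instance gets a distinct index, and instances created one after another get consecutive ones, so a slot never has to be shared or searched for. Netty's setIndexedVariable then uses that index as follows: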
public boolean setIndexedVariable(int index, Object value) {
    Object[] lookup = indexedVariables;
    if (index < lookup.length) {
        Object oldValue = lookup[index];
        lookup[index] = value;
        return oldValue == UNSET;
    } else {
        expandIndexedVariableTableAndSet(index, value);
        return true;
    }
}
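indexedVariables is an object array that holds the values, and index is used directly as the array subscript. If index is greater than or equal to the array length, the array is expanded first. The get method reads the value directly through the index of the FastThreadLocal: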
public final V get(InternalThreadLocalMap threadLocalMap) {
    Object v = threadLocalMap.indexedVariable(index);
    if (v != InternalThreadLocalMap.UNSET) {
        return (V) v;
    }
    return initialize(threadLocalMap);
}

public Object indexedVariable(int index) {
    Object[] lookup = indexedVariables;
    return index < lookup.length? lookup[index] : UNSET;
}
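Reading directly by subscript is very fast. The downside is that it may waste space: each thread's array has to be long enough to cover the largest index that thread uses, even when most of the slots in between are left UNSET.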
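The trade-off can be made concrete with a small model of an index-addressed table. This is a sketch under assumptions, not Netty's code: SlotTable is a hypothetical name, and the initial size of 4 and the doubling growth policy are chosen for illustration. Only the UNSET sentinel idea and the return-value convention of setIndexedVariable are taken from the source above.

```java
import java.util.Arrays;

// Simplified model of an index-addressed per-thread table.
final class SlotTable {
    static final Object UNSET = new Object();   // marks an empty slot

    private Object[] slots;

    SlotTable() {
        slots = new Object[4];                  // assumed initial size
        Arrays.fill(slots, UNSET);
    }

    // Stores the value at `index`; returns true if the slot was empty before.
    boolean set(int index, Object value) {
        if (index >= slots.length) {
            int oldLen = slots.length;
            int newLen = oldLen;
            while (newLen <= index) {
                newLen <<= 1;                   // assumed policy: keep doubling
            }
            slots = Arrays.copyOf(slots, newLen);
            Arrays.fill(slots, oldLen, newLen, UNSET);
        }
        Object old = slots[index];
        slots[index] = value;
        return old == UNSET;
    }

    // O(1) read: one bounds check and one array access.
    Object get(int index) {
        return index < slots.length ? slots[index] : UNSET;
    }

    int capacity() {
        return slots.length;
    }

    public static void main(String[] args) {
        SlotTable table = new SlotTable();
        table.set(1000, "value");
        System.out.println("capacity after one set at index 1000: " + table.capacity());
    }
}
```

In this model, setting a single value at index 1000 forces the table to grow to 1024 slots, 1023 of which only hold the UNSET marker. Since every thread has its own table, that cost is paid once per thread.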
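Summary
From the analysis above, performance problems are only likely when a large number of ThreadLocals are being read and written. FastThreadLocal trades space for time to get O(1) reads. One open question remains: why does the JDK not simply use a HashMap (array plus linked lists and red-black trees) internally instead of ThreadLocalMap?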