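- Java Multithreading and High Concurrency ①: Bytecode primitives of the volatile keyword
- Java Multithreading and High Concurrency ②: Hardware-level implementation of synchronized and volatile
- Java Multithreading and High Concurrency ③: The upgrade path from no lock to biased, lightweight, and heavyweight locks
- Java Multithreading and High Concurrency ④: Basic concepts of memory barriers
- Java Multithreading and High Concurrency ⑤: Pros and cons of using thread pools
- Java Multithreading and High Concurrency ⑥: Why the Alibaba development manual recommends custom thread pools
- Java Multithreading and High Concurrency ⑦: Best practices for custom thread pools
- Java Multithreading and High Concurrency ⑧: Common thread pool types and their use cases
- Java Multithreading and High Concurrency ⑨: How the JVM specification requires memory barriers
- Java Multithreading and High Concurrency ⑩: Something more powerful than threads: load-test results show the power of fibers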
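The multithreading and high-concurrency course covers roughly six parts:
First: basic concepts, starting from what a thread is
Second: the JUC synchronization tools, i.e. the various locks
Third: synchronized containers
Fourth: thread pools
Fifth: frequently asked interview topics that earn bonus points, including fibers
Sixth: Disruptor. Not everyone has heard of this framework. It is also an MQ (Message Queue) framework. There are many message queues; Kafka, RabbitMQ, and Redis, which are covered later, are all message queues. Disruptor is widely regarded as the most efficient and fastest MQ in a single-machine environment.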
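- Basic thread concepts
- volatile and CAS
- Atomic classes and newer thread-synchronization mechanisms
- LockSupport, a Taobao interview question, and a methodology for reading source code
- Reading the AQS source, the four reference types (strong, soft, weak, phantom), and ThreadLocal internals and source
- Concurrent containers
- Thread pools
- Thread pools and source code reading
- JMH and Disruptor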
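CAS
Compare And Swap (Compare And Exchange) / spinning / spin lock / lock-free
Because it is usually wrapped in a loop that retries until it succeeds, the term loosely refers to this whole family of operations
cas(v, a, b): variable v, expected value a, new value b
ABA problem: the value changes from A to B and back to A between your read and your CAS, and a plain CAS, which only compares values, cannot tell that anything happened (your girlfriend dated someone else while she was away from you). Spinning means busy-waiting in a loop until the operation finally succeeds
Solution: add a version number (AtomicStampedReference); simple primitive values do not need a version number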
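The ABA problem and the version-number fix can be sketched as follows. This is a minimal illustration, not code from the course: the class name AbaDemo and its methods are hypothetical, and the "other thread" is simulated on the same thread so the outcome is deterministic.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicStampedReference;

// Hypothetical demo class; the interleaving of "another thread" is simulated inline.
class AbaDemo {

    /** Plain CAS: succeeds even though the value went A -> B -> A in between. */
    static boolean plainCasAfterAba() {
        AtomicReference<String> ref = new AtomicReference<>("A");
        String seen = ref.get();                 // this thread reads A
        ref.compareAndSet("A", "B");             // "another thread" changes A -> B
        ref.compareAndSet("B", "A");             // ... and back to A
        return ref.compareAndSet(seen, "C");     // the value matches, so the change goes unnoticed
    }

    /** Stamped CAS: the stamp (version) moved on, so the stale update is rejected. */
    static boolean stampedCasAfterAba() {
        AtomicStampedReference<String> ref = new AtomicStampedReference<>("A", 0);
        int[] stampHolder = new int[1];
        String seen = ref.get(stampHolder);      // this thread reads A with stamp 0
        int seenStamp = stampHolder[0];
        ref.compareAndSet("A", "B", 0, 1);       // "another thread": A -> B, stamp 0 -> 1
        ref.compareAndSet("B", "A", 1, 2);       // ... and back to A, stamp 1 -> 2
        return ref.compareAndSet(seen, "C", seenStamp, seenStamp + 1);
    }

    public static void main(String[] args) {
        System.out.println("plain CAS succeeded:   " + plainCasAfterAba());   // true
        System.out.println("stamped CAS succeeded: " + stampedCasAfterAba()); // false
    }
}
```

For a plain counter the A -> B -> A sequence is harmless, which is why simple primitive values do not need a stamp; it matters when the reference identity carries meaning, for example nodes in a lock-free stack.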
Unsafe
AtomicInteger:
public final int incrementAndGet() {
for (;;) {
int current = get();
int next = current + 1;
if (compareAndSet(current, next))
return next;
}
}
public final boolean compareAndSet(int expect, int update) {
return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
}
Unsafe:
public final native boolean compareAndSwapInt(Object var1, long var2, int var4, int var5);
Usage:
package com.mashibing.jol;
import sun.misc.Unsafe;
import java.lang.reflect.Field;
public class T02_TestUnsafe {
int i = 0;
private static T02_TestUnsafe t = new T02_TestUnsafe();
public static void main(String[] args) throws Exception {
//Unsafe unsafe = Unsafe.getUnsafe();
Field unsafeField = Unsafe.class.getDeclaredFields()[0];
unsafeField.setAccessible(true);
Unsafe unsafe = (Unsafe) unsafeField.get(null);
Field f = T02_TestUnsafe.class.getDeclaredField("i");
long offset = unsafe.objectFieldOffset(f);
System.out.println(offset);
boolean success = unsafe.compareAndSwapInt(t, offset, 0, 1);
System.out.println(success);
System.out.println(t.i);
//unsafe.compareAndSwapInt()
}
}
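On Java 9 and later, the same field-level CAS can be written without reflecting into sun.misc.Unsafe by using java.lang.invoke.VarHandle. The sketch below assumes Java 9+; the class name VarHandleCas and the helper casI are hypothetical and only mirror the T02_TestUnsafe example above.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical counterpart of T02_TestUnsafe using the supported VarHandle API (Java 9+).
class VarHandleCas {
    int i = 0;

    // Handle to the instance field "i", looked up once.
    private static final VarHandle I;
    static {
        try {
            I = MethodHandles.lookup().findVarHandle(VarHandleCas.class, "i", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Atomically sets target.i to update if it currently equals expect. */
    static boolean casI(VarHandleCas target, int expect, int update) {
        return I.compareAndSet(target, expect, update);
    }

    public static void main(String[] args) {
        VarHandleCas t = new VarHandleCas();
        System.out.println(casI(t, 0, 1)); // true: i was 0
        System.out.println(t.i);           // 1
        System.out.println(casI(t, 0, 2)); // false: i is no longer 0
    }
}
```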
jdk8u: unsafe.cpp:
cmpxchg = compare and exchange
UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapInt(JNIEnv *env, jobject unsafe, jobject obj, jlong offset, jint e, jint x))
UnsafeWrapper("Unsafe_CompareAndSwapInt");
oop p = JNIHandles::resolve(obj);
jint* addr = (jint *) index_oop_from_field_offset_long(p, offset);
return (jint)(Atomic::cmpxchg(x, addr, e)) == e;
UNSAFE_END
jdk8u: atomic_linux_x86.inline.hpp
is_MP = Multi Processor
inline jint Atomic::cmpxchg (jint exchange_value, volatile jint* dest, jint compare_value) {
int mp = os::is_MP();
__asm__ volatile (LOCK_IF_MP(%4) "cmpxchgl %1,(%3)"
: "=a" (exchange_value)
: "r" (exchange_value), "a" (compare_value), "r" (dest), "r" (mp)
: "cc", "memory");
return exchange_value;
}
jdk8u: os.hpp is_MP()
static inline bool is_MP() {
// During bootstrap if _processor_count is not yet initialized
// we claim to be MP as that is safest. If any platform has a
// stub generator that might be triggered in this phase and for
// which being declared MP when in fact not, is a problem - then
// the bootstrap routine for the stub generator needs to check
// the processor count directly and leave the bootstrap routine
// in place until called after initialization has ocurred.
return (_processor_count != 1) || AssumeMP;
}
jdk8u: atomic_linux_x86.inline.hpp
#define LOCK_IF_MP(mp) "cmp $0, " #mp "; je 1f; lock; 1: "
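Final implementation:
cmpxchg = the CAS that modifies the variable's value; on a multiprocessor it is not atomic by itself, hence the lock prefix
the lock cmpxchg instruction
Hardware:
While the following instruction executes, the lock prefix asserts a lock signal on the north bridge
(rather than locking the whole bus)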
markword
Tool: JOL = Java Object Layout
<dependencies>
<!-- https://mvnrepository.com/artifact/org.openjdk.jol/jol-core -->
<dependency>
<groupId>org.openjdk.jol</groupId>
<artifactId>jol-core</artifactId>
<version>0.9</version>
</dependency>
</dependencies>
jdk8u: markOop.hpp
// Bit-format of an object header (most significant first, big endian layout below):
//
// 32 bits:
// --------
// hash:25 ------------>| age:4 biased_lock:1 lock:2 (normal object)
// JavaThread*:23 epoch:2 age:4 biased_lock:1 lock:2 (biased object)
// size:32 ------------------------------------------>| (CMS free block)
// PromotedObject*:29 ---------->| promo_bits:3 ----->| (CMS promoted object)
//
// 64 bits:
// --------
// unused:25 hash:31 -->| unused:1 age:4 biased_lock:1 lock:2 (normal object)
// JavaThread*:54 epoch:2 unused:1 age:4 biased_lock:1 lock:2 (biased object)
// PromotedObject*:61 --------------------->| promo_bits:3 ----->| (CMS promoted object)
// size:64 ----------------------------------------------------->| (CMS free block)
//
// unused:25 hash:31 -->| cms_free:1 age:4 biased_lock:1 lock:2 (COOPs && normal object)
// JavaThread*:54 epoch:2 cms_free:1 age:4 biased_lock:1 lock:2 (COOPs && biased object)
// narrowOop:32 unused:24 cms_free:1 unused:4 promo_bits:3 ----->| (COOPs && CMS promoted object)
// unused:21 size:35 -->| cms_free:1 unused:7 ------------------>| (COOPs && CMS free block)
A cross-section look at synchronized
- How synchronized works
- The upgrade process
- Assembly-level implementation
- Differences vs ReentrantLock
Java source level
synchronized(o)
Bytecode level
monitorenter monitorexit
JVM level (HotSpot)
package com.mashibing.insidesync;
import org.openjdk.jol.info.ClassLayout;
public class T01_Sync1 {
public static void main(String[] args) {
Object o = new Object();
System.out.println(ClassLayout.parseInstance(o).toPrintable());
}
}
com.mashibing.insidesync.T01_Sync1$Lock object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 05 00 00 00 (00000101 00000000 00000000 00000000) (5)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8 4 (object header) 49 ce 00 20 (01001001 11001110 00000000 00100000) (536923721)
12 4 (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
com.mashibing.insidesync.T02_Sync2$Lock object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 05 90 2e 1e (00000101 10010000 00101110 00011110) (506368005)
4 4 (object header) 1b 02 00 00 (00011011 00000010 00000000 00000000) (539)
8 4 (object header) 49 ce 00 20 (01001001 11001110 00000000 00100000) (536923721)
12 4 (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
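The first eight header bytes in the JOL dumps above are the mark word, printed in memory order (little endian on x86). A small decoder makes the lock bits easier to read. This is a sketch under stated assumptions: the 64-bit layout from the markOop.hpp comment above, a little-endian machine, and a hypothetical class name MarkWordDecoder; it only interprets bytes copied from a dump and does not touch a live object.

```java
// Hypothetical helper that interprets the 8 mark word bytes shown by JOL
// (64-bit HotSpot layout from markOop.hpp, little-endian byte order assumed).
class MarkWordDecoder {

    /** Assembles the 8 header bytes, given in memory order, into the 64-bit mark word. */
    static long toMarkWord(int[] littleEndianBytes) {
        long mark = 0L;
        for (int i = 7; i >= 0; i--) {
            mark = (mark << 8) | (littleEndianBytes[i] & 0xFFL);
        }
        return mark;
    }

    /** Reads the biased_lock bit and the two lock bits (the lowest three bits). */
    static String lockState(long mark) {
        int low3 = (int) (mark & 0b111);
        if (low3 == 0b001) return "unlocked";
        if (low3 == 0b101) return "biased";
        switch ((int) (mark & 0b11)) {
            case 0b00: return "lightweight";
            case 0b10: return "heavyweight";
            default:   return "gc-marked";   // lock bits 11
        }
    }

    /** For a biased mark word: the JavaThread*:54 field (0 means anonymously biased). */
    static long biasedThreadBits(long mark) {
        return mark >>> 10; // skip lock:2, biased_lock:1, age:4, unused:1, epoch:2
    }

    public static void main(String[] args) {
        int[] first  = {0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
        int[] second = {0x05, 0x90, 0x2e, 0x1e, 0x1b, 0x02, 0x00, 0x00};
        for (int[] header : new int[][] {first, second}) {
            long mark = toMarkWord(header);
            System.out.println(lockState(mark) + ", thread bits = 0x"
                    + Long.toHexString(biasedThreadBits(mark)));
        }
    }
}
```

Both dumps decode as biased (low bits 101); the first has no thread recorded (anonymously biased), while the second carries a non-zero thread pointer.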
The InterpreterRuntime::monitorenter method
IRT_ENTRY_NO_ASYNC(void, InterpreterRuntime::monitorenter(JavaThread* thread, BasicObjectLock* elem))
#ifdef ASSERT
thread->last_frame().interpreter_frame_verify_monitor(elem);
#endif
if (PrintBiasedLockingStatistics) {
Atomic::inc(BiasedLocking::slow_path_entry_count_addr());
}
Handle h_obj(thread, elem->obj());
assert(Universe::heap()->is_in_reserved_or_null(h_obj()),
"must be NULL or an object");
if (UseBiasedLocking) {
// Retry fast entry if bias is revoked to avoid unnecessary inflation
ObjectSynchronizer::fast_enter(h_obj, elem->lock(), true, CHECK);
} else {
ObjectSynchronizer::slow_enter(h_obj, elem->lock(), CHECK);
}
assert(Universe::heap()->is_in_reserved_or_null(elem->obj()),
"must be NULL or an object");
#ifdef ASSERT
thread->last_frame().interpreter_frame_verify_monitor(elem);
#endif
IRT_END
synchronizer.cpp
revoke_and_rebias
void ObjectSynchronizer::fast_enter(Handle obj, BasicLock* lock, bool attempt_rebias, TRAPS) {
if (UseBiasedLocking) {
if (!SafepointSynchronize::is_at_safepoint()) {
BiasedLocking::Condition cond = BiasedLocking::revoke_and_rebias(obj, attempt_rebias, THREAD);
if (cond == BiasedLocking::BIAS_REVOKED_AND_REBIASED) {
return;
}
} else {
assert(!attempt_rebias, "can not rebias toward VM thread");
BiasedLocking::revoke_at_safepoint(obj);
}
assert(!obj->mark()->has_bias_pattern(), "biases should be revoked by now");
}
slow_enter (obj, lock, THREAD) ;
}
void ObjectSynchronizer::slow_enter(Handle obj, BasicLock* lock, TRAPS) {
markOop mark = obj->mark();
assert(!mark->has_bias_pattern(), "should not see bias pattern here");
if (mark->is_neutral()) {
// Anticipate successful CAS -- the ST of the displaced mark must
// be visible <= the ST performed by the CAS.
lock->set_displaced_header(mark);
if (mark == (markOop) Atomic::cmpxchg_ptr(lock, obj()->mark_addr(), mark)) {
TEVENT (slow_enter: release stacklock) ;
return ;
}
// Fall through to inflate() ...
} else
if (mark->has_locker() && THREAD->is_lock_owned((address)mark->locker())) {
assert(lock != mark->locker(), "must not re-lock the same lock");
assert(lock != (BasicLock*)obj->mark(), "don't relock with same BasicLock");
lock->set_displaced_header(NULL);
return;
}
#if 0
// The following optimization isn't particularly useful.
if (mark->has_monitor() && mark->monitor()->is_entered(THREAD)) {
lock->set_displaced_header (NULL) ;
return ;
}
#endif
// The object header will never be displaced to this lock,
// so it does not matter what the value is, except that it
// must be non-zero to avoid looking like a re-entrant lock,
// and must not look locked either.
lock->set_displaced_header(markOopDesc::unused_mark());
ObjectSynchronizer::inflate(THREAD, obj())->enter(THREAD);
}
The inflate method: inflates the lock into a heavyweight lock
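Lock upgrade process
JDK 8 mark word implementation table:
No lock -> biased lock -> lightweight lock (spin lock, adaptive spinning) -> heavyweight lock
The optimizations of synchronized are closely tied to the mark word
The lowest three bits of the mark word encode the lock state: one bit is the biased-lock bit and two bits are the ordinary lock bits
- Object o = new Object(): lock bits = 0 01, the unlocked state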
- o.hashCode(): 001 + hashcode
  little endian: 00000001 10101101 00110100 00110110 01011001 00000000 00000000 00000000
  big endian: 00000000 00000000 00000000 01011001 00110110 00110100 10101101 00000000
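- synchronized(o) by default: 00 -> lightweight lock. By default biased locking has a startup delay of 4 seconds. Why? The JVM starts some threads of its own by default, and their code contains many synchronized blocks that are known at startup to be contended. Using biased locks there would cause constant bias revocation and lock upgrades, which is inefficient. -XX:BiasedLockingStartupDelay=0
- With the flag above set: new Object() -> 101 biased lock -> thread ID is 0 -> anonymous biased lock. With biased locking active, a newly created object is by default an anonymously biasable object (101)
- When a thread locks it: taking a biased lock means changing the thread ID in the mark word to the locking thread's own ID. Biased locks cannot be rebiased; related topics: bulk rebias, bulk revoke
- When threads contend: the biased lock is revoked and upgraded to a lightweight lock. Each thread creates a LockRecord (LR) on its own stack and uses CAS to set the mark word to a pointer to its own LR; whichever thread succeeds owns the lock
- When contention intensifies: a thread spins more than 10 times (-XX:PreBlockSpin), or the number of spinning threads exceeds half the number of CPU cores. Since JDK 1.6, adaptive spinning (Adaptive Self Spinning) is used and the JVM controls it itself. Upgrade to a heavyweight lock: -> request resources from the operating system (Linux mutex); the CPU makes a system call from ring 3 to ring 0, the thread is suspended and placed in a wait queue to be scheduled by the OS, and control is then mapped back to user space
(The experiments above were run on JDK 11, where biased locking is active from the start; on JDK 8 the default object header is unlocked)
Biased locking is enabled by default, but with a delay; to observe biased locks, set -XX:BiasedLockingStartupDelay=0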
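That's right, I am the restroom manager
Locking means locking the object
The lock upgrade process
Earlier JDK versions: an OS resource (mutex) and a user mode -> kernel mode transition; heavyweight and relatively inefficient
Modern versions have been optimized
No lock -> biased lock -> lightweight lock (spin lock) -> heavyweight lock
Biased lock: the mark word records a pointer to the current thread. The next time the same thread locks, there is nothing to contend for; it only has to check whether the thread pointer is the same. Hence a biased lock is biased toward the first thread that locks it. The hashCode is backed up on the thread stack; when the thread is destroyed, the lock degrades to unlocked
With contention: the lock is upgraded to a lightweight lock. Each thread has its own LockRecord on its own stack and uses CAS to compete for the LR pointer in the mark word; the thread whose LR the pointer references owns the lock
Spinning more than 10 times upgrades to a heavyweight lock: if too many threads spin, CPU consumption becomes too high, so it is better to upgrade to a heavyweight lock and enter the wait queue (which consumes no CPU). -XX:PreBlockSpin
Spin locks were introduced in JDK 1.4.2 and enabled with -XX:+UseSpinning. In JDK 6 they became enabled by default, and adaptive spin locks were introduced.
Adaptive spinning means the spin time (count) is no longer fixed; it is determined by the previous spin time on the same lock and the state of the lock owner. If a spin wait on the same lock object recently succeeded in acquiring the lock, and the thread holding the lock is running, the VM assumes this spin is also likely to succeed and allows it to last relatively longer. If spinning rarely succeeds for a given lock, later attempts to acquire it may skip spinning and block the thread directly, avoiding wasted processor resources.
Because biased locks involve a revocation step (revoke) that consumes system resources, biased locking is not necessarily efficient when contention is very intense. It may be better to use lightweight locks directly.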
The lowest-level implementation of synchronized
public class T {
static volatile int i = 0;
public static void n() { i++; }
public static synchronized void m() {}
public static void main(String[] args) {
for(int j=0; j<1000_000; j++) {
m();
n();
}
}
}
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly T
C1 Compile Level 1 (first-level optimization)
C2 Compile Level 2 (second-level optimization)
Find the assembly for the m() and n() methods; you will see the lock cmpxchg ... instruction
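synchronized vs Lock (CAS)
Under high contention with long critical sections, synchronized is more efficient
Under low contention with short critical sections, CAS is more efficient
Once synchronized becomes heavyweight, waiters sit in a wait queue (consuming no CPU)
CAS (consumes CPU while waiting)
Always go by actual measurements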
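To make "CAS consumes CPU while waiting" concrete, here is a minimal CAS-based spin lock. It is a sketch, not the JDK's Lock implementation: the names SpinLockDemo, SpinLock and run are hypothetical, and Thread.onSpinWait() assumes Java 9+.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo: a non-reentrant spin lock built on a single CAS.
class SpinLockDemo {

    static final class SpinLock {
        private final AtomicBoolean locked = new AtomicBoolean(false);

        void lock() {
            // Busy-wait: the thread stays runnable and burns CPU until the CAS succeeds.
            while (!locked.compareAndSet(false, true)) {
                Thread.onSpinWait(); // hint to the CPU that this is a spin loop (Java 9+)
            }
        }

        void unlock() {
            locked.set(false);
        }
    }

    /** Increments a shared counter from several threads under the spin lock. */
    static int run(int threads, int incrementsPerThread) {
        SpinLock lock = new SpinLock();
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000)); // 40000
    }
}
```

With short critical sections like this one the spin rarely lasts long. With long critical sections or many waiters, every waiting thread keeps a core busy, which is the case where a heavyweight lock's wait queue tends to do better.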
Lock elimination
public void add(String str1,String str2){
StringBuffer sb = new StringBuffer();
sb.append(str1).append(str2);
}
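We all know that StringBuffer is thread-safe because its key methods are synchronized. In the code above, however, the sb reference is used only inside the add method and cannot be referenced by other threads (it is a local variable, private to the stack). sb can therefore never be a shared resource, and the JVM automatically eliminates the locks inside the StringBuffer object.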
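Lock coarsening
public String test(String str){
int i = 0;
StringBuffer sb = new StringBuffer();
while(i < 100){
sb.append(str);
i++;
}
return sb.toString();
}
The JVM detects that a series of consecutive operations all lock the same object (the while loop calls append 100 times; without lock coarsening that means 100 lock/unlock pairs). It then coarsens the lock scope to outside that series of operations (for example, outside the while loop body), so the whole series needs to lock only once.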
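The effect of coarsening can be imitated by hand. The sketch below is only a source-level analogy: the class name LockCoarseningSketch is hypothetical, and the JIT performs this transformation on compiled code rather than by rewriting Java source. In the hand-written version each append still re-enters the monitor that is already held, whereas the JIT would merge the lock regions.

```java
// Hypothetical illustration of what lock coarsening achieves, written out by hand.
class LockCoarseningSketch {

    /** As written in the source: each append() acquires and releases sb's monitor. */
    static String fineGrained(String str) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < 100; i++) {
            sb.append(str); // synchronized method: one lock/unlock per call
        }
        return sb.toString();
    }

    /** Hand-coarsened: the monitor is taken once around the whole loop. */
    static String coarsened(String str) {
        StringBuffer sb = new StringBuffer();
        synchronized (sb) { // single outer acquisition of the same monitor
            for (int i = 0; i < 100; i++) {
                sb.append(str); // re-entrant acquisition of a lock already held
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fineGrained("ab").equals(coarsened("ab"))); // true
    }
}
```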
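Lock downgrade (not important)
https://www.zhihu.com/question/63859501
In practice, a downgrade happens only when the object is accessed solely by the VMThread, so it has little practical meaning. You can simply assume that lock downgrade does not exist!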
Hyper-threading
One ALU + two sets of registers + PC
References
http://openjdk.java.net/groups/hotspot/docs/HotSpotGlossary.html
Uses of volatile
1. Thread visibility
package com.mashibing.testvolatile;
public class T01_ThreadVisibility {
private static volatile boolean flag = true;
public static void main(String[] args) throws InterruptedException {
new Thread(()-> {
while (flag) {
//do sth
}
System.out.println("end");
}, "server").start();
Thread.sleep(1000);
flag = false;
}
}
2. Preventing instruction reordering
Question: does a DCL (double-checked locking) singleton need volatile?
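The answer is yes. Object creation is roughly three steps (allocate memory, run the constructor, publish the reference), and without volatile the publication may be reordered before the constructor finishes, so another thread can pass the first null check and use a half-initialized object. A minimal sketch, with a hypothetical class name DclSingleton and an illustrative field:

```java
// Hypothetical double-checked locking singleton; the volatile modifier is the point.
class DclSingleton {
    // volatile forbids reordering "publish the reference" before "finish construction".
    private static volatile DclSingleton instance;

    private final int value;

    private DclSingleton() {
        this.value = 42; // illustrative state that must be visible once published
    }

    static DclSingleton getInstance() {
        DclSingleton local = instance;          // first check, no lock
        if (local == null) {
            synchronized (DclSingleton.class) {
                local = instance;               // second check, under the lock
                if (local == null) {
                    local = new DclSingleton();
                    instance = local;
                }
            }
        }
        return local;
    }

    int value() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(getInstance() == getInstance()); // true
    }
}
```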
CPU fundamentals
- Cache line alignment: the 64-byte cache line is the basic unit of CPU cache synchronization; isolating data on separate cache lines is more efficient than false sharing (Disruptor uses this technique)
package com.mashibing.juc.c_028_FalseSharing;
public class T02_CacheLinePadding {
    private static class Padding {
        public volatile long p1, p2, p3, p4, p5, p6, p7; //
    }
    private static class T extends Padding {
        public volatile long x = 0L;
    }
    public static T[] arr = new T[2];
    static {
        arr[0] = new T();
        arr[1] = new T();
    }
    public static void main(String[] args) throws Exception {
        Thread t1 = new Thread(()->{
            for (long i = 0; i < 1000_0000L; i++) {
                arr[0].x = i;
            }
        });
        Thread t2 = new Thread(()->{
            for (long i = 0; i < 1000_0000L; i++) {
                arr[1].x = i;
            }
        });
        final long start = System.nanoTime();
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println((System.nanoTime() - start)/100_0000);
    }
}
MESI (cache coherence protocol)
- False sharing
- Write combining: a very small write-combining buffer inside the CPU (only 4 entries on some CPUs)
package com.mashibing.juc.c_029_WriteCombining;
public final class WriteCombining {
    private static final int ITERATIONS = Integer.MAX_VALUE;
    private static final int ITEMS = 1 << 24;
    private static final int MASK = ITEMS - 1;
    private static final byte[] arrayA = new byte[ITEMS];
    private static final byte[] arrayB = new byte[ITEMS];
    private static final byte[] arrayC = new byte[ITEMS];
    private static final byte[] arrayD = new byte[ITEMS];
    private static final byte[] arrayE = new byte[ITEMS];
    private static final byte[] arrayF = new byte[ITEMS];
    public static void main(final String[] args) {
        for (int i = 1; i <= 3; i++) {
            System.out.println(i + " SingleLoop duration (ns) = " + runCaseOne());
            System.out.println(i + " SplitLoop duration (ns) = " + runCaseTwo());
        }
    }
    public static long runCaseOne() {
        long start = System.nanoTime();
        int i = ITERATIONS;
        while (--i != 0) {
            int slot = i & MASK;
            byte b = (byte) i;
            arrayA[slot] = b;
            arrayB[slot] = b;
            arrayC[slot] = b;
            arrayD[slot] = b;
            arrayE[slot] = b;
            arrayF[slot] = b;
        }
        return System.nanoTime() - start;
    }
    public static long runCaseTwo() {
        long start = System.nanoTime();
        int i = ITERATIONS;
        while (--i != 0) {
            int slot = i & MASK;
            byte b = (byte) i;
            arrayA[slot] = b;
            arrayB[slot] = b;
            arrayC[slot] = b;
        }
        i = ITERATIONS;
        while (--i != 0) {
            int slot = i & MASK;
            byte b = (byte) i;
            arrayD[slot] = b;
            arrayE[slot] = b;
            arrayF[slot] = b;
        }
        return System.nanoTime() - start;
    }
}
- Instruction reordering
package com.mashibing.jvm.c3_jmm;
public class T04_Disorder {
    private static int x = 0, y = 0;
    private static int a = 0, b = 0;
    public static void main(String[] args) throws InterruptedException {
        int i = 0;
        for(;;) {
            i++;
            x = 0; y = 0;
            a = 0; b = 0;
            Thread one = new Thread(new Runnable() {
                public void run() {
                    // Thread one starts first, so the next line makes it wait a bit for thread two.
                    // Adjust the wait time to suit your machine's performance.
                    //shortWait(100000);
                    a = 1;
                    x = b;
                }
            });
            Thread other = new Thread(new Runnable() {
                public void run() {
                    b = 1;
                    y = a;
                }
            });
            one.start(); other.start();
            one.join(); other.join();
            String result = "Iteration " + i + " (" + x + "," + y + ")";
            if(x == 0 && y == 0) {
                System.err.println(result);
                break;
            } else {
                //System.out.println(result);
            }
        }
    }
    public static void shortWait(long interval){
        long start = System.nanoTime();
        long end;
        do{
            end = System.nanoTime();
        }while(start + interval >= end);
    }
}
How volatile prevents instruction reordering
1: Source code: volatile i
2: Bytecode: ACC_VOLATILE
3: JVM memory barriers
4: HotSpot implementation
bytecodeinterpreter.cpp
int field_offset = cache->f2_as_index();
if (cache->is_volatile()) {
if (support_IRIW_for_not_multiple_copy_atomic_cpu) {
OrderAccess::fence();
}
orderaccess_linux_x86.inline.hpp
inline void OrderAccess::fence() {
  if (os::is_MP()) {
    // always use locked addl since mfence is sometimes expensive
#ifdef AMD64
    __asm__ volatile ("lock; addl $0,0(%%rsp)" : : : "cc", "memory");
#else
    __asm__ volatile ("lock; addl $0,0(%%esp)" : : : "cc", "memory");
#endif
  }
}
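The effect of these barriers can be checked from Java by rerunning the T04_Disorder experiment with a and b declared volatile. The Java Memory Model places all volatile reads and writes in a single order consistent with each thread's program order, so the (0,0) outcome can no longer occur. The sketch below uses a hypothetical class name VolatileOrderingCheck and a bounded number of rounds instead of an endless loop.

```java
// Hypothetical variant of T04_Disorder: a and b are volatile, so (x, y) == (0, 0) cannot occur.
class VolatileOrderingCheck {
    private static volatile int a, b;
    private static int x, y; // read by the caller only after join(), which makes them visible

    /** Runs the two-thread experiment and counts how often both reads saw 0. */
    static int countBothZero(int rounds) {
        int bothZero = 0;
        for (int i = 0; i < rounds; i++) {
            a = 0; b = 0;
            x = 0; y = 0;
            Thread one = new Thread(() -> { a = 1; x = b; });
            Thread other = new Thread(() -> { b = 1; y = a; });
            one.start();
            other.start();
            try {
                one.join();
                other.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (x == 0 && y == 0) {
                bothZero++;
            }
        }
        return bothZero;
    }

    public static void main(String[] args) {
        System.out.println(countBothZero(2_000)); // 0
    }
}
```

On x86 HotSpot this guarantee is what the lock-prefixed instruction shown above pays for on volatile stores; other platforms use their own barrier instructions.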
Source: Ma Shibing, Java Multithreading and High Concurrency