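How synchronized Works Under the Hood (From the Java Object Header to JIT Optimizations)

To understand synchronized thoroughly, we have to start with the Java object header. And to observe the memory layout directly, we need the help of a few tools.

1. Two Handy but Little-Known Tools

1.1 Bytecode viewer plugin (jclasslib Bytecode viewer)

The usual way to inspect the bytecode of a compiled Java class is tedious: compile the class to a class file, then run javap -verbose ***.class to see its bytecode.

IDEA is powerful enough to have a plugin for this, of course. It is called jclasslib Bytecode viewer; installing it is left to the reader.

Here is a quick look at how it is used, which is easy; see the figure below:

The bytecode pane groups the constant pool, interfaces, fields, and so on into categories, and offers tooltips, cross-references, and the bytecode instructions themselves (clicking an instruction jumps to the matching entry in Oracle's JVM instruction documentation). It is very convenient; try it and see.

1.2 Java object memory layout tool: JOL

JOL stands for Java Object Layout, so its purpose hardly needs explaining. It is a small tool provided by OpenJDK. To use it:

Add the JOL Maven dependency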

<!-- https://mvnrepository.com/artifact/org.openjdk.jol/jol-core -->
<dependency>
	<groupId>org.openjdk.jol</groupId>
	<artifactId>jol-core</artifactId>
	<version>0.9</version>
</dependency>

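Then write a program that calls it

// Requires: import org.openjdk.jol.info.ClassLayout;
public static void main(String[] args) {
    Object obj = new Object();
    // Render this instance's header, fields, and padding as text
    String layout = ClassLayout.parseInstance(obj).toPrintable();
    System.out.println(layout);
}

The output looks like this: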

java.lang.Object object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4        (object header)                           28 0f b3 1a (00101000 00001111 10110011 00011010) (447942440)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

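This output answers a classic interview question: how many bytes does obj occupy in memory after Object obj = new Object()? Here the answer is 16: a 12-byte object header plus 4 bytes of alignment padding. You can also declare a class of your own, add fields of type boolean, Boolean, int, Integer, array, object reference, and so on, and read off from the output how many bytes each type really occupies in Java.

Deeper usage is left for you to explore.

2. How Java Objects Are Laid Out in Memory

This part gets its own introduction because synchronized locking relies on it.

2.1 Theory

In the HotSpot virtual machine, an object's storage layout in memory has three regions: the object header, the instance data, and the alignment padding.

The figure below shows the data structure of an ordinary object instance and of an array instance; the array length is a part of the object header that exists only for array objects.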
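The three regions can be turned into simple arithmetic. The following is a minimal sketch, not HotSpot code: it assumes a 64-bit JVM with compressed class pointers (8-byte mark word plus 4-byte class pointer) and 8-byte object alignment, it ignores gaps the JVM may leave between fields, and the names `ObjectSizeSketch`, `alignUp`, and `instanceSize` are made up for illustration.

```java
// Sketch only: models header + instance data + padding on a 64-bit HotSpot JVM
// with compressed class pointers. Class and method names are hypothetical.
class ObjectSizeSketch {
    static final int MARK_WORD_BYTES = 8;      // mark word on a 64-bit JVM
    static final int CLASS_POINTER_BYTES = 4;  // assumes -XX:+UseCompressedClassPointers
    static final int ALIGNMENT = 8;            // assumed default object alignment

    // Round size up to the next multiple of alignment (alignment must be a power of two).
    static int alignUp(int size, int alignment) {
        return (size + alignment - 1) & -alignment;
    }

    // Total size of a non-array instance whose fields occupy fieldBytes bytes.
    static int instanceSize(int fieldBytes) {
        return alignUp(MARK_WORD_BYTES + CLASS_POINTER_BYTES + fieldBytes, ALIGNMENT);
    }

    public static void main(String[] args) {
        System.out.println(instanceSize(0)); // plain Object: 12-byte header padded to 16
        System.out.println(instanceSize(4)); // one int field: 12 + 4 = 16, no padding
        System.out.println(instanceSize(5)); // int + boolean: 17 rounded up to 24
    }
}
```

The first result matches the 16-byte instance size that JOL reports for a plain Object in section 1.2.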

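2.2 Practice

Let's take a look with the JOL tool introduced in section 1.2

// Requires: import org.openjdk.jol.info.ClassLayout;
public static void main(String[] args) {
    // Add fields to User and inspect its layout; the instance data region now shows up
    User obj1 = new User();
    String layout1 = ClassLayout.parseInstance(obj1).toPrintable();
    System.out.println(layout1);

    // A User array: inspect its layout; the array data region now shows up
    // Array data bytes = array length * 4 (one compressed reference per element); here 5 * 4 = 20 bytes
    User[] obj2 = new User[5];
    String layout2 = ClassLayout.parseInstance(obj2).toPrintable();
    System.out.println(layout2);
}

The output of this program is not pasted here to save space; copy the code, run it yourself, and play with the results.

2.3 A Brief Look at the Mark Word and the Lock Inflation Process

The Mark Word in the object header stores the object's own runtime data: its hash code, lock state, GC marks, and related information. When a lock is taken with the synchronized keyword, the whole sequence of steps around that lock involves the Mark Word. That is why the memory layout was introduced first.

In the OpenJDK source, under \openjdk\hotspot\src\share\vm\oops in your own download location, there is a C++ header file markOop.hpp that contains this comment: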

// Bit-format of an object header (most significant first, big endian layout below):
//
//  32 bits:
//  --------
//             hash:25 ------------>| age:4    biased_lock:1 lock:2 (normal object)
//             JavaThread*:23 epoch:2 age:4    biased_lock:1 lock:2 (biased object)
//             size:32 ------------------------------------------>| (CMS free block)
//             PromotedObject*:29 ---------->| promo_bits:3 ----->| (CMS promoted object)
//
//  64 bits:
//  --------
//  unused:25 hash:31 -->| unused:1   age:4    biased_lock:1 lock:2 (normal object)
//  JavaThread*:54 epoch:2 unused:1   age:4    biased_lock:1 lock:2 (biased object)
//  PromotedObject*:61 --------------------->| promo_bits:3 ----->| (CMS promoted object)
//  size:64 ----------------------------------------------------->| (CMS free block)
//
//  unused:25 hash:31 -->| cms_free:1 age:4    biased_lock:1 lock:2 (COOPs && normal object)
//  JavaThread*:54 epoch:2 cms_free:1 age:4    biased_lock:1 lock:2 (COOPs && biased object)
//  narrowOop:32 unused:24 cms_free:1 unused:4 promo_bits:3 ----->| (COOPs && CMS promoted object)
//  unused:21 size:35 -->| cms_free:1 unused:7 ------------------>| (COOPs && CMS free block)

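The Mark Word is 32 bits in a 32-bit JVM and 64 bits in a 64-bit JVM, but what it stores for each lock state is the same. Taking the simpler 32-bit layout as the example, the Mark Word is composed as shown in the figure below:

The 2-bit lock flag indicates the lock state, and the 1-bit biased-lock flag indicates whether the object is biased.

After an object is initialized and before any thread competes for it, it is unlocked: the lock flag is 01 and the biased-lock flag is 0.
When a thread comes for the lock and the lock object is acquired for the first time, the lock flag stays 01 and the biased-lock flag is set to 1; the lock is now in biased mode. At the same time a CAS operation writes the ID of the acquiring thread into the lock object's Mark Word. That thread holds the biased lock and can enter directly next time.
Now thread B tries to acquire the lock and finds it in biased mode, but the thread ID stored in the Mark Word is not its own. Thread B then uses CAS to try to take the lock. This can succeed, because the previous holder never actively releases a biased lock, so the recorded thread may no longer be using it. If thread B succeeds, the thread ID in the Mark Word is set to B's ID. If thread B fails, the following steps are performed.
Failing to take over the biased lock means the lock object is contended, so biased mode is revoked first: the biased-lock flag is reset to 0 in preparation for upgrading to a lightweight lock. The current thread first sets aside a lock record (Lock Record) in its stack frame to hold a copy of the lock object's current Mark Word. It then uses CAS to try to update the lock object's Mark Word to a pointer to that Lock Record. If the CAS succeeds, the lock is acquired and the lock flag is set to 00, which is lightweight lock mode. If the CAS fails, the following step applies.
When a CAS on the lightweight lock first fails, the lock does not immediately inflate to a heavyweight lock; instead the thread spins, retrying to grab the lock. In JDK 1.6 spinning is enabled by default with 10 spins, adjustable via -XX:PreBlockSpin. JDK 1.6 also improved on fixed-count spinning by introducing adaptive spinning, which makes the retry mechanism smarter.
Only when spinning still fails to acquire the lock, which indicates heavy contention where further CAS attempts would just burn CPU, does the lock inflate to a heavyweight lock, with the lock flag set to 10. In this state every thread waiting for the lock must block. (A plug: for thread states, see my other article, 脫掉Java線程狀態的衣服.)

If these steps are not clear yet, don't worry; come back and go over them again after reading the rest.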
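The flag values in the steps above can be read straight out of the low bits of a mark word. Here is a minimal sketch of that decoding. It assumes the bit layout from the markOop.hpp comment (lock:2 in the lowest bits, then biased_lock:1, then age:4) and a little-endian machine, where the low byte is the first header byte JOL prints (the `01` in section 1.2). The lock value 11, which HotSpot uses for GC marking, is not discussed in this article and is included only for completeness. All names are hypothetical.

```java
// Sketch only: decodes the low bits of a mark word as laid out in the
// markOop.hpp comment. Class, enum, and method names are hypothetical.
class MarkWordBitsSketch {
    enum LockState { UNLOCKED, BIASED, LIGHTWEIGHT, HEAVYWEIGHT, GC_MARKED }

    // lowBits: the least significant byte of the mark word.
    static LockState decode(int lowBits) {
        int lock = lowBits & 0b11;          // 2-bit lock flag
        int biased = (lowBits >> 2) & 0b1;  // 1-bit biased_lock flag
        switch (lock) {
            case 0b01: return biased == 1 ? LockState.BIASED : LockState.UNLOCKED;
            case 0b00: return LockState.LIGHTWEIGHT;
            case 0b10: return LockState.HEAVYWEIGHT;
            default:   return LockState.GC_MARKED; // 0b11
        }
    }

    // The GC age sits in the 4 bits above the biased_lock flag.
    static int age(int lowBits) {
        return (lowBits >> 3) & 0b1111;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x01)); // UNLOCKED: lock 01, biased 0
        System.out.println(decode(0x05)); // BIASED: lock 01, biased 1
        System.out.println(decode(0x00)); // LIGHTWEIGHT: lock 00
        System.out.println(decode(0x02)); // HEAVYWEIGHT: lock 10
    }
}
```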

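2.4 Pointer Compression (-XX:+UseCompressedClassPointers and -XX:+UseCompressedOops)

This brings up the concept of "pointer compression" and two JVM options you may come across, -XX:+UseCompressedClassPointers and -XX:+UseCompressedOops. Here is a short introduction, with an experiment to make their meaning clear.

**Pointer compression:** The JVM was originally 32-bit. As 64-bit systems took over, the JVM moved from 32 to 64 bits as well, since a 32-bit JVM can address far less memory than a 64-bit one. But a 64-bit VM brings a problem: object pointers grow from 4 bytes to 8, so the same program uses noticeably more memory than on a 32-bit JVM (a figure of about 1.5 times is often quoted), which is not what we want. So pointer compression was introduced in JDK 1.6.

**-XX:+UseCompressedClassPointers:** enables compression of class pointers (pointers to class metadata).

**-XX:+UseCompressedOops:** enables compression of ordinary object pointers. Oops is short for ordinary object pointers.

-XX:+UseCompressedClassPointers and -XX:+UseCompressedOops are enabled by default in JDK 1.8, which you can check with the command java -XX:+PrintCommandLineFlags -version:

In +UseCompressedClassPointers and +UseCompressedOops the + sign turns the option on and a - sign turns it off. The examples below use - to turn options off. By editing the JVM options in IDEA we can test in practice how switching these two options on and off affects the memory layout.

We run the small demo below with four different sets of VM options: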

-XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+PrintCommandLineFlags
-XX:-UseCompressedClassPointers -XX:+UseCompressedOops -XX:+PrintCommandLineFlags
-XX:+UseCompressedClassPointers -XX:-UseCompressedOops -XX:+PrintCommandLineFlags
-XX:-UseCompressedClassPointers -XX:-UseCompressedOops -XX:+PrintCommandLineFlags

import org.openjdk.jol.info.ClassLayout;

public class HelloJOL {

    private boolean flag1 = true;
    private Boolean flag2 = true;
    private int x = 0;
    private Integer y = 0;
    private String str = "";
    private int[] arrInt = new int[10];
    private String[] arrStr;

    public static void main(String[] args) {
        HelloJOL o = new HelloJOL();     
        String layout = ClassLayout.parseInstance(o).toPrintable();
        System.out.println(layout);
    }
}

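Looking closely at the output leads to the following conclusions:

Experiment 1: -XX:+UseCompressedClassPointers -XX:+UseCompressedOops

The object header is 12 bytes: an 8-byte mark word + a 4-byte class pointer.

Experiment 2: -XX:-UseCompressedClassPointers -XX:+UseCompressedOops

Only class pointer compression is disabled.

The object header is 16 bytes: an 8-byte mark word + an 8-byte class pointer.

Note: on a 64-bit machine, UseCompressedClassPointers compresses the class pointer from 8 bytes to 4.

Experiment 3: -XX:+UseCompressedClassPointers -XX:-UseCompressedOops

Only ordinary object pointer compression is disabled.

The JVM flags printed by -XX:+PrintCommandLineFlags show that UseCompressedClassPointers is then switched off by the JVM as well (even though you did not set it).

The object header is 16 bytes, because class pointer compression was switched off in cascade.

Fields of primitive types such as boolean and int keep their sizes of 1 and 4 bytes, but fields of type Boolean, Integer, String, arrays, and so on grow from 4 bytes to 8.

Experiment 4: -XX:-UseCompressedClassPointers -XX:-UseCompressedOops

Both class pointer compression and ordinary object pointer compression are disabled; the result is the same as experiment 3.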
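The four results above can be summarized as a small lookup. This sketch only restates the observations reported here for a 64-bit JDK 8 (including the cascade where turning off compressed oops also turns off compressed class pointers); it does not query the JVM, and every name in it is hypothetical.

```java
// Sketch only: encodes the header and reference sizes observed in the four experiments.
// It models the article's JDK 8 observations; it does not inspect a running JVM.
class CompressionFlagsSketch {
    // Observed cascade: class pointer compression only takes effect if compressed oops is on.
    static boolean effectiveClassPointerCompression(boolean useCompressedClassPointers,
                                                    boolean useCompressedOops) {
        return useCompressedClassPointers && useCompressedOops;
    }

    // Object header = 8-byte mark word + 4- or 8-byte class pointer.
    static int headerBytes(boolean useCompressedClassPointers, boolean useCompressedOops) {
        boolean compressed =
                effectiveClassPointerCompression(useCompressedClassPointers, useCompressedOops);
        return 8 + (compressed ? 4 : 8);
    }

    // Size of a reference field such as Boolean, Integer, String, or an array.
    static int referenceBytes(boolean useCompressedOops) {
        return useCompressedOops ? 4 : 8;
    }

    public static void main(String[] args) {
        boolean[] options = { true, false };
        for (boolean oops : options) {
            for (boolean classPointers : options) {
                System.out.println("ClassPointers=" + classPointers + " Oops=" + oops
                        + " -> header " + headerBytes(classPointers, oops)
                        + " bytes, reference " + referenceBytes(oops) + " bytes");
            }
        }
    }
}
```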

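3. synchronized in Detail

3.1 synchronized at the Java Source and Bytecode Levels

This article assumes a basic understanding of the synchronized keyword and its usage and does not repeat the basics. If you are not familiar with it, learn that first.

First, as we all know, the synchronized keyword can modify methods (static and non-static) as well as code blocks.

Non-static method: locks the current instance
Static method: locks the current class
Code block: locks the specified object, which can be either a class or an instance.

    public static synchronized void methodA() {
        // Static method: the lock of the current class must be acquired before it runs
    }

    public synchronized void methodB() {
        // Non-static method: the lock of the current instance must be acquired before it runs
    }

    Object lock = new Object();
    public void methodC() {
        synchronized (lock) {
            // Synchronized block: the lock of the lock instance must be acquired before it runs
        }
    }

    public void methodD() {
        synchronized (Object.class) {
            // Synchronized block: the lock of the Object class must be acquired before it runs
        }
    }

Compile this code and look at the bytecode. The JVM implements synchronized differently for methods and for code blocks:

Method: the ACC_SYNCHRONIZED flag is added to the method's access flags. It tells the JVM that this is a synchronized method and that the corresponding lock must be acquired before entering it.
Code block: the method's code contains monitorenter and monitorexit instructions; the lock is taken at monitorenter and released at monitorexit.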
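The difference is also visible without a bytecode viewer. As a small sketch, the standard reflection API exposes the method access flags, so `Modifier.isSynchronized` reports true for a synchronized method and false for a method that only contains a synchronized block. The class and method names below are made up for the example.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Sketch only: reads the synchronized access flag of methods through reflection.
// Class and method names are hypothetical.
class SyncFlagSketch {
    static synchronized void staticSync() { }

    synchronized void instanceSync() { }

    void blockSync() {
        synchronized (this) {
            // locked through monitorenter/monitorexit, not through a method flag
        }
    }

    // True if the named no-argument method of this class carries the synchronized flag.
    static boolean isSynchronizedMethod(String name) {
        try {
            Method method = SyncFlagSketch.class.getDeclaredMethod(name);
            return Modifier.isSynchronized(method.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + name, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isSynchronizedMethod("staticSync"));   // true
        System.out.println(isSynchronizedMethod("instanceSync")); // true
        System.out.println(isSynchronizedMethod("blockSync"));    // false
    }
}
```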

3.2 synchronized at the JVM Level `Key point`

    public static void main(String[] args) {

        Object lock = new Object();

        System.out.println("Before locking**********************");
        String layout0 = ClassLayout.parseInstance(lock).toPrintable();
        System.out.println(layout0);

        System.out.println("***********While locked***********");
        synchronized (lock) {
            // -XX:BiasedLockingStartupDelay=0 sets the biased-locking startup delay to zero
            String layout1 = ClassLayout.parseInstance(lock).toPrintable();
            System.out.println(layout1);
        }

        System.out.println("*********************After unlocking*");
        String layout2 = ClassLayout.parseInstance(lock).toPrintable();
        System.out.println(layout2);
    }

Using the JOL tool from section 1.2, look at how the object header changes before locking, while locked, and after the lock is released. The results differ between JDK versions, but you will certainly see the object's mark word change.
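If JOL is not at hand, the same three phases can at least be observed at the Java level. This small sketch uses the standard `Thread.holdsLock` method to ask whether the current thread owns the monitor before, during, and after the synchronized block; it does not show the mark word bits, only the ownership they encode. The class name is hypothetical.

```java
import java.util.Arrays;

// Sketch only: observes monitor ownership before, inside, and after a synchronized block.
class HoldsLockSketch {
    // Returns { before, during, after }.
    static boolean[] observe() {
        Object lock = new Object();
        boolean before = Thread.holdsLock(lock);
        boolean during;
        synchronized (lock) {
            during = Thread.holdsLock(lock);
        }
        boolean after = Thread.holdsLock(lock);
        return new boolean[] { before, during, after };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(observe())); // [false, true, false]
    }
}
```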

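As described above, a synchronized code block produces monitorenter and monitorexit instructions. So how does the JVM implement locking through these two instructions? Below we trace, step by step, how monitorenter and monitorexit are implemented in the OpenJDK source.

I am using OpenJDK 8. Baidu Cloud download link: https://pan.baidu.com/s/1ZFQLurrriyUzyS78_SwcXw password: aeqm

3.2.1 monitorenter and monitorexit in the JDK source

The definitions of monitorenter and monitorexit are in the file interpreterRuntime.cpp, under hotspot/src/share/vm/interpreter relative to the OpenJDK root: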

// The interpreter's synchronization code is factored out so that it can
// be shared by method invocation and synchronized blocks.
//%note synchronization_3

//%note monitor_1  monitorenter: the lock-acquire entry point
IRT_ENTRY_NO_ASYNC(void, InterpreterRuntime::monitorenter(JavaThread* thread, BasicObjectLock* elem))
#ifdef ASSERT
  thread->last_frame().interpreter_frame_verify_monitor(elem);
#endif
  if (PrintBiasedLockingStatistics) { // print biased-locking statistics
    Atomic::inc(BiasedLocking::slow_path_entry_count_addr());
  }
  Handle h_obj(thread, elem->obj());
  assert(Universe::heap()->is_in_reserved_or_null(h_obj()),
         "must be NULL or an object");
  if (UseBiasedLocking) { // biased locking is enabled
    // Retry fast entry if bias is revoked to avoid unnecessary inflation
    ObjectSynchronizer::fast_enter(h_obj, elem->lock(), true, CHECK);
  } else {
    // Biased locking is disabled: call slow_enter to take a lightweight/heavyweight lock
    ObjectSynchronizer::slow_enter(h_obj, elem->lock(), CHECK);
  }
  assert(Universe::heap()->is_in_reserved_or_null(elem->obj()),
         "must be NULL or an object");
#ifdef ASSERT
  thread->last_frame().interpreter_frame_verify_monitor(elem);
#endif
IRT_END


//%note monitor_1  monitorexit: the lock-release entry point
IRT_ENTRY_NO_ASYNC(void, InterpreterRuntime::monitorexit(JavaThread* thread, BasicObjectLock* elem))
#ifdef ASSERT
  thread->last_frame().interpreter_frame_verify_monitor(elem);
#endif
  Handle h_obj(thread, elem->obj());
  assert(Universe::heap()->is_in_reserved_or_null(h_obj()),
         "must be NULL or an object");
  if (elem == NULL || h_obj()->is_unlocked()) {
    THROW(vmSymbols::java_lang_IllegalMonitorStateException());
  }
  ObjectSynchronizer::slow_exit(h_obj(), elem->lock(), thread);
  // Free entry. This must be done here, since a pending exception might be installed on
  // exit. If it is not cleared, the exception handling code will try to unlock the monitor again.
  elem->set_obj(NULL);
#ifdef ASSERT
  thread->last_frame().interpreter_frame_verify_monitor(elem);
#endif
IRT_END

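3.2.2 fast_enter and slow_enter in the JDK source

fast_enter and slow_enter are defined in the file synchronizer.cpp, under hotspot/src/share/vm/runtime relative to the OpenJDK root. Read them carefully together with the description of lock inflation in section 2.3 of this article, and the process of locking, inflating, and unlocking will become much clearer. Section 2.3 is worth reading again and again!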

// -----------------------------------------------------------------------------
//  Fast Monitor Enter/Exit
// This the fast monitor enter. The interpreter and compiler use
// some assembly copies of this code. Make sure update those code
// if the following function is changed. The implementation is
// extremely sensitive to race condition. Be careful.

void ObjectSynchronizer::fast_enter(Handle obj, BasicLock* lock, bool attempt_rebias, TRAPS) {
 if (UseBiasedLocking) {// checks once more whether biased locking is enabled
    if (!SafepointSynchronize::is_at_safepoint()) {// not currently at a safepoint
      // Biased-lock acquisition: revoke_and_rebias
      BiasedLocking::Condition cond = BiasedLocking::revoke_and_rebias(obj, attempt_rebias, THREAD);
      if (cond == BiasedLocking::BIAS_REVOKED_AND_REBIASED) {
        return;
      }
    } else {
      assert(!attempt_rebias, "can not rebias toward VM thread");
      BiasedLocking::revoke_at_safepoint(obj);
    }
    assert(!obj->mark()->has_bias_pattern(), "biases should be revoked by now");
 }
 // If the fast path did not succeed, fall back to the slow path
 slow_enter (obj, lock, THREAD) ;
}

void ObjectSynchronizer::fast_exit(oop object, BasicLock* lock, TRAPS) {
  // The assertion below tells us that a biased lock never reaches this fast unlock path.
  // The displaced header is the copy of the lock object's Mark Word saved while taking a lightweight lock; HotSpot names that copy with the "displaced" prefix. See page 482 of 《深入理解Java虛擬機》, third edition.
  // if displaced header is null, the previous enter is recursive enter, no-op
  markOop dhw = lock->displaced_header();
  markOop mark ;
  if (dhw == NULL) {
     // Recursive stack-lock.
     // Diagnostics -- Could be: stack-locked, inflating, inflated.
     mark = object->mark() ;
     assert (!mark->is_neutral(), "invariant") ;
     if (mark->has_locker() && mark != markOopDesc::INFLATING()) {
        assert(THREAD->is_lock_owned((address)mark->locker()), "invariant") ;
     }
     if (mark->has_monitor()) {
        ObjectMonitor * m = mark->monitor() ;
        assert(((oop)(m->object()))->mark() == mark, "invariant") ;
        assert(m->is_entered(THREAD), "invariant") ;
     }
     return ;
  }

  mark = object->mark() ; // the Mark Word in the lock object's header

  // Lightweight lock release: unlock with CAS (cmpxchg_ptr below is the CAS operation).
  // If the object is stack-locked by the current thread, try to
  // swing the displaced header from the box back to the mark.
  if (mark == (markOop) lock) {
     assert (dhw->is_neutral(), "invariant") ;
     if ((markOop) Atomic::cmpxchg_ptr (dhw, object->mark_addr(), mark) == mark) {
        TEVENT (fast_exit: release stacklock) ;
        return;
     }
  }

  ObjectSynchronizer::inflate(THREAD, object)->exit (true, THREAD) ;
}

// -----------------------------------------------------------------------------
// Interpreter/Compiler Slow Case
// This routine is used to handle interpreter/compiler slow case
// We don't need to use fast path here, because it must have been
// failed in the interpreter/compiler code.
void ObjectSynchronizer::slow_enter(Handle obj, BasicLock* lock, TRAPS) {
  markOop mark = obj->mark();
  assert(!mark->has_bias_pattern(), "should not see bias pattern here");

  if (mark->is_neutral()) {
    // Try the lightweight (stack) lock first.
    // Anticipate successful CAS -- the ST of the displaced mark must
    // be visible <= the ST performed by the CAS.
    lock->set_displaced_header(mark);
    if (mark == (markOop) Atomic::cmpxchg_ptr(lock, obj()->mark_addr(), mark)) {
      TEVENT (slow_enter: release stacklock) ;
      return ;
    }
    // Fall through to inflate() ... (the CAS above did not succeed)
  } else
  if (mark->has_locker() && THREAD->is_lock_owned((address)mark->locker())) { // the current thread already holds the lock
    assert(lock != mark->locker(), "must not re-lock the same lock");
    assert(lock != (BasicLock*)obj->mark(), "don't relock with same BasicLock");
    lock->set_displaced_header(NULL);
    return;
  }

#if 0
  // The following optimization isn't particularly useful.
  if (mark->has_monitor() && mark->monitor()->is_entered(THREAD)) {
    lock->set_displaced_header (NULL) ;
    return ;
  }
#endif

  // For a heavyweight lock on a 32-bit JVM, the mark word holds the lock flag 10 and, in the other 30 bits, a pointer to the monitor.
  // The object header will never be displaced to this lock,
  // so it does not matter what the value is, except that it
  // must be non-zero to avoid looking like a re-entrant lock,
  // and must not look locked either.
  lock->set_displaced_header(markOopDesc::unused_mark());
  ObjectSynchronizer::inflate(THREAD, obj())->enter(THREAD);
}

// This routine is used to handle interpreter/compiler slow case
// We don't need to use fast path here, because it must have
// failed in the interpreter/compiler code. Simply use the heavy
// weight monitor should be ok, unless someone find otherwise.
void ObjectSynchronizer::slow_exit(oop object, BasicLock* lock, TRAPS) {
  fast_exit (object, lock, THREAD) ;
}
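To make the CAS steps in slow_enter and fast_exit easier to follow, here is a toy model in Java. It is only a sketch under heavy assumptions: the mark word is modeled as an `AtomicReference`, the on-stack BasicLock as a plain `LockRecord` object, and recursion, inflation, and biasing are left out entirely. None of these names exist in HotSpot; they are invented for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model only: mimics the neutral -> stack-locked -> neutral CAS steps.
// All names are hypothetical; this is not how HotSpot represents a mark word.
class StackLockSketch {
    // Stand-in for the BasicLock on the owner's stack; it keeps the displaced header.
    static final class LockRecord {
        Object displacedHeader;
    }

    // Stand-in for an unlocked (neutral) mark word.
    static final Object NEUTRAL = "neutral-mark";

    // Stand-in for the object's mark word: either NEUTRAL or the owner's LockRecord.
    final AtomicReference<Object> mark = new AtomicReference<>(NEUTRAL);

    // Like the is_neutral() branch of slow_enter: save the mark, then CAS in the record.
    boolean tryEnter(LockRecord record) {
        Object current = mark.get();
        if (current != NEUTRAL) {
            return false; // the real code would check for recursion, then inflate
        }
        record.displacedHeader = current;
        return mark.compareAndSet(current, record);
    }

    // Like the stack-locked branch of fast_exit: CAS the displaced header back.
    boolean tryExit(LockRecord record) {
        return mark.compareAndSet(record, record.displacedHeader);
    }

    public static void main(String[] args) {
        StackLockSketch object = new StackLockSketch();
        LockRecord first = new LockRecord();
        LockRecord second = new LockRecord();
        System.out.println(object.tryEnter(first));  // true: neutral mark replaced
        System.out.println(object.tryEnter(second)); // false: already stack-locked
        System.out.println(object.tryExit(first));   // true: displaced header restored
    }
}
```

In the real code, a failed `tryEnter` is where control falls through to inflate().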

3.2.3 inflate in the JDK source

This method is also in synchronizer.cpp; the two pieces of code are not adjacent and this one is long, so it is shown separately.

// Note that we could encounter some performance loss through false-sharing as
// multiple locks occupy the same $ line.  Padding might be appropriate.


ObjectMonitor * ATTR ObjectSynchronizer::inflate (Thread * Self, oop object) {
  // Inflate mutates the heap ...
  // Relaxing assertion for bug 6320749.
  assert (Universe::verify_in_progress() ||
          !SafepointSynchronize::is_at_safepoint(), "invariant") ;

  for (;;) {
      const markOop mark = object->mark() ;
      assert (!mark->has_bias_pattern(), "invariant") ;

      // The mark can be in one of the following states:
      // *  Inflated     - just return
      // *  Stack-locked - coerce it to inflated
      // *  INFLATING    - busy wait for conversion to complete
      // *  Neutral      - aggressively inflate the object.
      // *  BIASED       - Illegal.  We should never see this

      // CASE: inflated
      if (mark->has_monitor()) {
          ObjectMonitor * inf = mark->monitor() ;
          assert (inf->header()->is_neutral(), "invariant");
          assert (inf->object() == object, "invariant") ;
          assert (ObjectSynchronizer::verify_objmon_isinpool(inf), "monitor is invalid");
          return inf ;
      }

      // CASE: inflation in progress - inflating over a stack-lock.
      // Some other thread is converting from stack-locked to inflated.
      // Only that thread can complete inflation -- other threads must wait.
      // The INFLATING value is transient.
      // Currently, we spin/yield/park and poll the markword, waiting for inflation to finish.
      // We could always eliminate polling by parking the thread on some auxiliary list.
      if (mark == markOopDesc::INFLATING()) {
         TEVENT (Inflate: spin while INFLATING) ;
         ReadStableMark(object) ;
         continue ;
      }

      // CASE: stack-locked (a lightweight lock that must be coerced into a heavyweight lock)
      // Could be stack-locked either by this thread or by some other thread.
      //
      // Note that we allocate the objectmonitor speculatively, _before_ attempting
      // to install INFLATING into the mark word.  We originally installed INFLATING,
      // allocated the objectmonitor, and then finally STed the address of the
      // objectmonitor into the mark.  This was correct, but artificially lengthened
      // the interval in which INFLATED appeared in the mark, thus increasing
      // the odds of inflation contention.
      // In short, the current order is: allocate the ObjectMonitor -> CAS INFLATING into the mark word
      // (see the INFLATING handling above) -> publish the monitor address in the mark word (inflated, i.e. heavyweight).
      //
      // We now use per-thread private objectmonitor free lists.
      // These list are reprovisioned from the global free list outside the
      // critical INFLATING...ST interval.  A thread can transfer
      // multiple objectmonitors en-mass from the global free list to its local free list.
      // This reduces coherency traffic and lock contention on the global free list.
      // Using such local free lists, it doesn't matter if the omAlloc() call appears
      // before or after the CAS(INFLATING) operation.
      // See the comments in omAlloc().

      if (mark->has_locker()) {
          ObjectMonitor * m = omAlloc (Self) ;
          // Optimistically prepare the objectmonitor - anticipate successful CAS
          // We do this before the CAS in order to minimize the length of time
          // in which INFLATING appears in the mark.
          m->Recycle();
          m->_Responsible  = NULL ;
          m->OwnerIsThread = 0 ;
          m->_recursions   = 0 ;
          m->_SpinDuration = ObjectMonitor::Knob_SpinLimit ;   // Consider: maintain by type/class

          markOop cmp = (markOop) Atomic::cmpxchg_ptr (markOopDesc::INFLATING(), object->mark_addr(), mark) ;
          if (cmp != mark) {
             omRelease (Self, m, true) ;
             continue ;       // Interference -- just retry
          }

          // We've successfully installed INFLATING (0) into the mark-word.
          // This is the only case where 0 will appear in a mark-work.
          // Only the singular thread that successfully swings the mark-word
          // to 0 can perform (or more precisely, complete) inflation.
          //
          // Why do we CAS a 0 into the mark-word instead of just CASing the
          // mark-word from the stack-locked value directly to the new inflated state?
          // Consider what happens when a thread unlocks a stack-locked object.
          // It attempts to use CAS to swing the displaced header value from the
          // on-stack basiclock back into the object header.  Recall also that the
          // header value (hashcode, etc) can reside in (a) the object header, or
          // (b) a displaced header associated with the stack-lock, or (c) a displaced
          // header in an objectMonitor.  The inflate() routine must copy the header
          // value from the basiclock on the owner's stack to the objectMonitor, all
          // the while preserving the hashCode stability invariants.  If the owner
          // decides to release the lock while the value is 0, the unlock will fail
          // and control will eventually pass from slow_exit() to inflate.  The owner
          // will then spin, waiting for the 0 value to disappear.   Put another way,
          // the 0 causes the owner to stall if the owner happens to try to
          // drop the lock (restoring the header from the basiclock to the object)
          // while inflation is in-progress.  This protocol avoids races that might
          // would otherwise permit hashCode values to change or "flicker" for an object.
          // Critically, while object->mark is 0 mark->displaced_mark_helper() is stable.
          // 0 serves as a "BUSY" inflate-in-progress indicator.


          // fetch the displaced mark from the owner's stack.
          // The owner can't die or unwind past the lock while our INFLATING
          // object is in the mark.  Furthermore the owner can't complete
          // an unlock on the object, either.
          markOop dmw = mark->displaced_mark_helper() ;
          assert (dmw->is_neutral(), "invariant") ;

          // Setup monitor fields to proper values -- prepare the monitor
          m->set_header(dmw) ;

          // Optimization: if the mark->locker stack address is associated
          // with this thread we could simply set m->_owner = Self and
          // m->OwnerIsThread = 1. Note that a thread can inflate an object
          // that it has stack-locked -- as might happen in wait() -- directly
          // with CAS.  That is, we can avoid the xchg-NULL .... ST idiom.
          m->set_owner(mark->locker());
          m->set_object(object);
          // TODO-FIXME: assert BasicLock->dhw != 0.

          // Must preserve store ordering. The monitor state must
          // be stable at the time of publishing the monitor address.
          guarantee (object->mark() == markOopDesc::INFLATING(), "invariant") ;
          object->release_set_mark(markOopDesc::encode(m));

          // Hopefully the performance counters are allocated on distinct cache lines
          // to avoid false sharing on MP systems ...
          if (ObjectMonitor::_sync_Inflations != NULL) ObjectMonitor::_sync_Inflations->inc() ;
          TEVENT(Inflate: overwrite stacklock) ;
          if (TraceMonitorInflation) {
            if (object->is_instance()) {
              ResourceMark rm;
              tty->print_cr("Inflating object " INTPTR_FORMAT " , mark " INTPTR_FORMAT " , type %s",
                (void *) object, (intptr_t) object->mark(),
                object->klass()->external_name());
            }
          }
          return m ;
      }

      // CASE: neutral
      // TODO-FIXME: for entry we currently inflate and then try to CAS _owner.
      // If we know we're inflating for entry it's better to inflate by swinging a
      // pre-locked objectMonitor pointer into the object header.   A successful
      // CAS inflates the object *and* confers ownership to the inflating thread.
      // In the current implementation we use a 2-step mechanism where we CAS()
      // to inflate and then CAS() again to try to swing _owner from NULL to Self.
      // An inflateTry() method that we could call from fast_enter() and slow_enter()
      // would be useful.

      assert (mark->is_neutral(), "invariant");
      ObjectMonitor * m = omAlloc (Self) ;
      // prepare m for installation - set monitor to initial state
      m->Recycle();
      m->set_header(mark);
      m->set_owner(NULL);
      m->set_object(object);
      m->OwnerIsThread = 1 ;
      m->_recursions   = 0 ;
      m->_Responsible  = NULL ;
      m->_SpinDuration = ObjectMonitor::Knob_SpinLimit ;       // consider: keep metastats by type/class

      if (Atomic::cmpxchg_ptr (markOopDesc::encode(m), object->mark_addr(), mark) != mark) {
          m->set_object (NULL) ;
          m->set_owner  (NULL) ;
          m->OwnerIsThread = 0 ;
          m->Recycle() ;
          omRelease (Self, m, true) ;
          m = NULL ;
          continue ;
          // interference - the markword changed - just retry.
          // The state-transitions are one-way, so there's no chance of
          // live-lock -- "Inflated" is an absorbing state.
      }

      // Hopefully the performance counters are allocated on distinct
      // cache lines to avoid false sharing on MP systems ...
      if (ObjectMonitor::_sync_Inflations != NULL) ObjectMonitor::_sync_Inflations->inc() ;
      TEVENT(Inflate: overwrite neutral) ;
      if (TraceMonitorInflation) {
        if (object->is_instance()) {
          ResourceMark rm;
          tty->print_cr("Inflating object " INTPTR_FORMAT " , mark " INTPTR_FORMAT " , type %s",
            (void *) object, (intptr_t) object->mark(),
            object->klass()->external_name());
        }
      }
      return m ;
  }
}

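3.3 Lock Upgrade Process

Here is the important part again; time to go back over section 2.3 once more!

The lock upgrade process can be summarized as: unlocked -> biased lock -> lightweight lock (spin lock, adaptive spinning) -> heavyweight lock. Inflation only goes forward; there is no downgrade.

After initialization, an object is unlocked.
When a thread A acquires the lock and the lock object is used for the first time, it enters biased mode, which is reentrant. If another thread B then comes for the lock, B can take the biased lock with CAS under certain strict conditions, replacing the thread ID information in the mark word.
If competing for the biased lock fails, the lock is upgraded to a lightweight lock (also called a spin lock or stack lock), again using CAS. If the first CAS to acquire or compete for the lightweight lock fails, the thread spins, retrying N times. Spinning started with a fixed count and was later refined into smarter adaptive spinning.
If the lock still cannot be acquired after spinning, contention is heavy and CAS spinning is wasting CPU, so the lock inflates directly to a heavyweight lock.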
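The "spin a bounded number of times, then give up and block" idea can be sketched in a few lines of Java. This is a toy, not the JVM's algorithm: it is non-reentrant, uses a fixed spin count rather than adaptive spinning, and merely returns false where a real lock would park the thread. The class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model only: bounded CAS spinning before falling back to blocking.
// Non-reentrant; names are hypothetical.
class SpinThenBlockSketch {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    // Try to take the lock with at most maxSpins CAS attempts; true if acquired.
    boolean trySpinAcquire(int maxSpins) {
        Thread self = Thread.currentThread();
        for (int i = 0; i < maxSpins; i++) {
            if (owner.compareAndSet(null, self)) {
                return true;
            }
        }
        return false; // a real lock would now block the thread (the heavyweight path)
    }

    // Release the lock if the current thread owns it.
    void release() {
        owner.compareAndSet(Thread.currentThread(), null);
    }

    public static void main(String[] args) {
        SpinThenBlockSketch lock = new SpinThenBlockSketch();
        System.out.println(lock.trySpinAcquire(10)); // true: lock was free
        System.out.println(lock.trySpinAcquire(10)); // false: held, and the toy is non-reentrant
        lock.release();
        System.out.println(lock.trySpinAcquire(10)); // true: free again
    }
}
```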

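A very useful summary: a heavyweight lock requests resources from the operating system directly; waiting threads are suspended, placed in the lock's wait queue, and blocked until the operating system schedules them. Biased and lightweight locks, by contrast, are not handed to the operating system: they stay in user mode and still consume CPU, acquiring the lock through lock-free CAS competition. At the Java level, CAS goes through the compareAndSwap methods of the Unsafe class, which call via JNI into C++ code in the JVM and end up at the assembly instruction below. Its lock prefix makes the compare-and-exchange atomic across cores, typically by locking the affected cache line rather than the whole bus (locking the bus would stop everything else).

lock cmpxchg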
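At the Java level the same primitive is reachable through the atomic classes. Below is a minimal sketch of a CAS retry loop built on `AtomicInteger.compareAndSet`; on x86, HotSpot typically compiles that call down to a lock cmpxchg, though the exact machine code depends on the platform and JIT. The class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: a CAS retry loop of the kind that lock-free code is built on.
// Class and method names are hypothetical.
class CasLoopSketch {
    // Increment by reading, computing, and CAS-ing until no other thread interfered.
    static int incrementWithCas(AtomicInteger counter) {
        while (true) {
            int current = counter.get();
            if (counter.compareAndSet(current, current + 1)) {
                return current + 1;
            }
            // CAS failed: another thread changed the value first, so retry.
        }
    }

    // Run several threads that each increment perThread times; returns the final count.
    static int runConcurrently(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    incrementWithCas(counter);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(4, 10_000)); // 40000: no increment is lost
    }
}
```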

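3.4 Lock Elimination

Quoting a passage on lock elimination from 《深入理解Java虛擬機》, third edition:

Lock elimination means that the virtual machine's just-in-time compiler, at run time, removes locks on code that asks for synchronization but where no contention on shared data can possibly occur. The decision is based mainly on data from escape analysis: if the compiler determines that none of the heap data in a piece of code can escape and be accessed by other threads, that data can be treated as if it were on the stack, that is, as thread-private, and synchronizing on it is then unnecessary.

For example, consider the following code:

    public static String concatString(String str1, String str2, String str3) {
        StringBuffer sb = new StringBuffer();
        sb.append(str1).append(str2).append(str3);
        return sb.toString();
    }

As everyone knows, StringBuffer is a thread-safe string-concatenation class: its methods, such as append, are declared synchronized and must acquire a lock before they run, and the lock object is the StringBuffer instance itself. In the code above, the lock object is the sb instance. Escape analysis shows that sb is confined to the concatString method and is never used or called from outside it. No other thread can ever reach it, so no contention and no synchronization problem can arise. Under interpretation the locks are still taken, but once the code has been compiled by the server compiler (escape analysis is a JIT compiler optimization), it runs with all the synchronization ignored.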
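One way to picture the effect: after lock elimination, concatString behaves as if it had been written with the unsynchronized StringBuilder. The sketch below puts the two side by side. This is only an observable equivalence; the JIT removes the lock operations in the compiled code and does not literally swap one class for the other. The class and method names are hypothetical.

```java
// Sketch only: the StringBuffer version and its lock-free observable equivalent.
// The JIT elides the locks in compiled code; it does not rewrite the source like this.
class LockEliminationSketch {
    // Original form: every append is a synchronized call on sb.
    static String withStringBuffer(String str1, String str2, String str3) {
        StringBuffer sb = new StringBuffer();
        sb.append(str1).append(str2).append(str3);
        return sb.toString();
    }

    // Same result without any locking, which is safe because sb never escapes the method.
    static String withStringBuilder(String str1, String str2, String str3) {
        StringBuilder sb = new StringBuilder();
        sb.append(str1).append(str2).append(str3);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withStringBuffer("a", "b", "c"));  // abc
        System.out.println(withStringBuilder("a", "b", "c")); // abc
    }
}
```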

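3.5 Lock Coarsening

Show Code:

    public static String testLockCoarsening(String str) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < 100; i++) {
            sb.append(str);
        }
        return sb.toString();
    }

In the code above, append has to acquire the lock. Without optimization, calling it 100 times in a loop means acquiring and releasing the lock 100 times each, which is quite wasteful. The JVM detects that such a run of operations all lock the same object and widens the synchronized region to cover the whole sequence (for example, moving it outside the loop body), so that the entire run needs only one lock acquisition.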
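Written out by hand, the coarsened form would look roughly like the second method below. This is a conceptual source-level equivalent only: whether and how far a particular JIT coarsens a given loop depends on the compiler, and the class and method names are hypothetical.

```java
// Sketch only: a loop of fine-grained locks next to a hand-coarsened equivalent.
class LockCoarseningSketch {
    // Each append acquires and releases sb's monitor: 100 lock/unlock pairs.
    static String fineGrained(String str) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < 100; i++) {
            sb.append(str);
        }
        return sb.toString();
    }

    // One outer acquisition covers the whole loop; the inner appends only
    // re-enter a monitor the thread already owns.
    static String coarsened(String str) {
        StringBuffer sb = new StringBuffer();
        synchronized (sb) {
            for (int i = 0; i < 100; i++) {
                sb.append(str);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fineGrained("ab").equals(coarsened("ab"))); // true
        System.out.println(coarsened("ab").length());                  // 200
    }
}
```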

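3.6 Lock Optimization in the JIT Compiler `Further reading`

Today's mainstream Java virtual machines, such as HotSpot, which most of us use, have an architecture in which an interpreter and compilers coexist. A Java program starts out being executed by the interpreter. When the VM finds that a method or code block runs very frequently, it treats that code as hot code, compiles it to native machine code on the fly, and optimizes it by various means to improve execution efficiency.

This coexistence lets the interpreter and the compilers complement each other: when the program needs to start up or run quickly, the interpreter steps in first, saving compile time; once the program is running, the compilers gradually take effect, compiling more and more code into native code to improve execution efficiency.

More Show Code: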

public class Demo {
    static volatile int i = 0;
    static volatile int j = 0;

    public static void n() {
        i++;
    }

    public static synchronized void m() {
        j++;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int j = 0; j < 100_0000; j++) {
            m();
            n();
        }
        System.out.println(i);
        System.out.println(j);
    }
}

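When running main, add the following JVM options (unlock diagnostic mode and print assembly): -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly to print the assembly; or use -server -XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+PrintAssembly -XX:+LogCompilation -XX:LogFile=TestSynchronizedAssembly.log to write it to a log file and view the assembly with a tool such as jitwatch.

You will see the C1 Compile Level 1 (C1 compiler optimization) and C2 Compile Level 1 (C2 compiler optimization) output for methods m and n. Both contain lock cmpxchg ..... instructions. In other words, m and n, executed a million times, became hot code and were compiled and optimized by both tiers of compiler, which turned the relatively expensive synchronized lock and unlock operations into a low-level CAS operation, more appropriate here, synchronized by means of the lock prefix.

Note: not every synchronized is compiled into a lock cmpxchg ... instruction. Different code is optimized in different ways, so never, ever assume that synchronized is implemented underneath as lock cmpxchg .... The code above is only one example.

If **-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly** does not work for you, it is because hsdis is not set up. Search online or see section 11.2.4 of 《深入理解Java虛擬機》, third edition. For downloading and installing hsdis and the powerful jitwatch, see: https://www.xuebuyuan.com/3192700.html

If you are interested in what compilers do and how they work, search online or see chapters 10 and 11 of 《深入理解Java虛擬機》, third edition.

My own grasp of these internals is still at the "paper tiger" stage; if anything here is misunderstood or misstated, corrections and discussion are welcome.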
