[inside hotspot] The Assembly Template Interpreter and Bytecode Execution

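1. The template interpreter

The HotSpot interpreter module (hotspot\src\share\vm\interpreter) has two implementations: a C++ interpreter and an assembly-based template interpreter. HotSpot uses the faster template interpreter by default.
The files break down as follows:

  • C++ interpreter = bytecodeInterpreter* + cppInterpreter*
  • Template interpreter = templateTable* + templateInterpreter*

In each pair, the former implements the individual bytecodes and the latter provides the interpreter runtime; together they do the interpreting. Here we only look at the template interpreter.

The template interpreter in turn has three components:

  • templateInterpreterGenerator: the interpreter generator
  • templateTable: the bytecode implementations
  • templateInterpreter: the interpreter

It may look odd that there is an interpreter generator and a separate set of bytecode implementations. Let's look at the interpreter itself: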
class TemplateInterpreter: public AbstractInterpreter {
  friend class VMStructs;
  friend class InterpreterMacroAssembler;
  friend class TemplateInterpreterGenerator;
  friend class TemplateTable;
  friend class CodeCacheExtensions;
  // friend class Interpreter;
 public:

  enum MoreConstants {
    number_of_return_entries  = number_of_states,               // number of return entry points
    number_of_deopt_entries   = number_of_states,               // number of deoptimization entry points
    number_of_return_addrs    = number_of_states                // number of return addresses
  };

 protected:

  static address    _throw_ArrayIndexOutOfBoundsException_entry;
  static address    _throw_ArrayStoreException_entry;
  static address    _throw_ArithmeticException_entry;
  static address    _throw_ClassCastException_entry;
  static address    _throw_NullPointerException_entry;
  static address    _throw_exception_entry;

  static address    _throw_StackOverflowError_entry;

  static address    _remove_activation_entry;                   // continuation address if an exception is not handled by current frame
#ifdef HOTSWAP
  static address    _remove_activation_preserving_args_entry;   // continuation address when current frame is being popped
#endif // HOTSWAP

#ifndef PRODUCT
  static EntryPoint _trace_code;
#endif // !PRODUCT
  static EntryPoint _return_entry[number_of_return_entries];    // entry points to return to from a call
  static EntryPoint _earlyret_entry;                            // entry point to return early from a call
  static EntryPoint _deopt_entry[number_of_deopt_entries];      // entry points to return to from a deoptimization
  static EntryPoint _continuation_entry;
  static EntryPoint _safept_entry;

  static address _invoke_return_entry[number_of_return_addrs];           // for invokestatic, invokespecial, invokevirtual return entries
  static address _invokeinterface_return_entry[number_of_return_addrs];  // for invokeinterface return entries
  static address _invokedynamic_return_entry[number_of_return_addrs];    // for invokedynamic return entries

  static DispatchTable _active_table;                           // the active    dispatch table (used by the interpreter for dispatch)
  static DispatchTable _normal_table;                           // the normal    dispatch table (used to set the active table in normal mode)
  static DispatchTable _safept_table;                           // the safepoint dispatch table (used to set the active table for safepoints)
  static address       _wentry_point[DispatchTable::length];    // wide instructions only (vtos tosca always)


 public:
  ...
  static int InterpreterCodeSize;
};

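It holds a lot of address variables; EntryPoint is an array of addresses, and so is DispatchTable.
The template interpreter is made up of a series of routines. Each address variable is the entry address of one routine, for example the exception-handling routines, the routines for the invoke instructions, the safepoint routine used for GC, and so on.
To make this concrete, recall what disassembled bytecode looks like: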

public void f();
   0: aload_0
   1: invokespecial #5                  // Method A.f:()V
   4: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
   7: ldc           #6                  // String ff
   9: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
  12: return

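If we were to write an interpreter ourselves, it would probably be a loop around a switch that dispatches each opcode to its own routine. The code of each routine is always the same template: handling aload_0, for instance, always means pushing the value in local variable slot 0 onto the top of the stack. So these templates can be prepared before any bytecode is dispatched. That is exactly what templateInterpreterGenerator does; its generate_all() function initializes all the routines: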

void TemplateInterpreterGenerator::generate_all() {
  // set up the slow_signature_handler routine
  { CodeletMark cm(_masm, "slow signature handler");
    AbstractInterpreter::_slow_signature_handler = generate_slow_signature_handler();
  }
  // set up the error_exit routines
  { CodeletMark cm(_masm, "error exits");
    _unimplemented_bytecode    = generate_error_exit("unimplemented bytecode");
    _illegal_bytecode_sequence = generate_error_exit("illegal bytecode sequence - method not verified");
  }
  ......
}
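
For contrast, here is a minimal sketch of the naive "loop around a switch" interpreter described above. It is not HotSpot code: the four opcodes (OP_PUSH, OP_LOAD0, OP_ADD, OP_RET) and the function run_switch are hypothetical names invented for illustration. The point is only that each case body is a fixed template, which is why such code can be generated once, ahead of time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mini instruction set, NOT real JVM opcodes.
enum Op : uint8_t { OP_PUSH, OP_LOAD0, OP_ADD, OP_RET };

// Naive interpreter: one loop, one switch, one case per opcode.
int run_switch(const std::vector<uint8_t>& code, int local0) {
  std::vector<int> stack;   // expression stack
  std::size_t bcp = 0;      // bytecode pointer (index into code)
  for (;;) {
    assert(bcp < code.size());
    switch (code[bcp]) {
      case OP_PUSH:         // the operand byte follows the opcode
        stack.push_back(code[bcp + 1]);
        bcp += 2;
        break;
      case OP_LOAD0:        // always the same work: push local slot 0
        stack.push_back(local0);
        bcp += 1;
        break;
      case OP_ADD: {        // pop two values, push their sum
        int b = stack.back(); stack.pop_back();
        int a = stack.back(); stack.pop_back();
        stack.push_back(a + b);
        bcp += 1;
        break;
      }
      case OP_RET:          // return the value on top of the stack
        return stack.back();
      default:
        assert(false && "illegal bytecode");
        return -1;
    }
  }
}
```

Calling run_switch on the program {OP_LOAD0, OP_PUSH, 5, OP_ADD, OP_RET} with local0 = 37 adds the local to the constant. The template interpreter avoids the switch entirely by emitting machine code for each case up front and jumping between those pieces directly.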

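Also, since machine code is now involved, templateInterpreterGenerator cannot do this on its own. It works together with the CPU-specific files
hotspot\src\cpu\x86\vm\templateInterpreterGenerator_x86.cpp and hotspot\src\cpu\x86\vm\templateInterpreterGenerator_x86_64.cpp (my machine is x86 + Windows).

Running with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInterpreter -XX:+LogCompilation -XX:LogFile=file.log saves the output to a file, where the generated routines can be inspected.
As an example, the template interpreter special-cases many of the math functions in java.lang.Math: calling them does not require setting up an ordinary Java stack frame, and they can use SSE instructions, which makes them considerably faster: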

// hotspot\src\cpu\x86\vm\templateInterpreterGenerator_x86_64.cpp
address TemplateInterpreterGenerator::generate_math_entry(AbstractInterpreter::MethodKind kind) {
  // rbx: Method*
  // rcx: scratch
  // r13: sender sp
  if (!InlineIntrinsics) return NULL; // Generate a vanilla entry
  address entry_point = __ pc();

  if (kind == Interpreter::java_lang_math_fmaD) {
    if (!UseFMA) {
      return NULL; // Generate a vanilla entry
    }
    __ movdbl(xmm0, Address(rsp, wordSize));
    __ movdbl(xmm1, Address(rsp, 3 * wordSize));
    __ movdbl(xmm2, Address(rsp, 5 * wordSize));
    __ fmad(xmm0, xmm1, xmm2, xmm0);
  } else if (kind == Interpreter::java_lang_math_fmaF) {
    if (!UseFMA) {
      return NULL; // Generate a vanilla entry
    }
    __ movflt(xmm0, Address(rsp, wordSize));
    __ movflt(xmm1, Address(rsp, 2 * wordSize));
    __ movflt(xmm2, Address(rsp, 3 * wordSize));
    __ fmaf(xmm0, xmm1, xmm2, xmm0);
  } else if (kind == Interpreter::java_lang_math_sqrt) {
    __ sqrtsd(xmm0, Address(rsp, wordSize));
  } else if (kind == Interpreter::java_lang_math_exp) {
    __ movdbl(xmm0, Address(rsp, wordSize));
    if (StubRoutines::dexp() != NULL) {
      __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, StubRoutines::dexp())));
    } else {
      __ call_VM_leaf0(CAST_FROM_FN_PTR(address, SharedRuntime::dexp));
    }
  } else if (kind == Interpreter::java_lang_math_log) {
    __ movdbl(xmm0, Address(rsp, wordSize));
    if (StubRoutines::dlog() != NULL) {
      __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, StubRoutines::dlog())));
    } else {
      __ call_VM_leaf0(CAST_FROM_FN_PTR(address, SharedRuntime::dlog));
    }
  } else if (kind == Interpreter::java_lang_math_log10) {
    __ movdbl(xmm0, Address(rsp, wordSize));
    if (StubRoutines::dlog10() != NULL) {
      __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, StubRoutines::dlog10())));
    } else {
      __ call_VM_leaf0(CAST_FROM_FN_PTR(address, SharedRuntime::dlog10));
    }
  } else if (kind == Interpreter::java_lang_math_sin) {
    __ movdbl(xmm0, Address(rsp, wordSize));
    if (StubRoutines::dsin() != NULL) {
      __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, StubRoutines::dsin())));
    } else {
      __ call_VM_leaf0(CAST_FROM_FN_PTR(address, SharedRuntime::dsin));
    }
  } else if (kind == Interpreter::java_lang_math_cos) {
    __ movdbl(xmm0, Address(rsp, wordSize));
    if (StubRoutines::dcos() != NULL) {
      __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, StubRoutines::dcos())));
    } else {
      __ call_VM_leaf0(CAST_FROM_FN_PTR(address, SharedRuntime::dcos));
    }
  } else if (kind == Interpreter::java_lang_math_pow) {
    __ movdbl(xmm1, Address(rsp, wordSize));
    __ movdbl(xmm0, Address(rsp, 3 * wordSize));
    if (StubRoutines::dpow() != NULL) {
      __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, StubRoutines::dpow())));
    } else {
      __ call_VM_leaf0(CAST_FROM_FN_PTR(address, SharedRuntime::dpow));
    }
  } else if (kind == Interpreter::java_lang_math_tan) {
    __ movdbl(xmm0, Address(rsp, wordSize));
    if (StubRoutines::dtan() != NULL) {
      __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, StubRoutines::dtan())));
    } else {
      __ call_VM_leaf0(CAST_FROM_FN_PTR(address, SharedRuntime::dtan));
    }
  } else {
    __ fld_d(Address(rsp, wordSize));
    switch (kind) {
    case Interpreter::java_lang_math_abs:
      __ fabs();
      break;
    default:
      ShouldNotReachHere();
    }

    __ subptr(rsp, 2*wordSize);
    // Round to 64bit precision
    __ fstp_d(Address(rsp, 0));
    __ movdbl(xmm0, Address(rsp, 0));
    __ addptr(rsp, 2*wordSize);
  }

  __ pop(rax);
  __ mov(rsp, r13);
  __ jmp(rax);

  return entry_point;
}

Let's focus on java.lang.Math.pow(). With -XX:+PrintInterpreter we can see the routine generated for it:

else if (kind == Interpreter::java_lang_math_pow) {
    __ movdbl(xmm1, Address(rsp, wordSize));
    __ movdbl(xmm0, Address(rsp, 3 * wordSize));
    if (StubRoutines::dpow() != NULL) {
      __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, StubRoutines::dpow())));
    } else {
      __ call_VM_leaf0(CAST_FROM_FN_PTR(address, SharedRuntime::dpow));
    }
  }
----------------------------------------------------------------------
method entry point (kind = java_lang_math_pow)  [0x000001bcb62feaa0, 0x000001bcb62feac0]  32 bytes

  0x000001bcb62feaa0: vmovsd 0x8(%rsp),%xmm1
  0x000001bcb62feaa6: vmovsd 0x18(%rsp),%xmm0
  0x000001bcb62feaac: callq  0x000001bcb62f19d0
  0x000001bcb62feab1: pop    %rax
  0x000001bcb62feab2: mov    %r13,%rsp
  0x000001bcb62feab5: jmpq   *%rax
  0x000001bcb62feab7: nop
  0x000001bcb62feab8: add    %al,(%rax)
  0x000001bcb62feaba: add    %al,(%rax)
  0x000001bcb62feabc: add    %al,(%rax)
  0x000001bcb62feabe: add    %al,(%rax)

The callq targets the stub produced by address generate_libmPow() in hotspot\src\cpu\x86\vm\stubGenerator_x86_64.cpp. Take a look if you are interested; I won't go into it here.
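
The offsets wordSize and 3 * wordSize in the pow entry can be checked with a small model of the stack. This is a sketch under one assumption that the listing itself does not state: a double occupies two stack slots, with the value in the lower-addressed slot. ModelStack and stack_for_pow are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

const int wordSize = 8;  // bytes per stack slot on x86_64

// Model of the stack seen by the math entry: words[0] is the word at rsp.
struct ModelStack {
  std::vector<uint64_t> words;

  // Pushing moves rsp down by one word, so the new word becomes words[0].
  void push_word(uint64_t w) { words.insert(words.begin(), w); }

  // Assumption: a double takes two slots, value in the lower-addressed one.
  void push_double(double d) {
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    push_word(0);     // upper slot, unused
    push_word(bits);  // lower slot holds the value
  }

  // Models movdbl(xmm, Address(rsp, byte_offset)).
  double load_double(int byte_offset) const {
    double d;
    std::memcpy(&d, &words[byte_offset / wordSize], sizeof d);
    return d;
  }
};

// Builds the stack for Math.pow(base, exponent) as the entry point sees it.
ModelStack stack_for_pow(double base, double exponent) {
  ModelStack s;
  s.push_double(base);       // first argument is pushed first
  s.push_double(exponent);   // second argument ends up closer to rsp
  s.push_word(0xdeadbeef);   // stand-in for the return address at rsp + 0
  return s;
}
```

In this model the exponent sits at rsp + wordSize and the base at rsp + 3 * wordSize, which matches the two movdbl instructions that load xmm1 and xmm0. It is also consistent with the float case (fmaF), where single-slot arguments sit at consecutive word offsets.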

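2. Interpreting bytecodes

We now know that the template interpreter is really a collection of routines. But where are the routines for the bytecodes themselves? Look back at the TemplateInterpreter class definition: the member static DispatchTable _active_table; is what we are looking for. Specifically, templateInterpreterGenerator calls TemplateInterpreterGenerator::set_entry_points() to set up the routine for each bytecode, and that routine is obtained through TemplateTable::template_for(). Again, this code depends on the CPU architecture, so each bytecode's routine is implemented by hotspot\src\cpu\x86\vm\templateTable_x86.cpp together with templateTable.
There are too many bytecodes to cover, so again take one example: istore, which pops the value on top of the stack and stores it into the current method's local variable table. Its implementation: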

void TemplateTable::istore() {
  transition(itos, vtos);
  locals_index(rbx);
  __ movl(iaddress(rbx), rax);
}

A perfectly reasonable implementation.

But wait: when we look at the routine for istore with -XX:+PrintInterpreter, we get a whole pile of assembly:

----------------------------------------------------------------------
istore  54 istore  [0x00000192d1972ba0, 0x00000192d1972c00]  96 bytes

  0x00000192d1972ba0: mov    (%rsp),%eax
  0x00000192d1972ba3: add    $0x8,%rsp
  0x00000192d1972ba7: movzbl 0x1(%r13),%ebx
  0x00000192d1972bac: neg    %rbx
  0x00000192d1972baf: mov    %eax,(%r14,%rbx,8)
  0x00000192d1972bb3: movzbl 0x2(%r13),%ebx
  0x00000192d1972bb8: add    $0x2,%r13
  0x00000192d1972bbc: movabs $0x7fffd56e0fa0,%r10
  0x00000192d1972bc6: jmpq   *(%r10,%rbx,8)
  0x00000192d1972bca: mov    (%rsp),%eax
  0x00000192d1972bcd: add    $0x8,%rsp
  0x00000192d1972bd1: movzwl 0x2(%r13),%ebx
  0x00000192d1972bd6: bswap  %ebx
  0x00000192d1972bd8: shr    $0x10,%ebx
  0x00000192d1972bdb: neg    %rbx
  0x00000192d1972bde: mov    %eax,(%r14,%rbx,8)
  0x00000192d1972be2: movzbl 0x4(%r13),%ebx
  0x00000192d1972be7: add    $0x4,%r13
  0x00000192d1972beb: movabs $0x7fffd56e0fa0,%r10
  0x00000192d1972bf5: jmpq   *(%r10,%rbx,8)
  0x00000192d1972bf9: nopl   0x0(%rax)

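We can just about see that mov %eax,(%r14,%rbx,8) corresponds to __ movl(iaddress(rbx), rax);, but where does the rest of the code come from?
Answering that requires some more background.

As mentioned earlier:

templateInterpreterGenerator calls TemplateInterpreterGenerator::set_entry_points() to set up the routine for each bytecode

So let's start from set_entry_points and see what it does for istore: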

...
  // is the bytecode defined?
  if (Bytecodes::is_defined(code)) {
    Template* t = TemplateTable::template_for(code);
    assert(t->is_valid(), "just checking");
    set_short_entry_points(t, bep, cep, sep, aep, iep, lep, fep, dep, vep);
  }
  // does the bytecode have a wide form?
  if (Bytecodes::wide_is_defined(code)) {
    Template* t = TemplateTable::template_for_wide(code);
    assert(t->is_valid(), "just checking");
    set_wide_entry_point(t, wep);
  }
...
}

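Note this line in the middle:

 Template* t = TemplateTable::template_for(code);

Looking up a Bytecodes::Code constant in the template table yields a Template. A Template describes the properties of the code generated for a given bytecode: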

// A Template describes the properties of a code template for a given bytecode
// and provides a generator to generate the code template.

// hotspot\src\share\vm\utilities\globalDefinitions.hpp
// TosState describes the top-of-stack (tos) cache state before and after a bytecode or method executes.
enum TosState {         // describes the tos cache contents
  btos = 0,             // byte, bool tos cached
  ztos = 1,             // byte, bool tos cached
  ctos = 2,             // char tos cached
  stos = 3,             // short tos cached
  itos = 4,             // int tos cached
  ltos = 5,             // long tos cached
  ftos = 6,             // float tos cached
  dtos = 7,             // double tos cached
  atos = 8,             // object cached
  vtos = 9,             // tos not cached
  number_of_states,
  ilgl                  // illegal state: should not occur
};
// hotspot\src\share\vm\interpreter\templateTable.hpp
class Template VALUE_OBJ_CLASS_SPEC {
 private:
  enum Flags {
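    uses_bcp_bit,                                // does the template use the bytecode pointer (bcp)?
    does_dispatch_bit,                           // does the template dispatch to the next bytecode itself?
    calls_vm_bit,                                // does the template call into the VM?
    wide_bit                                     // is this the template for a wide instruction?
  };

  typedef void (*generator)(int arg);           // code generator for the bytecode; just a function pointer

  int       _flags;                              // the flags described above
  TosState  _tos_in;                             // tos cache state before the bytecode executes
  TosState  _tos_out;                            // tos cache state after the bytecode executes
  generator _gen;                                // code generator for the bytecode
  int       _arg;                                // argument passed to the code generator
  ...
};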

Next, find the template definition for istore:

// hotspot\src\share\vm\interpreter\templateTable.cpp
void TemplateTable::initialize() {
  ...
  //                                    interpr. templates
  // Java spec bytecodes                ubcp|disp|clvm|iswd  in    out   generator             argument
  def(Bytecodes::_istore              , ubcp|____|clvm|____, itos, vtos, istore              ,  _           );
  def(Bytecodes::_lstore              , ubcp|____|____|____, ltos, vtos, lstore              ,  _           );
  def(Bytecodes::_fstore              , ubcp|____|____|____, ftos, vtos, fstore              ,  _           );
  def(Bytecodes::_dstore              , ubcp|____|____|____, dtos, vtos, dstore              ,  _           );
  def(Bytecodes::_astore              , ubcp|____|clvm|____, vtos, vtos, astore              ,  _           );
 ...
  // wide Java spec bytecodes
  def(Bytecodes::_istore              , ubcp|____|____|iswd, vtos, vtos, wide_istore         ,  _           );
  def(Bytecodes::_lstore              , ubcp|____|____|iswd, vtos, vtos, wide_lstore         ,  _           );
  def(Bytecodes::_fstore              , ubcp|____|____|iswd, vtos, vtos, wide_fstore         ,  _           );
  def(Bytecodes::_dstore              , ubcp|____|____|iswd, vtos, vtos, wide_dstore         ,  _           );
  def(Bytecodes::_astore              , ubcp|____|____|iswd, vtos, vtos, wide_astore         ,  _           );
  def(Bytecodes::_iinc                , ubcp|____|____|iswd, vtos, vtos, wide_iinc           ,  _           );
  def(Bytecodes::_ret                 , ubcp|disp|____|iswd, vtos, vtos, wide_ret            ,  _           );
  def(Bytecodes::_breakpoint          , ubcp|disp|clvm|____, vtos, vtos, _breakpoint         ,  _           );

  ...
}
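
The ubcp|disp|clvm|iswd column reads naturally as a bit set built from the Template::Flags enum. The sketch below assumes the shorthand constants are simply one bit per flag. TemplateFlags is a hypothetical helper type, and none_ stands in for the ____ placeholder of the table (renamed here because identifiers containing a double underscore are reserved in standard C++).

```cpp
#include <cassert>

// Bit positions, mirroring the Template::Flags enum quoted earlier.
enum Flags { uses_bcp_bit, does_dispatch_bit, calls_vm_bit, wide_bit };

// Assumed shorthand constants: one bit per flag, none_ for "flag not set".
const int ubcp  = 1 << uses_bcp_bit;
const int disp  = 1 << does_dispatch_bit;
const int clvm  = 1 << calls_vm_bit;
const int iswd  = 1 << wide_bit;
const int none_ = 0;

// Hypothetical wrapper that answers the four questions a template can be asked.
struct TemplateFlags {
  int flags;
  bool uses_bcp()      const { return (flags & ubcp) != 0; }
  bool does_dispatch() const { return (flags & disp) != 0; }
  bool calls_vm()      const { return (flags & clvm) != 0; }
  bool is_wide()       const { return (flags & iswd) != 0; }
};
```

With this reading, the row for Bytecodes::_istore (ubcp|____|clvm|____) describes a template that uses the bytecode pointer and may call into the VM, but neither dispatches on its own nor is a wide template.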

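This definition says that the routine for istore is generated by the generator function istore, which takes no argument. That generator is the short piece of code we saw earlier: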

void TemplateTable::istore() {
  transition(itos, vtos);
  locals_index(rbx);
  __ movl(iaddress(rbx), rax);
}

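ubcp means the template uses the bytecode pointer (bcp): the bytecode's operand is encoded in the bytecode stream itself.

The index of istore immediately follows the istore opcode (0x36), so istore has to read through the bytecode pointer to get the index.

The definition also says that istore expects an int cached at the top of the stack before it executes (itos) and leaves nothing cached afterwards (vtos). istore additionally has a wide form, which uses a two-byte index.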
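
The two encodings can be sketched as a small decoder. The byte layouts follow the JVM instruction format (0x36 is istore, 0xc4 is the wide prefix, and the wide index is big-endian); the function names local_index and instruction_length are hypothetical and exist only in this example.

```cpp
#include <cassert>
#include <cstdint>

// Reads the local-variable index of an istore instruction.
// bcp points at the first byte of the instruction.
//   normal form: [istore][index]                     -> 2 bytes
//   wide form:   [wide][istore][index_hi][index_lo]  -> 4 bytes
unsigned local_index(const uint8_t* bcp, bool is_wide) {
  if (is_wide) {
    // two-byte index, most significant byte first
    return (static_cast<unsigned>(bcp[2]) << 8) | bcp[3];
  }
  return bcp[1];  // one-byte index right after the opcode
}

// Number of bytes the bytecode pointer must advance past the instruction.
unsigned instruction_length(bool is_wide) { return is_wide ? 4 : 2; }
```

The lengths 2 and 4 are the same values that show up as the bcp increments (add $0x2,%r13 and add $0x4,%r13) in the generated routine.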

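With this information we can try to explain the extra assembly. set_entry_points() generates code for both the normal and the wide form of istore.
We will walk through the normal form; the wide form works the same way. set_entry_points() in turn calls set_short_entry_points():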

void TemplateInterpreterGenerator::set_entry_points(Bytecodes::Code code) {
 ...
  if (Bytecodes::is_defined(code)) {
    Template* t = TemplateTable::template_for(code);
    assert(t->is_valid(), "just checking");
    set_short_entry_points(t, bep, cep, sep, aep, iep, lep, fep, dep, vep);
  }
  if (Bytecodes::wide_is_defined(code)) {
    Template* t = TemplateTable::template_for_wide(code);
    assert(t->is_valid(), "just checking");
    set_wide_entry_point(t, wep);
  }
...
}

void TemplateInterpreterGenerator::set_short_entry_points(Template* t, address& bep, address& cep, address& sep, address& aep, address& iep, address& lep, address& fep, address& dep, address& vep) {
  assert(t->is_valid(), "template must exist");
  switch (t->tos_in()) {
    case btos:
    case ztos:
    case ctos:
    case stos:
      ShouldNotReachHere();  // btos/ctos/stos should use itos.
      break;
    case atos: vep = __ pc(); __ pop(atos); aep = __ pc(); generate_and_dispatch(t); break;
    case itos: vep = __ pc(); __ pop(itos); iep = __ pc(); generate_and_dispatch(t); break;
    case ltos: vep = __ pc(); __ pop(ltos); lep = __ pc(); generate_and_dispatch(t); break;
    case ftos: vep = __ pc(); __ pop(ftos); fep = __ pc(); generate_and_dispatch(t); break;
    case dtos: vep = __ pc(); __ pop(dtos); dep = __ pc(); generate_and_dispatch(t); break;
    case vtos: set_vtos_entry_points(t, bep, cep, sep, aep, iep, lep, fep, dep, vep);     break;
    default  : ShouldNotReachHere();                                                 break;
  }
}
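
The itos case above records two entry points into one piece of code: vep is taken before the pop is emitted, iep after it. A rough C++ model of that idea follows. Frame, istore_vep and istore_iep are hypothetical names, and a function call stands in for what is a plain fall-through in the generated machine code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical interpreter frame for the sketch.
struct Frame {
  std::vector<int> stack;        // expression stack in memory
  int tos_cache = 0;             // stands in for the rax/eax tos cache
  int locals[4] = {0, 0, 0, 0};  // local variable slots
};

// itos entry point: the int is already in the tos cache.
void istore_iep(Frame& f, int index) {
  f.locals[index] = f.tos_cache;
}

// vtos entry point: nothing is cached yet, so pop the int into the cache
// first, then continue at the itos entry point.
void istore_vep(Frame& f, int index) {
  f.tos_cache = f.stack.back();
  f.stack.pop_back();
  istore_iep(f, index);
}
```

Which entry point is used depends on the tos state the previous bytecode left behind: if it ended in itos, dispatch lands on iep and the pop is skipped; if it ended in vtos, dispatch lands on vep.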

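set_short_entry_points emits a pop in front of the template when the bytecode expects a value in the tos cache. istore expects itos, so its vtos entry point (vep) first pops the int off the stack into the cache and then falls into the itos entry point (iep):

// hotspot\src\cpu\x86\vm\interp_masm_x86.cpp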
void InterpreterMacroAssembler::pop_i(Register r) {
  // XXX can't use pop currently, upper half non clean
  movl(r, Address(rsp, 0));
  addptr(rsp, wordSize);
}

Note that "pop" here is conceptual; the code actually generated is a mov followed by an add to rsp. Now let's try to annotate that pile of assembly:

----------------------------------------------------------------------
istore  54 istore  [0x00000192d1972ba0, 0x00000192d1972c00]  96 bytes
  ; load the int on top of the stack into the tos cache (eax)
  0x00000192d1972ba0: mov    (%rsp),%eax
  0x00000192d1972ba3: add    $0x8,%rsp

  0x00000192d1972ba7: movzbl 0x1(%r13),%ebx
  0x00000192d1972bac: neg    %rbx
  0x00000192d1972baf: mov    %eax,(%r14,%rbx,8)
  0x00000192d1972bb3: movzbl 0x2(%r13),%ebx
  0x00000192d1972bb8: add    $0x2,%r13
  0x00000192d1972bbc: movabs $0x7fffd56e0fa0,%r10
  0x00000192d1972bc6: jmpq   *(%r10,%rbx,8)
  0x00000192d1972bca: mov    (%rsp),%eax
  0x00000192d1972bcd: add    $0x8,%rsp
  0x00000192d1972bd1: movzwl 0x2(%r13),%ebx
  0x00000192d1972bd6: bswap  %ebx
  0x00000192d1972bd8: shr    $0x10,%ebx
  0x00000192d1972bdb: neg    %rbx
  0x00000192d1972bde: mov    %eax,(%r14,%rbx,8)
  0x00000192d1972be2: movzbl 0x4(%r13),%ebx
  0x00000192d1972be7: add    $0x4,%r13
  0x00000192d1972beb: movabs $0x7fffd56e0fa0,%r10
  0x00000192d1972bf5: jmpq   *(%r10,%rbx,8)
  0x00000192d1972bf9: nopl   0x0(%rax)

Next, generate_and_dispatch() has three parts: before execution (dispatch_prolog), the bytecode itself (t->generate()), and after execution (dispatch_epilog):

void TemplateInterpreterGenerator::generate_and_dispatch(Template* t, TosState tos_out) {
  ...
  int step = 0;
  if (!t->does_dispatch()) {
    step = t->is_wide() ? Bytecodes::wide_length_for(t->bytecode()) : Bytecodes::length_for(t->bytecode());
    if (tos_out == ilgl) tos_out = t->tos_out();
    // compute bytecode size
    assert(step > 0, "just checkin'");
    // setup stuff for dispatching next bytecode
    if (ProfileInterpreter && VerifyDataPointer
        && MethodData::bytecode_has_profile(t->bytecode())) {
      __ verify_method_data_pointer();
    }
    __ dispatch_prolog(tos_out, step);
  }
  // generate template
  t->generate(_masm);
  // advance
  if (t->does_dispatch()) {
#ifdef ASSERT
    // make sure execution doesn't go beyond this point if code is broken
    __ should_not_reach_here();
#endif // ASSERT
  } else {
    // dispatch to next bytecode
    __ dispatch_epilog(tos_out, step);
  }
}

On x86, dispatch_prolog does nothing, so it contributes no code. The template itself comes next:

----------------------------------------------------------------------
istore  54 istore  [0x00000192d1972ba0, 0x00000192d1972c00]  96 bytes
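  ; load the int on top of the stack into the tos cache (eax)
  0x00000192d1972ba0: mov    (%rsp),%eax
  0x00000192d1972ba3: add    $0x8,%rsp
  ; execute istore: read the index at bcp+1 and store eax into that local variable slot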
  0x00000192d1972ba7: movzbl 0x1(%r13),%ebx
  0x00000192d1972bac: neg    %rbx
  0x00000192d1972baf: mov    %eax,(%r14,%rbx,8)

  0x00000192d1972bb3: movzbl 0x2(%r13),%ebx
  0x00000192d1972bb8: add    $0x2,%r13
  0x00000192d1972bbc: movabs $0x7fffd56e0fa0,%r10
  0x00000192d1972bc6: jmpq   *(%r10,%rbx,8)
  0x00000192d1972bca: mov    (%rsp),%eax
  0x00000192d1972bcd: add    $0x8,%rsp
  0x00000192d1972bd1: movzwl 0x2(%r13),%ebx
  0x00000192d1972bd6: bswap  %ebx
  0x00000192d1972bd8: shr    $0x10,%ebx
  0x00000192d1972bdb: neg    %rbx
  0x00000192d1972bde: mov    %eax,(%r14,%rbx,8)
  0x00000192d1972be2: movzbl 0x4(%r13),%ebx
  0x00000192d1972be7: add    $0x4,%r13
  0x00000192d1972beb: movabs $0x7fffd56e0fa0,%r10
  0x00000192d1972bf5: jmpq   *(%r10,%rbx,8)
  0x00000192d1972bf9: nopl   0x0(%rax)

What runs after the template is dispatch_epilog:

void InterpreterMacroAssembler::dispatch_epilog(TosState state, int step) {
  dispatch_next(state, step);
}

void InterpreterMacroAssembler::dispatch_next(TosState state, int step) {
  // load next bytecode (load before advancing _bcp_register to prevent AGI)
  load_unsigned_byte(rbx, Address(_bcp_register, step));
  // advance _bcp_register
  increment(_bcp_register, step);
  dispatch_base(state, Interpreter::dispatch_table(state));
}

void InterpreterMacroAssembler::dispatch_base(TosState state,
                                              address* table,
                                              bool verifyoop) {
  verify_FPU(1, state);
  if (VerifyActivationFrameSize) {
    Label L;
    mov(rcx, rbp);
    subptr(rcx, rsp);
    int32_t min_frame_size =
      (frame::link_offset - frame::interpreter_frame_initial_sp_offset) *
      wordSize;
    cmpptr(rcx, (int32_t)min_frame_size);
    jcc(Assembler::greaterEqual, L);
    stop("broken stack frame");
    bind(L);
  }
  if (verifyoop) {
    verify_oop(rax, state);
  }
#ifdef _LP64
  // jump to the routine of the next bytecode through the dispatch table (rbx holds its opcode)
  lea(rscratch1, ExternalAddress((address)table));
  jmp(Address(rscratch1, rbx, Address::times_8));
#else
  Address index(noreg, rbx, Address::times_ptr);
  ExternalAddress tbl((address)table);
  ArrayAddress dispatch(tbl, index);
  jump(dispatch);
#endif // _LP64
}
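
The load / advance / indirect-jump sequence of dispatch_next and dispatch_base can be modeled with a table of function pointers. This is only a sketch: the three opcodes, the VM struct and run are hypothetical, a single table stands in for the per-TosState tables, and a C++ call replaces the real jmp (so the call stack grows with each bytecode here, which the generated code avoids).

```cpp
#include <cassert>
#include <cstdint>

struct VM;
typedef void (*Routine)(VM&);

// Hypothetical interpreter state for the sketch.
struct VM {
  const uint8_t* bcp;   // bytecode pointer (r13 in the listing)
  int acc;              // some interpreter state to mutate
  Routine table[256];   // dispatch table: one routine per opcode
};

// Model of dispatch_next: load the next opcode, advance bcp, jump via the table.
void dispatch_next(VM& vm, int step) {
  uint8_t next = vm.bcp[step];  // load_unsigned_byte(rbx, Address(bcp, step))
  vm.bcp += step;               // increment(bcp, step)
  vm.table[next](vm);           // jmp *(table + opcode * 8)
}

// Hypothetical opcodes, not JVM bytecodes.
enum { OP_INC = 0, OP_ADDI = 1, OP_HALT = 2 };

void do_inc(VM& vm)  { vm.acc += 1;         dispatch_next(vm, 1); }  // 1-byte instruction
void do_addi(VM& vm) { vm.acc += vm.bcp[1]; dispatch_next(vm, 2); }  // opcode + operand byte
void do_halt(VM&)    {}                                              // stop dispatching
void do_illegal(VM&) { assert(false && "illegal bytecode"); }

int run(const uint8_t* code) {
  VM vm;
  vm.bcp = code;
  vm.acc = 0;
  for (int i = 0; i < 256; i++) vm.table[i] = do_illegal;
  vm.table[OP_INC]  = do_inc;
  vm.table[OP_ADDI] = do_addi;
  vm.table[OP_HALT] = do_halt;
  vm.table[code[0]](vm);  // enter at the first bytecode
  return vm.acc;
}
```

Every routine ends by dispatching to its successor, so there is no central loop: control moves from routine to routine through the table, which is what the movabs/jmpq pair at the end of each generated routine does.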

Putting it all together, the fully annotated listing:

----------------------------------------------------------------------
istore  54 istore  [0x00000192d1972ba0, 0x00000192d1972c00]  96 bytes
  ; load the int on top of the stack into the tos cache (eax)
  0x00000192d1972ba0: mov    (%rsp),%eax
  0x00000192d1972ba3: add    $0x8,%rsp

  ; execute istore: read the index at bcp+1 and store eax into that local variable slot
  0x00000192d1972ba7: movzbl 0x1(%r13),%ebx
  0x00000192d1972bac: neg    %rbx
  0x00000192d1972baf: mov    %eax,(%r14,%rbx,8)

  ; load the next bytecode; istore is followed by one index byte, so bcp (r13) advances by 2
  0x00000192d1972bb3: movzbl 0x2(%r13),%ebx
  0x00000192d1972bb8: add    $0x2,%r13

  ; jump to the next bytecode's routine through the dispatch table (r10 = table base, rbx = opcode)
  0x00000192d1972bbc: movabs $0x7fffd56e0fa0,%r10
  0x00000192d1972bc6: jmpq   *(%r10,%rbx,8)

  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
  ; As mentioned, istore also has a wide form, which is generated alongside it. Its layout is
  ; wide istore byte1, byte2 [four bytes]
  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
  ; load the int on top of the stack into the tos cache (eax)
  0x00000192d1972bca: mov    (%rsp),%eax
  0x00000192d1972bcd: add    $0x8,%rsp

  ; read the two-byte index
  0x00000192d1972bd1: movzwl 0x2(%r13),%ebx         ; zero-extending 16-bit load; e.g. for index bytes 0x01,0x02, ebx=0x00000201
  0x00000192d1972bd6: bswap  %ebx                   ; reverse the four bytes, ebx=0x01020000
  0x00000192d1972bd8: shr    $0x10,%ebx             ; ebx=0x00000102
  0x00000192d1972bdb: neg    %rbx                   ; negate the index
  0x00000192d1972bde: mov    %eax,(%r14,%rbx,8)     ; store to r14 - index*8

  ; load the next bytecode; the instruction is wide istore byte1,byte2, so bcp (r13) advances by 4
  0x00000192d1972be2: movzbl 0x4(%r13),%ebx
  0x00000192d1972be7: add    $0x4,%r13

  ; jump to the next bytecode's routine through the dispatch table
  0x00000192d1972beb: movabs $0x7fffd56e0fa0,%r10
  0x00000192d1972bf5: jmpq   *(%r10,%rbx,8)
  0x00000192d1972bf9: nopl   0x0(%rax)
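
The movzwl / bswap / shr sequence in the wide routine converts the big-endian index in the bytecode stream into a native little-endian integer. The sketch below replays those steps in portable C++. The helper names are hypothetical, and the sample bytes 0x01, 0x02 are chosen only to make the byte swap visible.

```cpp
#include <cassert>
#include <cstdint>

// movzwl on a little-endian CPU: 16-bit load, zero-extended to 32 bits.
uint32_t load_u16_le(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8);
}

// bswap: reverse the four bytes of a 32-bit value.
uint32_t bswap32(uint32_t x) {
  return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
         ((x << 8) & 0x00ff0000u) | (x << 24);
}

// Replays the index computation of the wide istore routine.
uint32_t wide_index(const uint8_t* bcp) {
  uint32_t ebx = load_u16_le(bcp + 2);  // movzwl 0x2(%r13),%ebx
  ebx = bswap32(ebx);                   // bswap  %ebx
  ebx >>= 16;                           // shr    $0x10,%ebx
  return ebx;
}

// neg %rbx; mov %eax,(%r14,%rbx,8): the slot lives at locals_base - index * 8.
uint64_t slot_address(uint64_t locals_base, uint32_t index) {
  return locals_base - static_cast<uint64_t>(index) * 8;
}
```

For the bytes c4 36 01 02, the load yields 0x00000201, the swap 0x01020000, and the shift 0x00000102, i.e. local variable 258. The negation before the store reflects that local slots are addressed downward from r14.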