Preface

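Earlier in the SandHook series we saw that replacing the ArtMethod entry point does not cover every method, and the gap is much larger than expected.
Inlining is not the only reason a hook is missed. Before Android O, inlining accounts for only a small share of the misses; the main cause is the Sharpening optimization in ART's Optimizing code generator.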

Quick & Optimizing

ART has had two compilers:

Quick
Optimizing

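Quick was introduced in 4.4 and remained the default compiler through 6.0, until it was removed in 7.0.

Optimizing was introduced in 5.0 and is the only compiler from 7.0 through 9.0.

The rest of this post uses the Optimizing compiler to analyze how ART generates method calls.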

Optimizing

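Optimizing compiles more slowly than Quick, but it applies a range of optimizations, including:

Escape analysis: objects that cannot escape are allocated directly on the stack
Constant folding
Dead code block elimination
Method inlining
Instruction simplification
Instruction reordering
load/store elimination
Intrinsic function substitution

...

It also covers invoke code generation:

Code generation for invoke-static/invoke-direct uses the Sharpening optimization by default.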

Sharpening

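Sharpening does two things:

Decides how and from where the ArtMethod is loaded
Decides whether to blr straight to the entry or to look up ArtMethod -> CodeEntry before calling

The results are stored in two enums, MethodLoadKind and CodePtrLocation:

MethodLoadKind describes how the ArtMethod is loaded
CodePtrLocation describes the kind of jump target address

We focus on CodePtrLocation, which changed significantly in 8.0:

Before 8.0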

 // Determines the location of the code pointer.
  enum class CodePtrLocation {
    // Recursive call, use local PC-relative call instruction.
    kCallSelf,

    // Use PC-relative call instruction patched at link time.
    // Used for calls within an oat file, boot->boot or app->app.
    kCallPCRelative,

    // Call to a known target address, embed the direct address in code.
    // Used for app->boot call with non-relocatable image and for JIT-compiled calls.
    kCallDirect,

    // Call to a target address that will be known at link time, embed the direct
    // address in code. If the image is relocatable, emit .patch_oat entry.
    // Used for app->boot calls with relocatable image and boot->boot calls, whether
    // the image relocatable or not.
    kCallDirectWithFixup,

    // Use code pointer from the ArtMethod*.
    // Used when we don't know the target code. This is also the last-resort-kind used when
    // other kinds are unimplemented or impractical (i.e. slow) on a particular architecture.
    kCallArtMethod,
  };

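kCallSelf: as the name suggests, a recursive call to the method itself. The ArtMethod does not need to be reloaded and the code location is known directly.
kCallPCRelative: a PC-relative bl straight to the target, patched at link time. Typical for calls to nearby methods within the same oat file.
kCallDirect: the compiled entry address is already known, so the ArtMethod->CodeEntry lookup is skipped and the call is a direct blr to the entry (blx on 32-bit ARM). Typical for calls to system methods, whose addresses are absolute and need no relocation.
kCallDirectWithFixup: the method's location in memory is only known when the OAT file is linked, so the entry has to be relocated by the linker. No ArtMethod lookup is needed here either.
kCallArtMethod: the entry is only known at run time and has to be read from ArtMethod->CodeEntry. This is the only case in which an entry-replacement hook can take effect.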
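The hook-relevant distinction between these five kinds can be summed up in a few lines. This is only a sketch: the enum is a standalone copy of the pre-8.0 listing, and the helper name `ReadsEntryFromArtMethod` is hypothetical, not an ART function.

```cpp
// Standalone copy of the pre-8.0 enum, for illustration only.
enum class CodePtrLocation {
  kCallSelf,
  kCallPCRelative,
  kCallDirect,
  kCallDirectWithFixup,
  kCallArtMethod,
};

// Hypothetical helper (not part of ART): returns true when the generated call
// site loads the code entry from the ArtMethod at run time. Only such call
// sites observe a replaced ArtMethod entry point.
constexpr bool ReadsEntryFromArtMethod(CodePtrLocation loc) {
  return loc == CodePtrLocation::kCallArtMethod;
}
```

Under this model, four of the five pre-8.0 kinds bypass the ArtMethod entry entirely.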

Code generation

void CodeGeneratorARM64::GenerateStaticOrDirectCall(HInvokeStaticOrDirect* invoke, Location temp) {


// Handle where the ArtMethod is loaded from
...........

// Emit the call
switch (invoke->GetCodePtrLocation()) {
    case HInvokeStaticOrDirect::CodePtrLocation::kCallSelf:
      __ Bl(&frame_entry_label_);
      break;
    case HInvokeStaticOrDirect::CodePtrLocation::kCallPCRelative: {
      relative_call_patches_.emplace_back(invoke->GetTargetMethod());
      vixl::Label* label = &relative_call_patches_.back().label;
      vixl::SingleEmissionCheckScope guard(GetVIXLAssembler());
      __ Bind(label);
      __ bl(0);  // Branch and link to itself. This will be overriden at link time.
      break;
    }
    case HInvokeStaticOrDirect::CodePtrLocation::kCallDirectWithFixup:
    case HInvokeStaticOrDirect::CodePtrLocation::kCallDirect:
      // LR prepared above for better instruction scheduling.
      DCHECK(direct_code_loaded);
      // lr()
      __ Blr(lr);
      break;
    case HInvokeStaticOrDirect::CodePtrLocation::kCallArtMethod:
      // LR = callee_method->entry_point_from_quick_compiled_code_;
      __ Ldr(lr, MemOperand(
          XRegisterFrom(callee_method),
       ArtMethod::EntryPointFromQuickCompiledCodeOffset(kArm64WordSize).Int32Value()));
      // lr()
      __ Blr(lr);
      break;
  }
}

As the code shows, only kCallArtMethod uses:

__ Ldr(lr, MemOperand(XRegisterFrom(callee_method),ArtMethod::EntryPointFromQuickCompiledCodeOffset(kArm64WordSize).Int32Value()));

which emits code that loads the CodeEntry from the ArtMethod:

ldr lr, [RegMethod, #CodeEntryOffset]

Every other case branches to the CodeEntry directly, without reading the ArtMethod.
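A toy model makes the consequence for hooking concrete. This is a sketch under simplifying assumptions: the `ArtMethod` struct below is a stand-in with a single field, and `DirectCallSite` / `ArtMethodCallSite` are hypothetical names that model the two styles of generated call, not real ART types.

```cpp
#include <cassert>

// Toy stand-in for art::ArtMethod: only the field relevant here.
struct ArtMethod {
  int (*entry_point)();  // plays the role of entry_point_from_quick_compiled_code_
};

inline int OriginalCode() { return 1; }
inline int HookCode() { return 2; }

// Models kCallDirect / kCallPCRelative: the target address was fixed when the
// caller was compiled, so the ArtMethod is never consulted at call time.
struct DirectCallSite {
  int (*embedded_target)();
  int Invoke() const { return embedded_target(); }
};

// Models kCallArtMethod: the entry is loaded from the ArtMethod on every call.
struct ArtMethodCallSite {
  const ArtMethod* callee;
  int Invoke() const {
    assert(callee != nullptr && callee->entry_point != nullptr);
    return callee->entry_point();
  }
};
```

If a `DirectCallSite` is built from `m.entry_point` and the field is later overwritten with `&HookCode`, the direct site still runs `OriginalCode`, while an `ArtMethodCallSite` pointing at `m` reaches the hook.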

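8.0 and later

Things improve from 8.0 on. Frankly, I never felt this optimization bought much performance, and from 8.0 on ART simply looks up the entry in the ArtMethod for everything except recursive calls.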

// Determines the location of the code pointer.
  enum class CodePtrLocation {
    // Recursive call, use local PC-relative call instruction.
    kCallSelf,

    // Use code pointer from the ArtMethod*.
    // Used when we don't know the target code. This is also the last-resort-kind used when
    // other kinds are unimplemented or impractical (i.e. slow) on a particular architecture.
    kCallArtMethod,
  };

Code generation

switch (invoke->GetCodePtrLocation()) {
    case HInvokeStaticOrDirect::CodePtrLocation::kCallSelf:
      {
        // Use a scope to help guarantee that `RecordPcInfo()` records the correct pc.
        ExactAssemblyScope eas(GetVIXLAssembler(),
                               kInstructionSize,
                               CodeBufferCheckScope::kExactSize);
        __ bl(&frame_entry_label_);
        RecordPcInfo(invoke, invoke->GetDexPc(), slow_path);
      }
      break;
    case HInvokeStaticOrDirect::CodePtrLocation::kCallArtMethod:
      // LR = callee_method->entry_point_from_quick_compiled_code_;
      __ Ldr(lr, MemOperand(
          XRegisterFrom(callee_method),
          ArtMethod::EntryPointFromQuickCompiledCodeOffset(kArm64PointerSize).Int32Value()));
      {
        // Use a scope to help guarantee that `RecordPcInfo()` records the correct pc.
        ExactAssemblyScope eas(GetVIXLAssembler(),
                               kInstructionSize,
                               CodeBufferCheckScope::kExactSize);
        // lr()
        __ blr(lr);
        RecordPcInfo(invoke, invoke->GetDexPc(), slow_path);
      }
      break;
  }

invoke-virtual/interface

invoke-virtual/interface takes a different path by default:

{
    // Ensure that between load and MaybeRecordImplicitNullCheck there are no pools emitted.
    EmissionCheckScope guard(GetVIXLAssembler(), kMaxMacroInstructionSizeInBytes);
    // /* HeapReference<Class> */ temp = receiver->klass_
    __ Ldr(temp.W(), HeapOperandFrom(LocationFrom(receiver), class_offset));
    MaybeRecordImplicitNullCheck(invoke);
  }
  // Instead of simply (possibly) unpoisoning `temp` here, we should
  // emit a read barrier for the previous class reference load.
  // However this is not required in practice, as this is an
  // intermediate/temporary reference and because the current
  // concurrent copying collector keeps the from-space memory
  // intact/accessible until the end of the marking phase (the
  // concurrent copying collector may not in the future).
  GetAssembler()->MaybeUnpoisonHeapReference(temp.W());
  // temp = temp->GetMethodAt(method_offset);
  __ Ldr(temp, MemOperand(temp, method_offset));
  // lr = temp->GetEntryPoint();
  __ Ldr(lr, MemOperand(temp, entry_point.SizeValue()));
  {
    // Use a scope to help guarantee that `RecordPcInfo()` records the correct pc.
    ExactAssemblyScope eas(GetVIXLAssembler(), kInstructionSize, CodeBufferCheckScope::kExactSize);
    // lr();
    __ blr(lr);
    RecordPcInfo(invoke, invoke->GetDexPc(), slow_path);
  }

The steps are:

Class clazz = receiver.getClass();
Method method = clazz.getMethodAt(index);
blr method->CodeEntry
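The three steps can be mirrored in a small C++ model. It is a deliberately simplified sketch: `Object`, `Class` and `ArtMethod` are toy structs, the vtable is a plain `std::vector` rather than ART's embedded table, and `InvokeVirtual` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins; the real ART layouts are different.
struct ArtMethod { int (*entry_point)(); };
struct Class { std::vector<ArtMethod*> vtable; };
struct Object { Class* klass; };

inline int Original() { return 10; }
inline int Hook() { return 20; }

// Follows the same chain as the generated code:
// receiver -> class -> method slot -> entry point -> call.
int InvokeVirtual(const Object& receiver, std::size_t method_index) {
  Class* clazz = receiver.klass;                    // temp = receiver->klass_
  assert(clazz != nullptr && method_index < clazz->vtable.size());
  ArtMethod* method = clazz->vtable[method_index];  // temp = temp->GetMethodAt(...)
  return method->entry_point();                     // lr = temp->GetEntryPoint(); lr()
}
```

Because the entry point is read from the ArtMethod on every call, overwriting `entry_point` is picked up by the very next `InvokeVirtual`. That is why entry replacement is seen on this path.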

InvokeRuntime

InvokeRuntime mainly serves invokes that can only be resolved at run time, such as class initialization (kQuickInitializeType).

InvokeRuntime looks up the CodeEntry from the current Thread:

void CodeGeneratorARM64::InvokeRuntime(int32_t entry_point_offset,
                                       HInstruction* instruction,
                                       uint32_t dex_pc,
                                       SlowPathCode* slow_path) {
  ValidateInvokeRuntime(instruction, slow_path);
  BlockPoolsScope block_pools(GetVIXLAssembler());
  __ Ldr(lr, MemOperand(tr, entry_point_offset));
  __ Blr(lr);
  RecordPcInfo(instruction, dex_pc, slow_path);
}

tr is the thread register, which on ARM64 is normally X19.

So the resulting code usually looks like this:

loc_3e6828:
mov        x0, x19
ldr        x20, [x0, #0x310]
blr        x20
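The same load-from-thread-then-call pattern can be written as a toy C++ model. This is a sketch with made-up layout: the `Thread` struct, its field names and `InitializeTypeStub` are hypothetical, and the offset comes from `offsetof` rather than from ART's real entry point table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

using RuntimeEntry = int (*)();

// Toy thread object holding one runtime entry point slot.
struct Thread {
  void* other_state[2];    // placeholder for whatever precedes the table
  RuntimeEntry init_type;  // plays the role of a slot such as kQuickInitializeType
};

inline int InitializeTypeStub() { return 7; }

// Mirrors `ldr lr, [tr, #entry_point_offset]; blr lr`:
// read a code pointer at a byte offset from the thread, then call it.
int InvokeRuntime(const Thread* tr, std::size_t entry_point_offset) {
  RuntimeEntry lr;
  std::memcpy(&lr,
              reinterpret_cast<const unsigned char*>(tr) + entry_point_offset,
              sizeof(lr));
  assert(lr != nullptr);
  return lr();
}
```

No ArtMethod takes part in this path, so replacing an ArtMethod entry point has no effect on calls made this way.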

Intrinsics

ART also maintains efficient implementations for a set of system functions. They use CPU instructions directly and skip the method call entirely.

  // System.arraycopy.
    case kIntrinsicSystemArrayCopyCharArray:
      return Intrinsics::kSystemArrayCopyChar;

    case kIntrinsicSystemArrayCopy:
      return Intrinsics::kSystemArrayCopy;

    // Thread.currentThread.
    case kIntrinsicCurrentThread:
      return Intrinsics::kThreadCurrentThread;

Take Thread.currentThread() as an example. With the intrinsics optimization, the call becomes this code:

void IntrinsicCodeGeneratorARM64::VisitThreadCurrentThread(HInvoke* invoke) {
  codegen_->Load(Primitive::kPrimNot, WRegisterFrom(invoke->GetLocations()->Out()),
                 MemOperand(tr, Thread::PeerOffset<8>().Int32Value()));
}

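The final code looks roughly like this. The thread's Java peer (the java.lang.Thread object stored at Thread::PeerOffset) is loaded straight into the destination register with a single ldr, and no method call remains:

ldr w17, [x19, #PeerOffset]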
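A toy model shows why an entry-replacement hook never sees an intrinsified call. It is a sketch only: `Thread`, `JavaThreadObject`, `CallThroughArtMethod` and `IntrinsicCurrentThread` are hypothetical stand-ins, and the hook is just a function that counts its invocations.

```cpp
#include <cassert>

struct JavaThreadObject { int id; };
struct Thread { JavaThreadObject* peer; };  // toy native thread with its Java peer

struct ArtMethod { JavaThreadObject* (*entry_point)(Thread*); };

static int g_hook_calls = 0;

// Stand-in for a hook installed via entry replacement; it counts its calls.
JavaThreadObject* HookedCurrentThread(Thread* self) {
  ++g_hook_calls;
  return self->peer;
}

// Regular call path: the entry point is read from the ArtMethod.
JavaThreadObject* CallThroughArtMethod(const ArtMethod& method, Thread* self) {
  return method.entry_point(self);
}

// Intrinsic path: a single load from the thread, no call at all.
JavaThreadObject* IntrinsicCurrentThread(Thread* tr) {
  assert(tr != nullptr);
  return tr->peer;
}
```

With `HookedCurrentThread` installed as the entry point, `IntrinsicCurrentThread` still returns the peer without touching `g_hook_calls`. Only `CallThroughArtMethod` increments it.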

Conclusion

On 8.0 and later, ArtMethod entry replacement is basically enough for hooking. Below 8.0, unless debug or deoptimize is enabled, inline hook is required; otherwise many calls will be missed.
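The rule of thumb can be written as a small decision helper. This is a sketch, not code from SandHook: `HookStrategy` and `ChooseHookStrategy` are hypothetical names, and the only outside fact used is that Android 8.0 corresponds to API level 26.

```cpp
enum class HookStrategy { kEntryReplacement, kInlineHook };

constexpr int kApiLevelO = 26;  // Android 8.0

// Hypothetical helper: encodes the conclusion above as a decision function.
// `debuggable_or_deoptimized` means the target runs with debug enabled or has
// been deoptimized, so its calls are not sharpened into direct branches.
constexpr HookStrategy ChooseHookStrategy(int api_level,
                                          bool debuggable_or_deoptimized) {
  return (api_level >= kApiLevelO || debuggable_or_deoptimized)
             ? HookStrategy::kEntryReplacement
             : HookStrategy::kInlineHook;
}
```

Intrinsified and inlined calls are outside what this helper models; it covers only the Sharpening case discussed here.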
