Studying ARM instructions with C code and a disassembler.

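Tutorial goals:

Determine whether Thumb or ARM instructions are generated, and how to change this with compiler options;
For ARM instructions, check whether conditionally executed instructions can be produced;
Design a C code scenario and observe whether register-shift addressing is produced;
Design a C code scenario and observe how a complex 32-bit number is loaded into a register;
Write a C program with nested function calls, then observe and analyze:
Where is the return address at call time?
Where are the arguments passed in?
How is stack space allocated for local variables?
Are registers caller-saved or callee-saved? Are all of them saved, or only some?
MLA is multiply with accumulate; find out how to write a C expression that compiles to an MLA instruction.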

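Equipment and software:

A Raspberry Pi board.
An SD card (with an image already flashed).
A power cable and a USB charger.
A USB flash drive or USB hard disk.
putty and psftp.
A network cable on a network with DHCP.

Steps:

First, write a simple piece of C code: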

#include<stdio.h>

int main(int argc,char** argv)
{
    int a=0x12345678;
    printf("a:%d\n",a);
    return 0;
}

To compile it into ARM instructions, the default settings are enough. Then disassemble the object file with objdump.
```
gcc -o 1.o -c 1.c
objdump -d 1.o
```

We can see that every instruction is 32 bits wide.

1.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <main>:
   0:   e92d4800        push    {fp, lr}
   4:   e28db004        add     fp, sp, #4
   8:   e24dd010        sub     sp, sp, #16
   c:   e50b0010        str     r0, [fp, #-16]
  10:   e50b1014        str     r1, [fp, #-20]
  14:   e59f3020        ldr     r3, [pc, #32]   ; 3c <main+0x3c>
  18:   e50b3008        str     r3, [fp, #-8]
  1c:   e59f301c        ldr     r3, [pc, #28]   ; 40 <main+0x40>
  20:   e1a00003        mov     r0, r3
  24:   e51b1008        ldr     r1, [fp, #-8]
  28:   ebfffffe        bl      0 <printf>
  2c:   e3a03000        mov     r3, #0
  30:   e1a00003        mov     r0, r3
  34:   e24bd004        sub     sp, fp, #4
  38:   e8bd8800        pop     {fp, pc}
  3c:   12345678        .word   0x12345678
  40:   00000000        .word   0x00000000
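In the listing above, objdump annotates `ldr r3, [pc, #32]` at address 0x14 with the target `3c`. The arithmetic behind that annotation can be sketched as follows. The function names are hypothetical, and the sketch assumes a non-negative offset; the Thumb variant is included for comparison with the Thumb listing later in this article.

```c
#include <stdint.h>

/* Hypothetical helper: in ARM state, reading pc yields the address of
 * the current instruction plus 8. */
uint32_t arm_literal_addr(uint32_t insn_addr, uint32_t offset)
{
    return insn_addr + 8u + offset;
}

/* Hypothetical helper: in Thumb state, pc reads as the instruction
 * address plus 4, rounded down to a multiple of 4 for literal loads. */
uint32_t thumb_literal_addr(uint32_t insn_addr, uint32_t offset)
{
    return ((insn_addr + 4u) & ~3u) + offset;
}
```

For example, `arm_literal_addr(0x14, 32)` gives 0x3c and `arm_literal_addr(0x1c, 28)` gives 0x40, matching the two `.word` entries at the end of `main`.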

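To compile it into Thumb instructions, use the command below. Without -mfloat-abi=softfp, gcc reports an error. This appears to be related to the ABI used for VFP floating-point operations, presumably because the toolchain's default floating-point ABI is not supported for this kind of Thumb code generation.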
```
gcc -o 1.o -c 1.c -mthumb -mfloat-abi=softfp
objdump -d 1.o
```

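We can see that the instructions are now 16 bits wide; the only exception is bl, which is encoded as two 16-bit halfwords.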

1.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <main>:
   0:   b580            push    {r7, lr}
   2:   b084            sub     sp, #16
   4:   af00            add     r7, sp, #0
   6:   6078            str     r0, [r7, #4]
   8:   6039            str     r1, [r7, #0]
   a:   4b06            ldr     r3, [pc, #24]   ; (24 <main+0x24>)
   c:   60fb            str     r3, [r7, #12]
   e:   4a06            ldr     r2, [pc, #24]   ; (28 <main+0x28>)
  10:   68fb            ldr     r3, [r7, #12]
  12:   1c10            adds    r0, r2, #0
  14:   1c19            adds    r1, r3, #0
  16:   f7ff fffe       bl      0 <printf>
  1a:   2300            movs    r3, #0
  1c:   1c18            adds    r0, r3, #0
  1e:   46bd            mov     sp, r7
  20:   b004            add     sp, #16
  22:   bd80            pop     {r7, pc}
  24:   12345678        .word   0x12345678
  28:   00000000        .word   0x00000000
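Besides reading the disassembly, the instruction set chosen by the compiler options can also be checked from within the C source. The following is a minimal sketch that relies on the `__thumb__` and `__arm__` macros that GCC predefines on 32-bit ARM targets; the function name `instruction_set` is hypothetical.

```c
/* Hypothetical helper: reports which instruction set this translation
 * unit was compiled for, based on GCC's predefined macros. */
const char *instruction_set(void)
{
#if defined(__thumb__)
    return "Thumb";               /* compiled with -mthumb */
#elif defined(__arm__)
    return "ARM";                 /* 32-bit ARM target, ARM state */
#else
    return "not a 32-bit ARM target";
#endif
}
```

Compiling the same file once with the default options and once with `-mthumb` should change the string that the function returns.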

Next, write another program, 2.c, and first compile it without optimization:

int max(int a,int b)
{
    if(a>b)
        return a;
    else
        return b;
}

2.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <max>:
   0:   e52db004        push    {fp}            ; (str fp, [sp, #-4]!)
   4:   e28db000        add     fp, sp, #0
   8:   e24dd00c        sub     sp, sp, #12
   c:   e50b0008        str     r0, [fp, #-8]
  10:   e50b100c        str     r1, [fp, #-12]
  14:   e51b2008        ldr     r2, [fp, #-8]
  18:   e51b300c        ldr     r3, [fp, #-12]
  1c:   e1520003        cmp     r2, r3
  20:   da000001        ble     2c <max+0x2c>
  24:   e51b3008        ldr     r3, [fp, #-8]
  28:   ea000000        b       30 <max+0x30>
  2c:   e51b300c        ldr     r3, [fp, #-12]
  30:   e1a00003        mov     r0, r3
  34:   e28bd000        add     sp, fp, #0
  38:   e8bd0800        pop     {fp}
  3c:   e12fff1e        bx      lr

The code above contains branch instructions (ble and b), but no conditionally executed data-processing instructions.

Now let gcc optimize the code:
```
gcc -o 2.o -c 2.c -O1
```

2.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <max>:
   0:   e1510000        cmp     r1, r0
   4:   a1a00001        movge   r0, r1
   8:   b1a00000        movlt   r0, r0
   c:   e12fff1e        bx      lr

The code becomes much shorter, and the conditionally executed instructions (movge and movlt) are clearly visible.
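The condition is visible in the machine code as well: in the ARM encoding, the top four bits of each instruction word hold the condition field. The sketch below decodes that field; the function names are hypothetical, and the table follows the standard ARM condition-code order.

```c
#include <stdint.h>

/* Hypothetical helper: bits 31..28 of an ARM instruction word hold the
 * condition field. */
unsigned arm_cond_field(uint32_t insn)
{
    return (unsigned)(insn >> 28);
}

/* Hypothetical helper: maps the condition field to its mnemonic suffix.
 * 0xE means "always"; 0xF is shown as "nv" here for simplicity. */
const char *arm_cond_name(uint32_t insn)
{
    static const char *const names[16] = {
        "eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
        "hi", "ls", "ge", "lt", "gt", "le", "al", "nv"
    };
    return names[arm_cond_field(insn)];
}
```

Applied to the words in the listings above, 0xe1510000 (cmp) decodes to "al", 0xa1a00001 to "ge", 0xb1a00000 to "lt", and the 0xda000001 branch from the unoptimized build to "le".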

To observe register-shift addressing, write a simple program, 3.c:

int fun(int p[],int index)
{
    return p[index];
}

gcc -o 3.o -c 3.c -O1
objdump -d 3.o
3.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <fun>:
   0:   e7900101        ldr     r0, [r0, r1, lsl #2]
   4:   e12fff1e        bx      lr
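The single `ldr r0, [r0, r1, lsl #2]` folds the whole address calculation into the load: the index in r1 is shifted left by 2 (multiplied by 4) and added to the base in r0. A minimal sketch of the same arithmetic written out in C, assuming 4-byte `int`; the name `fun_explicit` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical rewrite of fun() that spells out the address arithmetic:
 * address = base + index * sizeof(int), i.e. index << 2 for 4-byte ints. */
int fun_explicit(const int *p, int index)
{
    const char *base = (const char *)p;
    ptrdiff_t byte_offset = (ptrdiff_t)index * (ptrdiff_t)sizeof(int);
    return *(const int *)(base + byte_offset);
}
```

This version returns the same value as `p[index]`; it only makes the scaling step visible in the source.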

To see how a complex 32-bit number is loaded, write another simple program, 4.c:

int fun(void)
{
    return 0x12345678;
}

```
gcc -o 4.o -c 4.c -O1
objdump -d 4.o

4.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <fun>:
   0:   e59f0000        ldr     r0, [pc]        ; 8 <fun+0x8>
   4:   e12fff1e        bx      lr
   8:   12345678        .word   0x12345678
```
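The approach is simple: the 32-bit number is stored right next to the code, and a single PC-relative ldr loads it. With a cache present, this may be faster than splitting the number into 16-bit halves and loading them separately, and it takes only one instruction. In a further experiment, loading a 64-bit number worked the same way: the value is stored next to the code and loaded with two loads.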
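The reason 0x12345678 cannot simply be written into a mov is that a classic ARM data-processing immediate is an 8-bit value rotated right by an even number of bits. The sketch below tests whether a constant fits that form; the function names are hypothetical, and it only models this one immediate encoding.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by r bits (r in 0..31). */
static uint32_t rotl32(uint32_t v, unsigned r)
{
    return (v << r) | (v >> ((32u - r) & 31u));
}

/* Hypothetical helper: returns 1 if v can be written as an 8-bit value
 * rotated right by an even amount, 0 otherwise. */
int is_arm_immediate(uint32_t v)
{
    unsigned r;
    for (r = 0; r < 32; r += 2) {
        /* Undo a right-rotation by r and see whether 8 bits remain. */
        if (rotl32(v, r) <= 0xffu)
            return 1;
    }
    return 0;
}
```

Small values such as 5 or 6 pass the test, which is consistent with the plain `mov r3, #5` seen in other listings in this article, while 0x12345678 fails and ends up as a `.word` next to the code.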
Now write a less simple program, 5.c (gcc's optimizer is so strong that writing one program that reveals all of these points is really not easy):

#include<stdio.h>
int bb(int a,int b,int c,int d,int e,int f)
{
    printf("Hello world!\n");
    return a*b*c*d*e*f;
}
int cc(int a,int b,int c,int d,int e,int f,int g,int h,int i,int j,int k)
{
    int t1=a+b;
    int t2=c+d;

    int t3=e+f;
    int t4=g+h;
    int t5=i+j;

    bb(1,2,3,4,5,6);
    int t6=t1*t2;
    int t7=t3*t4;
    int t8=t6-t7;
    int t9=t8*t5*k;

    return t9;
    //return a*b*c*d*e*f*g*h*i*j*k;
}

gcc -o 5.o -c 5.c -O1
objdump -d 5.o
5.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <bb>:
   0:   e92d40f8        push    {r3, r4, r5, r6, r7, lr}
   4:   e1a04000        mov     r4, r0  // r0-r3 carry the first four arguments; any further arguments are passed on the stack.
   8:   e1a05001        mov     r5, r1
   c:   e1a06002        mov     r6, r2
  10:   e1a07003        mov     r7, r3
  14:   e59f0020        ldr     r0, [pc, #32]   ; 3c <bb+0x3c>
  18:   ebfffffe        bl      0 <puts>
  1c:   e0040495        mul     r4, r5, r4
  20:   e0060496        mul     r6, r6, r4
  24:   e0070697        mul     r7, r7, r6
  28:   e59d6018        ldr     r6, [sp, #24]
  2c:   e0070796        mul     r7, r6, r7
  30:   e59d001c        ldr     r0, [sp, #28]
  34:   e0000790        mul     r0, r0, r7
  38:   e8bd80f8        pop     {r3, r4, r5, r6, r7, pc}
  3c:   00000000        .word   0x00000000

00000040 <cc>:
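  40:   e92d41f0        push    {r4, r5, r6, r7, r8, lr}  // This shows that r0-r3 are caller-saved and r4-r8 are callee-saved; only the registers this function actually uses are pushed, not the whole set.
// The return address is held in lr; because this function calls another function, and bl overwrites lr, lr is pushed onto the stack as well.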
  44:   e24dd008        sub     sp, sp, #8
  48:   e0804001        add     r4, r0, r1
  4c:   e0825003        add     r5, r2, r3
  50:   e59d3024        ldr     r3, [sp, #36]   ; 0x24
  54:   e59d7020        ldr     r7, [sp, #32]
  58:   e0877003        add     r7, r7, r3
  5c:   e59d302c        ldr     r3, [sp, #44]   ; 0x2c
  60:   e59d6028        ldr     r6, [sp, #40]   ; 0x28
  64:   e0866003        add     r6, r6, r3
  68:   e59d3034        ldr     r3, [sp, #52]   ; 0x34
  6c:   e59d8030        ldr     r8, [sp, #48]   ; 0x30
  70:   e0888003        add     r8, r8, r3
  74:   e3a03005        mov     r3, #5
  78:   e58d3000        str     r3, [sp]
  7c:   e3a03006        mov     r3, #6
  80:   e58d3004        str     r3, [sp, #4]
  84:   e3a00001        mov     r0, #1
  88:   e3a01002        mov     r1, #2
  8c:   e3a02003        mov     r2, #3
  90:   e3a03004        mov     r3, #4
  94:   ebfffffe        bl      0 <bb>
  98:   e0040495        mul     r4, r5, r4
  9c:   e0060796        mul     r6, r6, r7
  a0:   e0664004        rsb     r4, r6, r4
  a4:   e0080498        mul     r8, r8, r4
  a8:   e59d0038        ldr     r0, [sp, #56]   ; 0x38
  ac:   e0000890        mul     r0, r0, r8
  b0:   e28dd008        add     sp, sp, #8
  b4:   e8bd81f0        pop     {r4, r5, r6, r7, r8, pc}
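// As for where local variables are stored: because optimization is enabled, they are all kept in registers. Without optimization they are placed on the stack, and you can see that lower addresses are used first, then higher ones.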
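The stack offsets in the two listings above follow a simple pattern: the fifth and later arguments sit just above whatever the prologue pushed and reserved. The sketch below reproduces that arithmetic for 4-byte integer arguments; the function name and parameters are hypothetical, and it only models what is visible in these listings.

```c
/* Hypothetical helper: byte offset from sp (after the prologue) of the
 * integer argument with zero-based index arg_index, or -1 if that
 * argument arrives in r0-r3.
 *   pushed_regs: number of registers saved by the prologue's push
 *   local_bytes: extra bytes subtracted from sp after the push */
int stack_arg_offset(int arg_index, int pushed_regs, int local_bytes)
{
    if (arg_index < 4)
        return -1;                       /* passed in a register */
    return 4 * pushed_regs + local_bytes + 4 * (arg_index - 4);
}
```

For cc, which pushes six registers and then reserves 8 bytes, `stack_arg_offset(4, 6, 8)` gives 32 and `stack_arg_offset(10, 6, 8)` gives 56, matching `ldr r7, [sp, #32]` for e and `ldr r0, [sp, #56]` for k. For bb, which pushes six registers and reserves nothing, the fifth argument is at offset 24, matching `ldr r6, [sp, #24]`.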

To obtain an MLA instruction, write another simple program, 6.c:

int fun(int a,int b,int c)
{
    return a*b+c;
}

gcc -o 6.o -c 6.c -O1
objdump -d 6.o

6.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <fun>:
   0:   e0202091        mla     r0, r1, r0, r2
   4:   e12fff1e        bx      lr
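Another C pattern with the same multiply-then-add shape is an accumulation loop. The following dot product is a sketch only: whether gcc actually emits mla for it depends on the compiler version and options, and that has not been verified here. The function name `dot` is hypothetical.

```c
/* Hypothetical example: each iteration computes sum = a[i] * b[i] + sum,
 * the same shape as a*b+c in 6.c, so it is a candidate for mla. */
int dot(const int *a, const int *b, int n)
{
    int sum = 0;
    int i;
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

It can be compiled and inspected the same way as 6.c, with `gcc -O1 -c` followed by `objdump -d`.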

Notes:

This is a lab report for the Embedded Systems course at the College of Computer Science, Zhejiang University.

logicworldzju
