For this lab, I will build and compile three different versions of a C program provided to me by my professor. Each version is compiled for both aarch64 and x86_64, disassembled with objdump on both, and then compared.
I will start with aarch64.
For hello1, which uses the standard printf function:
    0000000000400624 <main>:
      400624:  a9bf7bfd  stp x29, x30, [sp, #-16]!
      400628:  910003fd  mov x29, sp
      40062c:  90000000  adrp x0, 400000 <_init-0x4a8>
      400630:  911c0000  add x0, x0, #0x700
      400634:  97ffffb7  bl 400510 <puts@plt>
      400638:  52800000  mov w0, #0x0 // #0
      40063c:  a8c17bfd  ldp x29, x30, [sp], #16
      400640:  d65f03c0  ret
      400644:  00000000  .inst 0x00000000 ; undefined
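The call to puts is the defining feature of this version. The C source uses the standard printf function to print hello world to the screen, but because the string contains no format specifiers and ends in a newline, the compiler substitutes the simpler puts.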
For hello2, which writes directly to stdout with write:
    0000000000400624 <main>:
      400624:  a9bf7bfd  stp x29, x30, [sp, #-16]!
      400628:  910003fd  mov x29, sp
      40062c:  90000000  adrp x0, 400000 <_init-0x4a8>
      400630:  911c2000  add x0, x0, #0x708
      400634:  d28001a2  mov x2, #0xd // #13
      400638:  aa0003e1  mov x1, x0
      40063c:  52800020  mov w0, #0x1 // #1
      400640:  97ffffb4  bl 400510 <write@plt>
      400644:  52800000  mov w0, #0x0 // #0
      400648:  a8c17bfd  ldp x29, x30, [sp], #16
      40064c:  d65f03c0  ret
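For the hello2 C program, the assembler code calls write, not puts. There are also three mov operations before the call, one for each argument of write: x2 holds the length (13), x1 the address of the message, and w0 the file descriptor (1, stdout). The address is first built in x0 and then copied into x1, which is typical of unoptimized compiler output and makes this version longer than hello1.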
For hello3, which uses a direct kernel system call to write to the file descriptor:
    0000000000400624 <main>:
      400624:  a9bf7bfd  stp x29, x30, [sp, #-16]!
      400628:  910003fd  mov x29, sp
      40062c:  90000000  adrp x0, 400000 <_init-0x4a8>
      400630:  911c4000  add x0, x0, #0x710
      400634:  528001a3  mov w3, #0xd // #13
      400638:  aa0003e2  mov x2, x0
      40063c:  52800021  mov w1, #0x1 // #1
      400640:  d2800800  mov x0, #0x40 // #64
      400644:  97ffffb3  bl 400510 <syscall@plt>
      400648:  52800000  mov w0, #0x0 // #0
      40064c:  a8c17bfd  ldp x29, x30, [sp], #16
      400650:  d65f03c0  ret
      400654:  00000000  .inst 0x00000000 ; undefined
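For the hello3 program, the assembler code calls syscall. This takes four mov operations before the call, one more than the previous program using write, because syscall needs one extra argument: w3 holds the length, x2 the message address, w1 the file descriptor, and x0 the system call number (0x40, or 64, which is write on aarch64). That makes it the longest of the three versions.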
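The professor's sources are not reproduced in this post, so the following is only a minimal sketch of what the three variants plausibly look like, written as three functions in one file. The function names and the exact message text are my assumptions; the 13-byte length is taken from the #13 that appears in the listings.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed message: 13 bytes, matching the #13 length seen in the listings. */
static const char MSG[] = "Hello World!\n";
#define MSG_LEN (sizeof MSG - 1)

/* hello1 style: printf with a constant string, no format specifiers and a
   trailing newline. GCC commonly turns this call into puts. */
void hello_printf(void)
{
    printf("Hello World!\n");
}

/* hello2 style: write(fd, buffer, count), three arguments to set up. */
long hello_write(void)
{
    return (long)write(STDOUT_FILENO, MSG, MSG_LEN);
}

/* hello3 style: syscall(number, fd, buffer, count), four arguments. */
long hello_syscall(void)
{
    return syscall(SYS_write, STDOUT_FILENO, MSG, MSG_LEN);
}
```

Calling the three functions from main prints the message three times. The two functions that return a value report the number of bytes written, which should be 13 when stdout is writable.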
Now I will build and compile the same three C programs on x86_64 and analyze how the assembly differs, starting with hello1.
    0000000000400507 <main>:
      400507:  55              push %rbp
      400508:  48 89 e5        mov %rsp,%rbp
      40050b:  bf b0 05 40 00  mov $0x4005b0,%edi
      400510:  e8 0b ff ff ff  callq 400420 <puts@plt>
      400515:  b8 00 00 00 00  mov $0x0,%eax
      40051a:  5d              pop %rbp
      40051b:  c3              retq
      40051c:  0f 1f 40 00     nopl 0x0(%rax)
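So on xerxes (x86_64), hello1 follows the same pattern as on aarch64: save the frame pointer (push, mov), load the argument, call puts, set the return value, then restore and return. The details differ, though. The string address is loaded with a single mov into %edi instead of the adrp/add pair, and callq pushes the return address onto the stack instead of keeping it in a link register. As a result main is seven instructions here against eight on aarch64, so the x86_64 version is not actually longer. It still uses puts in place of printf.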
    0000000000400507 <main>:
      400507:  55                    push %rbp
      400508:  48 89 e5              mov %rsp,%rbp
      40050b:  ba 0d 00 00 00        mov $0xd,%edx
      400510:  be c0 05 40 00        mov $0x4005c0,%esi
      400515:  bf 01 00 00 00        mov $0x1,%edi
      40051a:  e8 01 ff ff ff        callq 400420 <write@plt>
      40051f:  b8 00 00 00 00        mov $0x0,%eax
      400524:  5d                    pop %rbp
      400525:  c3                    retq
      400526:  66 2e 0f 1f 84 00 00  nopw %cs:0x0(%rax,%rax,1)
      40052d:  00 00 00
For hello2, which uses write instead of printf, the listing is three lines longer, but only two of those are instructions in main; the rest is longer nopw padding after retq. As on aarch64, three mov operations set up the arguments before the call: %edx holds the length, %esi the message address, and %edi the file descriptor. Each one is loaded directly with an immediate value instead of being staged through another register as it was on aarch64.
    0000000000400507 <main>:
      400507:  55              push %rbp
      400508:  48 89 e5        mov %rsp,%rbp
      40050b:  b9 0d 00 00 00  mov $0xd,%ecx
      400510:  ba c0 05 40 00  mov $0x4005c0,%edx
      400515:  be 01 00 00 00  mov $0x1,%esi
      40051a:  bf 01 00 00 00  mov $0x1,%edi
      40051f:  b8 00 00 00 00  mov $0x0,%eax
      400524:  e8 f7 fe ff ff  callq 400420 <syscall@plt>
      400529:  b8 00 00 00 00  mov $0x0,%eax
      40052e:  5d              pop %rbp
      40052f:  c3              retq
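For hello3, which uses the syscall function, there is an extra argument mov in the assembly code, just as when we compiled the programs on aarch64: %edi holds the system call number (1, which is write on x86_64), %esi the file descriptor, %edx the message address, and %ecx the length. There is also a mov $0x0,%eax before the call, because syscall is a variadic function and the x86_64 calling convention passes the number of vector registers used in %al. With these additions, this is the longest x86_64 listing we have documented.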
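One detail worth checking is that the system call number for write is not the same on the two machines: the aarch64 listing loads 0x40 (64) and the x86_64 listing loads 1. A small sketch that compares the SYS_write constant from the system headers with the value seen in the listings is below. The two function names are hypothetical helpers of mine, not part of the lab code.

```c
#include <sys/syscall.h>

/* The number the system headers define for write on the target being built. */
long write_syscall_number(void)
{
    return SYS_write;
}

/* The value that appears in the disassembly for each architecture:
   mov x0, #0x40 on aarch64 and mov $0x1,%edi on x86_64. */
long number_seen_in_listing(void)
{
#if defined(__aarch64__)
    return 64;
#elif defined(__x86_64__)
    return 1;
#else
    return -1; /* other architectures are not covered in this post */
#endif
}
```

On either of the two lab machines the functions should return the same value, which is why C code that uses the SYS_write name rather than a literal number can be compiled unchanged on both.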
For the last part of the group assignment we had to create an assembler program that loops to 30 and prints each value, which means converting the decimal loop counter into ASCII characters. We were provided with an initial Hello World program, as well as a basic loop program to work from.
I first created the program for x86_64:
To keep the post from getting too long I will only include snippets of the code. We start by initializing the index that serves as the counter for the loop. We then choose a register to hold a comparison value and initialize it to 0x30, which is the ASCII code for the character '0'. The remainder register must then be initialized to zero, because div divides the combined value in rdx:rax.
    mov $0x30, %r12   /*Initialize to 0 in ASCII*/
    mov $0, %rdx      /*Initialize the remainder*/
Next we copy the index counter into rax as the dividend, load the divisor (10) into a register, and execute div to perform the division.
    mov %r15, %rax    /*Set dividend*/
    mov $10, %r10     /*Set the divisor*/
    div %r10          /*divide*/
    mov %rax, %r14    /*store quotient*/
    mov %rdx, %r13    /*store remainder*/
    add $0x30, %r14   /*convert first digit from quotient into ascii*/
    add $0x30, %r13   /*convert second digit from remainder into ascii*/
After the division, we store the quotient (from rax) and the remainder (from rdx) in two registers. We then convert each value into its ASCII digit by adding 0x30.
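The same arithmetic can be written in a few lines of C, which I found helpful for checking the logic before fighting with registers. This is only a sketch: the function name split_digits is hypothetical, and it assumes the input is in the range 0 to 99 so that the quotient is a single digit.

```c
#include <stdint.h>

/* Split n (assumed 0..99) into two ASCII digits, mirroring div + add $0x30. */
void split_digits(uint64_t n, char *tens, char *ones)
{
    uint64_t quotient = n / 10;   /* what div leaves in rax */
    uint64_t remainder = n % 10;  /* what div leaves in rdx */

    *tens = (char)(quotient + 0x30);   /* 0x30 is ASCII '0' */
    *ones = (char)(remainder + 0x30);
}
```

For example, split_digits(27, &t, &o) leaves '2' in t and '7' in o, and a single-digit input such as 5 gives '0' and '5'.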
We then write the remainder digit into the message and compare r12, which still holds ASCII '0', with r14, which holds the first digit. The comparison decides whether the quotient digit is also written into the message, so that a leading zero can be skipped.
In the print section we set the length and address of the string and make the write system call. The loop register is then incremented and compared with the maximum value. If the loop register does not equal max, execution jumps back to the top of the loop; once it does equal max, the program makes the exit system call instead.
    mov $len, %rdx    /*Length*/
    mov $msg, %rsi    /*Message*/
    mov $1, %rdi      /*stdout*/
    mov $1, %rax      /*syscall 1 = write*/
    syscall
    inc %r15          /*Increments r15*/
    cmp $max, %r15    /*compare r15 with the max value*/
    jne loop          /*goes back to the loop section if not equal*/
    mov $0, %rdi      /*exit status*/
    mov $60, %rax     /*syscall 60 = exit*/
    syscall
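Putting the pieces together, the whole loop can be sketched in C as below. This is not a translation of my actual assembly: the message text "Loop: ", the names format_line and run_loop, and a LOOP_MAX of 31 (so that 30 is the last value printed) are all assumptions made for illustration, as is leaving a space where a leading zero would go.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define LOOP_MAX 31 /* hypothetical: stop once the counter reaches 31 */

/* Fill out with "Loop: NN\n" for i in 0..99 and return its length.
   out must have room for at least 10 bytes. */
size_t format_line(unsigned i, char *out)
{
    static const char tmpl[] = "Loop:   \n"; /* two digit slots at [6], [7] */
    size_t len = sizeof tmpl - 1;
    char tens = (char)(i / 10 + 0x30);
    char ones = (char)(i % 10 + 0x30);

    memcpy(out, tmpl, len + 1); /* copy the template including the NUL */
    out[7] = ones;              /* the remainder digit is always written */
    if (tens != 0x30)           /* skip the quotient digit if it is '0' */
        out[6] = tens;
    return len;
}

/* Print every line from 0 up to LOOP_MAX - 1; returns lines written or -1. */
int run_loop(void)
{
    char buf[16];
    int lines = 0;

    for (unsigned i = 0; i != LOOP_MAX; i++) { /* inc, cmp, jne */
        size_t len = format_line(i, buf);
        if (write(STDOUT_FILENO, buf, len) != (ssize_t)len)
            return -1;
        lines++;
    }
    return lines;
}
```

The assembly version does the same work, except that the buffer is the msg label in the data section and the loop test is the cmp and jne pair at the bottom of the loop.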
Next we had to do the same thing for aarch64. It is similar to the x86_64 code, as both follow the same logic and movement of data, but there are minor differences. x86_64 uses a dedicated inc instruction every time it goes through the loop, whereas aarch64 has no inc instruction and the counter is advanced with add instead.
Programming in assembly was exceedingly difficult, not only because it meant learning a new language, but also because the heavy reliance on manually managing registers, space in memory, and so on adds many more lines of code and opens the door to many more memory errors. Working at such a low level was a learning curve, as the lowest-level languages we have been exposed to in the Seneca College CPA program are RPG, CLLE, and C. The logic is completely different, and the archaic feel of the language was another hurdle to overcome. Oddly enough, the assembler's error checking and messages were similar to what you get when building with gcc (which is an option, although I avoided it as it requires the use of main instead of _start).
I definitely did not enjoy coding in assembly, but I do understand it is a valuable learning experience to see how code operates with machines at a much lower level than we are used to.
Link to x86_64: https://www.pastiebin.com/5a80b40e2a53f
Link to aarch64: https://www.pastiebin.com/5a80b45741ffe