SPO600 Lab6 Inline Assembler Code

This post deals with scaling audio samples again, but this time the code was given to us by our instructor. We are to run it, measure the runtime, and analyze the differences when using one array, two arrays, one-dimensional arrays and two-dimensional arrays.

Question 1: These variables will be used in our assembler code, so we’re going to hand-allocate which register they are placed in. What is the alternative approach?

Answer 1: The alternative approach is to simply declare the variables and let the compiler decide which CPU registers hold them, referring to them from the assembler code through input and output operands. The compiler can then keep as many variables in registers as it sees fit, which generally helps the program run faster.
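
As a rough illustration of that alternative, the sketch below lets the compiler pick the registers through "r" constraints instead of pinning variables to named registers. The function name add_one is made up for this post, and the plain C branch is only there so the example also builds on machines that are not AArch64.

```c
#include <stdint.h>

/* Hypothetical helper: adds 1 to x.
   The "r" constraints ask the compiler to choose any general-purpose
   register for each operand; we never name a register ourselves. */
static int32_t add_one(int32_t x) {
#if defined(__aarch64__)
    int32_t result;
    __asm__ ("add %w0, %w1, #1"   /* %w0 = result, %w1 = x (32-bit views) */
             : "=r"(result)       /* output operand */
             : "r"(x));           /* input operand */
    return result;
#else
    /* Fallback for other platforms: same behaviour in plain C. */
    return x + 1;
#endif
}
```

Calling add_one(41) gives 42 on either branch; the only difference is who decides where x and result live.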

Question 2: Set vol_int to fixed-point representation of 0.75. Should we use 32767 or 32768 in the next line? Why?

Answer 2: The number we will use is 32767, not 32768. The volume factor is stored as a signed 16-bit fixed-point value: one bit is the sign and the remaining 15 bits hold the magnitude, so the largest value it can represent is 32767. 32768 does not fit.
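
The arithmetic can be checked with a small sketch. The helper name to_fixed is made up for this post, and it assumes the volume factor is between 0.0 and 1.0.

```c
#include <stdint.h>

/* Hypothetical helper: convert a volume factor in [0.0, 1.0] to a
   signed 16-bit fixed-point value. Scaling by 32767 keeps a factor
   of 1.0 inside the int16_t range (INT16_MAX is 32767). */
static int16_t to_fixed(double volume) {
    return (int16_t)(volume * 32767.0);   /* fraction is truncated */
}
```

With this, 0.75 becomes 24575 (0.75 * 32767 = 24575.25, truncated) and 1.0 becomes exactly 32767. Scaling by 32768 instead would produce 32768 for a factor of 1.0, which is one more than an int16_t can hold.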

Question 3: What does it mean to duplicate values in the next line?

Answer 3: The inline assembly instruction copies (duplicates) the integer volume factor into each of the eight 16-bit lanes of v1 (v1.8h). This is done so that when sqdmulh is called, every sample in the vector has the volume factor it needs to be scaled by.
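
To make the lane idea concrete, here is a plain C sketch of what happens to one group of eight samples. It is an emulation written for this post, not the lab code: sqdmulh_lane and scale_lanes are made-up names, and the right shift assumes the arithmetic shift of negative values that GCC and Clang provide.

```c
#include <stdint.h>

#define LANES 8   /* v1.8h holds eight 16-bit lanes */

/* Emulates sqdmulh for a single lane: double the product, keep the
   high 16 bits, and saturate the one case that overflows. */
static int16_t sqdmulh_lane(int16_t a, int16_t b) {
    int32_t product = (int32_t)a * (int32_t)b;
    if (product == 0x40000000) {          /* only -32768 * -32768 */
        return INT16_MAX;                 /* saturate instead of overflowing */
    }
    return (int16_t)((product * 2) >> 16);
}

/* Scales eight samples by the same fixed-point volume factor. */
static void scale_lanes(const int16_t in[LANES], int16_t out[LANES],
                        int16_t vol_int) {
    int16_t vol[LANES];
    for (int i = 0; i < LANES; i++) {
        vol[i] = vol_int;                 /* the "duplicate" step */
    }
    for (int i = 0; i < LANES; i++) {
        out[i] = sqdmulh_lane(in[i], vol[i]);
    }
}
```

With a volume factor of 24575 (about 0.75), a sample of 1000 comes out as 749 and a sample of -1000 as -750.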

Question 4: What happens if we remove the following two lines? Why?

Answer 4: If the input and output operands are removed from the inline assembly statement, the assembly template no longer knows what ‘in’ and ‘out’ refer to, so the compiler cannot map those names to registers and the statement fails to compile.
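
A small sketch of how named operands tie the template to C variables. This is not the lab code: copy_sample is a made-up helper that moves a single 16-bit sample, and the plain C branch exists only so the example builds on machines that are not AArch64.

```c
#include <stdint.h>

/* Hypothetical helper: copy one 16-bit sample from *in to *out.
   %[in] and %[out] in the template only mean something because the
   operand lists below bind those names to the C variables. */
static void copy_sample(const int16_t *in, int16_t *out) {
#if defined(__aarch64__)
    uint32_t tmp;
    __asm__ volatile (
        "ldrh %w[tmp], [%[in]]  \n\t"     /* load halfword from *in */
        "strh %w[tmp], [%[out]] \n\t"     /* store halfword to *out */
        : [tmp] "=&r"(tmp)                /* output: scratch register */
        : [in] "r"(in), [out] "r"(out)    /* inputs: the two pointers */
        : "memory");
#else
    /* Fallback for other platforms: same behaviour in plain C. */
    *out = *in;
#endif
}
```

Deleting the operand lists would leave %[in] and %[out] in the template with nothing to resolve them against.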

Question 5: Are the results usable? Are they correct?

Answer 5: Yes, the results are usable, but they are not exact. The runtime of the program is 0.030s, which is about 0.003s faster than the third program from the previous workshop.


Part B: The individual task we were assigned was to select an open source package (I have chosen Groonga), find the assembly-language code in that software, and determine:

  • How much assembly-language code is present
  • Is the assembly code in its own file (.s or .S) or inline
  • Which platform(s) the assembler is used on
  • What happens on other platforms
  • Why it is there (what it does)
  • Your opinion of the value of the assembler code, especially when contrasted with the loss of portability and increase in complexity of the code.


How much assembly code is present? There is quite a lot of assembly code: __asm__ is used more than 30 times in more than 10 files.

Is the assembly code in its own file or inline? Most of the package is written in C. I was not able to find any assembler code in its own .s file and was only able to locate inline assembler code.

Which platforms is the assembler used on? There are many #ifdef statements that section out code for WIN32, WIN64, X86_64 and AMD64, so it is compatible with several different platforms.

What happens on other platforms? It seemed to work okay on the platforms I tested it on. I think it is quite compatible with differing platforms.

Why is it there? It is there for several reasons. It is often used for the bsrl instruction, which computes the position of the most significant set bit. When it uses the add instruction it often adds the lock prefix, which ensures that the CPU has exclusive ownership of the appropriate cache line for the duration of the operation and provides certain additional ordering guarantees. Many values are moved between registers manually with mov. The set and add functions are written almost entirely in assembler, marked volatile, for the express purpose of working directly with registers and optimizing the software.
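
For readers who do not know these instructions, the two operations can be sketched in portable C. This is only an illustration of what they compute, not code from Groonga; msb_index and counter_add are made-up names, and the atomic add relies on C11 <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdint.h>

/* What bsrl computes: the index of the most significant set bit.
   Returns -1 for 0 (the instruction itself leaves the result
   undefined in that case). */
static int msb_index(uint32_t x) {
    int index = -1;
    while (x != 0) {
        x >>= 1;
        index++;
    }
    return index;
}

/* What a lock-prefixed add provides: an addition that other threads
   can never observe half-done. Returns the value before the add. */
static int counter_add(atomic_int *counter, int value) {
    return atomic_fetch_add(counter, value);
}
```

For example, msb_index(40) is 5 because 40 is 101000 in binary, and counter_add on a counter holding 10 with a value of 5 returns 10 and leaves 15 behind.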

What is my opinion of the value of the assembler code, especially when contrasted with the loss of portability and increase in complexity of the code? Of the 10+ files I mentioned that have assembler code, most contain very basic code of 1 to 5 lines. Only 3 of the files contain functions written entirely in assembler. The simple assembler code can without a doubt help optimization while not bogging down development with overly complex code, so that is not an issue. However, there are about 6 or so functions written entirely in assembler. Here is one example (a PowerPC one, as the ldarx and stdcx. instructions show):

static ngx_inline ngx_atomic_uint_t
ngx_atomic_cmp_set(ngx_atomic_t *lock, ngx_atomic_uint_t old,
    ngx_atomic_uint_t set)
{
    ngx_atomic_uint_t res, temp;

    __asm__ volatile (

    " li %0, 0 \n"                          /* preset "0" to "res" */
    " lwsync \n"                            /* write barrier */
    "1: \n"
    " ldarx %1, 0, %2 \n"                   /* load from [lock] into "temp" */
                                            /* and store reservation */
    " cmpd %1, %3 \n"                       /* compare "temp" and "old" */
    " bne- 2f \n"                           /* not equal */
    " stdcx. %4, 0, %2 \n"                  /* store "set" into [lock] if reservation */
                                            /* is not cleared */
    " bne- 1b \n"                           /* the reservation was cleared */
    " isync \n"                             /* read barrier */
    " li %0, 1 \n"                          /* set "1" to "res" */
    "2: \n"

    : "=&b" (res), "=&b" (temp)
    : "b" (lock), "b" (old), "b" (set)
    : "cc", "memory");

    return res;
}

The other functions written entirely in assembly, as I mentioned previously, are of very similar complexity. While this is more complex to write than C, Groonga is a full text search engine, so I believe one of its goals is to be as optimized as possible, and using assembler code to achieve that is justified in this instance.
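
For comparison, the behaviour of that compare-and-set routine can be sketched with C11 atomics. This only illustrates the semantics: cmp_set is a made-up name, it is not code from Groonga, and it says nothing about how the two versions would compare in speed.

```c
#include <stdatomic.h>

/* Hypothetical portable counterpart: if *lock equals old, store set
   into it and return 1; otherwise leave it alone and return 0. */
static unsigned long cmp_set(atomic_ulong *lock, unsigned long old,
                             unsigned long set) {
    /* atomic_compare_exchange_strong writes the observed value back
       into its second argument on failure; old is a local copy here,
       so the caller never sees that. */
    return atomic_compare_exchange_strong(lock, &old, set) ? 1UL : 0UL;
}
```

Starting from a lock value of 5, cmp_set(&lock, 5, 9) returns 1 and stores 9, while a second call with the stale value 5 returns 0 and changes nothing.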
