Random Compiler Experiments on Arrays

One day someone asked me how to print a 2D string array in C, so I coded an example for him. Out of curiosity, I then examined the assembly. In C, string[0][1] and *(*string + 1) mean the same thing, but the compiler emitted different code for each. With string[0][1], it moves the value directly from the stack. With *(*string + 1), it first loads the address into a register and then dereferences that register. I first noticed this with the MinGW GCC compiler on Windows, version 8.2.0-3, which was the latest at the time of writing.
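To make the equivalence concrete, here is a minimal sketch using the same table as in this post. The helper names by_subscript, by_pointer and check_forms_agree are made up for illustration and are not part of the original example. It only shows that both spellings yield the same pointer for every cell; it says nothing about the code the compiler generates for them.

```c
#include <assert.h>
#include <stddef.h>

/* Same table as in the post. */
char *string[][2] = {
    {"Osanda", "Malith"},
    {"ABC", "JKL"},
    {"DEF", "MNO"},
};

/* Hypothetical helper: read a cell with subscripts. */
char *by_subscript(size_t row, size_t col) {
    return string[row][col];
}

/* Hypothetical helper: read the same cell with pointer arithmetic.
   C defines a[i] as *(a + i), so string[row][col] expands to
   *(*(string + row) + col). */
char *by_pointer(size_t row, size_t col) {
    return *(*(string + row) + col);
}

/* Walk the whole table and assert that both forms agree. */
void check_forms_agree(void) {
    size_t rows = sizeof string / sizeof string[0];
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < 2; c++) {
            assert(by_subscript(r, c) == by_pointer(r, c));
        }
    }
}
```

Calling check_forms_agree() from main should pass silently on any conforming compiler, since the two expressions are the same by definition.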

The assembly on the left comes from this program.
[code language=”C”]
#include <stdio.h>

int main() {
    char *string[][2] = {
        {"Osanda","Malith"},
        {"ABC","JKL"},
        {"DEF","MNO"},
    };

    printf("%s %s\n", string[0][0], string[0][1]);
}
[/code]

The assembly on the right comes from this program.
[code language=”C”]
#include <stdio.h>

int main() {
    char *string[][2] = {
        {"Osanda","Malith"},
        {"ABC","JKL"},
        {"DEF","MNO"},
    };

    printf("%s %s\n", **string, *(*string + 1));
}
[/code]

When I compiled for 64-bit under MinGW on Windows, I got the same result. Here I have included both printf lines in one program.

Even though the two forms do the same thing at a high level, the generated code differs at a low level. Maybe that is why there are two syntaxes for the same thing: optimization?
If you write code for hardware such as microcontrollers, where every instruction counts, prefer the [][] syntax: in the listings in this post it takes fewer instructions than the pointer arithmetic.
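If you want to check this on your own target, one way is to put both forms into small functions and compare the assembly at different optimization levels. The sketch below is only an illustration: the function names and the file name cmp.c are made up, and I have not listed what any particular compiler emits for it. The GCC flags in the comment (-S, -O0, -O2, -masm=intel, -fverbose-asm) are standard ones.

```c
/* Compare the output of, for example:
 *   gcc -O0 -S -masm=intel -fverbose-asm cmp.c -o cmp_O0.s
 *   gcc -O2 -S -masm=intel -fverbose-asm cmp.c -o cmp_O2.s
 * (cmp.c is a hypothetical file name for this sketch.)
 */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same table as in the post. */
char *string[][2] = {
    {"Osanda", "Malith"},
    {"ABC", "JKL"},
    {"DEF", "MNO"},
};

/* Hypothetical helper: total length of all strings, subscript form. */
size_t total_len_subscript(char *(*t)[2], size_t rows) {
    size_t total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < 2; c++)
            total += strlen(t[r][c]);
    return total;
}

/* Hypothetical helper: the same loop, pointer arithmetic form. */
size_t total_len_pointer(char *(*t)[2], size_t rows) {
    size_t total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < 2; c++)
            total += strlen(*(*(t + r) + c));
    return total;
}

/* Sanity check: both forms must agree on the post's table. */
void check_totals_agree(void) {
    size_t rows = sizeof string / sizeof string[0];
    assert(total_len_subscript(string, rows) == total_len_pointer(string, rows));
}
```

Diffing the two functions inside each .s file shows whether the difference between the syntaxes survives at the optimization level you actually ship with.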

I checked this under Linux with GCC 8.3, and GCC behaves the same way on both Windows and Linux. GCC is indeed a very intelligent compiler; it knows how to optimize. This disassembly is for 32-bit Linux.

[code language=”C”]
# arr2.c:15: printf("%s %s\n", **string, *(*string + 1));
lea eax, -36[ebp] # _1,
add eax, 4 # _1,
# arr2.c:15: printf("%s %s\n", **string, *(*string + 1));
mov edx, DWORD PTR [eax] # _2, *_1
# arr2.c:15: printf("%s %s\n", **string, *(*string + 1));
lea eax, -36[ebp] # string.0_3,
# arr2.c:15: printf("%s %s\n", **string, *(*string + 1));
mov eax, DWORD PTR [eax] # _4, MEM[(char * *)string.0_3]
sub esp, 4 #,
push edx # _2
push eax # _4
lea eax, .LC6@GOTOFF[ebx] # tmp102,
push eax # tmp102
call printf@PLT #
add esp, 16 #,
# arr2.c:16: printf("%s %s\n", string[0][0], string[0][1]);
mov edx, DWORD PTR -32[ebp] # _5, string
mov eax, DWORD PTR -36[ebp] # _6, string
sub esp, 4 #,
push edx # _5
push eax # _6
lea eax, .LC6@GOTOFF[ebx] # tmp103,
push eax # tmp103
call printf@PLT #
add esp, 16 #,
mov eax, 0 # _17,
[/code]

This disassembly is for 64-bit Linux GCC.

[code language=”C”]
# arr2.c:15: printf("%s %s\n", **string, *(*string + 1));
lea rax, -64[rbp] # _1,
add rax, 8 # _1,
# arr2.c:15: printf("%s %s\n", **string, *(*string + 1));
mov rdx, QWORD PTR [rax] # _2, *_1
# arr2.c:15: printf("%s %s\n", **string, *(*string + 1));
lea rax, -64[rbp] # string.0_3,
# arr2.c:15: printf("%s %s\n", **string, *(*string + 1));
mov rax, QWORD PTR [rax] # _4, MEM[(char * *)string.0_3]
mov rsi, rax #, _4
lea rdi, .LC6[rip] #,
mov eax, 0 #,
call printf@PLT #
# arr2.c:16: printf("%s %s\n", string[0][0], string[0][1]);
mov rdx, QWORD PTR -56[rbp] # _5, string
mov rax, QWORD PTR -64[rbp] # _6, string
mov rsi, rax #, _6
lea rdi, .LC6[rip] #,
mov eax, 0 #,
call printf@PLT #
mov eax, 0 # _17,

[/code]

When you compile with Visual C++, there is no special optimization like GCC's. It dereferences through the registers for both forms and prints the result.

[code language=”C”]
; 15 : printf("%s %s\n", **string, *(*string + 1));

mov eax, 8
imul ecx, eax, 0
mov edx, DWORD PTR _string$[ebp+ecx+4]
push edx
mov eax, 8
imul ecx, eax, 0
lea edx, DWORD PTR _string$[ebp+ecx]
mov eax, 4
imul ecx, eax, 0
mov edx, DWORD PTR [edx+ecx]
push edx
push OFFSET $SG4520
call _printf
add esp, 12 ; 0000000cH

; 16 : printf("%s %s\n", string[0][0], string[0][1]);

mov eax, 8
imul ecx, eax, 0
lea edx, DWORD PTR _string$[ebp+ecx]
mov eax, 4
shl eax, 0
mov ecx, DWORD PTR [edx+eax]
push ecx
mov edx, 8
imul eax, edx, 0
lea ecx, DWORD PTR _string$[ebp+eax]
mov edx, 4
imul eax, edx, 0
mov ecx, DWORD PTR [ecx+eax]
push ecx
push OFFSET $SG4521
call _printf
add esp, 12 ; 0000000cH
[/code]
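The mov/imul pairs in the Visual C++ listing look like explicit offset computations: 8 would be the size of one row (two 4-byte pointers on 32-bit), 4 the size of one pointer, each multiplied by an index of 0. That is my reading of the listing, not something confirmed elsewhere. A minimal sketch of the same arithmetic in C, with the made-up helper name by_byte_offset and the table renamed to table:

```c
#include <assert.h>
#include <stddef.h>

/* Same contents as the table in the post. */
char *table[][2] = {
    {"Osanda", "Malith"},
    {"ABC", "JKL"},
    {"DEF", "MNO"},
};

/* Hypothetical helper: fetch table[row][col] by computing the byte
   offset by hand, the way the listing appears to:
   row * (size of one row) + col * (size of one pointer). */
char *by_byte_offset(size_t row, size_t col) {
    assert(row < sizeof table / sizeof table[0] && col < 2);

    unsigned char *base = (unsigned char *)table;
    size_t offset = row * sizeof table[0] + col * sizeof table[0][0];

    /* The offset is a multiple of sizeof(char *), so the address
       points at a properly aligned char * inside the table. */
    return *(char **)(void *)(base + offset);
}
```

On a 32-bit build sizeof table[0] is 8 and sizeof table[0][0] is 4, which matches the constants in the listing. On a 64-bit build they become 16 and 8.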

Borland C also dereferences through the registers, and there is no difference between the two syntaxes.
