Someone once asked me how to print a 2D string array in C, so I wrote him an example. Out of curiosity, I also looked at the assembly the compiler generated. In C, string[0][1] and *(*string + 1) mean the same thing, yet the compiler emitted different assembly for the two forms. With string[0][1], the value is moved directly from its stack slot. With *(*string + 1), the address is first computed in a register and that register is then dereferenced. I first noticed this with the MinGW GCC compiler on Windows, version 8.2.0-3, which was the latest release at the time of writing.
The assembly on the left comes from this program.
[code language="C"]
#include <stdio.h>

int main() {
    char *string[][2] = {
        {"Osanda", "Malith"},
        {"ABC", "JKL"},
        {"DEF", "MNO"},
    };
    /* Subscript form */
    printf("%s %s\n", string[0][0], string[0][1]);
}
[/code]
The assembly on the right comes from this program.
[code language="C"]
#include <stdio.h>

int main() {
    char *string[][2] = {
        {"Osanda", "Malith"},
        {"ABC", "JKL"},
        {"DEF", "MNO"},
    };
    /* Pointer-arithmetic form */
    printf("%s %s\n", **string, *(*string + 1));
}
[/code]
I got the same result when compiling for 64-bit with MinGW on Windows. For the listings below, I put both printf lines into one program.
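Although the two forms do the same thing at a high level, the generated code differs at a low level. I wondered whether that is why C has two syntaxes for the same thing, perhaps for optimization. The C standard, however, defines a[i] as *(a + i), so the difference comes from how the compiler translates each form rather than from the language.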
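If you write code for hardware such as microcontrollers, where every instruction counts, the [][] syntax can save a few instructions compared with explicit pointer arithmetic, at least in builds like the ones shown here, which look unoptimized judging by the redundant loads. It is worth checking the output of your own compiler and optimization level.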
I also checked this under Linux with GCC 8.3, and GCC behaves the same way on both Windows and Linux. For the subscript form, GCC resolves the element to a fixed stack offset at compile time instead of computing the address at run time. The following disassembly is for 32-bit Linux.
[code language="C"]
# arr2.c:15: printf("%s %s\n", **string, *(*string + 1));
lea eax, -36[ebp] # _1,
add eax, 4 # _1,
# arr2.c:15: printf("%s %s\n", **string, *(*string + 1));
mov edx, DWORD PTR [eax] # _2, *_1
# arr2.c:15: printf("%s %s\n", **string, *(*string + 1));
lea eax, -36[ebp] # string.0_3,
# arr2.c:15: printf("%s %s\n", **string, *(*string + 1));
mov eax, DWORD PTR [eax] # _4, MEM[(char * *)string.0_3]
sub esp, 4 #,
push edx # _2
push eax # _4
lea eax, .LC6@GOTOFF[ebx] # tmp102,
push eax # tmp102
call printf@PLT #
add esp, 16 #,
# arr2.c:16: printf("%s %s\n", string[0][0], string[0][1]);
mov edx, DWORD PTR -32[ebp] # _5, string
mov eax, DWORD PTR -36[ebp] # _6, string
sub esp, 4 #,
push edx # _5
push eax # _6
lea eax, .LC6@GOTOFF[ebx] # tmp103,
push eax # tmp103
call printf@PLT #
add esp, 16 #,
mov eax, 0 # _17,
[/code]
This disassembly is for 64-bit Linux GCC.
[code language="C"]
# arr2.c:15: printf("%s %s\n", **string, *(*string + 1));
lea rax, -64[rbp] # _1,
add rax, 8 # _1,
# arr2.c:15: printf("%s %s\n", **string, *(*string + 1));
mov rdx, QWORD PTR [rax] # _2, *_1
# arr2.c:15: printf("%s %s\n", **string, *(*string + 1));
lea rax, -64[rbp] # string.0_3,
# arr2.c:15: printf("%s %s\n", **string, *(*string + 1));
mov rax, QWORD PTR [rax] # _4, MEM[(char * *)string.0_3]
mov rsi, rax #, _4
lea rdi, .LC6[rip] #,
mov eax, 0 #,
call printf@PLT #
# arr2.c:16: printf("%s %s\n", string[0][0], string[0][1]);
mov rdx, QWORD PTR -56[rbp] # _5, string
mov rax, QWORD PTR -64[rbp] # _6, string
mov rsi, rax #, _6
lea rdi, .LC6[rip] #,
mov eax, 0 #,
call printf@PLT #
mov eax, 0 # _17,
[/code]
Visual C applies no such shortcut. Both forms are compiled to explicit index arithmetic followed by a load through a register.
[code language="C"]
; 15 : printf("%s %s\n", **string, *(*string + 1));
mov eax, 8
imul ecx, eax, 0
mov edx, DWORD PTR _string$[ebp+ecx+4]
push edx
mov eax, 8
imul ecx, eax, 0
lea edx, DWORD PTR _string$[ebp+ecx]
mov eax, 4
imul ecx, eax, 0
mov edx, DWORD PTR [edx+ecx]
push edx
push OFFSET $SG4520
call _printf
add esp, 12 ; 0000000cH
; 16 : printf("%s %s\n", string[0][0], string[0][1]);
mov eax, 8
imul ecx, eax, 0
lea edx, DWORD PTR _string$[ebp+ecx]
mov eax, 4
shl eax, 0
mov ecx, DWORD PTR [edx+eax]
push ecx
mov edx, 8
imul eax, edx, 0
lea ecx, DWORD PTR _string$[ebp+eax]
mov edx, 4
imul eax, edx, 0
mov ecx, DWORD PTR [ecx+eax]
push ecx
push OFFSET $SG4521
call _printf
add esp, 12 ; 0000000cH
[/code]
In Borland C, both forms are dereferenced through registers, and there is no difference between the two syntaxes.