Random Compiler Experiments on Arrays

One day someone asked me how to print a 2D array of strings in C, so I coded an example for him. Out of curiosity, I then examined the assembly code. In C, string[0][1] and *(*string + 1) mean the same thing. In practice, though, the compiler emitted the assembly in two different ways. With string[0][1] it moves the value directly from its stack slot. With *(*string + 1) it first computes the address in a register and then dereferences that register. I first noticed this with the MinGW GCC compiler on Windows, version 8.2.0-3, which was the latest at the time of writing.

This is the program that uses the [][] syntax.

#include <stdio.h>

int main() {
    char *string[][2] = {
        {"Osanda", "Malith"},
        {"ABC", "JKL"},
        {"DEF", "MNO"},
    };

    printf("%s %s\n", string[0][0], string[0][1]);
}

This is the program that uses the pointer syntax.

#include <stdio.h>

int main() {
    char *string[][2] = {
        {"Osanda", "Malith"},
        {"ABC", "JKL"},
        {"DEF", "MNO"},
    };

    printf("%s %s\n", **string, *(*string + 1));
}


When I compiled for 64-bit under MinGW on Windows, I got the same result. For the listings that follow, I put both printf lines in one program.

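Even though the two forms do the same thing at a high level, the generated code differs at a low level. Maybe that’s why there are two syntaxes for the same thing. For optimization?
If you write code for hardware such as microcontrollers, where every instruction counts, the [][] syntax looks like the safer choice, because here it produced fewer instructions than the pointer arithmetic. Keep in mind that the listings in this post look like unoptimized builds. With optimization enabled (for example -O2) I would expect the compiler to emit the same code for both forms, so check the assembly for your own compiler and flags.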
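To check that the two syntaxes really do refer to the same element, here is a minimal sketch. The helper names via_subscript, via_pointer and check_all are my own, made up for illustration, and are not part of the original program. Both accessors should return the identical pointer for every valid row and column, since C defines a[i] as *(a + i).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same table as in the example programs. */
static char *string[][2] = {
    {"Osanda", "Malith"},
    {"ABC", "JKL"},
    {"DEF", "MNO"},
};

/* Hypothetical helper: element access with the [][] syntax. */
const char *via_subscript(size_t row, size_t col) {
    return string[row][col];
}

/* Hypothetical helper: the same access written as pointer arithmetic. */
const char *via_pointer(size_t row, size_t col) {
    return *(*(string + row) + col);
}

/* Compare both forms for every element of the 3x2 table. */
void check_all(void) {
    for (size_t row = 0; row < 3; row++) {
        for (size_t col = 0; col < 2; col++) {
            assert(via_subscript(row, col) == via_pointer(row, col));
        }
    }
    /* Row 0, column 1 is the element both printf calls print second. */
    assert(strcmp(via_pointer(0, 1), "Malith") == 0);
}
```

Calling check_all() from main should return silently. Any difference between the two forms is therefore only in the instructions the compiler picks, not in the result.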

I checked this under Linux with GCC 8.3, and GCC behaves the same way on both Windows and Linux. GCC is indeed a very intelligent compiler; it knows how to optimize. This disassembly is for 32-bit Linux.

# arr2.c:15: 	printf("%s %s\n", **string, *(*string + 1));
	lea	eax, -36[ebp]	# _1,
	add	eax, 4	# _1,
# arr2.c:15: 	printf("%s %s\n", **string, *(*string + 1));
	mov	edx, DWORD PTR [eax]	# _2, *_1
# arr2.c:15: 	printf("%s %s\n", **string, *(*string + 1));
	lea	eax, -36[ebp]	# string.0_3,
# arr2.c:15: 	printf("%s %s\n", **string, *(*string + 1));
	mov	eax, DWORD PTR [eax]	# _4, MEM[(char * *)string.0_3]
	sub	esp, 4	#,
	push	edx	# _2
	push	eax	# _4
	lea	eax, .LC6@GOTOFF[ebx]	# tmp102,
	push	eax	# tmp102
	call	printf@PLT	#
	add	esp, 16	#,
# arr2.c:16: 	printf("%s %s\n", string[0][0], string[0][1]);
	mov	edx, DWORD PTR -32[ebp]	# _5, string
	mov	eax, DWORD PTR -36[ebp]	# _6, string
	sub	esp, 4	#,
	push	edx	# _5
	push	eax	# _6
	lea	eax, .LC6@GOTOFF[ebx]	# tmp103,
	push	eax	# tmp103
	call	printf@PLT	#
	add	esp, 16	#,
	mov	eax, 0	# _17,

This disassembly is for 64-bit Linux GCC.

# arr2.c:15: 	printf("%s %s\n", **string, *(*string + 1));
	lea	rax, -64[rbp]	# _1,
	add	rax, 8	# _1,
# arr2.c:15: 	printf("%s %s\n", **string, *(*string + 1));
	mov	rdx, QWORD PTR [rax]	# _2, *_1
# arr2.c:15: 	printf("%s %s\n", **string, *(*string + 1));
	lea	rax, -64[rbp]	# string.0_3,
# arr2.c:15: 	printf("%s %s\n", **string, *(*string + 1));
	mov	rax, QWORD PTR [rax]	# _4, MEM[(char * *)string.0_3]
	mov	rsi, rax	#, _4
	lea	rdi, .LC6[rip]	#,
	mov	eax, 0	#,
	call	printf@PLT	#
# arr2.c:16: 	printf("%s %s\n", string[0][0], string[0][1]);
	mov	rdx, QWORD PTR -56[rbp]	# _5, string
	mov	rax, QWORD PTR -64[rbp]	# _6, string
	mov	rsi, rax	#, _6
	lea	rdi, .LC6[rip]	#,
	mov	eax, 0	#,
	call	printf@PLT	#
	mov	eax, 0	# _17,

When you compile with Visual C, the subscript syntax gets no special treatment like it does in GCC. For both forms it computes the element offset in registers and then dereferences the resulting address.

; 15   : 	printf("%s %s\n", **string, *(*string + 1));

	mov	eax, 8
	imul	ecx, eax, 0
	mov	edx, DWORD PTR _string$[ebp+ecx+4]
	push	edx
	mov	eax, 8
	imul	ecx, eax, 0
	lea	edx, DWORD PTR _string$[ebp+ecx]
	mov	eax, 4
	imul	ecx, eax, 0
	mov	edx, DWORD PTR [edx+ecx]
	push	edx
	push	OFFSET $SG4520
	call	_printf
	add	esp, 12					; 0000000cH

; 16   : 	printf("%s %s\n", string[0][0], string[0][1]);

	mov	eax, 8
	imul	ecx, eax, 0
	lea	edx, DWORD PTR _string$[ebp+ecx]
	mov	eax, 4
	shl	eax, 0
	mov	ecx, DWORD PTR [edx+eax]
	push	ecx
	mov	edx, 8
	imul	eax, edx, 0
	lea	ecx, DWORD PTR _string$[ebp+eax]
	mov	edx, 4
	imul	eax, edx, 0
	mov	ecx, DWORD PTR [ecx+eax]
	push	ecx
	push	OFFSET $SG4521
	call	_printf
	add	esp, 12					; 0000000cH
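The mov eax, 8 / imul ecx, eax, 0 pairs in the Visual C listing look like the row size multiplied by the row index, followed by the element size multiplied by the column index. Here is a minimal sketch of that arithmetic in C, assuming only the array type from the example programs. The names element_offset and check_offsets are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of string[row][col] from the start of a char *[][2] array:
   row * sizeof(one row) + col * sizeof(one element). */
size_t element_offset(size_t row, size_t col) {
    return row * sizeof(char *[2]) + col * sizeof(char *);
}

/* Check the formula against the real address of every element. */
void check_offsets(void) {
    char *string[][2] = {
        {"Osanda", "Malith"},
        {"ABC", "JKL"},
        {"DEF", "MNO"},
    };
    for (size_t row = 0; row < 3; row++) {
        for (size_t col = 0; col < 2; col++) {
            size_t actual =
                (size_t)((char *)&string[row][col] - (char *)string);
            assert(actual == element_offset(row, col));
        }
    }
}
```

On a 32-bit x86 target element_offset(0, 1) is 4, and on a 64-bit one it is 8, which matches the add eax, 4 and add rax, 8 in the GCC listings for the pointer syntax.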

Borland C also computes the address in registers and dereferences it, and there is no difference between the two syntaxes.
