Socca Systems Embedded Linux Systems & Software

How the hell-o world?!

We have all seen the canonical C language “hello world” program, but how many of us have really looked into how this simple program works?

#include <stdio.h>
int main(int argc, char **argv) {
  printf("hello world\n");
}

I invite you to follow along as I go down the rabbit hole…

Simple as hello world

The first step is to compile the program and run it:

$ gcc -o hello hello.c
$

Running the program produces the expected greeting:

$ ./hello
hello world
$

To understand what the program does, let’s first use strace, which displays the system calls a program invokes.

Surprisingly, more than 30 system calls show up in the output of strace ./hello (run on an Ubuntu 16.04 system).

This does not seem right.

A deeper look

A review of these 30 system calls reveals that only three of them are directly related to the program functionality:

execve("./hello", ["./hello"], [/* 74 vars */]) = 0
write(1, "hello world\n", 12)           = 12
exit_group(0)                           = ?

Most of the additional “cruft” is due to the dynamic linker.

This is visible in the trace when the dynamic linker opens and loads the GNU C library:

open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1868984, ...}) = 0
...
close(3)                                = 0
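
A program can also check from the inside whether a dynamic linker was involved in starting it. The following is a minimal sketch, assuming Linux with a glibc recent enough to provide getauxval (2.16 or later); the function name started_by_dynamic_linker is made up for this illustration. It relies on the AT_BASE entry of the auxiliary vector, which holds the load address of the ELF interpreter and is zero when there is none.

```c
#include <elf.h>        /* AT_BASE */
#include <sys/auxv.h>   /* getauxval */

/* Hypothetical helper: returns 1 if the kernel mapped an ELF interpreter
 * (the dynamic linker) for this process, 0 if the program was started
 * without one, as is the case for a statically linked executable. */
int started_by_dynamic_linker(void) {
    return getauxval(AT_BASE) != 0;
}
```

Called from main, this should return 1 for an executable built with plain gcc and 0 for one built with gcc -static.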

Let’s compile the program again, but this time link the executable statically:

$ gcc -static -o hello hello.c
$

The resulting executable works as expected, and the strace output is considerably simplified:

execve("./hello", ["./hello"], [/* 74 vars */]) = 0
uname({sysname="Linux", nodename="apollo", ...}) = 0
brk(NULL)                               = 0x107b000
brk(0x107c1c0)                          = 0x107c1c0
arch_prctl(ARCH_SET_FS, 0x107b880)      = 0
readlink("/proc/self/exe", "/home/laurent/soccasys/projects/"..., 4096) = 57
brk(0x109d1c0)                          = 0x109d1c0
brk(0x109e000)                          = 0x109e000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 18), ...}) = 0
write(1, "hello world\n", 12)           = 12
exit_group(0)                           = ?

However, at 840 kilo-bytes (once stripped), the size of the executable file is completely off the scale!

For reference, the stripped dynamically-linked executable is 6312 bytes (bytes, not kilo-bytes).

That seems excessive just to print a friendly greeting.
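
To put a number on the difference, here is a small sketch that reads a file size with stat and computes a ratio. The helper names file_size_bytes and size_ratio are invented for this example, and the figure below assumes a kilo-byte of 1024 bytes.

```c
#include <sys/stat.h>

/* Hypothetical helper: size of the file at `path` in bytes,
 * or -1 if the file cannot be examined. */
long long file_size_bytes(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
}

/* Hypothetical helper: how many times larger `big` is than `small`
 * (0.0 if `small` is not positive). */
double size_ratio(long long big, long long small) {
    return small > 0 ? (double)big / (double)small : 0.0;
}
```

With the sizes quoted above, size_ratio(840 * 1024, 6312) comes out at a little over 136: the static executable is more than a hundred times larger than the dynamic one.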

What is in the executable?

To understand what GCC put into the executable it created from the “hello world” source code, let’s have a look at the layout of an ELF executable.

A useful document to look at is the Unix System V ABI which describes, among other things, the ELF file format.

In the description of the sections of an ELF executable here is what caught my eye:

  • .init This section holds executable instructions that contribute to the process initialization code. That is, when a program starts to run, the system arranges to execute the code in this section before calling the main program entry point (called main for C programs).
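
The ELF header itself is easy to inspect from C. The sketch below, which assumes a Linux system providing <elf.h> and only handles 64-bit ELF files, checks the "\177ELF" magic bytes (the same bytes visible in the read call of the strace output) and returns the entry point address recorded in the header. The function name elf64_entry_point is made up for this illustration.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the entry point address stored in the
 * header of a 64-bit ELF file, or 0 if the file cannot be read, is not
 * an ELF file, or is not a 64-bit ELF file. */
uint64_t elf64_entry_point(const char *path) {
    Elf64_Ehdr header;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    size_t got = fread(&header, 1, sizeof header, f);
    fclose(f);
    if (got != sizeof header)
        return 0;
    /* The first four bytes of every ELF file are 0x7f 'E' 'L' 'F'. */
    if (memcmp(header.e_ident, ELFMAG, SELFMAG) != 0)
        return 0;
    if (header.e_ident[EI_CLASS] != ELFCLASS64)
        return 0;
    return header.e_entry;
}
```

The address returned is where execution begins when the program is started, which is not the address of main.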

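Disassembling the .init section of the “hello world” executable shows a short stub in which the name __libc_start_main stands out. Because the executable is stripped, objdump labels each address by the nearest symbol it knows: the call target __libc_start_main@plt+0x10 is actually the neighbouring PLT entry (typically __gmon_start__), while __libc_start_main itself is called from the startup code at the executable’s entry point: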

$ objdump --section=.init --disassemble hello
    
hello:     file format elf64-x86-64
    
Disassembly of section .init:

00000000004003c8 <.init>:
  4003c8:	48 83 ec 08          	sub    $0x8,%rsp
  4003cc:	48 8b 05 25 0c 20 00 	mov    0x200c25(%rip),%rax        # 600ff8 <__libc_start_main@plt+0x200be8>
  4003d3:	48 85 c0             	test   %rax,%rax
  4003d6:	74 05                	je     4003dd <puts@plt-0x23>
  4003d8:	e8 43 00 00 00       	callq  400420 <__libc_start_main@plt+0x10>
  4003dd:	48 83 c4 08          	add    $0x8,%rsp
  4003e1:	c3                   	retq
$

As the GNU C library is Free Software, it is possible to look up the source code of that function. The source repository can be found at:

The implementation of the __libc_start_main function is found in the source file csu/libc-start.c. For the curious reader, that source code can be viewed here:

Also for the curious reader, here is an alternative implementation of the __libc_start_main function, from the musl C library:

Reading this source code, it is clear that quite a bit happens before the main function starts executing, and that this is the root cause of the excessive size of the statically linked executable.
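
One way to observe that code runs before main is the constructor attribute supported by GCC and Clang. This is only a small sketch of that mechanism, not an excerpt from either C library; the names early_setup and setup_ran_before_main are invented for the example.

```c
static int ran_before_main = 0;

/* GCC and Clang arrange for functions marked with this attribute to be
 * called during process initialization, before main() is entered. */
__attribute__((constructor))
static void early_setup(void) {
    ran_before_main = 1;
}

/* Hypothetical helper: reports whether early_setup() has already run. */
int setup_ran_before_main(void) {
    return ran_before_main;
}
```

In a program built with the standard startup files, calling setup_ran_before_main() on the first line of main already returns 1.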

What can be done about this?

Hello world on a diet

From now on I am setting out to shrink this executable until its size is consistent with the purpose it serves.

For this purpose we:

  • Remove the call to __libc_start_main
  • Replace the call to printf with a call to write

A quick search reveals the following in the GCC link options:

  • -nostartfiles Do not use the standard system startup files when linking. The standard system libraries are used normally, unless -nostdlib or -nodefaultlibs is used.

Here is the updated source code:

#include <unistd.h>
int main(int argc, char **argv) {
        write(1, "hello world\n", 12);
}

Compiling our hello.c source code with this option does not go smoothly; the linker warns that it cannot find its entry symbol:

$ gcc -static -nostartfiles -o hello hello.c
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000400350
$

It turns out that the linker expects the program’s entry point to be a function named _start, which the standard startup files normally provide. With the -nostartfiles option we have to supply it ourselves, so the source code must be updated as follows:

#include <unistd.h>
void _start() {
        write(1, "hello world\n", 12);
}

Surely we are close to our goal.

Hello … segmentation fault

This code now compiles, and the stripped size is down to 1592 bytes, but it crashes with a segmentation fault:

$ ./hello
hello world
Segmentation fault (core dumped)
$

Where is this SIGSEGV signal coming from?

It turns out that the _start function must not return: nothing called it, so there is no valid return address on the stack. Instead it must conclude with an exit system call.

So let’s try again:

#include <unistd.h>
#include <stdlib.h>
void _start() {
        write(1, "hello world\n", 12);
        exit(0);
}

This code looks right, but unfortunately there are two more surprises waiting for us:

  1. The executable size is back to over 800 Kilo-bytes!
  2. The executable is still finishing with a SIGSEGV!

A quick search shows that to invoke only the exit system call without any of the add-on functionality of the standard C library (such as atexit handling) we must use _exit(0).
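
The difference between the two calls can be observed in an ordinary program (one built with the standard startup files, unlike our stripped-down executable). The sketch below forks a child that registers an atexit handler and then terminates either way; the handler reports back through a pipe. The names report_handler_ran and atexit_handler_ran are made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;

/* atexit handler: tells the parent process that it was invoked. */
static void report_handler_ran(void) {
    ssize_t n = write(report_fd, "x", 1);
    (void)n;
}

/* Hypothetical helper: forks a child that registers an atexit handler and
 * terminates with exit(0) (use_exit != 0) or _exit(0) (use_exit == 0).
 * Returns 1 if the handler ran, 0 if it did not, -1 on setup failure. */
int atexit_handler_ran(int use_exit) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    fflush(NULL);               /* avoid duplicating buffered output in the child */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {             /* child */
        close(fds[0]);
        report_fd = fds[1];
        atexit(report_handler_ran);
        if (use_exit)
            exit(0);            /* runs atexit handlers first */
        _exit(0);               /* terminates immediately */
    }
    close(fds[1]);              /* parent: wait for a byte or end-of-file */
    char byte;
    ssize_t n = read(fds[0], &byte, 1);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1;
}
```

Here atexit_handler_ran(1) returns 1 and atexit_handler_ran(0) returns 0: only exit() goes through the C library’s termination machinery.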

Here’s the next iteration of that code:

#include <unistd.h>
#include <stdlib.h>
void _start() {
        write(1, "hello world\n", 12);
        _exit(0);
}

This time the hello world program is working again (yay!) and the size of the executable is reasonable again at 6 kilo-bytes unstripped (YAY!).

Also the output of strace accurately reflects the functionality of the program:

$ strace ./hello
execve("./hello", ["./hello"], [/* 74 vars */]) = 0
write(1, "hello world\n", 12)           = 12
exit_group(0)                           = ?
+++ exited with 0 +++
$

So there you have it:

  • a Linux “hello world” program really boils down to two system calls, write and exit, and nothing else.

Using raw system calls

The careful reader will have noticed that the size of the executable went from 1592 bytes to about 6 kilo-bytes just by adding _exit(0).

The root cause of this size difference is that both the _exit() and write() functions are part of the C library, so linking statically pulls their library code into the executable.

A short introduction to the calling convention for the various CPU architectures supported by Linux can be found in the manual page for syscall(2):

Here is one more iteration of the source code for “hello world”:

#include <unistd.h>
#include <sys/syscall.h>
void _start() {
        syscall(SYS_write, 1, "hello world\n", 12);
        syscall(SYS_exit, 0);
}

The program is functional, and the stripped executable size is 1224 bytes.
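
One detail worth knowing about the syscall() function used above: on failure it returns -1 and stores the error code in errno, just like the regular wrappers. A minimal sketch, with a helper name (write_errno) invented for this illustration:

```c
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: performs a write through syscall() and returns 0
 * on success, or the errno value reported for the failed call. */
int write_errno(int fd, const void *buf, size_t len) {
    errno = 0;
    long ret = syscall(SYS_write, fd, buf, len);
    return ret < 0 ? errno : 0;
}
```

For example, write_errno(-1, "x", 1) returns EBADF because -1 is not a valid file descriptor.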

This, at long last, sounds almost reasonable.

Down to the metal

For the curious reader, here are working examples of how to implement “hello world” using assembly language for two popular CPU architectures.

Using Intel x86_64 assembly:

void _start() {
        const char *message = "hello world\n";
        /* Write system call */
        asm volatile(
        "movq $1, %%rdi\n"      /* stdout       */
        "movq %[buf], %%rsi\n" /* write buffer  */
        "movq $12, %%rdx\n"     /* size         */
        "movq $1, %%rax\n"      /* write syscall */
        "syscall\n"
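        : /* output */
        : [buf] "r" (message)
        /* tell the compiler which registers the code above overwrites;
           the syscall instruction itself clobbers rcx and r11 */
        : "rax", "rdi", "rsi", "rdx", "rcx", "r11", "memory"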
        );
        /* Exit system call */
        asm volatile(
        "movq $0, %rdi\n" /* success    */
        "movq $60, %rax\n" /* exit */
        "syscall\n"
        );
}
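
A variation on the x86_64 listing above is to let the compiler load the argument registers through GCC’s register constraints instead of explicit mov instructions. This is a sketch under that assumption, with helper names (raw_syscall3, raw_write) invented for the example; on other architectures it simply falls back to the C library’s syscall() function.

```c
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issues a system call with three arguments.
 * On x86_64 the return value is the kernel's raw result, which is a
 * negative error code on failure; the fallback returns -1 on failure. */
static long raw_syscall3(long nr, long a1, long a2, long a3) {
#if defined(__x86_64__)
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)                          /* result in rax */
                     : "a"(nr), "D"(a1), "S"(a2), "d"(a3) /* rax, rdi, rsi, rdx */
                     : "rcx", "r11", "memory");           /* clobbered by syscall */
    return ret;
#else
    return syscall(nr, a1, a2, a3);
#endif
}

/* Hypothetical helper: write() expressed through raw_syscall3(). */
long raw_write(int fd, const void *buf, unsigned long len) {
    return raw_syscall3(SYS_write, fd, (long)buf, (long)len);
}
```

With this helper the greeting becomes raw_write(1, "hello world\n", 12), and the compiler is free to choose how the values reach the registers.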

Using ARM assembly:

void _start() __attribute__ ((naked));
void _start() {
	const char *message = "hello world\n";
	/* Write system call */
	asm volatile(
    	"mov r0, #1\n" 	/* stdout    	*/
    	"mov r1, %[buf]\n" /* write buffer  */
    	"mov r2, #12\n"	/* size      	*/
    	"mov r7, #4\n" 	/* write syscall */
    	"svc #0\n"
    	: /* output */ : [buf] "r" (message)
	);
	/* Exit system call */
	asm volatile(
    	"mov r0, #0\n" /* success 	*/
    	"mov r7, #1\n" /* exit */
    	"svc #0\n"
	);
}

One more thing…

While researching this article I came across an interesting article that tries many more techniques to reduce the size of a simple program, including abusing the ELF executable format:

I found it both entertaining and instructive.

Such is the beauty of Linux and Free Software that if you have the curiosity there are no limits to what you can study and learn by yourself.

Have fun, Laurent.

Feedback, comments, questions welcome at:
