How the hell-o world?!
20 May 2018 · Filed in How
We have all seen the canonical C language “hello world” program, but how many of us have really looked into how this simple program works?
#include <stdio.h>
int main(int argc, char **argv) {
printf("hello world\n");
}
I invite you to follow along as I go down the rabbit hole…
Simple as hello world
The first step is to compile the program and run it:
$ gcc -o hello hello.c
$
Running the program produces the expected greeting:
$ ./hello
hello world
$
To understand what the program does, let’s first use strace, which displays the system calls invoked.
Surprisingly, more than 30 system calls show up in the output of strace ./hello (run on an Ubuntu 16.04 system).
This does not seem right.
A deeper look
A review of these 30 system calls reveals that only three of them are directly related to the program functionality:
execve("./hello", ["./hello"], [/* 74 vars */]) = 0
write(1, "hello world\n", 12) = 12
exit_group(0) = ?
Most of the additional “cruft” is due to the dynamic linker.
This is visible in the trace when the dynamic linker opens and loads the GNU C library:
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1868984, ...}) = 0
...
close(3) = 0
Let’s compile the program again, but this time link the executable statically:
$ gcc -static -o hello hello.c
$
The resulting executable works as expected, and the strace output is considerably simplified:
execve("./hello", ["./hello"], [/* 74 vars */]) = 0
uname({sysname="Linux", nodename="apollo", ...}) = 0
brk(NULL) = 0x107b000
brk(0x107c1c0) = 0x107c1c0
arch_prctl(ARCH_SET_FS, 0x107b880) = 0
readlink("/proc/self/exe", "/home/laurent/soccasys/projects/"..., 4096) = 57
brk(0x109d1c0) = 0x109d1c0
brk(0x109e000) = 0x109e000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 18), ...}) = 0
write(1, "hello world\n", 12) = 12
exit_group(0) = ?
However, at 840 kilo-bytes (once stripped), the size of the executable file is completely off the scale!
For reference 6312 bytes (bytes, not kilo-bytes) is the size of the stripped dynamically-linked executable.
That seems excessive just to print a friendly greeting.
What is in the executable?
To understand the content of the executable created by GCC from the “hello world” source code let’s have a look at the content of an ELF executable.
A useful document to look at is the Unix System V ABI which describes, among other things, the ELF file format.
In the description of the sections of an ELF executable here is what caught my eye:
.init
This section holds executable instructions that contribute to the process initialization code. That is, when a program starts to run, the system arranges to execute the code in this section before calling the main program entry point (called main for C programs).
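To make the file format a little more concrete, here is a minimal sketch that reads the entry point address out of an ELF header, the same header whose first bytes ("\177ELF") appear in the read call of the trace above. The function name elf64_entry_point and the enum constants are hypothetical names chosen for this illustration; the offsets follow the ELF64 header layout, and the sketch assumes a 64-bit little-endian file such as the x86_64 executables used in this article.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets into the ELF64 header (illustrative names, standard layout). */
enum {
    ELF_CLASS_OFFSET  = 4,   /* e_ident[EI_CLASS]: 2 means 64-bit */
    ELF_DATA_OFFSET   = 5,   /* e_ident[EI_DATA]: 1 means little-endian */
    ELF_ENTRY_OFFSET  = 24,  /* e_entry: address of the first instruction */
    ELF64_HEADER_SIZE = 64
};

/* Hypothetical helper: returns 0 and stores the entry address on success,
 * -1 if the buffer does not hold a little-endian ELF64 header. */
int elf64_entry_point(const unsigned char *hdr, size_t len, uint64_t *entry)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len < ELF64_HEADER_SIZE || memcmp(hdr, magic, sizeof magic) != 0)
        return -1;
    if (hdr[ELF_CLASS_OFFSET] != 2 || hdr[ELF_DATA_OFFSET] != 1)
        return -1;

    uint64_t value = 0;
    for (int i = 7; i >= 0; i--)   /* assemble the 8 little-endian bytes */
        value = (value << 8) | hdr[ELF_ENTRY_OFFSET + i];
    *entry = value;
    return 0;
}
```

A caller would read the first 64 bytes of the executable into a buffer and pass them in; the address that comes back is where the kernel starts executing the program, which is not main.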
Disassembling the .init section of the “hello world” executable reveals that the sole purpose of the code in that section is to call a function named __libc_start_main:
$ objdump --section=.init --disassemble hello
hello: file format elf64-x86-64
Disassembly of section .init:
00000000004003c8 <.init>:
4003c8: 48 83 ec 08 sub $0x8,%rsp
4003cc: 48 8b 05 25 0c 20 00 mov 0x200c25(%rip),%rax # 600ff8 <__libc_start_main@plt+0x200be8>
4003d3: 48 85 c0 test %rax,%rax
4003d6: 74 05 je 4003dd <puts@plt-0x23>
4003d8: e8 43 00 00 00 callq 400420 <__libc_start_main@plt+0x10>
4003dd: 48 83 c4 08 add $0x8,%rsp
4003e1: c3 retq
$
As the GNU C library is Free software, it is possible to look up the source code of that function. The source repository can be found at:
The implementation of the __libc_start_main function is found in the source file csu/libc-start.c.
For the curious reader that source code is visible there:
Also for the curious reader, here is an alternative implementation of the __libc_start_main function by the musl C library:
Reading this source code, it is obvious that quite a bit happens before the main function starts executing, and that this is the root cause of the excessive size of the statically linked executable.
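One way to observe that code runs before main is the constructor attribute, a GCC and Clang extension. The sketch below is only an illustration: the names ran_before_main, early_setup and report_startup are hypothetical, and it assumes the program is linked with the normal startup files, since those are what invoke constructors.

```c
#include <stdio.h>

/* Set by the constructor below; stays 0 if constructors are never run. */
int ran_before_main = 0;

/* GCC/Clang extension: functions marked as constructors are called by the
 * startup code before main() begins. */
__attribute__((constructor))
void early_setup(void)
{
    ran_before_main = 1;
}

/* Hypothetical helper that a main() could call to report what happened. */
void report_startup(void)
{
    printf("constructor ran before main: %s\n",
           ran_before_main ? "yes" : "no");
}
```

Calling report_startup() as the first statement of main should report "yes" in a normally linked program, because the flag was already set by the time main started.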
What can be done about this?
Hello world on a diet
From now on I am setting out to shrink this executable until its size is consistent with the purpose it serves.
For this purpose we:
- Remove the call to __libc_start_main
- Replace the call to printf with a call to write
A quick search reveals the following in the GCC link options:
-nostartfiles
Do not use the standard system startup files when linking. The standard system libraries are used normally, unless -nostdlib or -nodefaultlibs is used.
Here is the updated source code:
#include <unistd.h>
int main(int argc, char **argv) {
write(1, "hello world\n", 12);
}
Trying to compile our hello.c source code does not work:
$ gcc -static -nostartfiles -o hello hello.c
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000400350
$
It turns out that when the -nostartfiles option is used, a function named _start is the first function executed. The source code must be updated as follows:
#include <unistd.h>
void _start() {
write(1, "hello world\n", 12);
}
Surely we are close to our goal.
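A side note on the write call used above: the length 12 is counted by hand, and write is allowed to transfer fewer bytes than requested. The sketch below shows one way to handle both points. It is not needed for the size experiment, and the names greeting_length, write_all and say_hello are hypothetical helpers introduced only for this illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* The compiler computes the length, so it cannot drift from the text. */
static const char greeting[] = "hello world\n";

size_t greeting_length(void)
{
    return sizeof greeting - 1;   /* exclude the terminating '\0' */
}

/* Hypothetical helper: keep calling write() until everything is sent.
 * Returns 0 on success, -1 on error (errno is set by write). */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;         /* interrupted by a signal: retry */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

int say_hello(int fd)
{
    return write_all(fd, greeting, greeting_length());
}
```

Calling say_hello(1) sends the greeting to standard output, just like the single write call in the article.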
Hello … segmentation fault
This code now compiles and the stripped size is down to 1592 bytes, but it crashes with a segmentation fault:
$ ./hello
hello world
Segmentation fault (core dumped)
$
Where is this SIGSEGV signal coming from?
It turns out that the _start function must not return but instead must conclude with an exit system call.
So let’s try again:
#include <unistd.h>
#include <stdlib.h>
void _start() {
write(1, "hello world\n", 12);
exit(0);
}
This code looks right, but unfortunately there are two more surprises waiting for us:
- The executable size is back to over 800 kilo-bytes!
- The executable is still finishing with a SIGSEGV!
A quick search shows that to invoke only the exit system call, without any of the add-on functionality of the standard C library (such as atexit handling), we must use _exit(0).
Here’s the next iteration of that code:
#include <unistd.h>
#include <stdlib.h>
void _start() {
write(1, "hello world\n", 12);
_exit(0);
}
This time the hello world program is working again (yay!) and the size of the executable is reasonable again at 6 kilo-bytes unstripped (YAY!).
Also, the output of strace accurately reflects the functionality of the program:
$ strace ./hello
execve("./hello", ["./hello"], [/* 74 vars */]) = 0
write(1, "hello world\n", 12) = 12
exit_group(0) = ?
+++ exited with 0 +++
$
So there you have it:
- a Linux “hello world” program really boils down to two system calls, write and exit, nothing else.
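The difference between exit and _exit mentioned earlier can be observed in an ordinary program linked against the C library. The following is a minimal sketch under that assumption: a child process registers an atexit handler that writes one byte to a pipe, then terminates with either exit or _exit, and the parent checks whether the byte arrived. The names atexit_handler_ran, on_exit_handler and report_fd are hypothetical.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;   /* write end of the pipe, used by the handler */

static void on_exit_handler(void)
{
    /* Leave a trace that the handler ran. */
    if (write(report_fd, "x", 1) != 1) {
        /* nothing useful can be done this late */
    }
}

/* Hypothetical helper: returns 1 if the atexit() handler ran in the child,
 * 0 if it did not, -1 if the pipe or the fork could not be set up. */
int atexit_handler_ran(int use_exit)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {              /* child */
        close(fds[0]);
        report_fd = fds[1];
        atexit(on_exit_handler);
        if (use_exit)
            exit(0);             /* runs atexit handlers first */
        _exit(0);                /* terminates without running them */
    }

    close(fds[1]);               /* parent keeps only the read end */
    char byte;
    ssize_t n = read(fds[0], &byte, 1);   /* 0 means EOF: nothing written */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1 ? 1 : 0;
}
```

With exit the handler runs and the parent receives the byte; with _exit the pipe is closed without anything having been written.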
Using raw system calls
The careful reader will have noticed that the size of the executable went from 1592 bytes to about 6 kilo-bytes just by adding _exit(0).
The root cause of this size difference is that both the _exit() and write() functions are part of the C library.
A short introduction to the calling convention for the various CPU architectures supported by Linux can be found in the manual page for syscall(2):
Here is one more iteration of the source code for “hello world”:
#include <unistd.h>
#include <sys/syscall.h>
void _start() {
syscall(SYS_write, 1, "hello world\n", 12);
syscall(SYS_exit, 0);
}
The program is functional, and the stripped executable size is 1224 bytes.
This, at long last, sounds almost reasonable.
Down to the metal
For the curious reader, here are working examples of how to implement “hello world” using assembly language for two popular CPU architectures.
Using Intel/x86_64 assembly:
void _start() {
const char *message = "hello world\n";
/* Write system call */
asm volatile(
"movq $1, %%rdi\n" /* stdout */
"movq %[buf], %%rsi\n" /* write buffer */
"movq $12, %%rdx\n" /* size */
"movq $1, %%rax\n" /* write syscall */
"syscall\n"
: /* output */ : [buf] "r" (message)
: "rax", "rdi", "rsi", "rdx", "rcx", "r11", "memory" /* clobbered registers */
);
/* Exit system call */
asm volatile(
"movq $0, %rdi\n" /* success */
"movq $60, %rax\n" /* exit */
"syscall\n"
);
}
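The x86_64 version above names the registers directly in the assembly text. A variation, sketched here as an assumption rather than as part of the original program, lets the compiler load the registers through operand constraints and wraps the instruction in a reusable function. The names raw_syscall3 and raw_write are hypothetical, and the fallback to the C library's syscall() on other architectures is only there to keep the sketch compilable elsewhere.

```c
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper for a system call taking three arguments. */
static long raw_syscall3(long number, long arg1, long arg2, long arg3)
{
#if defined(__x86_64__)
    long result;
    __asm__ volatile(
        "syscall"
        : "=a"(result)                      /* rax: return value */
        : "a"(number),                      /* rax: system call number */
          "D"(arg1), "S"(arg2), "d"(arg3)   /* rdi, rsi, rdx */
        : "rcx", "r11", "memory"            /* overwritten by the instruction */
    );
    return result;
#else
    /* Assumption: elsewhere, fall back to the C library wrapper. */
    return syscall(number, arg1, arg2, arg3);
#endif
}

long raw_write(int fd, const void *buf, size_t len)
{
    return raw_syscall3(SYS_write, fd, (long)buf, (long)len);
}
```

With this wrapper, the greeting becomes raw_write(1, "hello world\n", 12). The clobber list tells the compiler which registers the syscall instruction overwrites, so it does not keep live values in them across the call.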
Using ARM assembly:
void _start() __attribute__ ((naked));
void _start() {
const char *message = "hello world\n";
/* Write system call */
asm volatile(
"mov r0, #1\n" /* stdout */
"mov r1, %[buf]\n" /* write buffer */
"mov r2, #12\n" /* size */
"mov r7, #4\n" /* write syscall */
"svc #0\n"
: /* output */ : [buf] "r" (message)
);
/* Exit system call */
asm volatile(
"mov r0, #0\n" /* success */
"mov r7, #1\n" /* exit */
"svc #0\n"
);
}
One more thing…
While researching this article, I came across an interesting article that tries many more techniques to reduce the size of a simple program, including abusing the ELF format for executables:
I found this article very entertaining, and instructive.
Such is the beauty of Linux and Free Software that if you have the curiosity there are no limits to what you can study and learn by yourself.
Have fun, Laurent.
Feedback, comments, questions welcome at: