Linux command line args with no runtime
I have been working on a simple curl app in pure Rust, with the challenge of not being allowed to use any external dependencies. That includes the C runtime. One of the things this app needs to do is parse command line arguments so a URL can be passed to it, for example curl http://google.com.
Where are they?
Most places I looked said argc would be in the rdi register and argv would be in the rsi register. But trying these resulted in endless segfaults, not exactly what I wanted. After more than a few hours of trying different registers and other ways of getting the args, I stumbled upon a Stack Overflow answer where they inspected the registers and stack using gdb.
Inspecting with GDB
Getting gdb to work with Rust was as simple as building with the -g flag. After setting a breakpoint at _start and running, I executed the following gdb commands.
(gdb) info registers
rsi 0x0 0
rdi 0x0 0
The registers rsi and rdi were both zero, which explains the segfaults I was getting: I was trying to dereference a null pointer. The next command prints the first two 64-bit values on the stack.
(gdb) x/2g $sp
0x7fffffffd8a0: 1 140737488345857
This looked a lot more promising. I didn't pass any extra arguments when running it, so an argument count of 1 is expected. The second value should then be the pointer to the array of pointers pointing to the arguments.
(gdb) x 140737488345857
0x7fffffffdb01: 0x502f662f746e6d2f
(gdb) x 0x502f662f746e6d2f
0x502f662f746e6d2f: Cannot access memory at address 0x502f662f746e6d2f
Turns out it wasn't a pointer to an array of pointers. What if I try printing it as a string?
(gdb) x/s 140737488345857
0x7fffffffdb01: "/mnt/f/Projects/pure_rust_curl/main"
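In hindsight the earlier output already hinted at this. A small sketch, using only the numbers gdb printed above, shows that the 8-byte value I mistook for a pointer is just the first eight characters of the path, and that the decimal value on the stack is the same address gdb prints in hex. The function names here are made up for illustration.

```rust
/// Interprets a 64-bit word read from memory as eight ASCII characters.
/// x86-64 is little-endian, so the lowest byte is the first character.
fn decode_word(word: u64) -> String {
    String::from_utf8_lossy(&word.to_le_bytes()).into_owned()
}

/// Formats a decimal address the way gdb shows it on the left-hand side.
fn as_hex(addr: u64) -> String {
    format!("{:#x}", addr)
}

fn main() {
    // The value gdb showed at 0x7fffffffdb01.
    println!("{}", decode_word(0x502f662f746e6d2f));
    // The second value on the stack, printed by gdb in decimal.
    println!("{}", as_hex(140737488345857));
}
```

So the second stack slot was a pointer after all, just a pointer straight to the characters rather than to another pointer.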
Testing it with the arguments hello and world resulted in the following.
(gdb) x/5g $sp
0x7fffffffd880: 3 140737488345845
0x7fffffffd890: 140737488345881 140737488345887
0x7fffffffd8a0: 0
(gdb) x/s 140737488345845
0x7fffffffdaf5: "/mnt/f/Projects/pure_rust_curl/main"
(gdb) x/s 140737488345881
0x7fffffffdb19: "hello"
(gdb) x/s 140737488345887
0x7fffffffdb1f: "world"
That's what I wanted. The reason I was struggling so much at the start is that when I tried loading argv from the stack, I was treating the value after argc as a char** like the one the C runtime hands to main. That meant dereferencing one level too many, which ends in an invalid memory access like the one gdb reported above. The way it's actually laid out is argc at the top of the stack, followed by the pointers to the argument strings themselves, then a null pointer.
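To make the layout concrete, here is a minimal sketch that walks such a block. It is not the code from my app: it uses std (which the real thing can't), and it builds a fake block in a Vec instead of reading the real stack, so read_args and demo are illustrative names only.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Reads the arguments from a block laid out as
/// [argc, argv[0], ..., argv[argc - 1], NULL].
///
/// Safety: `sp` must point at such a block, and every string pointer in
/// it must reference a valid NUL-terminated string.
unsafe fn read_args(sp: *const usize) -> Vec<String> {
    unsafe {
        let argc = *sp;
        // The string pointers start in the slot right after argc.
        // There is no intermediate char** to dereference first.
        let argv = sp.add(1) as *const *const c_char;
        let mut args = Vec::with_capacity(argc);
        for i in 0..argc {
            let s = CStr::from_ptr(*argv.add(i));
            args.push(s.to_string_lossy().into_owned());
        }
        args
    }
}

/// Builds a fake block shaped like the gdb dump above and reads it back.
fn demo() -> Vec<String> {
    let strings: Vec<CString> = ["./main", "hello", "world"]
        .iter()
        .map(|s| CString::new(*s).unwrap())
        .collect();
    let mut block: Vec<usize> = vec![strings.len()];
    block.extend(strings.iter().map(|s| s.as_ptr() as usize));
    block.push(0); // terminating null pointer
    unsafe { read_args(block.as_ptr()) }
}

fn main() {
    println!("{:?}", demo());
}
```

The only real difference from C's main(argc, argv) is that nobody has split the block into two values yet: argv is simply the address of the slot after argc.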
Getting these values in Rust
Now that I know where they are, accessing them from Rust shouldn't be too hard, right? It should be as easy as getting the pointer to the argument count and then using some pointer math to get the values. Since the argument count is at the top of the stack, all I need is the stack pointer, which is in the rsp register.
pub unsafe extern "C" fn _start() {
    let argc_ptr: *const usize;
    asm!("mov {}, rsp", out(reg) argc_ptr);
    let first_arg = *argc_ptr.add(8);
    // ...
}
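A side note on the pointer math in this snippet that is worth double-checking: on a raw pointer, add counts in elements, not bytes. With a *const usize on a 64-bit target, add(1) moves 8 bytes and add(8) moves 64. The sketch below measures this on an ordinary array; byte_distance is a made-up helper and has nothing to do with the real stack.

```rust
/// Returns how many bytes a `*const usize` moves when `add(elements)`
/// is applied to it.
fn byte_distance(elements: usize) -> usize {
    let slots = [0usize; 16];
    let base = slots.as_ptr();
    // Sound because the result stays inside `slots` (elements <= 16).
    let moved = unsafe { base.add(elements) };
    moved as usize - base as usize
}

fn main() {
    println!("add(1) moves {} bytes", byte_distance(1));
    println!("add(8) moves {} bytes", byte_distance(8));
}
```

So if the intent is "the next 8-byte slot after argc", add(1) is probably the call to reach for, independent of the stack pointer problem described next.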
28 pub unsafe extern "C" fn _start() {
(gdb) info registers rsp
rsp 0x7fffffffd8a0 0x7fffffffd8a0 # Correct value
(gdb) step
Breakpoint 1, main::_start () at main.rs:31
31 asm!("mov {}, rsp", out(reg) argc_ptr);
(gdb) step
32 let first_arg = *argc_ptr.add(8);
(gdb) info local
argc_ptr = 0x7fffffffd878 # Incorrect value
However, when stepping through in gdb we can see that the initial rsp value is not the same as what ends up in the local variable.
(gdb) x/8g $sp
0x7fffffffd878: 93824992235552 140737488345208
0x7fffffffd888: 0 0
0x7fffffffd898: 140737354019530 1
0x7fffffffd8a8: 140737488345857 0
This seems to be because stuff has been added to the stack after the start of the _start function but before my first line. To try to get the original value, I moved rsp into r8 instead of creating a local variable, which I thought could be messing with things.
pub unsafe extern "C" fn _start() {
    asm!("mov r8, rsp");
}
28 pub unsafe extern "C" fn _start() {
(gdb) info registers rsp r8
rsp 0x7fffffffd8a0 0x7fffffffd8a0 # Correct value
r8 0x0 0
(gdb) step
Breakpoint 1, main::_start () at main.rs:29
29 asm!("mov r8, rsp");
(gdb) info registers rsp r8
rsp 0x7fffffffd898 0x7fffffffd898 # Value changes
r8 0x0 0
(gdb) step
34 syscalls::exit(0);
(gdb) info registers rsp r8
rsp 0x7fffffffd898 0x7fffffffd898
r8 0x7fffffffd898 140737488345240 # Wrong value
Here r8 should be 0x7fffffffd8a0, but for some reason, in between the app starting and moving rsp into r8, stuff is being added to the stack, therefore changing rsp.
_start:
pushq %rax ; Not my instruction
#APP
movq %rsp, %r8 ; My instruction
#NO_APP
xorl %eax, %eax
The assembly emitted by rustc shows another instruction being run before mine: the value of the rax register is being pushed onto the stack. Remember this for later.
(gdb) info registers rax
rax 0x1c 28
(gdb) step
29 asm!("mov r8, rsp");
(gdb) x/8g $sp
0x7fffffffd898: 28 1
0x7fffffffd8a8: 140737488345857 0
0x7fffffffd8b8: 140737488345893 140737488345909
0x7fffffffd8c8: 140737488345933 140737488345956
Going back to gdb shows that it is in fact the rax register being pushed onto the stack.
(gdb) x/s 140737488345857
0x7fffffffdb01: "/mnt/f/Projects/pure_rust_curl/main"
Then printing the 3rd element as a string shows that it is just the single 28 from rax being pushed. This means I should be able to offset the pointer by 8 bytes to get the argument count I want.
(gdb) info registers rsp r8
rsp 0x7fffffffd8a0 0x7fffffffd8a0
r8 0x0 0
# ...
(gdb) info registers rsp r8
rsp 0x7fffffffd878 0x7fffffffd878
r8 0x7fffffffd878 140737488345208
Not quite. The difference between the initial stack pointer (to argc) and the pointer being moved into r8 is 40 bytes. Lucky for me, all I have to do is add some extra bytes.
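For anyone who wants to check the arithmetic, here is a tiny sketch using the two addresses gdb printed above. The constant and function names are mine; the offset it computes only holds for this particular build.

```rust
/// rsp on entry to `_start`, as printed by gdb. This is where argc lives.
const INITIAL_RSP: u64 = 0x7fffffffd8a0;
/// The value that actually ended up in r8.
const CAPTURED_RSP: u64 = 0x7fffffffd878;

/// How far the compiler-generated code moved rsp before my instruction ran.
fn prologue_offset(initial: u64, captured: u64) -> u64 {
    initial - captured
}

/// Undoes that movement to get back to the address of argc.
fn argc_address(captured: u64, offset: u64) -> u64 {
    captured + offset
}

fn main() {
    let offset = prologue_offset(INITIAL_RSP, CAPTURED_RSP);
    println!("offset = {} bytes", offset);
    println!("argc at {:#x}", argc_address(CAPTURED_RSP, offset));
}
```

Note that the offset is in bytes, so on a *const usize it has to be divided by 8 before being passed to add, or applied with byte-wise pointer math instead.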
(gdb) info locals
first_arg_ptr = 0x7fffffffd8a0
(gdb) x 0x7fffffffd8a0
0x7fffffffd8a0: 0x00000001 # Argument count
And there we have it. It only took 2 days.
(gdb) run hello world
# ...
(gdb) info locals
first_arg_ptr = 0x7fffffffd880
(gdb) x 0x7fffffffd880
0x7fffffffd880: 0x00000003
Just to make sure it works, I tested adding some arguments and it worked like a charm.
One last problem
Remember that weird extra instruction that runs before mine from earlier? It changes. This means the 40 byte offset I was adding also needs to change.
_start:
- pushq %rax ; Previous not my instruction
+ subq $24, %rsp ; New not my instruction
#APP
movq %rsp, %r8 ; My first instruction
#NO_APP
For some reason, when I start adding logic like checking the value of argc and exiting if not enough arguments were provided, the compiler feels the need to change this instruction. According to the x64 Cheat Sheet, this line allocates 24 bytes on the stack. After some more time in gdb I found the new offset was 88 bytes. By keeping the code in _start to a minimum and moving everything else into a separate main function, I shouldn't have to keep changing this offset, hopefully.
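The split I have in mind looks roughly like the sketch below. It is a std-based stand-in rather than the real entry point: entry plays the role of the body of _start, app_main and run_with are hypothetical names, and the #[inline(never)] is my assumption about how to discourage the compiler from folding the logic back into the entry function. It is a hint, not a guarantee.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical application logic, kept out of the entry point.
/// Returns the process exit code.
#[inline(never)]
unsafe fn app_main(argc: usize, argv: *const *const c_char) -> i32 {
    if argc < 2 {
        return 1; // not enough arguments
    }
    let url = unsafe { CStr::from_ptr(*argv.add(1)) };
    println!("{}", url.to_string_lossy());
    0
}

/// Stand-in for the body of `_start`: split the block into argc and
/// argv, hand off, and do nothing else.
unsafe fn entry(argc_ptr: *const usize) -> i32 {
    unsafe {
        let argc = *argc_ptr;
        let argv = argc_ptr.add(1) as *const *const c_char;
        app_main(argc, argv)
    }
}

/// Test helper that fakes the initial stack block for a list of arguments.
fn run_with(args: &[&str]) -> i32 {
    let owned: Vec<CString> = args.iter().map(|a| CString::new(*a).unwrap()).collect();
    let mut block: Vec<usize> = vec![owned.len()];
    block.extend(owned.iter().map(|s| s.as_ptr() as usize));
    block.push(0); // terminating null pointer
    unsafe { entry(block.as_ptr()) }
}

fn main() {
    println!("exit code: {}", run_with(&["./main"]));
    println!("exit code: {}", run_with(&["./main", "https://google.com"]));
}
```

The less the entry function does, the fewer reasons the compiler has to reserve a different amount of stack for it each time the app grows.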
Another one last problem
The instruction changes depending on whether I'm compiling with or without the -g flag for gdb symbols.
_start:
- subq $24, %rsp ; without -g
+ subq $88, %rsp ; with -g
#APP
movq %rsp, %r8
#NO_APP
When emitting assembly I have always done it without -g, as it adds a bunch of information for gdb which I didn't care about when viewing the assembly. But I had still always been running the app with -g and therefore never noticed the change. At least now the magic 88 offset makes sense. Unfortunately I don't know a way around this aside from swapping between offsets of 88 bytes and 24 bytes depending on whether I'm going to use gdb or not.
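One way to make that swap less error-prone is to let the build pick the number. The sketch below assumes a custom cfg called gdb_build, which is a name I made up: rustc does not set it on its own, so it would have to be passed as --cfg gdb_build alongside -g. The two offsets are the ones measured above and are only valid for that exact build.

```rust
/// Bytes between the captured stack pointer and argc, taken from the
/// two `subq` instructions shown above.
const OFFSET_WITHOUT_DEBUG_INFO: usize = 24;
const OFFSET_WITH_DEBUG_INFO: usize = 88;

/// Picks the offset for a given kind of build.
fn argc_offset(built_with_g: bool) -> usize {
    if built_with_g {
        OFFSET_WITH_DEBUG_INFO
    } else {
        OFFSET_WITHOUT_DEBUG_INFO
    }
}

/// Offset chosen at compile time. `gdb_build` is a hypothetical cfg that
/// has to be supplied by hand, e.g. `rustc -g --cfg gdb_build main.rs`.
#[allow(unexpected_cfgs)]
const ARGC_OFFSET: usize = if cfg!(gdb_build) {
    OFFSET_WITH_DEBUG_INFO
} else {
    OFFSET_WITHOUT_DEBUG_INFO
};

fn main() {
    println!("with -g: {} bytes", argc_offset(true));
    println!("without -g: {} bytes", argc_offset(false));
    println!("this build uses {} bytes", ARGC_OFFSET);
}
```

This doesn't remove the fragility, since any change to the entry function can still change the numbers. It only stops me from forgetting to edit the constant when switching between the two builds.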
The result
$ ./main https://google.com
https://google.com
My app can now repeat the user's first argument back to them and even return an error if one is not provided. Truly a marvel of engineering. If you're interested, go ahead and check out the amazing code.