Last post I talked about one of the annoying features of the amd64 ABI – the optional frame pointer. Today, I’ll examine the much more painful problem of argument passing on amd64. For the sake of discussion, I’ll avoid structure passing and floating point – nasty little kinks in the problem.
Argument Passing on i386
On i386, all arguments are passed on the stack. Before establishing a frame, the caller pushes each argument to the function in reverse order. This gives you this stack layout:
    ...
    arg1
    arg0
    return PC
    previous frame    <-- %ebp
    current frame     <-- %esp
If you want to access the third argument, you simply reference 16(%ebp) (8 for the frame + 8 to skip the first two args). This makes debugging a breeze. For any given frame pointer (easy to find thanks to the i386 ABI), we can always find the initial arguments to the function. Another trick we use is that nearly every function call is followed by an addl x, %esp instruction. Using this information, we can figure out how many arguments were passed to the function, without relying on CTF or STABS data. Putting this all together, it’s easy to get a meaningful stack trace:
> a76de800::findstack -v
stack pointer for thread a76de800: a77c5dd4
[ a77c5dd4 0xfe81994d() ]
a77c5dec swtch+0x1cb()
a77c5e10 cv_wait_sig+0x12c(a78a79b0, a6c57028)
a77c5e70 cte_get_event+0x4d()
a77c5ea4 ctfs_endpoint_ioctl+0xc2()
a77c5ec4 ctfs_bu_ioctl+0x2f()
a77c5ee4 fop_ioctl+0x1e(a79a7980, 63746502, 80d3f48, 102001, a69daf08, a77c5f74)
a77c5f80 ioctl+0x19b()
a77c5fac sys_call+0x16e()
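The two i386 tricks described above (finding an argument from the frame pointer, and guessing the argument count from the caller's stack cleanup) come down to simple arithmetic. Here is a minimal sketch, assuming every argument occupies exactly one 4-byte stack slot; the function names and the 0x18 cleanup value are made up for illustration and do not come from any real tool:

```python
WORD = 4  # bytes per stack slot on i386 (assumes no 64-bit or struct arguments)

def i386_arg_offset(n):
    """Offset from %ebp of zero-based argument n.

    The first two slots above %ebp hold the saved frame pointer and the
    return PC, so arguments start at 8(%ebp).
    """
    return 2 * WORD + n * WORD

def i386_arg_count(cleanup_bytes):
    """Guess the argument count from x in the caller's 'addl $x, %esp'."""
    return cleanup_bytes // WORD

print(i386_arg_offset(2))    # offset of the third argument
print(i386_arg_count(0x18))  # hypothetical cleanup of 0x18 bytes
```

A cleanup of 0x18 bytes would correspond to six one-word arguments, the same count shown for fop_ioctl() in the trace above.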
Argument Passing on amd64
Enter amd64. As previously mentioned, the amd64 ABI was designed primarily for performance, not debugging. The architects decided that pushing arguments on the stack was expensive, and that with 16 general purpose registers, we might as well use some of them to pass arguments. Specifically, we have:
    arg0    %rdi
    arg1    %rsi
    arg2    %rdx
    arg3    %rcx
    arg4    %r8
    arg5    %r9
    argN    8*(N-4)(%rbp)
This is a disaster for debugging. Debugging tools that operate in-place (DTrace and truss) can get meaningful arguments, but cannot know how many there are. Tools which examine a stack trace (pstack, mdb) cannot get arguments for any frame. The arguments may or may not be pushed on the stack, or they could be lost completely. If we try to get a stack with arguments, we find:
> ffffffff8af1c720::findstack -v
stack pointer for thread ffffffff8af1c720: ffffffffb2a51af0
ffffffffb2a51d00 vpanic()
ffffffffb2a51d30 0xfffffffffe972ae3()
ffffffffb2a51d60 exitlwps+0x1f1()
ffffffffb2a51dd0 proc_exit+0x40()
ffffffffb2a51de0 exit+9()
ffffffffb2a51e40 psig+0x2bc()
ffffffffb2a51ee0 post_syscall+0x7d5()
ffffffffb2a51f00 syscall_exit+0x5d()
ffffffffb2a51f10 sys_syscall32+0x1d8()
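For reference, the register assignment listed earlier can be written as a small lookup. This is only a sketch: it assumes integer and pointer arguments (no structures or floating point, as in the rest of this post) and a callee that has established a frame pointer. The function name amd64_arg_location is made up for illustration:

```python
# Integer/pointer argument registers, in order, per the amd64 ABI.
INT_ARG_REGS = ["%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"]

def amd64_arg_location(n):
    """Where zero-based argument n lives on entry to the callee.

    The first six arguments are in registers; the rest are on the stack,
    starting at 16(%rbp) once the callee has set up its frame.
    """
    if n < len(INT_ARG_REGS):
        return INT_ARG_REGS[n]
    return f"{8 * (n - 4)}(%rbp)"

for n in (0, 5, 6, 7):
    print(n, amd64_arg_location(n))
```

Note that the register locations are only valid at function entry. Nothing obliges the callee to keep the values there, which is exactly why the trace above shows no arguments.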
The solution
The solution, as envisioned by the amd64 ABI designers, is to rely on DWARF to get the necessary information. If you have ever read the DWARF spec, you know that it is a gigantic, ugly beast – an interpreted language that can be used to mine virtually any debugging data in an abstract manner. The problem here is that it requires significantly more work than on i386, and it requires debugging information to be present in the target object.
Implementing a DWARF interpreter is technically quite doable. We even had one brave soul go so far as to implement a limited DWARF disassembler capable of grabbing arguments for functions. But it turns out that the sheer amount of data we would have to add to the kernel to enable this was prohibitive. The bloat would have pushed us past the limit of the miniroot, not to mention the increased memory footprint and necessary changes to krtld and KMDB. That’s not to say we won’t support it in userland some day.
The lack of an argument count is less serious. DTrace doesn’t need to know how many arguments there are. For the moment, truss simply shows the first 6 arguments always. truss could be enhanced to use CTF and/or DWARF data to determine the number of arguments to a given function, but that probably won’t happen any time soon.
Workaround
Given that there will be no solution to this problem any time soon, you may ask how one can do any kind of debugging at all. The answer is “painfully”. I’ll walk through an example of finding the arguments to a function, using the following stack:
> ffffffff8356c100::findstack -v
stack pointer for thread ffffffff8356c100: ffffffffb2bbdb10
[ ffffffffb2bbdb10 _resume_from_idle+0xe4() ]
ffffffffb2bbdb40 swtch+0xc9()
ffffffffb2bbdb90 cv_wait_sig+0x170()
ffffffffb2bbdc50 cte_get_event+0xb0()
ffffffffb2bbdc70 ctfs_endpoint_ioctl+0x7e()
ffffffffb2bbdc80 ctfs_bu_ioctl+0x32()
ffffffffb2bbdc90 fop_ioctl+0xb()
ffffffffb2bbdd70 ioctl+0xac()
ffffffffb2bbde00 dosyscall+0x12b()
ffffffffb2bbdf00 trap+0x1308()
>
Let’s say that we want to know the first argument to fop_ioctl(), which is a vnode. The first step is to look at the caller and see where the argument came from:
> ioctl+0xac::dis -n 6
------> ioctl+0x8e: movq 0x10(%r12),%rdi
ioctl+0x93: movq 0x1a0(%rax),%r8
ioctl+0x9a: leaq -0xcc(%rbp),%r9
ioctl+0xa1: movq %r15,%rdx
ioctl+0xa4: movl %r13d,%esi
------> ioctl+0xa7: call +0xeed99 <fop_ioctl>
ioctl+0xac: testl %eax,%eax
ioctl+0xae: movl %eax,%ebx
ioctl+0xb0: jne +0x74 <ioctl+0x124>
ioctl+0xb2: cmpl $0x8004667e,%r13d
ioctl+0xb9: je +0x27 <ioctl+0xe0>
ioctl+0xbb: movl %r14d,%edi
ioctl+0xbe: call -0x1408e <releasef>
We can see that %rdi (the first argument) came from %r12. Looks like we lucked out – %r12 must be preserved by the function being called. So we look at fop_ioctl():
> fop_ioctl::dis
fop_ioctl: movq 0x40(%rdi),%rax
fop_ioctl+4: pushq %rbp
fop_ioctl+5: movq %rsp,%rbp
fop_ioctl+8: call *0x28(%rax)
fop_ioctl+0xb: leave
fop_ioctl+0xc: ret
No dice. We can see that %r12 (as well as %rdi) is still active at this point. Let’s keep looking:
> ctfs_bu_ioctl::dis ! grep r12
> ctfs_endpoint_ioctl::dis ! grep r12
> cte_get_event::dis ! grep r12
cte_get_event+0x13: pushq %r12
cte_get_event+0x32: movq 0x20(%rdi),%r12
...
Finally, we found a function that preserves %r12. Taking a closer look at cte_get_event():
> cte_get_event::dis -n 8
cte_get_event: pushq %rbp
cte_get_event+1: movq %rsp,%rbp
cte_get_event+4: pushq %r15
cte_get_event+6: movl %esi,%r15d
cte_get_event+9: pushq %r14
cte_get_event+0xb: movq %rcx,%r14
cte_get_event+0xe: pushq %r13
cte_get_event+0x10: movl %r9d,%r13d
cte_get_event+0x13: pushq %r12
We can see that %r12 was pushed fourth after establishing the frame pointer. This would put it 32 bytes below %rbp for this frame. Remembering that what was really passed was 0x10(%r12), we can finally find our original argument:
> ffffffffb2bbdc50-20/K
0xffffffffb2bbdc30: ffffffff8330ec88
> ffffffff8330ec88+10/K
0xffffffff8330ec98: ffffffff83a5f600
> ffffffff83a5f600::print vnode_t v_path
v_path = 0xffffffff83978c40 "/system/contract/process/pbundle"
Whew. We can see that we have the proper vnode, since the path references a /system/contract file. And all it took was about 12 steps! You can see how this has become such a pain for us kernel developers. From the above example, you can see the approximate method is:
- Determine where the argument came from in the caller. Hopefully, you will find something that came from the stack, or one of the callee-saved registers (%rbx and %r12-%r15). If not, look at the function and see if the argument was pushed on the stack or moved somewhere more permanent. This doesn’t happen often, so it may be that your argument is lost forever.
- If the argument came from a callee-saved register, examine every function in the stack until you find one that saves the value.
- By this point, you’ve hopefully found a place where the value is stored relative to %rbp. Using the frame pointers displayed in the stack trace, fetch the value from the stack.
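The last two steps can be sketched in a few lines, using the cte_get_event() prologue and the addresses from the walkthrough above. This is a rough illustration rather than a real tool: the helper names are invented, the MEMORY table stands in for reading the actual dump, and the push counting assumes a prologue that sets up %rbp and then only pushes registers (no stack adjustment in between).

```python
PROLOGUE = """\
cte_get_event: pushq %rbp
cte_get_event+1: movq %rsp,%rbp
cte_get_event+4: pushq %r15
cte_get_event+6: movl %esi,%r15d
cte_get_event+9: pushq %r14
cte_get_event+0xb: movq %rcx,%r14
cte_get_event+0xe: pushq %r13
cte_get_event+0x10: movl %r9d,%r13d
cte_get_event+0x13: pushq %r12
"""

def saved_reg_offset(disassembly, reg):
    """Offset from %rbp at which reg was pushed, or None if it was not."""
    pushes = 0
    frame_set = False
    for line in disassembly.splitlines():
        if ":" not in line:
            continue
        insn = line.split(":", 1)[1].strip()
        if insn == "movq %rsp,%rbp":
            frame_set = True           # pushes from here on sit below %rbp
        elif frame_set and insn.startswith("pushq"):
            pushes += 1
            if insn.split()[1] == reg:
                return -8 * pushes     # each push takes one 8-byte slot
    return None

# Stand-in for the dump: the two words read with /K in the walkthrough.
MEMORY = {
    0xffffffffb2bbdc30: 0xffffffff8330ec88,  # %r12 saved by cte_get_event
    0xffffffff8330ec98: 0xffffffff83a5f600,  # 0x10(%r12), the vnode pointer
}

frame = 0xffffffffb2bbdc50                   # cte_get_event's frame pointer
offset = saved_reg_offset(PROLOGUE, "%r12")
r12 = MEMORY[frame + offset]
vnode = MEMORY[r12 + 0x10]
print(offset, hex(r12), hex(vnode))
```

The first step, working out that the argument came from 0x10(%r12) in the caller, still has to be done by reading the disassembly by hand.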
This is not always guaranteed to work, and is obviously a royal pain. In my next post, I’ll go into some future ideas we have to make this (and other debugging) better.