Porting Bugs
November 17, 2005 5:52Today I worked with a developer from another corporation on porting some C programs they wrote for a SPARC (with Solaris) to our machines, that are all based on Intel x86. It when pretty much straightforward, we were ready to face the endianess problem (for network communication and binary files reading). But we found a very interesting bug, that didn't manifest on SPARC but fired immediately on Intel. The souce was more or less:
int Function1(int paramA, int paramB)
{
if (paramA == 0)
{
//bad value
return 0;
}
if (paramB != 1 || paramB != 2)
{
//bad value
return 0;
}
//do stuff
//call another function
Function2();
}
int main()
{
//...
if (Function1(a, b) == 0)
printf("Error: Function1 returned 0\n");
//...
}
Can you already spot the bug? We did, after a little "tracing" (printf) (I know, is the worst type of debugging.. but we were compiling over ssh, in a shell, and it proved to be a quick solution).
If you have already guessed what's next, you can stop reading and go to see the solution. If not, you can enjoy a trip in the calling conventions of two very different platforms: a CISC and a rich-register RISC.
I know, defining a Pentium IV as a CISC is not quite correct. Pentium processors have now a superscalar architecture, unordered execution, and internelly they use micro-ops that are very RISC-like. And they have a tons of registers (anybody knows how many, exactly?). But the "programming interface", i.e. the instruction set and register set exposes by the CPU to the world is CISC. It has only four general purpose registers, and a rich instruction set.
On the other side, SPARC architecture, instruction set and register set are very different. Even fuction calling conventions on SPARC are deeply influenced by its early hardware design. Since the instruction set was reduced (in the first version there wasn't even an instruction to divide two numbers), engineers tought to use the spared die space to fill it with a lot of register. And in order to use them efficiently, they organized them in several "register windows".
On x86, every function has its own stack space (the stack frame). On SPARC every function has also its own "register space", a register window. A register window is a set of about 24 register, 8 %i (input) registers, 8 %o (output) and 8 %l (local). The registers are mapped in a clever way: out registers in the caller become in registers in the callee, so that the first 8 parameters are passed through the 8 out-in registres. The first one has a special additional semantic:
%o0 (r08) [3] outgoing parameter 0 / return value from callee
%i0 (r24) [3] incoming parameter 0 / return value to caller
This register, unlike the other input registers, is assumed by caller to be preserved across a procedure call
This mapping is done with the help of the SAVE and RESTORE functions; the prolog/epilog structure on SPARC so it is like:
function:
save %sp, -K, %sp
; perform function
; put return value in register %i0 (if the function returns a value)
ret
restore
It is important to note that in the SAVE instruction the source operands (first two parameters) are read from the old register window, and the destination operand (the rightmost parameter) is written to the new window. "%sp" is used as both source and destination, but the result is written into the stack pointer of the new window, which is a different register (the stack pointer of the old window is renamed and becomes the frame pointer).
The call sequence is:
call
mov 10, %o0
The delay slot is often filled with an instruction to set some parameters, in this example it loads the first parameter.
What is a delay slot?
When the control flow is altered (by a unconditional jump instruction, like call, jmpl or branch) the order of execution is inverted: the instruction after the unconditional jump instruction is being fetched simultaneously to the unconditional jump instruction.
This is done to implement a simple pipeline solution: when performing a jump instruction, there is already another instruction in the pipeline: the instruction in the delay slot, will be executed before the processor actually has a chance to jump to the new location. This is the simplest scheme for the chip designer to implement, since the general pipelining mechanism can be used without making any exceptions for transfer instructions; and it is the fastest way of arranging things, since no instructions are discarded.
The SPARC assembly language programmer must be aware of delay slots constantly while coding any change in flow of control, since the order of instruction execution is reversed from the order in which the instructions appear.
When the control flow is altered (by a unconditional jump instruction, like call, jmpl or branch) the order of execution is inverted: the instruction after the unconditional jump instruction is being fetched simultaneously to the unconditional jump instruction.
This is done to implement a simple pipeline solution: when performing a jump instruction, there is already another instruction in the pipeline: the instruction in the delay slot, will be executed before the processor actually has a chance to jump to the new location. This is the simplest scheme for the chip designer to implement, since the general pipelining mechanism can be used without making any exceptions for transfer instructions; and it is the fastest way of arranging things, since no instructions are discarded.
The SPARC assembly language programmer must be aware of delay slots constantly while coding any change in flow of control, since the order of instruction execution is reversed from the order in which the instructions appear.
Returing to our problem, when function a calls function b %o registers in function a become %i in function b.
If b do not touch %i0 (it reaches the end of the function without a return statement) %i0 / return value remains equal to the incoming parameter.
In the original code, parameterA is always != 0 (otherwise, the function returns with value 0), so the fuction returns to the caller with return value != 0 (uqual to the first parameter).
What happens when porting to Intel? Intel calling conventions are very different. Remeber, x86 has only 4 general purpose registers accessible through assembly language, so it has to pass parameters using the stack:
- Arguments are pushed on the stack in reverse order.
- The caller pops arguments after return.
- Primitive data types, except floating point values, are returned in EAX or EAX:EDX depending on the size.
- float and double are returned in fp0, i.e. the first floating point register.
- Simple data structures with 8 bytes or less in size are returned in EAX:EDX.
- Complex Class objects are returned in memory.
- When a return is made in memory the caller passes a pointer to the memory location as the first parameter. The callee populates the memory, and returns the pointer. The caller pops the hidden pointer together with the rest of the arguments.
If no C return statement is found, when the control flow reaches the end of the function the ret
instruction is executed and the control returns to the caller. The return value, i.e. the one in the EAX register, is the last value that register assumed... possibly the return value of the last function called.
In our case, the last function before the end (Function2) returned 0, and so our Function1.
What is the lesson? Compiling and testing on one platform (even with a cross complier, like the GCC we used) isn't enough. What surprised me was the absolute absence of compiler errors or warnings. Maybe I am too used to tools that give you warnings if you do something wrong =) but that's the way I like them. When a compiler seems to be too strict or to give you too much warning, remember that an error detected by the complier is a bug less in your product.