The C Calling Convention and the 8086: Using the Stack Frame

C Functions on the 8086 (Near Calls)

When a C function is called, its arguments are pushed onto the stack one word (i.e., 16 bits) at a time, in the reverse of the order in which they are listed in the C function declaration. All near functions have the following form (or an equivalent one):

MyFunc:
      push    bp                           ; (1) save bp
      mov     bp, sp                       ; (2) set bp for referencing stack
      sub     sp, {local data size}        ; (3) allocate space for local variables
      ...
      ... <- function body
      ...
      mov     {return reg}, {return value} ; (4) set return value
      mov     sp, bp                       ; (5) free space used by local variables
      pop     bp                           ; (6) restore bp
      ret

The first thing a C function does is (1) save bp. It then (2) copies sp into bp. Once bp is set up, (3) storage space for any local variables is reserved on the stack by subtracting from sp. These first three steps set up the stack frame for the function. Because bp holds the value sp had just after bp was saved, bp can be used as a fixed reference for accessing both the arguments that were pushed onto the stack when the function was called and the local variables declared within the function. At the end of the function, if the function returns a value, (4) the return value is placed in the appropriate register(s). To clean up, (5) sp is restored from bp, freeing the local data space that was allocated in step (3), and (6) bp is restored, thus removing the stack frame. For this scheme to work properly, bp must not be modified in the function body.

Note 1: If there are no local variables then there is no need to reserve space on the stack (step (3)). If there are no function arguments or local variables then it may not be necessary to set up and restore the stack frame (steps (1)-(3) and (5)-(6)). If there is no return value for a function then step (4) can be omitted.

Note 2: When writing functions in assembly language, the conventions above should always be followed. Also, many compilers (including c86) expect certain registers to be unchanged when a function returns, except for the registers used to hold the return value. Therefore, the stack should be used to save and restore any registers that are modified in the function body. For example, a function that uses the register bx should execute "push bx" at the beginning of the function and "pop bx" at the end.

Consider the following C function:

int MyFunc(int arg1, int arg2, int arg3)
{
    int local1;
    int local2;
    int local3;
    ...
    ... <- function body
    ...
    return 3;
}

For this function, in terms of the assembly template above, {local data size} would be 6 (three word-sized variables at 2 bytes each), {return reg} would be ax because MyFunc returns type int, and {return value} would be 3. Arguments and local variables are accessed relative to bp. For example, the following assembly memory references would be used to access the variables in MyFunc():

	[bp+8] -> arg3
	[bp+6] -> arg2
	[bp+4] -> arg1
	[bp+2] -> saved ip (return address)
	[bp]   -> saved bp
	[bp-2] -> local1
	[bp-4] -> local2
	[bp-6] -> local3

Therefore, if you wanted to load variable local2 into register dx, you could use the following instruction:

	mov	dx, word [bp-4]

The numbers used in the memory references change based on the size of the data values. However, the first argument is always located at [bp+4] and the first local variable is always just below [bp] (i.e., at [bp-1] for a byte, at [bp-2] for a word, etc.).
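
The frame layout described above can be checked with a small C model of the stack. This is only a sketch under stated assumptions: the Machine type, the 64-byte stack, the caller's bp value, and every function name below are made up for illustration and are not produced by any compiler. It replays the pushes made for a near call to MyFunc followed by steps (1)-(3), then reads words at displacements from bp.

```c
#include <stdint.h>

/* Toy model of the 8086 stack (hypothetical, for illustration only). */
enum { STACK_BYTES = 64 };

typedef struct {
    uint8_t  mem[STACK_BYTES]; /* stack memory, little-endian like the 8086 */
    uint16_t sp;               /* stack pointer, as an offset into mem */
    uint16_t bp;               /* base pointer, as an offset into mem */
} Machine;

/* Store a 16-bit word with the low byte at the lower address. */
void store16(Machine *m, uint16_t addr, uint16_t v) {
    m->mem[addr]     = (uint8_t)(v & 0xFF);
    m->mem[addr + 1] = (uint8_t)(v >> 8);
}

/* Load a 16-bit word stored by store16. */
uint16_t load16(const Machine *m, uint16_t addr) {
    return (uint16_t)(m->mem[addr] | (m->mem[addr + 1] << 8));
}

/* push: decrement sp by 2, then store the word at the new top of stack. */
void push16(Machine *m, uint16_t v) {
    m->sp -= 2;
    store16(m, m->sp, v);
}

/* Replay a near call MyFunc(arg1, arg2, arg3) and its prologue. */
void enter_myfunc(Machine *m, uint16_t arg1, uint16_t arg2,
                  uint16_t arg3, uint16_t ret_ip) {
    m->sp = STACK_BYTES;  /* empty stack */
    m->bp = 0x1234;       /* assumed value of the caller's bp */
    push16(m, arg3);      /* caller pushes arguments in reverse order */
    push16(m, arg2);
    push16(m, arg1);
    push16(m, ret_ip);    /* a near call pushes the return IP */
    push16(m, m->bp);     /* (1) push bp */
    m->bp = m->sp;        /* (2) mov bp, sp */
    m->sp -= 6;           /* (3) sub sp, 6 (three word-sized locals) */
}

/* Read the word at [bp+disp]; disp is negative for locals. */
uint16_t word_at_bp(const Machine *m, int disp) {
    return load16(m, (uint16_t)(m->bp + disp));
}

/* Write the word at [bp+disp]. */
void set_word_at_bp(Machine *m, int disp, uint16_t v) {
    store16(m, (uint16_t)(m->bp + disp), v);
}
```

After enter_myfunc(&m, 11, 22, 33, 0x0100), word_at_bp(&m, 4) reads arg1, word_at_bp(&m, 8) reads arg3, and word_at_bp(&m, 0) reads the saved bp, matching the list of memory references given earlier.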

Byte Sized Variables

Byte-sized variables can be a point of confusion. When arguments are passed to a function, byte-sized arguments are always pushed onto the stack as words, since the 8086 push instruction only pushes 16-bit values. The value is stored in the least significant byte of the word (i.e., at the lower memory address). Similarly, a full word is reserved for each byte-sized local variable. For byte-sized local variables, however, the part of the word that is used is the most significant byte (i.e., the higher memory address). Using whole words ensures that there are no misaligned memory accesses, which can slow performance. When accessing a byte-sized variable, remember that the unused byte of the word may or may not be zero, so it should not be read or relied upon.
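
The following C sketch illustrates why the unused byte matters. It models a single word slot on the stack as two bytes and places a char in it the two ways described above. The WordSlot type, the function names, and the "junk" parameters are all hypothetical and exist only to make the unused byte visible.

```c
#include <stdint.h>

/* One word slot on the stack; index 0 is the lower memory address. */
typedef struct {
    uint8_t bytes[2];
} WordSlot;

/* Argument: a whole word is pushed and the char is its low byte, which
   little-endian storage places at the lower address ([bp+4] for arg1). */
WordSlot push_char_arg(uint8_t value, uint8_t junk_high) {
    uint16_t word = (uint16_t)((junk_high << 8) | value);
    WordSlot s;
    s.bytes[0] = (uint8_t)(word & 0xFF); /* lower address: the char */
    s.bytes[1] = (uint8_t)(word >> 8);   /* higher address: unused  */
    return s;
}

/* Local: the char sits at the higher address of its word ([bp-1]) and
   the lower address ([bp-2]) is left unused. */
WordSlot store_char_local(uint8_t value, uint8_t junk_low) {
    WordSlot s;
    s.bytes[0] = junk_low; /* [bp-2]: unused   */
    s.bytes[1] = value;    /* [bp-1]: the char */
    return s;
}

/* Byte-sized reads touch only the byte that holds the value. */
uint8_t read_char_arg(WordSlot s)   { return s.bytes[0]; }
uint8_t read_char_local(WordSlot s) { return s.bytes[1]; }

/* A word-sized read also picks up whatever is in the unused byte. */
uint16_t read_whole_word(WordSlot s) {
    return (uint16_t)(s.bytes[0] | (s.bytes[1] << 8));
}
```

In this model a byte-sized read returns the char in both cases, while a word-sized read of the same slot returns a value polluted by the unused byte.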

Local Variable Examples

The table below shows the effects that changing the above C function has on local variable locations. The left column is the original example. The middle column shows the effects of changing local1 from type int to type char. The right column shows the effects of changing local1 from type int to type long.

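Original Int Example    | Char Local Variable                       | Long Local Variable
------------------------|-------------------------------------------|----------------------------------------------
int MyFunc(...)         | int MyFunc(...)                           | int MyFunc(...)
{                       | {                                         | {
    int local1;         |     char local1;                          |     long local1;
    int local2;         |     int  local2;                          |     int  local2;
    int local3;         |     int  local3;                          |     int  local3;
    ...                 |     ...                                   |     ...
    ...                 |     ...                                   |     ...
}                       | }                                         | }
------------------------|-------------------------------------------|----------------------------------------------
[bp-2] -> local1 (word) | [bp-1] -> local1 (byte, [bp-2] is unused) | [bp-4] -> local1 (dword, [bp-2] is high word)
[bp-4] -> local2 (word) | [bp-4] -> local2 (word)                   | [bp-6] -> local2 (word)
[bp-6] -> local3 (word) | [bp-6] -> local3 (word)                   | [bp-8] -> local3 (word)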

Argument Examples

The table below shows the effects that changing the C function example has on argument locations. The left column is the original example. The middle column shows the effects of changing arg1 from type int to type char. The right column shows the effects of changing arg1 from type int to type long.

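Original Int Example                     | Char Argument                             | Long Argument
-----------------------------------------|-------------------------------------------|---------------------------------------------
int MyFunc(int arg1, int arg2, int arg3) | int MyFunc(char arg1, int arg2, int arg3) | int MyFunc(long arg1, int arg2, int arg3)
{                                        | {                                         | {
    ...                                  |     ...                                   |     ...
    ...                                  |     ...                                   |     ...
}                                        | }                                         | }
-----------------------------------------|-------------------------------------------|---------------------------------------------
[bp+8] -> arg3 (word)                    | [bp+8] -> arg3 (word)                     | [bp+10] -> arg3 (word)
[bp+6] -> arg2 (word)                    | [bp+6] -> arg2 (word)                     | [bp+8]  -> arg2 (word)
[bp+4] -> arg1 (word)                    | [bp+4] -> arg1 (byte, [bp+5] is unused)   | [bp+4]  -> arg1 (dword, [bp+6] is high word)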

Return Values

The following table shows the registers that are used to return values, based on the size of the return type.

    byte             al
    word  (2 bytes)  ax
    dword (4 bytes)  dx:ax (i.e., the high word in dx and the low word in ax).

On the 8086 char types are a byte; short, enum, and int types are a word (2 bytes); and long types are a dword (4 bytes). For larger types (e.g., structs), a more sophisticated method is used to return values.
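
As a worked illustration of the dword case, the C sketch below splits a 32-bit value into the two 16-bit halves that would be left in dx and ax, and joins them again as the caller would. The DxAx type and both function names are hypothetical; real code would simply leave the halves in the registers.

```c
#include <stdint.h>

/* The two 16-bit registers used for a dword return value. */
typedef struct {
    uint16_t dx; /* high word */
    uint16_t ax; /* low word  */
} DxAx;

/* Split a 32-bit value the way a function returning long leaves it. */
DxAx split_long(uint32_t value) {
    DxAx r;
    r.dx = (uint16_t)(value >> 16);     /* upper 16 bits go to dx */
    r.ax = (uint16_t)(value & 0xFFFFu); /* lower 16 bits go to ax */
    return r;
}

/* Recombine dx:ax into the 32-bit value the caller sees. */
uint32_t join_long(DxAx r) {
    return ((uint32_t)r.dx << 16) | r.ax;
}
```

For example, split_long(0x12345678) yields dx = 0x1234 and ax = 0x5678, and join_long reverses the split.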

C Functions on the 8086 (Far Calls)

The following information is supplementary and is not required for any labs.

Far functions are used to make function calls across segment boundaries, which is often required in programs larger than 64 KB. Far calls differ from near calls in that both the CS and IP registers are saved on the stack when the call is made, rather than just IP. In the prologue and epilogue code, the only difference between a near function and a far function is that the far function returns with the retf instruction instead of ret. The retf instruction reloads both CS and IP from the stack.

For example, consider the following function:

int far MyFunc(int arg1, int arg2, int arg3)
{
    int local1;
    int local2;
    int local3;
    ...
    ...
    return 3;
}

Note the far keyword in the declaration. Because an extra word is saved on the stack for the return segment when the function call is made, the arguments placed on the stack are offset by one extra word relative to bp (the local variables remain the same relative to bp). The variables for the above function are located as follows:

	[bp+10] -> arg3
	[bp+8]  -> arg2
	[bp+6]  -> arg1
	[bp+4]  -> return CS (segment)
	[bp+2]  -> return IP (offset)
	[bp]    -> saved bp
	[bp-2]  -> local1
	[bp-4]  -> local2
	[bp-6]  -> local3

Compare the locations of arg1, arg2, and arg3 to the locations for the near function example. There may also be a different data segment associated with a far function, in which case the DS register is saved and modified at the beginning of the function then restored at the end. This is necessary so that the global and static data associated with the far function can be correctly accessed.

Quick Reference

For near calls:
Location relative to bp                        | Variable
-----------------------------------------------|------------
...                                            | ...
[bp+4+wsize(arg1)+wsize(arg2)] --------------- | arg3
[bp+4+wsize(arg1)] --------------------------- | arg2
[bp+4] --------------------------------------- | arg1
[bp+2] --------------------------------------- | return address
[bp] ----------------------------------------- | saved bp
[bp-size(local1)] ---------------------------- | local1
[bp-wsize(local1)-size(local2)] -------------- | local2
[bp-wsize(local1)-wsize(local2)-size(local3)]  | local3
...                                            | ...

For far calls:
Location relative to bp        | Variable
-------------------------------|------------
...                            | ...
[bp+6+wsize(arg1)+wsize(arg2)] | arg3
[bp+6+wsize(arg1)] ----------- | arg2
[bp+6] ----------------------- | arg1
[bp+4] ----------------------- | return address segment
[bp+2] ----------------------- | return address offset
{same as above} -------------- | local variables
...                            | ...

Notes: size(X) is the size (in bytes) of X and wsize(X) is the word size of X, i.e., the smallest multiple of 2 that is at least as big as size(X). The variables local1, local2, etc. refer to local variables and the variables arg1, arg2, etc. refer to function arguments. The names with a 1 suffix designate those that are declared first in the C source.

Return values are placed in al, ax, or dx:ax for byte, word, and dword sized values respectively.
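
The size(X) and wsize(X) formulas in the tables above can be written out as a short C sketch. The function names arg_disp and local_disp and the first_arg parameter are assumptions introduced here for illustration; first_arg is 4 for a near call and 6 for a far call, as in the two tables.

```c
/* wsize(X): size in bytes rounded up to the next multiple of 2. */
int wsize(int size) {
    return (size + 1) & ~1;
}

/* Displacement from bp of argument number `index` (0 = arg1).
   first_arg is 4 for a near call and 6 for a far call. */
int arg_disp(const int *arg_sizes, int index, int first_arg) {
    int disp = first_arg;
    for (int i = 0; i < index; i++) {
        disp += wsize(arg_sizes[i]); /* skip the earlier arguments */
    }
    return disp;
}

/* Displacement from bp of local number `index` (0 = local1).
   The result is always negative. */
int local_disp(const int *local_sizes, int index) {
    int disp = 0;
    for (int i = 0; i < index; i++) {
        disp -= wsize(local_sizes[i]); /* skip the earlier locals */
    }
    return disp - local_sizes[index];  /* the last term uses size, not wsize */
}
```

With sizes {4, 2, 2} (long, int, int), arg_disp gives 4, 8, and 10 for a near call and local_disp gives -4, -6, and -8, which agrees with the long examples shown earlier. With sizes {1, 2, 2} (char, int, int), local_disp gives -1, -4, and -6.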