8086 Assembly Language

Six Things You Should Know About the 8086

1) The 8086 is a 16-bit processor.

Because the word size is limited to 16 bits, many data types in C have different sizes than they do on the spice machines or modern PCs. Below is a list of C data types and their sizes.

   Type          | Length  | Range
  ---------------|---------|-----------------------------------------
   unsigned char |  8 bits |                 0 to 255
   char          |  8 bits |              -128 to 127
   enum          | 16 bits |           -32,768 to 32,767
   unsigned int  | 16 bits |                 0 to 65,535
   short int     | 16 bits |           -32,768 to 32,767
   int           | 16 bits |           -32,768 to 32,767
   unsigned long | 32 bits |                 0 to 4,294,967,295
   long          | 32 bits |    -2,147,483,648 to 2,147,483,647

Most importantly, note the range limitation on the int data type. Also note that every operation on a 32-bit data type (long and unsigned long) takes several instructions, and sometimes very many, because the processor works on only 16 bits at a time. Avoid 32-bit operations in C unless absolutely necessary.
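The two points above can be modeled with a short Python sketch. It is an illustration only, not compiler output: the helper names to_int16 and add32_with_16bit_steps are hypothetical, and the second function mimics the usual low-word add followed by a high-word add with carry.

```python
# Sketch only: models 16-bit arithmetic in Python; helper names are hypothetical.

def to_int16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def add32_with_16bit_steps(a, b):
    """Add two 32-bit values using only 16-bit additions: low words first,
    then high words plus the carry out of the low-word addition."""
    low = (a & 0xFFFF) + (b & 0xFFFF)
    carry = low >> 16                                      # 1 if the low add overflowed
    high = ((a >> 16) & 0xFFFF) + ((b >> 16) & 0xFFFF) + carry
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)

print(to_int16(32767 + 1))                         # -32768
print(hex(add32_with_16bit_steps(0x0001FFFF, 1)))  # 0x20000
```

The first line of output shows why the int range matters: a 16-bit result one past 32,767 reads back as -32,768. The second shows that even a single 32-bit addition needs two dependent 16-bit steps.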

The following names are used to refer to data sizes on the 8086:

   Length | Size Name
  --------|-----------------------
   4-bit  | nibble
   8-bit  | byte
   16-bit | word
   32-bit | dword (or doubleword)

2) The 8086 uses little endian format.

This means that the least significant byte of a value is stored first (i.e., at the low address) in memory. This gives the appearance of numbers being stored in memory backwards. For example, the 32-bit value 0x11223344 would be stored as bytes in the following order:

  (low addr)              (high addr)
     0x44    0x33    0x22    0x11

This must be kept in mind when accessing different parts of data.
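As a quick check of the byte order described above, the following Python sketch uses the standard struct module to lay out the same 32-bit value in little endian order and then read it back as two 16-bit words. The variable names are illustrative only.

```python
import struct

value = 0x11223344
stored = struct.pack("<I", value)   # "<" = little endian, "I" = 32-bit unsigned

# Bytes in increasing address order.
print(" ".join(f"0x{b:02X}" for b in stored))   # 0x44 0x33 0x22 0x11

# Reading the same four bytes as two 16-bit words: the low word comes first.
low_word, high_word = struct.unpack("<HH", stored)
print(hex(low_word), hex(high_word))            # 0x3344 0x1122
```

This is why the low word of a 32-bit value is found at the lower address and the high word two bytes above it.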

3) The 8086 uses segmented memory.

A memory address on the 8086 consists of two numbers, usually written in hexadecimal and separated by a colon, representing the segment and the offset. This combination of segment and offset is referred to as a logical address. The segment number refers to a 64 KB block of memory and the offset is an index into that block. For example, the address AB10:1024 corresponds to the byte stored in segment 0xAB10 at offset 0x1024.

Both the segment and offset are represented by a 16-bit number, allowing each segment to be 2^16 bytes in size (i.e., 65536 bytes, or 64 KB). This would seem to suggest that the 8086 can address up to 2^32 bytes, or 4 GB, since 32 bits are used for each address. This is NOT the case.

When the processor obtains a logical address (segment and offset), it performs a simple calculation to determine the 20-bit physical address in memory to which the logical address refers:

	physical address = (segment << 4) + offset

This is equivalent to multiplying the segment by 16 and adding the offset (i.e., physical address = segment * 16 + offset). This means that the 64 KB segments overlap, with a new segment starting every 16 bytes. This also means that there can be more than one address for the same memory location. For example, 0000:0100, 0001:00F0, and 0010:0000 all refer to physical address 0x100. Many other segment:offset pairs refer to that same location.
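The calculation can be tried out with a small Python sketch. The function name physical_address is hypothetical, and the mask to 20 bits reflects the 8086 having 20 address lines, so a sum past 1 MB wraps around.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit physical address."""
    return ((segment << 4) + offset) & 0xFFFFF   # keep only the 20 address bits

# The three logical addresses from the text all map to the same location.
for seg, off in [(0x0000, 0x0100), (0x0001, 0x00F0), (0x0010, 0x0000)]:
    print(f"{seg:04X}:{off:04X} -> 0x{physical_address(seg, off):05X}")

# The earlier example address AB10:1024.
print(f"AB10:1024 -> 0x{physical_address(0xAB10, 0x1024):05X}")   # 0xAC124
```

Each of the first three lines prints 0x00100, confirming that the logical addresses are aliases of one another.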

In this class, you will not usually need to worry about segments because your programs will only deal with the first segment of memory. In this case, you can think of memory as a single contiguous block that is 64 KB in size.

4) Registers in the 8086 have intended uses.

The 8086 has four 16-bit general purpose registers, five 16-bit offset registers for accessing memory, four 16-bit segment registers also for memory access, and a 16-bit flags register. Nine bits of the flags register are accessible to the programmer and each of these bits is referred to as a flag. Each flag either indicates a condition or controls the behavior of certain instructions. For example, the cmp instruction compares two numbers and sets flags based on the relationship between these numbers. Other instructions, such as je (for jump if equal), can then be used, behaving differently depending on the state of the flags previously set by the cmp instruction.
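The interaction between cmp and je can be modeled roughly as follows. This is a sketch with hypothetical function names (cmp16, je_taken); it assumes the usual description of cmp as a subtraction whose result is discarded, and it computes only four of the flags.

```python
def cmp16(a, b):
    """Model the flags set by a 16-bit cmp: compute a - b and keep only the flags."""
    a &= 0xFFFF
    b &= 0xFFFF
    result = (a - b) & 0xFFFF
    return {
        "ZF": int(result == 0),                   # operands were equal
        "SF": result >> 15,                       # bit 15 of the result
        "CF": int(a < b),                         # unsigned borrow
        "OF": ((a ^ b) & (a ^ result)) >> 15,     # signed overflow
    }

def je_taken(flags):
    """je jumps only when the zero flag is set."""
    return flags["ZF"] == 1

print(je_taken(cmp16(7, 7)))   # True
print(je_taken(cmp16(7, 8)))   # False
print(cmp16(3, 5))             # {'ZF': 0, 'SF': 1, 'CF': 1, 'OF': 0}
```

The conditional jump never looks at the operands themselves; it sees only the flags left behind by the preceding instruction.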

Most instructions only allow certain registers to be used as operands and some instructions require specific registers to be used. Therefore, it is important to be familiar with the different registers and their intended purposes. However, there is still a lot of freedom in what registers can be used. Below is a list of the 8086 registers. This listing can also be obtained in Emu86 by entering "regs". For each register the assembly symbol, name, and intended use are given.

  General Purpose Registers (a.k.a. scratch registers)
    AX (AH,AL)  Accumulator : Main arithmetic register
    BX (BH,BL)  Base        : Generally used as a memory base or offset
    CX (CH,CL)  Counter     : Generally used as a counter for loops
    DX (DH,DL)  Data        : General 16-bit storage, division remainder

  Offset Registers
    IP  Instruction pointer : Current instruction offset
    SP  Stack pointer       : Current stack offset
    BP  Base pointer        : Base for referencing values stored on stack
    SI  Source index        : General addressing, source offset in string ops
    DI  Destination index   : General addressing, destination in string ops

  Segment Registers
    CS  Code segment   : Segment to which IP refers
    SS  Stack segment  : Segment to which SP refers
    DS  Data segment   : General addressing, usually for program's data area
    ES  Extra segment  : General addressing, destination segment in string ops

  Flags Register (Respectively bits 11,10,9,8,7,6,4,2,0)
    OF  Overflow flag  : Indicates a signed arithmetic overflow occurred
    DF  Direction flag : Controls incr. direction in string ops (0=inc, 1=dec)
    IF  Interrupt flag : Controls whether interrupts are enabled
    TF  Trap flag      : Controls debug interrupt generation after instructions
    SF  Sign flag      : Indicates a negative result or comparison
    ZF  Zero flag      : Indicates a zero result or an equal comparison
    AF  Auxiliary flag : Indicates adjustment is needed after BCD arithmetic
    PF  Parity flag    : Indicates an even number of 1 bits in the low byte
    CF  Carry flag     : Indicates an arithmetic carry occurred
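The bit positions listed for the flags register can be used to pick a 16-bit flags value apart. The following Python sketch is illustrative only; FLAG_BITS and decode_flags are hypothetical names, and the sample value 0x0041 is made up.

```python
# Bit position of each flag within the 16-bit flags register.
FLAG_BITS = {"OF": 11, "DF": 10, "IF": 9, "TF": 8, "SF": 7,
             "ZF": 6, "AF": 4, "PF": 2, "CF": 0}

def decode_flags(flags_word):
    """Return a dict mapping each flag name to 0 or 1."""
    return {name: (flags_word >> bit) & 1 for name, bit in FLAG_BITS.items()}

flags = decode_flags(0x0041)                        # bits 6 and 0 are set
print([name for name, bit in flags.items() if bit]) # ['ZF', 'CF']
```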

The general purpose registers AX, BX, CX, and DX are 16-bit registers but each can also be used as two separate 8-bit registers. For example, the high (or upper) byte of AX is called AH and the low byte is called AL. The same H and L notation applies to BX, CX, and DX. Most instructions allow these 8-bit registers as operands.
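The relationship between a 16-bit register and its two 8-bit halves can be sketched in Python as follows. The function names are hypothetical; the point is only that writing AL changes the low byte of AX and leaves AH alone.

```python
def split_register(ax):
    """Return (AH, AL): the high and low bytes of a 16-bit register value."""
    return (ax >> 8) & 0xFF, ax & 0xFF

def write_low_byte(ax, value):
    """Model a write to AL: replace the low byte, keep the high byte."""
    return (ax & 0xFF00) | (value & 0xFF)

ax = 0x1234
ah, al = split_register(ax)
print(hex(ah), hex(al))        # 0x12 0x34

ax = write_low_byte(ax, 0xFF)  # like "mov al, 0xFF"
print(hex(ax))                 # 0x12ff
```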

Registers AX, BX, CX, DX, SI, DI, BP, and SP can be used as operands for most instructions. However, only AX, BX, CX, and DX should be used for general purposes since SI, DI, BP, and SP are usually used for addressing.

5) The 8086 instructions can use register, immediate, and memory operands.

The 8086 is not limited to immediate or register operands. Most instructions also allow memory operands to be used. For example, if a word-sized variable were pointed to by the value stored in register BX, the number 3 could be added to it using the following instruction:

	add	word [bx], 3

The brackets indicate that BX is to be used as a pointer to a memory location. The only limitation is that there can be only one memory reference per instruction. For example, the following addition instruction is invalid:

	add	word [bx], word [si]	; Bad instruction!

Instead you would use two instructions:

	mov	ax, [si]	; Load [si] into ax
	add	[bx], ax	; Add to [bx]

6) The 8086 is the ancestor of modern Intel processors.

8086 code runs fine on modern x86 processors, such as the Pentium processors. However, modern x86 code rarely runs on an 8086. When experimenting with 8086 assembly language code, check which processors support each instruction you use. Many instructions have been added since the 8086 was first produced, so instructions for newer processors must be avoided. The documentation for this class only covers 8086 instructions.

The 8086 Instruction Set

The 8086 supports many instructions, most of which you do not need to be familiar with. Refer to the documentation when using unfamiliar instructions since many instructions must use or indirectly assume the use of specific registers. A description of the 8086 instructions you should be familiar with can be viewed from the following link: 8086 Instruction Set

Referencing Memory

Segment and Offset

Recall that the 8086 uses logical addresses composed of a segment and an offset to reference memory. Every memory reference on the 8086 will use one of the segment registers (i.e., DS, ES, SS, or CS) as the segment combined with an offset (usually given in the instruction) to determine the physical address being referenced. The physical address referenced is always

	physical address = (segment << 4) + offset

The Effective Address

There are several ways to reference memory locations, and each allows only specific registers to be used. A memory reference is placed in brackets to distinguish it from a register or immediate value. In general, memory accesses take the form of the following example:

	mov	ax, [baseReg + indexReg + constant]

This example copies a word-sized value into the register AX. Combined, the three parameters in brackets determine what is called the effective address, which is simply the offset referenced by the instruction. The following rules apply:

   baseReg can be:  bp or bx
  indexReg can be:  si or di
  constant can be:  16-bit signed number if combined with registers, as in "mov ax,[bp+2]"
                    16-bit unsigned number if by itself, as in "mov ax,[2]"

Any one or two of the memory access parameters (i.e., constant, baseReg, or indexReg) can be omitted, allowing for several memory access modes.

It is important to realize that the effective address, or offset, does NOT give the complete address for the memory reference. A segment register is either implied or given in the instruction. This topic is discussed in the section Segment Registers below.
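The effective address calculation described above amounts to a 16-bit sum. The following Python sketch uses a hypothetical function effective_address and made-up register values; the mask to 16 bits models the offset wrapping around within the 64 KB segment.

```python
def effective_address(base=0, index=0, constant=0):
    """Sum the optional base register, index register, and constant,
    keeping only 16 bits so the offset stays inside the segment."""
    return (base + index + constant) & 0xFFFF

# [bp-4] with bp = 0x2000
print(hex(effective_address(base=0x2000, constant=-4)))    # 0x1ffc
# [bx+si] with bx = 0xFFFF and si = 2: the sum wraps past 0xFFFF
print(hex(effective_address(base=0xFFFF, index=0x0002)))   # 0x1
# [2]: a constant by itself
print(hex(effective_address(constant=2)))                  # 0x2
```

Omitted parameters simply contribute zero, which is how one general form covers all of the access modes.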

Segment Registers

One of the segment registers is always used as the segment when evaluating an address. The available segment registers are the Data Segment (DS), Extra Segment (ES), Stack Segment (SS), and Code Segment (CS). Therefore, you must be aware of which segment register is used when an address is evaluated as part of an instruction. When a memory reference is given in an instruction, the processor sums any baseReg, indexReg, and constant that are given and uses this sum as the offset into the segment.

Which segment register is used in the address calculation depends on the register that is used for baseReg. The DS register is assumed for the segment unless baseReg is the register BP, in which case SS is assumed. However, any segment register can be explicitly specified using what is called a segment override prefix (discussed below). Also, some special instructions may assume other segment registers.
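The segment selection rule can be put together with the offset and physical address calculations in one Python sketch. Everything here is illustrative: resolve is a hypothetical helper, the register values are made up, and the override parameter stands in for the segment override prefix covered in the Segment Overrides section. Special instructions that assume other segments are not modeled.

```python
def resolve(regs, base=None, index=None, constant=0, override=None):
    """Return (segment name, offset, physical address) for a memory reference."""
    offset = constant
    if base is not None:
        offset += regs[base]
    if index is not None:
        offset += regs[index]
    offset &= 0xFFFF                                   # offsets are 16 bits

    # DS is assumed unless the base register is BP; an override wins over both.
    segment = override or ("ss" if base == "bp" else "ds")
    physical = ((regs[segment] << 4) + offset) & 0xFFFFF
    return segment, offset, physical

# Made-up register contents for the example.
regs = {"ds": 0x1000, "es": 0x3000, "ss": 0x2000, "cs": 0x0800,
        "bx": 0x0010, "bp": 0x0100, "si": 0x0002, "di": 0x0000}

print(resolve(regs, base="bx", index="si", constant=2))   # ('ds', 20, 65556)
print(resolve(regs, base="bp", constant=-4))              # ('ss', 252, 131324)
print(resolve(regs, base="bx", override="cs"))            # ('cs', 16, 32784)
```

In hexadecimal the three physical addresses are 0x10014, 0x200FC, and 0x08010.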

Segment Overrides

A segment override prefix allows any segment register (DS, ES, SS, or CS) to be used as the segment when evaluating addresses in an instruction. An override is made by adding the segment register plus a colon to the beginning of the memory reference of the instruction as in the following examples:

	mov	ax, [es:60126]          ; Use es as the segment
	mov	ax, [cs:bx]             ; Use cs as the segment
	mov	ax, [ss:bp+si+3]        ; Use ss as the segment

Operand Size

A memory reference can be used as a source or destination operand for most 8086 instructions. Any time a memory reference is given as part of an instruction, the size of the memory operand is either implied or must be specified. For example, consider the following instruction:

	mov	ax, [bx]

This instruction copies the word stored at DS:BX into AX. The size of word is implied since the AX register is one word in size. In some cases the size of the operand must be given in order for the assembler to generate an instruction. For example, to increment a variable pointed to by BX, the assembler will not accept the following:

	inc	[bx]		; WRONG!!

This is because the assembler does not know whether [bx] addresses a byte-sized or word-sized value. So the size of [bx] must be specified, as in the following two examples:

	inc	word [bx]	; Increment word at [bx]
	inc	byte [bx]	; Increment byte at [bx]

It is not necessary to specify the size if one of the operands has a known size, such as a register operand, as in:

	add	al, [bx]	; Assembler knows al is a byte so "byte [bx]" is assumed
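To see why the operand size matters, the following Python sketch models memory as a bytearray and applies a byte-sized and a word-sized increment to the same location. The function names inc_byte and inc_word are hypothetical and the memory contents are made up.

```python
def inc_byte(mem, offset):
    """Model "inc byte [offset]": only one byte changes."""
    mem[offset] = (mem[offset] + 1) & 0xFF

def inc_word(mem, offset):
    """Model "inc word [offset]": two bytes are treated as a little endian word."""
    value = int.from_bytes(mem[offset:offset + 2], "little")
    value = (value + 1) & 0xFFFF
    mem[offset:offset + 2] = value.to_bytes(2, "little")

as_byte = bytearray([0xFF, 0x00])
inc_byte(as_byte, 0)
print(as_byte.hex())   # 0000 (0xFF wrapped to 0x00, next byte untouched)

as_word = bytearray([0xFF, 0x00])
inc_word(as_word, 0)
print(as_word.hex())   # 0001 (0x00FF became 0x0100, stored low byte first)
```

The same starting bytes end up different depending on the size, so the assembler cannot guess on the programmer's behalf.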

Addressing Modes

Here are some examples of the allowed addressing modes:

	xor	cx, [59507]	 ; Direct mode (XOR CX with word at DS:E873)
	push	word [bx]	 ; Register-indirect mode (Push word at DS:BX onto stack)
	mov	ax, [bp-4]	 ; Base mode (Move word at SS:(BP-4) into AX)
	sub	[si+2], bx	 ; Indexed mode (Subtract BX from word at DS:(SI+2))
	not	byte [bp+di]	 ; Base-indexed mode (Invert bits of byte at SS:(BP+DI))
	add	[bx+si+2], dx	 ; Base-indexed mode with displacement (Add DX to word at DS:(BX+SI+2))

The six addressing modes available are outlined more precisely for your reference below:

  Direct Mode: [constant]
    constant: 16-bit unsigned value

  Register-Indirect Mode: [register]
    register: bx, si, or di
    Note: bp technically isn't allowed. If used, assembler will generate [bp+0] instead.
  
  Base Mode: [constant + baseReg]
    constant: 8-bit or 16-bit signed value
    baseReg: bp or bx
  
  Indexed Mode: [constant + indexReg]
    constant: 8-bit or 16-bit signed value
    indexReg: si or di

  Base-Indexed Mode: [baseReg + indexReg]
    baseReg: bp or bx
    indexReg: si or di

  Base-Indexed Mode with Displacement: [constant + baseReg + indexReg]
    constant: 8-bit or 16-bit signed value
    baseReg: bp or bx
    indexReg: si or di
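The register rules shared by all of these modes (at most one base register, at most one index register, and nothing else) can be captured in a small Python check. This is a sketch with a hypothetical function name, not how NASM validates operands.

```python
BASE_REGS = {"bx", "bp"}
INDEX_REGS = {"si", "di"}

def is_valid_register_combo(registers):
    """Return True if the registers named inside the brackets form a legal
    8086 memory reference. An empty list means direct mode."""
    regs = [r.lower() for r in registers]
    bases = [r for r in regs if r in BASE_REGS]
    indexes = [r for r in regs if r in INDEX_REGS]
    others = [r for r in regs if r not in (BASE_REGS | INDEX_REGS)]
    return not others and len(bases) <= 1 and len(indexes) <= 1

print(is_valid_register_combo(["bx", "si"]))   # True  (base-indexed)
print(is_valid_register_combo(["bx", "bp"]))   # False (two base registers)
print(is_valid_register_combo(["si", "di"]))   # False (two index registers)
print(is_valid_register_combo(["ax"]))         # False (ax cannot address memory)
```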

NASM Syntax

NASM, or the Netwide Assembler, is the assembler that will be used for this class. In order to use it fully, there are a few things you should know about it. Read the document NASM Syntax for essential information.