This is part of a series of posts analysing the Chip-8 interpreter on the RCA COSMAC VIP computer. These posts may be useful if you are building a Chip-8 interpreter on another platform or if you have an interest in the operation of the COSMAC VIP. For other posts in the series refer to the index or instruction index.
In a previous post I explained how the initialisation section of the Chip-8 interpreter worked. In this post I’ll look at the Call loop (which is the equivalent of a fetch and decode sequence for a machine language).
This part of the interpreter is responsible for fetching the next Chip-8 instruction from memory, decoding it by analysing its parts and then calling the appropriate routine to deal with each instruction type. To understand how this works, we first need to understand how a Chip-8 instruction is structured.
Each Chip-8 instruction is two bytes. Each byte can be broken down further into two nibbles. The first nibble of the first byte is used to indicate one of sixteen possible instruction groups. These are shown in the table below.
First Instruction Nibble | Instruction Group |
0 | Call machine code instructions |
1 | Branch instructions |
2 | Call Chip-8 subroutine instructions |
3 | Skip if variable equal to immediate operand instructions |
4 | Skip if variable not equal to immediate operand instructions |
5 | Skip if variable equal to register instructions |
6 | Load variable with immediate operand instructions |
7 | Add immediate operand to variable instructions |
8 | Other arithmetic and logic instructions |
9 | Skip if register not equal to register instructions |
A | Memory indexing instructions |
B | Branch with offset instructions |
C | Random number generation instructions |
D | Display instructions |
E | Skip if key pressed or not pressed instructions |
F | Various I/O (timers and keyboard) and memory instructions |
So the first thing the interpreter must do after it has fetched the first byte of the instruction is decode the most significant four bits to determine which of these instruction groups is to be executed.
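As a rough illustration of the nibble breakdown described above, here is a minimal Python sketch. The function name `split_nibbles` is my own and does not come from the interpreter; it simply shows how the instruction group code is the top four bits of the first byte.

```python
def split_nibbles(first_byte, second_byte):
    """Split a two-byte Chip-8 instruction into its four nibbles.

    The first value returned is the instruction group code.
    """
    return (
        (first_byte >> 4) & 0x0F,   # most significant digit: instruction group
        first_byte & 0x0F,          # second digit
        (second_byte >> 4) & 0x0F,  # third digit
        second_byte & 0x0F,         # fourth digit
    )


# Example: the instruction D125 belongs to group D (display instructions).
group, second, third, fourth = split_nibbles(0xD1, 0x25)
print(f"group {group:X}, remaining digits {second:X} {third:X} {fourth:X}")
```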
Group 0 is dealt with as a special case. These instructions are used to call machine language subroutines from within a Chip-8 programme. If one of these instructions is detected, the handling routine is executed immediately at this point. It forms an address by masking off the most significant digit of the first byte of the Chip-8 instruction to give the high order byte of the address, and using the second byte of the Chip-8 instruction as the low order byte. The routine at that address is then called. This means Chip-8 can call any address in the on-card RAM (i.e. any address from 0x0000 to 0x0FFF), but routines in any extended RAM cannot be called directly. I’ll look further at machine code integration in a future post.
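The address calculation for group 0 can be sketched in a few lines of Python. This is only an illustration of the arithmetic, with a function name (`machine_code_address`) that I have made up for the purpose.

```python
def machine_code_address(first_byte, second_byte):
    """Form the address called by a group 0 instruction (0MMM)."""
    high = first_byte & 0x0F           # mask off the instruction group digit
    return (high << 8) | second_byte   # second byte is the low order byte


# Example: the instruction 0230 calls the machine code routine at 0x0230.
print(hex(machine_code_address(0x02, 0x30)))
```

Because only twelve bits of the instruction are available for the address, the largest value this can produce is 0x0FFF.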
For the remaining instruction groups, the interpreter next sets up two variable pointers. For some instructions the low order nibble of the first byte is used to select a variable, designated VX. For some of these instructions a second variable, designated VY, is indicated by the high order nibble of the second byte. The interpreter gets these values and uses them to construct pointers to where the relevant variables are stored in memory.
Note that these pointers are established even if the instruction will not make use of them. This makes some instructions a little less efficient than they could be in terms of execution time, but the trade off is that the interpreter is more straightforward and compact. In a 2K system, efficient memory use was more important than getting the best possible execution speed.
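Here is a minimal sketch of how the two pointers could be derived, assuming the variables occupy addresses 0xF0 to 0xFF of a single page. The function name and the `variables_page` parameter are my own; the page value 0x0E used in the example is only an assumed illustration, as the real value depends on how much RAM is fitted.

```python
def variable_pointers(first_byte, second_byte, variables_page):
    """Return (VX pointer, VY pointer) for a Chip-8 instruction.

    variables_page is the high order byte of the page holding the
    variables; it is supplied by the caller because it depends on the
    amount of RAM installed.
    """
    vx_low = (first_byte & 0x0F) | 0xF0          # X digit selects 0xF0..0xFF
    vy_low = ((second_byte >> 4) & 0x0F) | 0xF0  # Y digit selects 0xF0..0xFF
    base = variables_page << 8
    return base | vx_low, base | vy_low


# Example: 8AB4 refers to VA and VB (assumed variables page 0x0E).
vx, vy = variable_pointers(0x8A, 0xB4, 0x0E)
print(hex(vx), hex(vy))
```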
The interpreter now uses the instruction group code to index a couple of lookup tables that are used to find the address of the handling routine for each instruction group. The routine at this address is then called. When execution is returned to the Chip-8 call routine, it loops back round to fetch the next instruction. Here’s a flowchart of the sequence:
Now here’s the code from the interpreter:
Labels | Code (hex) | Comments |
FETCH_DECODE_LOOP: | 96 | Get the high order byte of the VX pointer … |
001C | B7 | … and copy this to the high order byte of the VY pointer. |
001D | E2 | Use the stack pointer (R2) for indirect register addressing operations. |
001E | 94 | Get the high order byte of the CALL routine pointer (R4) … |
001F | BC | … and copy it to the high order byte of RC (RC will be used later as a pointer into a pair of lookup tables that hold the addresses of the routines that handle each instruction group). |
0020 | 45 | Get the first byte of the next Chip-8 instruction and advance the instruction pointer (R5). |
0021 | AF | Copy first byte of Chip-8 instruction to RF.0. |
0022 | F6 | The next four instructions move the most significant digit of the Chip-8 instruction (first byte) – the instruction group code – to the position of the least significant digit. The least significant digit is discarded. |
0023 | F6 | |
0024 | F6 | |
0025 | F6 | |
0026 | 32 44 | If the instruction group code indicates a machine language call (code 0), jump directly to the routine that handles this. |
0028 | F9 50 | OR the instruction group code with 0x50 to turn it into the low order byte of an address that points to an entry in a lookup table (this table is stored from 0x0051 to 0x005F). |
002A | AC | RC now points to the correct entry in a lookup table for the instruction group of the current instruction - this table holds the high order byte of the address of the routine that handles that instruction group. |
002B | 8F | Retrieve the unaltered copy of the first byte of the Chip-8 instruction from RF.0. |
002C | FA 0F | Mask off the most significant digit of the first byte of the Chip-8 instruction, leaving the digit that selects VX. |
002E | F9 F0 | OR this with 0xF0 to form the low order byte of a pointer to the relevant variable (these variables are stored in the final page of on-card RAM from 0x0XF0 to 0x0XFF). |
0030 | A6 | The VX pointer (R6) now points to the correct variable for this instruction. |
0031 | 05 | Get the second byte of the Chip-8 instruction (do not advance the instruction pointer). |
0032 | F6 | The next four instructions move the most significant digit of the Chip-8 instruction (second byte) - VY - to the position of the least significant digit. The least significant digit is discarded. |
0033 | F6 | |
0034 | F6 | |
0035 | F6 | |
0036 | F9 F0 | OR the VY digit with 0xF0 to form the low order byte of a pointer to the relevant variable (these variables are stored in the final page of on-card RAM from 0x0XF0 to 0x0XFF). |
0038 | A7 | The VY pointer (R7) now points to the correct variable for this instruction. |
0039 | 4C | Get high-order byte of routine from look-up table. |
003A | B3 | Store this in the high order byte of the interpreter programme counter (R3). |
003B | 8C | Get the low order byte of the address currently pointed to by RC - this will have been moved on by 1 by the LDA instruction… |
003C | FC 0F | … so, as the corresponding entries in each table are placed 16 bytes apart, it's just necessary to add 0x0F to the address … |
003E | AC | … so that RC now points to the correct place in the second look up table. |
003F | 0C | Get the low order byte of the address from the lookup table. |
CALL_SUBROUTINE: | A3 | And use this to set the low order byte of the interpreter programme counter (R3). |
0041 | D3 | Now call the interpreter subroutine to handle this instruction group. |
0042 | 30 1B | On return from the subroutine, loop back and get the next Chip-8 instruction. |
FIRST_DIGIT_0: | 8F | This subroutine is entered when the first digit of the instruction is 0x0. This indicates a call to the machine code routine stored in the remaining three digits of the instruction. The routine starts by retrieving the original first byte of the Chip-8 instruction in RF.0. |
0045 | FA 0F | Use a mask to remove the first digit of the instruction (leaving the high order byte of the address to be called). |
0047 | B3 | Use this to set the high order byte of the interpreter programme counter (R3), as this is also used as the programme counter for machine code routines called with this instruction. |
0048 | 45 | Get the low-order byte of the address to be called directly from memory using the Chip-8 programme counter (R5) and then advance this. |
0049 | 30 40 | Now return to the main fetch and decode loop and call the relevant subroutine. |
004B-004E | There is a short routine here to turn on the COSMAC VIP's display. I'll analyse this in a future post. | |
004F | 00 00 | This is filler before the subroutine address lookup tables so that the last digit of the address for each entry corresponds to the digit that indicates the instruction group (i.e. the entry for instruction group 1 is found at 0x0051, the entry for instruction group 2 at 0x0052, etc.). |
0051 | 01 01 01 01 01 01 01 01 01 01 01 01 00 01 01 | A lookup table holding the high order bytes of the addresses of the subroutines for Chip-8 instruction groups 1 through F. |
0060 | 00 | This is filler between the tables so that the second table is also aligned to instruction group numbers (i.e. 1 is at 0x0061, 2 at 0x0062, etc.). |
0061 | 7C 75 83 8B 95 B4 B7 BC 91 EB A4 D9 70 99 05 | A lookup table holding the low-order bytes of the addresses of the subroutines for Chip-8 instruction groups 1 through F. So the completed addresses for each digit are: 0x1: 0x017C, 0x2: 0x0175, 0x3: 0x0183, 0x4: 0x018B, 0x5: 0x0195, 0x6: 0x01B4, 0x7: 0x01B7, 0x8: 0x01BC, 0x9: 0x0191, 0xA: 0x01EB, 0xB: 0x01A4, 0xC: 0x01D9, 0xD: 0x0070, 0xE: 0x0199, 0xF: 0x0105. |
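To make the two-table lookup easier to follow, here is a small Python model of it. It is a sketch rather than a translation of the interpreter: the names `memory`, `rc` and `handler_address` are mine, and the table contents are taken from the completed addresses listed above. The model mimics the way RC is advanced by one after the first read, so that adding 0x0F lands on the matching entry in the second table.

```python
# High order bytes live at 0x0051-0x005F, low order bytes at 0x0061-0x006F.
HIGH_BYTES = [0x01] * 12 + [0x00, 0x01, 0x01]
LOW_BYTES = [0x7C, 0x75, 0x83, 0x8B, 0x95, 0xB4, 0xB7, 0xBC,
             0x91, 0xEB, 0xA4, 0xD9, 0x70, 0x99, 0x05]

memory = bytearray(0x70)
memory[0x51:0x60] = bytes(HIGH_BYTES)
memory[0x61:0x70] = bytes(LOW_BYTES)


def handler_address(group):
    """Return the address of the routine for instruction groups 1 to F."""
    if not 1 <= group <= 0xF:
        raise ValueError("group 0 is handled separately")
    rc = 0x50 | group        # point at the entry in the first table
    high = memory[rc]
    rc += 1                  # the load-and-advance step moves RC on by one
    rc += 0x0F               # tables are 16 bytes apart, so add 15 more
    low = memory[rc]
    return (high << 8) | low


for group in range(1, 16):
    print(f"0x{group:X}: 0x{handler_address(group):04X}")
```

Note that group D is the only entry whose high order byte is 0x00, so its routine sits in the same page as the fetch and decode loop itself.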
Execution times for the fetch and decode loop are 40 machine cycles (181.6 microseconds) for group 0 instructions and 68 machine cycles (308.72 microseconds) for all other Chip-8 instructions. Note that these are the execution times for the fetch and decode loop only – the execution time for the called routine needs to be added to this to get the total execution time for the instruction.
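The timing arithmetic can be sketched as follows. The per-cycle time is derived from the figures above (181.6 microseconds divided by 40 machine cycles, or about 4.54 microseconds per cycle); the function name and the `handler_cycles` parameter are my own, and the handler cycle count in the example is a made-up value purely for illustration.

```python
# Time per machine cycle implied by the figures quoted above.
US_PER_MACHINE_CYCLE = 181.6 / 40  # about 4.54 microseconds

LOOP_CYCLES_GROUP_0 = 40
LOOP_CYCLES_OTHER = 68


def total_time_us(loop_cycles, handler_cycles=0):
    """Fetch and decode time plus the time spent in the called routine."""
    return (loop_cycles + handler_cycles) * US_PER_MACHINE_CYCLE


print(f"{total_time_us(LOOP_CYCLES_GROUP_0):.2f}")  # fetch and decode only
print(f"{total_time_us(LOOP_CYCLES_OTHER):.2f}")    # fetch and decode only
# Hypothetical handler taking 10 machine cycles:
print(f"{total_time_us(LOOP_CYCLES_OTHER, handler_cycles=10):.2f}")
```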
Contemporary interpreters might not use this fetch and decode algorithm. For example, a fairly common way to select routines for each instruction group is to use a switch statement. Other interpreters might use function pointers, which is closer to the algorithm used here. It also may not be necessary to set up variable pointers this early in a contemporary interpreter.
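As an example of the function pointer approach, here is a minimal Python sketch of a dispatch table indexed by the instruction group code. Everything in it (the `state` dictionary, the handler names, the `step` function) is hypothetical and only two groups are filled in; it is meant to show the shape of the technique, not a complete interpreter.

```python
def handle_branch(state, instruction):
    """Group 1: branch to the address in the last three digits."""
    state["pc"] = instruction & 0x0FFF


def handle_load_immediate(state, instruction):
    """Group 6: load variable X with the immediate operand."""
    x = (instruction >> 8) & 0x0F
    state["v"][x] = instruction & 0xFF


def handle_unimplemented(state, instruction):
    raise NotImplementedError(f"instruction {instruction:04X}")


# One entry per instruction group, indexed by the first digit.
HANDLERS = [handle_unimplemented] * 16
HANDLERS[0x1] = handle_branch
HANDLERS[0x6] = handle_load_immediate


def step(state, memory):
    """Fetch one instruction, advance the counter, then dispatch."""
    pc = state["pc"]
    instruction = (memory[pc] << 8) | memory[pc + 1]
    state["pc"] = pc + 2
    HANDLERS[instruction >> 12](state, instruction)


memory = bytearray(0x1000)
memory[0x200:0x204] = bytes([0x6A, 0x42, 0x12, 0x00])  # 6A42, then 1200
state = {"pc": 0x200, "v": [0] * 16}
step(state, memory)  # loads 0x42 into VA
step(state, memory)  # branches back to 0x200
```

Unlike the original, this version decodes the variable index inside the handler that needs it, rather than setting up pointers for every instruction.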
In future posts I’ll analyse each instruction group, starting with group 0 for machine code integration.