Emulators and Emulation


I enjoy writing emulators for antique computers! Thus far, I have written emulators for the DEC PDP-8/e, which runs the DEC OS called OS/8; the Zilog Z80A, which runs CP/M 2.2; and the DEC LSI-11 which, thus far, won't run any OS. I wrote all three in ANSI C and hope that they are portable. I very carefully considered the "endian-ness" problem at every juncture and, I must admit, I seem to have gotten it right. When I moved the original code for the Z80A emulator from a big-endian machine (Amiga) to a little-endian machine (IBM Aptiva), everything worked without a change to a single line of code. When I sent the original PDP-8/e emulator source code, developed on the Amiga, to my friend Jim Van Zee at the University of Washington in Seattle for him to compile on a PC clone machine, that also worked with no endian problems.

Of the three emulators, the DEC PDP-8/e and Z80A emulators seem to work quite well. The Z80A emulator has passed some pretty severe CPU tests which vigorously test ALL the bits of the Flags byte (even the undocumented bits 3 and 5). The PDP-8/e emulator has passed all the DEC MAINDEC diagnostics that have been tried on it to date. I have attempted no serious debugging on the LSI-11 emulator. My only experiment to date has been to type in the ENTIRE binary for fig-Forth. I triple checked the fig-Forth code for data errors, then started the emulator. It printed out the fig-Forth banner, then promptly crashed! I have not seriously looked at it since.

Concerning my emulator philosophy: I write emulators to run sufficiently fast to do actual work. I do not look upon them as toys or curiosities. Each of my emulators is written ENTIRELY in C. I wanted NO assembler code, yet I wanted speed. As you probably know, these are conflicting goals. I tried for the middle ground by NOT using switch statements to interpret the emulated code. I use a dispatch table which is indexed by the emulated instruction itself. For example, the Z80A instruction LD BC,1234H has an opcode byte of 01H followed by the bytes 34H and 12H (the Z80A is a little-endian machine). The instruction byte (01H) is used to index into the instruction dispatch table, which sends control directly to the 01H instruction emulator function. That function retrieves the next two bytes, advances the PC and stores the bytes in the emulated B and C registers. The dispatch table holds addresses of functions, one per opcode value, so a machine using byte-sized opcodes will have a dispatch table of 256 entries.

Now, the Z80A has several opcodes which signal the presence of additional instruction bytes. For example, the 0DDH opcode is followed by an additional instruction byte. In this case, the initial dispatch is to a function which retrieves the next instruction byte, then uses another dispatch table to locate the instruction emulator function. All of this seems like it would still take a long time to perform: it certainly looks like slow emulation, does it not?

On my Amiga's M68040 CPU running at 33 MHz, the Z80A emulator runs slightly faster than a 4 MHz Z80A. I know this figure accurately because the emulator also counts T-states and prints out this information when it exits. On my dual Pentium III CPU running at 450 MHz, the emulator usually runs at the equivalent of around 60-80 MHz! This is faster than any Z80A hardware ever ran. I have CP/M 2.2 running on the emulator and it will bring up WordStar 4.0's opening screen in less than a half second! Frank Cringle's ZEXALL.COM Z80 emulator validation program takes about 12 minutes to complete all its Z80A instruction tests.
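
The dispatch scheme described above can be sketched in a few lines of C. This is only an illustration, not the emulator's actual source: every name in it (main_table, dd_table, op_ld_bc_nn and so on) is made up for the example, unimplemented opcodes are simply pointed at a NOP stub, and the T-state values are the standard ones for NOP (4) and LD BC,nn (10).

```c
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; this is not the emulator's source. */
typedef int (*op_fn)(void);          /* each handler returns its T-state count */

static uint8_t  mem[65536];          /* emulated 64K address space   */
static uint16_t pc;                  /* emulated program counter     */
static uint8_t  reg_b, reg_c;        /* emulated B and C registers   */
static unsigned long tstates;        /* running total of T-states    */

static op_fn main_table[256];        /* indexed by the first opcode byte    */
static op_fn dd_table[256];          /* indexed by the byte that follows DDH */

/* Stand-in for every opcode this sketch does not implement. */
static int op_nop(void)
{
    return 4;
}

/* 01H: LD BC,nn. The 16-bit operand is stored low byte first. */
static int op_ld_bc_nn(void)
{
    reg_c = mem[pc++];
    reg_b = mem[pc++];
    return 10;
}

/* DDH: fetch the second opcode byte, then dispatch through the second table. */
static int op_prefix_dd(void)
{
    return 4 + dd_table[mem[pc++]]();
}

/* Fill both tables with the stub, then install the real handlers. */
static void init_tables(void)
{
    size_t i;

    for (i = 0; i < 256; i++) {
        main_table[i] = op_nop;
        dd_table[i]   = op_nop;
    }
    main_table[0x01] = op_ld_bc_nn;
    main_table[0xDD] = op_prefix_dd;
}

/* Execute n instructions: fetch, advance the PC, one indexed call each. */
static void run(unsigned long n)
{
    while (n--)
        tstates += (unsigned long)main_table[mem[pc++]]();
}
```

Calling init_tables(), placing the bytes 01H 34H 12H at address 0 and calling run(1) leaves 12H in reg_b, 34H in reg_c and the PC at 3. Dividing the accumulated T-state total by the elapsed wall-clock time in microseconds gives an effective clock rate in MHz, which is presumably how a figure like the ones quoted above can be obtained.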

As part of my emulation speed philosophy, my Z80A emulator does not compute flags values. It looks them up in a table and loads the flags register in its entirety with a single store. How does it do this? I have a TeleVideo 802H computer which contains a Zilog Z80A CPU. I wrote assembler programs on the TeleVideo which, when executed, stored the flags values for the desired operations in files. For example, I executed all 65536 possible combinations for adding two bytes together and stored the resulting flags in a file 65536 bytes long. I then copied this file over to the development system and processed it with a small C program I had written to turn the binary data into compilable C code. The Z80A emulator indexes into this array to retrieve the proper flags. Actually, to account for both the Z80A SUB and SBC instructions, the subtraction array is 128K bytes long. Here is the actual code for SBC A,B:

static int i98 (void)				/* SBC B		*/
{
    carry = f & CF ;
    f = sbc [(((carry << 8) | a) << 8) | *b] ;
    a -= *b + carry ;
    return (4) ;
}

The index for the SUB instruction is the same except that the Carry Flag (CF) is not included. This indexing of flags values GREATLY speeds up the emulation. Yes, it certainly requires more memory. The emulator executable is about 600K and about half of that consists of Flags tables.
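
For readers who want to try the table approach, a conversion program of the kind mentioned above (binary flags file in, compilable C array out) might look roughly like this. It is a guess at the general shape, not the program I actually used: the function name bin_to_c and the output layout of eight bytes per line are assumptions made for the example.

```c
#include <stdio.h>

/*
 * Hypothetical converter: copy every byte read from "in" to "out" as
 * one element of a C array definition called "name".  Returns the
 * number of bytes converted.  Assumes the input is not empty, since an
 * empty initializer list is not valid in older C standards.
 */
static long bin_to_c(FILE *in, FILE *out, const char *name)
{
    int  c;
    long n = 0;

    fprintf(out, "const unsigned char %s [] =\n{\n", name);
    while ((c = getc(in)) != EOF) {
        /* Indent the first byte of each row, separate the rest with a space. */
        fprintf(out, "%s0x%02X,", (n % 8 == 0) ? "    " : " ", (unsigned)c);
        if (n % 8 == 7)
            fputc('\n', out);        /* eight bytes per output line */
        n++;
    }
    if (n % 8 != 0)
        fputc('\n', out);            /* finish a partial last row */
    fputs("} ;\n", out);
    return n;
}
```

Opening the captured file in binary mode ("rb"), opening a .c file for writing and calling bin_to_c(in, out, "sbc") would produce a source file that can be compiled straight into the emulator. A trailing comma after the last element is legal in a C initializer, which keeps the loop simple.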

The PDP-8/e emulator uses the same dispatch table philosophy. But, the PDP-8/e uses a 12-bit word and instruction length. So, does its dispatch table have 4096 entries? No! The PDP-8/e can run in "executive" or "background" mode. All the I/O instructions and some of the operate group instructions perform differently depending upon the mode. So, I index the dispatch table with the mode bit shifted left 12 bits and OR'ed with the current instruction. This means that the dispatch table contains 8192 pointers to functions! What, 8192 functions? No. Many table entries point to the same function. Several of the MRI (Memory Reference Instructions) use a common function to extract the address portion of the instruction. I divided the MRI emulations into several classes of functions depending on the type of memory reference. The classes are: zero page direct reference, current page direct reference, zero page indirect reference, current page indirect reference, and a special case to check whether the code had the current page bit set, but, in fact, was located in page zero (this can cause an incorrect emulation if the referenced location is a zero page index register). Is not that a wonderful special case?
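
Here is a rough sketch of the table index and of the address decoding that the MRI classes share. The names are invented for the example and the memory field registers are ignored, so take it as an outline rather than the emulator's code. The bit positions are the standard PDP-8 ones: 0400 is the indirect bit, 0200 is the current page bit, and the low seven bits are the offset within the page. Locations 0010 through 0017 are the auto-index registers, which increment before use when referenced indirectly.

```c
#include <stdint.h>

/* Hypothetical names; 12-bit words are held right-aligned in 16 bits. */
#define IND_BIT      00400u   /* MRI indirect bit             */
#define PAGE_BIT     00200u   /* MRI current-page bit         */
#define OFFSET_MASK  00177u   /* 7-bit offset within the page */
#define PAGE_MASK    07600u   /* page number part of an address */

static uint16_t mem[4096];    /* one 4096-word field */

/* Table index: mode bit shifted left 12 bits, OR'ed with the instruction. */
static unsigned dispatch_index(unsigned mode, unsigned ir)
{
    return ((mode & 1u) << 12) | (ir & 07777u);
}

/*
 * Effective address of the MRI word "ir" that was fetched from address
 * "pc".  When pc lies in page zero, the "current page" IS page zero, so
 * a current-page indirect reference can still land on an auto-index
 * register.  That is the special case: a handler that skips the
 * auto-index test for current-page references would get it wrong.
 */
static unsigned effective_address(unsigned ir, unsigned pc)
{
    unsigned ea = ir & OFFSET_MASK;

    if (ir & PAGE_BIT)
        ea |= pc & PAGE_MASK;                 /* current page reference */
    if (ir & IND_BIT) {
        if ((ea & 07770u) == 00010u)          /* 0010-0017 auto-increment */
            mem[ea] = (uint16_t)((mem[ea] + 1u) & 07777u);
        ea = mem[ea];                         /* follow the pointer */
    }
    return ea;
}
```

With the mode bit set, the instruction 3410 lands on entry 13410 (octal) of the 8192-entry table. Because the class of a reference is fixed by two bits of the instruction word, each table entry can point straight at the function for its class. Only the page-zero test has to wait until run time, because it depends on where the instruction happens to sit.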

In addition to the basic PDP-8/e CPU instruction set, the user may select whether the EAE (Extended Arithmetic Element) is present. When installed, the EAE became a part of the CPU and extended its instruction set. When this option is present, many of the former "operate" opcodes, which were NOPs, now have functionality. When the user selects the EAE option, a function is called to "plug" the dispatch table with the proper pointers to the EAE emulator functions. I ran across several interesting situations while writing the EAE emulator. The PDP-8/e Small Computer Handbook had conflicting definitions for several of the EAE instructions including those instructions which manipulated the GTF or "Greater Than Flag". Bob Supnik at DEC responded to my plea for help on the alt.sys.pdp8 discussion group. He looked at the schematics for the EAE and worked out the circuit logic to arrive at a definitive description of the GTF's operation.
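
"Plugging" the dispatch table amounts to overwriting a handful of function pointers. The sketch below shows the idea with just two EAE opcodes, MUY (7405) and DVI (7407). The handler bodies are empty stubs, all of the names are invented for the example, and I am assuming here that both mode halves of the table receive the same pointer. The real list of patched opcodes is much longer.

```c
#include <stddef.h>

typedef int (*op_fn)(void);

/* One table patch: which instruction word gets which handler. */
struct patch {
    unsigned opcode;
    op_fn    handler;
};

static op_fn dispatch[8192];  /* mode bit << 12 | instruction word */
static int   last_op;         /* records which stub ran; demonstration only */

/* Stub handlers standing in for the real emulation functions. */
static int op_nop(void) { last_op = 0;     return 0; }
static int op_muy(void) { last_op = 07405; return 0; }
static int op_dvi(void) { last_op = 07407; return 0; }

/* Illustrative subset of the opcodes that change when the EAE is present. */
static const struct patch eae_patches[] = {
    { 07405, op_muy },
    { 07407, op_dvi },
};

/* Point every entry at the NOP stub before any options are selected. */
static void init_dispatch(void)
{
    size_t i;

    for (i = 0; i < sizeof dispatch / sizeof dispatch[0]; i++)
        dispatch[i] = op_nop;
}

/* Install the EAE handlers, or restore the NOPs if the option is removed. */
static void plug_eae(int installed)
{
    size_t   i;
    unsigned mode;

    for (i = 0; i < sizeof eae_patches / sizeof eae_patches[0]; i++)
        for (mode = 0; mode < 2; mode++)
            dispatch[(mode << 12) | eae_patches[i].opcode] =
                installed ? eae_patches[i].handler : op_nop;
}
```

The nice property of this arrangement is that the option costs nothing at run time: once the table has been plugged, the main loop dispatches EAE instructions exactly as it does every other instruction, with no "is the EAE present?" test anywhere in the fast path.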

My friend at the University of Washington, Jim Van Zee, suggested that I also emulate the PDP-8/e's floating point processor. I must admit that I was a bit horrified at this thought since I had just completed the hard-fought battle to emulate the EAE. But, I tackled the FPP-8/a with the same enthusiasm I originally had for the basic instruction set. The FPP-8/a is a co-processor that runs in parallel with the PDP-8/e CPU. The CPU sets up an "APT" or Active Parameter Table which contains data for the FPP-8/a, then issues an I/O instruction to start the FPP-8/a. When the FPP-8/a reaches an FHALT instruction, it generates an I/O interrupt and exits. The CPU can then take necessary action on the results left in memory by the FPP-8/a. The FPP-8/a has its own ENTIRELY separate set of 12-bit instructions that have no relation to PDP-8/e 12-bit instructions. So, the FPP-8/a emulator uses a separate 4096 entry dispatch table to jump to the emulated FPP-8/a functions. The FPP-8/a has DIRECT access to all 32K words in the PDP-8/e's memory and can jump across memory fields with no special instructions such as the CPU requires.

The FPP-8/a runs in three data modes: integer, floating point and double precision floating point. Floating point data are stored in big-endian order as follows: the first word contains the 12-bit exponent (wow, that's bigger than IEEE-754 double precision's exponent!). The next 12-bit word contains the high order of the mantissa (yes, I hate that word, too, but I must use it here because the fractional part of the datum is stored in two's complement format, not in IEEE-754 significand format). The high order bit is present (no hidden bit as in IEEE-754). For single precision, there is one more word. For double precision, there are four additional words. This means that in double precision mode, 60 bits are used for the mantissa (including sign). This is a larger number of bits than IEEE-754 uses for its significand in double precision mode.

So, how did I handle that? I cheated! To emulate any floating point math function (such as floating addition), I first convert the two FPP-8/a data to IEEE-754 format, perform the operation, then convert back to FPP-8/a format! I thought about doing the grunge work (which I have done several times in the past), but some tests revealed that the emulation actually ran faster by doing all those conversions. The IEEE-754 significand in double precision mode only uses 53 bits (actually 52 with 1 hidden bit) while the FPP-8/a uses 59 bits plus one sign bit for a total of 60 bits. So, every double precision operation my FPP-8/a emulator performs drops a few bits, which means that the emulator is not as accurate as the actual hardware. But, I have used the DEC PDP-8 FORTRAN compiler to compile some test programs that produce very good results using my FPP-8/a emulator.

An interesting footnote about the way the PDP-8/e's FORTRAN system handled executable files: the exact same file would execute on a PDP-8/e with NO EAE or FPP-8/a, but it would execute more slowly and it might complain that it had to deal with double precision and the results might not be accurate. If you had an EAE installed, the exact same executable file would execute faster, but might also complain about the same double precision mentioned above. If you had an FPP-8/a installed, it would execute very fast and produce very accurate results. In other words, the runtime system assessed the hardware situation and ran the program accordingly. Pretty nifty, those DEC OS/8 software folks, eh?
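
The "cheat" can be illustrated for the three-word single precision format with the standard C functions ldexp and frexp. This is a sketch under stated assumptions, not the emulator's code. It assumes the exponent word is a 12-bit two's complement integer and that the two mantissa words form a 24-bit two's complement fraction with the binary point just after the sign bit. The function names are invented for the example.

```c
#include <math.h>

/* Convert three 12-bit words (exponent, mantissa high, mantissa low)
 * to a host double.  Layout as assumed in the text above. */
static double fpp_to_double(const unsigned short w[3])
{
    long e = (long)(w[0] & 07777);
    long m = ((long)(w[1] & 07777) << 12) | (long)(w[2] & 07777);

    if (e & 04000)
        e -= 010000;                 /* sign-extend the 12-bit exponent  */
    if (m & 040000000L)
        m -= 0100000000L;            /* sign-extend the 24-bit fraction  */
    return ldexp((double)m, (int)e - 23);   /* m / 2^23 * 2^e */
}

/* Convert a host double back to the three-word format, rounding the
 * fraction to 24 bits.  Range checks are omitted in this sketch. */
static void double_to_fpp(double x, unsigned short w[3])
{
    int           e;
    double        f = frexp(x, &e);  /* x = f * 2^e, 0.5 <= |f| < 1, or f == 0 */
    long          m = (long)(f * 8388608.0 + (f < 0 ? -0.5 : 0.5));
    unsigned long um;

    if (m == 8388608L) {             /* rounding carried out of the fraction */
        m >>= 1;
        e++;
    }
    if (m == 0)
        e = 0;                       /* canonical zero */
    um = (unsigned long)m & 077777777UL;    /* keep 24 bits, two's complement */
    w[0] = (unsigned short)((unsigned)e & 07777u);
    w[1] = (unsigned short)((um >> 12) & 07777UL);
    w[2] = (unsigned short)(um & 07777UL);
}
```

Under these assumptions, 1.0 is the word triple 0001 2000 0000 (octal): a fraction of one half times two to the first power. An emulated floating add then becomes double_to_fpp(fpp_to_double(a) + fpp_to_double(b), result). A 24-bit fraction fits in a double with room to spare; it is only the 60-bit double precision mantissa that loses bits on the way through the 53-bit IEEE-754 significand.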

By the way, OS/8, the DEC PDP-8 operating system, was a very interesting software system in its own right. Its resident portion was only 256 PDP-8 12-bit words long and that included the hard disk driver! If it needed to talk to you, it would load the Keyboard Monitor as needed. Perhaps the niftiest feature of all was the OS/8 debugger which used only a couple of page zero locations! It would let you examine any memory locations you wished. You could search all you wanted, but you could find no evidence of its existence in memory (outside of those couple of page zero locations)! By starting at location 6 with a cleared ACC, one could also clear all but two memory locations in a 4096 word field with a three word sequence:

0006 3410 DCA I 10
0007 5006 JMP $-1
0010 0010 10

Which two words are left uncleared and what are their contents? I leave that as an exercise to the reader. Email me if you want the answer.



Various logos copyright © by their respective owners.
Page copyright © 1996-1999 and digitally signed with PGP by Bill Haygood
Last update: 1999-11-27 Sa 17:07 MST