From: Guillermo J. Rozas
Date: Wed, 20 Feb 1991 23:23:59 +0000 (+0000)
Subject: Initial revision
X-Git-Tag: 20090517-FFI~10918
X-Git-Url: https://birchwood-abbey.net/git?a=commitdiff_plain;h=8437e6751744927db262c6996afe346f7c0f6392;p=mit-scheme.git

Initial revision
---

diff --git a/v7/src/compiler/documentation/porting.guide b/v7/src/compiler/documentation/porting.guide
new file mode 100644
index 000000000..61e5dc801
--- /dev/null
+++ b/v7/src/compiler/documentation/porting.guide
@@ -0,0 +1,371 @@
+Emacs: Please use -*- Text -*- mode. Thank you.
+
+$Header: /Users/cph/tmp/foo/mit-scheme/mit-scheme/v7/src/compiler/documentation/porting.guide,v 1.1 1991/02/20 23:23:59 jinx Exp $
+
+                        LIAR PORTING GUIDE
+                        (Very Preliminary)
+
+
+Note: This porting guide applies to version 4.78, but most of the
+relevant information has not changed for a while, nor is it likely to
+change for a while.
+
+For questions on Liar not covered by this document, or questions about
+this document, contact liar-implementors@zurich.ai.mit.edu .
+
+
+
+        0. Introduction and a brief walk through Liar.
+
+Liar translates Scode, as produced by the procedure SYNTAX or by the
+file syntaxer (SF), into compiled code objects. The Scode is
+translated through a sequence of languages, the last of which is the
+binary representation of the compiled code.
+
+The sequence of languages manipulated is
+
+        Characters --READ-->
+        S-Expressions --SYNTAX-->
+        Scode --COMPILE-SCODE-->
+        compiled code objects.
+
+Liar is a multi-pass compiler, where each major pass has multiple
+subpasses. Many of the subpasses do not manipulate the whole code
+graph, but instead follow threads that link the relevant parts of the
+graph.
+
+COMPILE-SCODE is the main entry point to Liar, although CF is the
+usual entry point. CF uses COMPILE-SCODE, assumes that the code
+has been syntaxed by SF into a .bin file, and dumps the resulting
+compiled code into a .com file.
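As a rough illustration of the SF/CF sequence just described, the following REPL sketch syntaxes and then compiles a single file. The file name "fib" is hypothetical, and the file-name defaulting noted in the comments is an assumption based on the .bin and .com conventions mentioned above, not a statement about any particular release.

```scheme
;; Hypothetical source file: fib.scm in the current working directory.
;; Assumes a Scheme image in which Liar (and therefore CF) is loaded.

(sf "fib")      ; syntax fib.scm into Scode, assumed to be written to fib.bin
(cf "fib")      ; compile fib.bin with Liar, assumed to be written to fib.com
(load "fib")    ; load the result; assumed to prefer fib.com when it exists
```

Calling COMPILE-SCODE directly on an Scode object is the lower-level alternative; CF is the convenient form when the code already lives in files.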
+
+The internal sublanguages used by Liar are:
+
+        Scode --FGGEN-->
+        Flow-graph --RTLGEN-->
+        RTL (Register Transfer Language) --LAPGEN-->
+        LAP (Lisp assembly program) --ASSEMBLER-->
+        bits --LINK-->
+        compiled code object.
+
+where FGGEN, etc., are some of the major passes of the compiler.
+
+The remaining major passes are FGOPT (the flow-graph optimizer), and
+RTLOPT (the RTL-level optimizer). RTL-level register allocation is
+performed by RTLOPT, and hardware-level register allocation is
+performed by LAPGEN. Branch-tensioning of the output code is
+performed by ASSEMBLER. LINK constructs a Scheme compiled code object
+from the bits representing the code and the fixed data that the
+compiled code uses at runtime.
+
+compiler/toplev.scm contains the top-level calls of the compiler and
+its pass structure.
+
+        0.1. Package structure for Liar
+
+The package structure of the compiler reflects the pass structure.
+The package structure is specified in compiler/machines/port/comp.pkg.
+The major packages are:
+
+(COMPILER):
+        Utilities and data structures shared by most of the compiler.
+
+(COMPILER MACROS):
+        Syntax extensions used by the compiler to define language
+translation rules.
+
+(COMPILER TOP-LEVEL):
+        Top level pass structure of the compiler.
+
+(COMPILER FG-GENERATOR):
+        This package contains the flow-graph generator, FGGEN.
+
+(COMPILER FG-OPTIMIZER):
+        This package contains the flow-graph analyzer and optimizer,
+FGOPT. It has many sub-packages to contain the individual sub-passes.
+
+(COMPILER RTL-GENERATOR):
+        This package contains the flow-graph to RTL translator,
+RTLGEN. It contains a few sub-packages for the major kinds of
+flow-graph operations.
+
+(COMPILER RTL-OPTIMIZER):
+        This package contains most of the RTL-level optimizer, RTLOPT.
+It has various sub-packages corresponding to some of its sub-passes.
+
+(COMPILER RTL-CSE):
+        This package contains the RTL-level common (redundant)
+subexpression eliminator pass of the RTL-level optimizer.
+
+(COMPILER LAP-SYNTAXER):
+        This package contains most of the machine-dependent parts of
+the compiler and the back end utilities. In particular, it contains
+the RTL -> LAP translation rules, and the LAP -> bits translation
+rules, i.e., the LAPGEN and ASSEMBLER passes respectively. It has some
+sub-packages for various major utilities (linearizer, map-merger,
+etc.).
+
+(COMPILER ASSEMBLER):
+        This package contains most of the machine-independent portion
+of the assembler. In particular, it contains the bit-assembler, i.e.,
+the portion of the assembler that accumulates the bit strings produced
+by ASSEMBLER and performs branch-tensioning on the result.
+
+(COMPILER DISASSEMBLER):
+        This package contains the disassembler. It is not needed for
+ordinary compiler operation, but is useful for low-level debugging,
+and debugging of the compiler and assembler.
+
+        0.2. Directory structure for Liar
+
+The directory structure loosely reflects the pass structure of the
+compiler. compiler/machines/port/comp.pkg lists the packages and the
+files that they include.
+
+compiler/back:
+        This directory contains the machine-independent portion of the
+back end. It contains bit-string utilities, symbol table utilities,
+label management procedures, the hardware register allocator, and the
+top-level assembler calls.
+
+compiler/base:
+        This directory contains common utilities used by the whole
+compiler, and the top level procedures provided by the compiler.
+
+compiler/etc:
+        This directory contains utilities used for cross-compiling,
+and checking re-compilations.
+
+compiler/fggen:
+        This directory contains the front end of the compiler. The
+code in this directory translates Scode into a flow-graph used by the
+analyzer and optimizer.
+
+compiler/fgopt:
+        This directory contains the flow-graph analyzer and optimizer.
+
+compiler/rtlbase:
+        This directory contains utilities used by the RTL generator and
+optimizer.
+
+compiler/rtlgen:
+        This directory contains the code that translates the
+flow-graph into register transfer language (RTL).
+
+compiler/rtlopt:
+        This directory contains the RTL-level optimizer. It contains
+code to perform lifetime analysis, redundant subexpression
+elimination, elimination of dead code, etc.
+
+compiler/machines:
+        This directory contains a subdirectory for each port of the
+compiler. Each of these subdirectories contains the port (machine)
+dependent files of the compiler.
+
+compiler/machines/port:
+        This directory contains the definition of machine parameters,
+the assembler rules, the disassembler, and RTL to assembly-language
+rules for the port.
+
+All machine-dependent files are in compiler/machines/port, which is the
+only directory that needs to be written to port the compiler to a new
+architecture.
+
+        1. Liar's runtime model.
+
+Liar does not open-code all operations that the code would need to
+execute. In particular, it leaves error handling and recovery,
+interrupt processing, and initialization, to a runtime library written
+in assembly language.
+
+Although this runtime library need not run in the context of the
+CScheme interpreter, currently the only implementation of this library
+runs from the interpreter and uses it for many of its operations.
+
+In other words, Liar does not depend on the interpreter directly, but
+indirectly through the runtime library. It does depend on the ability
+to invoke CScheme primitives at runtime, some of which (eval, etc.)
+require the interpreter to be present. It should be possible,
+however, to provide an alternate runtime library and primitive set
+that would allow code produced by Liar to run without the interpreter
+being present (F1).
+
+On the other hand, since the only instance of the runtime library is
+that supplied by the interpreter, Liar currently assumes that the
+Scheme object representation is the same as that used by the
+interpreter, but this is relatively well abstracted and should not be
+hard to change (F2).
+
+The runtime library is currently implemented by microcode/cmpaux-md.m4
+and microcode/cmpint.c .
+
+microcode/cmpaux-md.m4 is a port-dependent assembly-language file
+that allows compiled Scheme to call the C-written library routines and
+vice versa. It is described in microcode/cmpaux.txt .
+
+microcode/cmpint.c defines the library in a machine/port-independent
+way, but requires some information about the port, which is provided
+in microcode/cmpint2.h, a copy (or link) of the appropriate
+microcode/cmpint-md.h file. The microcode/cmpint-md.h files are
+described in microcode/cmpint.txt .
+
+microcode/cmpint.txt also describes many of the data structures that
+the compiled code and runtime library manipulate, and defines some of
+the concepts needed to understand the compiler.
+
+The rest of this document assumes that you are using the runtime
+library provided by the CScheme interpreter. If you wish to use Liar
+as a compiler for stand-alone programs, a lot of work needs to be
+done, and this work is not described here. Perhaps we will do it in
+the future.
+
+If you have not yet read microcode/cmpaux.txt and
+microcode/cmpint.txt, please do so before reading the rest of this
+document.
+
+(F1) We often toy with this idea.
+
+(F2) Famous last words.
+
+        2. Preliminary Observations
+
+A. Constraints on architectures to which Liar can be ported:
+
+- Liar assumes that the target machine has an address space that is
+flat enough that all Scheme objects can be addressed uniformly. In
+other words, segmented address spaces with segments necessarily
+smaller than the Scheme runtime heap will make Liar very hard or
+inefficient to port.
+
+- Liar assumes that code and data can coexist in the same address
+space. In other words, a true Harvard architecture, with separate
+code and data spaces, would be hard to support without relatively
+major changes. This assumption conflicts with some current hardware
+that has programmer-visible split data and instruction caches, but
+most of these problems can be resolved if the user is given enough
+control over flushing of the hardware caches. At some point in the
+future we may provide a C back end for Liar which will resolve some of
+these problems. Whatever technique the C back end may use can
+probably be emulated by architectures with such a strong division.
+
+- Liar assumes that the target machine is a general-register machine.
+I.e., operations are based on processor registers, and there is a
+moderately large set of general-purpose registers that can be used
+interchangeably. It would be very hard to port Liar to a stack
+machine, a graph-reduction engine, or a 4-counter machine. It is
+probably also hard to port Liar to an Intel 386/486 because of the
+small number of registers and the fact that most of them are special
+to some common instructions.
+
+B. Some implementation decisions that may make your job harder:
+
+- Liar generates code that passes arguments to procedures on a stack.
+This decision especially affects the performance on load-store
+architectures, common these days. This may change in the future
+because most modern machines have large register sets and
+memory-based operations are noticeably slower than register-based
+operations even when the memory locations have mappings in the cache.
+
+- Liar assumes that pushing and popping elements from a stack is
+cheap. Currently Liar does not attempt to bump the stack pointer once
+per block of operations, but instead bumps it once per item. This is
+expensive on many modern machines where pre- and post-incrementing are
+not supported by the hardware. This may also change in the
+not-too-distant future.
+
+- Liar assumes that it is cheap to compute overflow conditions on
+integer arithmetic operations. Generic arithmetic primitives have the
+common fixnum case open-coded, and the overflow and non-fixnum cases
+coded out of line, but this depends on the ability of the code to
+detect overflow conditions cheaply. This is not true of some modern
+machines, notably the MIPS R3000 processor. If your processor does
+not detect such conditions, you may have to emulate what the port to
+the MIPS processor does.
+
+- Liar assumes that extracting, inserting, and comparing bit-fields is
+relatively cheap. The current object representation for Liar
+(compatible with the interpreter) consists of using a number (6-8) of
+bits in the most significant bit positions of a word as a type tag,
+and the rest as the datum, typically an encoded address. Not only
+must extracting, comparing, and inserting these tags be cheap, but
+decoding the address must be cheap as well. These operations are
+relatively cheap on architectures with bit-field instructions, but
+more expensive if they must be emulated with bitwise boolean
+operations and shifts, as on the MIPS R3000.
+
+C. Emulating an existing port.
+
+The simplest way to port Liar is to find an architecture to which Liar
+has already been ported that is sufficiently similar to the desired
+architecture that a port can be obtained by small modifications. In
+particular, if the architectures are really close, there may be no
+need for architecture-specific additional tuning.
+
+Note that we develop the compiler on Motorola >=68020 processors, so
+this is the best-tuned version, and the other ports are not very well
+tuned or not tuned at all because we don't use the other hardware. If
+you improve an existing port, please give us the improvements.
+
+- If you have a Vax-like CISC machine, you can try starting from the
+Vax or the Motorola 68020 ports. The Vax port was written by starting
+from the 68020 port. This is probably the best solution for some
+architectures like the NS32000, and perhaps even the IBM 370.
+
+- If you have an "enlarged" RISC processor, with more complex
+addressing modes, and bit-field instructions, you may want to start by
+looking at the Spectrum (HP Precision Architecture) port. This is
+probably a good starting point for the Motorola 88000, and the IBM
+RS6000 architectures.
+
+- If you have a bare-bones RISC processor, similar to a MIPS
+R2000/R3000 processor, you may want to start from this port. Since
+the MIPS R2000 is a minimalist architecture, it should almost subsume
+all other RISCs, and may well be a good starting point for all of
+them. This is probably a good starting point for the Sparc
+architecture. Note that the MIPS port was done by starting from the
+Spectrum port.
+
+- If you have a machine significantly different from those listed
+above, you are out of luck and will have to write a port from scratch.
+
+Of course, no architecture is identical to any other, so you may want
+to mix and match ideas from many of the ports already finished, and it
+is probably a good idea for you to compare how the various ports solve
+the various problems.
+
+        3. Compiler operation, RTL rules and LAP rules.
+Mention the early syntaxing, but tell them to ignore it.
+Mention the switches and what they do.
+
+        4. Description of the files in compiler/machines/port.
+Particular emphasis on machin.scm, assmd.scm, the macro files, and the
+assembler.
+
+        5. How to test the compiler once the port files have
+been written.
+?? How to test the assembler by using LAP->CODE .
+Include my upgraded test suite in the compiler directory, and perhaps
+some scripts that do the testing.
+
+        6. How to build a compiler once it has been
+preliminarily tested.
+Cross compiling.
+Spreading the computation over a bunch of machines.
+Testing for convergence by doing stages and comparing binaries.
+Common bugs: interrupts, dlinks, register allocation bugs, and bugs
+in the interface.
+
+        7. How to write RTL rules and use the register
+allocator.
+Get CPH to help with this.
+Closures, multi closures, uuo-link calls, and block-linking.
+
+        8. How to use the RTL rewriter to improve the output
+code.
+Suggest looking at the 68000 and the Spectrum versions.
+
+        9. How to interface to the runtime library. How to
+write special-purpose optimized entries.
+
+        10. How to give us the new port so that we can
+distribute it, and help upgrade/maintain it.