From: Guillermo J. Rozas Date: Wed, 4 Sep 1991 03:58:44 +0000 (+0000) Subject: Add a section on debugging, a section on the package system (by X-Git-Tag: 20090517-FFI~10243 X-Git-Url: https://birchwood-abbey.net/git?a=commitdiff_plain;h=2400a4d701529573618cf15541d4a99e0c238bc5;p=mit-scheme.git Add a section on debugging, a section on the package system (by Arthur), a section with suggestions on the order in which to attack the tasks, and accommodate some changes for version 4.87. --- diff --git a/v7/src/compiler/documentation/porting.guide b/v7/src/compiler/documentation/porting.guide index 6de262677..8f1e3dfe8 100644 --- a/v7/src/compiler/documentation/porting.guide +++ b/v7/src/compiler/documentation/porting.guide @@ -1,6 +1,7 @@ -Emacs: Please use -*- Text -*- mode. Thank you. + Emacs: Please use -*- Text -*- mode. Thank you. + +$Header: /Users/cph/tmp/foo/mit-scheme/mit-scheme/v7/src/compiler/documentation/porting.guide,v 1.22 1991/09/04 03:58:44 jinx Exp $ -$Header: /Users/cph/tmp/foo/mit-scheme/mit-scheme/v7/src/compiler/documentation/porting.guide,v 1.21 1991/09/01 14:51:09 jinx Exp $ Copyright (c) 1991 Massachusetts Institute of Technology @@ -12,21 +13,20 @@ Copyright (c) 1991 Massachusetts Institute of Technology Notes: -This porting guide applies to Liar version 4.78, but most of the +This porting guide applies to Liar version 4.87, but most of the relevant information has not changed for a while, nor is it likely to change in major ways any time soon. This is an early version of this document, and the order of presentation leaves a lot to be desired. In particular, the document does not follow a monotonic progression, but is instead organized in a -dictionary-like or graph-like manner. We recommend that you read -through the whole document twice since some important details, -apparently omitted, may have their explanation later on in the -document. 
When reading the document for the second time, you will -have an idea of where this other information is to be found, if it is -at all present. We have attempted to insert sufficient forward -pointers to make the first reading bearable, but we may have missed -some. +dictionary-like manner. We recommend that you read through the whole +document twice since some important details, apparently omitted, may +have their explanation later in the document. When reading the +document for the second time, you will have an idea of where this +other information is to be found, if it is present at all. We have +attempted to insert sufficient forward pointers to make the first +reading bearable, but we may have missed some. This document implicitly assumes that you are trying to build the compiler under Unix. The only compiler sources that depend on Unix @@ -36,27 +36,23 @@ This document uses Unix pathname syntax and assumes a hierarchical file system, but it should easy to map these directories to a different file system. +This document also assumes that you are familiar with MIT Scheme, C, +and the C preprocessor. + For questions on Liar not covered by this document, or questions about -this document, contact liar-implementors@zurich.ai.mit.edu . +this document, contact ``liar-implementors@zurich.ai.mit.edu''. Text tagged by ==> is intended primarily for the compiler developers. Good luck! - - [*Markf: A section outlining a procedure to use for actually doing -the port (what should be done and when, how to debug ...) would be -useful] - - [*Markf: A discussion (or at least a mention) of the stuff in -base/debug.scm would be useful] - + Acknowledgments Liar is the work of many people. The current version is mostly the effort of Chris Hanson and Bill Rozas, with significant contributions -from Mark Friedman. Arthur Gleckler, Brian LaMacchia, Jim Miller, -and Henry Wu have also contributed to the current version of Liar. 
-Many other people have offered suggestions and criticisms. +from Mark Friedman. Arthur Gleckler, Brian LaMacchia, Jim Miller, and +Henry Wu have also contributed to the current version of Liar. Many +other people have offered suggestions and criticisms. The current Liar may never have existed had it not been for the efforts and help of the now-extinct BBN Butterfly Lisp group. That @@ -64,12 +60,13 @@ group included Don Allen, Seth Steinberg, Larry Stabile, and Anthony Courtemanche. Don Allen, in particular, babysat computers to painstakingly bootstrap the first version of the then new Liar. -Many of the ideas and algorithms used in Liar, and in particular at -the RTL level, are taken from the GNU C compiler, written by Richard -Stallman and many others. +Many of the ideas and algorithms used in Liar, particularly at the RTL +level, are taken from the GNU C compiler, written by Richard Stallman +and many others. This document was written by Bill Rozas, with modifications and hints -from the people listed above. +from the people listed above. The section on the MIT Scheme package +system was written by Arthur Gleckler. 0. Introduction and a brief walk through Liar. @@ -120,10 +117,9 @@ its pass structure. 0.1. Liar's package structure - [*Arthur: What is a package and what are the basic commands for moving -between packages? Give a brief introduction to the structure of .pkg -files (forward pointer). At least tell where to find this -information.] +This section assumes that you are familiar with the MIT Scheme package +system. If you are not, there is a small description in an appendix +to this document. The package structure of the compiler reflects the pass structure and is specified in compiler/machines/port/comp.pkg, where port is the @@ -169,8 +165,8 @@ sub-packages for various major utilities (linearizer, map-merger, etc.). (COMPILER ASSEMBLER): - This package contains most of the machine-independent portion -of the assembler. 
In particular, it contains the bit-assembler, i.e. + This package contains most of the port-independent portion of +the assembler. In particular, it contains the bit-assembler, i.e. the portion of the assembler that accumulates the bit strings produced by ASSEMBLER and performs branch-tensioning on the result. @@ -186,7 +182,7 @@ compiler. compiler/machines/port/comp.pkg declares the packages and the files that constitute them. compiler/back: - This directory contains the machine-independent portion of the + This directory contains the port-independent portion of the back end. It contains bit-string utilities, symbol table utilities, label management procedures, the hardware register allocator, and the top-level assembler calls. @@ -291,32 +287,6 @@ before reading the rest of this document. 2.1. Constraints on architectures to which Liar can be ported: -- Liar assumes that the target machine has an address space that is -flat enough that all Scheme objects can be addressed uniformly. In -other words, segmented address spaces with segments necessarily -smaller than the Scheme runtime heap (i.e. Intel 286) will make Liar -very hard or inefficient to port. - -- Liar assumes that instructions and data can coexist in the same -address space, and that new code objects that contain machine -instructions can be allocated from and written to the heap (memory -pool) used to allocate all other Scheme objects. This assumption in -Liar conflicts with some current hardware that has programmer-visible -separate (split) data and instruction caches -- that is, there are two -different caches, one used by the processor for instruction references -and the other for data references, and storing data into memory only -updates the data cache, but not the instruction cache, and perhaps not -even memory. Most of the problems this causes can be resolved if the -user is given enough control over the hardware caches, i.e. some way to -flush or synchronize them. 
Furthermore, a true Harvard architecture, -with separate code and data memories, would be hard to accommodate -without relatively major changes. At some point in the future we may -write a C back end for Liar that handles this case, since C code space -and data space are typically kept separate by the operating system. -Whatever technique the C back end may use can probably be emulated by -architectures with such a strong division, although it is likely to be -expensive. - - Liar assumes that the target machine is a general-register machine. That is, operations are based on processor registers, and there is a moderately large set of general-purpose registers that can be used @@ -324,8 +294,43 @@ interchangeably. It would be hard to port Liar to a stack machine, a graph-reduction engine, or a 4-counter machine. It is probably also hard to port Liar to an Intel 386/486 because of the small number of registers and the fact that most of them are special to some common -instructions. +instructions. + +- Liar currently assumes that floating-point registers and integer +registers are separate or the same size. In other words, currently +Liar cannot handle quantities that need multiple registers to hold +them. For example, on the DEC VAX, there is a single set of +registers, and double floating point values (the only kind used by +Scheme) take two consecutive integer registers. The register +allocator in Liar does not currently handle this situation, and thus, +floating-point operations are not currently open-coded on the VAX. + +- Liar assumes that the target machine has an address space that is +flat enough that all Scheme objects can be addressed uniformly. In +other words, segmented address spaces with segments necessarily +smaller than the Scheme runtime heap (i.e. Intel 286) will make Liar +hard or inefficient to port. 
+- Liar assumes that instructions and data can coexist in the same +address space, and that new code objects that contain machine +instructions can be dynamically allocated from and written to the heap +(memory pool) used to allocate all other Scheme objects. This +assumption in Liar conflicts with some current hardware that has +programmer-visible separate (split) data and instruction caches -- +that is, there are two different caches, one used by the processor for +instruction references and the other for data references, and storing +data into memory only updates the data cache, but not the instruction +cache, and perhaps not even memory. Most of the problems this causes +can be resolved if the user is given enough control over the hardware +caches, i.e. some way to flush or synchronize them. Furthermore, a +true Harvard architecture, with separate code and data memories, would +be hard to accommodate without relatively major changes. At some +point in the future we may write a C back end for Liar that handles +this case, since C code space and data space are typically kept +separate by the operating system. Whatever technique the C back end +may use can probably be emulated by architectures with such a strong +division, although it is likely to be expensive. + 2.2. Some implementation decisions that may make your job harder or impair the quality of the output code: @@ -342,7 +347,7 @@ per block of operations, but instead bumps it once per item. This is expensive on many modern machines where pre-and-post incrementing are not supported by the hardware. This may also change in the not-too-far future. - [*Jinx: Wasn't this fixed recently for the MIPS?] + [*Jinx: Wasn't this done recently for the MIPS?] - Liar assumes that it is cheap to compute overflow conditions on integer arithmetic operations. Generic arithmetic primitives have the @@ -360,16 +365,16 @@ computes the overflow conditions explicitly. 
- Liar assumes that extracting, inserting, and comparing bit-fields is relatively cheap. The current object representation for Liar (compatible with the interpreter) consists of using a number of bits -(6) in the most significant bit positions of a word as a type tag, and -the rest as the datum, usually an encoded address. Not only must -extracting, comparing, and inserting these tags be cheap, but decoding -the address must be cheap as well. These operations are relatively -cheap on architectures with bit-field instructions, but more expensive -if they must be emulated with bitwise boolean operations and shifts, -as on the R3000. Decoding a datum into an address may involve -inserting segment bits in some of the positions where the tag is -placed, further increasing the dependency on cheap bit-field -manipulation. +(6) in the most significant bit positions of a machine word as a type +tag, and the rest as the datum, usually an encoded address. Not only +must extracting, comparing, and inserting these tags be cheap, but +decoding the address must be cheap as well. These operations are +relatively cheap on architectures with bit-field instructions, but +more expensive if they must be emulated with bitwise boolean +operations and shifts, as on the R3000. Decoding a datum into an +address may involve inserting segment bits in some of the positions +where the tag is placed, further increasing the dependency on cheap +bit-field manipulation. 2.3. Emulating an existing port. @@ -424,9 +429,8 @@ optimization on the RTL, generates assembly language (in LAP format) from the RTL, and assembles the resulting bits. Although RTL is a machine-independent language, the particular RTL -generated for a given program will vary from machine to machine. - -The RTL can vary in the following ways: +generated for a given program will vary from machine to machine. The +RTL can vary in the following ways: - RTL is a language for manipulating the contents of conceptual registers. 
RTL registers are divided into ``pseudo-registers'' and @@ -437,7 +441,7 @@ pseudo-registers represent conceptual locations that contain quantities that will need physical registers or memory locations to hold them in the final translation. An RTL pseudo register can be mapped to any number of physical registers in the final translation, -and may "move" between physical registers. In order to make the RTL +and may ``move'' between physical registers. In order to make the RTL more homogeneous, the RTL registers are not distinguished syntactically in the RTL, but are instead distinguished by their value range. Machine registers are represented as the N lowest numbered RTL @@ -450,7 +454,7 @@ pseudo-registers are equivalent, and all can hold arbitrary Scheme objects, while machine registers can be further divided into separate classes (e.g. address, data, and floating-point registers). -- RTL assumes only a load-store architecture, but can accommodate +- RTL assumes a load-store architecture, but can accommodate architectures that allow memory operands and rich addressing modes. RTL is constructed by generating statements that include relatively complex expressions. These expressions may represent multiple memory @@ -478,15 +482,15 @@ integers, and it may not be advantageous to open code the multiplication primitive. The RTL for a particular program may reflect the set of primitive operations that the back end for the port can open code. - + The resulting RTL program is represented as a control flow-graph where each of the nodes has an associated list of RTL statements. The edges in the graph correspond to conditional and unconditional branches in the code, and include a low-level predicate used to choose between the -alternatives. Linearization of the graph does not occur at the RTL -level, but at the LAP level. There is a debugging RTL linearizer used -by the RTL output routine. - +alternatives. 
The graph is linearized after the instructions have +been translated to LAP. There is a debugging RTL linearizer used by +the RTL output routine. + Besides assignments and tests, the RTL has some higher level statements that correspond to procedure headers, continuation (return address) headers, etc. Thus an RTL program is made mostly of register @@ -536,23 +540,23 @@ pattern matcher to translate the RTL into LAP. The linear LAP is then translated into binary form by using the same pattern matcher with a different set of rules. These rules define the translation between assembly language and machine language for the -architecture. Most of these rules output bits to be collected +architecture. Most of these rules output bit strings to be collected together, but some output a set of directives to the bit-level assembler to define labels, or choose between alternative encoding of the fields depending on the final value of a displacement. These alternative encodings are typically used for PC-relative quantities. - + The machine-independent bit-assembler collects all the bits together and keeps track of a virtual program counter used to determine the distance between instruction fields. A relaxation process is used to reduce the size of the resulting encoding (to tension branches, i.e. to choose the smallest encoding that will do the job when there are alternatives). - + Since most of the LAPGEN rules generate almost fixed assembly language, where the only difference is the register numbers, most of the LAP to bits translation can be done when the compiler is compiled. A -compiler switch, ``compiler:enable-expansion-declarations?'' allows this +compiler switch, ``COMPILER:ENABLE-EXPANSION-DECLARATIONS?'' allows this process to take place. This mechanism has not been used for a while, however, because the resulting compiler was, although somewhat faster, considerably bigger, so this switch may not currently work. @@ -568,7 +572,7 @@ manipulation. 
The following switches are of special importance to the back end writer: -* compiler:compile-by-procedures? This switch controls whether the +* COMPILER:COMPILE-BY-PROCEDURES? This switch controls whether the compiler should compile each top-level lambda expression independently or compile the whole input program (or file) as a block. It is usually set to true, but must be set to false for cross-compilation. @@ -582,7 +586,7 @@ as a vector object (instead of as an entry point). The final entry points are generated by cross-compile-bin-file-end running interpreted on the target machine. -* compiler:open-code-primitives? This switch controls whether Liar +* COMPILER:OPEN-CODE-PRIMITIVES? This switch controls whether Liar will open code (inline) MIT Scheme primitives. It is usually set to true and should probably be left that way. On the other hand, it is possible to do a lot less work in porting the compiler by not @@ -595,29 +599,29 @@ open-coded in a port-independent way since their open codings are performed directly in the RTL. Turning this switch to false would prevent the compiler from open coding these primitives as well. -* compiler:generate-rtl-files? and compiler:generate-lap-files? These +* COMPILER:GENERATE-RTL-FILES? and COMPILER:GENERATE-LAP-FILES? These are mostly compiler debugging switches. They control whether the compiler will issue .rtl and .lap files for every file compiled. The .rtl file will contain the RTL for the program, and the .lap file will contain the input to the assembler. Their usual value is false. -* compiler:intersperse-rtl-in-lap? This is another debugging switch. -If turned on, and compiler:generate-lap-files? is also on, the lap +* COMPILER:INTERSPERSE-RTL-IN-LAP? This is another debugging switch. +If turned on, and COMPILER:GENERATE-LAP-FILES? is also on, the lap output file includes the RTL statements as comments preceding their LAP translations. - -* compiler:open-code-floating-point-arithmetic? 
This switch is + +* COMPILER:OPEN-CODE-FLOATING-POINT-ARITHMETIC? This switch is defined in compiler/machines/port/machin.scm and determines whether floating point primitives can and should be open coded by the compiler or not. If the port provides open codings for them, it should be set to true, otherwise to false. -* compiler:primitives-with-no-open-coding This parameter is defined in +* COMPILER:PRIMITIVES-WITH-NO-OPEN-CODING This parameter is defined in compiler/machines/port/machin.scm. It contains a list of primitive names that the port cannot open code. ==> These last two parameters should probably be combined and -inverted, i.e. compiler:primitives-with-open-codings should replace +inverted, i.e. COMPILER:PRIMITIVES-WITH-OPEN-CODINGS should replace both of the above. This has the advantage that if the RTL level is taught how to deal with additional primitives, but not all ports have open codings for them, there is no need to change all the machin.scm @@ -648,8 +652,8 @@ be good if you transformed this header into The new $ Header $ line would be used by RCS to keep track of the versions of your port and the others could be used to find updates to the originals that would make updating your port easier. - - 4.1 Compiler building files: + + 4.1. Compiler building files: * comp.pkg: This file describes the Scheme package structure of the @@ -677,14 +681,14 @@ section is no longer used by default. The previous three files should be copied or linked to the top-level compiler directory. I.E., compiler/comp.pkg should be a link (symbolic preferably) or copy of compiler/machines/port/comp.pkg . - + * comp.con, comp.ldr, comp.bcon, and comp.bldr: These files are generated by the CREF subsystem from the information in the cref.pkg file. The .bcon and .bldr files are binary versions of the others, which are scheme sources. 
The .con -file contains the "connectivity code", that is, the code to create and +file contains the ``connectivity code'', that is, the code to create and link the package objects specified in the .pkg file. The .ldr file -contains the "loading code", that is, the code to load the source +contains the ``loading code'', that is, the code to load the source files into the appropriate packages and, in theory, to initialize the packages. The CREF subsystem also generates a comp.cref file that includes cross-reference information. It is useful to examine this @@ -728,7 +732,7 @@ machine-independent dependency management code, and the actual declaration of the dependencies for each port. This would allow us to share more of the code, and make the task of rewriting it less daunting. - 4.2 Miscellaneous files: + 4.2. Miscellaneous files: * rgspcm.scm: This file declares a set of primitives that can be coded by @@ -752,9 +756,9 @@ compiler/rtlbase/rtlty1.scm and compiler/rtlbase/rtlty2.scm. * lapopt.scm: This file defines a LAP-level peephole optimizer. Currently -only used in the MIPS port to reduce the number of NOPs in the "delay -slots" of load instructions. The instructions in each LAP-level basic -block are passed to optimize-linear-lap, which outputs the new +only used in the MIPS port to reduce the number of NOPs in the ``delay +slots'' of load instructions. The instructions in each LAP-level +basic block are passed to optimize-linear-lap, which outputs the new sequence of instructions corresponding to the basic block. Currently all ports (except the MIPS port) implement this procedure as the identity procedure. @@ -767,57 +771,57 @@ mentioned here should be copied verbatim from existing ports. Some of these parameters are not currently in use, but should all be provided for completeness. 
-- endianness: Should be the symbol LITTLE if an address, when used as +- ENDIANNESS: Should be the symbol LITTLE if an address, when used as a byte address, refers to the least significant byte of the long-word addressed by it. It should be BIG if it refers to the most significant byte of the long-word. Note that the compiler has not been ported to any machines where the quantum of addressability is not an 8-bit byte, so the notion may not apply to those. -- addressing-granularity: How many bits are addressed by the +- ADDRESSING-GRANULARITY: How many bits are addressed by the addressing quantum. I.e., increasing an address by 1 will bump the address to point past this number of bits. Again, the compiler has not been ported to any machine where this value is not 8. -- scheme-object-width: How many bits are taken up by a Scheme object. +- SCHEME-OBJECT-WIDTH: How many bits are taken up by a Scheme object. This should be the number of bits in a C ``unsigned long'', since Scheme objects are declared as such by the portable runtime library. -- scheme-type-width: How many bits at the most-significant end of a +- SCHEME-TYPE-WIDTH: How many bits at the most-significant end of a Scheme object are taken up by the type tag. The value of TYPE_CODE_LENGTH in the microcode must match this value. The value is currently 6 for systems with a compiler and 8 for systems without one. + +- ADDRESS-UNITS-PER-PACKED-CHAR: This parameter defines how much to +increment an address by in order to make it point to the next +character in a string. The compiler has not been ported to any +configuration where this is not 1, but may be if 16-bit characters are +used in the future. -- flonum-size: This is the ceiling of the ratio of the size of a C +- FLONUM-SIZE: This is the ceiling of the ratio of the size of a C ``double'' to the size of a C ``unsigned long''. It reflects how many Scheme units of memory (measured in Scheme objects) the data in a Scheme floating point object will take. 
-- float-alignment: This value defines the bit-alignment constraints +- FLOAT-ALIGNMENT: This value defines the bit-alignment constraints for a C ``double''. It must be a multiple of scheme-object-width. If floating point values can only be stored at even long-word addresses, for example, this value should be twice scheme-object-width. -- address-units-per-packed-char: This parameter defines how much to -increment an address by in order to make it point to the next -character in a string. The compiler has not been ported to any -configuration where this is not 1, but may be if 16-bit characters are -used in the future. - -- signed-fixnum/upper-limit: This parameter should be derived from +- SIGNED-FIXNUM/UPPER-LIMIT: This parameter should be derived from others, but is specified as a constant due to a shortcoming of the compiler pre-processing system (EXPT is not constant-folded). Use the commented-out expression to derive the value for your port. Note that all values that should be derived but are instead specified as constants are tagged by a comment containing ``***''. -- stack->memory-offset: This procedure is provided to accommodate +- STACK->MEMORY-OFFSET: This procedure is provided to accommodate stacks that grow in either direction, but we have not tested any port in which the stack grows towards larger addresses, because the CScheme interpreter imposes its own direction of growth. It should probably be copied verbatim. -- execute-cache-size: This should match EXECUTE_CACHE_ENTRY_SIZE in +- EXECUTE-CACHE-SIZE: This should match EXECUTE_CACHE_ENTRY_SIZE in microcode/cmpint-port.h, and is explained in cmpint.txt. ==> We should probably rename one or the other to be alike. @@ -827,14 +831,14 @@ some detail in cmpint.txt and in section 5.3.3. Very briefly, a closure is a procedure object that contains a code pointer and a set of free variable locations or values. 
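The word-layout parameters listed above for machin.scm can be cross-checked with a few lines of arithmetic. The following is only a sketch, written in Python rather than Scheme, for a hypothetical port with 32-bit Scheme objects, 6-bit type tags, and 64-bit C doubles. The names DATUM_MASK, make_object, object_type, and object_datum are illustrative and do not come from the compiler sources, and the formula used for the fixnum limit is an assumption about the commented-out expression in machin.scm, not something this guide states.

```python
# All concrete numbers are assumptions for a hypothetical 32-bit port.
SCHEME_OBJECT_WIDTH = 32   # bits in a C ``unsigned long'' on the assumed target
SCHEME_TYPE_WIDTH = 6      # tag bits at the most-significant end of an object
C_DOUBLE_WIDTH = 64        # bits in a C ``double'' on the assumed target

# The datum is whatever is left of the word once the tag is removed.
SCHEME_DATUM_WIDTH = SCHEME_OBJECT_WIDTH - SCHEME_TYPE_WIDTH
DATUM_MASK = (1 << SCHEME_DATUM_WIDTH) - 1

# FLONUM-SIZE: ceiling of sizeof(double) / sizeof(unsigned long),
# computed with integer arithmetic only.
FLONUM_SIZE = -(-C_DOUBLE_WIDTH // SCHEME_OBJECT_WIDTH)

# FLOAT-ALIGNMENT: assumed here that doubles must sit on even long-word
# addresses, so the alignment is twice the object width.
FLOAT_ALIGNMENT = 2 * SCHEME_OBJECT_WIDTH

# SIGNED-FIXNUM/UPPER-LIMIT: assumed to be 2^(datum width - 1).
SIGNED_FIXNUM_UPPER_LIMIT = 1 << (SCHEME_DATUM_WIDTH - 1)


def make_object(type_code, datum):
    """Pack a type tag and a datum into one word (illustrative helper)."""
    if not 0 <= type_code < (1 << SCHEME_TYPE_WIDTH):
        raise ValueError("type code does not fit in the tag field")
    if not 0 <= datum <= DATUM_MASK:
        raise ValueError("datum does not fit in the datum field")
    return (type_code << SCHEME_DATUM_WIDTH) | datum


def object_type(obj):
    """Extract the tag from the most-significant bits of the word."""
    return obj >> SCHEME_DATUM_WIDTH


def object_datum(obj):
    """Clear the tag bits, leaving only the datum."""
    return obj & DATUM_MASK
```

With these assumed widths the datum is 26 bits wide and a flonum occupies two Scheme objects. A port with different widths would change only the three constants at the top.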
-- closure-object-first-offset: This procedure takes a single argument, +- CLOSURE-OBJECT-FIRST-OFFSET: This procedure takes a single argument, the number of entry points in a closure object, and computes the distance in long-words between the first long-word in the closure object, and the first long-word containing a free variable. This is the number of long-words taken up by the closure object's header, and the code to represent N closure entry points. -- closure-first-offset: This procedure takes two arguments, the number +- CLOSURE-FIRST-OFFSET: This procedure takes two arguments, the number of entry points in a closure object, and the index of one of them, the first being zero. It computes the distance between that entry's environment pointer and the first free variable in the closure object. @@ -842,14 +846,14 @@ The entry's environment pointer will be the address of the entry point itself if closure entry points are always aligned on long-word boundaries, or the address of the first entry point if they are not. -- closure-entry-distance: This procedure is given the number of entry +- CLOSURE-ENTRY-DISTANCE: This procedure is given the number of entry points in a closure object, and the indices for two of its entry points, and computes the number of bytes that separate the two entry points in the closure object. This distance should be a multiple of the parameter COMPILED_CLOSURE_ENTRY_SIZE described in cmpint.txt and defined in microcode/cmpint-port.h. - -- closure-environment-adjustment: This procedure takes two parameters, + +- CLOSURE-ENVIRONMENT-ADJUSTMENT: This procedure takes two parameters, the number of entry points in a closure object, and the index of one of them. It computes the number of bytes that must be added to the entry point's address to result in the entry point's environment @@ -867,12 +871,12 @@ integers starting from zero. 
Typically symbolic names are given to each of these integers for use in some of the rules, especially those dealing with the assembly language interface. -- number-of-machine-registers should be the number of machine registers, +- NUMBER-OF-MACHINE-REGISTERS should be the number of machine registers, i.e. one greater than the number assigned to the last machine register. -- number-of-temporary-registers is the number of reserved memory +- NUMBER-OF-TEMPORARY-REGISTERS is the number of reserved memory locations used for storing the contents of spilled pseudo-registers. - + Liar requires certain fixed locations to hold various implementation quantities such as the stack pointer, the heap (free memory) pointer, the pointer to the runtime library and interpreter's ``register'' @@ -883,7 +887,7 @@ holding a bit-mask used to clear type tags from objects (the pointer or datum mask). All of these registers should be given additional symbolic names. -==> What is machine-register-known-value used for? It would seem that +==> What is MACHINE-REGISTER-KNOWN-VALUE used for? It would seem that the datum mask is a known value, but... Currently all the ports seem to have the same definition. @@ -892,7 +896,7 @@ allow some consistency checking. Some machine registers always contain values in a fixed class (e.g. floating point registers and the register holding the datum mask). -- machine-register-value-class is a procedure that maps a register to +- MACHINE-REGISTER-VALUE-CLASS is a procedure that maps a register to its inherent value class. The main value classes are value-class=object, value-class=address, and value-class=float. The registers allocated for the special implementation quantities have @@ -900,7 +904,7 @@ fixed value classes. The remaining registers, managed by the compiler's register allocator, may be generic (value-class=word) or allow only certain values to be stored in them (value-class=float, value-class=address, etc.). 
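The register numbering and value-class conventions just described can be illustrated with a small model. This is a sketch in Python, not the Scheme code of any existing port. The register counts, the three fixed register assignments, and the helper names are all hypothetical. Only the conventions themselves (machine registers are the lowest-numbered RTL registers, and the register count is one greater than the last machine register number) are taken from this guide.

```python
# Hypothetical layout: registers 0-15 are integer registers and
# registers 16-23 are floating-point registers.
NUMBER_OF_INTEGER_REGISTERS = 16
NUMBER_OF_FLOAT_REGISTERS = 8

# One greater than the number assigned to the last machine register.
NUMBER_OF_MACHINE_REGISTERS = (
    NUMBER_OF_INTEGER_REGISTERS + NUMBER_OF_FLOAT_REGISTERS
)

# Symbolic names for registers holding fixed implementation quantities.
# These particular assignments are made up for the example.
REGS_POINTER = 13    # pointer to the interpreter's ``register'' array
FREE_POINTER = 14    # heap (free memory) pointer
STACK_POINTER = 15   # stack pointer
FIXED_ADDRESS_REGISTERS = frozenset(
    [REGS_POINTER, FREE_POINTER, STACK_POINTER]
)


def is_machine_register(register):
    """Machine registers are the N lowest-numbered RTL registers."""
    return 0 <= register < NUMBER_OF_MACHINE_REGISTERS


def is_pseudo_register(register):
    """Every RTL register numbered N or above is a pseudo-register."""
    return register >= NUMBER_OF_MACHINE_REGISTERS


def machine_register_value_class(register):
    """Map a machine register to its inherent value class."""
    if not is_machine_register(register):
        raise ValueError("not a machine register: %r" % (register,))
    if register >= NUMBER_OF_INTEGER_REGISTERS:
        return "value-class=float"
    if register in FIXED_ADDRESS_REGISTERS:
        return "value-class=address"
    # Remaining registers are generic and left to the register allocator.
    return "value-class=word"
```

In this model, the first pseudo-register is number 24, and asking for the value class of a pseudo-register is an error, since pseudo-registers have no inherent class.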
- + Most of the remainder of compiler/machines/port/machin.scm is a set of procedures that return and compare the port's chosen locations for various operations. Some of these operations are no longer used by @@ -912,19 +916,19 @@ compiler switch settings the older methods for handling these operations can be re-activated, but this never worked completely, and may no longer work at all. -- rtl:machine-register? should return a machine register for those +- RTL:MACHINE-REGISTER? should return a machine register for those special RTL registers that have been allocated to fixed registers, and false otherwise. -- rtl:interpreter-register? should return the long-word offset in the +- RTL:INTERPRETER-REGISTER? should return the long-word offset in the runtime library's memory ``register'' array for those special RTL registers not allocated to fixed registers, and false otherwise. -- rtl:interpreter-register->offset errors when the special RTL +- RTL:INTERPRETER-REGISTER->OFFSET errors when the special RTL register has not been allocated to a fixed register, and otherwise returns the long-word offset into the register array. -- rtl:constant-cost is a procedure that computes some metric of how +- RTL:CONSTANT-COST is a procedure that computes some metric of how expensive is to generate a particular constant. If the constant is cheaply reconstructed, the register allocator may decide to flush it (rather than spill it to memory) and re-generate it the next time it @@ -932,11 +936,11 @@ is needed. The best estimate is the number of cycles that constructing the constant would take, but the number of bytes of instructions can be used instead. -- compiler:open-code-floating-point-arithmetic? and -compiler:primitives-with-no-open-coding have been described in the +- COMPILER:OPEN-CODE-FLOATING-POINT-ARITHMETIC? and +COMPILER:PRIMITIVES-WITH-NO-OPEN-CODING have been described in the section on compiler switches and parameters. - 4.3 LAPGEN files: + 4.3. 
LAPGEN files: The following files control the RTL -> LAP translation. They define the rules used by the pattern matcher to perform the translation, and @@ -998,11 +1002,11 @@ locations) in the Scheme ``register'' array. - HOME->REGISTER-TRANSFER generates code that copies the contents of an RTL register's home (its spill location) into a machine register. - + - REGISTER->HOME-TRANSFER generates code that copies the contents of an RTL register, currently held in a machine register, into its memory home. - + The following definitions constitute the linearizer interface, and must be provided by lapgen.scm: @@ -1031,7 +1035,7 @@ that performs a pc-relative branch and stores the return address in a processor register. This instruction can be used (by branching to the next instruction) to obtain its own address, and pc-relative addresses and loads can use them. The MIPS back end currently -implements a simple pc-relative address cacheing scheme that attempts +implements a simple pc-relative address caching scheme that attempts to reduce the number of such branches by re-using the values produced by previous branches if they are still available. This code can be suitably modified to work on most RISC architectures. @@ -1059,7 +1063,7 @@ assembly. routines that handle manipulation of variables in first class environments. Most of these rules are no longer used by the compiler unless some switch settings vary. - + * rulfix.scm: This file contains statement and predicate rules for manipulating fixnums (small integers represented in immediate @@ -1071,27 +1075,27 @@ arithmetic on them, comparison predicates, and overflow tests. manipulating flonums (floating point data in boxed form). The rules handle boxing and un-boxing of flonums, arithmetic on them, and comparison predicates. - - 4.4 Assembler files: + + 4.4. 
Assembler files: * assmd.scm: This file defines the following machine-dependent parameters and utilities for the bit-level assembler: -- maximum-padding-length: If instructions are not always long-word +- MAXIMUM-PADDING-LENGTH: If instructions are not always long-word aligned, the maximum distance in bits between the end of an instruction and the next (higher) long-word boundary. -- padding-string: A bit-string used for padding the instruction block +- PADDING-STRING: A bit-string used for padding the instruction block to a long-word boundary. If possible, it should encode a HALT or ILLEGAL instruction. The length of this bit-string should evenly divide maximum-padding-length. -- block-offset-width: This should be the size in bits of format_word +- BLOCK-OFFSET-WIDTH: This should be the size in bits of format_word described in cmpint.txt. It should be 16 for all byte-addressed machines where registers hold 32 bits. -- maximum-block-offset: The maximum byte offset that can be encoded in +- MAXIMUM-BLOCK-OFFSET: The maximum byte offset that can be encoded in block-offset-width bits. This depends on the encoding described in cmpint.txt. The least significant bit is always used to indicate whether this block offset points to the start of the object or to @@ -1102,12 +1106,12 @@ instructions always start on even long-word boundaries, the bottom two bits (always zero) are encoded implicitly, and the range is accordingly larger. -- block-offset->bit-string: This procedure is given a byte offset and +- BLOCK-OFFSET->BIT-STRING: This procedure is given a byte offset and a boolean flag indicating whether this is the offset to the start of a compiled code block or to another block-offset, and returns the encoded value of this offset. 
-- make-nmv-header: This procedure is given the size in long-words of a +- MAKE-NMV-HEADER: This procedure is given the size in long-words of a block of instructions, and constructs the non-marked-vector header that must precede the instructions in memory in order to prevent the garbage collector from examining the data as Scheme objects. This @@ -1118,14 +1122,25 @@ in long-words (excluding the header itself). The following three parameters define how instruction fields are to be assembled in memory depending on the ``endianness'' (byte ordering) of the architecture. You should be able to use the MC68020 (big endian) -or the Vax (little endian) version. - -- instruction-insert! is a procedure, that given a bit-string -encoding instruction fields, a larger bit-string into which the -smaller should be inserted, a position within the larger one, and a -continuation, inserts the smaller bit-string into the larger at the -specified position, and returns the new bit position at which the -immediately following instruction field should be inserted. +or the Vax (little endian) version, or the MIPS version which is +conditionalized for both possibilities since MIPS processors can be +configured either way. + +- INSTRUCTION-INSERT! is a procedure, that given a bit-string encoding +instruction fields, a larger bit-string into which the smaller should +be inserted, a position within the larger one, and a continuation, +inserts the smaller bit-string into the larger at the specified +position, and returns the new bit position at which the immediately +following instruction field should be inserted. + +- INSTRUCTION-INITIAL-POSITION is a procedure, that given a bit-string +representing a segment of compiled code, returns the bit-string +position at which instruction-insert! should insert the first +instruction. 
+ +- INSTRUCTION-APPEND is a procedure, that given the bit-string +encoding successive (fields of) instructions, produces the bit-string +that corresponds to their concatenation in the correct order. * coerce.scm: This file defines a set of coercion procedures. These @@ -1133,8 +1148,9 @@ procedures are used to fill fields in instructions. Each coercion procedure checks the range of its argument and produces a bit string of the appropriate length encoding the argument. Most coercions will coerce their signed or unsigned argument into a bit string of the -required fixed length. - +required fixed length. On some machines (e.g. HP PA), some coercions +may permute the bits appropriately. + * insmac.scm: This file defines port-specific syntax used in the assembler, and the procedure PARSE-INSTRUCTION, invoked by the syntax expander @@ -1166,7 +1182,7 @@ multiple of 8. BYTE is used primarily for instruction opcodes. OPERAND is used for general addressing modes. DISPLACEMENT is used for PC-relative branch displacements. - + - MC68020: (WORD ( ) ( ) @@ -1201,9 +1217,8 @@ corresponding to the lowest numbered range containing the value of instructions of non-decreasing lengths for the branch tensioner to work correctly. Note that the MC68020 port uses GROWING-WORD instead of VARIABLE-WIDTH as the keyword for this syntax. - ==> This should probably be changed. - + * inerly.scm: This file provides alternative expanders for the port-specific syntax. These alternative expanders are used when the assembly @@ -1226,7 +1241,7 @@ code has been placed in instr1.scm, and the MIPS port has no port-specific qualifiers and transformers. Qualifiers and transformers are described further in the chapter on the syntax of translation rules. - + * instr.scm: These files define the instruction set of the architecture by using the syntax defined in insmac.scm and inerly.scm. There can be @@ -1238,8 +1253,8 @@ actually used by the back end in the LAPGEN rules and utility procedures. 
Privileged/supervisory instructions, BCD (binary coded decimal) instructions, COBOL-style EDIT instructions, etc., can probably be safely ignored. - - 4.5 Disassembler files: + + 4.5. Disassembler files: The disassembler is almost completely machine dependent. For many machines, a reasonable disassembler could be derived from the @@ -1283,7 +1298,7 @@ This file also contains a state machine that allows the disassembler to display data appearing in the instruction stream in an appropriate format (gc and format words, mainly), and heuristics for displaying addressing modes and PC-relative offsets in a more legible form. - + Note that the output of the disassembler need not be identical to the input of the assembler. The disassembler is used almost exclusively for debugging, and additional syntactic hints make it easier to read. @@ -1311,7 +1326,7 @@ assembler. The assembler need not be rule-based, since it is machine dependent, but given the availability of the rule language, using it may be the easiest way to write it. - 5.1 Rule syntax + 5.1. Rule syntax The assembler rules use a somewhat different syntax from the rest and will be described later. @@ -1366,13 +1381,13 @@ will match (MULTIPLE 14 7) and (MULTIPLE 36 4), but will not match (MULTIPLE FOO 3), (MULTIPLE 37 4), (MULTIPLE 2), (MULTIPLE 14 2 3), nor (HELLO 14 7). Note that rules need not have qualifiers. - + * is an arbitrary Lisp expression whose value is the translation determined by the rule. It will typically use the variables bound by ``?'' to perform the translation. The statement and predicate rules use the LAP macro to generate sequences of assembly language instructions. - + The assembler rules use the following syntax: (DEFINE-INSTRUCTION @@ -1410,7 +1425,7 @@ rule bodies are a consequence of the WORD syntax. The meaning of the commas is identical to the meaning of the commas in a ``backquote'' Scheme expression, and is briefly described in section 5.3.1. - 5.2 Rule variable syntax. + 5.2. 
Rule variable syntax. Although qualifiers and the simple variable syntax shown are sufficient, some additional variable syntax is available for common @@ -1450,14 +1465,14 @@ will match (2 . HELLO), Q will be bound to -21, and Z will be bound to this syntax is used nowhere. The early parser does not understand it. Should it be flushed? - 5.3 Writing statement rules. + 5.3. Writing statement rules. Statement rules provide the translation between RTL instructions and fragments of assembly language. Most RTL instructions are assignments, where an RTL register is written with the contents of a virtual location or the result of some operation. - 5.3.1 Output of the statement rules + 5.3.1. Output of the statement rules The output of the statement rules is a fragment of assembly language written in the syntax expected by the LAP assembler. The fragments, @@ -1488,14 +1503,18 @@ to return a fragment, and do away with INST. An additional macro, INST-EA, is provided to construct a piece of assembly language representing an addressing mode. For example, INST-EA is used by the following procedure in the Vax back-end: + (define (non-pointer->ea type datum) (if (and (zero? type) (<= 0 datum 63)) (INST-EA (S ,datum)) (INST-EA (&U ,(make-non-pointer-literal type datum))))) + where non-pointer->ea may be used in + (LAP (MOV L ,(non-pointer->ea ) ,(any-register-reference target))) + INST-EA is superfluous on machines without general addressing modes (i.e. load-store architectures). @@ -1507,44 +1526,79 @@ The macros LAP, INST, and INST-EA, besides providing the functionality of QUASIQUOTE, also provide a hook for the compiler pre-processing time assembly of the code generated by the rules. - 5.3.2 Hardware register allocation + 5.3.2. Hardware register allocation Hardware register allocation occurs during the RTL->LAP translation. 
The rules, besides generating assembly language, invoke utilities -provided by the register allocator to manipulate machine registers and -aliases for pseudo-registers and temporaries. - -The register allocator maintains the mapping between pseudo-registers -and hardware registers, and inserts additional assembly language -instructions between the fragments generated by the rules to shuffle -data. - -The register allocator manipulates RTL registers, but there are also -wrappers for the common cases that return and manipulate register -references. Register references are fragments of assembly lanuage -that refer to the registers. For example, on the MC68k, register d3 -is represented as RTL register number 3, and a register reference for -it would be (d 3). - -If you have carefully chosen your RTL register numbers for machine -registers to match the hardware numbering, and your assembly language -does not distinguish between references to a register and other -fields, you can ignore register references and use the RTL register -numbers directly. This is a common situation with load-store -architectures. - +provided by the register allocator to reserve and free hardware +registers on which the operations can be performed. + +Hardware registers are often divided into different non-overlapping +types that are used in different operations. For example, modern +hardware typically has a set of integer registers and a set of +floating point registers. Address operations typically require +operands in integer registers, while floating point operations +typically require floating point registers. On some machines, notably +the Motorola 68K family, the integer register set is further +subdivided into types with specific operations (address and data). + +The register allocator manipulates RTL registers. RTL registers are +just small integers. 
The low end of the valid range of RTL registers +is used to represent the physical registers of the processor (called +machine registers), and the rest of the numbers represent virtual +(pseudo) registers. The core allocator operations are given an RTL +register number and a register type, and return a suitable machine +register to be used for the operation. + +A machine register that temporarily holds the value of a pseudo +register is called an ``alias'' for the pseudo register. A pseudo +register may have many valid aliases simultaneously (usually of +different types), but any assignment to the pseudo register will +invalidate all aliases but one, namely the machine register actually +written. + +The register allocator maintains a table of associations, called the +register map, that associates each pseudo register with its valid +aliases, and each machine register with the pseudo register whose +value it holds (if any). The register allocator routines modify the +register map after aliases are requested and invalidated, and they +generate assembly language instructions to perform the necessary data +motion at run time. These instructions are usually inserted before +the code output of the RTL rule in execution. + +As a convenience, the register allocator also provides operations that +manipulate register references. A register reference is a fragment of +assembly language, typically a register addressing mode for general +register machines, that when inserted into a LAP instruction, denotes +the appropriate register. For example, on the MC68k, physical +register D3 is represented as RTL register number 3, and a register +reference for it would be ``(D 3)''. RTL pseudo register 44 may at +some point have RTL hardware register 3 as its only data-register +alias. At that time, (REGISTER-ALIAS 44 'DATA) would return 3. 
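To make the terminology concrete, here is a toy model of a register map written with ordinary association lists. It is not the data structure used in compiler/back; the names TOY-MAP, TOY-REGISTER-TYPE, TOY-REGISTER-ALIAS and TOY-ASSIGN are hypothetical, and the numbering of address registers as 8-15 is an assumption made for the example. The model mimics only the lookup and invalidation behavior described in the text.

```scheme
;; Toy model only: the real register map is not an association list.
;; All names below are hypothetical.

;; Assume machine registers 0-7 are data registers (D0-D7) and
;; 8-15 are address registers (A0-A7).
(define (toy-register-type machine-register)
  (if (< machine-register 8) 'DATA 'ADDRESS))

;; A map is a list of (pseudo-register alias ...) entries.
(define toy-map
  '((44 3)        ; pseudo register 44 is held only in D3
    (45 2 10)))   ; pseudo register 45 is held in D2 and in A2

;; Return a valid alias of TYPE for PSEUDO, or #f if there is none.
;; A TYPE of #f accepts any alias.
(define (toy-register-alias rmap pseudo type)
  (let ((entry (assv pseudo rmap)))
    (let loop ((aliases (if entry (cdr entry) '())))
      (cond ((null? aliases) #f)
            ((or (not type)
                 (eq? type (toy-register-type (car aliases))))
             (car aliases))
            (else (loop (cdr aliases)))))))

;; An assignment to PSEUDO through machine register WRITTEN invalidates
;; every other alias, leaving WRITTEN as the only valid one.
(define (toy-assign rmap pseudo written)
  (cons (list pseudo written)
        (let loop ((entries rmap))
          (cond ((null? entries) '())
                ((eqv? (car (car entries)) pseudo) (loop (cdr entries)))
                (else (cons (car entries) (loop (cdr entries))))))))
```

With this model, (toy-register-alias toy-map 44 'DATA) returns 3, matching the example in the text, and after (toy-assign toy-map 45 10) pseudo register 45 no longer has a data-register alias.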
+ +If you have chosen your RTL register numbers for machine registers so +that they match the hardware numbers, and your assembly language does +not distinguish between references to a register and other fields, you +can ignore register references and use the RTL register numbers +directly. This is commonly the case when using integer registers in +load-store architectures. + The interface to the register allocator is defined in compiler/back/lapgn2.scm. Not all ports use all of the procedures defined there. Often a smaller subset is sufficient depending on -whether there are general addressing modes, etc. A list of -the most frequently used follows: - [*JMiller: I'd like a picture showing RTL reg., hardware reg., map, -etc.] - -* LOAD-ALIAS-REGISTER! expects an RTL register and a register type and -returns a machine register of the specified type that is an alias for -the RTL register and contains the current value of the RTL register. -This procedure should only be used for source RTL registers. +whether there are general addressing modes, etc. A list of the most +frequently used follows: + +* REGISTER-ALIAS expects an RTL register and a register type, and +returns a machine register of the specified type that is a valid alias +for that RTL register if there is one, or false if there is none. +This procedure should only be used for source operand RTL registers. +If the register type is false, then REGISTER-ALIAS will return any +valid alias. + +* LOAD-ALIAS-REGISTER! is like REGISTER-ALIAS but always returns a +machine register, allocating one of the specified type if necessary. +This procedure should only be used for source operand RTL registers. REFERENCE-ALIAS-REGISTER! performs the same action but returns a register reference instead of an RTL register number. @@ -1568,7 +1622,7 @@ valid. Note that STANDARD-REGISTER-REFERENCE should be used only for source pseudo-registers (i.e. 
those that already contain data), and may return a memory reference for those machines with general addressing modes if there is no preferred type or alternates are acceptable. - + * MOVE-TO-ALIAS-REGISTER! expects a source RTL register, a register type, and a target RTL register. It returns a new alias for the target of the specified type containing a copy of the current contents @@ -1580,93 +1634,113 @@ alias for target. register type and returns an appropriate register containing a copy of the source. The register is intended for temporary use, that is, use only within the code generated by the expansion of the current RTL -instruction. The register becomes automatically available for +instruction, and as such it should not be permanently recorded in the +register map. The register becomes automatically available for subsequent RTL instructions. MOVE-TO-TEMPORARY-REGISTER! attempts to use an existing alias for the source RTL register if it is not the -last remaining alias or the value of the source is not needed -later. +last remaining alias or the value of the source is not needed later. * REUSE-PSEUDO-REGISTER-ALIAS! expects an RTL register, a register type, and two procedures. It attempts to find a reusable alias for the RTL register of the specified type, and invokes the first -procedure passing it the alias if it succeeds, or the second -procedure with no arguments if it fails. MOVE-TO-ALIAS-REGISTER! -and MOVE-TO-TEMPORARY-REGISTER! are written in terms of -REUSE-PSEUDO-REGISTER-ALIAS! but occasionally neither meets the -requirements. - +procedure giving it the alias if it succeeds, or the second procedure +with no arguments if it fails. MOVE-TO-ALIAS-REGISTER! and +MOVE-TO-TEMPORARY-REGISTER! use REUSE-PSEUDO-REGISTER-ALIAS! but +occasionally neither meets the requirements. + * NEED-REGISTER! 
expects an RTL machine register and informs the register allocator that the rule in use requires that register so it should not be available for subsequent requests while translating the -current RTL statement or expression. The register is available for -later RTL statements or expressions (unless the appropriate rules -invoke NEED-REGISTER! all over). The procedures described above that -allocate and assign aliases call NEED-REGISTER! behind the scenes, -but you may occasionally need to invoke it explicitly. +current RTL instruction. The register is available for later RTL +instructions unless the relevant rules invoke NEED-REGISTER! again. +The procedures described above that allocate and assign aliases and +temporary registers call NEED-REGISTER! behind the scenes, but you will +need to invoke it explicitly when calling out-of-line routines. * LOAD-MACHINE-REGISTER! expects an RTL register and an RTL machine register and generates code that copies the current value of the RTL register to the machine register. It is used to pass arguments in registers to out-of-line code, typically in the compiled code runtime library. - [*Markf: Explain the register map.] * ADD-PSEUDO-REGISTER-ALIAS! expects an RTL pseudo-register and an available machine register (no longer an alias), and makes the specified machine register an alias for the pseudo-register. * CLEAR-REGISTERS! expects any number of RTL registers and clears them -from the register map, pushing their current contents to memory if +from the register map, preserving their current contents in memory if needed. It returns the code that will perform the required motion at runtime. It should be used before invoking LOAD-MACHINE-REGISTER! to ensure that the potentially valid previous contents of the machine -register have been saved if necessary. +register have been saved. * CLEAR-MAP! deletes all aliases from the register map, pushing the -contents held in aliases into the memory homes if needed. 
-This procedure returns an assembly language code fragment, and is -typically used before invoking out-of-line code. - -* DELETE-DEAD-REGISTERS! informs the register allocator that RTL -pseudo registers whose contents will not be needed after the RTL rule -being translated can be eliminated from the register map and their -aliases reused for other purposes. +data only held in aliases into the memory homes if needed. This +procedure returns an assembly language code fragment, and is typically +used before invoking out-of-line code. +* DELETE-DEAD-REGISTERS! informs the register allocator that RTL +pseudo registers whose contents will not be needed after the current +RTL instruction can be eliminated from the register map and their +aliases subsequently used for other purposes. + Most of the rules are actually written in terms of port-specific -procedures that invoke the procedures listed above in particular fixed -patterns. For example, on a machine with general addressing modes and -memory operands, we might define +procedures that invoke the procedures listed above in fixed ways. +Rule bodies typically match the following code pattern: + + (let* ((rs1 (standard-source source1)) + (rs2 (standard-source source2)) + (rt (standard-target target))) + (LAP ...)) + +where STANDARD-SOURCE and STANDARD-TARGET are port-specific +procedures. + +On a machine with general addressing modes and memory operands, we +might provide their definitions as follows: + (define (standard-source rtl-reg) (standard-register-reference rtl-reg 'GENERAL true)) (define (standard-target rtl-reg) (delete-dead-registers!) (reference-target-alias! rtl-reg 'GENERAL)) -while on a load-store architecture we might define + +while on a load-store architecture we might define them as follows: + (define (standard-source rtl-reg) (load-alias-register! rtl-reg 'GENERAL)) (define (standard-target rtl-reg) (delete-dead-registers!) (allocate-alias-register!
rtl-reg 'GENERAL)) -- VERY IMPORTANT: - -This example brings up the cardinal rule of RTL assignment rules: Any rule -that writes an RTL pseudo-register MUST invoke DELETE-DEAD-REGISTERS! -after allocating aliases for the necessary sources but before -allocating an alias for the target. Rules frequently expand into the -following pattern: - (let* ((r1 (standard-source source1)) - (r2 (standard-source source2)) - (rt (standard-target target))) - (LAP ...)) + - VERY IMPORTANT: - + +This example brings up the cardinal rule of RTL assignments: -Note that LET would not work in the above example since Scheme does not -specify the order of argument evaluation, and Liar chooses arbitrary -orders. + Any rule that writes into an RTL pseudo-register MUST invoke + DELETE-DEAD-REGISTERS! after allocating aliases for the necessary + sources but before allocating an alias for the target. + +If this is not done, the register allocator may decide to spill +no-longer valid data into memory, which will probably make the +compiler get confused in other ways or cause garbage collection +problems later. If it is done too early, the last valid alias for a +source operand may have been reused in the interim, and the compiler +will assume that the source quantity is contained in memory and will +often generate code that fetches and operates on garbage. + +Note that the example above uses LET* instead of LET. LET would not +work in the above example because Scheme does not specify the order of +argument evaluation, and Liar chooses arbitrary orders, so the +DELETE-DEAD-REGISTERS! implicit in STANDARD-TARGET might be called too +early possibly causing STANDARD-SOURCE to fail. MOVE-TO-ALIAS-REGISTER! invokes DELETE-DEAD-REGISTERS! because it simultaneously allocates an alias for a source and for a target. +Thus, if there are other source operands, their aliases must be +allocated before MOVE-TO-ALIAS-REGISTER! is invoked. - 5.3.3 Invocation rules, etc. + 5.3.3. Invocation rules, etc. 
The meaning and intent of most statement rules in an existing port is readily apparent. The more arcane rules have to do with procedures @@ -1706,10 +1780,8 @@ caches are not automatically kept consistent by the hardware synchronized by the Scheme system. On machines where the programmer is given no control over the caches, this will be very hard to do. -On machines where the control is minimal or flushing is expensive -(i.e., there is a single instruction or operating-system call to flush -the complete caches or synchronize both caches), the following -solution can be used to amortize the cost: +On machines where the control is minimal or flushing is expensive, the +following solution can be used to amortize the cost: The CONS-CLOSURE rules can generate code to allocate a closure from a pre-allocated pool and invoke an out-of-line routine to refill the @@ -1732,11 +1804,11 @@ but and the fixed-routine will do something like load 0(return-address),rtemp jmp 0(rtemp) - + The 68040 version of the Motorola 68000 family port uses this trick because the 68040 cache is typically configured in copyback mode, and synchronizing the caches involves a supervisor call. - + * (INVOCATION:UUO-LINK (? frame-size) (? continuation) (? name)) This rule is used to invoke a procedure named by a free variable. It is the rule used to generate a branch to an execute cache as @@ -1745,6 +1817,12 @@ in the compiled code block by using FREE-UUO-LINK-LABEL, and should then branch to the instruction portion of the execute cache. FRAME-SIZE is the number of arguments passed in the call, plus one. +* (INVOCATION:GLOBAL-LINK (? frame-size) (? continuation) (? name)) + This rule is identical to the previous one, except that the free +variable must be looked up in the global environment. It is used to +improve the expansion of some macros that insert explicit references +to the global environment. + * (INVOCATION-PREFIX:MOVE-FRAME-UP (? frame-size) (? 
address)) This rule is used to shift call frames on the stack to maintain proper tail recursion. ADDRESS specifies where to start pushing the @@ -1759,7 +1837,6 @@ time of the call, and the section of the stack that contains enclosing environment frames for the called procedure. Two addresses are specified and the one that is closest to the current stack pointer should be used, that is, the numerically lower of the two addresses. - ==> This rule need not exist in the RTL. It could be expanded into comparisons and uses of INVOCATION-PREFIX:MOVE-FRAME-UP with computed values. @@ -1781,7 +1858,7 @@ generates the following code: = MemTop> Each of the individual headers is somewhat idiosyncratic, but the -idiosyncracies are captured in the machine-independent runtime +idiosyncrasies are captured in the machine-independent runtime library. Note that procedures that expect dynamic links must guarantee that the @@ -1789,7 +1866,7 @@ dynamic link is preserved around the execution of the interrupt handler. This is accomplished by invoking an alternate entry point in the runtime library and passing along the contents of the dynamic link register. - + * (CLOSURE-HEADER (? label-name) (? nentries) (? entry)) NENTRIES is the number of entry points that the closure object has, and ENTRY is the zero-based index for this entry point. Closure @@ -1800,7 +1877,7 @@ push the resulting object on the Scheme stack. When backing out for interrupts, they may have to adjust the canonical closure object to be the real closure object if these two are different. You should read the section on closures in cmpint.txt for a more complete explanation. - + * (ASSIGN (REGISTER (? target)) (CONS-CLOSURE (ENTRY:PROCEDURE (? procedure-label)) (? min) (? max) (? size))) @@ -1820,7 +1897,7 @@ allocate a closure object with NENTRIES entry points. SIZE is the number of words allocated for free variables, and ENTRIES is a vector of entry-point descriptors.
Each descriptor is a list containing a label, a min, and a max as in the rule above. - + The file compiler/machines/port/rules3.scm contains most of these procedure-related rules. It also contains three procedures that generate assembly language and are required by the compiler. Both of @@ -1853,9 +1930,8 @@ block to be linked is stored, environment of evaluation should be stored, FREE-OFFSET is the offset of the first linker section in the other compiled code block, and - N-SECTIONS is the number of linker sections in the other compiled -code block. - + N-SECTIONS is the number of linker sections in the other block. + * (GENERATE/CONSTANTS-BLOCK consts reads writes execs global-execs statics) This procedure generates the LAP directives used to generate the constants section of a compiled code block. The constants section @@ -1890,7 +1966,7 @@ EXTRACT_EXECUTE_CACHE_SYMBOL macros from microcode/cmpint-port.h work correctly. The arity MUST NOT be overwritten when the execute cache is initialized to contain instructions. - 5.3.4 Fixnum rules. + 5.3.4. Fixnum rules. Scheme's generic arithmetic primitives cannot be open-coded fully for space reasons. Most Scheme code that manipulates numbers manipulates @@ -1914,7 +1990,7 @@ these open-codings must also detect when the result will not fit in a fixnum in order to invoke the out-of-line utility that will handle them correctly. -Most hardware provide facilities for detecting and branching if an +Most hardware provides facilities for detecting and branching if an integer operation overflows. Fixnums cannot use these facilities directly, because of the tag bits at the high-end of the word. To be able to use these facilities (and get the sign bit in the right @@ -1970,7 +2046,7 @@ The reason is that VECTOR-REF and VECTOR-SET! translate into a sequence that uses these patterns when the index is not a compile-time constant. - 5.3.5 Rules to invoke the runtime library + 5.3.5. 
Rules to invoke the runtime library Some of the rules issue code that invokes the runtime library. The runtime library is invoked through a primary entry point, @@ -2028,7 +2104,7 @@ in the same way that SCHEME-TO-INTERFACE and SCHEME-TO-INTERFACE-JSB take them, but avoid passing the utility index, and may do part or all of the work of the utility in assembly language instead of invoking the portable C version. - + The following is a possible specialized version of apply where the special entry point expects the procedure argument on the stack rather than in a fixed register: @@ -2038,8 +2114,8 @@ stack rather than in a fixed register: (LAP ,@(clear-map!) ,@(load-rn frame-size 2) (JMP ,entry:compiler-apply))) - - 5.4 Writing predicate rules. + + 5.4. Writing predicate rules. Predicate rules are used to generate code to discriminate between alternatives at runtime. The code generated depends on the @@ -2088,8 +2164,8 @@ returned by the rule body will perform any work needed before the compare-and-branch instructions, and the arguments to SET-CURRENT-BRANCHES! will generate the compare-and-branch instructions. - -For example, on the Vax, a machine with implicit condition codes, + +For example, on the DEC Vax, a machine with implicit condition codes, where compare (and most) instructions set the hidden condition-code register, a predicate rule could be as follows: (define-rule predicate @@ -2103,7 +2179,7 @@ register, a predicate rule could be as follows: ,(any-register-reference register-2)))) The prefix code performs the comparison. The arguments to SET-CURRENT-BRANCHES! branch depending on the result. - + On the HP Precision Architecture (Spectrum), a machine with compare-and-branch instructions, the same rule would be written as follows: @@ -2122,7 +2198,7 @@ follows: (LAP))) There is no prefix code, and the arguments to SET-CURRENT-BRANCHES! perform the comparison and branch. - + The (OVERFLOW-TEST) predicate condition does not fit this model neatly. 
The current compiler issues overflow tests when open-coding generic arithmetic. Fixnum overflow implies that bignums should be @@ -2172,20 +2248,270 @@ A more efficient solution, currently employed in the MIPS port (version 4.87 or later) depends on the fact that the RTL instruction immediately preceding an RTL OVERFLOW-TEST encodes the arithmetic operation whose overflow condition is being tested. Given this -assumption, the rule for OVERFLOW-TEST need not generate any code, and -the rule for the arithmetic operation can generate both the prefix -code and invoke SET-CURRENT-BRANCHES! as appropriate. This is -possible because the RTL encoding of arithmetic operations includes a -boolean flag that specifies whether the overflow condition is desired -or not. +assumption (that the arithmetic operation producing the overflow +conditions and the test of such condition are adjacent), the rule for +OVERFLOW-TEST need not generate any code, and the rule for the +arithmetic operation can generate both the prefix code and invoke +SET-CURRENT-BRANCHES! as appropriate. This is possible because the +RTL encoding of arithmetic operations includes a boolean flag that +specifies whether the overflow condition is desired or not. + + 6. Suggested ordering of tasks. + +The task of porting the compiler requires a lot of work. In the past, +it has taken approximately three full weeks for a single person +knowledgeable with MIT Scheme and the compiler, but without +documentation. This guide was written after the first three ports. + +One unfortunate aspect is that a lot of mechanism must be in place +before most of the compiler can be tested. In other words, there is a +lot of code that needs to be written before small pieces can be +tested, and the compiler is not properly organized so that parts of it +can be tested independently. Keeping this in mind, here is a +suggested ordering of the tasks: + + 6.1. Learn the target instruction set well. 
+ +In particular, pay close attention to the branch and jump instructions +and to the facilities available for controlling the processor caches +(if necessary). You may need to find out the facilities that the +operating system provides if the instructions to control the cache are +privileged instructions. + + 6.2. Write microcode/cmpaux-port.m4: + +cmpaux.txt documents the entry points that this file must provide. +You need not use m4, but it is convenient to conditionalize the code +for debugging and different type code size. If you decide not to use +it, you should call your file cmpaux-port.s + + 6.2.1. Determine your C compiler's calling convention. Find out what +registers have fixed meanings, which are supposed to be saved by +callees if written, and which are supposed to be saved by callers if +they contain useful data. + + 6.2.2. Find out how C code returns scalars and small C structures. +If the documentation for the compiler does not describe this, you can +write a C program consisting of two procedures, one of which returns a +two-word (two int) struct to the other, and you can examine the +assembly language produced by the compiler. + + 6.2.3. Decide how registers are going to be used and split between +your C compiler and Liar. If your architecture has a large register +set, you can let C keep those registers to which it assigns a fixed +meaning (stack pointer, frame pointer, data segment pointer), and use +the rest for Liar. If your machine has few registers or you feel more +ambitious, you can give all the registers to Liar, but the code for +transferring control between both languages will become more complex. +Either way, you will need to choose appropriate registers for the Liar +fixed registers (stack pointer, free pointer, register block pointer, +dynamic link register and optionally, datum mask, return value +register, memtop register, and scheme_to_interface address pointer). + + 6.2.4. 
Design how scheme compiled code will invoke the C utilities. +Decide where the parameters (maximum of four) to the utilities will be +passed (preferably wherever C procedures expect arguments), and where +the utility index will be passed (preferably in a C caller-saves +register). + + 6.2.5. Given all this, write a minimalist cmpaux-port.m4. In other +words, write those entry points that are absolutely required +(C_to_interface, interface_to_C, interface_to_scheme, and +scheme_to_interface). Be especially careful with the code that +switches between calling conventions and register sets. +C_to_interface and interface_to_scheme must switch between C and Liar +conventions, while scheme_to_interface must switch the other way. +interface_to_C must return from the original call to C_to_interface. +Make sure that C code always sees a valid C register set and that code +compiled by Liar always sees a valid Scheme register set. + + 6.3. Write microcode/cmpint-port.h: + +cmpint.txt documents most of the definitions that this file must +provide. + + 6.3.1. Design the trampoline code format. Trampolines are used to +invoke C utilities indirectly. In other words, Scheme code treats +trampolines like compiled Scheme entry points, but they immediately +invoke a utility to accomplish their task. Since +return-to-interpreter is implemented as a trampoline, you will need to +get this working before you can run any compiled code at all. + + 6.3.1. Design the closure format and the execute cache format. This +is needed to get the Scheme part of the compiler up AND to get the +compiled code interface in the microcode working. Try to keep the +number of instructions low since closures and execute caches are very +common. + + 6.3.2. Design the interrupt check instructions that are executed on +entry to every procedure, continuation, and closure. 
Again, try to +keep the number of instructions low, and attempt to make the +non-interrupting case fast at the expense of the case when interrupts +must be processed. + + 6.3.3. Given all this, write cmpint-port.h. Be especially careful +with the code used to extract and insert absolute addresses into +closures and execute caches. A bug in this code would typically +manifest itself much later, after a couple of garbage collections. + + 6.4. Write machin.scm: + +Most of the definitions in this file have direct counterparts or are +direct consequences of the code in microcode/cmpaux-port.m4 and +microcode/cmpint-port.h, so it will be mostly a matter of re-coding +the definitions in Scheme rather than C or assembly language. + + 6.5. Write the assembler: + +You can write the assembler any old way you want, but it is easier to +use the branch tensioner and the rest of the facilities if you use the +same conventions that the existing assemblers use. In particular, +with any luck, you will be able to copy inerly.scm, insmac.scm, and +parts of assmd.scm verbatim from an existing port, and for most +machines, coerce.scm is straightforward to write. + +Note that assmd.scm defines utilities that depend almost exclusively +on the endianness of the architecture. You may want to start with the +MIPS version since this version accommodates both endianness +possibilities as MIPS processors can be configured either way. +If your processor has fixed endianness, you can prune the +inappropriate code. The rest of the code in assmd.scm is either +constant, or must agree with definitions in microcode/cmpint-port.h. + +Assuming that you decide to use the same structure as existing +assemblers, you may need to write parsers for addressing modes if your +machine has them. You can use the versions in the 68020 (bobcat) and +Vax ports for guidance. Addressing modes are described by a set of +conditions under which they are valid, and some output code to issue. 
+The higher-level code that parses instructions in insmac.scm must +decide where the bits for the addressing modes must appear. The 68020 +version divides the code into two parts: the part that is inserted +into the opcode word of the instruction (further subdivided into two +parts), and the part that follows the opcode word as an extension. +The Vax version produces all the bits at once since addressing modes +are not split on that architecture. You should write the addressing +mode definitions in port/insutl.scm, plus any additional transformers +that the instruction set may require. + +Once you have the code for the necessary addressing modes and +transformers (if any), and the parsing code for their declarations in +port/insmac.scm, writing the instr.scm files should not be hard. +Remember to include pseudo-opcodes for inserting constants in the +assembly language, and for declaring external labels so that the +gc-offset to the beginning of the compiled code block will be inserted +correctly. See, for example, the definition of the EXTERNAL-LABEL +pseudo-opcode in machines/mips/instr1.scm, and its use in +machines/mips/rules3.scm. + + 6.6. Write the LAPGEN rules: + +You will need to have lapgen.scm, rules1.scm, rules2.scm, and +rules3.scm. rules4.scm is not used by the compiler with the ordinary +switch settings and the code may no longer work in any of the ports, +and rulfix.scm and rulflo.scm are only necessary to open code fixnum +and flonum arithmetic. A good way to reduce the amount of code needed +at first is to turn primitive open coding off, and ignore rulfix.scm +and rulflo.scm. + +Lapgen.scm need not include the shared code used to deal with fixnums +and flonums, but will require the rest, especially the code used to +invoke utilities in the compiled code interface. + +rules1.scm and rules2.scm are relatively straightforward since the +RTL instructions whose translations are provided there typically map +easily into instructions. 
+ +rules3.scm is an entirely different matter. It is probably the hardest +file to write when porting the compiler. The most complicated parts +to understand, and write, are the closure code, the invocation prefix +code, and the block assembly code. + + The block assembly code can be taken from another port. You will +only have to change how the transmogrify procedure works to take into +account the size and layout of un-linked execute caches. + + The invocation prefix code is used to adjust the stack pointer, and +move a frame in the stack prior to a call to guarantee proper tail +recursion. The frame moved is the one pointed at by the stack +pointer, and it may be moved a distance known at compile time +(invocation-prefix:move-frame-up rules) or a distance that cannot be +computed statically (invocation-prefix:dynamic-link rules). The +move-frame-up rules are simple, but you should remember that the +starting and ending locations for the frame may overlap, so you must +ensure that data is not overwritten prematurely. The dynamic link +rules are similar to the move-frame-up rules (and typically share the +actual moving code) but must first decide the location where the frame +should be placed. This is done by comparing two possible values for +the location, and choosing the value closest to the current stack +pointer (i.e. numerically lower since the stack grows towards smaller +addresses). Again, the source and target locations for the frame may +overlap, so the generated code must be careful to move the data in +such a way that no data will be lost. + + The closure code is the most painful to write. When writing +cmpint-port.h you decided what the actual code in closure entries +would be, and the code for closure headers is a direct consequence of +this. 
The combination of the instructions in a closure object, the +helper instructions in assembly language (if any), and the +instructions in the closure header must ultimately push the closure +object (or its canonical representative) on the stack as if it were +the last argument to the procedure, and pending interrupts (and gc) +must be checked on entry to the closure. The interrupt back-out code +is different from the ordinary procedure interrupt back-out code +because the procedure object (the closure or its representative) is on +top of the stack. + + The cons-closure rules are used to allocate closure objects from the +runtime heap. Some of this allocation/initialization may be done out +of line, especially if ``assembling'' the appropriate instructions on +the fly would require a lot of code. In addition, you may have to +call out-of-line routines to synchronize the processor caches or +block-allocate multiple closure entries. + + Make sure that you test this code thoroughly when the compiler is up +enough to compile simple programs. + + 6.7. Write stubs for remaining port files: + +rgspcm.scm and dassm1.scm can be copied verbatim from any other port. + +lapopt.scm only needs to define an identity procedure. + +rules4.scm, rulfix.scm, rulflo.scm, and rulrew.scm need not define any +rules, since you can initially turn off open coding of primitive +operators. + +dassm2.scm and dassm3.scm need not be written at first, but they are +useful to debug the assembler (since disassembling some code should +produce code equivalent to the input to the assembler) and compiler +output when you have forgotten to make it output the LAP. - 6. Building and testing the compiler. + 6.8. Write the compiler-building files: + +make.scm and comp.cbf should be slightly modified copies of the +corresponding files in another port. 
+ +comp.sf and decls.scm can be essentially copied from another port, but +you will need to change the pathnames to refer to your port directory +instead of the one you copied them from, and in addition, you may have +to add or remove instr and other files as appropriate. + + 6.9. After the preliminary code works: + +Once the compiler is up enough to successfully compile moderately +complex test programs, and the compiled code and the interface have +been tested by running the code, you probably will want to go back and +write the files that were skipped over. In particular, you definitely +should write rulfix.scm and rulrew.scm, and rulflo.scm and the +disassembler if at all possible. + + 7. Building and testing the compiler. Once the port files have been written, you are ready to build and test the compiler. The first step is to build an interpreted compiler and run simple programs. Most simple bugs will be caught by this. - 6.1 Re-building scheme + 7.1. Re-building scheme. You need to build a version of the microcode with the compiled code interface (portable runtime library) in it. Besides writing @@ -2209,7 +2535,6 @@ agree on the number of tag bits if it needs it at all. If your version of m4 does not support command-line definitions, you can use the s/ultrix.m4 script to overcome this problem. Look at the m/vax.h and s/ultrix.h files for m4-related definitions. - ==> We should just switch the default to 6 bits and be done with it. - Modify ymakefile to include the processor dependent section that @@ -2253,8 +2578,8 @@ again after executing (begin (cd "") (load "runtim.sf")) - - 6.2 Building an interpreted compiler + + 7.2. Building an interpreted compiler. Once you have a new microcode, compatible runtime system, and ready cref, you can pre-process the compiler as follows: @@ -2263,7 +2588,7 @@ cref, you can pre-process the compiler as follows: compiler/machines/port directory to the compiler directory. 
- For convenience, make a link from compiler/machines/port to -compiler/port . +compiler/port. - Invoke scheme with the ``-band runtime+sf.com'' option, and then execute @@ -2297,15 +2622,16 @@ of the procedures defined by it. You should then be able to invoke the compiler by giving scheme the ``-compiler'' option, and use it by invoking CF. - 6.3 Testing the compiler + 7.3. Testing the compiler. There is no comprehensive test suite for the compiler. There is, however, a small test suite that is likely to catch gross errors. The -files for the test suite are in compiler/tests/port. Each file +files for the test suite are in compiler/etc/tests. Each file contains a short description of how it can be used. A good order to try them is + three.scm expr.scm pred.scm close.scm @@ -2329,7 +2655,8 @@ A good order to try them is unv.scm tail.scm - sort/*.scm + y.scm + sort/*.scm (see sort/README for a description) The programs in the first list test various aspects of code generation. The programs in the second list test the handling of various dynamic @@ -2358,7 +2685,7 @@ execute (initialize-microcode-dependencies!) after loading arith.com and before invoking procedures defined there. - 6.4 Compiling the compiler + 7.4. Compiling the compiler. The real test of the compiler comes when it is used to compile itself and the runtime system. Re-compiling the system is a slow process, @@ -2382,9 +2709,11 @@ pattern that you used to generate the interpreted compiler on the Sparc, but running everything on the Vax, and then compiling the cross-compiler on the Vax by running scheme with the ``-compiler'' option, and typing + (begin (cd "") (load "comp.cbf")) + before loading and dumping the compiler. Once you have the cross-compiler, you can use CROSS-COMPILE-BIN-FILE @@ -2400,20 +2729,17 @@ idioms useful: To translate the original .moc files to .psb files, you should use microcode/Bintopsb on the Vax as follows: - Bintopsb upgrade_cc ci_version=?? ci_processor=?? 
foo.psb -where the value of ci_version should be the value of -COMPILER_INTERFACE_VERSION in microcode/cmpint.c, and the value of -ci_processor should be the value of COMPILER_PROCESSOR_TYPE defined in -microcode/cmpint-port.h. -==> This is redundant. If ci_processor or ci_version are supplied, -Bintopsb should assume upgrade_cc. Furthermore, it should not be -necessary to supply ci_version when it is not changing. + Bintopsb ci_processor=?? foo.psb + +where the value of ci_processor should be the value of +COMPILER_PROCESSOR_TYPE defined in microcode/cmpint-port.h. You can then generate the target .moc files by using microcode/Psbtobin on the Sparc as follows: - Psbtobin allow_cc foo.moc + Psbtobin allow_cc foo.moc + * Distributing the task over several machines: You can use more than one machine to compile the sources. If the @@ -2425,7 +2751,7 @@ CROSS-COMPILE-DIRECTORY, that use a simple-minded protocol based on creating .tch files to reserve files to compile, and can therefore be run on many machines simultaneously without uselessly repeating work or getting in each other's way. - + These two methods are not exclusive. We typically bring up the compiler on a new machine by distributing the cross-compilation job. @@ -2439,8 +2765,8 @@ or re-link the kernel), you may want to use microcode/bchscheme instead of microcode/scheme. Bchscheme uses a disk file for the spare heap, rather than a region of memory, putting the available memory to use at all times. - - 6.5 Compiler convergence testing + + 7.5. Compiler convergence testing. Once you have a compiled compiler, you should run the same test suite that you ran with the interpreted compiler. Once you have some degree @@ -2473,7 +2799,7 @@ subdirectory for each of the source directories, and move all the .com and .binf files there. You can then use compiler/comp.cbf, or compiler/etc/xcbfdir and RECOMPILE-DIRECTORY to regenerate the compiler. 
- + If you generated the stage1 compiled compiler by running the compiler interpreted, the new .com files should match the stage1 .com files. If you generated the stage1 compiler by cross-compilation, they will @@ -2488,7 +2814,9 @@ binaries, you can use COMPARE-COM-FILES, defined in compiler/etc/comcmp, to compare the binaries. The simplest way to use it is to also load compiler/etc/comfiles and then use the CHECK-STAGE procedure. + (check-stage "STAGE2" '("runtime" "sf" "compiler/base")) + will compare the corresponding .com files from runtime and runtime/STAGE2, sf and sf/STAGE2, and compiler/base and compiler/base/STAGE2. @@ -2501,66 +2829,684 @@ that can safely be ignored. Generally, differences in constants can be ignored, but length and code differences should be understood. The code in question can be disassembled to determine whether the differences are real or not. - - 6.6 Things to watch out for. While testing the compiler, in addition to checking for the correct operation of the compiled code, you should also watch out for crashes and other forms of unexpected failure. In particular, hardware traps -(e.g. segmentation violations, illegal instructions) occurring during -the re-compilation process are a good clue that there is a problem -somewhere. - -The worst bugs to track are interrupt related, and garbage-collection -related. They will often make the compiled code crash at seemingly -random points, and are very hard to reproduce. A common source of -this kind of bug is a problem in the rules for procedure headers. -Make sure that the rules for the various kinds of procedure headers -generate the desired code, and that the desired code operates -correctly. You can test this explicitly by using an assembly-language -debugger (e.g. gdb, adb) to set breakpoints at the entry points of -various kinds of procedures. When the breakpoints are reached, you -can bump the Free pointer to a value larger than MemTop, so that the -interrupt branch will be taken. 
If the code continues to execute -correctly, you are probably safe. You should especially check -procedures that expect dynamic links for these must be saved and -restored correctly. Closures should also be tested carefully, since -they need to be reentered correctly, and the closure object on the -stack may have to be bumped. - [*JMiller: Show examples from stable ports and how to use gdb/adb to -debug it.] +(e.g. segmentation violations, bus errors, illegal instructions) +occurring during the re-compilation process are a good clue that there +is a problem somewhere. + + 8. Debugging. + +The process of porting a compiler, due to its complexity, is unlikely +to proceed perfectly. Things are likely to break more than once while +running the compiler and testing the compiled code. + +Debugging a compiler is not trivial, because often the failures +(especially after a while) will not manifest themselves until days, +weeks, or months after the compiler was released, at which point the +context of debugging the compiler has been swapped out by the +programmer. Second-order compiler bugs do not make things any easier. + +Liar does not have many facilities to aid in debugging. This section +mentions some of the few, and some techniques to use with +assembly-language debuggers (gdb, dbx, or adb). + +The main assumption in this section is that the front end and other +port-independent parts of the compiler work correctly. Of course, +this cannot be guaranteed, but in all likelihood virtually all of the +bugs that you will meet when porting the compiler will be in the new +port-specific code. + +If you need to examine some of the front-end data structures, you may +want to use the utilities in base/debug.scm which is loaded in the +compiler by default. 
In particular, you will want to use PO (for +print-object) to examine compiler data structures, and +DEBUG/FIND-PROCEDURE to map procedure names to the data structures +that represent the procedures, or more correctly, the lambda +expressions. + + 8.1. Preliminary debugging of the compiled code interface. + +The first item of business, after the microcode interface +(cmpaux-port.m4 and cmpint-port.h) has been written, is to guarantee +that properly constructed compiled code addresses do not confuse the +garbage collector. This can be done before writing any of the +remaining files, but you must have rebuilt the microcode and the +runtime.com band. + +A simple test to run is the following: + + (define foo + ((make-primitive-procedure 'COERCE-TO-COMPILED-PROCEDURE) + (lambda (x y) + (+ x y)) + 2)) + + (gc-flip) + (gc-flip) + (gc-flip) + +If the system does not crash or complain, in all likelihood the +garbage collector can now properly relocate compiled code objects. + +This object can also be used to test parts of the compiled code +interface. FOO is bound to a trampoline that will immediately revert +back to the interpreter when invoked. + +The next test is to determine that FOO works properly. You can follow +the execution of FOO by using a debugger and placing breakpoints at + + cmpint.c:apply_compiled_procedure, + cmpaux-port.s:C_to_interface, + cmpaux-port.s:scheme_to_interface (or trampoline_to_interface if it +is written), + cmpint.c:comutil_operator_apply_trap, + cmpint.c:comutil_apply, and + cmpaux-port.s:interface_to_C + +and then evaluating (FOO 3 4). + +When setting the breakpoints, remember that C_to_interface, +scheme_to_interface, and interface_to_scheme are not proper C +procedures, so you should use the instruction-level breakpoint +instructions or formats, not the C procedure breakpoint instructions +or formats. If you are using adb, this is moot, since adb is purely +an assembly-language debugger. 
If you are using gdb, you should use +``break *&C_to_interface'' instead of ``break C_to_interface''. If +you are using dbx, you will want to use the ``stopi'' command, instead +of the ``stop'' command to set breakpoints in the assembly language +routines. + +Make sure that the arguments to comutil_operator_apply_trap look +plausible and that the registers have the appropriate contents when +going into scheme code and back into C. In particular, you probably +should examine the contents of the registers right before jumping into +the trampoline code, and single step the trampoline code until you get +back to scheme_to_interface. + +In order to parse the Scheme objects, you may want to keep a copy of +microcode/type.names handy. This file contains the names of all the +scheme type tags and their values as they appear in the most +significant byte of the word when type tags are 8 bits long and 6 +bits long. Remember that you may have to insert segment bits into +addresses in order to examine memory locations. + +You should also make sure that an error is signalled when FOO is +invoked with the wrong number of arguments, and that the system +correctly recovers from the error (i.e., it gives a meaningful error +message and an error prompt, and resets itself when you type ^G). + +This test exercises most of the required assembly language code. The +only entry point not exercised is interface_to_scheme. + + 8.2. Debugging the assembler. + +Assuming that the compiler generates correctly formatted compiled code +objects, fasdump should be able to dump them out without a problem. +If you have problems when dumping the first objects, and assuming that +you ran the tests in section 8.1., then in all likelihood the block +offsets are not computed correctly. You should probably re-examine +the rule for the EXTERNAL-LABEL pseudo operation, and the block-offset +definitions in machines/port/assmd.scm. + +Once you can dump compiled code objects, you should test the +assembler. 
A simple, but somewhat inconvenient, way of doing this is +to use adb as a disassembler as follows: + +Scheme binary (.bin and .com) files have a 50-longword header that +contains relocation information. The longword that follows immediately +is the root of the dumped object. + +If COMPILER:COMPILE-BY-PROCEDURES? is false, the compiler dumps a +compiled entry point directly, so the format word for the first entry +is at longword location 53 (* 4 = 0xd4), and the instructions follow +immediately. + +If COMPILER:COMPILE-BY-PROCEDURES? is true, the compiler dumps an +Scode object that contains the first entry point as the ``comment +expression''. The format word for the first entry point is then at +longword location 55 (* 4 = 0xdc), and the instructions for the +top-level block follow immediately. + +Thus, assuming that there are four bytes per Scheme object (unsigned +long in C), and that foo.com was dumped by the compiler with +COMPILER:COMPILE-BY-PROCEDURES? set to false, the following would +disassemble the first 10 instructions of the generated code: + + adb foo.com + 0xd8?10i + +If COMPILER:COMPILE-BY-PROCEDURES? was set to true, the following +would accomplish the same task: + + adb foo.com + 0xe0?10i + +You can use adb in this way to compare the input assembly language to +its binary output. Remember that you can obtain the input assembly +language by using the COMPILER:GENERATE-LAP-FILES? switch. + + 8.3. Setting breakpoints in Scheme compiled code. + +Compiled code is not likely to work correctly at first even after the +compiler stops signalling errors. In general, when you find that +compiled code executes incorrectly, you should try to narrow it down +as much as possible by trying the individual procedures, etc., in the +code, but ultimately you may need the ability to set instruction-level +breakpoints and single-step instructions in compiled code. 
+ +A problem peculiar to systems in which code is relocated on the fly is that +you can't, in general, obtain a permanent address for a procedure or +entry point. The code may move at every garbage collection, and if +you set a machine-level breakpoint with a Unix debugger, and then the +code moves, you will probably get spurious traps when re-running the +code. Unix debuggers typically replace some instructions at the +breakpoint location with instructions that will cause a specific trap, +and then look up the trapping location in some table when the debugged +process signals the trap. + +One way around this problem is to ``purify'' all compiled scheme code +that you will be setting breakpoints in. If you purify the code, it +will move into ``constant space'' and remain at a constant location +across garbage collections. The PURIFY procedure expects an object +to purify as the first argument, a boolean flag specifying whether the +object should be moved into constant space (if false) or pure space +(if true) as a second argument, and a boolean flag specifying whether +purification should occur immediately (if false) or be delayed until +the next convenient time (if true) as a third argument. You should +use + + (purify false false) + +when moving compiled code objects to constant space for debugging +purposes. Alternatively, you can specify that you want the code to be +purified when you load it by passing appropriate arguments to LOAD. +Since load delays the actual purification, you will need to invoke +GC-FLIP twice to flush the purification queue. + +At any rate, setting the actual breakpoints is not completely trivial, +since you must find the virtual address of the instructions, and then +use them with your assembly-language debugger. + +The simplest way to do this is to get Scheme to print the datum of +the entry points for you, and then type one of Scheme's interrupt +characters to gain the debugger's attention and set the breakpoint. 
+Continuing from the debugger will allow you to type further +expressions to Scheme. + +Imagine, for example, that we have compiled the runtime file list.scm +and some of the procedures in it, namely MEMBER and MEMQ, do not work +properly. After purifying the code, you can type ``memq'' at the +read-eval-print loop and it will respond with something like + ;Value 37: #[compiled-procedure 37 ("list" #x5A) #x10 #x10FE880] + +This specifies that MEMQ is bound to an ordinary compiled procedure +(not a closure), that it was originally compiled as part of file +``list'' and it was part of compiled code block number 90 (= #x5a) in +that file. The current datum of the object is #x10FE880 (this is the +address without the segment bits if any), and the offset to the +beginning of the containing compiled code block is #x10. Thus you +could then gain the attention of the debugger and set a breakpoint at +address 0x10fe880 (remember to add the segment bits if necessary) and +after continuing back into Scheme, use MEMQ normally to trigger the +breakpoint. + +The case with MEMBER is similar. Typing ``member'' at the +read-eval-print loop will cause something like + ;Value 36: #[compiled-closure 36 ("list" #x56) #x5C #x10FE484 #x1180DF8] +to be printed. + +This specifies that MEMBER is bound to a compiled closure, originally +in compiled code block number 86 (= #x56) in file ``list'', that the +entry point to the closure is at datum #x1180DF8, that the entry point +shared by all closures of the same lambda expression (the ``real'' +entry point) is at datum #x10FE484, and that this entry point is at +offset #x5C of its containing compiled code block. + +Thus if you want to single step the closure code (a good idea when you +try them at first), you would want to set a breakpoint at address +#x1180DF8 (plus appropriate segment bits), and if you want to single +step or examine the real code, then you should use address #x10FE484. 
+Note that if you purified the code when you loaded it, the real code +would be pure, but the closure itself would not be, since it was not a +part of the file being loaded (closures are allocated dynamically). +Thus, before setting any breakpoints in a closure, you should probably +purify it as specified above, and obtain its address again, since it +would have moved in the meantime. + +For example, if you are using adb on an HP-PA (where the top two bits +of a data segment address are always 01, and thus the top nibble of a +Scheme object's address is always 4), assuming that the interpreter +printed the above addresses, + +0x41180df8:b would set a breakpoint in the MEMBER closure, +0x410fe484:b would set a breakpoint at the start of the code shared + by MEMBER and all closures of the same lambda expression, +0x410fe880:b would set a breakpoint at the start of MEMQ. + +If you are using gdb on a Motorola 68020 machine, with no segment bits +for the data segment, the equivalent commands would be + +break *0x1180df8 for a breakpoint in the MEMBER closure, +break *0x10fe484 for a breakpoint in MEMBER's shared code, +break *0x10fe880 for a breakpoint in MEMQ. + + 8.4. Examining arguments to Scheme procedures. + +Commonly, after setting a breakpoint at some interesting procedure, +you will want to examine the arguments. Currently, Liar passes all +arguments on the stack. The Scheme stack always grows towards +decreasing addresses, and arguments are pushed from left to right. + +On entry to a procedure, the stack frame must have been reformatted so +that optional arguments have been defaulted, and tail (lexpr) +arguments have been collected into a list (possibly empty). Thus on +entry to an ordinary procedure's code the stack pointer points to the +rightmost parameter in the lambda list, and the rest of the parameters +follow at increasing longword addresses.
+ +This is also the case on entry to a closure object's instructions, but +by the time the shared code starts executing the body of the lambda +expression, the closure object itself (or its canonical +representative) will be on top of the stack with the arguments +following in the standard format. On entry to a closure's shared +code, the stack will contain the arguments in the standard format, but +the closure object will typically be in the process of being +constructed and pushed, depending on the distribution of the task +between the instructions in the closure object, the optional helper +instructions in assembly language, and the instructions in the closure +header. + +If you are using adb, you can use the following commands: + $r displays the names and contents of the processor + registers, + : 0x8280 0x003e + +The adb (and dbx) command ``0x4fc564/2x'' should yield similar output. + +This confirms that the object in question is a return address because +the most significant bit of the format word is a 1, and it would be a +0 for a procedure. The encoded gc offset is 0x3e. GC offsets and +format words are described in detail in cmpint.txt. + +Since the least significant bit of the GC offset is 0, it points +directly to the vector header of a compiled code block. The real +offset is + ((0x3e >> 1) << PC_ZERO_BITS) = 0x3e + +Thus the compiled code block starts at location + 0x8fe2ee-0x3e = 0x008fe2b0 + +Examining the top two words at this address (using the gdb command + ``x/2wx 0x008fe2b0'' or the adb and dbx command ``0x8fe2b0/2X'') we +see + 0x8fe2b0 : 0x00000028 0x9c00001d + +The first word is an ordinary vector header, and the second a +non-marked vector header used to inform the GC that 0x1d longwords of +binary data, the actual instructions, follow. 
+ +The last location in the vector, containing the environment, is at +address + 0x8fe2b0+4*0x28 = 0x8fe350 + +Examining the preceding adjacent location and this one (using gdb's + ``x/2wx 0x8fe34c'' or a similar command for a different debugger) +will yield + 0x8fe34c : 0x0498ef9c 0x4898e864 + +The second object is the loading environment, and the first object is +the debugging information, in this case a pair. This pair can be +examined (using gdb's ``x/2wx 0x98ef9c'' or an analogous command for a +different debugger) to yield + 0x98ef9c : 0x789ac5ec 0x6800001c + +The first object is a string, and the second a fixnum, indicating that +the return address at hand belongs to the compiled code block numbered +0x1c in the file whose name is that string. + +Scheme strings have two longwords of header, followed by an ordinary C +string that includes a null terminating character, thus the C string +starts at address 0x9ac5ec+4*2=0x9ac5f4, and the gdb command + ``x/s 0x9ac5f4'' or the adb and dbx command ``0x9ac5f4/s'' +will display + 0x9ac5f4 : (char *) 0x9ac5f4 "/usr/local/lib/mit-scheme/SRC/runtime/parse.binf" + +Thus the return address we are examining is at offset 0x3e in compiled +code block number 0x1c of the runtime system file ``parse.com''. + +If the disassembler is available, you can then use + (compiler:write-lap-file "parse") +to find out what this return address is, or if you compiled (or +re-compile) parse.scm generating lap files, you can probably guess +what return address is at offset 0x3e (the input lap files do not +contain computed offsets, since these are computed on final assembly). + +This interaction would remain very similar for other machines and +compiled entry points, given the same or similar debuggers. The +variations would be the following: + +- Segment bits might have to be added to the object datum components +to produce addresses. 
For example, on the HP-PA with segment bits 01 +at the most significant end of a word, the C string for Scheme string +object 0x789ac5ec would start at address 0x409ac5ec+8=0x409ac5f4, +rather than at address 0x9ac5ec+8=0x9ac5f4. + +- The gc offset might be computed differently, depending on the value +of PC_ZERO_BITS. For example, on a Vax, where PC_ZERO_BITS has the +value 0, an encoded offset of value 0x3e would imply a real offset of +value (0x3e >> 1)=0x1f. On a MIPS R3000, where PC_ZERO_BITS is 2, the +same encoded offset would encode the real offset value + ((0x3e >> 1) << 2)=0x7c. In addition, if the low order bit of the +encoded gc offset field were 1, a new gc offset would have to be +extracted, and the process repeated until the beginning of the block +was reached. + +- The constant offsets added to various addresses (e.g. that added to +a string object's address to obtain the C string's address) would vary +if the number of bytes per Scheme object (sizeof (unsigned long)) in C +were not 4. + +- Not all compiled entry points have debugging information descriptors +accessible in the same way. Trampolines don't have them at all, and +closures have them in the shared code, not in the closure objects. To +check whether something is a trampoline, you can check the format +field (most trampolines have 0xfffd) or verify that the instructions +immediately call the compiled code interface. Closure objects have +type code MANIFEST-COMPILED-CLOSURE instead of MANIFEST-VECTOR in the +length word of the compiled code block. Once you obtain the real +entry point for a closure, you can use the same method to find out the +information about it. + + 8.6. Things to watch out for. + +The worst bugs to track down are interrupt and garbage-collection related. +They will often make the compiled code crash at seemingly random +points, and are very hard to reproduce. A common source of this kind +of bug is a problem in the rules for procedure headers.
Make sure +that the rules for the various kinds of procedure headers generate the +desired code, and that the desired code operates correctly. You can +test this explicitly by using an assembly-language debugger to set +breakpoints at the entry points of various kinds of procedures. When +the breakpoints are reached, you can bump the Free pointer to a value +larger than MemTop, so that the interrupt branch will be taken. If +the code continues to execute correctly, you are probably safe. You +should especially check procedures that expect dynamic links, for these +must be saved and restored correctly. Closures should also be tested +carefully, since they need to be reentered correctly, and the closure +object on the stack may have to be de-canonicalized. Currently, +C_to_interface and interface_to_scheme must copy the interpreter's +value register into the compiler's value register, and must extract +the address of this value and store it in the dynamic link register. Register allocation bugs also manifest themselves in unexpected ways. If you forget to use NEED-REGISTER! on a register used by a LAPGEN rule, or if you allocate registers for the sources and target of a rule in the wrong order (remember the cardinal rule!), you may not -notice for a long time, but some poor program will hit it. If this -happens, you will be lucky if you can find and disassemble a -relatively small procedure that does not operate properly, but -typically the only notice you will get is when Scheme crashes in an -unrelated place. Fortunately, this type of bug is reproducible. In -order to find the incorrectly compiled code, you can use binary search -on the sources by mixing interpreted and compiled binaries. When -loading the compiler, .bin files will be used for those files for -which the corresponding .com file does not exist. Thus you can move -.com files in and out of the appropriate directories, reload, and test -again.
Once you determine the procedure in which the bug occurs, -re-compiling the module and examining the resulting RTL and LAP -programs should lead to identification of the bug. - - 7. Bibliography +notice for a long time, but some poor program will. If this happens, +you will be lucky if you can find and disassemble a relatively small +procedure that does not operate properly, but typically the only +notice you will get is when Scheme crashes in an unrelated place. +Fortunately, this type of bug is reproducible. In order to find the +incorrectly compiled code, you can use binary search on the sources by +mixing interpreted and compiled binaries. When loading the compiler, +.bin files will be used for those files for which the corresponding +.com file does not exist. Thus you can move .com files in and out of +the appropriate directories, reload, and test again. Once you +determine the procedure in which the bug occurs, re-compiling the +module and examining the resulting RTL and LAP programs should lead to +identification of the bug. + + 9. Bibliography 1. "Efficient Stack Allocation for Tail-Recursive Languages" by Chris Hanson, in Proceedings of the 1990 ACM Conference on Lisp and Functional Programming. 2. "Free Variables and First-Class Environments" by James S. Miller -and Guillermo J. Rozas, to appear in Lisp and Symbolic Computation, -March 1991. +and Guillermo J. Rozas, in Lisp and Symbolic Computation, 4, 107-141, +1991, Kluwer Academic Publishers. 3. "MIT Scheme User's Manual for Scheme Release 7.1" by Chris Hanson, distributed with MIT CScheme version 7.1. 4. "MIT Scheme Reference Manual for Scheme Release 7.1" by Chris Hanson, distributed with MIT CScheme version 7.1. + + A.1. MIT Scheme package system + +The MIT Scheme package system is used to divide large programs into +separate name spaces which are then ``wired together''. 
A large +program, like the runtime system, Edwin, or the compiler, has many +files and variable names all of which must exist at the same time +without conflict. The package system is a prototype system to +accomplish this separation, but will probably be replaced once a +better module system is developed. + +Currently, each package corresponds, at runtime, to a Scheme +environment. Environments have their usual, tree-shaped structure, +and packages are also structured in a tree, but the trees need not be +isomorphic, although they often are. + +Each package is given a name, e.g.: + + (compiler reference-contexts) + +whose corresponding environment can be found using the procedure +->ENVIRONMENT: + + (->environment '(compiler reference-contexts)) ;; Call this CR + +By convention, this package corresponds to an environment below the + (->environment '(compiler)) ;; Call this C + +environment, and therefore CR contains all variables defined in C, as +well as those specifically defined in CR. The package name ``()'' +corresponds to the system global environment. + +The package structure for the compiler is defined in the file + + /machines//comp.pkg + +In that file, each package has a description of the form: + + (define-package NAME + (files FILES) + (parent PARENT-NAME) + (export PACKAGE-NAME VARIABLES) + (import PACKAGE-NAME VARIABLES)) + +where FILES are the names of the files that should be loaded into +package NAME. + + (parent PARENT-NAME) + +declares the package whose name is PARENT-NAME to be the parent +package of NAME. Lexical scoping will make all variables visible in +PARENT-NAME also visible in NAME. + +The EXPORT and IMPORT declarations are used to describe cross-package +links. A package may export any of its variables to any other +package using EXPORT; these variables will appear in both packages +(environments), and any side effect to one of these variables in +either package will be immediately visible in the other package. + +Similarly, a package may import any of another package's variables +using IMPORT.
Any number (including zero) of IMPORT and EXPORT +declarations may appear in any package declaration. + +Here is an example package declaration, drawn from the compiler: + + (define-package (compiler top-level) + (files "base/toplev" + "base/crstop") + (parent (compiler)) + (export () + cf + compile-bin-file + compile-procedure + compile-scode + compiler:reset! + cross-compile-bin-file + cross-compile-bin-file-end) + (export (compiler fg-generator) + compile-recursively) + (export (compiler rtl-generator) + *ic-procedure-headers* + *rtl-continuations* + *rtl-expression* + *rtl-graphs* + *rtl-procedures*) + (export (compiler lap-syntaxer) + *block-label* + *external-labels* + label->object) + (export (compiler debug) + *root-expression* + *rtl-procedures* + *rtl-graphs*) + (import (runtime compiler-info) + make-dbg-info-vector) + (import (runtime unparser) + *unparse-uninterned-symbols-by-name?*)) + +The read-eval-print loop of Scheme evaluates all expressions in the +same environment. It is possible to change this environment using the +procedure GE, e.g.: + + (ge (->environment '(compiler top-level))) + +To find the package name of the current read-eval-print loop +environment, if there is one, evaluate: + + (pe) + +The package system is currently completely static; it is difficult to +create packages and wire them together on the fly. If you find that +you need to temporarily wire a variable to two different environments +(as you would do with an IMPORT or EXPORT declaration), use the +procedure ENVIRONMENT-LINK-NAME: + + (environment-link-name TO-ENVIRONMENT + FROM-ENVIRONMENT + VARIABLE-NAME-SYMBOL) + +For example, to make WRITE-RESTARTS, originally defined in the +(runtime debugger) package, also visible in the (edwin debugger) +package, evaluate: + + (environment-link-name (->environment '(edwin debugger)) + (->environment '(runtime debugger)) + 'write-restarts)