Integrate Jmiller's latest changes:

author Guillermo J. Rozas <edu/mit/csail/zurich/gjr>

Mon, 9 Sep 1991 22:10:31 +0000 (22:10 +0000)

committer Guillermo J. Rozas <edu/mit/csail/zurich/gjr>

Mon, 9 Sep 1991 22:10:31 +0000 (22:10 +0000)
diff --git a/v7/src/compiler/documentation/porting.guide b/v7/src/compiler/documentation/porting.guide
index 8f1e3dfe8c3047c9be06e3a2ed2f3ab331b99d6f..5b63b153ba870b04eefdb55dd1062259bb61e238 100644
--- a/v7/src/compiler/documentation/porting.guide
+++ b/v7/src/compiler/documentation/porting.guide
@@ -1,6 +1,6 @@
            Emacs: Please use -*- Text -*- mode.  Thank you.
  
-$Header: /Users/cph/tmp/foo/mit-scheme/mit-scheme/v7/src/compiler/documentation/porting.guide,v 1.22 1991/09/04 03:58:44 jinx Exp $
+$Header: /Users/cph/tmp/foo/mit-scheme/mit-scheme/v7/src/compiler/documentation/porting.guide,v 1.23 1991/09/09 22:10:31 jinx Exp $
  
  
  Copyright (c) 1991 Massachusetts Institute of Technology
@@ -54,7 +54,7 @@ from Mark Friedman.  Arthur Gleckler, Brian LaMacchia, Jim Miller, and
  Henry Wu have also contributed to the current version of Liar.  Many
  other people have offered suggestions and criticisms.
  
-The current Liar may never have existed had it not been for the
+The current Liar might never have existed had it not been for the
  efforts and help of the now-extinct BBN Butterfly Lisp group.  That
  group included Don Allen, Seth Steinberg, Larry Stabile, and Anthony
  Courtemanche.  Don Allen, in particular, babysat computers to
@@ -114,6 +114,9 @@ the fixed data that the compiled code uses at runtime.
  
  compiler/toplev.scm contains the top-level calls of the compiler and
  its pass structure.
+
+The ``.com'' files contain compiled code objects, which are linked
+further at load time.
  \f
         0.1.  Liar's package structure
  
@@ -165,7 +168,7 @@ sub-packages for various major utilities (linearizer, map-merger,
  etc.).
  
  (COMPILER ASSEMBLER):
-       This package contains most of the port-independent portion of
+       This package contains most of the machine-independent portion of
  the assembler.  In particular, it contains the bit-assembler, i.e.
  the portion of the assembler that accumulates the bit strings produced
  by ASSEMBLER and performs branch-tensioning on the result.
@@ -182,7 +185,7 @@ compiler.  compiler/machines/port/comp.pkg declares the packages and
  the files that constitute them.
  
  compiler/back:
-       This directory contains the port-independent portion of the
+       This directory contains the machine-independent portion of the
  back end.  It contains bit-string utilities, symbol table utilities,
  label management procedures, the hardware register allocator, and the
  top-level assembler calls.
@@ -241,13 +244,14 @@ Although this runtime library need not run in the context of the
  CScheme interpreter, currently the only implementation of this library
  runs from the interpreter and uses it for many of its operations.
  
-In other words, Liar does not depend on the interpreter directly, but
-indirectly through the runtime library.  It does depend on the ability
-to invoke CScheme primitives at runtime, some of which (eval, etc.)
-require the interpreter to be present.  It should be possible,
-however, to provide an alternate runtime library and primitive set
-that would allow code produced by Liar to run without the interpreter
-being present. (Foot) We often toy with this idea.
+In other words, code generated by Liar does not depend on the
+interpreter directly, but indirectly through the runtime library.  It
+does depend on the ability to invoke CScheme primitives at runtime,
+some of which (eval, etc.)  require the interpreter to be present.  It
+should be possible, however, to provide an alternate runtime library
+and primitive set that would allow code produced by Liar to run
+without the interpreter being present. (Foot) We often toy with this
+idea.
  
  On the other hand, since the only instance of the runtime library is
  that supplied by the interpreter, Liar currently assumes that the
@@ -260,13 +264,13 @@ microcode/cmpaux-port.m4 and microcode/cmpint.c .  The files
  cmpaux.txt and cmpint.txt document these files.  The documentation
  files may be found in the microcode or the documentation directories.
  
-microcode/cmpaux-port.m4 is an assembly language port-dependent file
+microcode/cmpaux-port.m4 is an assembly language machine-dependent file
  that allows compiled Scheme to call the C-written library routines and
  vice versa.  It is described in cmpaux.txt.
  
-microcode/cmpint.c defines the library in a machine/port-independent
-way, but requires some information about the port and this is provided
-in microcode/cmpint2.h, a copy (or link) of the appropriate
+microcode/cmpint.c defines the library in a machine-independent way,
+but requires some information about the port and this is provided in
+microcode/cmpint2.h, a copy (or link) of the appropriate
  microcode/cmpint-port.h file.  The microcode/cmpint-port.h files are
  described in cmpint.txt .
  
@@ -282,6 +286,9 @@ the future.
  
  If you have not yet read cmpaux.txt and cmpint.txt, please do so
  before reading the rest of this document.
+
+You should probably also read [1] and [2] for a discussion of some of
+the implementation issues.
  \f
                 2. Preliminary Observations
  
@@ -336,10 +343,11 @@ harder or impair the quality of the output code:
  
  - Liar generates code that passes arguments to procedures on a stack.
  This decision especially affects the performance on load-store
-architectures, common these days.  This may change in the future due
-to the fact that most modern machines have large register sets and
-memory-based operations are noticeably slower than register-based
-operations even when the memory locations have been cached.
+architectures, common these days.  Liar may be changed in the future
+to generate code that passes arguments in registers because most
+modern machines have large register sets and memory-based operations
+are slower than register-based operations even when the memory
+locations have been cached.
  
  - Liar assumes that pushing and popping elements from a stack is
  cheap.  Currently Liar does not attempt to bump the stack pointer once
@@ -371,10 +379,24 @@ must extracting, comparing, and inserting these tags be cheap, but
  decoding the address must be cheap as well.  These operations are
  relatively cheap on architectures with bit-field instructions, but
  more expensive if they must be emulated with bitwise boolean
-operations and shifts, as on the R3000.  Decoding a datum into an
+operations and shifts, as on the MIPS R3000.  Decoding a datum into an
  address may involve inserting segment bits in some of the positions
  where the tag is placed, further increasing the dependency on cheap
  bit-field manipulation.
+
+- The CScheme interpreter uses a particularly poor representation for
+fixnums, forcing Liar's hand.  Fixnums are suitably small integers.
+They are immediate objects with a particular tag.  This tag was not
+wisely chosen, making fixnum operations more expensive than they need
+to be.  This tag may be changed in the future.
+
+- The CScheme interpreter manipulates a stack that grows in a fixed
+direction (from higher to lower addresses).  On many modern machines,
+there are no special instructions to deal with the stack, so the
+decision is arbitrary.  On some machines, however, there are special
+instructions to pop and push elements on the stack.  Liar may not be
+able to use these instructions if the machine's preferred direction of
+stack growth does not match the interpreter's.
  \f
         2.3. Emulating an existing port.
  
@@ -385,15 +407,15 @@ trivially translating existing code.  In particular, if the
  architectures are really close, there may be no need for
  architecture-specific additional tuning.
  
-Note that the compiler is primarily developed on Motorola >=68020
-processors, so this is the best-tuned version, and the other ports are
-not very well tuned or not tuned at all.  If you improve an existing
-port, please share the improvements by notifying liar-implementors.
+The compiler is primarily developed on Motorola MC68020 processors, so
+this is the best-tuned version, and the other ports are not very well
+tuned or not tuned at all.  If you improve an existing port, please
+share the improvements by notifying liar-implementors.
  
  - If you have a Vax-like CISC machine, you can try starting from the
-Vax or the Motorola 68020 ports.  The Vax port was written by starting
-from the 68020 port.  This is probably the best solution for some
-architectures like the NS32000, and perhaps even the IBM 370.
+Vax or the Motorola MC68020 ports.  The Vax port was written by
+starting from the MC68020 port.  This is probably the best solution
+for some architectures like the NS32000, and perhaps even the IBM 370.
  
  - If you have an ``enlarged'' RISC processor, with some complex
  addressing modes, and bit-field instructions, you may want to start by
@@ -405,8 +427,8 @@ RS/6000.
  processor, you may want to start from this port.  Since the MIPS R3000
  is a minimalist architecture, it almost subsumes all other RISCs, and
  may well be a good starting point for all of them.  This is probably a
-good starting point for the Sparc.  Note that the MIPS port used the
-Spectrum port as its model.
+good starting point for the Sparc.  The MIPS port used the Spectrum
+port as its model.
  
  - If you have a machine significantly different from those listed
  above, you are out of luck and will have to write a port from scratch.
@@ -433,7 +455,7 @@ generated for a given program will vary from machine to machine.  The
  RTL can vary in the following ways:
  
  - RTL is a language for manipulating the contents of conceptual
-registers.  RTL registers are divided into ``pseudo-registers'' and
+registers.  RTL registers are divided into ``pseudo registers'' and
  ``machine registers''.  Machine registers represent physical hardware
  registers, some of which have been reserved and given fixed meanings
  by the port (stack pointer, value register, etc.)  while
@@ -446,13 +468,17 @@ more homogeneous, the RTL registers are not distinguished
  syntactically in the RTL, but are instead distinguished by their value
  range.  Machine registers are represented as the N lowest numbered RTL
  registers (where N is the number of hardware registers), and all
-others are pseudo-registers. Since some RTL instructions explicitly
+others are pseudo registers.  Since some RTL instructions explicitly
  mention machine registers and these (and their numbers) vary from
  architecture to architecture, the register numbers in an RTL program
-will vary depending on the back end in use.  Note that all
-pseudo-registers are equivalent, and all can hold arbitrary Scheme
-objects, while machine registers can be further divided into separate
-classes (e.g. address, data, and floating-point registers).
+will vary depending on the back end in use.  Machine registers may be
+divided into separate classes (e.g.  address, data, and floating-point
+registers) that can contain different types of values.  Pseudo
+registers are not distinguished a-priori, but the values stored in
+them must be consistent.  For example, if a floating point value is
+stored into a particular pseudo register, the register can only be
+mapped to floating-point machine registers, and non-floating-point
+values cannot be stored in it.
  
  - RTL assumes a load-store architecture, but can accommodate
  architectures that allow memory operands and rich addressing modes.
@@ -476,12 +502,12 @@ for richer instruction sets may require less simplification because
  hardware instructions and addressing modes that encode more
  complicated RTL patterns are directly available.
  
-- The open coding (inlining) of Scheme primitives is port-dependent.
-On some machines, for example, there is no instruction to multiply
-integers, and it may not be advantageous to open code the
-multiplication primitive.  The RTL for a particular program may
-reflect the set of primitive operations that the back end for the port
-can open code.
+- The open coding (inlining) of Scheme primitives is
+machine-dependent.  On some machines, for example, there is no
+instruction to multiply integers, and it may not be advantageous to
+open code the multiplication primitive.  The RTL for a particular
+program may reflect the set of primitive operations that the back end
+for the port can open code.
  \f
  The resulting RTL program is represented as a control flow-graph where
  each of the nodes has an associated list of RTL statements.  The edges
@@ -591,13 +617,13 @@ will open code (inline) MIT Scheme primitives.  It is usually set to
  true and should probably be left that way.  On the other hand, it is
  possible to do a lot less work in porting the compiler by not
  providing the open coding of primitives and turning this switch off.
-Note that some of the primitives are open coded by the
-machine-independent portion of the compiler, since they depend only on
-structural information, and not on the details of the particular
-architecture.  In other words, CAR, CONS, and many others can be
-open-coded in a port-independent way since their open codings are
-performed directly in the RTL.  Turning this switch to false would
-prevent the compiler from open coding these primitives as well.
+ Some of the primitives are open coded by the machine-independent
+portion of the compiler, since they depend only on structural
+information, and not on the details of the particular architecture.
+In other words, CAR, CONS, and many others can be open-coded in a
+machine-independent way since their open codings are performed
+directly in the RTL.  Turning this switch to false would prevent the
+compiler from open coding these primitives as well.
  
  * COMPILER:GENERATE-RTL-FILES? and COMPILER:GENERATE-LAP-FILES? These
  are mostly compiler debugging switches.  They control whether the
@@ -608,7 +634,9 @@ contain the input to the assembler.  Their usual value is false.
  * COMPILER:INTERSPERSE-RTL-IN-LAP? This is another debugging switch.
  If turned on, and COMPILER:GENERATE-LAP-FILES? is also on, the lap
  output file includes the RTL statements as comments preceding their
-LAP translations.
+LAP translations.  Its usual value is true.
+ ==> RTL predicates are not included, making the control-flow hard to
+follow.  This should be fixed.
  \f
  * COMPILER:OPEN-CODE-FLOATING-POINT-ARITHMETIC? This switch is
  defined in compiler/machines/port/machin.scm and determines whether
@@ -618,16 +646,17 @@ to true, otherwise to false.
  
  * COMPILER:PRIMITIVES-WITH-NO-OPEN-CODING This parameter is defined in
  compiler/machines/port/machin.scm.  It contains a list of primitive
-names that the port cannot open code.
-
-==> These last two parameters should probably be combined and
-inverted, i.e. COMPILER:PRIMITIVES-WITH-OPEN-CODINGS should replace
-both of the above.  This has the advantage that if the RTL level is
-taught how to deal with additional primitives, but not all ports have
-open codings for them, there is no need to change all the machin.scm
-files, only those for which the open coding has been provided.
-\f
-               4. Description of the port-specific files
+names that the port cannot open code.  Currently there is no simple
+list of all the primitives that Liar can open-code.  The list is
+implicit in the code contained in rtlgen/opncod.scm.
+ ==> The last two parameters should probably be combined and inverted.
+COMPILER:PRIMITIVES-WITH-OPEN-CODINGS should replace both of the
+above.  This has the advantage that if the RTL level is taught how to
+deal with additional primitives, but not all ports have open codings
+for them, there is no need to change all the machin.scm files, only
+those for which the open coding has been provided.
+\f
+               4. Description of the machine-specific files
  
  The following is the list of files that usually appears in the port
  directory.  The files can be organized differently for each port, but
@@ -727,10 +756,11 @@ You should be able to edit the version from another port in the
  appropriate way.  Mostly you will need to rename the port (i.e. mips
  -> sparc), and add/delete instruction and rule files as needed.
  
-==> decls.scm should probably be split into two sections:  The
+ ==> decls.scm should probably be split into two sections: The
  machine-independent dependency management code, and the actual
  declaration of the dependencies for each port.  This would allow us to
-share more of the code, and make the task of rewriting it less daunting.
+share more of the code, and make the task of rewriting it less
+daunting.
  \f
         4.2. Miscellaneous files:
  
@@ -740,14 +770,14 @@ invoking runtime library procedures.  This file is no longer machine
  dependent, since the portable library has made all the sets identical.
  It lives in machines/port for historical reasons, and should probably
  move elsewhere.  Obviously, you can just copy it from another port.
-==> Let's move it or get rid of it!
+ ==> Let's move it or get rid of it!
  
  * rulrew.scm:
         This file defines the simplifier rules that allow more
  efficient use of the hardware's addressing modes and other
  capabilities.  The rules use the same syntax as the LAPGEN rules, but
  belong in the (rule) rewriting database.  Although these rules are
-port-dependent, it should be straightforward to emulate what other
+machine-dependent, it should be straightforward to emulate what other
  ports have done in order to arrive at a working set.  Moreover, it is
  possible to start out with an empty set and only add them as
  inefficiencies are discovered in the output assembly language.  These
@@ -774,9 +804,9 @@ for completeness.
  - ENDIANNESS: Should be the symbol LITTLE if an address, when used as
  a byte address, refers to the least significant byte of the long-word
  addressed by it.  It should be BIG if it refers to the most
-significant byte of the long-word.  Note that the compiler has not been
-ported to any machines where the quantum of addressability is not an
-8-bit byte, so the notion may not apply to those.
+significant byte of the long-word.  The compiler has not been ported
+to any machines where the quantum of addressability is not an 8-bit
+byte, so the notion may not apply to those.
  
  - ADDRESSING-GRANULARITY: How many bits are addressed by the
  addressing quantum.  I.e., increasing an address by 1 will bump the
@@ -811,9 +841,9 @@ for example, this value should be twice scheme-object-width.
  - SIGNED-FIXNUM/UPPER-LIMIT: This parameter should be derived from
  others, but is specified as a constant due to a shortcoming of the
  compiler pre-processing system (EXPT is not constant-folded).  Use the
-commented-out expression to derive the value for your port.  Note that
-all values that should be derived but are instead specified as
-constants are tagged by a comment containing ``***''.
+commented-out expression to derive the value for your port.  All
+values that should be derived but are instead specified as constants
+are tagged by a comment containing ``***''.
  
  - STACK->MEMORY-OFFSET: This procedure is provided to accommodate
  stacks that grow in either direction, but we have not tested any port
@@ -823,7 +853,7 @@ be copied verbatim.
  
  - EXECUTE-CACHE-SIZE: This should match EXECUTE_CACHE_ENTRY_SIZE in
  microcode/cmpint-port.h, and is explained in cmpint.txt.
-==> We should probably rename one or the other to be alike.
+ ==> We should probably rename one or the other to be alike.
  
  The following parameters describe to the front-end the format of
  closures containing multiple entry points.  Closures are described in
@@ -887,7 +917,7 @@ holding a bit-mask used to clear type tags from objects (the pointer
  or datum mask).  All of these registers should be given additional
  symbolic names.
  
-==> What is MACHINE-REGISTER-KNOWN-VALUE used for?  It would seem that
+ ==> What is MACHINE-REGISTER-KNOWN-VALUE used for?  It would seem that
  the datum mask is a known value, but...  Currently all the ports seem
  to have the same definition.
  
@@ -977,7 +1007,7 @@ reserved by the port.
  
  - SORT-MACHINE-REGISTERS is a procedure that reorders a list of
  registers into the preferred allocation order.
-==> Is this right?
+ ==> Is this right?
  
  - REGISTER-TYPE is a procedure that maps RTL register numbers to their
  inherent register types (typically GENERAL and FLOAT).
@@ -1022,7 +1052,7 @@ to precede the root of the control flow graph.  Its output should use
  the assembler directive ENTRY-POINT and generate format and GC words
  for the entry point.
  
-The rest of the code in lapgen.scm is a port-specific set of utilities
+The rest of the code in lapgen.scm is a machine-specific set of utilities
  for the LAPGEN rules.  Some of the more common procedures are
  described in the section that covers the rules.
  
@@ -1056,13 +1086,16 @@ statements like continuation (return address) invocation, several
  mechanisms for invoking procedures, stack reformatting prior to
  invocation, procedure headers, closure object allocation, expression
  headers and declaring the data segment of compiled code blocks for
-assembly.
+assembly.  See [1] for some background information on stack
+reformatting, and [2] for a discussion of how calls to (the values of)
+free variables are handled by Liar.
  
  * rules4.scm:
-       This file contains RTL statement rules for the runtime library
+       This file contains RTL statement rules for runtime library
  routines that handle manipulation of variables in first class
-environments.  Most of these rules are no longer used by the compiler
-unless some switch settings vary.
+environments.  Many of these rules are no longer used by the compiler
+unless some switch settings are changed.  See [2] for a discussion of
+how Liar handles references to free variables.
  \f
  * rulfix.scm:
         This file contains statement and predicate rules for
@@ -1152,7 +1185,7 @@ required fixed length.  On some machines (e.g. HP PA), some coercions
  may permute the bits appropriately.
  
  * insmac.scm:
-       This file defines port-specific syntax used in the assembler,
+       This file defines machine-specific syntax used in the assembler,
  and the procedure PARSE-INSTRUCTION, invoked by the syntax expander
  for DEFINE-INSTRUCTION to parse the body of each of the instruction
  rules.  This code is typically complex and you are encouraged to
@@ -1207,6 +1240,7 @@ instructions is usually
        ...
        ((() ())
         <instruction-specifier-n>))
+
  Each instruction specifier is an ordinary (i.e. not VARIABLE-WIDTH)
  instruction specifier.  NAME is a variable to be bound to the
  bit-assembly-time value of EXPRESSION.  Each of the ranges
@@ -1215,30 +1249,31 @@ next, and () specifies no bound.  The final format chosen is that
  corresponding to the lowest numbered range containing the value of
  <expression>.  Successive instruction specifiers must yield
  instructions of non-decreasing lengths for the branch tensioner to
-work correctly.  Note that the MC68020 port uses GROWING-WORD instead
-of VARIABLE-WIDTH as the keyword for this syntax.
-==> This should probably be changed.
+work correctly.  The MC68020 port uses GROWING-WORD instead of
+VARIABLE-WIDTH as the keyword for this syntax.
+ ==> This should probably be changed.
  
  * inerly.scm:
-       This file provides alternative expanders for the port-specific
-syntax.  These alternative expanders are used when the assembly
-language that appears in the LAPGEN rules is assembled (early) at
-compiler pre-processing time.  That is, the procedures defined in this
-file are only used if COMPILER:ENABLE-EXPANSION-DECLARATIONS? is set
-to true.  If you reuse the code in insmac.scm from another port, you
-should be able to reuse the inerly.scm file from the same port.
-Alternatively, you can write a dummy version of this code and require
+       This file provides alternative expanders for the
+machine-specific syntax.  These alternative expanders are used when
+the assembly language that appears in the LAPGEN rules is assembled
+(early) at compiler pre-processing time.  That is, the procedures
+defined in this file are only used if
+COMPILER:ENABLE-EXPANSION-DECLARATIONS? is set to true.  If you reuse
+the code in insmac.scm from another port, you should be able to reuse
+the inerly.scm file from the same port.  Alternatively, you can write
+a dummy version of this code and require
  COMPILER:ENABLE-EXPANSION-DECLARATIONS? to be always false.  This
  switch defaults to false, currently.  The Spectrum and MIPS versions
  currently have dummy versions of this code.
  
  * insutl.scm:
-       This file defines port-specific rule qualifiers and
+       This file defines machine-specific rule qualifiers and
  transformers.  It is often used to define addressing-mode filters and
  handling procedures for architectures with general addressing modes.
  This file does not exist in the Spectrum port because all the relevant
  code has been placed in instr1.scm, and the MIPS port has no
-port-specific qualifiers and transformers.  Qualifiers and
+machine-specific qualifiers and transformers.  Qualifiers and
  transformers are described further in the chapter on the syntax of
  translation rules.
  \f
@@ -1272,7 +1307,7 @@ procedures referenced in dassm2.
  * dassm1.scm:
         This file contains the top-level of the disassembler.  It is
  not machine-dependent, and should probably be moved to another directory.
-==> Is compiler/back the right place for this?
+ ==> Is compiler/back the right place for this?
  
  * dassm2.scm:
         This file contains various utilities for the disassembler.  In
@@ -1282,8 +1317,8 @@ particular, it contains the definitions of
  - COMPILED-CODE-BLOCK/OBJECTS-PER-PROCEDURE-CACHE
  - COMPILED-CODE-BLOCK/OBJECTS-PER-VARIABLE-CACHE
    These parameters specify various relative sizes.
-==> Shouldn't these be in machin.scm?  The first two have counterparts
-there, and the last is always 1.
+ ==> Shouldn't these be in machin.scm?  The first two have
+counterparts there, and the last is always 1.
  
  - DISASSEMBLER/READ-VARIABLE-CACHE
  - DISASSEMBLER/READ-PROCEDURE-CACHE
@@ -1299,9 +1334,9 @@ to display data appearing in the instruction stream in an appropriate
  format (gc and format words, mainly), and heuristics for displaying
  addressing modes and PC-relative offsets in a more legible form.
  \f
-Note that the output of the disassembler need not be identical to the
-input of the assembler.  The disassembler is used almost exclusively
-for debugging, and additional syntactic hints make it easier to read.
+The output of the disassembler need not be identical to the input of
+the assembler.  The disassembler is used almost exclusively for
+debugging, and additional syntactic hints make it easier to read.
  
  * dassm3.scm:
         This file contains the code to disassemble one instruction at
@@ -1322,9 +1357,9 @@ instead of the assembler rule data base.
  
  There are three subsystems in Liar that use rule-based languages.
  They are the RTL simplifier, LAPGEN (RTL->LAP translation), and the
-assembler.  The assembler need not be rule-based, since it is machine
-dependent, but given the availability of the rule language, using it
-may be the easiest way to write it.
+assembler.  The assembler need not be rule-based, but given the
+availability of the rule language, using the rule mechanism may be the
+easiest way to write it.
  
         5.1. Rule syntax
  
@@ -1380,7 +1415,7 @@ For example,
  will match (MULTIPLE 14 7) and (MULTIPLE 36 4), but will not match
  (MULTIPLE FOO 3), (MULTIPLE 37 4), (MULTIPLE 2), (MULTIPLE 14 2 3),
  nor (HELLO 14 7).
-Note that rules need not have qualifiers.
+Rule qualifiers are optional.
  \f
  * <rule body> is an arbitrary Lisp expression whose value is the
  translation determined by the rule.  It will typically use the
@@ -1461,7 +1496,7 @@ For example,
  will match (2 . HELLO), Q will be bound to -21, and Z will be bound to
  (2 . HELLO), and will not match 34 or (HELLO . 2).
  
-==> The pattern parser seems to understand (?@ <name>) as well, but
+ ==> The pattern parser seems to understand (?@ <name>) as well, but
  this syntax is used nowhere.  The early parser does not understand it.
  Should it be flushed?
  \f
@@ -1497,8 +1532,8 @@ LAP:MAKE-LABEL-STATEMENT), since you may find yourself later wanting
  to change a single instruction into a fragment in a utility procedure,
  and having to find every use of the procedure.
  
-==> We should change the linearizer to expect LAP:MAKE-LABEL-STATEMENT
-to return a fragment, and do away with INST.
+ ==> We should change the linearizer to expect
+LAP:MAKE-LABEL-STATEMENT to return a fragment, and do away with INST.
  
  An additional macro, INST-EA, is provided to construct a piece of
  assembly language representing an addressing mode.  For example,
@@ -1550,31 +1585,37 @@ machine registers), and the rest of the numbers represent virtual
  register number and a register type, and return a suitable machine
  register to be used for the operation.
  
-A machine register that temporarily holds the value of a pseudo
-register is called an ``alias'' for the pseudo register.  A pseudo
-register may have many valid aliases simultaneously (usually of
-different types), but any assignment to the pseudo register will
-invalidate all aliases but one, namely the machine register actually
-written.
+A machine register that holds the value of a pseudo register is called
+an ``alias'' for the pseudo register.  A pseudo register may have many
+valid aliases simultaneously, usually of different types.  Any
+assignment to the pseudo register will invalidate all aliases but one,
+namely the machine register actually written, rather than copy the new
+value into all the previous aliases.  Thus source references and
+destination references have different effects, and are handled by
+different procedures in the register allocator.
+
+Pseudo registers have associated homes, memory locations that hold
+their values when the machine registers are needed for other purposes.
+Most pseudo registers are never written to their homes, since a pseudo
+register's value is usually kept in machine register aliases until the
+pseudo register is dead, i.e. until its value is no longer needed.  A
+pseudo register's aliases can be reused for other purposes if there
+are other remaining aliases or this is the last reference to the
+pseudo register.  An alias that can be reused is a ``reusable'' alias.
+Occasionally, the value of a pseudo register may be transferred to the
+register's home and the last alias invalidated, if the register
+allocator is running out of registers.  This is called ``spilling'' a
+register.
  
  The register allocator maintains a table of associations, called the
-register map, that associates each pseudo register with its valid
+``register map'', that associates each pseudo register with its valid
  aliases, and each machine register with the pseudo register whose
  value it holds (if any).  The register allocator routines modify the
  register map after aliases are requested and invalidated, and they
  generate assembly language instructions to perform the necessary data
-motion at run time.  These instructions are usually inserted before
-the code output of the RTL rule in execution.
-
-As a convenience, the register allocator also provides operations that
-manipulate register references.  A register reference is a fragment of
-assembly language, typically a register addressing mode for general
-register machines, that when inserted into a LAP instruction, denotes
-the appropriate register.  For example, on the MC68k, physical
-register D3 is represented as RTL register number 3, and a register
-reference for it would be ``(D 3)''.  RTL pseudo register 44 may at
-some point have RTL hardware register 3 as its only data-register
-alias.  At that time, (REGISTER-ALIAS 44 'DATA) would return 3.
+motion for spilling and re-loading at run time.  These instructions
+are usually inserted before the code output of the RTL rule in
+execution.
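
The alias bookkeeping described above can be sketched with a toy register map.  This is only an illustrative model, not the code in compiler/back/lapgn2.scm; the names ALIASES, WITHOUT-ENTRY, ADD-ALIAS, and ASSIGN-ALIAS are hypothetical, and the map is assumed to be a simple association list.

```scheme
;; Toy register map: an association list from pseudo register numbers
;; to lists of machine register aliases.  All names are hypothetical.
(define (aliases rmap pseudo)
  (let ((entry (assv pseudo rmap)))
    (if entry (cdr entry) '())))

(define (without-entry rmap pseudo)
  (cond ((null? rmap) '())
        ((eqv? (car (car rmap)) pseudo) (without-entry (cdr rmap) pseudo))
        (else (cons (car rmap) (without-entry (cdr rmap) pseudo)))))

;; A source reference may add one more valid alias.
(define (add-alias rmap pseudo machine)
  (cons (cons pseudo (cons machine (aliases rmap pseudo)))
        (without-entry rmap pseudo)))

;; A destination reference leaves the written machine register as the
;; only alias; the old aliases are invalidated, not updated.
(define (assign-alias rmap pseudo machine)
  (cons (list pseudo machine)
        (without-entry rmap pseudo)))

(define m1 (add-alias (add-alias '() 44 3) 44 11))
(aliases m1 44)                       ; both 3 and 11 are valid aliases
(aliases (assign-alias m1 44 3) 44)   ; only 3 remains
```

The two procedures differ in exactly the way source and destination references differ: one extends the alias list, the other replaces it.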
  
  If you have chosen your RTL register numbers for machine registers so
  that they match the hardware numbers, and your assembly language does
@@ -1583,6 +1624,17 @@ can ignore register references and use the RTL register numbers
  directly.  This is commonly the case when using integer registers in
  load-store architectures.
  \f
+As a convenience, the register allocator also provides operations that
+manipulate register references.  A register reference is a fragment of
+assembly language, typically a register addressing mode for general
+register machines, that when inserted into a LAP instruction, denotes
+the appropriate register.  For example, on the Motorola MC68020,
+physical register A3 is represented as RTL register number 11, and a
+register reference for it would be ``(A 3)''.  RTL pseudo register 44
+may at some point have RTL machine register 11 as its only
+address-register alias.  At that time, (REGISTER-ALIAS 44 'ADDRESS)
+would return 11.
+
  The interface to the register allocator is defined in
  compiler/back/lapgn2.scm.  Not all ports use all of the procedures
  defined there.  Often a smaller subset is sufficient depending on
@@ -1599,15 +1651,19 @@ valid alias.
  * LOAD-ALIAS-REGISTER! is like REGISTER-ALIAS but always returns a
  machine register, allocating one of the specified type if necessary.
  This procedure should only be used for source operand RTL registers.
-REFERENCE-ALIAS-REGISTER! performs the same action but returns a
-register reference instead of an RTL register number.
+
+* REFERENCE-ALIAS-REGISTER! performs the same action as
+LOAD-ALIAS-REGISTER! but returns a register reference instead of an
+RTL register number.
  
  * ALLOCATE-ALIAS-REGISTER! expects an RTL register and a register
  type, and returns a machine register of the specified type that is the
  only alias for the RTL register and should be written with the new
  contents of the RTL register.  ALLOCATE-ALIAS-REGISTER! is used to
-generate aliases for target RTL registers.  REFERENCE-TARGET-ALIAS!
-performs the same action but returns a register reference instead of
+generate aliases for target RTL registers.
+
+* REFERENCE-TARGET-ALIAS!  performs the same action as
+ALLOCATE-ALIAS-REGISTER! but returns a register reference instead of
  an RTL register number.  See CLEAR-REGISTERS! below.
  
  * STANDARD-REGISTER-REFERENCE expects an RTL register, a register
@@ -1618,27 +1674,27 @@ or sometimes of other types if the boolean is true.  In other words,
  the boolean argument determines whether other types are acceptable,
  although not desirable.  The register type may be false, specifying
  that there really is no preference for the type, and any reference is
-valid.  Note that STANDARD-REGISTER-REFERENCE should be used only for
-source pseudo-registers (i.e. those that already contain data), and may
-return a memory reference for those machines with general addressing
-modes if there is no preferred type or alternates are acceptable.
+valid.  STANDARD-REGISTER-REFERENCE should be used only for source
+operands (i.e. those that already contain data), and may return a
+memory reference for those machines with general addressing modes if
+there is no preferred type and alternates are acceptable.
  
  * MOVE-TO-ALIAS-REGISTER! expects a source RTL register, a register
  type, and a target RTL register.  It returns a new alias for the
  target of the specified type containing a copy of the current contents
  of the source.  Often this is accomplished by choosing an alias of the
  source that already contains the correct data and making it the only
-alias for target.
-
+alias for target.  MOVE-TO-ALIAS-REGISTER! attempts to reuse an alias
+for the source register.
+\f
  * MOVE-TO-TEMPORARY-REGISTER! expects a source RTL register and a
  register type and returns an appropriate register containing a copy of
  the source.  The register is intended for temporary use, that is, use
  only within the code generated by the expansion of the current RTL
  instruction, and as such it should not be permanently recorded in the
-register map.  The register becomes automatically available for
-subsequent RTL instructions.  MOVE-TO-TEMPORARY-REGISTER! attempts to
-use an existing alias for the source RTL register if it is not the
-last remaining alias or the value of the source is not needed later.
+register map.  The register becomes automatically freed for subsequent
+RTL instructions.  MOVE-TO-TEMPORARY-REGISTER! attempts to reuse an
+alias for the source register.
  
  * REUSE-PSEUDO-REGISTER-ALIAS! expects an RTL register, a register
  type, and two procedures.  It attempts to find a reusable alias for
@@ -1647,7 +1703,7 @@ procedure giving it the alias if it succeeds, or the second procedure
  with no arguments if it fails.  MOVE-TO-ALIAS-REGISTER!  and
  MOVE-TO-TEMPORARY-REGISTER! use REUSE-PSEUDO-REGISTER-ALIAS! but
  occasionally neither meets the requirements.
-\f
+
  * NEED-REGISTER! expects an RTL machine register and informs the
  register allocator that the rule in use requires that register so it
  should not be available for subsequent requests while translating the
@@ -1660,8 +1716,8 @@ need to invoke it explicitly when calling out-of-line routines.
  * LOAD-MACHINE-REGISTER! expects an RTL register and an RTL machine
  register and generates code that copies the current value of the RTL
  register to the machine register.  It is used to pass arguments in
-registers to out-of-line code, typically in the compiled code runtime
-library.
+fixed registers to out-of-line code, typically in the compiled code
+runtime library.
  
  * ADD-PSEUDO-REGISTER-ALIAS! expects an RTL pseudo-register and an
  available machine register (no longer an alias), and makes the
@@ -1684,17 +1740,18 @@ pseudo registers whose contents will not be needed after the current
  RTL instruction can be eliminated from the register map and their
  aliases subsequently used for other purposes.
  \f
-Most of the rules are actually written in terms of port-specific
+Most of the rules are actually written in terms of machine-specific
  procedures that invoke the procedures listed above in fixed ways.
-Rule bodies typically match of the following code pattern:
+Rule bodies typically match the following code pattern:
  
      (let* ((rs1 (standard-source source1))
            (rs2 (standard-source source2))
            (rt (standard-target target)))
        (LAP ...))
  
-where STANDARD-SOURCE and STANDARD-TARGET are port-specific
-procedures.
+where STANDARD-SOURCE and STANDARD-TARGET are machine-specific
+procedures.  The reason for the use of LET* (instead of LET) is given
+below.
  
  On a machine with general addressing modes and memory operands, we
  might provide their definitions as follows:
@@ -1729,9 +1786,9 @@ source operand may have been reused in the interim, and the compiler
  will assume that the source quantity is contained in memory and will
  often generate code that fetches and operates on garbage.
  
-Note that the example above uses LET* instead of LET.  LET would not
-work in the above example because Scheme does not specify the order of
-argument evaluation, and Liar chooses arbitrary orders, so the
+The example above uses LET* instead of LET.  LET would not work in the
+above example because Scheme does not specify the order of argument
+evaluation, and Liar chooses arbitrary orders, so the
  DELETE-DEAD-REGISTERS! implicit in STANDARD-TARGET might be called too
  early possibly causing STANDARD-SOURCE to fail.
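
The ordering issue can be illustrated independently of Liar.  In the sketch below, NOTE-CALL and CALL-LOG are hypothetical stand-ins for the side effects that STANDARD-SOURCE and STANDARD-TARGET have on the register map.  LET* evaluates its bindings sequentially, whereas LET leaves the order unspecified.

```scheme
(define call-log '())

;; Record that NAME was "allocated" and return it.
(define (note-call name)
  (set! call-log (cons name call-log))
  name)

;; With LET*, both sources are always recorded before the target.
(define observed-order
  (let* ((rs1 (note-call 'source1))
         (rs2 (note-call 'source2))
         (rt (note-call 'target)))
    (reverse call-log)))
```

Here OBSERVED-ORDER is (source1 source2 target).  With LET in place of LET*, any permutation would be a legal result.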
  
@@ -1772,15 +1829,14 @@ passed and reformat the stack frame appropriately.
  is not a procedure, but a return address, a compiled expression, or a
  pointer to an internal label.
  
-The CONS-CLOSURE rules will dynamically create some new instructions
-in the runtime heap, and these instructions must be visible to the
+The CONS-CLOSURE rules will dynamically create some instructions in
+the runtime heap, and these instructions must be visible to the
  processor's instruction fetch unit.  If the instruction and data
-caches are not automatically kept consistent by the hardware
-(especially for newly addressed memory), the caches must be explicitly
+caches are not automatically kept consistent by the hardware,
+especially for newly addressed memory, the caches must be explicitly
  synchronized by the Scheme system.  On machines where the programmer
-is given no control over the caches, this will be very hard to do.
-
-On machines where the control is minimal or flushing is expensive, the
+is given no control over the caches, this will be very hard to do.  On
+machines where the control is minimal or flushing is expensive, the
  following solution can be used to amortize the cost:
  
  The CONS-CLOSURE rules can generate code to allocate a closure from a
@@ -1789,26 +1845,24 @@ pool when it is empty.  The routine allocates more space from the
  heap, initializes the instructions, and synchronizes the caches.
  
  Since the real entry points are not known until the closure objects
-are created, instead of using absolute jumps to the real entry
-points, the pre-allocated closures can contain jumps to a fixed
-routine that will extract the real entry point from the word pointed
-at by the return address and invoke it.  In other words, the code
-inserted in the closure objects will not be
-       jsr real-entry-point
-       <storage for first free variable>
-but
+are created, instead of using absolute jumps to the real entry points,
+the pre-allocated closures can contain jumps to a fixed routine that
+will extract the real entry point from the word pointed at by the
+return address and invoke it.  In other words, the code inserted in
+closure objects will be
+
         jsr fixed-routine
         <storage for real-entry-point>
-       <storage for first free variable>
  
-and the fixed-routine will do something like
+and fixed-routine, written in assembly language, will do something like
+
         load    0(return-address),rtemp
         jmp     0(rtemp)
-\f
+
  The 68040 version of the Motorola 68000 family port uses this trick
  because the 68040 cache is typically configured in copyback mode, and
-synchronizing the caches involves a supervisor call.
-
+synchronizing the caches involves an expensive supervisor (OS) call.
+\f
  * (INVOCATION:UUO-LINK (? frame-size) (? continuation) (? name))
    This rule is used to invoke a procedure named by a free variable.
  It is the rule used to generate a branch to an execute cache as
@@ -1821,14 +1875,55 @@ FRAME-SIZE is the number of arguments passed in the call, plus one.
    This rule is identical to the previous one, except that the free
  variable must be looked up in the global environment.  It is used to
  improve the expansion of some macros that insert explicit references
-to the global environment.
+to the global environment (e.g. the expansion for FLUID-LET uses
+(ACCESS DYNAMIC-WIND #f) as the operator of a call).
  
  * (INVOCATION-PREFIX:MOVE-FRAME-UP (? frame-size) (? address))
    This rule is used to shift call frames on the stack to maintain
  proper tail recursion.  ADDRESS specifies where to start pushing the
  frame.  It should be a pointer into the used portion of the stack,
-i.e.  point to a higher address.
-
+i.e. point to a higher address.
+
+For example, assume that what follows depicts the stack before
+  (INVOCATION-PREFIX:MOVE-FRAME-UP 3 addr)
+
+         |               ...             |
+         |                               |
+         +-------------------------------+
+         |           <value n>           |
+addr ->  +-------------------------------+
+         |                               | direction of
+         |                               | stack growth
+         |                               |
+         |               ...             |       |
+         |                               |       |
+         |                               |       V
+         |                               |
+         +-------------------------------+
+         |           <value 3>           |
+         +-------------------------------+
+         |           <value 2>           |
+         +-------------------------------+
+         |           <value 1>           |
+spbf ->  +-------------------------------+
+
+where spbf is the contents of the stack pointer register.
+After the invocation prefix, it will look as follows:
+
+         |               ...             |
+         |                               |
+         +-------------------------------+
+         |           <value n>           | direction of
+addr ->  +-------------------------------+ stack growth
+         |           <value 3>           |
+         +-------------------------------+       |
+         |           <value 2>           |       |
+         +-------------------------------+       V
+         |           <value 1>           |
+spaf ->  +-------------------------------+
+
+The stack pointer register will now contain the value of spaf.
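
The effect of the prefix can be modelled in a few lines of Scheme.  This is a sketch under stated assumptions, not the code a port would generate: the stack is modelled as a vector indexed by address, it grows towards lower indices, SP is the index of the most recently pushed word, and MOVE-FRAME-UP! is a hypothetical name.

```scheme
;; Move the FRAME-SIZE words at the top of the stack so that they end
;; just below ADDR.  Returns the new stack pointer.
(define (move-frame-up! stack sp frame-size addr)
  ;; Copy the deepest word of the frame first, so that overlapping
  ;; source and destination regions are handled correctly.
  (let loop ((i (- frame-size 1)))
    (if (>= i 0)
        (begin
          (vector-set! stack
                       (+ (- addr frame-size) i)
                       (vector-ref stack (+ sp i)))
          (loop (- i 1)))))
  (- addr frame-size))

;; Address 8 holds <value n>; the frame v1, v2, v3 sits at 1, 2, 3.
(define stack (vector 'free 'v1 'v2 'v3 'x 'x 'x 'x 'vn 'older))
(define new-sp (move-frame-up! stack 1 3 8))
```

After the call NEW-SP is 5, and addresses 5, 6, and 7 hold v1, v2, and v3, immediately below <value n> at address 8.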
+\f
  * (INVOCATION-PREFIX:DYNAMIC-LINK (? frame-size) (? address-1) (? address-2))
    This rule is similar to the INVOCATION-PREFIX:MOVE-FRAME-UP rule,
  but is used when the destination of the frame is not known at compile
@@ -1836,47 +1931,12 @@ time.  The destination depends on the continuation in effect at the
  time of the call, and the section of the stack that contains enclosing
  environment frames for the called procedure.  Two addresses are
  specified and the one that is closest to the current stack pointer
-should be used, that is the numerically lower of the two addresses.
-==> This rule need not need not exist in the RTL.  It could be
-expanded into comparisons and uses of INVOCATION-PREFIX:MOVE-FRAME-UP
-with computed values.
-
-* (OPEN-PROCEDURE-HEADER (? label-name))
-  This rule (and its siblings) is used to generate the entry code to
-procedures and continuations (return addresses).  On entry to
-procedures and continuations, a gc/interrupt check is performed, and
-the appropriate routine in the runtime library is invoked if
-necessary.  This check is performed by comparing the memory Free
-pointer to the compiled code's version of the MemTop pointer.  The
-low-level interrupt handlers change the MemTop pointer to guarantee
-that such comparisons will fail in the future.  A standard header
-generates the following code:
-    (LABEL gc-label)
-       <code to invoke the runtime library>
-       <format and gc words for the entry point>
-    (LABEL label-name)
-       <branch to gc-label if Free >= MemTop>
-
-Each of the individual headers is somewhat idiosyncratic, but the
-idiosyncrasies are captured in the machine-independent runtime
-library.
-
-Note that procedures that expect dynamic links must guarantee that the
-dynamic link is preserved around the execution of the interrupt
-handler.  This is accomplished by invoking an alternate entry point in
-the runtime library and passing along the contents of the dynamic link
-register.
-\f
-* (CLOSURE-HEADER (? label-name) (? nentries) (? entry))
-  NENTRIES is the number of entry points that the closure object has,
-and ENTRY is the zero-based index for this entry point.  Closure
-headers also perform gc/interrupt tests, but they may also have to
-reconstruct the distinguished (canonical) closure object from a
-closure with multiple entry points from the ``return address'' and
-push the resulting object on the Scheme stack.  When backing out for
-interrupts, they may have to adjust the canonical closure object to be
-the real closure object if these two are different.  You should read
-the section on closures in cmpint.txt for a more complete explanation.
+should be used, that is, the target address is the numerically smaller
+of the two addresses since the Liar stack grows towards smaller
+addresses.
+ ==> This rule need not exist in the RTL.  It could be
+expanded into a comparison and a use of
+INVOCATION-PREFIX:MOVE-FRAME-UP with a computed address.
  
  * (ASSIGN (REGISTER (? target))
           (CONS-CLOSURE (ENTRY:PROCEDURE (? procedure-label))
@@ -1893,78 +1953,188 @@ cmpint.txt.
  * (ASSIGN (REGISTER (? target))
           (CONS-MULTICLOSURE (? nentries) (? size) (? entries)))
    This rule is similar to the previous rule, but issues code to
-allocate a closure object with NENTRIES entry points.  SIZE is the
-number of words allocated for free variables, and ENTRIES is a vector
-of entry-point descriptors.  Each descriptor is a list containing a
-label, a min, and a max as in the rule above.
+allocate a closure object with NENTRIES entry points.  ENTRIES is a
+vector of entry-point descriptors, each being a list containing a
+label, a min, and a max as in the previous rule.  TARGET receives the
+compiled code object corresponding to the first entry.
+
+* (OPEN-PROCEDURE-HEADER (? label-name))
+  This rule and its siblings are used to generate the entry code to
+procedures and return addresses.  On entry to procedures and
+continuations, a gc/interrupt check is performed, and the appropriate
+routine in the runtime library is invoked if necessary.  This check is
+performed by comparing the memory Free pointer to the compiled code's
+version of the MemTop pointer.  The low-level interrupt handlers
+change the MemTop pointer to guarantee that such comparisons will fail
+in the future.  A standard header generates the following code:
+
+    (LABEL gc-label)
+       <code to invoke the runtime library>
+       <format and gc words for the entry point>
+    (LABEL label-name)
+       <branch to gc-label if Free >= MemTop>
  
+Each kind of header invokes a different runtime library utility.  In
+addition, procedures that expect dynamic links must guarantee that the
+dynamic link is preserved around the execution of the interrupt
+handler.  This is accomplished by passing the contents of the dynamic
+link register to the appropriate runtime library utility.
+
+* (CLOSURE-HEADER (? label-name) (? nentries) (? num-entry))
+  NENTRIES is the number of entry points in the closure object, and
+NUM-ENTRY is the zero-based index for this entry point.  Closure
+headers are similar to other procedure headers but also have to
+complete the handshake initiated by the instructions stored in the
+closure objects so that the closure object appears on top of the
+stack.  On architectures where it is necessary, they also have to map
+closure objects to their canonical representatives, and back when
+backing out because of interrupts or garbage collection.
+\f
  The file compiler/machines/port/rules3.scm contains most of these
  procedure-related rules.  It also contains three procedures that
-generate assembly language and are required by the compiler.  Both of
-these procedures are used to generate code to be wrapped around the
-top-level code of a compilation unit.
-
-* (GENERATE/QUOTATION-HEADER env-label free-label n-sections)
-  This procedure generates the header for the top-level expression
-given to COMPILE-SCODE, and generates its entry code.  This code
-initializes the compiled code block being executed.  The
-initialization consists of stashing the evaluation environment in the
-compiled code block at the location labeled by ENV-LABEL, and invoking
-the linker to fix the free references in the compiled code block.
+generate assembly language and are required by the compiler.  These
+procedures are used to generate initialization code for compiled code
+blocks.
+
+Compiled code blocks have two sections, a code section that contains
+the instructions, and a ``constants'' section that contains Scheme
+objects referenced by the code (e.g. quoted lists and symbols), the
+free variable caches for the code, the debugging information
+descriptor (more on this later), and the environment where the free
+variables in the code must be referenced.  This environment is not
+known at compile time, so the compiler allocates a slot in the
+constants section for it, but the code itself must store it on first
+entry.  In addition, the linker is invoked on first entry to look up
+the free variables and fill the variable caches with their correct
+contents.  The compiler allocates enough space for each free variable
+cache and initializes the space with the information required by the
+linker to patch the reference.  This information consists of the name
+of the free variable in addition to the number of actual arguments
+passed (plus one) for execute references.
+
+If COMPILER:COMPILE-BY-PROCEDURES? is true, the compiler will generate
+multiple compiled code blocks, one corresponding to each top-level
+lambda expression.  Each of these must be initialized and linked, but
+instead of initializing them on first entry, the root compiled code
+block links all of them when it is entered.
  
  The linker (a runtime library utility) expects three arguments:
-  The address of the first word of the compiled code block, labeled
-by the value of *BLOCK-LABEL* during the compilation.
+  The address of the first word of the compiled code block, the word
+containing the GC vector header for the compiled code block.
    The address of the first linker section in the constants area of the
-compiled code block, labeled by FREE-LABEL.
-  The number of linker sections in the compiled code block (N-SECTIONS).
+compiled code block.  The linker sections contain the free variable
+caches and are all contiguous.
+  The number of linker sections in the compiled code block.
+
+* (GENERATE/QUOTATION-HEADER env-label free-label n-sections)
+  This procedure generates the code that initializes the environment
+slot at the location labeled ENV-LABEL.  The environment is fetched from
+the interpreter's environment register.  It also generates code to
+invoke the linker on the executing compiled code block.  The first
+word of the compiled code block is labeled by the value of
+*BLOCK-LABEL*, the first linker section is labeled by FREE-LABEL, and
+the number of linker sections is N-SECTIONS.
  
  * (GENERATE/REMOTE-LINK label env-offset free-offset n-sections)
-  This procedure is similar to generate/quotation-header but is used
-to generate code that initializes and links not the executing compiled
-code block, but a different compiled code block, pointed at by a
-location in the currently executing compiled code block.
-  LABEL is a label into current block where the pointer to the code
-block to be linked is stored,
-  ENV-OFFSET is the offset in the other code block where the
+  This procedure is similar to GENERATE/QUOTATION-HEADER but is used
+to generate code that initializes and links a different compiled code
+block.  It is used to generate the code to insert into the root
+compiled code block to link each of the other compiled code blocks
+generated when COMPILER:COMPILE-BY-PROCEDURES? is true.
+  LABEL is a label in the current block's constants section where the
+pointer to the code block to be linked is stored,
+  ENV-OFFSET is the vector offset in the other code block where the
  environment of evaluation should be stored,
-  FREE-OFFSET is the offset of the first linker section in the other
-compiled code block, and
+  FREE-OFFSET is the vector offset of the first linker section in the
+other compiled code block, and
    N-SECTIONS is the number of linker sections in the other block.
  \f
  * (GENERATE/CONSTANTS-BLOCK consts reads writes execs global-execs statics)
-  This procedure generates the LAP directives used to generate the
-constants section of a compiled code block.  The constants section
-includes:
-  - The constant objects referenced by the code.
+  This procedure generates the assembler pseudo-ops used to generate
+the constants and linker section for a compiled code block.  This
+section consists of:
+  - The constant objects (e.g. quoted lists) referenced by the code.
    - The read variable caches used by the code.
    - The write variable caches used by the code.
    - The execute variable caches used by the code.
    - The global execute variable caches used by the code.
    - The locations for static variables.
-  - A slot for the debugging information generated by the compiler.
+  - A slot for the debugging information descriptor generated by the
+compiler.
    - A slot for the environment where the code is linked.  
  
+Each word of storage in the constants block is allocated by using a
+SCHEME-OBJECT assembler pseudo-op, and the order in which they appear
+is the same as the order in which they appear in the final object.
+The linker sections (free variable cache sections) must be contiguous,
+and each has a one-word header describing the kind of section and its
+length.  The environment slot must be the last word in the compiled
+code block, immediately preceded by the debugging information
+descriptor.  Each SCHEME-OBJECT directive takes a label and the
+initial contents of the slot.
+
  This procedure is almost machine-independent, and you should be able
  to trivially modify an existing version.  The only machine dependence
  is the layout and size of the storage allocated for each execute cache
-(uuo link).  Each word of storage in the constants block is allocated
-by a SCHEME-OBJECT directive, and the order in which they are issued
-determines the layout in the constants block.  
-
-The machine-dependent expansion is performed by the TRANSMOGRIFLY
-procedure.  TRANSMOGRIFLY expects a list of lists whose first elements
-are the names of free variables and the rest of the elements are the
-frame sizes (number of arguments plus one) passed in the call.  It
-returns a list of pairs of objects and labels.  It should expand each
-<NAME,ARITY> pair into EXECUTE-CACHE-SIZE pairs, one of which should
-contain NAME as the object, another of which should contain ARITY as
-the object, and the rest of which (if any) can contain anything.  The
-pairs containing NAME and ARITY, and any additional ones, must be
-ordered to make the EXTRACT_EXECUTE_CACHE_ARITY,
-EXTRACT_EXECUTE_CACHE_SYMBOL macros from microcode/cmpint-port.h work
-correctly.  The arity MUST NOT be overwritten when the execute cache
-is initialized to contain instructions.
+(uuo link).  This machine dependence consists entirely of the
+definition of the TRANSMOGRIFLY procedure.
+
+TRANSMOGRIFLY takes a list of the following form:
+
+  ((free-variable-1 (frame-size-1-1 . label-1-1)
+                   (frame-size-1-2 . label-1-2)
+                   ...)
+   (free-variable-2 (frame-size-2-1 . label-2-1)
+                   (frame-size-2-2 . label-2-2)
+                   ...)
+   ...)
+
+This list is interpreted as follows: an execute cache for calls to
+FREE-VARIABLE-1 with frame size FRAME-SIZE-1-1 (number of arguments
+plus one) must be created, and labeled LABEL-1-1, similarly for
+ <FREE-VARIABLE-1, FRAME-SIZE-1-2, LABEL-1-2>, 
+ <FREE-VARIABLE-2, FRAME-SIZE-2-1, LABEL-2-1>, etc.
+\f
+Assuming that the initial layout of an execute cache is
+
+  free variable name           ; labeled word
+  false                                ; optional storage (e.g. for branch delay slot)
+  frame size of call           ; arity + 1
+
+TRANSMOGRIFLY will return a list of the following form:
+
+  ((free-variable-1  label-1-1)
+   (#f               dummy-label-1-1)  ; optional word(s)
+   (frame-size-1-1   dummy-label-1-1)
+
+   (free-variable-1  label-1-2)
+   (#f               dummy-label-1-2)  ; optional word(s)
+   (frame-size-1-2   dummy-label-1-2)
+
+   ...
+
+   (free-variable-2  label-2-1)
+   (#f               dummy-label-2-1)  ; optional word(s)
+   (frame-size-2-1   dummy-label-2-1)
+
+   ...)
+
+There may be any number of optional words, but the layout must match
+that expected by the macros defined in microcode/cmpint-md.h.  In
+particular, the length in longwords must match the definition of
+EXECUTE_CACHE_ENTRY_SIZE in microcode/cmpint-md.h, and the definition
+of EXECUTE-CACHE-SIZE in compiler/machines/port/machin.scm.
+
+Furthermore, the instructions that the linker will insert should
+appear at the word labeled by LABEL-N-M, and should not overwrite the
+relevant part of FRAME-SIZE-N-M, since the frame size will be needed
+when re-linking after an incremental definition or assignment.
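
For the three-word layout assumed above, the expansion might be sketched as follows.  This is not the definition from any existing port: TRANSMOGRIFLY-SKETCH and NEW-DUMMY-LABEL are hypothetical names, the output entries are written as two-element lists as in the listing above, and each unlabeled word is given its own fresh dummy label.

```scheme
;; NEW-LABEL is a thunk that returns a fresh dummy label (an assumption
;; of this sketch; a real port would use the compiler's own generator).
(define (transmogrifly-sketch uuos new-label)
  (define (expand-variable name calls rest)
    (if (null? calls)
        rest
        ;; LET* fixes the order in which the dummy labels are allocated.
        (let* ((frame-size (car (car calls)))
               (label (cdr (car calls)))
               (pad-label (new-label))
               (size-label (new-label))
               (tail (expand-variable name (cdr calls) rest)))
          (cons (list name label)                 ; labeled word
                (cons (list #f pad-label)         ; optional word
                      (cons (list frame-size size-label) tail))))))
  (if (null? uuos)
      '()
      (expand-variable (car (car uuos))
                       (cdr (car uuos))
                       (transmogrifly-sketch (cdr uuos) new-label))))

(define label-counter 0)
(define (new-dummy-label)
  (set! label-counter (+ label-counter 1))
  (string->symbol
   (string-append "dummy-" (number->string label-counter))))

(define expanded
  (transmogrifly-sketch '((car (2 . label-1-1) (3 . label-1-2))
                          (foo (1 . label-2-1)))
                        new-dummy-label))
```

EXPANDED contains nine entries, three per execute cache, and begins with (car label-1-1).  A port with a different execute cache size would emit a different number of optional words per cache.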
+
+The output format of TRANSMOGRIFLY is the input format for the read
+and write execute cache sections.  The procedure DECLARE-CONSTANTS,
+local to GENERATE/CONSTANTS-BLOCK, reformats such lists into the final
+SCHEME-OBJECT directives and tacks on the appropriate linkage section
+headers.
  \f
         5.3.4. Fixnum rules.
  
@@ -2022,10 +2192,10 @@ different.  The RTL instructions that perform fixnum arithmetic have a
  boolean flag that specifies whether overflow conditions should be
  generated or not.
  \f
-Note that the compiler does not generally require fixnum arithmetic to
-be open coded.  If the names of all the fixnum primitives are listed in
+The compiler does not generally require fixnum arithmetic to be open
+coded.  If the names of all the fixnum primitives are listed in
  COMPILER:PRIMITIVES-WITH-NO-OPEN-CODING, all of them will be handled
-by issuing code to invoke them out of line. 
+by issuing code to invoke them out of line.
  
  There is one exception to this, however.  The following rules MUST be
  provided:
@@ -2044,9 +2214,11 @@ provided:
  
  The reason is that VECTOR-REF and VECTOR-SET! translate into a
  sequence that uses these patterns when the index is not a compile-time
-constant.
+constant.  Of course, you can include VECTOR-REF and VECTOR-SET! in
+COMPILER:PRIMITIVES-WITH-NO-OPEN-CODING to avoid the problem
+altogether.
  \f
-       5.3.5. Rules to invoke the runtime library
+       5.3.5. Rules used to invoke the runtime library
  
  Some of the rules issue code that invokes the runtime library.  The
  runtime library is invoked through a primary entry point,
@@ -2099,7 +2271,7 @@ CLEAR-REGISTERS! and NEED-REGISTER! besides performing the assignment.
  
  For very frequent calls, the assembly language part of the runtime
  library can provide additional entry points.  The calling convention
-for these would be port-dependent, but frequently they take arguments
+for these would be machine-dependent, but frequently they take arguments
  in the same way that SCHEME-TO-INTERFACE and SCHEME-TO-INTERFACE-JSB
  take them, but avoid passing the utility index, and may do part or all
  of the work of the utility in assembly language instead of invoking
@@ -2115,6 +2287,9 @@ stack rather than in a fixed register:
            ,@(load-rn frame-size 2)
             (JMP ,entry:compiler-apply)))
  
+The procedure object will have been pushed on the stack by earlier
+code.
+
         5.4. Writing predicate rules.
  
  Predicate rules are used to generate code to discriminate between
@@ -2370,28 +2545,28 @@ with any luck, you will be able to copy inerly.scm, insmac.scm, and
  parts of assmd.scm verbatim from an existing port, and for most
  machines, coerce.scm is straightforward to write.
  
-Note that assmd.scm defines utilities that depend almost exclusively
-on the endianness of the architecture.  You may want to start with the
-MIPS version since this version accommodates both endianness
-possibilities as MIPS processors can be configured either way.
-If your processor has fixed endianness, you can prune the
-inappropriate code.  The rest of the code in assmd.scm is either
-constant, or must agree with definitions in microcode/cmpint-port.h.
+assmd.scm defines procedures that depend only on the endianness of the
+architecture.  You may want to start with the MIPS version since this
+version accommodates both endianness possibilities as MIPS processors
+can be configured either way.  If your processor has fixed endianness,
+you can prune the inappropriate code.  The block-offset definitions
+must agree with those in microcode/cmpint-port.h, and the padding
+definitions are simple constants.
  \f
  Assuming that you decide to use the same structure as existing
  assemblers, you may need to write parsers for addressing modes if your
-machine has them.  You can use the versions in the 68020 (bobcat) and
-Vax ports for guidance.  Addressing modes are described by a set of
-conditions under which they are valid, and some output code to issue.
-The higher-level code that parses instructions in insmac.scm must
-decide where the bits for the addressing modes must appear.  The 68020
-version divides the code into two parts,, the part that is inserted
-into the opcode word of the instruction (further subdivided into two
-parts), and the part that follows the opcode word as an extension.
-The Vax version produces all the bits at once since addressing modes
-are not split on that architecture.  You should write the addressing
-mode definitions in port/insutl.scm, plus any additional transformers
-that the instruction set may require.
+machine has them.  You can use the versions in the MC68020 (bobcat)
+and Vax ports for guidance.  Addressing modes are described by a set
+of conditions under which they are valid, and some output code to
+issue.  The higher-level code that parses instructions in insmac.scm
+must decide where the bits for the addressing modes must appear.  The
+MC68020 version divides the code into two parts, the part that is
+inserted into the opcode word of the instruction (further subdivided
+into two parts), and the part that follows the opcode word as an
+extension.  The Vax version produces all the bits at once since
+addressing modes are not split on that architecture.  You should write
+the addressing mode definitions in port/insutl.scm, plus any
+additional transformers that the instruction set may require.
  
  Once you have the code for the necessary addressing modes and
  transformers (if any), and the parsing code for their declarations in
@@ -2405,13 +2580,13 @@ machines/mips/rules3.scm.
  
         6.6. Write the LAPGEN rules:
  
-You will need to have lapgen.scm, rules1.scm, rules2.scm, and
-rules3.scm.  rules4.scm is not used by the compiler with the ordinary
-switch settings and the code may not longer work in any of the ports,
-and rulfix.scm and rulflo.scm are only necessary to open code fixnum
-and flonum arithmetic.  A good way to reduce the amount of code needed
-at first is to turn primitive open coding off, and ignore rulfix.scm
-and rulflo.scm.
+You will need to write lapgen.scm, rules1.scm, rules2.scm, rules3.scm,
+and parts of rules4.scm.  Most of rules4.scm is not used by the
+compiler with the ordinary switch settings and the code may no longer
+work in any of the ports, and rulfix.scm and rulflo.scm are only
+necessary to open code fixnum and flonum arithmetic.  A good way to
+reduce the amount of code needed at first is to turn primitive open
+coding off, and ignore rulfix.scm and rulflo.scm.
  
  Lapgen.scm need not include the shared code used to deal with fixnums
  and flonums, but will require the rest, especially the code used to
@@ -2421,13 +2596,17 @@ rules1.scm and rules2.scm are relatively straightforward since the
  RTL instructions whose translations are provided there typically map
  easily into instructions.
  
-rules3.scm is an entirely different matter.  It is probably hardest
-file to write when porting the compiler.  The most complicated parts
-to understand, and write, are the closure code, the invocation prefix
-code, and the block assembly code.
+rules4.scm need only have the INTERPRETER-CALL:CACHE-??? rules, and
+these are simple invocations of runtime library routines which you can
+emulate from existing ports.
+
+rules3.scm is an entirely different matter.  It is probably the
+hardest file to write when porting the compiler.  The most complicated
+parts to understand, and write, are the closure code, the invocation
+prefix code, and the block assembly code.
  
   The block assembly code can be taken from another port.  You will
-only have to change how the transmogrify procedure works to take into
+only have to change how the transmogrifly procedure works to take into
  account the size and layout of un-linked execute caches.
  \f
   The invocation prefix code is used to adjust the stack pointer, and
@@ -2463,7 +2642,7 @@ top of the stack.
  
   The cons-closure rules are used to allocate closure objects from the
  runtime heap.  Some of this allocation/initialization may be done out
-of line, especially of ``assembling'' the appropriate instructions on
+of line, especially if ``assembling'' the appropriate instructions on
  the fly would require a lot of code.  In addition, you may have to
  call out-of-line routines to synchronize the processor caches or
  block-allocate multiple closure entries.
@@ -2477,9 +2656,8 @@ rgspcm.scm and dassm1.scm can be copied verbatim from any other port.
  
  lapopt.scm only needs to define an identity procedure.
  
-rules4.scm, rulfix.scm, rulflo.scm, and rulrew.scm need not define any
-rules, since you can initially turn off open coding of primitive
-operators.
+rulfix.scm, rulflo.scm, and rulrew.scm need not define any rules,
+since you can initially turn off open coding of primitive operators.
  
  dassm2.scm and dassm3.scm need not be written at first, but they are
  useful to debug the assembler (since disassembling some code should
@@ -2520,22 +2698,22 @@ cmpint-port.h and cmpaux-port.m4, you will need to do the following:
  - Copy (or link) cmpint-port.h to cmpint2.h.
  
  - Modify m.h to use 6-bit-long type tags (rather than the default 8)
-if you did not do this when you installed the microcode.  Note that if
-you do this, you will not be able to load .bin files created with 6
-bit type tags.  You can overcome this problem by using the original
-.psb files again to regenerate the .bin files, or using a version of
-Bintopsb compiled with 8-bit tags to generate new .psb files, and a
-version of Psbtobin compiled with 6-bit tags to generate the new .bin
-files.  Alternatively, you can try to bring the whole compiler up
-using 8 bit tags, but you may run out of address space.  The simplest
-way to specify 6-bit type tags is to add a definition of
-C_SWITCH_MACHINE that includes -DTYPE_CODE_LENGTH=6 .  Be sure to add
-any m4 switches that you may need so that the assembly language will
-agree on the number of tag bits if it needs it at all.  If your
-version of m4 does not support command-line definitions, you can use
-the s/ultrix.m4 script to overcome this problem.  Look at the m/vax.h
-and s/ultrix.h files for m4-related definitions.
-==> We should just switch the default to 6 bits and be done with it.
+if you did not do this when you installed the microcode.  If you do
+this, you will not be able to load .bin files created with 8 bit type
+tags.  You can overcome this problem by using the original .psb files
+again to regenerate the .bin files.  Alternatively, you can use a
+version of Bintopsb compiled with 8-bit tags to generate new .psb
+files, and a version of Psbtobin compiled with 6-bit tags to generate
+the new .bin files.  Another option is to bring the compiler up using 8
+bit tags, but you may run out of address space.  The simplest way to
+specify 6-bit type tags is to add a definition of C_SWITCH_MACHINE
+that includes -DTYPE_CODE_LENGTH=6 .  Be sure to add any m4 switches
+that you may need so that the assembly language will agree on the
+number of tag bits if it needs it at all.  If your version of m4 does
+not support command-line definitions, you can use the s/ultrix.m4
+script to overcome this problem.  Look at the m/vax.h and s/ultrix.h
+files for m4-related definitions.
+ ==> We should just switch the default to 6 bits and be done with it.
  
  - Modify ymakefile to include the processor dependent section that
  lists the cmpint-port.h and cmpaux-port.m4 files.  You can emulate the
@@ -2658,10 +2836,11 @@ A good order to try them is
         y.scm
         sort/*.scm (see sort/README for a description)
  
-The programs in the first list test various aspects of code generation.
-The programs in the second list test the handling of various dynamic
-conditions (i.e. error recovery).
-The programs in the third list are somewhat larger, and register
+ The programs in the first list test various aspects of code
+generation.
+ The programs in the second list test the handling of various dynamic
+conditions (e.g. error recovery).
+ The programs in the third list are somewhat larger, and register
  allocation bugs, etc., are more likely to show up in them.
  
  A good idea at the beginning is to turn COMPILER:GENERATE-RTL-FILES?
@@ -2674,9 +2853,10 @@ be invoked as follows:
      (compiler:write-lap-file "<pathname of .com file>") ; writes a .lap file.
      (compiler:disassemble <compiled entry point>) ; writes on the screen.
  
-Note that both COMPILER:GENERATE-LAP-FILES? and
-COMPILER:WRITE-LAP-FILE write .lap files, so you may want to rename
-one of them.
+The .lap filename extension is used by COMPILER:WRITE-LAP-FILE and by
+the compiler when COMPILER:GENERATE-LAP-FILES? is true, so you may
+want to rename the .lap file generated by the compiler to avoid
+overwriting it when using COMPILER:WRITE-LAP-FILE.
  
  Various runtime system files also make good tests.  In particular, you
  may want to try list.scm, vector.scm, and arith.scm.  You can try them
@@ -2720,7 +2900,7 @@ Once you have the cross-compiler, you can use CROSS-COMPILE-BIN-FILE
  to generate .moc files.  The .moc files can be translated to .psb
  files on the Vax.  These .psb files can in turn be translated to .moc
  files on the Sparc, and you can generate the final .com files by using
-CROSS-COMPILE-BIN-FILE-END defined in compiler/base/crsend.  Note that
+CROSS-COMPILE-BIN-FILE-END defined in compiler/base/crsend.
  compiler/base/crsend can be loaded on a plain runtime system (i.e.
  without SF or a compiler).  You will probably find the following
  idioms useful:
@@ -2755,14 +2935,14 @@ or getting in each other's way.
  These two methods are not exclusive.  We typically bring up the
  compiler on a new machine by distributing the cross-compilation job.
  
-Note that the compiler (and the cross-compiler) use a lot of memory
-while running, and that virtual memory is really no substitute for
-physical memory.  You may want to increase your physical memory limit
-on those systems where this can be controlled (e.g. under BSD use the
-``limit'' command).  If your machines don't have much physical memory,
-or it is too painful to increase your limit (i.e. you have to re-compile
-or re-link the kernel), you may want to use microcode/bchscheme instead
-of microcode/scheme.  Bchscheme uses a disk file for the spare heap,
+The compiler and the cross-compiler use a lot of memory while running,
+and virtual memory is really no substitute for physical memory.  You
+may want to increase your physical memory limit on those systems where
+this can be controlled (e.g. under BSD use the ``limit'' command).  If
+your machines don't have much physical memory, or it is too painful to
+increase your limit, i.e. you have to re-compile or re-link the
+kernel, you may want to use microcode/bchscheme instead of
+microcode/scheme.  Bchscheme uses a disk file for the spare heap,
  rather than a region of memory, putting the available memory to use at
  all times.
  
@@ -2854,10 +3034,10 @@ mentions some of the few, and some techniques to use with
  assembly-language debuggers (gdb, dbx, or adb).
  
  The main assumption in this section is that the front end and other
-port-independent parts of the compiler work correctly.  Of course,
+machine-independent parts of the compiler work correctly.  Of course,
  this cannot be guaranteed, but in all likelihood virtually all of the
  bugs that you will meet when porting the compiler will be in the new
-port-specific code.
+machine-specific code.
  
  If you need to examine some of the front-end data structures, you may
  want to use the utilities in base/debug.scm which is loaded in the
@@ -2997,15 +3177,15 @@ as much as possible by trying the individual procedures, etc., in the
  code, but ultimately you may need the ability to set instruction-level
  breakpoints and single-step instructions in compiled code.
  
-A problem peculiar to systems in which is relocated on the fly is that
-you can't, in general, obtain a permanent address for a procedure or
-entry point.  The code may move at every garbage collection, and if
-you set a machine-level breakpoint with a Unix debugger, and then the
-code moves, you will probably get spurious traps when re-running the
-code.  Unix debuggers typically replace some instructions at the
-breakpoint location with instructions that will cause a specific trap,
-and then look up the trapping location in some table when the debugged
-process signals the trap.
+A problem peculiar to systems in which code is relocated on the fly is
+that you cannot, in general, obtain a permanent address for a
+procedure or entry point.  The code may move at every garbage
+collection, and if you set a machine-level breakpoint with a Unix
+debugger, and then the code moves, you will probably get spurious
+traps when re-running the code.  Unix debuggers typically replace some
+instructions at the breakpoint location with instructions that will
+cause a specific trap, and then look up the trapping location in some
+table when the debugged process signals the trap.
  
  One way around this problem is to ``purify'' all compiled scheme code
  that you will be setting breakpoints in.  If you purify the code, it
@@ -3069,10 +3249,10 @@ Thus if you want to single step the closure code (a good idea when you
  try them at first), you would want to set a breakpoint at address
  #x1180DF8 (plus appropriate segment bits), and if you want to single
  step or examine the real code, then you should use address #x10FE484.
-Note that if you purified the code when you loaded it, the real code
-would be pure, but the closure itself would not be, since it was not a
-part of the file being loaded (closures are allocated dynamically).
-Thus, before setting any breakpoints in a closure, you should probably
+If you purified the code when you loaded it, the real code would be
+pure, but the closure itself would not be, since it was not a part of
+the file being loaded (closures are created dynamically).  Thus,
+before setting any breakpoints in a closure, you should probably
  purify it as specified above, and obtain its address again, since it
  would have moved in the meantime.
  \f
@@ -3086,13 +3266,16 @@ printed the above addresses,
                 by MEMBER and all closures of the same lambda expression,
  0x410fe880:b   would set a breakpoint at the start of MEMQ.
  
-If you are using gdb on a Motorola 68020 machine, with no segment bits
+If you are using gdb on a Motorola MC68020 machine, with no segment bits
  for the data segment, the equivalent commands would be
  
  break *0x1180df8       for a breakpoint in the MEMBER closure,
  break *0x10fe484       for a breakpoint in MEMBER's shared code
  break *0x10fe880       for a breakpoint in MEMQ.
  
+If you are using dbx, you will need to use a command like 
+ ``stopi at 0x10fe484'' to achieve the same effect.
+
         8.4. Examining arguments to Scheme procedures.
  
  Commonly, after setting a breakpoint at some interesting procedure,
@@ -3162,7 +3345,7 @@ expect return addresses and no other links.
  In general, it is impossible to find out what procedure called another
  in Scheme, due to tail recursion.  Procedures whose last action is a
  call to another procedure, and whose local frames are not part of the
-environment of their callees, will have their frames popped of the
+environment of their callees, will have their frames popped off the
  stack before the callee is entered, and there will be no record left
  of their execution.  The interpreter uses a cache of previous frames
  (called the history) to provide additional debugging information.
@@ -3190,15 +3373,24 @@ with this.
  
  If your debugger allows you to dynamically call C procedures, and it
  is not hopelessly confused by the Scheme register set, you can use the
-C procedure compiled_entry_filename to determine the filename that a
-return address (or other compiled entry) belongs to.  Its only
+C procedure ``compiled_entry_filename'' to determine the filename that
+a return address (or other compiled entry) belongs to.  Its only
  argument should be the compiled entry object whose origin you want to
-find.
-
-Unfortunately, debuggers are often confused by the register
+find.  Unfortunately, debuggers are often confused by the register
  assignments within Scheme compiled code, precisely when you need them
-most.  Thus you will need to do this the hard way.
+most.  You can bypass this problem in the following way:
+
+On entry to a procedure whose return address you wish to examine,
+write down the return address object, change the compiled code's
+version of MemTop so that the comparison with free will fail and the
+code will take an interrupt, set a breakpoint in the runtime library
+routine ``compiler_interrupt_common'', and continue the code.  When
+the new breakpoint is hit, you can use ``compiled_entry_filename'' to
+examine the return address you had written down.
  \f
+Here is how to do this the hard way, which you will have to resort to
+often:
+
  Compiled code blocks generated by the compiler always encompass two
  special locations.  The last location in the compiled code block
  contains the environment where the compiled code block was loaded
@@ -3221,7 +3413,7 @@ external entry points.
  
  For example, imagine that the return address for a procedure is
  0xa08fe2ee.  Furthermore, assume that we are running on a Motorola
-68020 with four bytes per longword, no segment bits, and for which
+MC68020 with four bytes per longword, no segment bits, and for which
  cmpint-mc68k.h defines PC_ZERO_BITS to be 1.
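
The arithmetic for this example can be sketched in C.  This is only
an illustration, not code from the microcode: it assumes 32-bit object
words whose most significant 6 bits are the type tag (the tag length
this guide configures for the compiler), and the names TAG_BITS,
object_tag, object_datum, and word_before_entry are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 32-bit object words, type tag in the most significant
   6 bits, datum (address part) in the remaining 26 bits.  None of
   these names come from the microcode. */
#define TAG_BITS   6u
#define DATUM_BITS (32u - TAG_BITS)

/* Type tag of an object word. */
static uint32_t object_tag(uint32_t object)
{
    return object >> DATUM_BITS;
}

/* Datum of an object word.  With no segment bits, this is the
   address itself. */
static uint32_t object_datum(uint32_t object)
{
    return object & ((UINT32_C(1) << DATUM_BITS) - 1u);
}

/* Address of the longword immediately before a compiled entry. */
static uint32_t word_before_entry(uint32_t entry_object,
                                  uint32_t bytes_per_longword)
{
    assert(bytes_per_longword > 0);
    return object_datum(entry_object) - bytes_per_longword;
}
```

Under these assumptions, object_datum(0xa08fe2ee) is 0x8fe2ee, and
word_before_entry(0xa08fe2ee, 4) is 0x8fe2ea.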
  
  Extracting the word at address 0x8fe2ea (four bytes before the entry
@@ -3280,8 +3472,9 @@ Scheme strings have two longwords of header, followed by an ordinary C
  string that includes a null terminating character, thus the C string
  starts at address 0x9ac5ec+4*2=0x9ac5f4, and the gdb command
   ``x/s 0x9ac5f4'' or the adb and dbx command ``0x9ac5f4/s''
-will display
- 0x9ac5f4 <fpa_loc+11159028>:  (char *) 0x9ac5f4 "/usr/local/lib/mit-scheme/SRC/runtime/parse.binf"
+will display something like:
+ 0x9ac5f4 <fpa_loc+11159028>:  
+       (char *) 0x9ac5f4 "/usr/local/lib/mit-scheme/SRC/runtime/parse.binf"
  
  Thus the return address we are examining is at offset 0x3e in compiled
  code block number 0x1c of the runtime system file ``parse.com''.
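
The address computation for the C string can be written as a small C
helper.  This is a sketch under the layout described above (header
longwords followed by null-terminated characters); the name
scheme_string_chars is hypothetical and does not appear in the
microcode.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the C string inside a Scheme string, given the address
   of the Scheme string itself.  Assumes the header longwords come
   first and the characters follow immediately after them. */
static uint32_t scheme_string_chars(uint32_t string_address,
                                    uint32_t header_longwords,
                                    uint32_t bytes_per_longword)
{
    assert(bytes_per_longword > 0);
    return string_address + header_longwords * bytes_per_longword;
}
```

For the example above, scheme_string_chars(0x9ac5ec, 2, 4) gives
0x9ac5f4, the address passed to the debugger's string display command.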
@@ -3301,7 +3494,7 @@ variations would be the following:
  to produce addresses.  For example, on the HP-PA with segment bits 01
  at the most significant end of a word, the C string for Scheme string
  object 0x789ac5ec would start at address 0x409ac5ec+8=0x409ac5f4,
-instead than at address 0x9ac5ec+8=0x9ac5f4.
+instead of at address 0x9ac5ec+8=0x9ac5f4.
  
  - The gc offset might be computed differently, depending on the value
  of PC_ZERO_BITS.  For example, on a Vax, where PC_ZERO_BITS has the