--- /dev/null
+-*- Text -*-
+
+$Header: /Users/cph/tmp/foo/mit-scheme/mit-scheme/v8/src/compiler/documentation/cmpint.txt,v 1.1 1995/08/05 16:25:13 adams Exp $
+
+
+Copyright (c) 1991-1992 Massachusetts Institute of Technology
+
+
+ Documentation of the C interface to
+ MIT Scheme compiled code
+ *DRAFT*
+
+
+
+ Remarks:
+
+In the following, whenever Scheme is used, unless otherwise specified,
+we refer to the MIT Scheme dialect and its CScheme implementation.
+
+This file describes the compiled-code data structures and macros
+defined in cmpint-md.h and required by cmpint.c and cmpgc.h .
+
+cmpaux.txt describes the assembly language code that must be written
+to get all of this to work.
+
+cmpint-md.h is the machine dependent header file that defines many of
+these parameters. A new version must be written for each
+architecture.
+
+The "settable" fields in cmpint-md.h are described in the paragraphs
+marked with "=>".
+
+In the following, word and longword are the size of an item that fills
+a processor register, typically 32 bits. Halfword is half this size,
+and byte is typically 8 bits.
+\f
+ Description of compiled-code objects and relevant types:
+
+The Scheme compiler compiles scode expressions (often procedure
+definitions) into native code. As its output, it produces Scheme
+objects that represent compiled expressions. To execute these
+expressions, they are passed as arguments to scode-eval together with
+an environment.
+
+Typically these expressions will construct some pointers to compiled
+procedures and define them in the environment. These procedures can
+then be invoked normally from the read-eval-print loop, from
+interpreted code, or from other compiled code.
+
+In the course of their computation, these procedures will need to
+call other procedures and then continue the computation.  In order to
+accomplish this, they will push compiled return addresses on the
+stack, and these will eventually be popped and "jumped through" to
+return.
+
+Compiled code and objects referenced by it are collected into
+"vector-like" objects called compiled-code blocks.
+
+The above four classes of objects (compiled expressions, compiled
+procedures, compiled return addresses, and compiled-code blocks) are
+implemented using two microcode types:
+
+TC_COMPILED_CODE_BLOCK is used to implement compiled-code blocks. It
+is a vector type, that is, it is a pointer type, the word addressed by
+the pointer contains the length of the vector (not including the
+length header word), and the rest of the words (length) follow at
+increasing addresses in memory. Typically the first word after the
+header is a non-marked-vector header, and the instructions follow it.
+The non-marked-vector header covers all the instructions, but the
+vector may contain arbitrary objects after the instructions, covered
+only by the normal vector header. The optional, additional space at
+the end of the block is called the "constants" section, since it is
+used, among other things, to keep copies of constant objects used by
+the compiled code. See the picture below for a diagram of the typical
+layout.
+
+TC_COMPILED_ENTRY is used to implement compiled expressions,
+compiled return addresses, compiled procedures, and some other entry
+points that the compiler and the compiled-code interface need. A
+compiled entry is a non-standard pointer type described below.
+\f
+ Description of compiled entries:
+
+The address portion of a compiled entry object points to an
+instruction in the "middle" of a compiled-code block.
+
+In order for the garbage collector to be able to move the whole
+block as a unit it must be able to determine the address of the
+first word. Note that this word contains the length of the whole
+block, so this need not be specified again.
+
+The address of the first word of the block can be found from the
+address of the instruction, and a few bytes, currently a halfword
+preceding the instruction. These bytes are called the offset field of
+a compiled entry object, and typically encode the distance (in bytes)
+between the beginning of the block and the compiled entry.
+
+A few bytes preceding the offset field are called the format field and
+encode the type of compiled entry (procedure vs. expression, etc.) and
+some type-specific information (number of arguments, offset to next
+return address on the stack, etc.).
+
+The gc-offset field and the format field must be the same size, and
+their size is determined by the C typedef of format_word at the
+beginning of cmpint-md.h. Note that, to date, the compiler has only
+been ported to systems where this size is 2 bytes (for each), but it
+should be possible to port it to systems where these fields are
+larger.
+
+ Encoding of the offset field:
+
+The offset field is decoded as follows:
+
+If the low order bit is 0 the offset is a simple offset, ie.
+subtracting the offset from the address of the compiled entry
+results in the address of the compiled-code block that contains the
+entry.
+
+If the low order bit is 1, it is a continued offset, ie. subtracting
+the offset from the address of the compiled entry results in the
+address of another compiled entry whose offset may or may not have a
+low bit of 0.
+
+The rest of the bits (typically 15) are some encoding of the offset:
+
+- If instructions can appear at arbitrary byte addresses (including
+odd addresses), this field is the offset itself.
+
+- If instructions have alignment constraints (ie. halfword alignment
+on MC68K, or longword alignment on many RISCs), this field is the
+offset shifted appropriately. In this way, no bits are wasted, and
+the range of offsets is increased.
+
+For example,
+
+The DEC VAX can have instructions at any byte location. The 15 bits
+are the offset.
+
+The MC68020 can have instructions only at halfword boundaries. The 15
+bit field shifted left by 1 is the real offset. Note that in this
+case, if the low bit of the offset field is 0, the offset field is the
+real offset.
+
+The HP Precision Architecture, and many other RISCs, can have
+instructions only at longword boundaries. The 15 bit field shifted
+left by 2 is the real offset.
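+
+The three examples above are instances of a single rule, parameterized
+by the number of guaranteed-zero low bits in instruction addresses
+(the PC_ZERO_BITS parameter defined in cmpint-md.h and described
+below).  The following C sketch illustrates the encoding; the helper
+names are hypothetical and do not appear in the actual sources:
+
+/* Encode a byte distance as an offset field.  CONTINUED_P is 1 for a
+   continued offset, 0 for a simple offset.  With PC_ZERO_BITS = 0, 1,
+   and 2 this reproduces the VAX, MC68020, and HP-PA cases above. */
+unsigned long
+encode_offset_field (unsigned long byte_distance, int continued_p)
+{
+  return (((byte_distance >> PC_ZERO_BITS) << 1) | (continued_p ? 1 : 0));
+}
+
+/* Recover the byte distance encoded in an offset field. */
+unsigned long
+decode_offset_field (unsigned long offset_field)
+{
+  return ((offset_field >> 1) << PC_ZERO_BITS);
+}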
+\f
+ Encoding of the format field:
+
+The preceding bytes encode the kind of compiled entry in the following
+way:
+
+The format field is further subdivided into two equal sized halves,
+used, roughly, for the minimum (high order half) and maximum (low
+order half) number of arguments that a compiled procedure will accept.
+Inappropriate values for these numbers of arguments imply that the
+entry is not a procedure, and then the two halves may be combined to
+generate information appropriate to the entry type. The examples
+below assume that the format field is 2 bytes long.
+
+- For compiled expressions it is always -1 (0xffff)
+
+- For other compiled entries (the extra entry points used by the
+compiler and the compiled-code interface) it is always -3 or -2
+(0xfff[d-e]).  It is -2 for compiler generated entries, -3 for
+compiler-interface generated entries.
+
+- For compiled return addresses with saved dynamic links it is always
+-4 (0xfffc). The next item on the stack is then a dynamic link.
+
+- For the special return address `return_to_interpreter' it is
+always -5 (0xfffb).
+
+- For all other compiled return addresses, both halves (bytes) must
+have their sign bit set, that is, they must appear negative when
+sign-extended. The remaining bits of the high-order half of the field
+(all but the sign bit) and all but the two most significant bits of
+the low-order half of the field (sign bit and adjacent bit), when
+concatenated, form the offset in the stack to the previous (earlier)
+return address. This information is used by the debugger to "parse"
+the stack into frames. The sub-fields are actually concatenated
+backwards, with the bits from the high order half being the low order
+bits in the result. If the format field is two bytes long, each half
+is a single byte, and the valid range for the high-order half is
+0x80-0xff, while the valid range for the low-order half is 0x80-0xdf.
+
+- For compiled procedures, the format field describes the arity
+(number of parameters) and the format of the frame on the stack:
+
+The high order half of the field is (1+ REQ) where REQ is the number
+of required arguments.  Note that REQ must be such that the resulting
+half of the format field does not appear negative! If the format
+field is two bytes long, REQ must be less than 127.
+
+The low order half of the field is given by the expression
+(* (EXPT -1 REST?) FRAME-SIZE)
+where FRAME-SIZE is (+ 1 REQ OPT REST?), REQ is as above, OPT
+is the number of named optional arguments, and REST? is 1 if
+the procedure has a rest parameter (ie. it is a "lexpr"), or 0
+otherwise. FRAME-SIZE must not appear negative, thus if the format
+field is two bytes long, FRAME-SIZE must be less than 127.
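+
+As an illustration, here is a C sketch of how a two-byte format field
+might be decoded into arity information.  It assumes, as in the
+hexadecimal examples above, that the high-order half occupies the
+upper byte of the 16-bit value; the structure and function names are
+hypothetical and the real tests live in cmpint.c:
+
+typedef unsigned short format_word;  /* normally typedef'd in cmpint-md.h */
+
+struct arity_info
+{
+  int required;         /* REQ */
+  int optional;         /* OPT */
+  int rest_p;           /* 1 if there is a rest parameter */
+};
+
+/* Returns 1 and fills INFO if FMT describes a compiled procedure;
+   returns 0 for expressions, return addresses, and other entries. */
+int
+decode_procedure_format (format_word fmt, struct arity_info * info)
+{
+  int high = ((fmt >> 8) & 0xff);   /* high-order half: (1+ REQ) */
+  int low = (fmt & 0xff);           /* low-order half: +/- FRAME-SIZE */
+  int frame_size;
+
+  if ((high == 0) || (high >= 0x80))
+    return (0);                     /* sign bit set: not a procedure */
+  (info -> required) = (high - 1);
+  (info -> rest_p) = ((low >= 0x80) ? 1 : 0);
+  frame_size = ((info -> rest_p) ? (0x100 - low) : low);
+  /* FRAME-SIZE = (+ 1 REQ OPT REST?) */
+  (info -> optional) = (frame_size - 1 - (info -> required) - (info -> rest_p));
+  return (1);
+}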
+\f
+ Picture of typical compiled-code block and entry:
+
+ ----------------------------------------
+ start_address | MANIFEST-VECTOR | tl |
+ ----------------------------------------<---------\
+ | NM-HEADER | il | \
+ ----------------------------------------<---\ |
+ | | \ |
+ | | | |
+ | | | |
+ | | | |
+ | some instructions | | |
+ | | | |
+ | | | |
+ | | | |
+ | | | |
+ ---------------------------------------- | |
+ | format_field_1 | offset_field_1 | | |
+ ---------------------------------------- | |
+ entry_address_1 | movel arg0,reg0 | | |
+ ---------------------------------------- | |
+ | | | |
+ | | | |
+ | | > il |
+ | | | |
+ | more instructions | | |
+ | | | |
+ | | | |
+ | | | |
+ ---------------------------------------- | > tl
+ | format_field_2 | offset_field_2 | | |
+ ---------------------------------------- | |
+ entry_address_2 | andl pointer_mask,arg0,reg0 | | |
+ ---------------------------------------- | |
+ | | | |
+ | | | |
+ | | | |
+ | | | |
+ | more instructions | | |
+ | | | |
+ | | | |
+ | | / |
+ /--->----------------------------------------<---/ |
+ / | Scheme object | |
+ | ---------------------------------------- |
+ "cons- | | | |
+ tants" | | | |
+ | | | |
+ < | | |
+ | | more Scheme objects | |
+ section | | | |
+ | | | |
+ | | | |
+ \ | | /
+ \--->----------------------------------------<----------/
+
+Note: The picture above assumes that each machine instruction takes the same
+space as a scheme object, and that this is also the combined length of
+the gc-offset and format fields. The type tags are always at the most
+significant end of the word, which depending on endianness may be at
+the lowest or highest addressed part of the word in memory. The
+picture above depicts them on the left.
+\f
+ Description of picture:
+
+[TC_COMPILED_CODE_BLOCK | start_address] would be the object
+representing the compiled-code block.
+
+[TC_COMPILED_ENTRY | entry_address_1] would represent entry1.
+
+[TC_COMPILED_ENTRY | entry_address_2] would represent entry2.
+
+1) Assuming that instructions are longword aligned and that
+entry_address_1 is close enough to start_address not to need an
+extension, but entry_address_2 is not, then
+
+offset_field_1 = ((entry_address_1 - start_address) >> 1)
+offset_field_2 = (((entry_address_2 - entry_address_1) >> 1) | 1)
+
+note that entry_address_1 - start_address is a multiple of 4 because
+of the alignment assumption.
+
+2) Assuming that instructions are halfword aligned and that
+entry_address_1 is close enough to start_address not to need an
+extension, but entry_address_2 is not, then
+
+offset_field_1 = (entry_address_1 - start_address)
+offset_field_2 = ((entry_address_2 - entry_address_1) | 1)
+
+note that entry_address_1 - start_address is a multiple of 2 because
+of the alignment assumption.
+
+3) Assuming that instructions are byte aligned and that
+entry_address_1 is close enough to start_address not to need an
+extension, but entry_address_2 is not, then
+
+offset_field_1 = ((entry_address_1 - start_address) << 1)
+offset_field_2 = (((entry_address_2 - entry_address_1) << 1) | 1)
+
+The length of the "constants" section is (tl - il).
+There are (tl + 1) total words in the object.
+
+=> Macro PC_ZERO_BITS should be defined to be the number of bits in
+instruction addresses that are always 0 (0 if no alignment
+constraints, 1 if halfword, etc.).
+
+=> format_word should be 'typedefd' to be the size of the descriptor
+fields. It is assumed that the offset field and the format field are
+the same size. This definition is unlikely to need modification.
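+
+Putting the two fields together, the address of a block can be
+recovered from the address of any of its entries roughly as in the
+following sketch.  The function name is hypothetical; the real work is
+done by macros shared by cmpint.c and cmpgc.h:
+
+/* Given the address of a compiled entry, return the address of the
+   first word of the compiled-code block that contains it, following
+   "continued" offsets as many times as necessary. */
+char *
+compiled_entry_to_block_address (char * entry)
+{
+  while (1)
+    {
+      /* The offset field is the format_word immediately preceding the
+         entry's first instruction. */
+      unsigned long field = ((unsigned long) (((format_word *) entry)[-1]));
+
+      entry -= ((field >> 1) << PC_ZERO_BITS);
+      if ((field & 1) == 0)
+        return (entry);   /* simple offset: ENTRY is now the block */
+      /* Continued offset: ENTRY is now an earlier entry; keep going. */
+    }
+}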
+\f
+ Compiled closures:
+
+Most compiled procedures are represented as a simple compiled entry
+pointing to the compiled-code block generated by the compiler.
+
+Some procedures, called closures, have free variables whose locations
+cannot be determined statically by the compiler or the linker. The
+compiler will generate code to construct a tiny compiled-code block on
+the fly and make the compiled procedure be an entry point pointing to
+this dynamically allocated compiled-code block.
+
+For example, consider the following code, appearing at top level,
+
+ (define foo
+ (lambda (x)
+ (lambda (y) (+ x y)))) ;lambda-1
+
+The outer LAMBDA will be represented as a compiled entry pointing to
+the appropriate block. The inner LAMBDA cannot be since there may be
+more than one instance, each with its independent value for X:
+
+ (define foo1 (foo 1))
+ (define foo2 (foo 2))
+
+Compiled closures are implemented in the following way: The entry
+corresponding to the procedure points to a jump-to-subroutine (or
+branch-and-link) instruction. The target of this jump is the code
+corresponding to the body of the procedure. This code resides in the
+compiled-code block that the compiler generated. The free variables
+follow the jump-to-subroutine instruction (after aligning to longword).
+
+Using this representation, the caller need not know whether it is
+invoking a "normal" compiled procedure or a compiled closure. When
+the closure is invoked normally, it jumps to the real code for the
+procedure, after leaving a "return address" into the closure object in
+a standard place (stack or link register). This "return address" is
+the address of the free variables of the procedure, so the code can
+reference them by using indirect loads through the "return address".
+
+Here is a stylized picture of the situation, where the procedure
+object (closure entry point) is a pointer to <1>.
+
+closure object:
+ +-------------------------------+
+ | |
+ | <header> |
+ | |
+ +-------------------------------+
+<1> | jsr instruction to <2> |
+ +-------------------------------+
+ | <value of X> |
+ +-------------------------------+
+
+compiled-code block produced by the compiler:
+
+ +-------------------------------+
+ | |
+ | ... |
+ | |
+ +-------------------------------+
+<2> | <code for inner lambda> |
+ | | |
+ | V |
+ | |
+ +-------------------------------+
+\f
+The code above could be compiled as (in pseudo-assembly language, in
+which & denotes an immediate value):
+
+ const format-word:0x0202;gc-offset:??
+foo:
+ mov rfree,reg0
+ mov &[TC_MANIFEST_CLOSURE | 4],reg1 ; gc header
+ mov reg1,0(reg0)
+ mov &[format_field | offset_field],reg1 ; entry descriptor
+ mov reg1,NEXT_WORD(reg0)
+ mov &[JSR absolute opcode],reg1 ; jsr absolute opcode/prefix
+ mov reg1,2*NEXT_WORD(reg0)
+ mova lambda-1,reg1 ; entry point
+ mov reg1,3*NEXT_WORD(reg0)
+ mov arg1,4*NEXT_WORD(reg0) ; x
+ mov &5*NEXT_WORD,reg1
+ add reg0,reg1,rfree
+ mov &[TC_COMPILED_ENTRY | 2*NEXT_WORD],reg1
+ add reg0,reg1,retval
+ ret
+
+ const format-word:0xfffe;gc-offset:??
+lambda-1:
+ mov arg1,reg0 ; y
+ mov x_offset(retlnk),reg1 ; x x_offset = 0
+ add reg1,reg0,reg0
+ mov reg0,retval
+ ret
+
+A more detailed picture of the closure object would be:
+
+ ----------------------------------------
+ | MANIFEST-CLOSURE | 4 |
+ ----------------------------------------
+ | format_field | gc_offset_field | ;format = 0x0202
+ ---------------------------------------- ;offset = encode(8)
+entry | JSR absolute opcode |
+ ----------------------------------------
+ | address of lambda-1 |
+ ---------------------------------------- ;address of retadd
+retadd | value of x | ; -> retlnk before
+ ---------------------------------------- ; entering lambda-1
+
+The following macros are used to manipulate closure objects:
+
+=> COMPILED_CLOSURE_ENTRY_SIZE specifies the size of a compiled
+closure entry (there may be many in a single compiled closure block)
+in bytes. In the example above this would be 12 bytes (4 total for
+the format and gc offset fields, 4 for JSR opcode, and 4 for the
+address of the real entry point).
+
+=> EXTRACT_CLOSURE_ENTRY_ADDRESS is used to extract the real address
+of the entry point from a closure object when given the address of the
+closure entry. Note that the real entry point may be smeared out over
+multiple instructions. In the example above, given the address the
+word labeled ENTRY, it would extract the address of LAMBDA-1.
+
+=> STORE_CLOSURE_ENTRY_ADDRESS is the inverse of
+EXTRACT_CLOSURE_ENTRY_ADDRESS. That is, given the address of a
+closure entry point, and a real entry point, it stores the real entry
+point in the closure object. In the example above, given the address
+of the word labeled ENTRY, and a different entry point, say for
+LAMBDA-2, it would make the closure jump to LAMBDA-2 instead. This is
+used to relocate closures after garbage collection and similar
+processes.
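+
+For concreteness, here is a sketch of what these macros might look
+like for the hypothetical machine pictured above, where the JSR opcode
+occupies one longword and the absolute target address the next.
+SCHEME_OBJECT stands for the microcode's word-sized object type, and
+the argument conventions shown are assumptions, not the definitions of
+any real port:
+
+#define COMPILED_CLOSURE_ENTRY_SIZE 12  /* format/gc word, JSR, address */
+
+/* ENTRY is the address of the JSR instruction; the real entry point
+   is stored in the longword that follows it. */
+#define EXTRACT_CLOSURE_ENTRY_ADDRESS(target, entry) do                 \
+{                                                                       \
+  (target) = (* (((SCHEME_OBJECT *) (entry)) + 1));                     \
+} while (0)
+
+#define STORE_CLOSURE_ENTRY_ADDRESS(source, entry) do                   \
+{                                                                       \
+  (* (((SCHEME_OBJECT *) (entry)) + 1)) = ((SCHEME_OBJECT) (source));   \
+} while (0)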
+\f
+ Some caveats:
+
+- The code for lambda-1 described above does not match what the compiler
+would currently generate.
+
+The current parameter-passing convention specifies that all the state
+needed to continue the computation at a procedure's entry point must
+be on the stack and all the information on the stack must be valid
+objects (for GC correctness in case of interrupt, more on this below).
+Thus the contents of retlnk must be pushed on the stack as a valid
+object, and this is done by reconstructing the closure object whose
+datum field encodes the address of entry and whose type tag is
+TC_COMPILED_ENTRY, and then pushing it onto the stack. Note that on
+some machines, the return address for the subroutine-call instruction
+is pushed on the stack by the hardware, and thus this value might have
+to be popped, adjusted, and re-pushed if it cannot be adjusted in
+place.
+
+The code for lambda-1 would then be closer to:
+
+lambda-1:
+ sub retlnk,&retadd-entry,retlnk
+ or &[TC_COMPILED_ENTRY | 0],retlnk,retlnk ; set type code
+ push retlnk
+ <interrupt check> ; more on this below
+ mov arg1,reg0
+ mov top_of_stack,reg1
+ and &[0 | -1],reg1,reg1 ; remove type code
+ mov x_offset+retadd-entry(reg1),reg1
+ add reg1,reg0,retval
+ pop ; the closure object
+ ret
+
+Note that (retadd-entry) is a constant known at compile time, and is the
+same for the first entry point of all closures. On many machines, the
+combination sub/or can be obtained with a single add instruction:
+ add &([TC_COMPILED_ENTRY | 0]-(retadd-entry)),retlnk,retlnk
+
+This value is called the "magic constant", encoded in the first few
+instructions of a closure's code.
+
+- Multiple closures sharing free variables can share storage by having
+multiple entry points (multiple JSR instructions) in the closure
+object. The compiler occasionally merges multiple related closures
+into single objects.
+
+A complication arises when closure entry points are not necessarily
+long-word aligned, since the compiler expects all variable offsets
+(like x_offset above) to be long-word offsets.
+
+This problem only occurs on machines where instructions are not all
+long-word aligned and for closures with multiple entry points, since
+the first entry point is guaranteed to be aligned on a long-word
+boundary on all machines.
+
+The current solution to this problem, on those machines on which it is
+a problem, is to choose a canonical entry point (the first one)
+guaranteed to be aligned properly, and push that on the stack on entry
+to a closure's code. The compiler keeps track of what actual entry
+point the code belongs to even though the value on the stack may
+correspond to a different entry point.
+
+The "magic constant" becomes an entry-point dependent value, since each
+return address may have to be bumped back to the first entry point in
+the closure object rather than to the immediately preceding entry point.
+\f
+ Interrupts:
+
+Scheme polls for interrupts. That is, interrupt processing is divided
+into two stages:
+
+- When an asynchronous interrupt arrives, the handler (written in C)
+invoked by the operating system sets a bit in a pending-interrupts
+mask, stores the relevant information (if any) in a queue, and
+resumes the computation where it was interrupted.
+
+- The interpreter and compiled code periodically check whether an
+interrupt is pending and if so, invoke an interrupt handler written in
+Scheme to process the interrupt. The interpreter checks for
+interrupts at the apply point. Compiled code currently checks at
+every procedure entry (including loops) and at every continuation
+invocation. This may change in the future, although it will always be
+the case that interrupts will be checked at least once in each
+iteration of a loop or recursion.
+
+Compiled code does not actually check the bits in the mask to
+determine whether an interrupt is pending. It assumes that the
+first-level interrupt handler (the handler written in C) not only sets
+the bits, but also changes the copy of the MemTop (top of consing
+area) pointer used by the compiler so that it will appear that we have
+run out of consing room. Thus compiled code merely checks whether the
+Free pointer (pointer into the heap) is numerically larger than the
+MemTop pointer, and if so it invokes an assembly-language or C utility
+that decides whether a garbage collection is needed or an interrupt
+must be processed. Sometimes this utility will decide that the
+interrupt need not be processed (it is disabled, for example), and
+will need to return to the compiled code skipping the interrupt check
+since otherwise we will get into an infinite loop.
+
+The interrupt check code is fixed (so that the handler can determine
+how much code to skip) and comes in two varieties: closure interrupt
+code, and normal-entry (other) interrupt code. Normal-entry interrupt
+code is always the first code in an entry point (procedure or
+continuation, but not closure code) and merely compares the Free and
+MemTop pointers and branches. Closure code does this comparison after
+setting up the closure object. Closure code assumes that the closure
+object is in the first parameter location (the closure itself is
+argument 0) so that free variables can be fetched. Thus a closure
+label must first set this up correctly, and then check for interrupts.
+\f
+In pseudo-assembly language, a "normal" entry might look like
+
+gc_or_int LOADI #interrupt-handler-index,rindex
+ LOADA entry,rentry
+ JMP scheme-to-interface
+ format word and gc word for the entry
+entry CMP Free,MemTop
+ BGE gc_or_int
+after_entry <actual code for the entry>
+
+a "closure" entry might look like (this is not in the closure object,
+but in the code block at which the closure object points)
+
+gc_or_int LOADI #interrupt-handler-index,rindex
+ LOADA entry,rentry
+ JMP scheme-to-interface
+ format word and gc word for the entry
+entry ADDI offset,retadd,ret_add ; bump ret. add. to entry point
+ ORI #[TC_CLOSURE | 0],ret_add
+ PUSH ret_add ; arguments on the stack
+ CMP Free,MemTop
+ BGE gc_or_int
+after_entry <actual code for the entry>
+
+The following macros are used by the C utility and handler to
+determine how much code to skip:
+
+=> ENTRY_SKIPPED_CHECK_OFFSET is the number of bytes between
+entry and after_entry in a normal entry.
+
+=> CLOSURE_SKIPPED_CHECK_OFFSET is the number of bytes
+between entry and after_entry in a closure entry.
+
+=> ENTRY_PREFIX_LENGTH is the number of bytes between gc_or_int and
+entry in a normal entry, not counting those taken up by the format
+word and the gc word.
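+
+For example, the utility that decides to dismiss an interrupt might
+compute the resumption address along these lines (a sketch; the
+procedure name and the way control is eventually transferred back to
+compiled code are assumptions):
+
+/* ENTRY is the address of the compiled entry whose interrupt check
+   should be bypassed; the result is the address labeled after_entry
+   in the examples above. */
+char *
+address_after_interrupt_check (char * entry, int closure_p)
+{
+  return (entry
+          + (closure_p
+             ? CLOSURE_SKIPPED_CHECK_OFFSET
+             : ENTRY_SKIPPED_CHECK_OFFSET));
+}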
+\f
+ Important considerations:
+
+The Scheme compiled code register set includes the current copy of the
+Free pointer, but does not include the copy of MemTop, although it is
+mostly constant. The reason is that the C-level interrupt handler
+does not have convenient access to the register set at the point of
+the interrupt, and thus would have a hard time changing the version of
+MemTop used by compiled code at the point of the interrupt. Thus the
+copy of MemTop used by compiled code is kept in memory.
+
+On machines where register-to-memory comparisons can be done directly
+this is no problem, but on load/store architectures (most RISCs for
+example), this is not feasible. Furthermore, most RISCs have a few
+cycles of delay for memory loads, and adjacent instructions may not be
+interlocked by the hardware. Thus a sequence like
+
+ LOAD Memory_MemTop,Rtemp
+ CMP Rfree,Rtemp
+ BGE gc_or_int
+
+may be very slow and NOPs may have to be inserted explicitly between
+the LOAD and CMP instructions to make the code work.
+
+Since Scheme's interrupt response is not immediate, and polling is
+frequent, the following sequence can be used instead:
+
+ CMP Rfree,Rmemtop
+ BGE gc_or_int
+ LOAD Memory_MemTop,Rmemtop
+
+Where Rmemtop is a register that holds a recent value of MemTop and is
+reloaded at every interrupt check. Thus interrupt processing will
+start at the second interrupt check after the actual interrupt comes
+in. In other words, if the sequence of entry points executed
+dynamically is ep1, ep2, ep3, and an asynchronous interrupt occurs
+between ep1 and ep2, the interrupt handler will not be invoked until
+ep3, rather than ep2.
+
+This instruction sequence eliminates the need to wait for the LOAD to
+complete, and the LOAD will have completed (or will be handled by the
+hardware's interlock mechanism) by the next check since at least one
+instruction (a branch instruction), and often many more, will
+intervene.
+
+Note that this delayed checking does not affect garbage collection
+interruptions since MemTop is constant between garbage collections,
+and thus the value being loaded is always the same, in the absence of
+asynchronous interrupts.
+
+Various operating systems allow the signal handler convenient access
+to the interrupted code's register set. In such a situation, the LOAD
+instruction can be eliminated and the C-level interrupt handler can
+modify Rmemtop directly. Rmemtop should be chosen from the
+caller-saves convention (super-temporary) registers if possible, since
+these registers must be explicitly saved by the signal handler, rather
+than implicitly by the calling convention.
+\f
+ Interrupts and closures that share storage:
+
+If an interrupt arrives on entry to the closure, the correct closure
+object must be reconstructed so that the computation will continue
+correctly on return from the interrupt. The code to reconstruct the
+correct closure is also issued by the compiler, which at compile time
+maintains the identity of each closure and the distance to the
+canonical closure used for environment purposes.
+
+If the interrupt is dismissed, instead of processed, we need to
+continue the computation bypassing the interrupt checking code in
+order to avoid an infinite loop. This is what the macro
+CLOSURE_SKIPPED_CHECK_OFFSET is used for. We must skip the preamble
+of the closure code and emulate part of it, that is, adjust the object
+on top of the stack to be the closure object that the code expects to
+have there. This can be done by extracting the magic constant from
+the entry point, and bumping the corresponding return address by this
+value. The macro ADJUST_CLOSURE_AT_CALL accomplishes this feat on
+those machines where it is needed.
+
+=> ADJUST_CLOSURE_AT_CALL, when given an entry point and a location,
+adjusts the closure object stored at location so that it is the
+closure object that the entry point expects on top of the stack. On
+machines where all instructions are long-word aligned, it is a NOP, on
+other machines (eg. 68k, VAX), it extracts the magic constant from the
+closure's code, and uses it to construct the appropriate closure
+object.
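+
+On a port where all instructions are long-word aligned the macro can
+therefore simply expand to nothing, for instance (a sketch, not the
+text of any particular cmpint-md.h):
+
+#define ADJUST_CLOSURE_AT_CALL(entry_point, location) do { } while (0)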
+\f
+ External calls from compiled code:
+
+Many calls in scheme code (and particularly in large programs) are
+calls to independently compiled procedures or procedures appearing at
+the top level of a file. All these calls are calls to potentially
+unknown procedures since the names to which they are bound can be
+unbound or redefined dynamically at run time.
+
+The code issued by the compiler for such an external call must take
+into account the possibility of the lack of a valid value, run-time
+definition, and run-time assignment. This is done as follows:
+
+For each external procedure called with a fixed number of arguments
+(more on this below), a small contiguous space is allocated in the
+"constants" section of the compiled-code block.
+
+This space initially contains the name of the external variable
+whose value is being invoked, and the number of arguments (+ 1 for
+technical reasons) being passed to the procedure.
+
+These locations will be replaced at load time by an absolute jump to
+the correct entry point of the called procedure if the number of
+arguments matches and the callee (target procedure) is compiled, or by
+an absolute jump to some utility code generated on the fly to
+interface the caller and the callee (called a trampoline procedure).
+Note that both procedures need not be in the same compiled-code block.
+
+The fixed code in the code section of the compiled-code block contains
+a pc-relative branch instruction to this space allocated in the
+"constants" section.
+
+When the compiled-code block is loaded, a linker that resolves these
+references and replaces the name and arguments with machine-specific
+code to do the absolute jump is invoked. The linker also records the
+locations of all such jump instructions so that a subsequent
+redefinition or assignment of the same name will cause the jump
+instruction to be replaced by a new one to the correct value. Note
+that the number of arguments needs to be checked only by the linker,
+so no instructions are issued to check it at run time. It is for this
+reason that the number of arguments is part of the information left by
+the compiler in the "constants" section.
+
+These entries in the "constants" section are called execute caches,
+operator links, or "UUO" links for historical reasons. They must be
+large enough to contain the instructions required for an absolute jump
+(and possibly some delay slot instructions in a RISC-style machine),
+and the number of arguments passed in the call. This number of
+arguments is not used in the call sequence, but is used by the linker
+when initially linking and when relinking.
+
+Execute caches are contiguous in the "constants" section, and the
+whole lot is preceded by a GC header of type TC_LINKAGE_SECTION which
+contains two fields. The least-significant halfword of the header
+contains the size in longwords of the execute-cache section (note that
+each cache entry may take up more than one longword). The remaining
+bits (ignoring the type code) MUST be 0. If a file makes enough
+external calls that this halfword field cannot hold the size, the
+execute caches must be separated into multiple sections, each with
+its own header.
+
+Occasionally a procedure is called with more than one number of
+arguments within the same file. For example, the LIST procedure may
+be called with three and seven arguments in the same file. In this
+case there would be two execute caches for LIST. One would correspond
+to the argument count of three, and the other to seven.
+\f
+As an example, consider the code generated for
+
+(sort <some list> <some predicate>)
+
+where sort is the "global" procedure sort.
+
+The code section would contain
+
+ <compute some predicate>
+ push <some predicate>
+ <compute some list>
+ push <some list>
+ branch sort-uuo-link
+
+In the constants section there would be a label that would contain the
+following after linking
+
+sort-uuo-link:
+ jump sort ; Absolute address for sort
+ 3 ; Number of arguments + 1
+
+Before linking it would contain
+
+sort-uuo-link:
+ SORT ; The symbol SORT
+ 3 ; Number of arguments + 1
+
+This assumes that the absolute jump instruction takes one word. If it
+takes more, the appropriate padding would have to be inserted between
+the symbol SORT and the number 3. On machines where instructions are
+not necessarily longword aligned (MC68020 and VAX, for example), the
+padding bits for the instruction can be used to contain the argument
+count. Note that the order of the instructions and the count are
+machine dependent, although typically the instructions precede the
+count.
+\f
+The following macros are used to manipulate execute caches:
+
+=> EXECUTE_CACHE_ENTRY_SIZE specifies the length (in longwords) of an
+execute-cache entry. This includes the size of the instructions and
+the argument count. For the example above it would be 3, assuming
+that the jump instruction and the absolute address take two words
+together (the third is for the argument count). Note that on RISC
+machines, this size may have to include the size of the branch delay
+slot instruction.
+
+=> EXTRACT_EXECUTE_CACHE_ARITY specifies how to read the argument
+count from an execute-cache entry when given the address of the entry.
+In the above example, it would extract 3 from the address labeled
+sort-uuo-link.
+
+=> EXTRACT_EXECUTE_CACHE_SYMBOL specifies how to read the symbol from
+an execute-cache entry (before it is actually linked) when given the
+address of an entry. In the above example, it would extract the
+symbol SORT from sort-uuo-link.
+
+=> EXTRACT_EXECUTE_CACHE_ADDRESS fetches the real entry point stored
+in an execute-cache entry when given the address of the entry. In the
+above example, it would extract the entry point of the sort procedure
+when given the address of the jump instruction (labeled as
+sort-uuo-link).
+
+=> STORE_EXECUTE_CACHE_ADDRESS is the inverse of this, ie. when given a
+target entry point and the address of an execute cache entry, it
+stores the entry point there. In the above example, given a new entry
+point for sort, and sort-uuo-link, it would modify the jump
+instruction to jump to the new location.
+
+=> STORE_EXECUTE_CACHE_CODE stores the fixed instructions (opcodes),
+if any, in an execute cache cell. If the opcodes depend on the actual
+target address, this macro should be a NOP, and all the work should be
+done by STORE_EXECUTE_CACHE_ADDRESS. These two macros are separated
+to avoid extra work at garbage collection time on architectures where
+some or all of the code need not change. In the example above, this
+macro would store the jump opcode.
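+
+A sketch of these macros for the layout assumed above (one longword of
+jump opcode, one longword of absolute address, and one longword
+holding the argument count) might read as follows.  SCHEME_OBJECT
+again stands for the microcode's word type, and the argument
+conventions are assumptions rather than quotations from a real port:
+
+#define EXECUTE_CACHE_ENTRY_SIZE 3      /* in longwords */
+
+/* Before linking, word 0 holds the symbol; after linking it holds the
+   jump opcode.  Word 1 holds the absolute target address, and word 2
+   the argument count (+ 1). */
+#define EXTRACT_EXECUTE_CACHE_SYMBOL(target, address) do                \
+{                                                                       \
+  (target) = (((SCHEME_OBJECT *) (address))[0]);                        \
+} while (0)
+
+#define EXTRACT_EXECUTE_CACHE_ARITY(target, address) do                 \
+{                                                                       \
+  (target) = ((long) (((SCHEME_OBJECT *) (address))[2]));               \
+} while (0)
+
+#define EXTRACT_EXECUTE_CACHE_ADDRESS(target, address) do               \
+{                                                                       \
+  (target) = (((SCHEME_OBJECT *) (address))[1]);                        \
+} while (0)
+
+#define STORE_EXECUTE_CACHE_ADDRESS(address, entry) do                  \
+{                                                                       \
+  (((SCHEME_OBJECT *) (address))[1]) = ((SCHEME_OBJECT) (entry));       \
+} while (0)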
+
+
+ Trampolines:
+
+Trampolines are the linker-generated procedures that interface the
+caller and the callee when they are not directly compatible. They may
+not be directly compatible because the callee may not exist, may not
+be a compiled procedure, or may expect arguments in different
+locations. Trampolines typically call a C or assembly-language
+procedure to reformat the argument list or invoke the error handler.
+C procedures are invoked using the scheme_to_interface (and
+trampoline_to_interface) code described below.
+
+A trampoline is similar to a compiled closure in that it is a small
+compiled-code block with some additional storage needed by the
+trampoline handler (like the actual procedure being invoked, the
+variable that is unbound, or the number of arguments being passed).
+The code typically invokes an out-of-line handler passing it the
+address of the storage section, and an index into a table of C or
+assembly language procedures that handle the actual transfer.
+\f
+A typical trampoline looks like
+
+ ----------------------------------------
+block | MANIFEST-VECTOR | 6 | (4 + words of storage)
+ ----------------------------------------
+ | NM-HEADER | 3 | (fixed)
+ ----------------------------------------
+ | format_field | offset_field | (fixed)
+ ----------------------------------------
+entry | LOADI #index,rindex | (index varies)
+ ----------------------------------------
+ | JSR trampoline_to_interface | (fixed)
+ ----------------------------------------
+retadd | first word of storage | (variable)
+ ----------------------------------------
+ | second word of storage | (variable)
+ ----------------------------------------
+
+=> TRAMPOLINE_ENTRY_SIZE is the size in longwords of the compiled-code
+portion of a trampoline. It is similar to COMPILED_CLOSURE_ENTRY_SIZE
+but in longwords, and will typically represent less storage since an
+absolute address is not needed (or desirable). It must include the
+format word and the GC offset for the entry. In the example above it
+would be 3. Note that it must be an integer number of longwords, so the
+code should be padded if necessary.
+
+=> TRAMPOLINE_ENTRY_POINT returns the address of the entry point of a
+trampoline when given the address of a trampoline's block, i.e. the
+address of the word that will contain the manifest vector header. In
+the picture above, when given the address of the byte labeled `block',
+it will return the address of the byte labeled `entry'.
+
+=> TRAMPOLINE_STORAGE returns the address of the first storage word in
+a trampoline when given the address of the first instruction (the entry
+point of the trampoline). In the picture above it would return the
+address of the word labeled `retadd' when given the address of the
+word labeled `entry'.
+
+Most versions of cmpint-md.h define the last two macros in the same
+way, in terms of TRAMPOLINE_BLOCK_TO_ENTRY, which is used only for
+this purpose. The definitions that use TRAMPOLINE_BLOCK_TO_ENTRY
+assume that the first instruction of a trampoline is aligned on a
+longword boundary. If this is the case, you can define
+TRAMPOLINE_BLOCK_TO_ENTRY appropriately and use the definitions of
+TRAMPOLINE_ENTRY_POINT and TRAMPOLINE_STORAGE from another machine.
+TRAMPOLINE_BLOCK_TO_ENTRY is the number of longwords from the start
+of a trampoline's block (the manifest vector header in the picture
+above), to the first instruction, which must be longword aligned.
+This will typically be 3 since there are two scheme header words, and
+the gc and format word typically take one longword together.
+
+=> STORE_TRAMPOLINE_ENTRY stores the "compiled" code into an "empty"
+trampoline. It is given the address of the entry point, and the index
+of the C procedure to invoke (they are all in a table), and stores the
+machine code necessary to invoke scheme_to_interface (or
+trampoline_to_interface), passing the index and the address of the
+trampoline storage area as parameters. In the example above this
+macro would store the LOADI (load immediate) and JSR instructions.
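+
+For a port whose trampoline code starts on a longword boundary, the
+two address macros are typically just offset computations in terms of
+TRAMPOLINE_BLOCK_TO_ENTRY, roughly as sketched here (SCHEME_OBJECT is
+the microcode's word type; treat this as an illustration, not as the
+text of an existing cmpint-md.h):
+
+#define TRAMPOLINE_BLOCK_TO_ENTRY 3  /* 2 header words + format/gc word */
+
+/* From the manifest-vector header to the first instruction. */
+#define TRAMPOLINE_ENTRY_POINT(block)                                   \
+  (((SCHEME_OBJECT *) (block)) + TRAMPOLINE_BLOCK_TO_ENTRY)
+
+/* From the first instruction to the first storage word: back up to
+   the block, then skip the 2 header words and the code portion. */
+#define TRAMPOLINE_STORAGE(entry)                                       \
+  ((((SCHEME_OBJECT *) (entry)) - TRAMPOLINE_BLOCK_TO_ENTRY)            \
+   + (2 + TRAMPOLINE_ENTRY_SIZE))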
+\f
+ Compiled code and processor caches:
+
+Many modern computers have processor caches that speed up the average
+memory reference if the code exhibits sufficient locality in its
+reference patterns. In order to obtain increased performance at a
+lower cost, many processors have split caches for instructions and
+data that are not guaranteed to be consistent, ie. they are not
+necessarily invisible to the programmer.
+
+This presents problems for self-modifying code and for dynamic loaders
+and linkers, since instructions are stored using data references (and
+therefore the data cache), but the instruction cache may not reflect
+the updates. Modern hardware with split caches often provides some
+way to synchronize both caches so that the operating system can
+guarantee correct operation of newly-loaded programs.
+
+The Scheme compiled code support performs some of the same tasks that
+operating systems do, and therefore runs into these problems.
+
+The ways in which the consistency problem arises in the Scheme system
+are:
+
+- Newly allocated instructions. The compiler can be invoked
+dynamically, compiled code can be loaded dynamically into freshly
+allocated storage, and compiled closures are created dynamically. The
+instruction cache must reflect the changes made to memory through the
+data cache. The operating system's program loader must solve
+precisely this problem.
+
+- Execute caches may change their contents. Execute caches contain
+jump instructions to the appropriate code, but these instructions may
+change when the corresponding variables are assigned. If the
+instruction cache is not updated, the wrong code may be entered on
+subsequent calls. Operating systems with dynamic linking must solve
+this problem as well.
+
+- Code is moved by the garbage collector, since code space is not
+separate from data space and is not static.  If the caches are not
+synchronized after a garbage collection, subsequent instruction
+fetches may result in the execution of incorrect instructions.
+The operating system must solve this problem when it re-allocates
+virtual memory pages.
+
+The problem can be solved by synchronizing the caches in the
+appropriate places. The relevant places in the Scheme system have
+been identified, and use two machine-dependent macros to
+synchronize both caches or flush the instruction cache.
+
+=> FLUSH_I_CACHE is used to flush the portion of the I-cache that
+Scheme code addresses may be in, or alternatively, to guarantee that
+the I-cache contains only valid data. It may flush/synchronize the
+entire I-cache instead, if it is easier. It is used after garbage
+collections and image loads.
+
+=> FLUSH_I_CACHE_REGION is used to flush or synchronize a region of
+the address space from the I-cache. It is given the base address and
+the number of long-words of the region of memory that has just been
+modified and whose new contents must be copied into the I-cache for
+correct execution.
+
+It is used after updating an execute cache while running between
+garbage collections. It is not used during garbage collection since
+FLUSH_I_CACHE will be used afterwards.
+\f
+These macros need not be defined if the cache does not need to be
+flushed.  A NOP version is provided by the code when they are not
+defined in cmpint-md.h.
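+
+In other words, a port that never needs to flush may simply omit them;
+the effect is as if definitions equivalent to the following were in
+force (a sketch of the fallback, not the literal text of the sources):
+
+#ifndef FLUSH_I_CACHE
+#  define FLUSH_I_CACHE() do { } while (0)
+#endif
+
+#ifndef FLUSH_I_CACHE_REGION
+#  define FLUSH_I_CACHE_REGION(address, nwords) do { } while (0)
+#endif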
+
+Note that on some machine/OS combinations, all system calls cause a
+cache flush, thus an innocuous system call (eg., a time reading call)
+may be used to achieve this purpose.
+
+Many modern machines only make their cache flushing instructions
+available to the operating system (they are privileged instructions),
+and some operating systems provide no system calls to perform this
+task. In the absence of information on the structure and
+characteristics of the cache (the information could be used to write
+flushing routines), the Scheme compiler and system may have to be
+changed in order to run on such machines. Here is a set of changes
+that will bypass the problem, at the expense of some functionality and
+perhaps performance:
+
+- Change the entry code for closures and execute caches.
+
+The code in execute caches can be changed from
+
+ jump target
+
+to
+
+ jsr fixed-utility-routine
+ target address
+
+where fixed-utility-routine extracts the target address from the return
+address and invokes it. The same change can be made to the closure
+entry code.
+
+This solves the problem of assignment to variables with execute
+caches.
+
+This change can be done quite easily since the format of closures and
+execute caches is already machine dependent, and all the accessors,
+constructors, and mutators have been abstracted into macros or can
+be easily rewritten in the compiler.
+
+- Change the storage management scheme to accommodate a code area that
+is never garbage collected so that code, once placed there, never
+moves.
+
+This would constitute a major change to the system.  The main problem
+that this change would present is the following:
+
+Closures are data structures created and dropped on the fly, thus they
+cannot be allocated from a region of memory that is never reclaimed.
+Thus closures would have to be allocated from data space, and could no
+longer contain instructions. This implies that the format of entry
+points would have to change, since the relevant information would no
+longer consist of a single address, but two, ie. the address of the
+code in code space and the address of the data in data space. This
+would imply many changes to the compiler for there are implicit
+assumptions throughout that compiled entry points take no space
+besides the space taken by the code. In particular, simple closures
+would disappear, and multi-closures would have to be redesigned.
+\f
+ Implementation registers and utilities
+
+The C runtime support maintains some state variables that Scheme code
+may need to access. In order to make these variables easily
+accessible to both languages, all these variables are collected into a
+contiguous vector (the register block) accessible by both.  The C
+variable "Registers" holds the address of this vector, and while
+compiled Scheme code is being executed, a processor register also
+holds the address of the vector. Among other data, the register block
+contains the memory version of MemTop, the interpreter's expression
+and environment registers, the interrupt mask and pending interrupt
+words.
+
+In addition, the compiler occasionally needs static memory locations
+into which it can spill the values contained in processor registers.
+Rather than using another register to hold the address of the spill
+locations, these are allocated on the same vector as the register
+block, and the register that holds the address of the register block
+can be used to access the spill locations as well.
+
+Compiled code also needs to invoke assembly language and C utilities
+to perform certain tasks that would take too much space to code
+in-line. Rather than choosing fixed addresses for these routines, or
+having to update them every time a piece of code is loaded or dumped,
+a register is reserved to hold the address of one of them, and the
+distance between them is pre-determined, so that compiled code can
+invoke any of them by adding an offset to the value in the register
+and jumping there.
+
+On processors with few registers (eg. 68k), it would be wasteful to
+reserve two registers in this fashion. Both registers are therefore
+merged. Yet another section of the register block array is reserved
+for utility procedures, and appropriate jump instructions are placed
+there so that compiled code can invoke the utilities by jumping into
+the register block array.
+
+The following macros define the sizes of the various areas of the
+array. None of them need to be defined except to override the
+default. The default assumes that there are enough processor
+registers that another one can be reserved to point to the utility
+handles.
+
+=> COMPILER_REGBLOCK_N_FIXED is the size of the register block
+section of the array.  It must accommodate at least as many locations
+as the interpreter expects to have.
+
+=> COMPILER_REGBLOCK_N_TEMPS is the number of spill locations.
+
+=> COMPILER_TEMP_SIZE is the size (in long words) of the contents of a
+floating point register if different from a double. For example, an
+MC68881 saves registers in 96 bit (3 longword) blocks. The default is
+fine for most machines.
+
+=> COMPILER_REGBLOCK_EXTRA_SIZE is the additional size (in longwords)
+to be reserved for utility handles.  It is typically defined as
+(COMPILER_REGBLOCK_N_HOOKS * COMPILER_HOOK_SIZE).
+
+=> COMPILER_REGBLOCK_N_HOOKS is the maximum number of utility handles.
+
+=> COMPILER_HOOK_SIZE is the size in longwords of a utility handle (an
+absolute jump instruction).
+
+=> Macro ASM_RESET_HOOK can be used to initialize the register block
+array. It is invoked at boot time.
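+
+As an illustration only (the numbers below are made up, not the
+defaults or the values of any real port), a cmpint-md.h that overrides
+the defaults might contain something like:
+
+#define COMPILER_REGBLOCK_N_FIXED       16   /* interpreter register slots */
+#define COMPILER_REGBLOCK_N_TEMPS       256  /* spill locations */
+#define COMPILER_TEMP_SIZE              3    /* e.g. 96-bit MC68881 registers */
+#define COMPILER_REGBLOCK_N_HOOKS       32   /* utility handles */
+#define COMPILER_HOOK_SIZE              2    /* longwords per absolute jump */
+#define COMPILER_REGBLOCK_EXTRA_SIZE                                    \
+  (COMPILER_REGBLOCK_N_HOOKS * COMPILER_HOOK_SIZE)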
+\f
+ Miscellany:
+
+Macro IN_CMPINT_C, defined in cmpint.c, can be used to conditionally
+include code (extern procedures) needed by the port. It is only
+defined when cmpint-md.h is included by cmpint.c .
+
+=> Macro COMPILER_PROCESSOR_TYPE identifies the processor type. It
+should be unique for each kind of processor.
--- /dev/null
+ Emacs: Please use -*- Text -*- mode. Thank you.
+
+$Id: porting.guide,v 1.1 1995/08/05 16:25:13 adams Exp $
+
+
+Copyright (c) 1991-1993 Massachusetts Institute of Technology
+
+
+ LIAR PORTING GUIDE
+ *DRAFT*
+
+
+
+Notes:
+
+This porting guide applies to Liar version 4.91, but most of the
+relevant information has not changed for a while, nor is it likely to
+change in major ways any time soon.
+
+This is an early version of this document, and the order of
+presentation leaves a lot to be desired. In particular, the document
+does not follow a monotonic progression, but is instead organized in a
+dictionary-like manner. We recommend that you read through the whole
+document twice since some important details, apparently omitted, may
+have their explanation later in the document. When reading the
+document for the second time, you will have an idea of where this
+other information is to be found, if it is present at all. We have
+attempted to insert sufficient forward pointers to make the first
+reading bearable, but we may have missed some.
+
+This document implicitly assumes that you are trying to build the
+compiler under Unix. The only compiler sources that depend on Unix
+are the sources that contain the pathnames of other files.
+The syntax could easily be changed for other file systems.
+This document uses Unix pathname syntax and assumes a hierarchical
+file system, but it should be easy to map these directories to a
+different file system. The DOS runtime library accepts forward
+slashes (#\/) as a substitute for backward slashes (#\\), so the
+scripts are shared between Unix and DOS.
+
+This document also assumes that you are familiar with MIT Scheme, C,
+and the C preprocessor. It does not describe Liar in detail, and it
+does not cover many machine-independent portions at all. It is
+intended to guide a programmer familiar with (MIT) Scheme and C in the
+task of porting the compiler to a new architecture, not in modifying
+the compiler in any other way.
+
+For questions on Liar not covered by this document, or questions about
+this document, contact ``liar-implementors@zurich.ai.mit.edu''.
+
+Text tagged by ==> is intended primarily for the compiler developers.
+
+Good luck!
+\f
+ Acknowledgments
+
+Liar is the work of many people. The current version is mostly the
+effort of Chris Hanson and Bill Rozas, with significant contributions
+from Mark Friedman and Jim Miller. Arthur Gleckler, Brian LaMacchia,
+and Henry Wu have also contributed to the current version of Liar.
+Many other people have offered suggestions and criticisms.
+
+The current Liar might never have existed had it not been for the
+efforts and help of the now-extinct BBN Butterfly Lisp group. That
+group included Don Allen, Seth Steinberg, Larry Stabile, and Anthony
+Courtemanche. Don Allen, in particular, babysat computers to
+painstakingly bootstrap the first version of the then new Liar.
+
+Many of the ideas and algorithms used in Liar, particularly at the RTL
+level, are taken from the GNU C compiler, written by Richard Stallman
+and many others.
+
+This document was written by Bill Rozas, with modifications and hints
+from the people listed above. The section on the MIT Scheme package
+system was written by Arthur Gleckler.
+\f
+ 0. Introduction and a brief walk through Liar.
+
+Liar translates Scode as produced by the procedure SYNTAX, or by the
+file syntaxer (SF, for syntax file) into compiled code objects. The
+Scode is translated into a sequence of languages, the last of which
+is the binary representation of the compiled code.
+
+The sequence of external languages manipulated is
+
+ Characters --READ-->
+ S-Expressions --SYNTAX-->
+ Scode --COMPILE-SCODE-->
+ compiled code objects.
+
+Liar is a multi-pass compiler, where each major pass has multiple
+subpasses. Many of the subpasses do not manipulate the whole code
+graph, but instead follow threads that link the relevant parts of the
+graph.
+
+COMPILE-SCODE is the main entry point to Liar, although CBF (for
+compile bin file) is the usual entry point. CBF uses COMPILE-SCODE,
+and assumes that the code has been syntaxed by SF producing a .bin
+file, and dumps the resulting compiled code into a .com file. CF (for
+compile file) invokes SF and then CBF on a file name argument.
+
+The internal sub-languages used by Liar are:
+
+ Scode --FGGEN-->
+ Flow-graph --RTLGEN-->
+ RTL (Register Transfer Language) --LAPGEN-->
+ LAP (Lisp assembly program) --ASSEMBLER-->
+ bits --LINK-->
+ compiled code object.
+
+where FGGEN, etc., are some of the major passes of the compiler.
+
+The remaining major passes are FGOPT (the flow-graph optimizer), and
+RTLOPT (the RTL-level optimizer). RTL-level register allocation is
+performed by RTLOPT, and hardware-level register allocation is
+performed by LAPGEN. ASSEMBLER branch-tensions the output code.
+Branch-tensioning is described in a later section. LINK constructs a
+Scheme compiled code object from the bits representing the code and
+the fixed data that the compiled code uses at runtime.
+
+compiler/toplev.scm contains the top-level calls of the compiler and
+its pass structure.
+
+The ``.com'' files contain compiled code objects, which are linked
+further at load time.
+\f
+ 0.1. Liar's package structure
+
+This section assumes that you are familiar with the MIT Scheme package
+system. If you are not, there is a small description in an appendix
+to this document.
+
+The package structure of the compiler reflects the pass structure and
+is specified in compiler/machines/port/comp.pkg, where port is the
+name of a machine (bobcat, vax, spectrum, mips, i386, alpha, etc.).
+The major packages are:
+
+(COMPILER):
+ Utilities and data structures shared by most of the compiler.
+
+(COMPILER MACROS):
+ Syntax extensions used by the compiler to define language
+translation rules.
+
+(COMPILER TOP-LEVEL):
+ Top level pass structure of the compiler.
+
+(COMPILER FG-GENERATOR):
+ This package contains the flow-graph generator, FGGEN.
+
+(COMPILER FG-OPTIMIZER):
+ This package contains the flow-graph analyzer and optimizer,
+FGOPT. It has many sub-packages to contain the individual sub-passes.
+
+(COMPILER RTL-GENERATOR):
+ This package contains the flow-graph to RTL translator,
+RTLGEN. It contains a few sub-packages for the major kinds of
+flow-graph operations.
+
+(COMPILER RTL-OPTIMIZER):
+ This package contains most of the RTL-level optimizer, RTLOPT.
+It has various sub-packages corresponding to some of its sub-passes.
+
+(COMPILER RTL-CSE):
+ This package contains the RTL-level common (redundant)
+subexpression eliminator pass of the RTL-level optimizer.
+
+(COMPILER LAP-SYNTAXER):
+ This package contains most of the machine-dependent parts of
+the compiler and the back end utilities. In particular, it contains
+the RTL -> LAP translation rules, and the LAP -> bits translation
+rules, i.e. the LAPGEN and ASSEMBLER passes respectively. It has some
+sub-packages for various major utilities (linearizer, map-merger,
+etc.).
+
+(COMPILER ASSEMBLER):
+ This package contains most of the machine-independent portion of
+the assembler. In particular, it contains the bit-assembler, i.e.
+the portion of the assembler that accumulates the bit strings produced
+by ASSEMBLER and performs branch-tensioning on the result.
+
+(COMPILER DISASSEMBLER):
+ This package contains the disassembler. It is not needed for
+ordinary compiler operation, but is useful for low-level debugging,
+and debugging of the compiler and assembler.
+\f
+ 0.2. Liar's sources' directory structure
+
+The directory structure loosely reflects the pass structure of the
+compiler. compiler/machines/port/comp.pkg declares the packages and
+the files that constitute them.
+
+compiler/back:
+ This directory contains the machine-independent portion of the
+back end. It contains bit-string utilities, symbol table utilities,
+label management procedures, the hardware register allocator, and the
+top-level assembler calls.
+
+compiler/base:
+ This directory contains common utilities used by the whole
+compiler, and the top level procedures provided by the compiler.
+
+compiler/etc:
+ This directory contains utilities used for cross-compiling,
+and checking re-compilations.
+
+compiler/fggen:
+ This directory contains the front end of the compiler. The
+code in this directory translates Scode into a flow-graph used by the
+analyzer and optimizer.
+
+compiler/fgopt:
+ This directory contains the flow-graph analyzer and optimizer.
+
+compiler/rtlbase:
+ This directory contains utilities used by the RTL generator and
+optimizer.
+
+compiler/rtlgen:
+ This directory contains the code that translates the
+flow-graph into register transfer language (RTL).
+
+compiler/rtlopt:
+ This directory contains the RTL-level optimizer. It contains
+code to perform lifetime analysis, redundant subexpression
+elimination, elimination of dead code, etc.
+
+compiler/machines:
+ This directory contains a subdirectory for each port of the
+compiler. Each of these subdirectories contains the port (machine)
+dependent files of the compiler.
+
+compiler/machines/port:
+ This directory contains the definition of machine parameters,
+the assembler rules, the disassembler, and RTL to assembly-language
+rules for the port.
+
+All machine-dependent files are in compiler/machines/port and this is
+the only directory that needs to be written to port the compiler to a
+new architecture.
+\f
+ 1. Liar's runtime model.
+
+Liar does not open code (inline) all operations that the code would
+need to execute. In particular, it leaves error handling and
+recovery, interrupt processing, initialization, and invocation of
+unknown procedures, to a runtime library written in assembly language.
+
+Although this runtime library need not run in the context of the
+CScheme interpreter, currently the only implementation of this library
+runs from the interpreter and uses it for many of its operations.
+
+In other words, code generated by Liar does not depend on the
+interpreter directly, but indirectly through the runtime library. It
+does depend on the ability to invoke CScheme primitives at runtime,
+some of which (eval, etc.) require the interpreter to be present. It
+should be possible, however, to provide an alternate runtime library
+and primitive set that would allow code produced by Liar to run
+without the interpreter being present. (Foot) We often toy with this
+idea.
+
+On the other hand, since the only instance of the runtime library is
+that supplied by the interpreter, Liar currently assumes that the
+Scheme object representation is the same as that used by the
+interpreter, but this is relatively well abstracted and should not be
+hard to change. (Foot) Famous last words.
+
+The runtime library is currently implemented by
+microcode/cmpaux-port.m4 and microcode/cmpint.c . The files
+cmpaux.txt and cmpint.txt document these files. The documentation
+files may be found in the microcode or the documentation directories.
+
+microcode/cmpaux-port.m4 is an assembly language machine-dependent file
+that allows compiled Scheme to call the C-written library routines and
+vice versa. It is described in cmpaux.txt.
+
+microcode/cmpint.c defines the library in a machine-independent way,
+but requires some information about the port and this is provided in
+microcode/cmpint2.h, a copy (or link) of the appropriate
+microcode/cmpint-port.h file. The microcode/cmpint-port.h files are
+described in cmpint.txt .
+
+cmpint.txt also describes many of the data structures that the
+compiled code and runtime library manipulate, and defines some of the
+concepts needed to understand the compiler.
+
+The rest of this document assumes that you are using the runtime
+library provided by the CScheme interpreter. If you wish to use Liar
+as a compiler for stand-alone programs, a lot of work needs to be
+done, and this work is not described here. Perhaps we will do it in
+the future.
+
+If you have not yet read cmpaux.txt and cmpint.txt, please do so
+before reading the rest of this document.
+
+You should probably also read [1] and [2] for a discussion of some of
+the implementation issues.
+\f
+ 2. Preliminary Observations
+
+ 2.1. Constraints on architectures to which Liar can be ported:
+
+- Liar assumes that the target machine is a general-register machine.
+That is, operations are based on processor registers, and there is a
+set of general-purpose registers that can be used interchangeably. It
+would be hard to port Liar to a pure stack machine, a graph-reduction
+engine, a Turing machine, or a 4-counter machine. However, the
+register set required is not huge. Liar has been ported to the
+386/486 architecture, which has only eight registers, four of which are
+reserved for implementation quantities (e.g. stack pointer and free
+pointer) and four of which are left to the register allocator.
+
+- Liar currently assumes that floating-point registers are either
+separate from the integer registers or the same size as they are.
+In other words, Liar cannot currently handle quantities that need
+multiple registers to hold them. For example, on the DEC VAX and the
+Motorola 88100, there is a single set of registers, and double
+floating point values (the only kind used by Scheme) take two
+consecutive integer registers. The register allocator in Liar does
+not currently handle this situation, and thus floating-point
+operations are not currently open-coded on the VAX.
+
+- Liar assumes that the target machine has an address space that is
+flat enough that all Scheme objects can be addressed uniformly. In
+other words, segmented address spaces with segments necessarily
+smaller than the Scheme runtime heap (e.g. the Intel 286) will make Liar
+difficult to port.
+
+- Liar assumes that instructions and data can coexist in the same
+address space, and that new code objects that contain machine
+instructions can be dynamically allocated from and written to the heap
+(memory pool) used to allocate all other Scheme objects. This
+assumption in Liar conflicts with some current hardware that has
+programmer-visible separate (split) data and instruction caches --
+that is, there are two different caches, one used by the processor for
+instruction references and the other for data references, and storing
+data into memory only updates the data cache, but not the instruction
+cache, and perhaps not even memory. Most of the problems this causes
+can be resolved if the user is given enough control over the hardware
+caches, i.e. some way to flush or synchronize them. Furthermore, a
+true Harvard architecture, with separate code and data memories, would
+be hard to accommodate without relatively major changes. At some
+point in the future we may write a C back end for Liar that handles
+this case, since C code space and data space are typically kept
+separate by the operating system. Whatever technique the C back end
+may use can probably be emulated by architectures with such a strong
+division, although it is likely to be expensive.
+\f
+ 2.2. Some implementation decisions that may make your job
+harder or impair the quality of the output code:
+
+- Liar generates code that passes arguments to procedures on a stack.
+This decision especially affects the performance on load-store
+architectures, common these days. Liar may be changed in the future
+to generate code that passes arguments in registers because most
+modern machines have large register sets and memory-based operations
+are slower than register-based operations even when the memory
+locations have been cached.
+
+- Liar assumes that pushing and popping elements from a stack is
+cheap. Currently Liar does not attempt to bump the stack pointer once
+per block of operations, but instead bumps it once per item. This is
+expensive on many modern machines where pre- and post-incrementing are
+not supported by the hardware. This may also change in the
+not-too-far future.
+
+- Liar assumes that it is cheap to compute overflow conditions on
+integer arithmetic operations. Generic arithmetic primitives have the
+frequent fixnum (small integer) case open coded, and the overflow and
+non-fixnum cases coded out of line, but this depends on the ability of
+the code to detect and branch on overflow conditions cheaply. This is
+not true of some modern machines, notably MIPS processors. If your
+processor does not make branching on such conditions reasonably cheap,
+you may have to use code similar to that used in the MIPS port. The
+MIPS processor has trapping and non-trapping arithmetic instructions.
+The trapping arithmetic instructions trap on overflow, but the trap
+recovery code is typically so expensive that the generated code
+computes the overflow conditions explicitly.
+
+- Liar assumes that extracting, inserting, and comparing bit-fields is
+relatively cheap. The current object representation for Liar
+(compatible with the interpreter) consists of using a number of bits
+(usually 6) in the most significant bit positions of a machine word as
+a type tag, and the rest as the datum, usually an encoded address.
+Not only must extracting, comparing, and inserting these tags be
+cheap, but decoding the address must be cheap as well. These
+operations are relatively cheap on architectures with bit-field
+instructions, but more expensive if they must be emulated with bitwise
+boolean operations and shifts, as on the MIPS R3000. Decoding a datum
+into an address may involve inserting segment bits in some of the
+positions where the tag is placed, further increasing the dependency
+on cheap bit-field manipulation.
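+
+As a concrete illustration (not code from any port), with a 32-bit
+word and a 6-bit tag the split can be written with plain arithmetic;
+the procedure names below are made up for the example, and real ports
+use shifts, masks, or bit-field instructions instead:
+
+  ;; Illustrative only: 6-bit tag in the most significant bits of a
+  ;; non-negative 32-bit word, 26-bit datum in the rest.
+  (define datum-modulus (expt 2 26))
+  (define (object->tag word)   (quotient word datum-modulus))
+  (define (object->datum word) (remainder word datum-modulus))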
+
+- The CScheme interpreter uses a particularly poor representation for
+fixnums, forcing Liar's hand. Fixnums are suitably small integers.
+They are immediate objects with a particular tag. This tag was not
+wisely chosen, making fixnum operations more expensive than they need
+to be. This tag may be changed in the future.
+
+- The CScheme interpreter manipulates a stack that grows in a fixed
+direction (from higher to lower addresses). On many modern machines,
+there are no special instructions to deal with the stack, so the
+decision is arbitrary. On some machines, however, there are special
+instructions to pop and push elements on the stack. Liar may not be
+able to use these instructions if the machine's preferred direction of
+stack growth does not match the interpreter's.
+\f
+ 2.3. Emulating an existing port.
+
+The simplest way to port Liar is to find an architecture to which Liar
+has already been ported that is sufficiently similar to the target
+architecture that most of the code can be written by copying or
+trivially translating existing code. In particular, if the
+architectures are really close, there may be no need for
+architecture-specific additional tuning.
+
+The compiler is primarily developed on Motorola MC68020 processors, so
+this is the best-tuned version, and the other ports are not very well
+tuned or not tuned at all. If you improve an existing port, please
+share the improvements by notifying liar-implementors.
+
+- If you have a Vax-like CISC machine, you can try starting from the
+Vax, the Motorola MC68020, or the i386 ports. The Vax and i386 ports
+were written by starting from the MC68020 port. This is probably the
+best solution for some architectures like the NS32000, and perhaps
+even the IBM 370.
+
+- If you have an ``enlarged'' RISC processor, with some complex
+addressing modes, and bit-field instructions, you may want to start by
+looking at the Spectrum (HP Precision Architecture) port. This is
+probably a good starting point for the Motorola 88000 and for the IBM
+RS/6000.
+
+- If you have a bare-bones RISC processor, similar to a MIPS R3000
+processor, you may want to start from this port. Since the MIPS R3000
+is a minimalist architecture, its port addresses almost every problem
+the other RISCs present, and may well be a good starting point for
+all of them. This is probably a
+good starting point for the Sparc. The MIPS port used the Spectrum
+port as its model, and the Alpha port used the MIPS port as its model.
+
+- If you have a machine significantly different from those listed
+above, you are out of luck and will have to write a port from scratch.
+For example, the port to the Intel 386+387/486 uses some of the
+concepts and code from ports to other CISCs, but due to the
+floating-point stack architecture (instead of register-based), the
+floating-point stack management is different (but not very good).
+
+Of course, no architecture is identical to any other, so you may want
+to mix and match ideas from many of the ports already done, and it is
+probably a good idea for you to compare how the various ports solve
+the various problems.
+\f
+ 3. Compiler operation, LAPGEN rules and ASSEMBLER rules.
+
+The front end of the compiler translates Scode into a flow-graph that
+is then translated into RTL. The back end does machine-independent
+optimization on the RTL, generates assembly language (in LAP format)
+from the RTL, and assembles the resulting bits.
+
+Although RTL is a machine-independent language, the particular RTL
+generated for a given program will vary from machine to machine. The
+RTL can vary in the following ways:
+
+- RTL is a language for manipulating the contents of conceptual
+registers. RTL registers are divided into ``pseudo registers'' and
+``machine registers''. Machine registers represent physical hardware
+registers, some of which have been reserved and given fixed meanings
+by the port (stack pointer, value register, etc.) while
+pseudo-registers represent conceptual locations that contain
+quantities that will need physical registers or memory locations to
+hold them in the final translation. An RTL pseudo register can be
+mapped to any number of physical registers in the final translation,
+and may ``move'' between physical registers. In order to make the RTL
+more homogeneous, the RTL registers are not distinguished
+syntactically in the RTL, but are instead distinguished by their value
+range. Machine registers are represented as the N lowest numbered RTL
+registers (where N is the number of hardware registers), and all
+others are pseudo registers. Since some RTL instructions explicitly
+mention machine registers and these (and their numbers) vary from
+architecture to architecture, the register numbers in an RTL program
+will vary depending on the back end in use. Machine registers may be
+divided into separate classes (e.g. address, data, and floating-point
+registers) that can contain different types of values. Pseudo
+registers are not distinguished a priori, but the values stored in
+them must be consistent. For example, if a floating point value is
+stored into a particular pseudo register, the register can only be
+mapped to floating-point machine registers, and non-floating-point
+values cannot be stored in it.
+
+- RTL assumes a load-store architecture, but can accommodate
+architectures that allow memory operands and rich addressing modes.
+RTL is constructed by generating statements that include relatively
+complex expressions. These expressions may represent multiple memory
+indirections or other operations. An RTL simplifier runs over this
+initial RTL, assigning these intermediate quantities to new pseudo
+registers and rewriting the original statements to manipulate the
+original and new pseudo-registers. Typically this simplification
+results in a sequence of assignments to pseudo-registers with single
+operations per assignment and where the only memory operations are
+load and store. However, this simplification pass is controlled by
+the port. The port supplies a set of rewriting rules to the
+simplifier that causes it to leave more complex expressions
+untouched, or to simplify them in different ways, depending on the
+availability of memory operands or richer addressing
+modes. Since these rules vary from port to port, the final RTL
+differs for the different ports. The simplification process is also
+controlled by the availability of various rules in the port, and ports
+for richer instruction sets may require less simplification because
+hardware instructions and addressing modes that encode more
+complicated RTL patterns are directly available.
+
+- The open coding (inlining) of Scheme primitives is
+machine-dependent. On some machines, for example, there is no
+instruction to multiply integers, and it may not be advantageous to
+open code the multiplication primitive. The RTL for a particular
+program may reflect the set of primitive operations that the back end
+for the port can open code.
+\f
+The resulting RTL program is represented as a control flow-graph where
+each of the nodes has an associated list of RTL statements. The edges
+in the graph correspond to conditional and unconditional branches in
+the code, and include a low-level predicate used to choose between the
+alternatives. The graph is linearized after the instructions have
+been translated to LAP. There is a debugging RTL linearizer used by
+the RTL output routine.
+
+Besides assignments and tests, the RTL has some higher level
+statements that correspond to procedure headers, continuation (return
+address) headers, etc. Thus an RTL program is made mostly of register
+to register operation statements, a few conditional tests, and a few
+higher-level ``glue'' statements.
+
+Once a program has been translated to RTL, the RTL code is optimized
+in a machine-independent way by minimizing the number of RTL
+pseudo-registers used, removing redundant subexpressions, eliminating
+dead code, and by using various other techniques.
+
+The RTL program is then translated into a Lisp-format
+assembly-language program (LAP). Hardware register allocation occurs
+during this translation. The register allocator is
+machine-independent and can accommodate different register classes,
+but does not currently accommodate register pairs (this is why floating
+point operations are not currently open coded on the Vax).
+
+The register allocator works by considering unused machine registers
+(those not reserved by the port) to be a cache for the
+pseudo-registers. Thus a particular pseudo-register may map into
+multiple machine registers of different types, and these aliases are
+invalidated as the pseudo-registers are written or the corresponding
+machine registers reused. Thus the most basic facility that the
+register allocator provides is a utility to allocate an alias of a
+particular type for a given pseudo-register.
+
+The port defines the types and numbers of machine registers and the
+subset that is available for allocation, and the register allocator
+manages the associations between the pseudo-registers and their
+aliases and the set of free machine registers. The register allocator
+also automatically spills the contents of machine registers to memory
+when pressed for machine registers, and reloads the values when
+necessary.
+
+Thus the resulting LAP program is the collection of the code issued by
+the rules that translate RTL into LAP, the instructions issued behind
+the scenes by the register allocator, and the instructions used to
+linearize the control flow graph.
+
+The back end provides a set of rules for the translation of RTL to
+LAP, and a set of procedures that the register allocator and the
+linearizer use to generate such instructions. The rules are written
+using a special syntax that creates entries in a data base used by a
+pattern matcher to translate the RTL into LAP.
+
+The linear LAP is then translated into binary form by using the same
+pattern matcher with a different set of rules. These rules define the
+translation between assembly language and machine language for the
+architecture. Most of these rules output bit strings to be collected
+together, but some output a set of directives to the bit-level
+assembler to define labels, or choose between alternative encoding of
+the fields depending on the final value of a displacement. These
+alternative encodings are typically used for PC-relative quantities.
+\f
+The machine-independent bit-assembler collects all the bits together
+and keeps track of a virtual program counter used to determine the
+distance between instruction fields. A relaxation process is used to
+reduce the size of the resulting encoding (to tension branches, i.e.
+to choose the smallest encoding that will do the job when there are
+alternatives).
+
+Since most of the LAPGEN rules generate almost fixed assembly language,
+where the only difference is the register numbers, most of the LAP to
+bits translation can be done when the compiler is compiled. A
+compiler switch, ``COMPILER:ENABLE-EXPANSION-DECLARATIONS?'' allows this
+process to take place. This mechanism has not been used for a while,
+however, because the resulting compiler was, although somewhat faster,
+considerably bigger, so this switch may not currently work.
+
+Several other compiler parameters and switches control various aspects
+of the operation of the back end. Most parameters and switches are
+machine independent, and are defined in compiler/base/switch.scm .
+The remaining parameters and switches are defined in
+compiler/machines/port/machin.scm. All compiler parameters and
+switches are exported to the Scheme global package for easy
+manipulation.
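+
+Since the switches live in the global package, they can be examined
+and changed directly from a Scheme listener. For example (the
+particular settings shown are arbitrary):
+
+  (set! compiler:generate-lap-files? #t)
+  (set! compiler:intersperse-rtl-in-lap? #t)
+  (set! compiler:open-code-primitives? #f)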
+
+The following switches are of special importance to the back end
+writer:
+
+* COMPILER:COMPILE-BY-PROCEDURES? This switch controls whether the
+compiler should compile each top-level lambda expression independently
+or compile the whole input program (or file) as a block. It is
+usually set to true, but must be set to false for cross-compilation.
+The cross-compiler does this automatically. (Foot) The reason for
+this is that the gc offset words in compiled entry points may differ
+for the source and target machines, and thus the source machine's
+garbage collector may be confused by the target machine's compiled
+entry points. This is circumvented by having the cross compiler
+generate a single compiled code block object, manipulated and dumped
+as a vector object (instead of as an entry point). The final entry
+points are generated by cross-compile-bin-file-end running interpreted
+on the target machine.
+
+* COMPILER:OPEN-CODE-PRIMITIVES? This switch controls whether Liar
+will open code (inline) MIT Scheme primitives. It is usually set to
+true and should probably be left that way. On the other hand, it is
+possible to do a lot less work in porting the compiler by not
+providing the open coding of primitives and turning this switch off.
+ Some of the primitives are open coded by the machine-independent
+portion of the compiler, since they depend only on structural
+information, and not on the details of the particular architecture.
+In other words, CAR, CONS, and many others can be open-coded in a
+machine-independent way since their open codings are performed
+directly in the RTL. Turning this switch to false would prevent the
+compiler from open coding these primitives as well.
+
+* COMPILER:GENERATE-RTL-FILES? and COMPILER:GENERATE-LAP-FILES? These
+are mostly compiler debugging switches. They control whether the
+compiler will issue .rtl and .lap files for every file compiled. The
+.rtl file will contain the RTL for the program, and the .lap file will
+contain the input to the assembler. Their usual value is false.
+
+* COMPILER:INTERSPERSE-RTL-IN-LAP? This is another debugging switch.
+If turned on, and COMPILER:GENERATE-LAP-FILES? is also on, the lap
+output file includes the RTL statements as comments preceding their
+LAP translations. Its usual value is true.
+ ==> RTL predicates are not included, making the control-flow hard to
+follow. This should be fixed.
+\f
+* COMPILER:OPEN-CODE-FLOATING-POINT-ARITHMETIC? This switch is
+defined in compiler/machines/port/machin.scm and determines whether
+floating point primitives can and should be open coded by the compiler
+or not. If the port provides open codings for them, it should be set
+to true, otherwise to false.
+
+* COMPILER:PRIMITIVES-WITH-NO-OPEN-CODING This parameter is defined in
+compiler/machines/port/machin.scm. It contains a list of primitive
+names that the port cannot open code. Currently there is no simple
+list of all the primitives that Liar can open-code. The list is
+implicit in the code contained in rtlgen/opncod.scm.
+ ==> The last two parameters should probably be combined and inverted.
+COMPILER:PRIMITIVES-WITH-OPEN-CODINGS should replace both of the
+above. This has the advantage that if the RTL level is taught how to
+deal with additional primitives, but not all ports have open codings
+for them, there is no need to change all the machin.scm files, only
+those for which the open coding has been provided.
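+
+In the meantime, a port's machin.scm defines the existing parameter as
+a simple list of primitive names; the particular names below are only
+placeholders showing the kind of entry that appears in it:
+
+  (define compiler:primitives-with-no-open-coding
+    '(integer-quotient integer-remainder integer-multiply))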
+\f
+ 4. Description of the machine-specific files
+
+The following is the list of files that usually appears in the port
+directory. The files can be organized differently for each port, but
+it is probably easiest if the same pattern is kept. In particular,
+the best way to write most of them is by editing the corresponding
+files from an existing port. Keeping the structure identical will
+make writing decls.scm, comp.pkg, and comp.sf straightforward, and
+will make future updates easier to track.
+
+A useful thing to do when writing new port files is to keep
+track of the original version from which you started, and
+additionally, that on which your original is based. For example, if
+you use machines/mips/assmd.scm as a model for your version, in it you
+would find something like
+ $ Header: assmd.scm,v 1.1 90/05/07 04:10:19 GMT jinx Exp $
+ $MC68020-Header: assmd.scm,v 1.36 89/08/28 18:33:33 GMT cph Exp $
+In order to allow an easier merge in the future, it would
+be good if you transformed this header into
+ $ Header $
+ $mips-Header: assmd.scm,v 1.1 90/05/07 04:10:19 GMT jinx Exp $
+ $MC68020-Header: assmd.scm,v 1.36 89/08/28 18:33:33 GMT cph Exp $
+The new $ Header $ line would be used by RCS to keep track of the
+versions of your port and the others could be used to find updates to
+the originals that would make updating your port easier.
+
+ 4.1. Compiler building files:
+
+* comp.pkg:
+ This file describes the Scheme package structure of the
+compiler, the files loaded into each package, and what names are
+exported and imported from each package.
+ To write this file, copy the similar file from an existing
+port, change the name of the port (i.e. mips -> sparc), and add or
+remove files as appropriate. You should only need to add or remove
+assembler and LAPGEN files.
+
+* comp.cbf:
+ This file is a script that can be used to compile the compiler
+from scratch. You can copy this file from another port, and change
+the port name. There is more information in a later section about how
+to build the compiler.
+
+* comp.sf:
+ This file is a script that is used to pre-process the compiler
+sources before they are loaded to be interpreted or compiled. You
+should be able to copy the file from an existing port and replace the
+name of the port. You should also edit the names of the instruction
+files in the assembler instruction database section, although this
+section is no longer used by default.
+
+The previous three files should be copied or linked to the top-level
+compiler directory. I.e., compiler/comp.pkg should be a link
+(preferably symbolic) or a copy of compiler/machines/port/comp.pkg .
+\f
+* comp.con, comp.ldr, comp.bcon, and comp.bldr:
+ These files are generated by the CREF subsystem from the
+information in the cref.pkg file. The .bcon and .bldr files are
+binary versions of the others, which are Scheme sources. The .con
+file contains the ``connectivity code'', that is, the code to create and
+link the package objects specified in the .pkg file. The .ldr file
+contains the ``loading code'', that is, the code to load the source
+files into the appropriate packages and, in theory, to initialize the
+packages. The CREF subsystem also generates a comp.cref file that
+includes cross-reference information. It is useful to examine this
+file to find unbound references (often typos).
+
+* make.scm:
+ This file is used to load the compiler on top of a runtime
+system that has the file syntaxer (SF) loaded, and defines the version
+of the compiler. The list of files does not appear here because
+comp.pkg already declares them; CREF builds the comp.con and comp.ldr
+files that contain this information, and make.scm loads those files.
+
+* decls.scm:
+ This file defines the pre-processing dependencies between the
+various source files. There are three kinds of pre-processing
+dependencies:
+ - Syntactic: Different files need to be processed in different syntax
+tables that define the macros used by the files.
+ - Integrations: Different files import integrable (inline) definitions
+from other files, and must be processed in the right sequence in order
+to obtain the maximum effect from the integrations (mostly because of
+transitive steps).
+ - Expansions: Certain procedures can be expanded at compiler
+pre-processing time into accumulations of simpler calls. This is how
+the assembly language in the LAPGEN rules can be translated into bits
+at compiler pre-processing time. The files that define the
+pre-processing-time expansion functions must be loaded in order to
+process those files that use the procedures that can be expanded.
+decls.scm builds a database of the dependencies. This database is
+topologically sorted by some of the code in decls.scm itself in order
+to determine the processing order. Since there are circularities in
+the integration dependencies, some of the files are processed multiple
+times, but the mechanism in decls takes care of doing this correctly.
+You should be able to edit the version from another port in the
+appropriate way. Mostly you will need to rename the port (i.e. mips
+-> sparc), and add/delete instruction and rule files as needed.
+
+ ==> decls.scm should probably be split into two sections: The
+machine-independent dependency management code, and the actual
+declaration of the dependencies for each port. This would allow us to
+share more of the code, and make the task of rewriting it less
+daunting.
+\f
+ 4.2. Miscellaneous files:
+
+* rgspcm.scm:
+ This file declares a set of primitives that can be coded by
+invoking runtime library procedures. This file is no longer machine
+dependent, since the portable library has made all the sets identical.
+It lives in machines/port for historical reasons, and should probably
+move elsewhere. Obviously, you can just copy it from another port.
+ ==> Let's move it or get rid of it!
+
+* rulrew.scm:
+ This file defines the simplifier rules that allow more
+efficient use of the hardware's addressing modes and other
+capabilities. The rules use the same syntax as the LAPGEN rules, but
+belong in the (rule) rewriting database. Although these rules are
+machine-dependent, it should be straightforward to emulate what other
+ports have done in order to arrive at a working set. Moreover, it is
+possible to start out with an empty set and add rules only as
+inefficiencies are discovered in the output assembly language. These
+rules manipulate RTL expressions by using the procedures defined in
+compiler/rtlbase/rtlty1.scm and compiler/rtlbase/rtlty2.scm.
+
+* lapopt.scm:
+ This file defines a LAP-level peephole optimizer, currently
+used only in the MIPS port to reduce the number of NOPs in the ``delay
+slots'' of load instructions. The instructions in each LAP-level
+basic block are passed to optimize-linear-lap, which outputs the new
+sequence of instructions corresponding to the basic block. Currently
+all ports (except the MIPS port) implement this procedure as the
+identity procedure.
+
+* machin.scm:
+ This file defines architecture and port parameters needed by
+various parts of the compiler. The following is the current list of
+the primary parameters. The definitions of derived parameters not
+mentioned here should be copied verbatim from existing ports. Some of
+these parameters are not currently in use, but should all be provided
+for completeness.
+
+- USE-PRE/POST-INCREMENT?: Should be true or false depending on
+whether the architecture has addressing modes that update the base
+address. It is true on the MC68020, Vax, i386, and HP-PA, and false
+on the MIPS and Alpha.
+
+- ENDIANNESS: Should be the symbol LITTLE if an address, when used as
+a byte address, refers to the least significant byte of the long-word
+addressed by it. It should be BIG if it refers to the most
+significant byte of the long-word. The compiler has not been ported
+to any machines where the quantum of addressability is not an 8-bit
+byte, so the notion may not apply to those.
+
+- ADDRESSING-GRANULARITY: How many bits are addressed by the
+addressing quantum. I.e., increasing an address by 1 will bump the
+address to point past this number of bits. Again, the compiler has
+not been ported to any machine where this value is not 8.
+
+- SCHEME-OBJECT-WIDTH: How many bits are taken up by a Scheme object.
+This should be the number of bits in a C ``unsigned long'', since Scheme
+objects are declared as such by the portable runtime library.
+
+- SCHEME-TYPE-WIDTH: How many bits at the most-significant end of a
+Scheme object are taken up by the type tag. The value of
+TYPE_CODE_LENGTH in the microcode must match this value. The value is
+currently 6 for systems with a compiler and 8 for systems without one.
+\f
+- ADDRESS-UNITS-PER-PACKED-CHAR: This parameter defines how much to
+increment an address by in order to make it point to the next
+character in a string. The compiler has not been ported to any
+configuration where this is not 1, but may be if 16-bit characters are
+used in the future.
+
+- FLONUM-SIZE: This is the ceiling of the ratio of the size of a C
+``double'' to the size of a C ``unsigned long''. It reflects how many
+Scheme units of memory (measured in Scheme objects) the data in a
+Scheme floating point object will take.
+
+- FLOAT-ALIGNMENT: This value defines the bit-alignment constraints
+for a C ``double''. It must be a multiple of scheme-object-width. If
+floating point values can only be stored at even long-word addresses,
+for example, this value should be twice scheme-object-width.
+
+- SIGNED-FIXNUM/UPPER-LIMIT: This parameter should be derived from
+others, but is specified as a constant due to a shortcoming of the
+compiler pre-processing system (EXPT is not constant-folded). Use the
+commented-out expression to derive the value for your port. All
+values that should be derived but are instead specified as constants
+are tagged by a comment containing ``***''.
+
+- STACK->MEMORY-OFFSET: This procedure is provided to accommodate
+stacks that grow in either direction, but we have not tested any port
+in which the stack grows towards larger addresses, because the CScheme
+interpreter imposes its own direction of growth. It should probably
+be copied verbatim.
+
+- EXECUTE-CACHE-SIZE: This should match EXECUTE_CACHE_ENTRY_SIZE in
+microcode/cmpint-port.h, and is explained in cmpint.txt.
+ ==> We should probably rename one or the other to be alike.
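+
+Putting the scalar parameters above together, a hypothetical 32-bit,
+byte-addressed, little-endian port without auto-increment addressing
+modes might contain definitions along the following lines. The values
+and the plain DEFINE style are illustrative only; copy the actual
+definitions from an existing machin.scm and adjust them:
+
+  (define use-pre/post-increment? #f)
+  (define endianness 'LITTLE)
+  (define addressing-granularity 8)
+  (define scheme-object-width 32)
+  (define scheme-type-width 6)
+  (define address-units-per-packed-char 1)
+  (define flonum-size 2)        ; a C double takes two 32-bit words
+  (define float-alignment (* 2 scheme-object-width))
+  ;; EXECUTE-CACHE-SIZE must match EXECUTE_CACHE_ENTRY_SIZE in
+  ;; microcode/cmpint-port.h; 2 is just a placeholder here.
+  (define execute-cache-size 2)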
+
+The following parameters describe to the front-end the format of
+closures containing multiple entry points. Closures are described in
+some detail in cmpint.txt and in section 5.3.3. Very briefly, a
+closure is a procedure object that contains a code pointer and a set
+of free variable locations or values.
+
+- CLOSURE-OBJECT-FIRST-OFFSET: This procedure takes a single argument,
+the number of entry points in a closure object, and computes the
+distance in long-words between the first long-word in the closure
+object, and the first long-word containing a free variable. This is
+the number of long-words taken up by the closure object's header, and
+the code to represent N closure entry points.
+
+- CLOSURE-FIRST-OFFSET: This procedure takes two arguments, the number
+of entry points in a closure object, and the index of one of them, the
+first being zero. It computes the distance between that entry's
+environment pointer and the first free variable in the closure object.
+The entry's environment pointer will be the address of the entry point
+itself if closure entry points are always aligned on long-word
+boundaries, or the address of the first entry point if they are not.
+
+- CLOSURE-ENTRY-DISTANCE: This procedure is given the number of entry
+points in a closure object, and the indices for two of its entry
+points, and computes the number of bytes that separate the two entry
+points in the closure object. This distance should be a multiple of
+the parameter COMPILED_CLOSURE_ENTRY_SIZE described in cmpint.txt and
+defined in microcode/cmpint-port.h.
+\f
+- CLOSURE-ENVIRONMENT-ADJUSTMENT: This procedure takes two parameters,
+the number of entry points in a closure object, and the index of one
+of them. It computes the number of bytes that must be added to the
+entry point's address to result in the entry point's environment
+pointer. If entry points are always aligned on long-word boundaries,
+this number should always be zero, otherwise it should be the distance
+to the first (lowest addressed) entry point.
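+
+For concreteness, here is a hedged sketch of these four procedures for
+a hypothetical port in which closure entry points are long-word
+aligned, every entry point occupies the same number of long-words, and
+the closure header takes exactly one long-word. The layout, the
+constant CLOSURE-ENTRY-SIZE, and the 4-byte long-word are assumptions
+made for the illustration, not the format of any existing port:
+
+  (define closure-entry-size 3) ; long-words per entry point (assumed)
+
+  (define (closure-object-first-offset nentries)
+    ;; one header long-word plus the code for NENTRIES entry points
+    (+ 1 (* closure-entry-size nentries)))
+
+  (define (closure-first-offset nentries entry)
+    ;; entries are aligned, so the environment pointer is the entry
+    ;; point itself; the free variables follow the last entry point
+    (* closure-entry-size (- nentries entry)))
+
+  (define (closure-entry-distance nentries entry entry*)
+    ;; byte distance between two entry points
+    (* 4 closure-entry-size (- entry* entry)))
+
+  (define (closure-environment-adjustment nentries entry)
+    ;; zero because entry points are long-word aligned
+    0)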
+
+The remaining code in machin.scm describes the register set of the
+architecture and defines the register conventions imposed by the port.
+These conventions must match the expectations of
+microcode/cmpaux-port.m4 described in cmpaux.txt.
+
+Machine registers are assigned a contiguous range of non-negative
+integers starting from zero. Typically symbolic names are given to
+each of these integers for use in some of the rules, especially those
+dealing with the assembly language interface.
+
+- NUMBER-OF-MACHINE-REGISTERS should be the number of machine registers,
+i.e. one greater than the number assigned to the last machine register.
+
+- NUMBER-OF-TEMPORARY-REGISTERS is the number of reserved memory
+locations used for storing the contents of spilled pseudo-registers.
+
+Liar requires certain fixed locations to hold various implementation
+quantities such as the stack pointer, the heap (free memory) pointer,
+the pointer to the runtime library and interpreter's ``register''
+array, and the dynamic link ``register''. Typically each of these
+locations is a fixed machine register. In addition, a processor
+register is typically reserved for returned values and another for
+holding a bit-mask used to clear type tags from objects (the pointer
+or datum mask). All of these registers should be given additional
+symbolic names.
+
+ ==> What is MACHINE-REGISTER-KNOWN-VALUE used for? It would seem that
+the datum mask is a known value, but... Currently all the ports seem
+to have the same definition.
+
+The contents of pseudo-registers are divided into various classes to
+allow some consistency checking. Some machine registers always
+contain values in a fixed class (e.g. floating point registers and the
+register holding the datum mask).
+
+- MACHINE-REGISTER-VALUE-CLASS is a procedure that maps a register to
+its inherent value class. The main value classes are
+value-class=object, value-class=address, and value-class=float.
+The registers allocated for the special implementation quantities have
+fixed value classes. The remaining registers, managed by the
+compiler's register allocator, may be generic (value-class=word) or
+allow only certain values to be stored in them (value-class=float,
+value-class=address, etc.).
+\f
+Most of the remainder of compiler/machines/port/machin.scm is a set of
+procedures that return and compare the port's chosen locations for
+various operations. Some of these operations are no longer used by
+the compiler, and reflect a previous reliance on the interpreter to
+accomplish certain environment operations. These operations are now
+handled by invoking the appropriate primitives rather than using
+special entry points in the runtime library for them. Under some
+compiler switch settings the older methods for handling these
+operations can be re-activated, but this never worked completely, and
+may no longer work at all.
+
+- RTL:MACHINE-REGISTER? should return a machine register for those
+special RTL registers that have been allocated to fixed registers, and
+false otherwise.
+
+- RTL:INTERPRETER-REGISTER? should return the long-word offset in the
+runtime library's memory ``register'' array for those special RTL
+registers not allocated to fixed registers, and false otherwise.
+
+- RTL:INTERPRETER-REGISTER->OFFSET errors when the special RTL
+register has not been allocated to a fixed register, and otherwise
+returns the long-word offset into the register array.
+
+- RTL:CONSTANT-COST is a procedure that computes some metric of how
+expensive it is to generate a particular constant. If the constant is
+cheaply reconstructed, the register allocator may decide to flush it
+(rather than spill it to memory) and re-generate it the next time it
+is needed. The best estimate is the number of cycles that
+constructing the constant would take, but the number of bytes of
+instructions can be used instead.
+
+- COMPILER:OPEN-CODE-FLOATING-POINT-ARITHMETIC? and
+COMPILER:PRIMITIVES-WITH-NO-OPEN-CODING have been described in the
+section on compiler switches and parameters.
+\f
+ 4.3. LAPGEN files:
+
+The following files control the RTL -> LAP translation. They define
+the rules used by the pattern matcher to perform the translation, and
+procedures used by the register allocator and linearizer to connect
+the code that results from each rule. The rules, and how to write
+them, are described further in a later section.
+
+The rule set is partitioned into multiple subsets. This is not
+necessary, but makes re-compiling the compiler faster and reduces the
+memory requirements of the compiler. The partition can be done in a
+different way, but is probably best left as uniform as possible
+between the different ports to facilitate comparison and updating.
+
+The LAPGEN (RTL->LAP) rules are separated into two different data
+bases. The larger is the statement data base, used to translate whole
+RTL instructions. The smaller is the predicate data base, used to
+translate decisions to branch between the RTL basic blocks.
+
+* lapgen.scm:
+ This file does not define any rules, but provides a set of
+utilities for the back end. It provides utilities for the rules,
+typically procedures for generating code that manipulates the object
+representation, additional entry points to the register allocator that
+are better suited to the port, and the interface procedures for the
+register allocator and the linearizer.
+
+The following definitions constitute the register allocator interface
+and must be provided by lapgen.scm:
+
+- AVAILABLE-MACHINE-REGISTERS is a list of the RTL register numbers
+corresponding to those registers that the register allocator should
+manage. This should include all machine registers except those
+reserved by the port.
+
+- SORT-MACHINE-REGISTERS is a procedure that reorders a list of
+registers into the preferred allocation order.
+ ==> Is this right?
+
+- REGISTER-TYPE is a procedure that maps RTL register numbers to their
+inherent register types (typically GENERAL and FLOAT).
+
+- REGISTER-TYPES-COMPATIBLE? is a boolean procedure that decides
+whether two registers can hold the same range of values.
+
+- REGISTER-REFERENCE maps RTL register numbers into register
+references, i.e. pieces of assembly language used to refer to those
+registers.
+
+- REGISTER->REGISTER-TRANSFER issues code to copy the contents of one
+RTL register into another.
+
+- REFERENCE->REGISTER-TRANSFER issues code to copy the contents of a
+machine register described by its reference into a given RTL register.
+
+- PSEUDO-REGISTER-HOME maps RTL registers to a fragment of assembly
+language used to refer to the memory location into which they will be
+spilled if necessary. This is typically a location (or set of
+locations) in the Scheme ``register'' array.
+
+- HOME->REGISTER-TRANSFER generates code that copies the contents of
+an RTL register's home (its spill location) into a machine register.
+\f
+- REGISTER->HOME-TRANSFER generates code that copies the contents of
+an RTL register, currently held in a machine register, into its memory
+home.
+
+The following definitions constitute the linearizer interface, and
+must be provided by lapgen.scm:
+
+- LAP:MAKE-LABEL-STATEMENT generates an assembly language directive
+that defines the specified label.
+
+- LAP:MAKE-UNCONDITIONAL-BRANCH generates a fragment of assembly
+language used to unconditionally transfer control to the specified
+label.
+
+- LAP:MAKE-ENTRY-POINT generates a fragment of assembly language used
+to precede the root of the control flow graph. Its output should use
+the assembler directive ENTRY-POINT and generate format and GC words
+for the entry point.
+
+The rest of the code in lapgen.scm is a machine-specific set of utilities
+for the LAPGEN rules. Some of the more common procedures are
+described in the section that covers the rules.
+
+Of special interest are the utility procedures for manipulating
+pc-relative addresses and loads on RISC machines. RISC machines
+typically only have pc-relative branch instructions, but no
+pc-relative loads or pc-relative load-effective-address instructions.
+On the other hand, they usually have a branch-and-link instruction
+that performs a pc-relative branch and stores the return address in
+a processor register. This instruction can be used (by branching to
+the next instruction) to obtain its own address, and pc-relative
+addresses and loads can be computed from it. The MIPS back end currently
+implements a simple pc-relative address caching scheme that attempts
+to reduce the number of such branches by re-using the values produced
+by previous branches if they are still available. This code can be
+suitably modified to work on most RISC architectures.
+
+* rules1.scm:
+ This file contains RTL statement rules for simple register assignments
+and operations. In particular, it contains the rules for constructing
+and destructuring Scheme objects, allocating storage, and memory <->
+register transfers.
+
+* rules2.scm:
+ This file contains RTL predicate rules for simple equality
+predicates (EQ-TEST, TYPE-TEST).
+
+* rules3.scm:
+ This file contains RTL statement rules for control-flow
+statements like continuation (return address) invocation, several
+mechanisms for invoking procedures, stack reformatting prior to
+invocation, procedure headers, closure object allocation, expression
+headers and declaring the data segment of compiled code blocks for
+assembly. See [1] for some background information on stack
+reformatting, and [2] for a discussion of how calls to (the values of)
+free variables are handled by Liar.
+
+* rules4.scm:
+ This file contains RTL statement rules for runtime library
+routines that handle manipulation of variables in first class
+environments. Many of these rules are no longer used by the compiler
+unless some switch settings are changed. See [2] for a discussion of
+how Liar handles references to free variables.
+\f
+* rulfix.scm:
+ This file contains statement and predicate rules for
+manipulating fixnums (small integers represented in immediate
+form). The rules handle tagging and de-tagging fixnum objects,
+arithmetic on them, comparison predicates, and overflow tests.
+
+* rulflo.scm:
+ This file contains statement and predicate rules for
+manipulating flonums (floating point data in boxed form). The rules
+handle boxing and un-boxing of flonums, arithmetic on them, and
+comparison predicates.
+
+ 4.4. Assembler files:
+
+* assmd.scm:
+ This file defines the following machine-dependent parameters
+and utilities for the bit-level assembler:
+
+- MAXIMUM-PADDING-LENGTH: If instructions are not always long-word
+aligned, the maximum distance in bits between the end of an
+instruction and the next (higher) long-word boundary.
+
+- PADDING-STRING: A bit-string used for padding the instruction block
+to a long-word boundary. If possible, it should encode a HALT or
+ILLEGAL instruction. The length of this bit-string should evenly
+divide maximum-padding-length.
+
+- BLOCK-OFFSET-WIDTH: This should be the size in bits of format_word
+described in cmpint.txt. It should be 16 for all byte-addressed
+machines where registers hold 32 bits.
+
+- MAXIMUM-BLOCK-OFFSET: The maximum byte offset that can be encoded in
+block-offset-width bits. This depends on the encoding described in
+cmpint.txt. The least significant bit is always used to indicate
+whether this block offset points to the start of the object or to
+another block offset, so the range may be smaller than the obvious
+value. Furthermore, if instruction alignment constraints are tighter
+than byte boundaries, this range may be larger. For example, if
+instructions always start on even long-word boundaries, the bottom two
+bits (always zero) are encoded implicitly, and the range is
+accordingly larger.
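+
+For example, assuming the flag bit is the only bit lost to the
+encoding, a byte-aligned port with BLOCK-OFFSET-WIDTH 16 would have
+
+  (define maximum-block-offset (- (expt 2 15) 1))   ; 32767 bytes
+
+while a port whose instructions always start on long-word boundaries
+could encode the two always-zero low bits implicitly and reach four
+times as far.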
+
+- BLOCK-OFFSET->BIT-STRING: This procedure is given a byte offset and
+a boolean flag indicating whether this is the offset to the start of a
+compiled code block or to another block-offset, and returns the
+encoded value of this offset.
+
+- MAKE-NMV-HEADER: This procedure is given the size in long-words of a
+block of instructions, and constructs the non-marked-vector header
+that must precede the instructions in memory in order to prevent the
+garbage collector from examining the data as Scheme objects. This
+header is just an ``object'' whose type tag is manifest-nm-vector
+(TC_MANIFEST_NM_VECTOR in the microcode) and whose datum is the size
+in long-words (excluding the header itself).
+
+The following three parameters define how instruction fields are to be
+assembled in memory depending on the ``endianness'' (byte ordering) of
+the architecture. You should be able to use the MC68020 (big endian)
+or the Vax (little endian) version, or the MIPS version which is
+conditionalized for both possibilities since MIPS processors can be
+configured either way.
+\f
+- INSTRUCTION-INSERT! is a procedure that, given a bit-string encoding
+instruction fields, a larger bit-string into which the smaller should
+be inserted, a position within the larger one, and a continuation,
+inserts the smaller bit-string into the larger at the specified
+position, and returns the new bit position at which the immediately
+following instruction field should be inserted.
+
+- INSTRUCTION-INITIAL-POSITION is a procedure that, given a bit-string
+representing a segment of compiled code, returns the bit-string
+position at which instruction-insert! should insert the first
+instruction.
+
+- INSTRUCTION-APPEND is a procedure that, given the bit-strings
+encoding successive (fields of) instructions, produces the bit-string
+that corresponds to their concatenation in the correct order.
+
+* coerce.scm:
+ This file defines a set of coercion procedures. These
+procedures are used to fill fields in instructions. Each coercion
+procedure checks the range of its argument and produces a bit string
+of the appropriate length encoding the argument. Most coercions will
+coerce their signed or unsigned argument into a bit string of the
+required fixed length. On some machines (e.g. HP PA), some coercions
+may permute the bits appropriately.
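+
+A minimal sketch of one such coercion, assuming a 16-bit unsigned
+field and using the bit-string operation UNSIGNED-INTEGER->BIT-STRING
+(the procedure name COERCE-16-BIT-UNSIGNED is made up for the
+example):
+
+  (define (coerce-16-bit-unsigned n)
+    (if (and (<= 0 n) (< n 65536))
+        (unsigned-integer->bit-string 16 n)
+        (error "coerce-16-bit-unsigned: value out of range" n)))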
+
+* insmac.scm:
+ This file defines machine-specific syntax used in the assembler,
+and the procedure PARSE-INSTRUCTION, invoked by the syntax expander
+for DEFINE-INSTRUCTION to parse the body of each of the instruction
+rules. This code is typically complex and you are encouraged to
+emulate one of the existing ports in order to reuse its code.
+
+The ports use the following syntaxes for describing instructions in
+machine language:
+
+- Spectrum and MIPS:
+ (LONG (<width 1> <value 1> <coercion type 1>)
+ (<width 2> <value 2> <coercion type 2>)
+ ...
+ (<width n> <value n> <coercion type n>))
+where all the widths must add up to an even multiple of 32.
+
+- Vax:
+Instruction descriptions are made of arbitrary sequences of the
+following field descriptors:
+ (BYTE (<width 1> <value 1> <coercion type 1>)
+ (<width 2> <value 2> <coercion type 2>)
+ ...
+ (<width n> <value n> <coercion type n>))
+ (OPERAND <size> <value>)
+ (DISPLACEMENT (<width> <value>))
+The total width of each of these field descriptors must add up to a
+multiple of 8.
+BYTE is used primarily for instruction opcodes.
+OPERAND is used for general addressing modes.
+DISPLACEMENT is used for PC-relative branch displacements.
+\f
+- MC68020:
+ (WORD (<width 1> <value 1> <coercion type 1> <size 1>)
+ (<width 2> <value 2> <coercion type 2> <size 2>)
+ ...
+ (<width n> <value n> <coercion type n> <size n>))
+where all the widths must add up to an even multiple of 16.
+Size refers to immediate operands to be encoded in the instruction,
+and is omitted when irrelevant.
+
+A missing coercion type means that the ordinary unsigned coercion (for
+the corresponding number of bits) should be used.
+
+Additionally, each of these ports provides a syntax for specifying
+instructions whose final format must be determined by the
+branch-tensioning algorithm in the bit assembler. The syntax of these
+instructions is usually
+ (VARIABLE-WIDTH (<name> <expression>)
+ ((<low-1> <high-1>)
+ <instruction-specifier-1>)
+ ((<low-2> <high-2>)
+ <instruction-specifier-2>)
+ ...
+ ((() ())
+ <instruction-specifier-n>))
+
+Each instruction specifier is an ordinary (i.e. not VARIABLE-WIDTH)
+instruction specifier. NAME is a variable to be bound to the
+bit-assembly-time value of EXPRESSION. Each of the ranges
+<low-1>-<high-1>, <low-2>-<high-2>, etc. must be properly nested in the
+next, and () specifies no bound. The final format chosen is that
+corresponding to the lowest numbered range containing the value of
+<expression>. Successive instruction specifiers must yield
+instructions of non-decreasing lengths for the branch tensioner to
+work correctly. The MC68020 port uses GROWING-WORD instead of
+VARIABLE-WIDTH as the keyword for this syntax.
+ ==> This should probably be changed.
+
+* inerly.scm:
+ This file provides alternative expanders for the
+machine-specific syntax. These alternative expanders are used when
+the assembly language that appears in the LAPGEN rules is assembled
+(early) at compiler pre-processing time. That is, the procedures
+defined in this file are only used if
+COMPILER:ENABLE-EXPANSION-DECLARATIONS? is set to true. If you reuse
+the code in insmac.scm from another port, you should be able to reuse
+the inerly.scm file from the same port. Alternatively, you can write
+a dummy version of this code and require
+COMPILER:ENABLE-EXPANSION-DECLARATIONS? to be always false. This
+switch defaults to false, currently. The Spectrum and MIPS versions
+currently have dummy versions of this code.
+
+* insutl.scm:
+ This file defines machine-specific rule qualifiers and
+transformers. It is often used to define addressing-mode filters and
+handling procedures for architectures with general addressing modes.
+This file does not exist in the Spectrum port because all the relevant
+code has been placed in instr1.scm, and the MIPS port has no
+machine-specific qualifiers and transformers. Qualifiers and
+transformers are described further in the chapter on the syntax of
+translation rules.
+\f
+* instr<n>.scm:
+ These files define the instruction set of the architecture by
+using the syntax defined in insmac.scm and inerly.scm. There can be
+as many or as few of these files as desired by whoever writes the
+assembler. They are usually split according to the size of the files
+or along the divisions in the architecture manual. Not all
+instructions in the architecture need to be listed here -- only those
+actually used by the back end in the LAPGEN rules and utility procedures.
+Privileged/supervisory instructions, BCD (binary coded decimal)
+instructions, COBOL-style EDIT instructions, etc., can probably be
+safely ignored.
+
+ 4.5. Disassembler files:
+
+ The disassembler is almost completely machine dependent. For
+many machines, a reasonable disassembler could be derived from the
+description of the instruction set used to assemble programs. The Vax
+disassembler is essentially constructed this way. Unfortunately this
+has not been generalized, and currently each port has its own
+disassembler, often duplicating information contained in the
+assembler.
+
+The disassembler is not necessary for the operation of the compiler
+proper. It is, however, a good debugging tool. You can bring the
+compiler up without a disassembler by providing stubs for the
+procedures referenced in dassm2.
+
+* dassm1.scm:
+ This file contains the top-level of the disassembler. It is
+not machine-dependent, and should probably be moved to another directory.
+ ==> Is compiler/back the right place for this?
+
+* dassm2.scm:
+ This file contains various utilities for the disassembler. In
+particular, it contains the definitions of
+
+- COMPILED-CODE-BLOCK/BYTES-PER-OBJECT
+- COMPILED-CODE-BLOCK/OBJECTS-PER-PROCEDURE-CACHE
+- COMPILED-CODE-BLOCK/OBJECTS-PER-VARIABLE-CACHE
+ These parameters specify various relative sizes.
+ ==> Shouldn't these be in machin.scm? The first two have
+counterparts there, and the last is always 1.
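+
+For a hypothetical 32-bit port these might simply read as follows; the
+procedure-cache value is an assumption and must agree with the port's
+EXECUTE-CACHE-SIZE:
+
+  (define compiled-code-block/bytes-per-object 4)
+  (define compiled-code-block/objects-per-procedure-cache 2)
+  (define compiled-code-block/objects-per-variable-cache 1)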
+
+- DISASSEMBLER/READ-VARIABLE-CACHE
+- DISASSEMBLER/READ-PROCEDURE-CACHE
+ These procedures are used to extract free variable information from
+a linked compiled code block. Variable caches are maintained as
+native addresses (i.e. no tag bits), and procedure (execute) caches
+contain absolute jump instructions that must be decoded to extract the
+address of the called procedure. Appropriate type bits must be added
+to both values before they are returned.
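+
+As an illustration, the variable-cache reader in a port might look
+like the following sketch, modeled on the load-store ports; the
+microcode primitive and the type code used to re-tag the cache address
+are assumptions that should be checked against your microcode:
+
+  (define (disassembler/read-variable-cache block index)
+    (with-absolutely-no-interrupts
+     (lambda ()
+       ;; The cache slot holds a raw address; re-tag it as a Scheme
+       ;; object before returning it to machine-independent code.
+       ((ucode-primitive primitive-object-set-type 2)
+        (ucode-type quad)
+        (system-vector-ref block index)))))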
+
+This file also contains a state machine that allows the disassembler
+to display data appearing in the instruction stream in an appropriate
+format (gc and format words, mainly), and heuristics for displaying
+addressing modes and PC-relative offsets in a more legible form.
+\f
+The output of the disassembler need not be identical to the input of
+the assembler. The disassembler is used almost exclusively for
+debugging, and additional syntactic hints make it easier to read.
+
+* dassm3.scm:
+ This file contains the code to disassemble one instruction at
+a time. It is completely machine dependent at this time, and any old
+way of doing it is fine.
+
+* dinstr<n>.scm:
+ In the VAX port, these are copies (or links) to the
+instr<n>.scm files. They are processed with a different syntax table
+to construct the disassembler tables instead of the assembler tables.
+
+* dsyn.scm:
+ In the VAX port, this file provides the alternative expansion
+of DEFINE-INSTRUCTION used to construct the disassembler tables
+instead of the assembler rule data base.
+\f
+ 5. All about rules
+
+There are three subsystems in Liar that use rule-based languages.
+They are the RTL simplifier, LAPGEN (RTL->LAP translation), and the
+assembler. The assembler need not be rule-based, but given the
+availability of the rule language, using the rule mechanism may be the
+easiest way to write it.
+
+ 5.1. Rule syntax
+
+The assembler rules use a somewhat different syntax from the rest and
+will be described later.
+
+The rest of the rules are defined in the following way:
+
+ (DEFINE-RULE <rule-database>
+ <rule pattern>
+ <qualifier> ; optional
+ <rule body>)
+
+* <rule-database> is an expression evaluating to a rule database.
+It should be one of STATEMENT, PREDICATE, or REWRITING.
+
+* <rule pattern> is a list that represents the pattern to match.
+Variables in the pattern are written by using the ``?'' syntax.
+For example,
+
+- (hello) matches the constant list (hello)
+
+- (? thing) matches anything, and THING is bound in <qualifier> and
+<rule body> to whatever was matched.
+
+- (hello (? person)) matches a list of two elements whose first
+element is the symbol HELLO, and whose second element can be anything.
+The variable PERSON will be bound in <qualifier> and <rule body> and
+will have as its value the second element of the list matched.
+Thus it would match (hello bill), with PERSON bound to the symbol
+BILL; (hello (bill rozas)) would also match, with PERSON bound to the
+list (BILL ROZAS).
+
+- (hello . (? person)) matches a list of one or more elements whose
+first element is the symbol HELLO. PERSON is bound to the rest of the
+list.
+Thus (hello my dog likes frankfurters) would match and PERSON would be
+(MY DOG LIKES FRANKFURTERS). (hello (my dog)) would match, and PERSON
+would be ((MY DOG)).
+
+Variable syntax is further described below.
+
+* <qualifier> is (QUALIFIER <expression>) where <expression> evaluates
+to a boolean and further filters matches. If the qualifier expression
+evaluates to false, the rule is not fired. Otherwise it is.
+For example,
+ (DEFINE-RULE <some database>
+ (multiple (? number) (? divisor))
+ (QUALIFIER (and (number? number)
+ (number? divisor)
+ (zero? (remainder number divisor))))
+ <rule body>)
+
+will match (MULTIPLE 14 7) and (MULTIPLE 36 4), but will not match
+(MULTIPLE FOO 3), (MULTIPLE 37 4), (MULTIPLE 2), (MULTIPLE 14 2 3),
+nor (HELLO 14 7).
+Rule qualifiers are optional.
+\f
+* <rule body> is an arbitrary Lisp expression whose value is the
+translation determined by the rule. It will typically use the
+variables bound by ``?'' to perform the translation. The statement
+and predicate rules use the LAP macro to generate sequences of
+assembly language instructions.
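+
+For example, a complete statement rule for a hypothetical machine with
+general addressing modes might look as follows. The MOV instruction
+syntax is made up, STANDARD-SOURCE and STANDARD-TARGET are the helper
+procedures described in section 5.3.2, and the LAP macro is described
+in section 5.3.1:
+
+  (DEFINE-RULE STATEMENT
+    (ASSIGN (REGISTER (? target))
+            (OFFSET (REGISTER (? base)) (? offset)))
+    (let* ((base-ref (standard-source base))
+           (target-ref (standard-target target)))
+      ;; fetch the word at BASE + 4*OFFSET into the target's alias
+      (LAP (MOV L (@RO ,base-ref ,(* 4 offset)) ,target-ref))))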
+
+The assembler rules use the following syntax:
+
+ (DEFINE-INSTRUCTION <opcode>
+ (<pattern1> <qualifier1> <body1>)
+ (<pattern2> <qualifier2> <body2>)
+ ...
+ )
+
+Where <opcode> is the name of the instruction, and the patterns will
+be matched against the cdr of lists whose car is <opcode>.
+The <patterns>, <qualifiers>, and <bodies> are as in the RTL rules,
+except that there are typically no qualifiers, and the bodies are
+typically written in a special syntax defined in
+compiler/machines/port/insmac.scm and described in section 4.4.
+
+For example,
+ (DEFINE-INSTRUCTION ADD
+ (((R (? target)) (R (? reg1)) (R (? reg2)))
+ (WORD (6 #x24)
+ (5 target)
+ (5 reg1)
+ (5 reg2)
+ (11 0)))
+ (((R (? target)) (R (? reg)) (& (? constant)))
+ (WORD (6 #x23)
+ (5 target)
+ (5 reg)
+ (16 constant SIGNED))))
+would match (ADD (R 1) (R 2) (R 3)) and (ADD (R 7) (R 22) (& 257)),
+firing the corresponding body.
+
+The bodies are defined in terms of the WORD syntax defined in
+insmac.scm, and the ``commas'' used with the pattern variables in the
+rule bodies are a consequence of the WORD syntax. The meaning of the
+commas is identical to the meaning of the commas in a ``backquote''
+Scheme expression, and is briefly described in section 5.3.1.
+\f
+ 5.2. Rule variable syntax.
+
+Although qualifiers and the simple variable syntax shown are
+sufficient, some additional variable syntax is available for common
+patterns. Moreover, the early matcher (used when
+COMPILER:ENABLE-EXPANSION-DECLARATIONS? is true) cannot currently
+handle qualifiers but can handle the additional variable syntax that
+can supplant most qualifiers. The early matcher is used only on the
+assembler rules, so if you want to use it, you only need to use the
+restricted language when writing those rules.
+
+The complete variable syntax is as follows:
+
+* (? <name>) This syntax matches anything in that position of the
+potential instance, and binds <name> to the sub-structure matched.
+
+* (? <name> <transform>) This syntax matches anything in that position
+of the potential instance as long as <transform> returns non-false on
+the sub-structure matched. <name> is bound to the result returned by
+<transform>. For example,
+ (? q (lambda (obj) (and (number? obj) (* 2 obj))))
+will match 2, and Q will be bound to 4, but will not match FOO.
+
+* (? <name1> <transform> <name2>) <name1> and <transform> have the same
+meaning as in the previous syntax, and this syntax matches exactly the
+same objects, but provides the additional convenience of binding
+<name2> to the sub-structure matched, before the transformation.
+For example,
+ (? q (lambda (obj)
+ (and (pair? obj)
+ (number? (car obj))
+ (- (car obj) 23)))
+ z)
+will match (2 . HELLO), Q will be bound to -21, and Z will be bound to
+(2 . HELLO), and will not match 34 or (HELLO . 2).
+
+ ==> The pattern parser seems to understand (?@ <name>) as well, but
+this syntax is used nowhere. The early parser does not understand it.
+Should it be flushed?
+\f
+ 5.3. Writing statement rules.
+
+Statement rules provide the translation between RTL instructions and
+fragments of assembly language. Most RTL instructions are
+assignments, where an RTL register is written with the contents of a
+virtual location or the result of some operation.
+
+ 5.3.1. Output of the statement rules
+
+The output of the statement rules is a fragment of assembly language
+written in the syntax expected by the LAP assembler. The fragments,
+containing any number of machine instructions, are constructed by
+using the LAP macro, built on top of Scheme's QUASIQUOTE (back-quote).
+Within a LAP form, you can use UNQUOTE (comma) and UNQUOTE-SPLICING
+(comma at-sign) to tag subexpressions that should be evaluated and
+appended. For example,
+ (LAP (MOV L ,r1 ,r2)
+ (ADD L ,r3 ,r2))
+constructs a fragment with two instructions in it where the values of
+r1, r2, and r3 are substituted in the instructions.
+The code
+ (LAP (MOV L ,r1 ,r2)
+ ,@(generate-test r2))
+constructs a fragment whose first instruction is a MOV instruction,
+and the rest is the fragment returned by generate-test.
+
+The INST macro is similar to LAP but constructs a single instruction.
+It should not be used unless necessary (i.e. in
+LAP:MAKE-LABEL-STATEMENT), since you may find yourself later wanting
+to change a single instruction into a fragment in a utility procedure,
+and having to find every use of the procedure.
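+
+For example, in existing ports LAP:MAKE-LABEL-STATEMENT is essentially
+just the following (a sketch; check your port's lapgen.scm):
+
+  (define (lap:make-label-statement label)
+    (INST (LABEL ,label)))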
+
+ ==> We should change the linearizer to expect
+LAP:MAKE-LABEL-STATEMENT to return a fragment, and do away with INST.
+
+An additional macro, INST-EA, is provided to construct a piece of
+assembly language representing an addressing mode. For example,
+INST-EA is used by the following procedure in the Vax back-end:
+
+ (define (non-pointer->ea type datum)
+ (if (and (zero? type)
+ (<= 0 datum 63))
+ (INST-EA (S ,datum))
+ (INST-EA (&U ,(make-non-pointer-literal type datum)))))
+
+where non-pointer->ea may be used in
+
+ (LAP (MOV L ,(non-pointer->ea <type> <datum>)
+ ,(any-register-reference target)))
+
+INST-EA is superfluous on machines without general addressing modes
+(i.e. load-store architectures).
+
+Each port provides a procedure, named REGISTER-REFERENCE, that maps
+between RTL machine registers and the assembly language syntax used to
+refer to the registers. It uses INST-EA to build such references.
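+
+A typical definition, sketched here for a hypothetical machine with 32
+general registers written as (R <n>) and whose RTL machine-register
+numbers coincide with the hardware numbers, pre-computes the
+references once:
+
+  (define register-references
+    (let ((references (make-vector 32)))
+      (do ((register 0 (+ register 1)))
+          ((= register 32))
+        (vector-set! references register (INST-EA (R ,register))))
+      references))
+
+  (define (register-reference register)
+    (vector-ref register-references register))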
+
+The macros LAP, INST, and INST-EA, besides providing the functionality
+of QUASIQUOTE, also provide a hook for the compiler pre-processing
+time assembly of the code generated by the rules.
+\f
+ 5.3.2. Hardware register allocation
+
+Hardware register allocation occurs during the RTL->LAP translation.
+The rules, besides generating assembly language, invoke utilities
+provided by the register allocator to reserve and free hardware
+registers on which the operations can be performed.
+
+Hardware registers are often divided into different non-overlapping
+types that are used in different operations. For example, modern
+hardware typically has a set of integer registers and a set of
+floating point registers. Address operations typically require
+operands in integer registers, while floating point operations
+typically require floating point registers. On some machines, notably
+the Motorola 68K family, the integer register set is further
+subdivided into types with specific operations (address and data).
+
+The register allocator manipulates RTL registers. RTL registers are
+just small integers. The low end of the valid range of RTL registers
+is used to represent the physical registers of the processor (called
+machine registers), and the rest of the numbers represent virtual
+(pseudo) registers. The core allocator operations are given an RTL
+register number and a register type, and return a suitable machine
+register to be used for the operation.
+
+A machine register that holds the value of a pseudo register is called
+an ``alias'' for the pseudo register. A pseudo register may have many
+valid aliases simultaneously, usually of different types. Any
+assignment to the pseudo register will invalidate all aliases but one,
+namely the machine register actually written, rather than copy the new
+value into all the previous aliases. Thus source references and
+destination references have different effects, and are handled by
+different procedures in the register allocator.
+
+Pseudo registers have associated homes, memory locations that hold
+their values when the machine registers are needed for other purposes.
+Most pseudo registers are never written to their homes, since a pseudo
+register's value is usually kept in machine register aliases until the
+pseudo register is dead, i.e. until its value is no longer needed. A
+pseudo register's aliases can be reused for other purposes if there
+are other remaining aliases or this is the last reference to the
+pseudo register. An alias that can be reused is a ``reusable'' alias.
+Occasionally, the value of a pseudo register may be transferred to the
+register's home and the last alias invalidated, if the register
+allocator is running out of registers. This is called ``spilling'' a
+register.
+
+The register allocator maintains a table of associations, called the
+``register map'', that associates each pseudo register with its valid
+aliases, and each machine register with the pseudo register whose
+value it holds (if any). The register allocator routines modify the
+register map after aliases are requested and invalidated, and they
+generate assembly language instructions to perform the necessary data
+motion for spilling and re-loading at run time. These instructions
+are usually inserted before the code output by the RTL rule, so that
+they execute first.
+
+If you have chosen your RTL register numbers for machine registers so
+that they match the hardware numbers, and your assembly language does
+not distinguish between references to a register and other fields, you
+can ignore register references and use the RTL register numbers
+directly. This is commonly the case when using integer registers in
+load-store architectures.
+\f
+As a convenience, the register allocator also provides operations that
+manipulate register references. A register reference is a fragment of
+assembly language, typically a register addressing mode for general
+register machines, that when inserted into a LAP instruction, denotes
+the appropriate register. For example, on the Motorola MC68020,
+physical register A3 is represented as RTL register number 11, and a
+register reference for it would be ``(A 3)''. RTL pseudo register 44
+may at some point have RTL machine register 11 as its only
+address-register alias. At that time, (REGISTER-ALIAS 44 'ADDRESS)
+would return 11.
+
+The interface to the register allocator is defined in
+compiler/back/lapgn2.scm. Not all ports use all of the procedures
+defined there. Often a smaller subset is sufficient depending on
+whether there are general addressing modes, etc. A list of the most
+frequently used follows:
+
+* REGISTER-ALIAS expects an RTL register and a register type, and
+returns a machine register of the specified type that is a valid alias
+for that RTL register if there is one, or false if there is none.
+This procedure should only be used for source operand RTL registers.
+If the register type is false, then REGISTER-ALIAS will return any
+valid alias.
+
+* LOAD-ALIAS-REGISTER! is like REGISTER-ALIAS but always returns a
+machine register, allocating one of the specified type if necessary.
+This procedure should only be used for source operand RTL registers.
+
+* REFERENCE-ALIAS-REGISTER! performs the same action as
+LOAD-ALIAS-REGISTER! but returns a register reference instead of an
+RTL register number.
+
+* ALLOCATE-ALIAS-REGISTER! expects an RTL register and a register
+type, and returns a machine register of the specified type that is the
+only alias for the RTL register and should be written with the new
+contents of the RTL register. ALLOCATE-ALIAS-REGISTER! is used to
+generate aliases for target RTL registers.
+
+* REFERENCE-TARGET-ALIAS! performs the same action as
+ALLOCATE-ALIAS-REGISTER! but returns a register reference instead of
+an RTL register number. See CLEAR-REGISTERS! below.
+
+* STANDARD-REGISTER-REFERENCE expects an RTL register, a register
+type, and a boolean. It will return a reference for an alias of the
+specified register containing the current value of the RTL register.
+This reference will be of the specified type if the boolean is false,
+or sometimes of other types if the boolean is true. In other words,
+the boolean argument determines whether other types are acceptable,
+although not desirable. The register type may be false, specifying
+that there really is no preference for the type, and any reference is
+valid. STANDARD-REGISTER-REFERENCE should be used only for source
+operands (i.e. those that already contain data), and may return a
+memory reference for those machines with general addressing modes if
+there is no preferred type and alternates are acceptable.
+
+* MOVE-TO-ALIAS-REGISTER! expects a source RTL register, a register
+type, and a target RTL register. It returns a new alias for the
+target of the specified type containing a copy of the current contents
+of the source. Often this is accomplished by choosing an alias of the
+source that already contains the correct data and making it the only
+alias for target. MOVE-TO-ALIAS-REGISTER! attempts to reuse an alias
+for the source register.
+\f
+* MOVE-TO-TEMPORARY-REGISTER! expects a source RTL register and a
+register type and returns an appropriate register containing a copy of
+the source. The register is intended for temporary use, that is, use
+only within the code generated by the expansion of the current RTL
+instruction, and as such it should not be permanently recorded in the
+register map. The register becomes automatically freed for subsequent
+RTL instructions. MOVE-TO-TEMPORARY-REGISTER! attempts to reuse an
+alias for the source register.
+
+* REUSE-PSEUDO-REGISTER-ALIAS! expects an RTL register, a register
+type, and two procedures. It attempts to find a reusable alias for
+the RTL register of the specified type, and invokes the first
+procedure giving it the alias if it succeeds, or the second procedure
+with no arguments if it fails. MOVE-TO-ALIAS-REGISTER! and
+MOVE-TO-TEMPORARY-REGISTER! use REUSE-PSEUDO-REGISTER-ALIAS! but
+occasionally neither meets the requirements.
+
+* NEED-REGISTER! expects an RTL machine register and informs the
+register allocator that the rule in use requires that register so it
+should not be available for subsequent requests while translating the
+current RTL instruction. The register is available for later RTL
+instructions unless the relevant rules invoke NEED-REGISTER! again.
+The procedures described above that allocate and assign aliases and
+temporary registers call NEED-REGISTER! behind the scenes, but you will
+need to invoke it explicitly when calling out-of-line routines.
+
+* LOAD-MACHINE-REGISTER! expects an RTL register and an RTL machine
+register and generates code that copies the current value of the RTL
+register to the machine register. It is used to pass arguments in
+fixed registers to out-of-line code, typically in the compiled code
+runtime library.
+
+* ADD-PSEUDO-REGISTER-ALIAS! expects an RTL pseudo-register and an
+available machine register (no longer an alias), and makes the
+specified machine register an alias for the pseudo-register.
+
+* CLEAR-REGISTERS! expects any number of RTL registers and clears them
+from the register map, preserving their current contents in memory if
+needed. It returns the code that will perform the required motion at
+runtime. It should be used before invoking LOAD-MACHINE-REGISTER! to
+ensure that the potentially valid previous contents of the machine
+register have been saved.
+
+* CLEAR-MAP! deletes all aliases from the register map, pushing the
+data held only in aliases into the memory homes if needed. This
+procedure returns an assembly language code fragment, and is typically
+used before invoking out-of-line code.
+
+* DELETE-DEAD-REGISTERS! informs the register allocator that RTL
+pseudo registers whose contents will not be needed after the current
+RTL instruction can be eliminated from the register map and their
+aliases subsequently used for other purposes.
+\f
+Most of the rules are actually written in terms of machine-specific
+procedures that invoke the procedures listed above in fixed ways.
+Rule bodies typically match the following code pattern:
+
+ (let* ((rs1 (standard-source source1))
+ (rs2 (standard-source source2))
+ (rt (standard-target target)))
+ (LAP ...))
+
+where STANDARD-SOURCE and STANDARD-TARGET are machine-specific
+procedures. The reason for the use of LET* (instead of LET) is given
+below.
+
+On a machine with general addressing modes and memory operands, we
+might provide their definitions as follows:
+
+ (define (standard-source rtl-reg)
+ (standard-register-reference rtl-reg 'GENERAL true))
+ (define (standard-target rtl-reg)
+ (delete-dead-registers!)
+ (reference-target-alias! rtl-reg 'GENERAL))
+
+while on a load-store architecture we might define them as follows:
+
+ (define (standard-source rtl-reg)
+ (load-alias-register! rtl-reg 'GENERAL))
+ (define (standard-target rtl-reg)
+ (delete-dead-registers!)
+ (allocate-alias-register! rtl-reg 'GENERAL))
+
+ - VERY IMPORTANT: -
+
+This example brings up the cardinal rule of RTL assignments:
+
+ Any rule that writes into an RTL pseudo-register MUST invoke
+ DELETE-DEAD-REGISTERS! after allocating aliases for the necessary
+ sources but before allocating an alias for the target.
+
+If this is not done, the register allocator may decide to spill
+no-longer valid data into memory, which will probably make the
+compiler get confused in other ways or cause garbage collection
+problems later. If it is done too early, the last valid alias for a
+source operand may have been reused in the interim, and the compiler
+will assume that the source quantity is contained in memory and will
+often generate code that fetches and operates on garbage.
+
+The example above uses LET* instead of LET. LET would not work in the
+above example because Scheme does not specify the order of argument
+evaluation, and Liar chooses arbitrary orders, so the
+DELETE-DEAD-REGISTERS! implicit in STANDARD-TARGET might be called too
+early, possibly causing STANDARD-SOURCE to fail.
+
+MOVE-TO-ALIAS-REGISTER! invokes DELETE-DEAD-REGISTERS! because it
+simultaneously allocates an alias for a source and for a target.
+Thus, if there are other source operands, their aliases must be
+allocated before MOVE-TO-ALIAS-REGISTER! is invoked.
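+
+As an illustration, a register-to-register assignment rule often
+reduces to a single call to the register allocator. The sketch below
+assumes a single GENERAL register type; MOVE-TO-ALIAS-REGISTER!
+updates the register map (and the allocator emits any data motion
+needed), so the rule itself adds no instructions:
+
+  (define-rule statement
+    (ASSIGN (REGISTER (? target)) (REGISTER (? source)))
+    (move-to-alias-register! source 'GENERAL target)
+    (LAP))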
+\f
+ 5.3.3. Invocation rules, etc.
+
+The meaning and intent of most statement rules in an existing port is
+readily apparent. The more arcane rules have to do with procedures
+and the representation of numbers. What follows is a description of
+some of the more obscure rules related to procedures and some of the
+implementation concepts required to understand them.
+
+In the invocation rules, FRAME-SIZE is the number of arguments passed
+in the call (often plus one), and there is often more than one rule
+with the same keyword, typically to handle the common cases (small
+FRAME-SIZE) more efficiently.
+
+Various of the rules specify the number of arguments that the
+resulting procedure will accept. The range is described in terms of
+two parameters, MIN and MAX:
+ - MIN is always positive and it is one greater than the smallest
+number of arguments allowed.
+ - MAX may be positive or negative. If positive, it is one greater
+than the largest number of arguments allowed. If negative, it
+indicates that the procedure will accept an unbounded number of
+arguments, and the absolute value of MAX, minus (MIN + 1), is the
+number of positional optional parameters. Either way, the absolute
+value of MAX is the size of the procedure's call frame counting the
+procedure itself.
+ These two values are encoded in the format word of the resulting
+procedures so that dynamic APPLY can check the number of arguments
+passed and reformat the stack frame appropriately.
+ Non-positive MINs are used to indicate that the compiled entry point
+is not a procedure, but a return address, a compiled expression, or a
+pointer to an internal label.
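+
+For example, assuming the usual MIT Scheme lambda-list syntax, the
+encodings would be:
+
+  (lambda (x y) ...)                    MIN = 3, MAX =  3
+  (lambda (x #!optional y) ...)         MIN = 2, MAX =  3
+  (lambda (x . rest) ...)               MIN = 2, MAX = -3
+  (lambda (x #!optional y . rest) ...)  MIN = 2, MAX = -4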
+
+The CONS-CLOSURE rules will dynamically create some instructions in
+the runtime heap, and these instructions must be visible to the
+processor's instruction fetch unit. If the instruction and data
+caches are not automatically kept consistent by the hardware,
+especially for newly addressed memory, the caches must be explicitly
+synchronized by the Scheme system. On machines where the programmer
+is given no control over the caches, this will be very hard to do. On
+machines where the control is minimal or flushing is expensive, the
+following solution can be used to amortize the cost:
+
+The CONS-CLOSURE rules can generate code to allocate a closure from a
+pre-allocated pool and invoke an out-of-line routine to refill the
+pool when it is empty. The routine allocates more space from the
+heap, initializes the instructions, and synchronizes the caches.
+
+Since the real entry points are not known until the closure objects
+are created, instead of using absolute jumps to the real entry points,
+the pre-allocated closures can contain jumps to a fixed routine that
+will extract the real entry point from the word pointed at by the
+return address and invoke it. In other words, the code inserted in
+closure objects will be
+
+ jsr fixed-routine
+ <storage for real-entry-point>
+
+and fixed-routine, written in assembly language, will do something like
+
+ load 0(return-address),rtemp
+ jmp 0(rtemp)
+
+The 68040 version of the Motorola 68000 family port uses this trick
+because the 68040 cache is typically configured in copyback mode, and
+synchronizing the caches involves an expensive supervisor (OS) call.
+The Alpha back-end also uses this trick because the caches can be
+synchronized only by using the CALL_PAL IMB instruction, which flushes
+the complete instruction cache, therefore implying a large re-start
+cost. The Alpha version of this code is currently better than the
+68040 version, so you should probably emulate the Alpha version.
+\f
+* (INVOCATION:UUO-LINK (? frame-size) (? continuation) (? name))
+ This rule is used to invoke a procedure named by a free variable.
+It is the rule used to generate a branch to an execute cache as
+described in cmpint.txt. The rule should allocate a new execute cache
+in the compiled code block by using FREE-UUO-LINK-LABEL, and should
+then branch to the instruction portion of the execute cache.
+FRAME-SIZE is the number of arguments passed in the call, plus one.
+
+* (INVOCATION:GLOBAL-LINK (? frame-size) (? continuation) (? name))
+ This rule is identical to the previous one, except that the free
+variable must be looked up in the global environment. It is used to
+improve the expansion of some macros that insert explicit references
+to the global environment (e.g. the expansion of FLUID-LET uses
+(ACCESS DYNAMIC-WIND #F) as the operator of a call).
+
+* (INVOCATION-PREFIX:MOVE-FRAME-UP (? frame-size) (? address))
+ This rule is used to shift call frames on the stack to maintain
+proper tail recursion. ADDRESS specifies where to start pushing the
+frame. It should be a pointer into the used portion of the stack,
+i.e. point to a higher address.
+
+For example, assume that what follows depicts the stack before
+ (INVOCATION-PREFIX:MOVE-FRAME-UP 3 addr)
+
+ | ... |
+ | |
+ +-------------------------------+
+ | <value n> |
+addr -> +-------------------------------+
+ | | direction of
+ | | stack growth
+ | |
+ | ... | |
+ | | |
+ | | V
+ | |
+ +-------------------------------+
+ | <value 3> |
+ +-------------------------------+
+ | <value 2> |
+ +-------------------------------+
+ | <value 1> |
+spbf -> +-------------------------------+
+
+Where spbf is the contents of the stack pointer register.
+After the invocation prefix, it will look as follows:
+
+ | ... |
+ | |
+ +-------------------------------+
+ | <value n> | direction of
+addr -> +-------------------------------+ stack growth
+ | <value 3> |
+ +-------------------------------+ |
+ | <value 2> | |
+ +-------------------------------+ V
+ | <value 1> |
+spaf -> +-------------------------------+
+
+The stack pointer register will now contain the value of spaf.
+\f
+* (INVOCATION-PREFIX:DYNAMIC-LINK (? frame-size) (? address-1) (? address-2))
+ This rule is similar to the INVOCATION-PREFIX:MOVE-FRAME-UP rule,
+but is used when the destination of the frame is not known at compile
+time. The destination depends on the continuation in effect at the
+time of the call, and the section of the stack that contains enclosing
+environment frames for the called procedure. Two addresses are
+specified and the one that is closest to the current stack pointer
+should be used, that is, the target address is the numerically smaller
+of the two addresses since the Liar stack grows towards smaller
+addresses.
+ ==> This rule need not exist in the RTL. It could be
+expanded into a comparison and a use of
+INVOCATION-PREFIX:MOVE-FRAME-UP with a computed address.
+
+* (ASSIGN (REGISTER (? target))
+ (CONS-CLOSURE (ENTRY:PROCEDURE (? procedure-label))
+ (? min) (? max) (? size)))
+ This rule issues the code to create a closure object whose real
+entry point is PROCEDURE-LABEL, that will accept a number of arguments
+specified by MIN and MAX, and that will have storage for SIZE free
+variables. The free variable storage need not be initialized since it
+will be written immediately by subsequent RTL instructions. The entry
+point of the resulting closure object should be written to RTL
+register TARGET. The format of closure objects is described in
+cmpint.txt.
+
+* (ASSIGN (REGISTER (? target))
+ (CONS-MULTICLOSURE (? nentries) (? size) (? entries)))
+ This rule is similar to the previous rule, but issues code to
+allocate a closure object with NENTRIES entry points. ENTRIES is a
+vector of entry-point descriptors, each being a list containing a
+label, a min, and a max as in the previous rule. TARGET receives the
+compiled code object corresponding to the first entry.
+
+* (OPEN-PROCEDURE-HEADER (? label-name))
+ This rule and its siblings are used to generate the entry code to
+procedures and return addresses. On entry to procedures and
+continuations, a gc/interrupt check is performed, and the appropriate
+routine in the runtime library is invoked if necessary. This check is
+performed by comparing the memory Free pointer to the compiled code's
+version of the MemTop pointer. The low-level interrupt handlers
+change the MemTop pointer to guarantee that such comparisons will fail
+in the future. A standard header generates the following code:
+
+ (LABEL gc-label)
+ <code to invoke the runtime library>
+ <format and gc words for the entry point>
+ (LABEL label-name)
+ <branch to gc-label if Free >= MemTop>
+
+Each kind of header invokes a different runtime library utility. In
+addition, procedures that expect dynamic links must guarantee that the
+dynamic link is preserved around the execution of the interrupt
+handler. This is accomplished by passing the contents of the dynamic
+link register to the appropriate runtime library utility.
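+
+A typical generator for these headers, sketched after the load-store
+ports (the helpers MAKE-EXTERNAL-LABEL, INTERRUPT-CHECK, and
+INVOKE-INTERFACE-JSB are the names used by existing ports and may
+differ in yours), is:
+
+  (define (simple-procedure-header code-word label code)
+    (let ((gc-label (generate-label)))
+      (LAP (LABEL ,gc-label)
+           ;; invoke the runtime library utility named by CODE
+           ,@(invoke-interface-jsb code)
+           ;; emit the format and gc words, then the entry label
+           ,@(make-external-label code-word label)
+           ;; branch back to GC-LABEL if Free >= MemTop
+           ,@(interrupt-check gc-label))))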
+
+* (CLOSURE-HEADER (? label-name) (? nentries) (? num-entry))
+ NENTRIES is the number of entry points in the closure object, and
+NUM-ENTRY is the zero-based index for this entry point. Closure
+headers are similar to other procedure headers but also have to
+complete the handshake initiated by the instructions stored in the
+closure objects so that the closure object appears on top of the
+stack. On architectures where it is necessary, they also have to map
+closure objects to their canonical representatives, and back when
+backing out because of interrupts or garbage collection.
+\f
+The file compiler/machines/port/rules3.scm contains most of these
+procedure-related rules. It also contains three procedures that
+generate assembly language and are required by the compiler. These
+procedures are used to generate initialization code for compiled code
+blocks.
+
+Compiled code blocks have two sections, a code section that contains
+the instructions, and a ``constants'' section that contains Scheme
+objects referenced by the code (e.g. quoted lists and symbols), the
+free variable caches for the code, the debugging information
+descriptor (more on this later), and the environment where the free
+variables in the code must be referenced. This environment is not
+known at compile time, so the compiler allocates a slot in the
+constants section for it, but the code itself must store it on first
+entry. In addition, the linker is invoked on first entry to look up
+the free variables and fill the variable caches with their correct
+contents. The compiler allocates enough space for each free variable
+cache and initializes the space with the information required by the
+linker to patch the reference. This information consists of the name
+of the free variable in addition to the number of actual arguments
+passed (plus one) for execute references.
+
+If COMPILER:COMPILE-BY-PROCEDURES? is true, the compiler will generate
+multiple compiled code blocks, one corresponding to each top-level
+lambda expression. Each of these must be initialized and linked, but
+instead of initializing them on first entry, the root compiled code
+block links all of them when it is entered.
+
+The linker (a runtime library utility) expects three arguments:
+ The address of the first word of the compiled code block, the word
+containing the GC vector header for the compiled code block.
+ The address of the first linker section in the constants area of the
+compiled code block. The linker sections contain the free variable
+caches and are all contiguous.
+ The number of linker sections in the compiled code block.
+
+* (GENERATE/QUOTATION-HEADER env-label free-label n-sections)
+ This procedure generates the code that initializes the environment
+slot at location labeled ENV-LABEL. The environment is fetched from
+the interpreter's environment register. It also generates code to
+invoke the linker on the executing compiled code block. The first
+word of the compiled code block is labeled by the value of
+*BLOCK-LABEL*, the first linker section is labeled by FREE-LABEL, and
+the number of linker sections is N-SECTIONS.
+
+* (GENERATE/REMOTE-LINK label env-offset free-offset n-sections)
+ This procedure is similar to GENERATE/QUOTATION-HEADER but is used
+to generate code that initializes and links a different compiled code
+block. It is used to generate the code to insert into the root
+compiled code block to link each of the other compiled code blocks
+generated when COMPILER:COMPILE-BY-PROCEDURES? is true.
+ LABEL is a label in current block's constant section where the
+pointer to the code block to be linked is stored,
+ ENV-OFFSET is the vector offset in the other code block where the
+environment of evaluation should be stored,
+ FREE-OFFSET is the vector offset of the first linker section in the
+other compiled code block, and
+ N-SECTIONS is the number of linker sections in the other block.
+\f
+* (GENERATE/CONSTANTS-BLOCK consts reads writes execs global-execs statics)
+ This procedure generates the assembler pseudo-ops used to generate
+the constants and linker section for a compiled code block. This
+section consists of:
+ - The constant objects (e.g. quoted lists) referenced by the code.
+ - The read variable caches used by the code.
+ - The write variable caches used by the code.
+ - The execute variable caches used by the code.
+ - The global execute variable caches used by the code.
+ - The locations for static variables.
+ - A slot for the debugging information descriptor generated by the
+compiler.
+ - A slot for the environment where the code is linked.
+
+Each word of storage in the constants block is allocated by using a
+SCHEME-OBJECT assembler pseudo-op, and the order in which they appear
+is the same as the order in which they appear in the final object.
+The linker sections (free variable cache sections) must be contiguous,
+and each has a one-word header describing the kind of section and its
+length. The environment slot must be the last word in the compiled
+code block, immediately preceded by the debugging information
+descriptor. Each SCHEME-OBJECT directive takes a label and the
+initial contents of the slot.
+
+This procedure is almost machine-independent, and you should be able
+to trivially modify an existing version. The only machine dependence
+is the layout and size of the storage allocated for each execute cache
+(uuo link). This machine-dependence consists entirely of the
+definition of the TRANSMOGRIFLY procedure.
+
+TRANSMOGRIFLY takes a list of the following form:
+
+ ((free-variable-1 (frame-size-1-1 . label-1-1)
+ (frame-size-1-2 . label-1-2)
+ ...)
+ (free-variable-2 (frame-size-2-1 . label-2-1)
+ (frame-size-2-2 . label-2-2)
+ ...)
+ ...)
+
+This list is interpreted as follows: an execute cache for calls to
+FREE-VARIABLE-1 with frame size FRAME-SIZE-1-1 (number of arguments
+plus one) must be created, and labeled LABEL-1-1, similarly for
+ <FREE-VARIABLE-1, FRAME-SIZE-1-2, LABEL-1-2>,
+ <FREE-VARIABLE-2, FRAME-SIZE-2-1, LABEL-2-1>, etc.
+\f
+Assuming that the initial layout of an execute cache is
+
+ free variable name ; labeled word
+ false ; optional storage (e.g. for branch delay slot)
+ frame size of call ; arity + 1
+
+TRANSMOGRIFLY will return a list of the following form:
+
+ ((free-variable-1 label-1-1)
+ (#f dummy-label-1-1) ; optional word(s)
+ (frame-size-1-1 dummy-label-1-1)
+
+ (free-variable-1 label-1-2)
+ (#f dummy-label-1-2) ; optional word(s)
+ (frame-size-1-2 dummy-label-1-2)
+
+ ...
+
+ (free-variable-2 label-2-1)
+ (#f dummy-label-2-1) ; optional word(s)
+ (frame-size-2-1 dummy-label-2-1)
+
+ ...)
+
+There may be any number of optional words, but the layout must match
+that expected by the macros defined in microcode/cmpint-port.h. In
+particular, the length in longwords must match the definition of
+EXECUTE_CACHE_ENTRY_SIZE in microcode/cmpint-port.h, and the definition
+of EXECUTE-CACHE-SIZE in compiler/machines/port/machin.scm.
+
+Furthermore, the instructions that the linker will insert should
+appear at the word labeled by LABEL-N-M, and should not overwrite the
+relevant part of FRAME-SIZE-N-M, since the frame size will be needed
+when re-linking after an incremental definition or assignment.
+
+The output format of TRANSMOGRIFLY is the input format for the read
+and write execute cache sections. The procedure DECLARE-CONSTANTS,
+local to GENERATE/CONSTANTS-BLOCK, reformats such lists into the final
+SCHEME-OBJECT directives and tacks on the appropriate linkage section
+headers.
+\f
+ 5.3.4. Fixnum rules.
+
+Scheme's generic arithmetic primitives cannot be open-coded fully for
+space reasons. Most Scheme code that manipulates numbers manipulates
+small integers used as counters, vector indices, etc., and using
+out-of-line arithmetic procedures to operate on them would make the
+code too slow. The compromise, therefore, is to open-code the common
+small integer case, and to handle the rest out of line. This, of
+course, does not perform particularly well for the other common case
+of floating point data.
+
+Scheme integers are represented in two formats. The most common,
+fixnum representation, uses the datum field of the objects to directly
+encode the values. The other format, bignum representation, stores
+the values in multiple words in memory, and the datum is a pointer to
+this storage. Scheme generic arithmetic procedures will generate
+fixnums whenever possible, resorting to bignums when the value exceeds
+the range that can be represented in fixnum format.
+
+Since the open-codings provided for the compiler only handle fixnums,
+these open-codings must also detect when the result will not fit in a
+fixnum in order to invoke the out-of-line utility that will handle
+them correctly.
+
+Most hardware provides facilities for detecting and branching if an
+integer operation overflows. Fixnums cannot use these facilities
+directly, because of the tag bits at the high-end of the word. To be
+able to use these facilities (and get the sign bit in the right
+place), Scheme fixnums are converted to an internal format before they
+are operated on, and converted back to Scheme object format before
+storing them in memory or returning them as values.
+
+In this internal format, the value has been shifted left so that the
+fixnum sign-bit coincides with the integer sign bit, and a number of
+bits in the least-significant end of the word hold zeros. The shift
+amount is the length of the type-tag field.
+
+The rules
+ (ASSIGN (REGISTER (? target)) (OBJECT->FIXNUM (REGISTER (? source))))
+and
+ (ASSIGN (REGISTER (? target)) (FIXNUM->OBJECT (REGISTER (? source))))
+perform this translation.
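+
+For instance, on a hypothetical machine with 32-bit words, a 6-bit
+type tag, and MIPS-like shift instructions, the first rule might be
+sketched as follows (the mnemonics and the STANDARD-SOURCE and
+STANDARD-TARGET helpers are placeholders):
+
+  (define-rule statement
+    (ASSIGN (REGISTER (? target))
+            (OBJECT->FIXNUM (REGISTER (? source))))
+    (let* ((src (standard-source source))
+           (tgt (standard-target target)))
+      ;; Shift left by the type-tag width so that the fixnum sign bit
+      ;; lands on the machine sign bit and the low bits become zero.
+      (LAP (SLL ,tgt ,src (& 6)))))
+
+FIXNUM->OBJECT performs the inverse: a right shift by the tag width
+followed by inserting the fixnum type code into the tag field.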
+
+The open-coding of fixnum arithmetic assumes that the sources and the
+result are in this format. This format is good for value comparisons,
+addition, subtraction, and bitwise logical operations, but must be
+transformed for multiplication, division, and shifting operations.
+
+In addition to open-coding fixnum operations within generic
+arithmetic, fixnum primitives can be invoked directly, and the code
+can be open coded as well. Under these circumstances, the result will
+not be checked for overflow, and the code generated can be quite
+different. The RTL instructions that perform fixnum arithmetic have a
+boolean flag that specifies whether overflow conditions should be
+generated or not.
+\f
+The compiler does not generally require fixnum arithmetic to be open
+coded. If the names of all the fixnum primitives are listed in
+COMPILER:PRIMITIVES-WITH-NO-OPEN-CODING, all of them will be handled
+by issuing code to invoke them out of line.
+
+There is one exception to this, however. The following rules MUST be
+provided:
+
+ (ASSIGN (REGISTER (? target))
+ (FIXNUM-2-ARGS MULTIPLY-FIXNUM
+ (OBJECT->FIXNUM (CONSTANT 4))
+ (OBJECT->FIXNUM (REGISTER (? source)))
+ #F))
+
+ (ASSIGN (REGISTER (? target))
+ (FIXNUM-2-ARGS MULTIPLY-FIXNUM
+ (OBJECT->FIXNUM (REGISTER (? source)))
+ (OBJECT->FIXNUM (CONSTANT 4))
+ #F))
+
+The reason is that VECTOR-REF and VECTOR-SET! translate into a
+sequence that uses these patterns when the index is not a compile-time
+constant. Of course, you can include VECTOR-REF and VECTOR-SET! in
+COMPILER:PRIMITIVES-WITH-NO-OPEN-CODING to avoid the problem
+altogether, but this is probably not advisable.
+\f
+ 5.3.5. Rules used to invoke the runtime library
+
+Some of the rules issue code that invokes the runtime library. The
+runtime library is invoked through a primary entry point,
+SCHEME-TO-INTERFACE, typically directly accessible through a dedicated
+processor register. SCHEME-TO-INTERFACE expects at least one and up
+to five arguments. The first argument is the index of the runtime
+library service to invoke, and the rest are the parameters to the
+service routine. These arguments are passed in fixed locations,
+typically registers. Runtime library utilities return their values
+(if any) in the compiler's value register. The following is a typical
+example of such an invocation where INVOKE-INTERFACE expects the index
+of a utility, and generates the code that writes the index into the
+appropriate location and jumps to SCHEME-TO-INTERFACE.
+
+ (define-rule statement
+ (INVOCATION:APPLY (? frame-size) (? continuation))
+ (LAP ,@(clear-map!)
+ ,@(load-rn frame-size 2)
+ (MOV L (@R+ 14) (R 1))
+ ,@(invoke-interface code:compiler-apply)))
+
+The code names are typically defined in
+compiler/machines/port/lapgen.scm.
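+
+For reference, INVOKE-INTERFACE itself is usually a short procedure
+along the following lines (a sketch; the utility-index location and
+the entry-point name are placeholders for whatever the port's calling
+convention dictates):
+
+  (define (invoke-interface code)
+    ;; Pass the utility index in the agreed-upon location and jump to
+    ;; the SCHEME-TO-INTERFACE assembly language hook.
+    (LAP (MOV L (& ,code) ,reg:utility-index)
+         (JMP ,entry:scheme-to-interface)))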
+
+Many of the utilities expect return addresses as their first argument,
+and it is convenient to define a procedure, INVOKE-INTERFACE-JSB
+(sometimes called LINK-TO-INTERFACE) which receives an index but
+leaves the appropriate return address in the first argument's
+location. INVOKE-INTERFACE-JSB can be written by using
+INVOKE-INTERFACE (and SCHEME-TO-INTERFACE), but given the frequency of
+this type of call, it is often written in terms of an alternate entry
+point to the runtime library (e.g. SCHEME-TO-INTERFACE-JSB).
+
+An example of a more complicated call to the runtime library is
+ (define-rule statement
+ (INTERPRETER-CALL:CACHE-ASSIGNMENT (? extension) (? value))
+ (QUALIFIER (and (interpreter-call-argument? extension)
+ (interpreter-call-argument? value)))
+ (let* ((set-extension
+ (interpreter-call-argument->machine-register! extension r2))
+ (set-value
+ (interpreter-call-argument->machine-register! value r3))
+ (clear-map (clear-map!)))
+ (LAP ,@set-extension
+ ,@set-value
+ ,@clear-map
+ ,@(invoke-interface-jsb code:compiler-assignment-trap))))
+where INTERPRETER-CALL-ARGUMENT->MACHINE-REGISTER! invokes
+CLEAR-REGISTERS! and NEED-REGISTER! besides performing the assignment.
+
+For very frequent calls, the assembly language part of the runtime
+library can provide additional entry points. The calling convention
+for these would be machine-dependent, but frequently they take
+arguments in the same way that SCHEME-TO-INTERFACE and
+SCHEME-TO-INTERFACE-JSB take them, but avoid passing the utility
+index, and may do part or all of the work of the utility in assembly
+language instead of invoking the portable C version. Many of the
+ports have out-of-line handlers for generic arithmetic, with the
+common fixnum/flonum cases handled there.
+\f
+The following is a possible specialized version of apply
+where the special entry point expects the procedure argument on the
+stack rather than in a fixed register:
+
+ (define-rule statement
+ (INVOCATION:APPLY (? frame-size) (? continuation))
+ (LAP ,@(clear-map!)
+ ,@(load-rn frame-size 2)
+ (JMP ,entry:compiler-apply)))
+
+The procedure object will have been pushed on the stack by earlier
+code.
+
+ 5.4. Writing predicate rules.
+
+Predicate rules are used to generate code to discriminate between
+alternatives at runtime. The code generated depends on the
+conditional branch facilities of the hardware at hand. There are two
+main ways in which architectures provide conditional branching
+facilities:
+
+* condition codes. Arithmetic instructions compute condition codes
+that are stored in hardware registers. These hardware registers may
+be targeted explicitly by the programmer or implicitly by the
+hardware. Conditional branch instructions determine whether to branch
+or not depending on the contents of the condition registers at the
+time the branch instruction is executed. These condition registers
+may be named explicitly by the instructions, or assumed implicitly.
+
+* compare-and-branch instructions. The instruction set includes
+instructions that compare two values (or a value against 0) and branch
+depending on the comparison. The results of the comparison are not
+stored in special or explicit registers, since they are used
+immediately, by the instruction itself, to branch to the desired
+target.
+
+Liar accommodates both models for branching instructions.
+Predicate rules generate code that precedes the actual branches, and
+then invoke the procedure SET-CURRENT-BRANCHES!, informing it of the
+code to generate to branch to the target. Depending on the model,
+the prefix code may be empty, and all the code may appear in the
+arguments to SET-CURRENT-BRANCHES!.
+
+SET-CURRENT-BRANCHES! expects two procedures as arguments. Each of
+them receives a label as an argument, and is supposed to generate code
+that branches to the label if the predicate condition is true (first
+argument) or false (second argument). Both options are provided
+because linearization of the control-flow graph occurs after LAP
+generation, and it is therefore not known when the predicate rule is
+fired which of the two possible linearizations will be chosen.
+
+Thus on an architecture with condition codes, the rule will return the
+code that performs the comparison, targeting the appropriate
+condition-code registers (if they are not implicit), and the arguments
+to SET-CURRENT-BRANCHES! will just generate the conditional-branch
+instructions that use the generated condition codes.
+
+On an architecture with compare-and-branch instructions, the code
+returned by the rule body will perform any work needed before the
+compare-and-branch instructions, and the arguments to
+SET-CURRENT-BRANCHES! will generate the compare-and-branch
+instructions.
+\f
+For example, on the DEC Vax, a machine with implicit condition codes,
+where compare (and most) instructions set the hidden condition-code
+register, a predicate rule could be as follows:
+ (define-rule predicate
+ (EQ-TEST (REGISTER (? register-1)) (REGISTER (? register-2)))
+ (set-current-branches!
+ (lambda (label)
+ (LAP (B EQL (@PCR ,label))))
+ (lambda (label)
+ (LAP (B NEQ (@PCR ,label)))))
+ (LAP (CMP L ,(any-register-reference register-1)
+ ,(any-register-reference register-2))))
+The prefix code performs the comparison. The arguments to
+SET-CURRENT-BRANCHES! branch depending on the result.
+
+On the HP Precision Architecture (Spectrum), a machine with
+compare-and-branch instructions, the same rule would be written as
+follows:
+ (define-rule predicate
+ ;; test for two registers EQ?
+ (EQ-TEST (REGISTER (? source1)) (REGISTER (? source2)))
+ (let* ((r1 (standard-source! source1))
+ (r2 (standard-source! source2)))
+ (set-current-branches!
+ (lambda (label)
+ (LAP (COMB (EQ) ,r1 ,r2 (@PCR ,label))
+ (NOP ()))) ; handle delay slot
+ (lambda (label)
+ (LAP (COMB (LTGT) ,r1 ,r2 (@PCR ,label))
+ (NOP ())))) ; handle delay slot
+ (LAP)))
+There is no prefix code, and the arguments to SET-CURRENT-BRANCHES!
+perform the comparison and branch.
+\f
+The (OVERFLOW-TEST) predicate condition does not fit this model
+neatly. The current compiler issues overflow tests when open-coding
+generic arithmetic. Fixnum overflow implies that bignums should be
+used for the result, and this predicate is used to conditionally
+invoke out-of-line utilities.
+
+The problem is that the decomposition of the code assumes that the
+result of the overflow test is stored implicitly by the code that
+generates the arithmetic instructions, and that this condition can be
+later used for branching by the code generated for (OVERFLOW-TEST).
+The code for the test will be adjacent to the code for the
+corresponding arithmetic operation, but the compiler assumes that the
+condition can be passed implicitly between these adjacent
+instructions. This decomposition only matches hardware with condition
+codes.
+
+Hardware with compare-and-branch instructions can be accommodated by
+explicitly computing conditions into a hardware register reserved for
+this purpose, and the code generated by the predicate rule can then
+branch according to the contents of this register. On these machines,
+the arithmetic operator will not only generate the desired result, but
+will set or clear a fixed register according to whether the
+computation overflowed or not. The predicate code will then branch
+when the fixed register contains a non-zero value for the first
+linearization choice, or zero for the other possibility.
+
+This problem is particularly acute on MIPS processors. The MIPS
+architecture does not detect overflow conditions, so the overflow
+condition must be computed by examining the inputs and outputs of the
+arithmetic instructions. There are conditional branches used just to
+store the correct overflow condition in a register, and the code
+generated for the overflow test will then branch again depending on
+the value stored. This makes the code generated by the open-coding of
+generic arithmetic contain multiple branches and quite large.
+
+The Spectrum port solves this problem a little differently. On the
+Spectrum, arithmetic instructions can conditionally cause the
+following instruction to be skipped. Since the code generated by
+(OVERFLOW-TEST) is guaranteed to follow the code generated by the
+arithmetic operation, the last instruction generated by the arithmetic
+operations conditionally skips if there is no overflow. The
+(OVERFLOW-TEST) code generates an unconditional branch for the first
+linearization choice, and an unconditional skip and an unconditional
+branch for the alternative linearization.
+
+A more efficient solution, currently employed in the MIPS port
+(version 4.87 or later) depends on the fact that the RTL instruction
+immediately preceding an RTL OVERFLOW-TEST encodes the arithmetic
+operation whose overflow condition is being tested. Given this
+assumption (that the arithmetic operation producing the overflow
+conditions and the test of such condition are adjacent), the rule for
+OVERFLOW-TEST need not generate any code, and the rule for the
+arithmetic operation can generate both the prefix code and invoke
+SET-CURRENT-BRANCHES! as appropriate. This is possible because the
+RTL encoding of arithmetic operations includes a boolean flag that
+specifies whether the overflow condition is desired or not.
+\f
+ 6. Suggested ordering of tasks.
+
+The task of porting the compiler requires a lot of work. In the past,
+it has taken approximately three full weeks for a single person
+knowledgeable about MIT Scheme and the compiler, working without
+documentation. This guide was written after the first three ports.
+
+One unfortunate aspect is that a lot of mechanism must be in place
+before most of the compiler can be tried out. In other words, there
+is a lot of code that needs to be written before small pieces can be
+tested, and the compiler is not properly organized so that parts of it
+can be run independently.
+
+Note also that cmpint-port.h, machin.scm, rules3.scm, and
+cmpaux-port.m4 are very intertwined, and you may often have to iterate
+while writing them until you converge on a final design.
+
+Keeping all this in mind, here is a suggested ordering of the tasks:
+
+ 6.1. Learn the target instruction set well.
+
+In particular, pay close attention to the branch and jump instructions
+and to the facilities available for controlling the processor caches
+(if necessary). You may need to find out what facilities the
+operating system provides if the instructions that control the cache
+are privileged.
+
+ 6.2. Write microcode/cmpint-port.h:
+
+cmpint.txt documents most of the definitions that this file must
+provide.
+
+ 6.2.1. Design the trampoline code format. Trampolines are used to
+invoke C utilities indirectly. In other words, Scheme code treats
+trampolines like compiled Scheme entry points, but they immediately
+invoke a utility to accomplish their task. Since
+return-to-interpreter is implemented as a trampoline, you will need to
+get this working before you can run any compiled code at all.
+
+ 6.2.2. Design the closure format and the execute cache format. This
+is needed to get the Scheme part of the compiler up AND to get the
+compiled code interface in the microcode working. Try to keep the
+number of instructions low since closures and execute caches are very
+common.
+
+ 6.2.3. Design the interrupt check instructions that are executed on
+entry to every procedure, continuation, and closure. Again, try to
+keep the number of instructions low, and attempt to make the
+non-interrupting case fast at the expense of the case when interrupts
+must be processed. Note that when writing the Scheme code to generate
+the interrupt sequences, you can use the ADD-END-OF-BLOCK-CODE!
+procedure to make sure that the interrupt sequence does not confuse
+your hardware's branch prediction strategy.
+
+ 6.2.4. Given all this, write cmpint-port.h. Be especially careful
+with the code used to extract and insert absolute addresses into
+closures and execute caches. A bug in this code would typically
+manifest itself much later, after a couple of garbage collections.
+
+During this process you will be making decisions about what registers
+will be fixed by the port, namely the stack pointer, the free pointer,
+the register block pointer, and at least one register holding the
+address of a label used to get back to C, typically
+scheme_to_interface.
+\f
+ 6.3. Write machin.scm:
+
+Most of the definitions in this file have direct counterparts or are
+direct consequences of the code in microcode/cmpint-port.h, so it
+will be mostly a matter of re-coding the definitions in Scheme rather
+than C.
+
+In particular, you will have to decide how registers are going to be
+used and split between your C compiler and Liar. If your architecture
+has a large register set, you can let C keep those registers to which
+it assigns a fixed meaning (stack pointer, frame pointer, global
+pointer), and use the rest for Liar. If your machine has few
+registers or you feel more ambitious, you can give all the registers
+to Liar, but the code for transferring control between both languages
+in cmpaux-port.m4 will become more complex. Either way, you will need
+to choose appropriate registers for the Liar fixed registers (stack
+pointer, free pointer, register block pointer, dynamic link register
+and optionally, datum mask, return value register, memtop register,
+and scheme_to_interface address pointer).
+
+ 6.4. Write the assembler:
+
+You can write the assembler any old way you want, but it is easier to
+use the branch tensioner and the rest of the facilities if you use the
+same conventions that the existing assemblers use. In particular,
+with any luck, you will be able to copy inerly.scm, insmac.scm, and
+parts of assmd.scm verbatim from an existing port, and for most
+machines, coerce.scm is straightforward to write.
+
+assmd.scm defines procedures that depend only on the endianness of the
+architecture. You may want to start with the MIPS version since this
+version accommodates both endianness possibilities as MIPS processors
+can be configured either way. If your processor has fixed endianness,
+you can prune the inappropriate code. The block-offset definitions
+must agree with those in microcode/cmpint-port.h, and the padding
+definitions are simple constants.
+
+Assuming that you decide to use the same structure as existing
+assemblers, you may need to write parsers for addressing modes if your
+machine has them. You can use the versions in the MC68020 (bobcat),
+Vax, and i386 (Intel 386) ports for guidance. Addressing modes are
+described by a set of conditions under which they are valid, and some
+output code to issue. The higher-level code that parses instructions
+in insmac.scm must decide where the bits for the addressing modes must
+appear. The MC68020 version divides the code into two parts, the part
+that is inserted into the opcode word of the instruction (further
+subdivided into two parts), and the part that follows the opcode word
+as an extension. The Vax version produces all the bits at once since
+addressing modes are not split on that architecture. You should write
+the addressing mode definitions in port/insutl.scm, plus any
+additional transformers that the instruction set may require.
+
+Once you have the code for the necessary addressing modes and
+transformers (if any), and the parsing code for their declarations in
+port/insmac.scm, writing the instr<n>.scm files should not be hard.
+Remember to include pseudo-opcodes for inserting constants in the
+assembly language, and for declaring external labels so that the
+gc-offset to the beginning of the compiled code block will be inserted
+correctly. See, for example, the definition of the EXTERNAL-LABEL
+pseudo-opcode in machines/mips/instr1.scm, and its use in
+machines/mips/rules3.scm.
+\f
+ 6.5. Write the LAPGEN rules:
+
+You will need to write lapgen.scm, rules1.scm, rules2.scm, rules3.scm,
+and parts of rules4.scm. Most of rules4.scm is not used by the
+compiler under the ordinary switch settings, and that code may no
+longer work in any of the ports; rulfix.scm and rulflo.scm are only
+necessary to open-code fixnum and flonum arithmetic. A good way to
+reduce the amount of code needed at first is to turn primitive open
+coding off and ignore rulfix.scm and rulflo.scm.
+
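+The switch that controls primitive open coding can be set from the
+REPL. The switch name below is an assumption (check base/switch.scm
+for the exact name in your sources); the idiom mirrors the way other
+compiler switches are set elsewhere in this document:
+
+ (in-package (->environment '(compiler))
+   (set! compiler:open-code-primitives? false))
+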
+Lapgen.scm need not include the shared code used to deal with fixnums
+and flonums, but will require the rest, especially the code used to
+invoke utilities in the compiled code interface.
+
+rules1.scm and rules2.scm are relatively straightforward since the
+RTL instructions whose translations are provided there typically map
+easily into instructions.
+
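+For reference, a typical rules1.scm rule has roughly the shape sketched
+below. The pattern syntax is the one used by the existing ports, but
+the helper STANDARD-MOVE-TO-TARGET! is borrowed from one of them and
+may be named differently (or not exist) in yours, so treat this as a
+shape rather than as code to copy:
+
+ (define-rule statement
+   ;; Copy the contents of one pseudo register into another.
+   (ASSIGN (REGISTER (? target)) (REGISTER (? source)))
+   (standard-move-to-target! source target)
+   (LAP))
+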
+rules4.scm need only have the INTERPRETER-CALL:CACHE-??? rules, and
+these are simple invocations of runtime library routines which you can
+emulate from existing ports.
+
+rules3.scm is an entirely different matter. It is probably the
+hardest file to write when porting the compiler. The most complicated
+parts to understand, and write, are the closure code, the invocation
+prefix code, and the block assembly code.
+
+ The block assembly code can be taken from another port. You will
+only have to change how the transmogrifly procedure works to take into
+account the size and layout of un-linked execute caches.
+
+ The invocation prefix code is used to adjust the stack pointer, and
+move a frame in the stack prior to a call to guarantee proper tail
+recursion. The frame moved is the one pointed at by the stack
+pointer, and it may be moved a distance known at compile time
+(invocation-prefix:move-frame-up rules) or a distance that cannot be
+computed statically (invocation-prefix:dynamic-link rules). The
+move-frame-up rules are simple, but you should remember that the
+starting and ending locations for the frame may overlap, so you must
+ensure that data is not overwritten prematurely. The dynamic link
+rules are similar to the move-frame-up rules (and typically share the
+actual moving code) but must first decide the location where the frame
+should be placed. This is done by comparing two possible values for
+the location, and choosing the value closest to the current stack
+pointer (i.e. numerically lower since the stack grows towards smaller
+addresses). Again, the source and target locations for the frame may
+overlap, so the generated code must be careful to move the data in
+such a way that no data will be lost.
+
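+The overlap constraint can be modeled in ordinary Scheme (this is only
+a model of the copy-order issue, operating on a vector that simulates
+the stack; it is not the LAP you will generate). Since the frame is
+moved towards higher addresses, copying must proceed from the word at
+the highest address down, so that no source word is overwritten before
+it has been moved:
+
+ (define (move-frame-up! stack sp frame-size offset)
+   ;; The frame occupies indices SP .. SP+FRAME-SIZE-1; move it so that
+   ;; it starts at SP+OFFSET, i.e. "up", towards higher addresses.
+   (let loop ((i (- frame-size 1)))
+     (if (>= i 0)
+         (begin
+           (vector-set! stack (+ sp offset i) (vector-ref stack (+ sp i)))
+           (loop (- i 1)))))
+   (+ sp offset))                       ; the new stack pointer
+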
+ The closure code is the most painful to write. When writing
+cmpint-port.h you decided what the actual code in closure entries
+would be, and the code for closure headers is a direct consequence of
+this. The combination of the instructions in a closure object, the
+helper instructions in assembly language (if any), and the
+instructions in the closure header must ultimately push the closure
+object (or its canonical representative) on the stack as if it were
+the last argument to the procedure, and pending interrupts (and gc)
+must be checked on entry to the closure. The interrupt back-out code
+is different from the ordinary procedure interrupt back-out code
+because the procedure object (the closure or its representative) is on
+top of the stack.
+
+ The cons-closure rules are used to allocate closure objects from the
+runtime heap. Some of this allocation/initialization may be done out
+of line, especially if ``assembling'' the appropriate instructions on
+the fly would require a lot of code. In addition, you may have to
+call out-of-line routines to synchronize the processor caches or
+block-allocate multiple closure entries.
+\f
+ 6.6. Write stubs for remaining port files:
+
+rgspcm.scm and dassm1.scm can be copied verbatim from any other port.
+
+lapopt.scm only needs to define an identity procedure.
+
+rulfix.scm, rulflo.scm, and rulrew.scm need not define any rules,
+since you can initially turn off open coding of primitive operators.
+
+dassm2.scm and dassm3.scm need not be written at first, but they are
+useful for debugging the assembler (since disassembling some code
+should produce code equivalent to the assembler's input) and for
+examining compiler output when you forgot to make the compiler write
+the LAP.
+
+ 6.7. Write the compiler-building files:
+
+make.scm and comp.cbf should be slightly modified copies of the
+corresponding files in another port.
+
+comp.sf and decls.scm can be essentially copied from another port, but
+you will need to change the pathnames to refer to your port directory
+instead of the one you copied them from, and in addition, you may have
+to add or remove instr<n> and other files as appropriate.
+
+ 6.8. Write microcode/cmpaux-port.m4:
+
+cmpaux.txt documents the entry points that this file must provide.
+You need not use m4, but it is convenient for conditionalizing the
+code for debugging and for different type-code sizes. If you decide
+not to use it, you should call your file cmpaux-port.s instead.
+
+ 6.8.1. Determine your C compiler's calling convention. Find out what
+registers have fixed meanings, which are supposed to be saved by
+callees if written, and which are supposed to be saved by callers if
+they contain useful data.
+
+ 6.8.2. Find out how C code returns scalars and small C structures.
+If the documentation for the compiler does not describe this, you can
+write a C program consisting of two procedures, one of which returns a
+two-word (two int) struct to the other, and you can examine the
+assembly language produced by the compiler.
+
+ 6.8.3. Design how Scheme compiled code will invoke the C utilities.
+Decide where the parameters (maximum of four) to the utilities will be
+passed (preferably wherever C procedures expect arguments), and where
+the utility index will be passed (preferably in a C caller-saves
+register).
+
+ 6.8.4. Given all this, write a minimalist cmpaux-port.m4. In other
+words, write those entry points that are absolutely required
+(C_to_interface, interface_to_C, interface_to_scheme, and
+scheme_to_interface). Be especially careful with the code that
+switches between calling conventions and register sets.
+C_to_interface and interface_to_scheme must switch between C and Liar
+conventions, while scheme_to_interface must switch the other way.
+interface_to_C must return from the original call to C_to_interface.
+Make sure that C code always sees a valid C register set and that code
+compiled by Liar always sees a valid Scheme register set.
+
+ 6.9. After the preliminary code works:
+
+Once the compiler is up enough to successfully compile moderately
+complex test programs, and the compiled code and the interface have
+been tested by running the code, you probably will want to go back and
+write the files that were skipped over. In particular, you definitely
+should write rulfix.scm and rulrew.scm, and, if at all possible,
+rulflo.scm and the disassembler.
+\f
+ 7. Building and testing the compiler.
+
+Once the port files have been written, you are ready to build and test
+the compiler. The first step is to build an interpreted compiler and
+run simple programs. Most simple bugs will be caught by this.
+
+ 7.1. Re-building scheme.
+
+You need to build a version of the microcode with the compiled code
+interface (portable runtime library) in it. Besides writing
+cmpint-port.h and cmpaux-port.m4, you will need to do the following:
+
+- Copy (or link) cmpint-port.h to cmpint2.h.
+
+- Modify m.h to use 6-bit-long type tags (rather than the default 8)
+if you did not do this when you installed the microcode. If you do
+this, you will not be able to load .bin files created with 8 bit type
+tags. You can overcome this problem by using the original .psb files
+again to regenerate the .bin files. Alternatively, you can use a
+version of Bintopsb compiled with 8-bit tags to generate new .psb
+files, and a version of Psbtobin compiled with 6-bit tags to generate
+the new .bin files. Another option is to bring the compiler up using
+8-bit tags, but you may run out of address space. The simplest way to
+specify 6-bit type tags is to add a definition of C_SWITCH_MACHINE
+that includes -DTYPE_CODE_LENGTH=6 . Be sure to add any m4 switches
+that you may need so that the assembly language will agree on the
+number of tag bits, if it depends on them at all. If your version of
+m4 does not support command-line definitions, you can use the
+s/ultrix.m4 script to overcome this problem. Look at the m/vax.h and
+s/ultrix.h
+files for m4-related definitions.
+ ==> We should just switch the default to 6 bits and be done with it.
+
+- Modify ymakefile to include the processor dependent section that
+lists the cmpint-port.h and cmpaux-port.m4 files. You can emulate the
+version for any other compiler port. It is especially important that
+the microcode sources be compiled with HAS_COMPILER_SUPPORT defined.
+
+- Remove (or save elsewhere) all the .o files, scheme.touch, and
+scheme, the linked scheme microcode.
+
+- Do ``make xmakefile;make -f xmakefile scheme'' to generate a new
+linked microcode.
+
+Once you have a new linked microcode, you need to regenerate the
+runtime system image files even if you have not changed the length of
+the type tags. This is done as follows:
+
+- Re-generate a runtime.com (actually runtime.bin) image file by
+invoking scheme with the options ``-large -fasl make.bin'' while
+connected to the runtime directory, and then typing
+ (disk-save "<lib directory pathname>/runtime.com")
+at the Scheme prompt.
+
+- You should probably also generate a runtime+sf.com file by typing
+ (begin
+   (cd "<sf directory pathname>")
+   (load "make")
+   (disk-save "<lib directory pathname>/runtime+sf.com"))
+at the Scheme prompt.
+\f
+You also need to have a working version of cref. This can be done by
+invoking scheme with the options ``-band runtime+sf.com'', and then
+typing
+ (begin
+   (cd "<cref directory pathname>")
+   (load "cref.sf"))
+at the Scheme prompt.
+
+If this fails because a ``runtim.glob'' file is missing, try it
+again after executing
+ (begin
+   (cd "<runtime directory pathname>")
+   (load "runtim.sf"))
+
+ 7.2. Building an interpreted compiler.
+
+Once you have a new microcode, a compatible runtime system, and a
+working cref, you can pre-process the compiler as follows:
+
+- Copy (or link) comp.pkg, comp.sf, and comp.cbf from the
+compiler/machines/port directory to the compiler directory.
+
+- For convenience, make a link from compiler/machines/port to
+compiler/port.
+
+- Invoke scheme with the ``-band runtime+sf.com'' option, and then
+execute
+ (begin
+   (cd "<compiler directory pathname>")
+   (load "comp.sf"))
+This will take quite a while, and pre-process some of the files twice.
+At the end of this process, you should have a .bin file for each of
+the .scm files, a .ext file for some of them, and a bunch of
+additional files in the compiler directory (comp.con, comp.ldr,
+comp.bcon, comp.bldr, comp.glob, comp.free, comp.cref).
+
+It is a good idea to look at the comp.cref file. This is a
+cross-reference of the compiler and may lead you to find typos or
+other small mistakes. The first section of the cref file (labeled
+``Free References:'') lists all variables that are not defined in the
+compiler or the runtime system. The only variables that should be in
+this list are SF and SF/PATHNAME-DEFAULTING. The ``Undefined
+Bindings:'' section lists those variables defined in the runtime
+system and referenced freely by the compiler sources. The remainder
+of the cref file lists the compiler packages and the cross reference
+of the procedures defined in each.
+
+- Load up the compiler. Invoke scheme with the options
+``-compiler -band runtime+sf.com'', and then type
+ (begin
+   (cd "<compiler directory pathname>")
+   (load "port/make")
+   (disk-save "<lib directory pathname>/compiler.com"))
+
+You should then be able to invoke the compiler by giving scheme the
+``-compiler'' option, and use it by invoking CF.
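+
+For example, to compile one of the test files described in the next
+section, an invocation like the following should work:
+
+ (cf "<test directory pathname>/three")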
+\f
+ 7.3. Testing the compiler.
+
+There is no comprehensive test suite for the compiler. There is,
+however, a small test suite that is likely to catch gross errors. The
+files for the test suite are in compiler/etc/tests. Each file
+contains a short description of how it can be used.
+
+Make sure, in particular, that you test the closure code thoroughly,
+especially if closure allocation hand-shakes with out-of-line code to
+accommodate the CPU's caches.
+
+A good order to try the test suite in is
+
+ three.scm
+ expr.scm
+ pred.scm
+ close.scm
+ blast.scm
+ reverse.scm
+ arith.scm
+ bitwse.scm
+ fib.scm
+ vector.scm
+ reptd.scm
+ lexpr.scm
+ klexpr.scm
+
+ close2.scm
+ prim.scm
+ free.scm
+ uvuuo.scm
+ link.scm
+ uuo.scm
+ unv.scm
+
+ tail.scm
+ y.scm
+ sort/*.scm (see sort/README for a description)
+
+ The programs in the first list test various aspects of code
+generation.
+ The programs in the second list test the handling of various dynamic
+conditions (e.g. error recovery).
+ The programs in the third list are somewhat larger, and register
+allocation bugs, etc., are more likely to show up in them.
+
+A good idea at the beginning is to turn COMPILER:GENERATE-RTL-FILES?
+and COMPILER:GENERATE-LAP-FILES? on and examine the resulting .rtl and
+.lap files for plausibility.
+If you have ported the disassembler as well, you should try
+disassembling some files and comparing them to the input LAP. They
+won't be identical, but they should be similar. The disassembler can
+be invoked as follows:
+
+ (compiler:write-lap-file "<pathname of .com file>") ; writes a .lap file.
+ (compiler:disassemble <compiled entry point>) ; writes on the screen.
+
+The .lap filename extension is used by COMPILER:WRITE-LAP-FILE and by
+the compiler when COMPILER:GENERATE-LAP-FILES? is true, so you may
+want to rename the .lap file generated by the compiler to avoid
+overwriting it when using COMPILER:WRITE-LAP-FILE.
+
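+To turn both switches on, the following idiom (the same one used for
+COMPILER:CROSS-COMPILING? in section 7.4) should work:
+
+ (in-package (->environment '(compiler))
+   (set! compiler:generate-rtl-files? true)
+   (set! compiler:generate-lap-files? true))
+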
+Various runtime system files also make good tests. In particular, you
+may want to try list.scm, vector.scm, and arith.scm. You can try them
+by loading them, and invoking procedures defined in them, but you must
+execute
+ (initialize-microcode-dependencies!)
+after loading arith.com and before invoking procedures defined there.
+\f
+ 7.4. Compiling the compiler.
+
+The real test of the compiler comes when it is used to compile itself
+and the runtime system. Re-compiling the system is a slow process
+that can take a few hours even with a compiled compiler on a fast
+machine. Compiling the compiler with an interpreted compiler would
+probably take days.
+
+There are two ways to speed up the process:
+
+* Cross-compiling:
+
+If you can access some machines on which the compiler already runs,
+you can cross-compile the sources using a compiled compiler. This
+method is somewhat involved because you will need binaries for both
+machines, since neither can load or dump the other's .bin files.
+
+Imagine that you have a Vax, and you are porting to a Sparc. You will
+need to pre-process and compile the Sparc's compiler on the Vax to use
+it as a cross-compiler. This can be done by following the same
+pattern that you used to generate the interpreted compiler on the
+Sparc, but running everything on the Vax, and then compiling the
+cross-compiler on the Vax by running scheme with the ``-compiler''
+option, and typing
+
+ (begin
+   (cd "<sparc compiler directory>")
+   (load "comp.cbf")
+   (disk-restore "runtime+sf.com"))
+
+ ;; After the disk-restore
+
+ (begin
+   (load "make")
+   (in-package (->environment '(compiler))
+     (set! compiler:cross-compiling? true))
+   (disk-save "sparccom.com"))
+
+to produce a cross-compiler band called "sparccom.com".
+
+Once you have the cross-compiler, you can use CROSS-COMPILE-BIN-FILE
+to generate .moc files. The .moc files can be translated to .psb
+files on the Vax. These .psb files can in turn be translated to .moc
+files on the Sparc, and you can generate the final .com files by using
+CROSS-COMPILE-BIN-FILE-END defined in compiler/base/crsend.
+compiler/base/crsend can be loaded on a plain runtime system (i.e.
+without SF or a compiler). You will probably find the following
+idioms useful:
+ (for-each cross-compile-bin-file (directory-read "<some dir>/*.bin"))
+ (for-each cross-compile-bin-file-end (directory-read "<some dir>/*.moc")).
+
+To translate the original .moc files to .psb files, you should use
+microcode/Bintopsb on the Vax as follows:
+
+ Bintopsb ci_processor=?? <foo.moc >foo.psb
+
+where the value of ci_processor should be the value of
+COMPILER_PROCESSOR_TYPE defined in microcode/cmpint-port.h.
+
+You can then generate the target .moc files by using
+microcode/Psbtobin on the Sparc as follows:
+
+ Psbtobin allow_cc <foo.psb >foo.moc
+\f
+* Distributing the task over several machines:
+
+You can use more than one machine to compile the sources. If the
+machines do not share a file system, you will have to pre-partition
+the job and generate a script for each machine. If the machines share
+a (network) file system, you can try to use compiler/etc/xcbfdir.
+This file defines two procedures, COMPILE-DIRECTORY, and
+CROSS-COMPILE-DIRECTORY, that use a simple-minded protocol based on
+creating .tch files to reserve files to compile, and can therefore be
+run on many machines simultaneously without uselessly repeating work
+or getting in each other's way.
+
+These two methods are not exclusive. We typically bring up the
+compiler on a new machine by distributing the cross-compilation job.
+
+The compiler and the cross-compiler use a lot of memory while running,
+and virtual memory is really no substitute for physical memory. You
+may want to increase your physical memory limit on those systems where
+this can be controlled (e.g. under BSD use the ``limit'' command). If
+your machines don't have much physical memory, or it is too painful to
+increase your limit, i.e. you have to re-compile or re-link the
+kernel, you may want to use microcode/bchscheme instead of
+microcode/scheme. Bchscheme uses a disk file for the spare heap,
+rather than a region of memory, putting the available memory to use at
+all times.
+
+ 7.5. Compiler convergence testing.
+
+Once you have a compiled compiler, you should run the same test suite
+that you ran with the interpreted compiler. Once you have some degree
+of confidence that the compiled compiler works, you should make sure
+that it can correctly compile itself and the runtime system. This
+re-compilation can expose second-order compiler bugs, that is, bugs
+in the compiler that cause it to compile parts of itself incorrectly
+without crashing, so that programs compiled by this
+incorrectly-compiled compiler fail even though these programs did not
+fail when compiled by the original compiler.
+
+Of course, you can never really tell if the compiler has compiled
+itself successfully. You can only tell that it is not obviously wrong
+(i.e. it did not crash). Furthermore, there could be higher-order bugs
+that would take many re-compilations to find. However, if the binaries
+produced by two successive re-compilations are identical, further
+re-compilations would keep producing identical binaries and no
+additional bugs will be found this way. Moreover, if the compiler
+and system survive a couple of re-compilations, the compiler is likely
+to be able to compile most programs correctly.
+
+To run this compiler convergence test, you need to re-compile the
+compiler. In order to do this, you need to move the .com files from
+the source directories so that COMPILE-DIRECTORY and
+RECOMPILE-DIRECTORY will not skip all the files (they avoid compiling
+those already compiled). The simplest way to move all these files is
+to type ``make stage1'' at your shell in the source directories
+(runtime, sf, cref, and compiler). This command will create a STAGE1
+subdirectory for each of the source directories, and move all the .com
+and .binf files there. You can then use compiler/comp.cbf, or
+compiler/etc/xcbfdir and RECOMPILE-DIRECTORY to regenerate the
+compiler.
+\f
+If you generated the stage1 compiled compiler by running the compiler
+interpreted, the new .com files should match the stage1 .com files.
+If you generated the stage1 compiler by cross-compilation, they will
+not. The cross-compiler turns COMPILER:COMPILE-BY-PROCEDURES? off,
+while the default setting is on. In the latter case, you want to
+generate one more stage to check for convergence, i.e. execute ``make
+stage2'' in each source directory, and re-compile once more, at each
+stage using the compiler produced by the previous stage.
+
+Once you have two stages that you think should have identical
+binaries, you can use COMPARE-COM-FILES, defined in
+compiler/etc/comcmp, to compare the binaries. The simplest way to use
+it is to also load compiler/etc/comfiles and then use the CHECK-STAGE
+procedure.
+
+ (check-stage "STAGE2" '("runtime" "sf" "compiler/base"))
+
+will compare the corresponding .com files from runtime and
+runtime/STAGE2, sf and sf/STAGE2, and compiler/base and
+compiler/base/STAGE2.
+
+If nothing is printed, the binaries are identical. Otherwise some
+description of the differences is printed. COMPARE-COM-FILES does not
+check for isomorphism of Scode objects, so any sources that reference
+Scode constants (e.g. runtime/advice.scm) will show some differences
+that can safely be ignored. Generally, differences in constants can
+be ignored, but length and code differences should be understood. The
+code in question can be disassembled to determine whether the
+differences are real or not.
+
+While testing the compiler, in addition to checking for the correct
+operation of the compiled code, you should also watch out for crashes
+and other forms of unexpected failure. In particular, hardware traps
+(e.g. segmentation violations, bus errors, illegal instructions)
+occurring during the re-compilation process are a good clue that there
+is a problem somewhere.
+\f
+ 8. Debugging.
+
+The process of porting a compiler, due to its complexity, is unlikely
+to proceed perfectly. Things are likely to break more than once while
+running the compiler and testing the compiled code.
+
+Debugging a compiler is not trivial, because the failures (especially
+later on) often will not manifest themselves until days, weeks, or
+months after the compiler was released, by which time the programmer
+has long since swapped out the context needed to debug it.
+Second-order compiler bugs do not make things any easier.
+
+Liar does not have many facilities to aid in debugging. This section
+mentions some of the few, and some techniques to use with
+assembly-language debuggers (gdb, dbx, or adb).
+
+The main assumption in this section is that the front end and other
+machine-independent parts of the compiler work correctly. Of course,
+this cannot be guaranteed, but in all likelihood virtually all of the
+bugs that you will meet when porting the compiler will be in the new
+machine-specific code.
+
+If you need to examine some of the front-end data structures, you may
+want to use the utilities in base/debug.scm, which is loaded into the
+compiler by default. In particular, you will want to use PO (for
+print-object) to examine compiler data structures, and
+DEBUG/FIND-PROCEDURE to map procedure names to the data structures
+that represent the procedures, or more correctly, the lambda
+expressions.
+
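+For instance, something along the following lines can be used to
+inspect the data structures for a particular procedure. The calling
+convention assumed here, a symbol naming the procedure, is a guess;
+check base/debug.scm for the real one:
+
+ (po (debug/find-procedure 'memq))
+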
+ 8.1. Preliminary debugging of the compiled code interface.
+
+The first item of business, after the microcode interface
+(cmpaux-port.m4 and cmpint-port.h) has been written, is to guarantee
+that properly constructed compiled code addresses do not confuse the
+garbage collector. This can be done before writing any of the
+remaining files, but you must have rebuilt the microcode and the
+runtime.com band.
+
+A simple test to run is the following:
+
+ (define foo
+   ((make-primitive-procedure 'COERCE-TO-COMPILED-PROCEDURE)
+    (lambda (x y)
+      (+ x y))
+    2))
+
+ (gc-flip)
+ (gc-flip)
+ (gc-flip)
+
+If the system does not crash or complain, in all likelihood the
+garbage collector can now properly relocate compiled code objects.
+\f
+This object can also be used to test parts of the compiled code
+interface. FOO is bound to a trampoline that will immediately revert
+back to the interpreter when invoked.
+
+The next test is to determine that FOO works properly. You can follow
+the execution of FOO by using a debugger and placing breakpoints at
+
+ cmpint.c:apply_compiled_procedure,
+ cmpaux-port.s:C_to_interface,
+ cmpaux-port.s:scheme_to_interface (or trampoline_to_interface if it
+is written),
+ cmpint.c:comutil_operator_apply_trap,
+ cmpint.c:comutil_apply, and
+ cmpaux-port.s:interface_to_C
+
+and then evaluating (FOO 3 4).
+
+When setting the breakpoints, remember that C_to_interface,
+scheme_to_interface, and interface_to_scheme are not proper C
+procedures, so you should use the instruction-level breakpoint
+commands or formats, not the C-procedure breakpoint commands or
+formats. If you are using adb, this is moot, since adb is purely
+an assembly-language debugger. If you are using gdb, you should use
+``break *&C_to_interface'' instead of ``break C_to_interface''. If
+you are using dbx, you will want to use the ``stopi'' command, instead
+of the ``stop'' command to set breakpoints in the assembly language
+routines.
+
+Make sure that the arguments to comutil_operator_apply_trap look
+plausible and that the registers have the appropriate contents when
+going into scheme code and back into C. In particular, you probably
+should examine the contents of the registers right before jumping into
+the trampoline code, and single step the trampoline code until you get
+back to scheme_to_interface.
+
+In order to parse the Scheme objects, you may want to keep a copy of
+microcode/type.names handy. This file contains the names of all the
+Scheme type tags and their values as they appear in the most
+significant byte of a word, both when type tags are 8 bits long and
+when they are 6 bits long. Remember that you may have to insert
+segment bits into addresses in order to examine memory locations.
+
+You should also make sure that an error is signalled when FOO is
+invoked with the wrong number of arguments, and that the system
+correctly recovers from the error (i.e., it gives a meaningful error
+message and an error prompt, and resets itself when you type ^G).
+
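+For example, both of the following calls pass FOO the wrong number of
+arguments and should therefore produce an error message and a clean
+error REPL rather than a crash:
+
+ (foo 3)                                ; too few arguments
+ (foo 1 2 3)                            ; too many arguments
+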
+This test exercises most of the required assembly language code. The
+only entry point not exercised is interface_to_scheme.
+
+ 8.2. Debugging the assembler.
+
+Assuming that the compiler generates correctly formatted compiled code
+objects, fasdump should be able to dump them out without a problem.
+If you have problems when dumping the first objects, and assuming that
+you ran the tests in section 8.1., then in all likelihood the block
+offsets are not computed correctly. You should probably re-examine
+the rule for the EXTERNAL-LABEL pseudo operation, and the block-offset
+definitions in machines/port/assmd.scm.
+
+Once you can dump compiled code objects, you should test the
+assembler. A simple, but somewhat inconvenient way of doing this is
+to use adb as a disassembler as follows:
+\f
+Scheme binary (.bin and .com) files have a 50-longword header that
+contains relocation information. The longword that follows immediately
+is the root of the dumped object.
+
+If COMPILER:COMPILE-BY-PROCEDURES? is false, the compiler dumps a
+compiled entry point directly, so the format word for the first entry
+is at longword location 53 (* 4 = 0xd4), and the instructions follow
+immediately.
+
+If COMPILER:COMPILE-BY-PROCEDURES? is true, the compiler dumps an
+Scode object that contains the first entry point as the ``comment
+expression''. The format word for the first entry point is then at
+longword location 55 (* 4 = 0xdc), and the instructions for the
+top-level block follow immediately.
+
+Thus, assuming that there are four bytes per Scheme object (unsigned
+long in C), and that foo.com was dumped by the compiler with
+COMPILER:COMPILE-BY-PROCEDURES? set to false, the following would
+disassemble the first 10 instructions of the generated code.
+
+ adb foo.com
+ 0xd8?10i
+
+If COMPILER:COMPILE-BY-PROCEDURES? was set to true, the following
+would accomplish the same task:
+
+ adb foo.com
+ 0xe0?10i
+
+You can use adb in this way to compare the input assembly language to
+its binary output. Remember that you can obtain the input assembly
+language by using the COMPILER:GENERATE-LAP-FILES? switch.
+
+ 8.3. Setting breakpoints in Scheme compiled code.
+
+Compiled code is not likely to work correctly at first even after the
+compiler stops signalling errors. In general, when you find that
+compiled code executes incorrectly, you should try to narrow it down
+as much as possible by trying the individual procedures, etc., in the
+code, but ultimately you may need the ability to set instruction-level
+breakpoints and single-step instructions in compiled code.
+
+A problem peculiar to systems in which code is relocated on the fly is
+that you cannot, in general, obtain a permanent address for a
+procedure or entry point. The code may move at every garbage
+collection, and if you set a machine-level breakpoint with a Unix
+debugger, and then the code moves, you will probably get spurious
+traps when re-running the code. Unix debuggers typically replace some
+instructions at the breakpoint location with instructions that will
+cause a specific trap, and then look up the trapping location in some
+table when the debugged process signals the trap.
+
+One way around this problem is to ``purify'' all compiled scheme code
+that you will be setting breakpoints in. If you purify the code, it
+will move into ``constant space'' and remain at a constant location
+across garbage collections. The PURIFY procedure expects an object
+to purify as the first argument, a boolean flag specifying whether the
+object should be moved into constant space (if false) or pure space
+(if true) as a second argument, and a boolean flag specifying whether
+purification should occur immediately (if false) or be delayed until
+the next convenient time (if true) as a third argument. You should
+use
+
+ (purify <object> false false)
+\f
+when moving compiled code objects to constant space for debugging
+purposes. Alternatively, you can specify that you want the code to be
+purified when you load it by passing appropriate arguments to LOAD.
+Since load delays the actual purification, you will need to invoke
+GC-FLIP twice to flush the purification queue.
+
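+For example, purifying the procedure object itself should be enough to
+move its code into constant space immediately, since the compiled-code
+block it points into is carried along; with MEMQ (used as the example
+below) this would be:
+
+ (purify memq false false)
+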
+At any rate, setting the actual breakpoints is not completely trivial,
+since you must find the virtual addresses of the instructions, and
+then use them with your assembly-language debugger.
+
+The simplest way to do this is to get Scheme to print the datum of
+the entry points for you, and then type one of Scheme's interrupt
+characters to gain the debugger's attention and set the breakpoint.
+Continuing from the debugger will allow you to type further
+expressions to Scheme.
+
+Imagine, for example, that we have compiled the runtime file list.scm
+and some of the procedures in it, namely MEMBER and MEMQ, do not work
+properly. After purifying the code, you can type ``memq'' at the
+read-eval-print loop and it will respond with something like
+ ;Value 37: #[compiled-procedure 37 ("list" #x5A) #x10 #x10FE880]
+
+This specifies that MEMQ is bound to an ordinary compiled procedure
+(not a closure), that it was originally compiled as part of file
+``list'' and it was part of compiled code block number 90 (= #x5a) in
+that file. The current datum of the object is #x10FE880 (this is the
+address without the segment bits if any), and the offset to the
+beginning of the containing compiled code block is #x10. Thus you
+could then gain the attention of the debugger and set a breakpoint at
+address 0x10fe880 (remember to add the segment bits if necessary) and
+after continuing back into Scheme, use MEMQ normally to trigger the
+breakpoint.
+
+The case with MEMBER is similar. Typing ``member'' at the
+read-eval-print loop will cause something like
+ ;Value 36: #[compiled-closure 36 ("list" #x56) #x5C #x10FE484 #x1180DF8]
+to be printed.
+
+This specifies that MEMBER is bound to a compiled closure, originally
+in compiled code block number 86 (= #x56) in file ``list'', that the
+entry point to the closure is at datum #x1180DF8, that the entry point
+shared by all closures of the same lambda expression (the ``real''
+entry point) is at datum #x10FE484, and that this entry point is at
+offset #x5C of its containing compiled code block.
+
+Thus if you want to single step the closure code (a good idea when you
+try them at first), you would want to set a breakpoint at address
+#x1180DF8 (plus appropriate segment bits), and if you want to single
+step or examine the real code, then you should use address #x10FE484.
+If you purified the code when you loaded it, the real code would be
+pure, but the closure itself would not be, since it was not a part of
+the file being loaded (closures are created dynamically). Thus,
+before setting any breakpoints in a closure, you should probably
+purify it as specified above, and obtain its address again, since it
+would have moved in the meantime.
+\f
+For example, if you are using adb on an HP-PA (where the top two bits
+of a data segment address are always 01, and thus the top nibble of a
+Scheme object's address is always 4), assuming that the interpreter
+printed the above addresses,
+
+0x41180df8:b would set a breakpoint in the MEMBER closure,
+0x410fe484:b would set a breakpoint at the start of the code shared
+ by MEMBER and all closures of the same lambda expression,
+0x410fe880:b would set a breakpoint at the start of MEMQ.
+
+If you are using gdb on a Motorola MC68020 machine, with no segment bits
+for the data segment, the equivalent commands would be
+
+break *0x1180df8 for a breakpoint in the MEMBER closure,
+break *0x10fe484 for a breakpoint in MEMBER's shared code
+break *0x10fe880 for a breakpoint in MEMQ.
+
+If you are using dbx, you will need to use a command like
+ ``stopi at 0x10fe484'' to achieve the same effect.
+
+ 8.4. Examining arguments to Scheme procedures.
+
+Commonly, after setting a breakpoint at some interesting procedure,
+you will want to examine the arguments. Currently, Liar passes all
+arguments on the stack. The Scheme stack always grows towards
+decreasing addresses, and arguments are pushed from left to right.
+
+On entry to a procedure, the stack frame must have been reformatted so
+that optional arguments have been defaulted, and tail (lexpr)
+arguments have been collected into a list (possibly empty). Thus on
+entry to an ordinary procedure's code the stack pointer points to the
+rightmost parameter in the lambda list, and the rest of the parameters
+follow at increasing longword addresses.
+
+This is also the case on entry to a closure object's instructions, but
+by the time the shared code starts executing the body of the lambda
+expression, the closure object itself (or its canonical
+representative) will be on top of the stack with the arguments
+following in the standard format. On entry to a closure's shared
+code, the stack will contain the arguments in the standard format, but
+the closure object will typically be in the process of being
+constructed and pushed, depending on the distribution of the task
+between the instructions in the closure object, the optional helper
+instructions in assembly language, and the instructions in the closure
+header.
+
+If you are using adb, you can use the following commands:
+ $r            displays the names and contents of the processor
+               registers,
+ <r23=X        displays the contents of the r23 register in hex,
+ 0x4040c/4X    displays four longwords in memory at increasing
+               addresses starting with 0x4040c.
+
+When using gdb, you can achieve the same effect by using the following
+commands:
+ info reg         displays the names and contents of the processor
+                  registers,
+ p/x $gr23        displays the contents of processor register gr23,
+ x/4wx 0x4040c    displays four longwords in memory starting at
+                  address 0x4040c.
+
+If you are using dbx, you can use the following commands:
+ print $l4        to display the contents of processor register l4,
+ 0x4040c/4X       to display four longwords in memory starting at
+                  address 0x4040c.
+\f
+ 8.5. Tracing the call stack.
+
+Procedures compiled by Liar receive additional implicit arguments used
+to communicate the lexical environment and the implicit continuation.
+Most procedures receive a return address argument that points to the
+procedure that is waiting for the one at hand to finish. Some
+procedures receive a static link argument. Static links are pointers
+into other frames in the stack, used to thread together environments
+when the distance between lexically nested frames cannot be statically
+computed. Some procedures receive a dynamic link argument. Dynamic
+links are pointers to return addresses on the stack, and are used when
+the compiler cannot determine statically the distance on the stack
+between the last argument and the return address for a procedure.
+Dynamic links are rarely needed, and static links are needed only
+somewhat more frequently. Only one of the programs in the test suite
+uses dynamic and static links, and it was carefully constructed to
+make the compiler generate code that uses them. All externally
+callable procedures, including all those ``closed'' at top level,
+expect return addresses and no other links.
+
+In general, it is impossible to find out what procedure called another
+in Scheme, due to tail recursion. Procedures whose last action is a
+call to another procedure, and whose local frames are not part of the
+environment of their callees, will have their frames popped off the
+stack before the callee is entered, and there will be no record left
+of their execution. The interpreter uses a cache of previous frames
+(called the history) to provide additional debugging information.
+
+On the other hand, most calls are not in tail position, and a return
+address will be ``passed'' to the callee indicating who the caller
+was. Occasionally the static and dynamic links will have to be traced
+to find the return address, but this is rare.
+
+The following assumes that we are dealing with an ordinary procedure
+that expects a return address.
+
+The return address is passed on the stack, immediately below the
+leftmost argument. It is a compiled-code entry object whose datum is
+the encoded address of the instruction at which the ``caller'' should
+be reentered. If the procedure was called from the interpreter, the
+``caller'' will be a special trampoline (return_to_interpreter) that
+will give control back to the interpreter by using the compiled code
+interface. There is a little hanky panky in the interpreter to
+guarantee that interpreted code ``tail recursing'' into compiled code
+and vice versa will not push spurious return_to_interpreter return
+addresses and return_to_compiled_code interpreter return codes that
+would break proper tail recursion, but you need not concern yourself
+with this.
+
+If your debugger allows you to dynamically call C procedures, and it
+is not hopelessly confused by the Scheme register set, you can use the
+C procedure ``compiled_entry_filename'' to determine the filename that
+a return address (or other compiled entry) belongs to. Its only
+argument should be the compiled entry object whose origin you want to
+find. Unfortunately, debuggers are often confused by the register
+assignments within Scheme compiled code, precisely when you need them
+most. You can bypass this problem the following way:
+
+On entry to a procedure whose return address you wish to examine,
+write down the return address object, change the compiled code's
+version of MemTop so that the comparison with free will fail and the
+code will take an interrupt, set a breakpoint in the runtime library
+routine ``compiler_interrupt_common'', and continue the code. When
+the new breakpoint is hit, you can use ``compiled_entry_filename'' to
+examine the return address you had written down.
+\f
+Here is how to do this the hard way, which you will have to resort to
+often:
+
+Compiled code blocks generated by the compiler always encompass two
+special locations. The last location in the compiled code block
+contains the environment where the compiled code block was loaded
+(after the code has been evaluated). The immediately preceding
+location contains a debugging information descriptor. This descriptor
+is either a string, namely the name of the file where the compiler
+dumped the debugging information for the block, or a pair whose car is
+the name of the debugging information filename and whose cdr is the
+block number (a fixnum) of the compiled code block in the file, or the
+debugging information itself if the compiled code block was generated
+in core and not dumped.
+
+Given that the word immediately preceding an external entry point in
+the compiler always contains a gc-offset from the entry point to the
+first word of the compiled code block (i.e. the vector header), or to
+another external entry point if the distance is too large, it is a
+simple, but involved, matter to find this debugging information. Note
+that all return addresses, full closures, and top-level procedures are
+external entry points.
+
+For example, imagine that the return address for a procedure is
+0xa08fe2ee. Furthermore, assume that we are running on a Motorola
+MC68020 with four bytes per longword, no segment bits, and for which
+cmpint-mc68k.h defines PC_ZERO_BITS to be 1.
+
+Extracting the word at address 0x8fe2ea (four bytes before the entry
+point) will yield the format longword, which consists of the format
+field in its high halfword and the encoded gc offset field in the
+lower halfword.
+
+The gdb command ``x/2hx 0x8fe2ea'' will print these two halfwords,
+and let's assume that the output is
+ 0x8fe2ea <fpa_loc+10445546>: 0x8280 0x003e
+
+The adb (and dbx) command ``0x8fe2ea/2x'' should yield similar output.
+
+This confirms that the object in question is a return address because
+the most significant bit of the format word is a 1, and it would be a
+0 for a procedure. The encoded gc offset is 0x3e. GC offsets and
+format words are described in detail in cmpint.txt.
+
+Since the least significant bit of the GC offset is 0, it points
+directly to the vector header of a compiled code block. The real
+offset is
+ ((0x3e >> 1) << PC_ZERO_BITS) = 0x3e
+
+Thus the compiled code block starts at location
+ 0x8fe2ee-0x3e = 0x008fe2b0
+
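+The same decoding can be packaged as a small Scheme helper. This is a
+sketch only; the odd case (an offset that leads to another external
+entry point, discussed near the end of this section) is assumed to use
+the same scaling as the even case:
+
+ (define (decode-gc-offset encoded pc-zero-bits)
+   ;; The distance in bytes: drop the low bit, then scale.
+   (let ((distance (* (quotient encoded 2) (expt 2 pc-zero-bits))))
+     (if (even? encoded)
+         (list 'block-start distance)
+         (list 'previous-entry distance))))
+
+ (decode-gc-offset #x3e 1)     ; => (block-start 62), i.e. #x3e bytes back
+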
+Examining the top two words at this address (using the gdb command
+ ``x/2wx 0x008fe2b0'' or the adb and dbx command ``0x8fe2b0/2X'') we
+see
+ 0x8fe2b0 <fpa_loc+10445488>: 0x00000028 0x9c00001d
+
+The first word is an ordinary vector header, and the second a
+non-marked vector header used to inform the GC that 0x1d longwords of
+binary data, the actual instructions, follow.
+
+The last location in the vector, containing the environment, is at
+address
+ 0x8fe2b0+4*0x28 = 0x8fe350
+\f
+Examining the preceding adjacent location and this one (using gdb's
+ ``x/2wx 0x8fe34c'' or a similar command for a different debugger)
+will yield
+ 0x8fe34c <fpa_loc+10445644>: 0x0498ef9c 0x4898e864
+
+The second object is the loading environment, and the first object is
+the debugging information, in this case a pair. This pair can be
+examined (using gdb's ``x/2wx 0x98ef9c'' or an analogous command for a
+different debugger) to yield
+ 0x98ef9c <fpa_loc+11038620>: 0x789ac5ec 0x6800001c
+
+The first object is a string, and the second a fixnum, indicating that
+the return address at hand belongs to the compiled code block numbered
+0x1c in the file whose name is that string.
+
+Scheme strings have two longwords of header, followed by an ordinary C
+string that includes a terminating null character; thus the C string
+starts at address 0x9ac5ec+4*2=0x9ac5f4, and the gdb command
+ ``x/s 0x9ac5f4'' or the adb and dbx command ``0x9ac5f4/s''
+will display something like:
+ 0x9ac5f4 <fpa_loc+11159028>:
+ (char *) 0x9ac5f4 "/usr/local/lib/mit-scheme/SRC/runtime/parse.binf"
+
+Thus the return address we are examining is at offset 0x3e in compiled
+code block number 0x1c of the runtime system file ``parse.com''.
+
+If the disassembler is available, you can then use
+ (compiler:write-lap-file "parse")
+to find out what this return address is, or, if you compiled (or
+re-compile) parse.scm with LAP file generation turned on, you can
+probably guess which return address is at offset 0x3e (the input LAP
+files do not contain computed offsets, since these are computed on
+final assembly).
+
+This interaction would remain very similar for other machines and
+compiled entry points, given the same or similar debuggers. The
+variations would be the following:
+
+- Segment bits might have to be added to the object datum components
+to produce addresses. For example, on the HP-PA with segment bits 01
+at the most significant end of a word, the C string for Scheme string
+object 0x789ac5ec would start at address 0x409ac5ec+8=0x409ac5f4,
+instead of at address 0x9ac5ec+8=0x9ac5f4.
+
+- The gc offset might be computed differently, depending on the value
+of PC_ZERO_BITS. For example, on a Vax, where PC_ZERO_BITS has the
+value 0, an encoded offset of value 0x3e would imply a real offset of
+value (0x3e >> 1)=0x1f. On a MIPS R3000, where PC_ZERO_BITS is 2, the
+same encoded offset would encode the real offset value
+ ((0x3e >> 1) << 2)=0x7c. In addition, if the low order bit of the
+encoded gc offset field were 1, a new gc offset would have to be
+extracted, and the process repeated until the beginning of the block
+was reached.
+
+- The constant offsets added to various addresses (e.g. that added to
+a string object's address to obtain the C string's address) would vary
+if the number of bytes per Scheme object (sizeof (unsigned long)) in C
+were not 4.
+\f
+- Not all compiled entry points have debugging information descriptors
+accessible the same way. Trampolines don't have them at all, and
+closures have them in the shared code, not in the closure objects. To
+check whether something is a trampoline, you can check the format
+field (most trampolines have 0xfffd) or verify that the instructions
+immediately call the compiled code interface. Closure objects have
+type code MANIFEST-COMPILED-CLOSURE instead of MANIFEST-VECTOR in the
+length word of the compiled code block. Once you obtain the real
+entry point for a closure, you can use the same method to find out the
+information about it.
+
+ 8.6. Things to watch out for.
+
+The worst bugs to track are interrupt and garbage-collection related.
+They will often make the compiled code crash at seemingly random
+points, and are very hard to reproduce. A common source of this kind
+of bug is a problem in the rules for procedure headers. Make sure
+that the rules for the various kinds of procedure headers generate the
+desired code, and that the desired code operates correctly. You can
+test this explicitly by using an assembly-language debugger to set
+breakpoints at the entry points of various kinds of procedures. When
+the breakpoints are reached, you can bump the Free pointer to a value
+larger than MemTop, so that the interrupt branch will be taken. If
+the code continues to execute correctly, you are probably safe. You
+should especially check procedures that expect dynamic links, since
+these must be saved and restored correctly. Closures should also be
+tested carefully, since they need to be reentered correctly, and the
+closure object on the stack may have to be de-canonicalized. Currently
+C_to_interface and interface_to_scheme must copy the interpreter's
+value register into the compiler's value register, and must extract
+the address of this value and store it in the dynamic link register.
+
+Register allocation bugs also manifest themselves in unexpected ways.
+If you forget to use NEED-REGISTER! on a register used by a LAPGEN
+rule, or if you allocate registers for the sources and target of a
+rule in the wrong order (remember the cardinal rule!), you may not
+notice for a long time, but some poor program will. If this happens,
+you will be lucky if you can find and disassemble a relatively small
+procedure that does not operate properly, but typically the only
+notice you will get is when Scheme crashes in an unrelated place.
+Fortunately, this type of bug is reproducible. In order to find the
+incorrectly compiled code, you can use binary search on the sources by
+mixing interpreted and compiled binaries. When loading the compiler,
+.bin files will be used for those files for which the corresponding
+.com file does not exist. Thus you can move .com files in and out of
+the appropriate directories, reload, and test again. Once you
+determine the procedure in which the bug occurs, re-compiling the
+module and examining the resulting RTL and LAP programs should lead to
+identification of the bug.
+\f
+ 9. Bibliography
+
+1. "Efficient Stack Allocation for Tail-Recursive Languages" by Chris
+Hanson, in Proceedings of the 1990 ACM Conference on Lisp and
+Functional Programming.
+
+2. "Free Variables and First-Class Environments" by James S. Miller
+and Guillermo J. Rozas, in Lisp and Symbolic Computation, 4, 107-141,
+1991, Kluwer Academic Publishers.
+
+3. "MIT Scheme User's Manual for Scheme Release 7.1" by Chris Hanson,
+distributed with MIT CScheme version 7.1.
+
+4. "MIT Scheme Reference Manual for Scheme Release 7.1" by Chris
+Hanson, distributed with MIT CScheme version 7.1.
+
+5. "Taming the Y Operator" by Guillermo J. Rozas, in Proceedings of
+the 1992 ACM Conference on Lisp and Functional Programming.
+\f
+ A.1. MIT Scheme package system
+
+The MIT Scheme package system is used to divide large programs into
+separate name spaces which are then ``wired together''. A large
+program, like the runtime system, Edwin, or the compiler, has many
+files and variable names all of which must exist at the same time
+without conflict. The package system is a prototype system to
+accomplish this separation, but will probably be replaced once a
+better module system is developed.
+
+Currently, each package corresponds, at runtime, to a Scheme
+environment. Environments have their usual, tree-shaped structure,
+and packages are also structured in a tree, but the trees need not be
+isomorphic, although they often are.
+
+Each package is given a name, e.g.:
+
+ (compiler reference-contexts)
+
+whose corresponding environment can be found using the procedure
+->ENVIRONMENT:
+
+ (->environment '(compiler reference-contexts)) ;; Call this CR
+
+By convention, this package corresponds to an environment below the
+
+ (->environment '(compiler)) ;; Call this C
+
+environment, and therefore CR contains all variables defined in C, as
+well as those specifically defined in CR. The package name ``()''
+corresponds to the system global environment.
+
+The package structure for the compiler is defined in the file
+
+ <compiler-directory>/machines/<machine-name>/comp.pkg
+
+In that file, each package has a description of the form:
+
+ (define-package <NAME>
+   (files <FILES>)
+   (parent <PACKAGE-NAME>)
+   (export <PACKAGE-NAME> <VARIABLES>)
+   (import <PACKAGE-NAME> <VARIABLES>))
+
+where <FILES> are the names of the files that should be loaded into
+package <NAME>.
+
+ (parent <PACKAGE-NAME>)
+
+declares the package whose name is <PACKAGE-NAME> to be the parent
+package of <NAME>. Lexical scoping will make all variables visible in
+<PACKAGE-NAME> also visible in <NAME>.
+
+The EXPORT and IMPORT declarations are used to describe cross-package
+links. A package may export any of its variables to any other
+package using EXPORT; these variables will appear in both packages
+(environments), and any side effect to one of these variables in
+either package will be immediately visible in the other package.
+
+Similarly, a package may import any of another package's variables
+using IMPORT. Any number (including zero) of IMPORT and EXPORT
+declarations may appear in any package declaration.
+\f
+Here is an example package declaration, drawn from the compiler:
+
+ (define-package (compiler top-level)
+   (files "base/toplev"
+          "base/crstop")
+   (parent (compiler))
+   (export ()
+           cf
+           compile-bin-file
+           compile-procedure
+           compile-scode
+           compiler:reset!
+           cross-compile-bin-file
+           cross-compile-bin-file-end)
+   (export (compiler fg-generator)
+           compile-recursively)
+   (export (compiler rtl-generator)
+           *ic-procedure-headers*
+           *rtl-continuations*
+           *rtl-expression*
+           *rtl-graphs*
+           *rtl-procedures*)
+   (export (compiler lap-syntaxer)
+           *block-label*
+           *external-labels*
+           label->object)
+   (export (compiler debug)
+           *root-expression*
+           *rtl-procedures*
+           *rtl-graphs*)
+   (import (runtime compiler-info)
+           make-dbg-info-vector)
+   (import (runtime unparser)
+           *unparse-uninterned-symbols-by-name?*))
+
+The read-eval-print loop of Scheme evaluates all expressions in the
+same environment. It is possible to change this environment using the
+procedure GE, e.g.:
+
+ (ge (->environment '(compiler top-level)))
+
+To find the package name of the current read-eval-print loop
+environment, if there is one, evaluate:
+
+ (pe)
+
+The package system is currently completely static; it is difficult to
+create packages and wire them together on the fly. If you find that
+you need to temporarily wire a variable to two different environments
+(as you would do with an IMPORT or EXPORT declaration), use the
+procedure ENVIRONMENT-LINK-NAME:
+
+ (environment-link-name <TO-ENVIRONMENT>
+                        <FROM-ENVIRONMENT>
+                        <VARIABLE-SYMBOL-NAME>)
+
+For example, to make WRITE-RESTARTS, originally defined in the
+(runtime debugger) package, also visible in the (edwin debugger)
+package, evaluate:
+
+ (environment-link-name (->environment '(edwin debugger))
+                        (->environment '(runtime debugger))
+                        'write-restarts)