From f7faacf80668c7e617ccb30a7ed83165084f289a Mon Sep 17 00:00:00 2001 From: Stephen Adams <edu/mit/csail/zurich/adams> Date: Sat, 5 Aug 1995 16:25:13 +0000 Subject: [PATCH] Initial revision --- v8/src/compiler/documentation/cmpaux.txt | 490 +++ v8/src/compiler/documentation/cmpint.txt | 1093 ++++++ v8/src/compiler/documentation/facts.txt | 28 + v8/src/compiler/documentation/files.txt | 197 + v8/src/compiler/documentation/notes.txt | 147 + v8/src/compiler/documentation/porting.guide | 3751 +++++++++++++++++++ v8/src/compiler/documentation/safety.txt | 226 ++ v8/src/compiler/documentation/todo.txt | 158 + 8 files changed, 6090 insertions(+) create mode 100644 v8/src/compiler/documentation/cmpaux.txt create mode 100644 v8/src/compiler/documentation/cmpint.txt create mode 100644 v8/src/compiler/documentation/facts.txt create mode 100644 v8/src/compiler/documentation/files.txt create mode 100644 v8/src/compiler/documentation/notes.txt create mode 100644 v8/src/compiler/documentation/porting.guide create mode 100644 v8/src/compiler/documentation/safety.txt create mode 100644 v8/src/compiler/documentation/todo.txt diff --git a/v8/src/compiler/documentation/cmpaux.txt b/v8/src/compiler/documentation/cmpaux.txt new file mode 100644 index 000000000..d56587604 --- /dev/null +++ b/v8/src/compiler/documentation/cmpaux.txt @@ -0,0 +1,490 @@ +-*- Text -*- + +$Header: /Users/cph/tmp/foo/mit-scheme/mit-scheme/v8/src/compiler/documentation/cmpaux.txt,v 1.1 1995/08/05 16:25:13 adams Exp $ + + +Copyright (c) 1991 Massachusetts Institute of Technology + + + Documentation of the assembly language + interface to MIT Scheme compiled code + *DRAFT* + + + +In the following, whenever Scheme is used, unless otherwise specified, +we refer to the MIT Scheme dialect and its CScheme implementation. + +This file describes the entry points that must be provided by +cmpaux-md.h, and the linkage conventions assumed by scheme code. 
+ +cmpint.txt provides some background information required to understand +what follows. + + Calling conventions + +Most C implementations use traditional stack allocation of frames +coupled with a callee-saves register linkage convention. Scheme would +have a hard time adopting a compatible calling convention: + +The Scheme language requires properly tail recursive implementations. +This means that at any given point in a program's execution, if an +object is not accessible from the global (static) state of the +implementation or the current continuation, the object's storage has +been reclaimed or is in the process of being reclaimed. In +particular, recursively written programs need only use progressively +more storage if there is accumulation in the arguments or deeper +levels of the recursion are invoked with more deeply nested +continuations. + +This seemingly abstract requirement of Scheme implementations has some +very mundane consequences. The traditional technique of allocating a +new stack frame for each nested procedure call and deallocating it on +return does not work for Scheme, since allocation should only take +place if the continuation grows, not if the nesting level grows. + +A callee-saves register convention is hard to use for Scheme. The +callee-saves convention assumes that all procedures entered eventually +return, but in the presence of tail recursion, procedures replace each +other in the execution call tree and return only infrequently. The +caller-saves convention is much better suited to Scheme, since +registers can be saved when the continuation grows and restored when +it shrinks. + +Given these difficulties, it is easier to have each language use its +natural convention rather than impose an inappropriate (and therefore +expensive) model on Scheme. + +The unfortunate consequence of this decision is that Scheme and C are +not trivially inter-callable, and thus interface routines must be +provided to go back and forth. 
+
+One additional complication is that the Scheme control stack (that
+represents the current continuation) must be examined by the garbage
+collector to determine what storage is currently in use.  This means
+that it must contain only objects or that regions not containing
+objects must be clearly marked.  Again, it is easier to have separate
+stacks for Scheme and C than to merge them.
+
+The interface routines switch between stacks as well as between
+conventions.  They must be written in assembly language since they do
+not follow C (or Scheme, for that matter) calling conventions.
+
+                        Routines required by C:
+
+The C support for compiled code resides in cmpint.c and is customized
+to the implementation by cmpint2.h which must be a copy of the
+appropriate cmpint-md.h file.
+
+The code in cmpint.c provides the communication between compiled code
+and the interpreter, primitive procedures written in C, and many
+utility procedures that are not in-line coded.
+
+cmpint.c requires three entry points to be made available from
+assembly language:
+
+* C_to_interface:
+  This is a C-callable routine (it expects its arguments and return
+  address following C's passing conventions) used to transfer from the C
+  universe to the compiled Scheme universe.
+
+  It expects a single argument, namely the address of the instruction to
+  execute once the conventions have been switched.
+
+  It saves all C callee-saves registers, switches stacks (if there is
+  an architecture-distinguished stack pointer register), initializes
+  the Scheme register set (Free register, register array pointer,
+  utility handles register, MemTop register, pointer mask, value
+  register, dynamic link register, etc.), and jumps to the address
+  provided as its argument.
+
+  C_to_interface does not return directly, it tail-recurses into
+  Scheme.  Scheme code will eventually invoke C utilities that will
+  request a transfer back to the C world.
This is accomplished by
+using one of the assembly-language provided entry points listed below.
+
+C utilities are invoked as subroutines from an assembly language
+routine (scheme_to_interface) described below.  The expectation is
+that they will accomplish their task, and execution will continue in
+the compiled Scheme code, but this is not always the case, since the
+utilities may request a transfer to the interpreter, the error system,
+etc.
+
+This control is accomplished by having C utilities return a C
+structure with two fields.  The first field must hold as its contents
+the address of one of the following two entry points in assembly
+language.  The second field holds a value that depends on which entry
+point is being used.
+
+* interface_to_C:
+  This entry point is used by C utilities to abandon the Scheme
+  compiled code universe, and return from the last call to
+  C_to_interface.  Its argument is the (C long) value that
+  C_to_interface must return to its caller.  It is typically an exit
+  code that specifies further action by the interpreter.
+
+  interface_to_C undoes the work of C_to_interface, ie. it saves the
+  Scheme stack and Free memory pointers in the appropriate C variables
+  (Ext_Stack_Pointer and Free), restores the C linkage registers and
+  callee-saves registers previously saved, and uses the C return
+  sequence to return to the caller of C_to_interface.
+
+* interface_to_scheme:
+  This entry point is used by C utilities to continue executing in the
+  Scheme compiled code universe.  Its argument is the address of the
+  (compiled Scheme) instruction to execute once the transfer is
+  finished.
+
+  Typically C_to_interface and interface_to_scheme share code.
+
+                        Routines required by Scheme:
+
+Conceptually, only one interface routine is required by Scheme code,
+namely scheme_to_interface.  For convenience, other assembly-language
+routines may be provided with slightly different linkage
+conventions.
The Scheme compiler back end will choose among them
+depending on the context of the call.  Longer code sequences may have
+to be issued if only one of the entry points is provided.  The other
+entry points are typically a fixed distance from scheme_to_interface,
+so that compiled code can invoke them by adding the fixed offset.
+
+* scheme_to_interface:
+  This entry point is used by Scheme code to invoke a C utility.
+  It expects up to five arguments in fixed registers.  The first
+  argument (required) is the number identifying the C utility routine
+  to invoke.  This number is the index of the location
+  containing the address of the C procedure in the array
+  utility_table, declared in cmpint.c.
+
+  The other four registers contain the arguments to be passed to the
+  utility procedure.  Note that all C utilities declare 4 arguments
+  even if fewer are necessary or relevant.  The reason for this is
+  that the assembly language routines must know how many arguments to
+  pass along, and it is easier to pass all of them.  Of course,
+  compiled code need not initialize the registers for those arguments
+  that will never be examined.
+
+  In order to make this linkage as fast as possible, it is
+  advantageous to choose the argument registers from the C argument
+  registers if the C calling convention passes arguments in registers.
+  In this way there is no need to move them around.
+
+  scheme_to_interface switches stacks, moves the arguments to the
+  correct locations (for C), updates the C variables Free and
+  Ext_Stack_Pointer, and invokes (in the C fashion) the C utility
+  procedure indexed by the required argument to scheme_to_interface.
+
+  On return from the utility procedure, a C structure
+  described above is expected, and the first component is invoked
+  (jumped into) leaving the structure or the second component in a
+  pre-established place so that interface_to_C and interface_to_scheme
+  can find it.
+ +* scheme_to_interface_ble/jsr: + Many utility procedures expect a return address as one of their + arguments. The return address can be easily obtained by using the + machine's subroutine call instruction, rather than a jump + instruction, but the return address may not be left in the desired + argument register. scheme_to_interface_ble/jsr can be provided to + take care of this case. In order to facilitate its use, all utility + procedures that expect a return address receive it as the first + argument. Thus scheme_to_interface_ble/jsr is invoked with the + subroutine-call instruction, transfers (and bumps past the format + word) the return address from the place where the instruction leaves + it to the first argument register, and falls through to + scheme_to_interface. + +* trampoline_to_interface: + + Many of the calls to utilities occur in code issued by the linker, + rather than the compiler. For example, compiled code assumes that + all free operator references will resolve to compiled procedures, + and the linker constructs dummy compiled procedures (trampolines) to + allow the code to work when they resolve to unknown or interpreted + procedures. Corrective action must be taken on invocation, and this + is accomplished by invoking the appropriate utility. Trampolines + contain instructions to invoke the utility and some + utility-dependent data. The instruction sequence in trampolines may + be shortened by having a special-purpose entry point, also invoked + with a subroutine-call instruction. All utilities expecting + trampoline data expect as their first argument the address of the + first location containing the data. Thus, again, the return address + left behind by the subroutine-call instruction must be passed in the + first argument register. + +scheme_to_interface_ble/jsr and trampoline_to_interface are virtually +identical. The difference is that the return address is interpreted +differently. 
trampoline_to_interface interprets it as the address of
+some storage, while scheme_to_interface_ble/jsr interprets it as a machine
+return address that must be bumped to a Scheme return address (all
+Scheme entry points are preceded by format words for the garbage
+collector).
+
+More entry points can be provided for individual ports.  Some ports
+have entry points for common operations that take many instructions
+like integer multiplication, allocation and initialization of a
+closure object, or calls to unknown Scheme procedures with fixed
+numbers of arguments.  None of these are necessary, but may make the
+task of porting the compiler easier.
+
+Typically these additional entry points are also a fixed distance away
+from scheme_to_interface in order to reduce the number of reserved
+registers required.
+
+                        Examples:
+
+1 (PDP-11-like CISC):
+
+Machine M1 is a general-addressing-mode architecture and has 7
+general-purpose registers (R0 - R6), and a hardware-distinguished
+stack pointer (SP).  The stack is pushed by predecrementing the stack
+pointer.  The JSR (jump to subroutine) instruction transfers control
+to the target and pushes a return address on the stack.  The RTS
+(return from subroutine) instruction pops a return address from the
+top of the stack and jumps to it.
+
+The C calling convention is as follows:
+
+- arguments are passed on the stack and are popped on return by the
+caller.
+- the return address is on top of the stack on entry.
+- register r6 is used as a frame pointer, saved by callees.
+- registers r0 - r2 are caller-saves, r3 - r5 are callee-saves.
+- scalar values are returned in r0.
+- structures are returned by returning the address of a static area in r0.
+
+The Scheme register convention is as follows:
+
+- register r6 is used to hold the register block.
+- register r5 is used to hold the free pointer.
+- register r4 is used to hold the dynamic link, when necessary.
+- registers r1 - r3 are the caller-saves registers for the compiler.
+- register r0 is used as the value register by compiled code. +- all other implementation registers reside in the register array. + In addition, scheme_to_interface, trampoline_to_interface, etc., are + reached from the register array as well (there is an absolute jump + instruction in the register array for each of them). + +The utility calling convention is as follows: + +- the utility index is in r0. +- the utility arguments appear in r1 - r4. + +The various entry points would then be (they can be bummed): + +_C_to_interface: + push r6 ; save old frame pointer + mov sp,r6 ; set up new frame pointer + mov 8(r6),r1 ; argument to C_to_interface + push r3 ; save callee-saves registers + push r4 + push r5 + push r6 ; and new frame pointer + +_interface_to_scheme: + mov sp,_saved_C_sp ; save the C stack pointer. + mov _Ext_Stack_Pointer,sp ; set up the Scheme stack pointer + mova _Registers,r6 ; set up the register array pointer + mov _Free,r5 ; set up the free register + mov regblock_val(r6),r0 ; set up the value register + and &<address-mask>,r0,r4 ; set up the dynamic link register + jmp 0(r1) ; go to compiled Scheme code + +scheme_to_interface_jsr: + pop r1 ; return address is first arg. + add &4,r1,r1 ; bump past format word + jmp scheme_to_interface + +trampoline_to_interface: + pop r1 ; return address is first arg. 
+ +scheme_to_interface: + mov sp,_Ext_Stack_Pointer ; update the C variables + mov r5,_Free + mov _saved_C_sp,sp + mov 0(sp),r6 ; restore C frame pointer + push r4 ; push arguments to utility + push r3 + push r2 + push r1 + mova _utility_table,r1 + mul &4,r0,r0 ; scale index to byte offset + add r0,r1,r1 + mov 0(r1),r1 + jsr 0(r1) ; invoke the utility + + add &16,sp,sp ; pop arguments to utility + mov 4(r0),r1 ; extract argument to return entry point + mov 0(r0),r0 ; extract return entry point + jmp 0(r0) ; invoke it + +_interface_to_C: + mov r1,r0 ; C return value + pop r6 ; restore frame pointer + pop r5 ; and callee-saves registers + pop r4 + pop r3 + pop r6 ; restore caller's frame pointer + rts ; return to caller of C_to_interface + +Note that somewhere in the register array there would be a section +with the following code: + +offsi jmp scheme_to_interface +offsj jmp scheme_to_interface_jsr +offti jmp trampoline_to_interface + < perhaps more > + +So that the compiler could issue the following code to invoke utilities: + + <set up arguments in r1 - r4> + mov &<utility index>,r0 + jmp offsi(r6) + +or + + <set up arguments in r2 - r4> + mov &<utility index>,r0 + jsr offsj(r6) + <format word for the return address> + +<label for the return address> + <more instructions> + +Trampolines (created by the linker) would contain the code + + mov &<trampoline utility>,r0 + jsr offti(r6) + +2 (RISC processor): + +Machine M2 is a load-store architecture and has 31 general-purpose +registers (R1 - R31), and R0 always holds 0. There is no +hardware-distinguished stack pointer. The JL (jump and link) +instruction transfers control to the target and leaves the address of +the following instruction in R1. The JLR (jump and link register) +instruction is like JL but takes the target address from a register +and an offset, rather than a field in the instruction. There are no +delay slots on branches, loads, etc. (to make matters simpler). 
+
+R2 is used as a software stack pointer, post-incremented to push.
+
+The C calling convention is as follows:
+
+- the return address is in r1 on entry.
+- there is no frame pointer.  Procedures allocate all the stack space
+they'll ever need (including space for callees' parameters) on entry.
+- all arguments (and the return address) have slots allocated on the
+stack, but the values of the first four arguments are passed in r3 -
+r6.  They need not be preserved across nested calls.
+- scalar values are returned in r3.
+- structures of 4 words or less are returned in r3 - r6.
+- the stack is popped by the caller.
+- r7 - r10 are caller-saves registers.
+- r11 - r31 are callee-saves registers.
+
+The Scheme register convention is as follows:
+
+- register r7 holds the dynamic link, if present.
+- register r8 holds the register copy of MemTop.
+- register r9 holds the Free pointer.
+- register r10 holds the Scheme stack pointer.
+- register r11 holds the address of scheme_to_interface.
+- register r12 holds the address of the register block.
+- register r13 holds a mask to clear type-code bits from an object.
+- values are returned in r3.
+- the other registers are available without restrictions to the compiler.
+
+Note that scheme_to_interface, the address of the register block, and
+the type-code mask, which are all constants, have been assigned to
+callee-saves registers.  This guarantees that they will be preserved
+around utility calls and therefore interface_to_scheme need not set
+them again.
+
+The utility calling convention is:
+
+- arguments are placed in r3 - r6.
+- the index is placed in r14.
+
+The argument registers are exactly those where the utility expects
+them.  In the code below, C_to_interface pre-allocates the frame for
+utilities called by scheme_to_interface, so scheme_to_interface has
+very little work to do.
+
+The code would then be:
+
+OFFSET is (21 + 1 + 4) * 4, the number of bytes allocated by
+_C_to_interface from the stack.
21 is the number of callee-saves +registers to preserve. 1 is the return address for utilities, and 4 +is the number of arguments passed along. + +_C_to_interface: + add &OFFSET,r2,r2 ; allocate stack space + + st r1,-4-OFFSET(r2) ; save the return address + st r11,0-OFFSET(r2) ; save the callee-saves registers + st r12,4-OFFSET(r2) + ... + st r31,-24(r2) ; -20 - -4 are for the utility. + + or &mask,r0,r13 ; set up the pointer mask + lda _Registers,r12 ; set up the register array pointer + jl continue ; get address of continue in r1 +continue + add &(scheme_to_interface-continue),r1,r11 + or r0,r3,r4 ; preserve the entry point + +_interface_to_scheme: + ld _Ext_Stack_Pointer,r10 ; set up the Scheme stack pointer + ld _Free,r9 ; set up the Free pointer + ld _MemTop,r8 ; set up the Memory Top pointer + ld regblock_val(r12),r3 ; set up the return value + and r13,r3,r7 ; and the dynamic link register + jlr 0(r4) ; go to compiled Scheme code. + + +scheme_to_interface_jl: + add &4,r1,r1 ; bump past format word + +trampoline_to_interface: + or r0,r1,r3 ; return address is first arg. + +scheme_to_interface: + st r10,_Ext_Stack_Pointer ; update the C variables + st r9,_Free + mul &4,r14,r14 ; scale utility index + lda _utility_table,r8 + add r8,r14,r8 + ld 0(r8),r8 ; extract utility's entry point + jlr 0(r8) + + jlr 0(r3) ; invoke the assembly language entry point + ; on return from the utility. + + +_interface_to_C: + or r0,r4,r3 ; return value for C + ld -24(r2),r31 ; restore the callee-saves registers + ... 
+ ld 0-OFFSET(r2),r11 + ld -4-OFFSET(r2),r1 ; restore the return address + add &-OFFSET,r2,r2 + jlr 0(r1) ; return to caller of C_to_interface + + +Ordinary utilities would be invoked as follows: + + <set up arguments in r3 - r6> + or &<utility index>,r0,r14 + jlr 0(r11) + +Utilities expecting a return address could be invoked as follows: + + <set up arguments in r4 - r6> + or &<utility index>,r0,r14 + jlr -8(r11) + +Trampolines would contain the following code: + + or &<utility index>,r0,r14 + jlr -4(r11) diff --git a/v8/src/compiler/documentation/cmpint.txt b/v8/src/compiler/documentation/cmpint.txt new file mode 100644 index 000000000..458c5c296 --- /dev/null +++ b/v8/src/compiler/documentation/cmpint.txt @@ -0,0 +1,1093 @@ +-*- Text -*- + +$Header: /Users/cph/tmp/foo/mit-scheme/mit-scheme/v8/src/compiler/documentation/cmpint.txt,v 1.1 1995/08/05 16:25:13 adams Exp $ + + +Copyright (c) 1991-1992 Massachusetts Institute of Technology + + + Documentation of the C interface to + MIT Scheme compiled code + *DRAFT* + + + + Remarks: + +In the following, whenever Scheme is used, unless otherwise specified, +we refer to the MIT Scheme dialect and its CScheme implementation. + +This file describes the compiled-code data structures and macros +defined in cmpint-md.h and required by cmpint.c and cmpgc.h . + +cmpaux.txt describes the assembly language code that must be written +to get all of this to work. + +cmpint-md.h is the machine dependent header file that defines many of +these parameters. A new version must be written for each +architecture. + +The "settable" fields in cmpint-md.h are described in the paragraphs +marked with "=>". + +In the following, word and longword are the size of an item that fills +a processor register, typically 32 bits. Halfword is half this size, +and byte is typically 8 bits. + + Description of compiled-code objects and relevant types: + +The Scheme compiler compiles scode expressions (often procedure +definitions) into native code. 
As its output, it produces Scheme
+objects that represent compiled expressions.  To execute these
+expressions, they are passed as arguments to scode-eval together with
+an environment.
+
+Typically these expressions will construct some pointers to compiled
+procedures and define them in the environment.  These procedures can
+then be invoked normally from the read-eval-print loop, from
+interpreted code, or from other compiled code.
+
+In the course of their computation, these procedures will need to
+call other procedures and then proceed with the computation.  In order
+to accomplish this, they will push compiled return addresses on the
+stack, and these will eventually be popped and "jumped through" to
+return.
+
+Compiled code and objects referenced by it are collected into
+"vector-like" objects called compiled-code blocks.
+
+The above four classes of objects (compiled expressions, compiled
+procedures, compiled return addresses, and compiled-code blocks) are
+implemented using two microcode types:
+
+TC_COMPILED_CODE_BLOCK is used to implement compiled-code blocks.  It
+is a vector type, that is, a pointer type: the word addressed by
+the pointer contains the length of the vector (not including the
+length header word), and the rest of the words (length) follow at
+increasing addresses in memory.  Typically the first word after the
+header is a non-marked-vector header, and the instructions follow it.
+The non-marked-vector header covers all the instructions, but the
+vector may contain arbitrary objects after the instructions, covered
+only by the normal vector header.  The optional, additional space at
+the end of the block is called the "constants" section, since it is
+used, among other things, to keep copies of constant objects used by
+the compiled code.  See the picture below for a diagram of the typical
+layout.
+ +TC_COMPILED_ENTRY is used to implement compiled expressions, +compiled return addresses, compiled procedures, and some other entry +points that the compiler and the compiled-code interface need. A +compiled entry is a non-standard pointer type described below. + + Description of compiled entries: + +The address portion of a compiled entry object points to an +instruction in the "middle" of a compiled-code block. + +In order for the garbage collector to be able to move the whole +block as a unit it must be able to determine the address of the +first word. Note that this word contains the length of the whole +block, so this need not be specified again. + +The address of the first word of the block can be found from the +address of the instruction, and a few bytes, currently a halfword +preceding the instruction. These bytes are called the offset field of +a compiled entry object, and typically encode the distance (in bytes) +between the beginning of the block and the compiled entry. + +A few bytes preceding the offset field are called the format field and +encode the type of compiled entry (procedure vs. expression, etc.) and +some type-specific information (number of arguments, offset to next +return address on the stack, etc.). + +The gc-offset field and the format field must be the same size, and +their size is determined by the C typedef of format_word at the +beginning of cmpint-md.h. Note that, to date, the compiler has only +been ported to systems where this size is 2 bytes (for each), but it +should be possible to port it to systems where these fields are +larger. + + Encoding of the offset field: + +The offset field is decoded as follows: + +If the low order bit is 0 the offset is a simple offset, ie. +subtracting the offset from the address of the compiled entry +results in the address of the compiled-code block that contains the +entry. + +If the low order bit is 1, it is a continued offset, ie. 
subtracting +the offset from the address of the compiled entry results in the +address of another compiled entry whose offset may or may not have a +low bit of 0. + +The rest of the bits (typically 15) are some encoding of the offset: + +- If instructions can appear at arbitrary byte addresses (including +odd addresses), this field is the offset itself. + +- If instructions have alignment constraints (ie. halfword alignment +on MC68K, or longword alignment on many RISCs), this field is the +offset shifted appropriately. In this way, no bits are wasted, and +the range of offsets is increased. + +For example, + +The DEC VAX can have instructions at any byte location. The 15 bits +are the offset. + +The MC68020 can have instructions only at halfword boundaries. The 15 +bit field shifted left by 1 is the real offset. Note that in this +case, if the low bit of the offset field is 0, the offset field is the +real offset. + +The HP Precision Architecture, and many other RISCs, can have +instructions only at longword boundaries. The 15 bit field shifted +left by 2 is the real offset. + + Encoding of the format field: + +The preceding bytes encode the kind of compiled entry in the following +way: + +The format field is further subdivided into two equal sized halves, +used, roughly, for the minimum (high order half) and maximum (low +order half) number of arguments that a compiled procedure will accept. +Inappropriate values for these numbers of arguments imply that the +entry is not a procedure, and then the two halves may be combined to +generate information appropriate to the entry type. The examples +below assume that the format field is 2 bytes long. + +- For compiled expressions it is always -1 (0xffff) + +- For compiled entries it is always -3 or -2 (0xfff[d-e]). It is -2 +for compiler generated entries, -3 for compiler-interface generated +entries. + +- For compiled return addresses with saved dynamic links it is always +-4 (0xfffc). 
The next item on the stack is then a dynamic link.
+
+- For the special return address `return_to_interpreter' it is
+always -5 (0xfffb).
+
+- For all other compiled return addresses, both halves (bytes) must
+have their sign bit set, that is, they must appear negative when
+sign-extended.  The remaining bits of the high-order half of the field
+(all but the sign bit) and all but the two most significant bits of
+the low-order half of the field (sign bit and adjacent bit), when
+concatenated, form the offset in the stack to the previous (earlier)
+return address.  This information is used by the debugger to "parse"
+the stack into frames.  The sub-fields are actually concatenated
+backwards, with the bits from the high order half being the low order
+bits in the result.  If the format field is two bytes long, each half
+is a single byte, and the valid range for the high-order half is
+0x80-0xff, while the valid range for the low-order half is 0x80-0xdf.
+
+- For compiled procedures, the format field describes the arity
+(number of parameters) and the format of the frame on the stack:
+
+The high order half of the field is (1+ REQ) where REQ is the number
+of required arguments.  Note that REQ must be such that the resulting
+half of the format field does not appear negative!  If the format
+field is two bytes long, REQ must be less than 127.
+
+The low order half of the field is given by the expression
+(* (EXPT -1 REST?) FRAME-SIZE)
+where FRAME-SIZE is (+ 1 REQ OPT REST?), REQ is as above, OPT
+is the number of named optional arguments, and REST? is 1 if
+the procedure has a rest parameter (ie. it is a "lexpr"), or 0
+otherwise.  FRAME-SIZE must not appear negative, thus if the format
+field is two bytes long, FRAME-SIZE must be less than 127.
+ + Picture of typical compiled-code block and entry: + + ---------------------------------------- + start_address | MANIFEST-VECTOR | tl | + ----------------------------------------<---------\ + | NM-HEADER | il | \ + ----------------------------------------<---\ | + | | \ | + | | | | + | | | | + | | | | + | some instructions | | | + | | | | + | | | | + | | | | + | | | | + ---------------------------------------- | | + | format_field_1 | offset_field_1 | | | + ---------------------------------------- | | + entry_address_1 | movel arg0,reg0 | | | + ---------------------------------------- | | + | | | | + | | | | + | | > il | + | | | | + | more instructions | | | + | | | | + | | | | + | | | | + ---------------------------------------- | > tl + | format_field_2 | offset_field_2 | | | + ---------------------------------------- | | + entry_address_2 | andl pointer_mask,arg0,reg0 | | | + ---------------------------------------- | | + | | | | + | | | | + | | | | + | | | | + | more instructions | | | + | | | | + | | | | + | | / | + /--->----------------------------------------<---/ | + / | Scheme object | | + | ---------------------------------------- | + "cons- | | | | + tants" | | | | + | | | | + < | | | + | | more Scheme objects | | + section | | | | + | | | | + | | | | + \ | | / + \--->----------------------------------------<----------/ + +Note: The picture above assumes that each machine instruction takes the same +space as a scheme object, and that this is also the combined length of +the gc-offset and format fields. The type tags are always at the most +significant end of the word, which depending on endianness may be at +the lowest or highest addressed part of the word in memory. The +picture above depicts them on the left. + + Description of picture: + +[TC_COMPILED_CODE_BLOCK | start_address] would be the object +representing the compiled-code block. + +[TC_COMPILED_ENTRY | entry_address_1] would represent entry1. 
+ +[TC_COMPILED_ENTRY | entry_address_2] would represent entry2. + +1) Assuming that instructions are longword aligned and that +entry_address_1 is close enough to start_address not to need an +extension, but entry_address_2 is not, then + +offset_field_1 = ((entry_address_1 - start_address) >> 1) +offset_field_2 = (((entry_address_2 - entry_address_1) >> 1) | 1) + +note that entry_address_1 - start_address is a multiple of 4 because +of the alignment assumption. + +2) Assuming that instructions are halfword aligned and that +entry_address_1 is close enough to start_address not to need an +extension, but entry_address_2 is not, then + +offset_field_1 = (entry_address_1 - start_address) +offset_field_2 = ((entry_address_2 - entry_address_1) | 1) + +note that entry_address_1 - start_address is a multiple of 2 because +of the alignment assumption. + +3) Assuming that instructions are byte aligned and that +entry_address_1 is close enough to start_address not to need an +extension, but entry_address_2 is not, then + +offset_field_1 = ((entry_address_1 - start_address) << 1) +offset_field_2 = (((entry_address_2 - entry_address_1) << 1) | 1) + +The length of the "constants" section is (tl - il). +There are (tl + 1) total words in the object. + +=> Macro PC_ZERO_BITS should be defined to be the number of bits in +instruction addresses that are always 0 (0 if no alignment +constraints, 1 if halfword, etc.). + +=> format_word should be 'typedefd' to be the size of the descriptor +fields. It is assumed that the offset field and the format field are +the same size. This definition is unlikely to need modification. + + Compiled closures: + +Most compiled procedures are represented as a simple compiled entry +pointing to the compiled-code block generated by the compiler. + +Some procedures, called closures, have free variables whose locations +cannot be determined statically by the compiler or the linker. 
The
+compiler will generate code to construct a tiny compiled-code block on
+the fly and make the compiled procedure be an entry point pointing to
+this dynamically allocated compiled-code block.
+
+For example, consider the following code, appearing at top level,
+
+(define foo
+  (lambda (x)
+    (lambda (y) (+ x y))))		;lambda-1
+
+The outer LAMBDA will be represented as a compiled entry pointing to
+the appropriate block.  The inner LAMBDA cannot be, since there may be
+more than one instance, each with its independent value for X:
+
+(define foo1 (foo 1))
+(define foo2 (foo 2))
+
+Compiled closures are implemented in the following way: The entry
+corresponding to the procedure points to a jump-to-subroutine (or
+branch-and-link) instruction.  The target of this jump is the code
+corresponding to the body of the procedure.  This code resides in the
+compiled-code block that the compiler generated.  The free variables
+follow the jump-to-subroutine instruction (after aligning to longword).
+
+Using this representation, the caller need not know whether it is
+invoking a "normal" compiled procedure or a compiled closure.  When
+the closure is invoked normally, it jumps to the real code for the
+procedure, after leaving a "return address" into the closure object in
+a standard place (stack or link register).  This "return address" is
+the address of the free variables of the procedure, so the code can
+reference them by using indirect loads through the "return address".
+
+Here is a stylized picture of the situation, where the procedure
+object (closure entry point) is a pointer to <1>.
+
+closure object:
+	+-------------------------------+
+	|				|
+	|	    <header>		|
+	|				|
+	+-------------------------------+
+<1>	|    jsr instruction to <2>	|
+	+-------------------------------+
+	|	  <value of X>		|
+	+-------------------------------+
+
+compiled code block produced by the compiler:
+
+	+-------------------------------+
+	|				|
+	| ... 
| + | | + +-------------------------------+ +<2> | <code for inner lambda> | + | | | + | V | + | | + +-------------------------------+ + +The code above could be compiled as (in pseudo-assembly language, in +which & denotes an immediate value): + + const format-word:0x0202;gc-offset:?? +foo: + mov rfree,reg0 + mov &[TC_MANIFEST_CLOSURE | 4],reg1 ; gc header + mov reg1,0(reg0) + mov &[format_field | offset_field],reg1 ; entry descriptor + mov reg1,NEXT_WORD(reg0) + mov &[JSR absolute opcode],reg1 ; jsr absolute opcode/prefix + mov reg1,2*NEXT_WORD(reg0) + mova lambda-1,reg1 ; entry point + mov reg1,3*NEXT_WORD(reg0) + mov arg1,4*NEXT_WORD(reg0) ; x + mov &5*NEXT_WORD,reg1 + add reg0,reg1,rfree + mov &[TC_COMPILED_ENTRY | 2*NEXT_WORD],reg1 + add reg0,reg1,retval + ret + + const format-word:0xfffe;gc-offset:?? +lambda-1: + mov arg1,reg0 ; y + mov x_offset(retlnk),reg1 ; x x_offset = 0 + add reg1,reg0,reg0 + mov reg0,retval + ret + +A more detailed picture of the closure object would be: + + ---------------------------------------- + | MANIFEST-CLOSURE | 4 | + ---------------------------------------- + | format_field | gc_offset_field | ;format = 0x0202 + ---------------------------------------- ;offset = encode(8) +entry | JSR absolute opcode | + ---------------------------------------- + | address of lambda-1 | + ---------------------------------------- ;address of retadd +retadd | value of x | ; -> retlnk before + ---------------------------------------- ; entering lambda-1 + +The following macros are used to manipulate closure objects: + +=> COMPILED_CLOSURE_ENTRY_SIZE specifies the size of a compiled +closure entry (there may be many in a single compiled closure block) +in bytes. In the example above this would be 12 bytes (4 total for +the format and gc offset fields, 4 for JSR opcode, and 4 for the +address of the real entry point). 
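As a concreteness check, the 12-byte entry just described can be modeled as a C struct. This is a sketch only: the 16-bit format and gc-offset fields and the flat opcode/address split are assumptions of this example, not the layout of any particular port.

```c
#include <stdint.h>

/* Model of one closure entry from the picture above: two-byte format
   and gc-offset fields, a one-word JSR opcode, and a one-word
   absolute target address.  Illustrative only; a real port's
   instruction encoding may smear the address across fields. */
struct closure_entry
{
  uint16_t format_field;
  uint16_t gc_offset_field;
  uint32_t jsr_opcode;       /* jump-to-subroutine instruction */
  uint32_t target_address;   /* real code address, e.g. LAMBDA-1 */
};

/* The value COMPILED_CLOSURE_ENTRY_SIZE would report for this layout. */
#define COMPILED_CLOSURE_ENTRY_SIZE (sizeof (struct closure_entry))
```

With natural alignment the struct packs with no padding, matching the 12-byte count given above.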
+
+=> EXTRACT_CLOSURE_ENTRY_ADDRESS is used to extract the real address
+of the entry point from a closure object when given the address of the
+closure entry.  Note that the real entry point may be smeared out over
+multiple instructions.  In the example above, given the address of the
+word labeled ENTRY, it would extract the address of LAMBDA-1.
+
+=> STORE_CLOSURE_ENTRY_ADDRESS is the inverse of
+EXTRACT_CLOSURE_ENTRY_ADDRESS.  That is, given the address of a
+closure entry point, and a real entry point, it stores the real entry
+point in the closure object.  In the example above, given the address
+of the word labeled ENTRY, and a different entry point, say for
+LAMBDA-2, it would make the closure jump to LAMBDA-2 instead.  This is
+used to relocate closures after garbage collection and similar
+processes.
+
+			 Some caveats:
+
+- The code for lambda-1 described above does not match what the compiler
+would currently generate.
+
+The current parameter-passing convention specifies that all the state
+needed to continue the computation at a procedure's entry point must
+be on the stack and all the information on the stack must be valid
+objects (for GC correctness in case of interrupt, more on this below).
+Thus the contents of retlnk must be pushed on the stack as a valid
+object, and this is done by reconstructing the closure object whose
+datum field encodes the address of entry and whose type tag is
+TC_COMPILED_ENTRY, and then pushing it onto the stack.  Note that on
+some machines, the return address for the subroutine-call instruction
+is pushed on the stack by the hardware, and thus this value might have
+to be popped, adjusted, and re-pushed if it cannot be adjusted in
+place. 
+ +The code for lambda-1 would then be closer to: + +lambda-1: + sub retlnk,&retadd-entry,retlnk + or &[TC_COMPILED_ENTRY | 0],retlnk,retlnk ; set type code + push retlnk + <interrupt check> ; more on this below + mov arg1,reg0 + mov top_of_stack,reg1 + and &[0 | -1],reg1,reg1 ; remove type code + mov x_offset+retadd-entry(reg1),reg1 + add reg1,reg0,retval + pop ; the closure object + ret + +Note that (retadd-entry) is a constant known at compile time, and is the +same for the first entry point of all closures. On many machines, the +combination sub/or can be obtained with a single add instruction: + add &([TC_COMPILED_ENTRY | 0]-(retadd-entry)),retlnk,retlnk + +This value is called the "magic constant", encoded in the first few +instructions of a closure's code. + +- Multiple closures sharing free variables can share storage by having +multiple entry points (multiple JSR instructions) in the closure +object. The compiler occasionally merges multiple related closures +into single objects. + +A complication arises when closure entry points are not necessarily +long-word aligned, since the compiler expects all variable offsets +(like x_offset above) to be long-word offsets. + +This problem only occurs on machines where instructions are not all +long-word aligned and for closures with multiple entry points, since +the first entry point is guaranteed to be aligned on a long-word +boundary on all machines. + +The current solution to this problem, on those machines on which it is +a problem, is to choose a canonical entry point (the first one) +guaranteed to be aligned properly, and push that on the stack on entry +to a closure's code. The compiler keeps track of what actual entry +point the code belongs to even though the value on the stack may +correspond to a different entry point. 
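The magic-constant arithmetic can be sketched in C. The tag value, the 24-bit datum field, and the 8-byte retadd-entry distance below are illustrative assumptions for this sketch, not values taken from any real port.

```c
#include <stdint.h>

typedef uint32_t scheme_object;

#define TC_COMPILED_ENTRY 0x28u                       /* illustrative tag */
#define MAKE_OBJECT(tc, datum) (((scheme_object) (tc) << 24) | (datum))

/* [TC_COMPILED_ENTRY | 0] - (retadd - entry): adding this one constant
   to the raw return address both rewinds it to the entry point and
   installs the type code, as the single add instruction above does. */
static scheme_object
closure_magic_constant (uint32_t retadd_minus_entry)
{
  return (MAKE_OBJECT (TC_COMPILED_ENTRY, 0) - retadd_minus_entry);
}

/* retlnk holds the address of retadd; the sum is the tagged closure
   object whose datum is the entry address. */
static scheme_object
reconstruct_closure (uint32_t retlnk, uint32_t retadd_minus_entry)
{
  return (retlnk + closure_magic_constant (retadd_minus_entry));
}
```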
+ +The "magic constant" becomes an entry-point dependent value, since each +return address may have to be bumped back to the first entry point in +the closure object rather than to the immediately preceding entry point. + + Interrupts: + +Scheme polls for interrupts. That is, interrupt processing is divided +into two stages: + +- When an asynchronous interrupt arrives, the handler (written in C) +invoked by the operating system sets a bit in a pending-interrupts +mask, stores the relevant information (if any) in a queue, and +proceeds the computation where it was interrupted. + +- The interpreter and compiled code periodically check whether an +interrupt is pending and if so, invoke an interrupt handler written in +Scheme to process the interrupt. The interpreter checks for +interrupts at the apply point. Compiled code currently checks at +every procedure entry (including loops) and at every continuation +invocation. This may change in the future, although it will always be +the case that interrupts will be checked at least once in each +iteration of a loop or recursion. + +Compiled code does not actually check the bits in the mask to +determine whether an interrupt is pending. It assumes that the +first-level interrupt handler (the handler written in C) not only sets +the bits, but also changes the copy of the MemTop (top of consing +area) pointer used by the compiler so that it will appear that we have +run out of consing room. Thus compiled code merely checks whether the +Free pointer (pointer into the heap) is numerically larger than the +MemTop pointer, and if so it invokes an assembly-language or C utility +that decides whether a garbage collection is needed or an interrupt +must be processed. Sometimes this utility will decide that the +interrupt need not be processed (it is disabled, for example), and +will need to return to the compiled code skipping the interrupt check +since otherwise we will get into an infinite loop. 
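The MemTop trick can be sketched as follows; the variable names and types are illustrative, not the runtime's actual declarations.

```c
#include <stdbool.h>
#include <stdint.h>

static uintptr_t Free;             /* heap allocation pointer       */
static uintptr_t Compiler_MemTop;  /* the copy compiled code tests  */

/* The only check compiled code performs at an entry point. */
static bool
interrupt_or_gc_check (void)
{
  return (Free >= Compiler_MemTop);
}

/* First-level (C) interrupt handler: after recording the pending
   interrupt bits, make the compiler's MemTop copy look exhausted so
   that the next polling check traps into the runtime. */
static void
request_attention (void)
{
  Compiler_MemTop = 0;
}
```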
+ +The interrupt check code is fixed (so that the handler can determine +how much code to skip) and comes in two varieties: closure interrupt +code, and normal-entry (other) interrupt code. Normal-entry interrupt +code is always the first code in an entry point (procedure or +continuation, but not closure code) and merely compares the Free and +MemTop pointers and branches. Closure code does this comparison after +setting up the closure object. Closure code assumes that the closure +object is in the first parameter location (the closure itself is +argument 0) so that free variables can be fetched. Thus a closure +label must first set this up correctly, and then check for interrupts. + +In pseudo-assembly language, a "normal" entry might look like + +gc_or_int LOADI #interrupt-handler-index,rindex + LOADA entry,rentry + JMP scheme-to-interface + format word and gc word for the entry +entry CMP Free,MemTop + BGE gc_or_int +after_entry <actual code for the entry> + +a "closure" entry might look like (this is not in the closure object, +but in the code block at which the closure object points) + +gc_or_int LOADI #interrupt-handler-index,rindex + LOADA entry,rentry + JMP scheme-to-interface + format word and gc word for the entry +entry ADDI offset,retadd,ret_add ; bump ret. add. to entry point + ORI #[TC_CLOSURE | 0],ret_add + PUSH ret_add ; arguments on the stack + CMP Free,MemTop + BGE gc_or_int +after_entry <actual code for the entry> + +The following macros are used by the C utility and handler to +determine how much code to skip: + +=> ENTRY_SKIPPED_CHECK_OFFSET is the number of bytes between +entry and after_entry in a normal entry. + +=> CLOSURE_SKIPPED_CHECK_OFFSET is the number of bytes +between entry and after_entry in a closure entry. + +=> ENTRY_PREFIX_LENGTH is the number of bytes between gc_or_int and +entry in a normal entry, not counting those taken up by the format +word and the gc word. 
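When the utility dismisses an interrupt, it resumes at after_entry by adding the appropriate offset to the entry address. A sketch, with made-up byte counts for the pseudo-code above (the real values are machine dependent and live in cmpint-md.h):

```c
#include <stdint.h>

/* Illustrative byte counts, not real cmpint-md.h values. */
#define ENTRY_SKIPPED_CHECK_OFFSET    8   /* normal entry .. after_entry  */
#define CLOSURE_SKIPPED_CHECK_OFFSET 20   /* closure entry .. after_entry */

/* Resumption address chosen by the utility when the interrupt is
   dismissed rather than processed. */
static uint8_t *
dismissed_resume_address (uint8_t *entry, int is_closure)
{
  return (entry + (is_closure
                   ? CLOSURE_SKIPPED_CHECK_OFFSET
                   : ENTRY_SKIPPED_CHECK_OFFSET));
}
```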
+ + Important considerations: + +The Scheme compiled code register set includes the current copy of the +Free pointer, but does not include the copy of MemTop, although it is +mostly constant. The reason is that the C-level interrupt handler +does not have convenient access to the register set at the point of +the interrupt, and thus would have a hard time changing the version of +MemTop used by compiled code at the point of the interrupt. Thus the +copy of MemTop used by compiled code is kept in memory. + +On machines where register-to-memory comparisons can be done directly +this is no problem, but on load/store architectures (most RISCs for +example), this is not feasible. Furthermore, most RISCs have a few +cycles of delay for memory loads, and adjacent instructions may not be +interlocked by the hardware. Thus a sequence like + + LOAD Memory_MemTop,Rtemp + CMP Rfree,Rtemp + BGE gc_or_int + +may be very slow and NOPs may have to be inserted explicitly between +the LOAD and CMP instructions to make the code work. + +Since Scheme's interrupt response is not immediate, and polling is +frequent, the following sequence can be used instead: + + CMP Rfree,Rmemtop + BGE gc_or_int + LOAD Memory_MemTop,Rmemtop + +Where Rmemtop is a register that holds a recent value of MemTop and is +reloaded at every interrupt check. Thus interrupt processing will +start at the second interrupt check after the actual interrupt comes +in. In other words, if the sequence of entry points executed +dynamically is ep1, ep2, ep3, and an asynchronous interrupt occurs +between ep1 and ep2, the interrupt handler will not be invoked until +ep3, rather than ep2. + +This instruction sequence eliminates the need to wait for the LOAD to +complete, and the LOAD will have completed (or will be handled by the +hardware's interlock mechanism) by the next check since at least one +instruction (a branch instruction), and often many more, will +intervene. 
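The one-check delay can be simulated in C; this is a toy model of the CMP/BGE/LOAD sequence above, not runtime code.

```c
#include <stdbool.h>
#include <stdint.h>

static uintptr_t Memory_MemTop;  /* the in-memory copy the handler moves */

/* One interrupt check: compare against the (possibly stale) register
   copy first, then reload it, as in the instruction sequence above. */
static bool
check_then_reload (uintptr_t rfree, uintptr_t *rmemtop)
{
  bool trap = (rfree >= *rmemtop);  /* CMP Rfree,Rmemtop ; BGE gc_or_int */
  *rmemtop = Memory_MemTop;         /* LOAD Memory_MemTop,Rmemtop        */
  return (trap);
}
```

Running two checks after the handler moves MemTop shows the interrupt being noticed at the second check (ep3), not the first (ep2).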
+ +Note that this delayed checking does not affect garbage collection +interruptions since MemTop is constant between garbage collections, +and thus the value being loaded is always the same, in the absence of +asynchronous interrupts. + +Various operating systems allow the signal handler convenient access +to the interrupted code's register set. In such a situation, the LOAD +instruction can be eliminated and the C-level interrupt handler can +modify Rmemtop directly. Rmemtop should be chosen from the +caller-saves convention (super-temporary) registers if possible, since +these registers must be explicitly saved by the signal handler, rather +than implicitly by the calling convention. + + Interrupts and closures that share storage: + +If an interrupt arrives on entry to the closure, the correct closure +object must be reconstructed so that the computation will continue +correctly on return from the interrupt. The code to reconstruct the +correct closure is also issued by the compiler, which at compile time +maintains the identity of each closure and the distance to the +canonical closure used for environment purposes. + +If the interrupt is dismissed, instead of processed, we need to +continue the computation bypassing the interrupt checking code in +order to avoid an infinite loop. This is what the macro +CLOSURE_SKIPPED_CHECK_OFFSET is used for. We must skip the preamble +of the closure code and emulate part of it, that is, adjust the object +on top of the stack to be the closure object that the code expects to +have there. This can be done by extracting the magic constant from +the entry point, and bumping the corresponding return address by this +value. The macro ADJUST_CLOSURE_AT_CALL accomplishes this feat on +those machines where it is needed. + +=> ADJUST_CLOSURE_AT_CALL, when given an entry point and a location, +adjusts the closure object stored at location so that it is the +closure object that the entry point expects on top of the stack. 
On +machines where all instructions are long-word aligned, it is a NOP, on +other machines (eg. 68k, VAX), it extracts the magic constant from the +closure's code, and uses it to construct the appropriate closure +object. + + External calls from compiled code: + +Many calls in scheme code (and particularly in large programs) are +calls to independently compiled procedures or procedures appearing at +the top level of a file. All these calls are calls to potentially +unknown procedures since the names to which they are bound can be +unbound or redefined dynamically at run time. + +The code issued by the compiler for such an external call must take +into account the possibility of the lack of a valid value, run-time +definition, and run-time assignment. This is done as follows: + +For each external procedure called with a fixed number of arguments +(more on this below), a small contiguous space is allocated in the +"constants" section of the compiled-code block. + +This space initially contains the name of the external variable +whose value is being invoked, and the number of arguments (+ 1 for +technical reasons) being passed to the procedure. + +These locations will be replaced at load time by an absolute jump to +the correct entry point of the called procedure if the number of +arguments matches and the callee (target procedure) is compiled, or by +an absolute jump to some utility code generated on the fly to +interface the caller and the callee (called a trampoline procedure). +Note that both procedures need not be in the same compiled-code block. + +The fixed code in the code section of the compiled-code block contains +a pc-relative branch instruction to this space allocated in the +"constants" section. + +When the compiled-code block is loaded, a linker that resolves these +references and replaces the name and arguments with machine-specific +code to do the absolute jump is invoked. 
The linker also records the
+locations of all such jump instructions so that a subsequent
+redefinition or assignment of the same name will cause the jump
+instruction to be replaced by a new one to the correct value.  Note
+that the number of arguments needs to be checked only by the linker,
+so no instructions are issued to check it at run time.  It is for this
+reason that the number of arguments is part of the information left by
+the compiler in the "constants" section.
+
+These entries in the "constants" section are called execute caches,
+operator links, or "UUO" links for historical reasons.  They must be
+large enough to contain the instructions required for an absolute jump
+(and possibly some delay slot instructions in a RISC-style machine),
+and the number of arguments passed in the call.  This number of
+arguments is not used in the call sequence, but is used by the linker
+when initially linking and when relinking.
+
+Execute caches are contiguous in the "constants" section, and the
+whole lot is preceded by a GC header of type TC_LINKAGE_SECTION which
+contains two fields.  The least-significant halfword of the header
+contains the size in longwords of the execute-cache section (note that
+each cache entry may take up more than one longword).  The remaining
+bits (ignoring the type code) MUST be 0.  If a file makes enough
+external calls that this halfword field cannot hold the size, the
+execute caches must be separated into multiple blocks, each with its
+own header.
+
+Occasionally a procedure is called with more than one number of
+arguments within the same file.  For example, the LIST procedure may
+be called with three and seven arguments in the same file.  In this
+case there would be two execute caches for LIST.  One would correspond
+to the argument count of three, and the other to seven.
+
+As an example, consider the code generated for
+
+(sort <some list> <some predicate>)
+
+where sort is the "global" procedure sort. 
+ +The code section would contain + + <compute some predicate> + push <some predicate> + <compute some list> + push <some list> + branch sort-uuo-link + +In the constants section there would be a label that would contain the +following after linking + +sort-uuo-link: + jump sort ; Absolute address for sort + 3 ; Number of arguments + 1 + +Before linking it would contain + +sort-uuo-link: + SORT ; The symbol SORT + 3 ; Number of arguments + 1 + +This assumes that the absolute jump instruction takes one word. If it +takes more, the appropriate padding would have to be inserted between +the symbol SORT and the number 3. On machines where instructions are +not necessarily longword aligned (MC68020 and VAX, for example), the +padding bits for the instruction can be used to contain the argument +count. Note that the order of the instructions and the count are +machine dependent, although typically the instructions precede the +count. + +The following macros are used to manipulate execute caches: + +=> EXECUTE_CACHE_ENTRY_SIZE specifies the length (in longwords) of an +execute-cache entry. This includes the size of the instructions and +the argument count. For the example above it would be 3, assuming +that the jump instruction and the absolute address take two words +together (the third is for the argument count). Note that on RISC +machines, this size may have to include the size of the branch delay +slot instruction. + +=> EXTRACT_EXECUTE_CACHE_ARITY specifies how to read the argument +count from an execute-cache entry when given the address of the entry. +In the above example, it would extract 3 from the address labeled +sort-uuo-link. + +=> EXTRACT_EXECUTE_CACHE_SYMBOL specifies how to read the symbol from +an execute-cache entry (before it is actually linked) when given the +address of an entry. In the above example, it would extract the +symbol SORT from sort-uuo-link. 
+ +=> EXTRACT_EXECUTE_CACHE_ADDRESS fetches the real entry point stored +in an execute-cache entry when given the address of the entry. In the +above example, it would extract the entry point of the sort procedure +when given the address of the jump instruction (labeled as +sort-uuo-link). + +=> STORE_EXECUTE_CACHE_ADDRESS is the inverse of this, ie. when given a +target entry point and the address of an execute cache entry, it +stores the entry point there. In the above example, given a new entry +point for sort, and sort-uuo-link, it would modify the jump +instruction to jump to the new location. + +=> STORE_EXECUTE_CACHE_CODE stores the fixed instructions (opcodes), +if any, in an execute cache cell. If the opcodes depend on the actual +target address, this macro should be a NOP, and all the work should be +done by STORE_EXECUTE_CACHE_ADDRESS. These two macros are separated +to avoid extra work at garbage collection time on architectures where +some or all of the code need not change. In the example above, this +macro would store the jump opcode. + + + Trampolines: + +Trampolines are the linker-generated procedures that interface the +caller and the callee when they are not directly compatible. They may +not be directly compatible because the callee may not exist, may not +be a compiled procedure, or may expect arguments in different +locations. Trampolines typically call a C or assembly-language +procedure to reformat the argument list or invoke the error handler. +C procedures are invoked using the scheme_to_interface (and +trampoline_to_interface) code described below. + +A trampoline is similar to a compiled closure in that it is a small +compiled-code block with some additional storage needed by the +trampoline handler (like the actual procedure being invoked, the +variable that is unbound, or the number of arguments being passed). 
+The code typically invokes an out-of-line handler passing it the
+address of the storage section, and an index into a table of C or
+assembly language procedures that handle the actual transfer.
+
+A typical trampoline looks like
+
+	----------------------------------------
+block	| MANIFEST-VECTOR |         6         |	(4 + words of storage)
+	----------------------------------------
+	| NM-HEADER       |         3         |	(fixed)
+	----------------------------------------
+	| format_field    |   offset_field    |	(fixed)
+	----------------------------------------
+entry	| LOADI #index,rindex                 |	(index varies)
+	----------------------------------------
+	| JSR trampoline_to_interface         |	(fixed)
+	----------------------------------------
+retadd	| first word of storage               |	(variable)
+	----------------------------------------
+	| second word of storage              |	(variable)
+	----------------------------------------
+
+=> TRAMPOLINE_ENTRY_SIZE is the size in longwords of the compiled-code
+portion of a trampoline.  It is similar to COMPILED_CLOSURE_ENTRY_SIZE
+but in longwords, and will typically represent less storage since an
+absolute address is not needed (or desirable).  It must include the
+format word and the GC offset for the entry.  In the example above it
+would be 3.  Note that it must be an integer number of longwords, so
+the code should be padded if necessary.
+
+=> TRAMPOLINE_ENTRY_POINT returns the address of the entry point of a
+trampoline when given the address of a trampoline's block, i.e. the
+address of the word that will contain the manifest vector header.  In
+the picture above, when given the address of the byte labeled `block',
+it will return the address of the byte labeled `entry'.
+
+=> TRAMPOLINE_STORAGE returns the address of the first storage word in
+a trampoline when given the address of the first instruction (the entry
+point of the trampoline).  In the picture above it would return the
+address of the word labeled `retadd' when given the address of the
+word labeled `entry'. 
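For the layout pictured above, the two accessors might be defined as follows. This assumes longword-aligned trampoline code and 4-byte words; the constant 3 is this example's block-to-entry distance, not a universal value.

```c
#include <stdint.h>

typedef uint32_t machine_word;

#define TRAMPOLINE_ENTRY_SIZE 3   /* longwords, incl. format/gc word */

/* Skip the manifest vector header, the NM header, and the
   format/offset word: 3 longwords from `block' to `entry'. */
#define TRAMPOLINE_ENTRY_POINT(block)                                  \
  (((machine_word *) (block)) + 3)

/* Back up over the format/gc word, then skip the whole entry:
   from `entry' to `retadd'. */
#define TRAMPOLINE_STORAGE(entry)                                      \
  ((((machine_word *) (entry)) - 1) + TRAMPOLINE_ENTRY_SIZE)
```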
+ +Most versions of cmpint-md.h define the last two macros in the same +way, in terms of TRAMPOLINE_BLOCK_TO_ENTRY, which is used only for +this purpose. The definitions that use TRAMPOLINE_BLOCK_TO_ENTRY +assume that the first instruction of a trampoline is aligned on a +longword boundary. If this is the case, you can define +TRAMPOLINE_BLOCK_TO_ENTRY appropriately and use the definitions of +TRAMPOLINE_ENTRY_POINT and TRAMPOLINE_STORAGE from another machine. +TRAMPOLINE_BLOCK_TO_ENTRY is the number of longwords from the start +of a trampoline's block (the manifest vector header in the picture +above), to the first instruction, which must be longword aligned. +This will typically be 3 since there are two scheme header words, and +the gc and format word typically take one longword together. + +=> STORE_TRAMPOLINE_ENTRY stores the "compiled" code into an "empty" +trampoline. It is given the address of the entry point, and the index +of the C procedure to invoke (they are all in a table), and stores the +machine code necessary to invoke scheme_to_interface (or +trampoline_to_interface), passing the index and the address of the +trampoline storage area as parameters. In the example above this +macro would store the LOADI (load immediate) and JSR instructions. + + Compiled code and processor caches: + +Many modern computers have processor caches that speed up the average +memory reference if the code exhibits sufficient locality in its +reference patterns. In order to obtain increased performance at a +lower cost, many processors have split caches for instructions and +data that are not guaranteed to be consistent, ie. they are not +necessarily invisible to the programmer. + +This presents problems for self-modifying code and for dynamic loaders +and linkers, since instructions are stored using data references (and +therefore the data cache), but the instruction cache may not reflect +the updates. 
Modern hardware with split caches often provides some +way to synchronize both caches so that the operating system can +guarantee correct operation of newly-loaded programs. + +The Scheme compiled code support performs some of the same tasks that +operating systems do, and therefore runs into these problems. + +The ways in which the consistency problem arises in the Scheme system +are: + +- Newly allocated instructions. The compiler can be invoked +dynamically, compiled code can be loaded dynamically into freshly +allocated storage, and compiled closures are created dynamically. The +instruction cache must reflect the changes made to memory through the +data cache. The operating system's program loader must solve +precisely this problem. + +- Execute caches may change their contents. Execute caches contain +jump instructions to the appropriate code, but these instructions may +change when the corresponding variables are assigned. If the +instruction cache is not updated, the wrong code may be entered on +subsequent calls. Operating systems with dynamic linking must solve +this problem as well. + +- Code is moved by the garbage collector, since code space is not +separate from data space and static. If the caches are not +synchronized after a garbage collection, subsequent instruction +fetches may result in the execution of incorrect instructions. +The operating system must solve this problem when it re-allocates +virtual memory pages. + +The problem can be solved by synchronizing the caches in the +appropriate places. The relevant places in the Scheme system have +been identified, and use two machine-dependent macros to +synchronize both caches or flush the instruction cache. + +=> FLUSH_I_CACHE is used to flush the portion of the I-cache that +Scheme code addresses may be in, or alternatively, to guarantee that +the I-cache contains only valid data. It may flush/synchronize the +entire I-cache instead, if it is easier. 
It is used after garbage
+collections and image loads.
+
+=> FLUSH_I_CACHE_REGION is used to flush or synchronize a region of
+the address space from the I-cache.  It is given the base address and
+the number of long-words of the region of memory that has just been
+modified and whose new contents must be copied into the I-cache for
+correct execution.
+
+It is used after updating an execute cache while running between
+garbage collections.  It is not used during garbage collection since
+FLUSH_I_CACHE will be used afterwards.
+
+These macros need not be defined if the cache does not need to be
+flushed.  A NOP version is provided by the code when they are not
+defined in cmpint-md.h.
+
+Note that on some machine/OS combinations, all system calls cause a
+cache flush, thus an innocuous system call (eg. a time-reading call)
+may be used to achieve this purpose.
+
+Many modern machines only make their cache-flushing instructions
+available to the operating system (they are privileged instructions),
+and some operating systems provide no system calls to perform this
+task.  In the absence of information on the structure and
+characteristics of the cache (the information could be used to write
+flushing routines), the Scheme compiler and system may have to be
+changed in order to run on such machines.  Here is a set of changes
+that will bypass the problem, at the expense of some functionality and
+perhaps performance:
+
+- Change the entry code for closures and execute caches.
+
+The code in execute caches can be changed from
+
+    jump target
+
+to
+
+    jsr fixed-utility-routine
+    target address
+
+where fixed-utility-routine extracts target address from the return
+address and invokes it.  The same change can be made to the closure
+entry code.
+
+This solves the problem of assignment to variables with execute
+caches. 
+
+This change can be done quite easily since the format of closures and
+execute caches is already machine dependent, and all the accessors,
+constructors, and mutators have been abstracted into macros or can
+be easily rewritten in the compiler.
+
+- Change the storage management scheme to accommodate a code area that
+is never garbage collected so that code, once placed there, never
+moves.
+
+This would constitute a major change to the system.  The main problem
+that this change would present is the following:
+
+Closures are data structures created and dropped on the fly, thus they
+cannot be allocated from a region of memory that is never reclaimed.
+Thus closures would have to be allocated from data space, and could no
+longer contain instructions.  This implies that the format of entry
+points would have to change, since the relevant information would no
+longer consist of a single address, but two, ie. the address of the
+code in code space and the address of the data in data space.  This
+would imply many changes to the compiler, for there are implicit
+assumptions throughout that compiled entry points take no space
+besides the space taken by the code.  In particular, simple closures
+would disappear, and multi-closures would have to be redesigned.
+
+	      Implementation registers and utilities
+
+The C runtime support maintains some state variables that Scheme code
+may need to access.  In order to make these variables easily
+accessible to both languages, all these variables are collected into a
+contiguous vector (the register block) accessible by both.  The C
+variable "Registers" holds the address of this vector, and while
+compiled Scheme code is being executed, a processor register also
+holds the address of the vector.  Among other data, the register block
+contains the memory version of MemTop, the interpreter's expression
+and environment registers, the interrupt mask and pending-interrupt
+words. 
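The shared register block can be sketched as a single vector addressed from both languages. The slot names and their order below are illustrative, not the real REGBLOCK layout.

```c
#include <stdint.h>

typedef uintptr_t scheme_object;

/* Illustrative slot assignments for the fixed section. */
enum regblock_slot
{
  REGBLOCK_MEMTOP,     /* memory version of MemTop         */
  REGBLOCK_INT_MASK,   /* interrupt mask                   */
  REGBLOCK_INT_CODE,   /* pending-interrupt bits           */
  REGBLOCK_EXPR,       /* interpreter expression register  */
  REGBLOCK_ENV,        /* interpreter environment register */
  REGBLOCK_N_SLOTS
};

static scheme_object register_block[REGBLOCK_N_SLOTS];

/* C side: "Registers" holds the vector's address; compiled code keeps
   the same address in a dedicated processor register. */
static scheme_object *Registers = register_block;
```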
+
+In addition, the compiler occasionally needs static memory locations
+into which it can spill the values contained in processor registers.
+Rather than using another register to hold the address of the spill
+locations, these are allocated on the same vector as the register
+block, and the register that holds the address of the register block
+can be used to access the spill locations as well.
+
+Compiled code also needs to invoke assembly language and C utilities
+to perform certain tasks that would take too much space to code
+in-line. Rather than choosing fixed addresses for these routines, or
+having to update them every time a piece of code is loaded or dumped,
+a register is reserved to hold the address of one of them, and the
+distance between them is pre-determined, so that compiled code can
+invoke any of them by adding an offset to the value in the register
+and jumping there.
+
+On processors with few registers (e.g., the 68k), it would be wasteful
+to reserve two registers in this fashion. Both registers are
+therefore merged: yet another section of the register block array is
+reserved for utility procedures, and appropriate jump instructions are
+placed there so that compiled code can invoke the utilities by jumping
+into the register block array.
+
+The following macros define the sizes of the various areas of the
+array. None of them need to be defined except to override the
+default. The default assumes that there are enough processor
+registers that another one can be reserved to point to the utility
+handles.
+
+=> COMPILER_REGBLOCK_N_FIXED is the size of the register block
+section of the array. It must accommodate at least as many locations
+as the interpreter expects to have.
+
+=> COMPILER_REGBLOCK_N_TEMPS is the number of spill locations.
+
+=> COMPILER_TEMP_SIZE is the size (in long words) of the contents of a
+floating-point register if different from a double. For example, an
+MC68881 saves registers in 96-bit (3-longword) blocks.
The default is +fine for most machines. + +=> COMPILER_REGBLOCK_EXTRA_SIZE is the additional size (in longwords) +to be reserved for utility handles. It is typically defined the +following way as (COMPILER_REGBLOCK_N_HOOKS * COMPILER_HOOK_SIZE). + +=> COMPILER_REGBLOCK_N_HOOKS is the maximum number of utility handles. + +=> COMPILER_HOOK_SIZE is the size in longwords of a utility handle (an +absolute jump instruction). + +=> Macro ASM_RESET_HOOK can be used to initialize the register block +array. It is invoked at boot time. + + Miscellany: + +Macro IN_CMPINT_C, defined in cmpint.c, can be used to conditionally +include code (extern procedures) needed by the port. It is only +defined when cmpint-md.h is included by cmpint.c . + +=> Macro COMPILER_PROCESSOR_TYPE identifies the processor type. It +should be unique for each kind of processor. diff --git a/v8/src/compiler/documentation/facts.txt b/v8/src/compiler/documentation/facts.txt new file mode 100644 index 000000000..1747e4731 --- /dev/null +++ b/v8/src/compiler/documentation/facts.txt @@ -0,0 +1,28 @@ +Some useful facts: + +* A canonical subproblem has a `continuation' component that satisfies +the predicate `continuation?'. The `prefix' component is never null, +and always terminates in a return whose operator is `continuation', or +a combination with that continuation. The `rvalue' component is +always a reference to the parameter of `continuation'. + +* A non-canonical subproblem has a `continuation' component that is a +virtual continuation; this continuation is never reified. The +`rvalue' component is arbitrary. The `prefix' component may or may +not be null. + +* Every non-canonical subproblem is eventually translated into a +virtual return node. The exception to this rule is that subproblems +whose values are unused or known are usually translated into null +sequences. In either case the prefix is output. 
+ +* A continuation which is the operator of the +`application-continuation-push' of some application satisfies +`continuation/always-known-operator?'. Furthermore, it has precisely +one application: the one of which it is the associated +continuation-push. + +* A return node can only have a `continuation-push' if it was created +by `combination/constant!' (i.e. was the result of constant-folding). +In this case the `continuation-push' is guaranteed to be of type +`effect', so that the continuation is not pushed at all. diff --git a/v8/src/compiler/documentation/files.txt b/v8/src/compiler/documentation/files.txt new file mode 100644 index 000000000..1f0a75a13 --- /dev/null +++ b/v8/src/compiler/documentation/files.txt @@ -0,0 +1,197 @@ +================================================================ + compiler/back: +================================================================ +This directory contains the machine-independent portion of the back +end. It contains bit-string utilities, symbol table utilities, label +management procedures, the hardware register allocator, and the +top-level assembler calls. 
+ +* asmmac.scm +;;;; Assembler Syntax Macros + +* asutl.scm +;;;; Assembler Utilities +;;; package: (compiler) + +* bittop.scm +;;;; Assembler Top Level +;;; package: (compiler assembler) + +* bitutl.scm +;;;; Assembler utilities +;;; package: (compiler assembler) + +* insseq.scm +;;;; Lap instruction sequences + +* lapgn1.scm +;;;; LAP Generator: top level +;;; package: (compiler lap-syntaxer) + +* lapgn2.scm +;;;; LAP Generator: High-Level Register Assignment + +* lapgn3.scm +;;;; LAP Generator +;;; package: (compiler lap-syntaxer) + +* linear.scm +;;;; LAP linearizer +;;; package: (compiler lap-syntaxer linearizer) + +* mermap.scm +;;;; LAP Generator: Merge Register Maps + +* regmap.scm +;;;; Register Allocator +;;; package: (compiler lap-syntaxer) + +* syerly.scm +;;;; Syntax time instruction expansion + +* symtab.scm +;;;; Symbol Tables +;;; package: (compiler assembler) + +* syntax.scm +;;;; LAP Syntaxer + +================================================================ + compiler/rtlbase: +================================================================ + +This directory contains utilities used by the RTL generator and +optimizer. + +* regset.scm +;;;; RTL Register Sets + +* rgraph.scm +;;;; Program Graph Abstraction + +* rtlcfg.scm +;;;; RTL CFG Nodes + +* rtlcon.scm +;;;; Register Transfer Language: Complex Constructors +;;; package: (compiler) + +* rtlexp.scm +;;;; Register Transfer Language: Expression Operations +;;; package: (compiler) + +* rtline.scm +;;;; RTL linearizer + +* rtlobj.scm +;;;; Register Transfer Language: Object Datatypes + +* rtlreg.scm +;;;; RTL Registers + +* rtlty1.scm +* rtlty2.scm +;;;; Register Transfer Language Type Definitions +;;; package: (compiler) + +* valclass.scm +;;;; RTL Value Classes (? a hierarchy, right?) 
+
+================================================================
+			 compiler/rtlgen:
+================================================================
+
+This directory contains the code that translates the flow-graph into
+register transfer language (RTL).
+
+* fndblk.scm
+;;;; RTL Generation: Environment Locatives
+;;; package: (compiler rtl-generator find-block)
+
+* fndvar.scm
+;;;; RTL Generation: Variable Locatives
+;;; package: (compiler rtl-generator)
+
+* opncod.scm
+;;;; RTL Generation: Inline Combinations
+;;; package: (compiler rtl-generator combination/inline)
+
+* rgcomb.scm
+;;;; RTL Generation: Combinations
+;;; package: (compiler rtl-generator generate/combination)
+
+* rgproc.scm
+;;;; RTL Generation: Procedure Headers
+;;; package: (compiler rtl-generator generate/procedure-header)
+
+* rgretn.scm
+;;;; RTL Generation: Return Statements
+
+* rgrval.scm
+;;;; RTL Generation: RValues
+;;; package: (compiler rtl-generator generate/rvalue)
+
+* rgstmt.scm
+;;;; RTL Generation: Statements
+;;; package: (compiler rtl-generator)
+
+* rtlgen.scm
+;;;; RTL Generation
+;;; package: (compiler rtl-generator)
+
+================================================================
+			 compiler/rtlopt:
+================================================================
+
+This directory contains the RTL-level optimizer. It contains code to
+perform lifetime analysis, redundant subexpression elimination,
+elimination of dead code, etc.
+ +* ralloc.scm +;;;; Register Allocation + +* rcompr.scm +;;;; RTL Compression + +* rcse1.scm +;;;; RTL Common Subexpression Elimination: Codewalker +;;; package: (compiler rtl-cse) + +* rcse2.scm +;;;; RTL Common Subexpression Elimination + +* rcseep.scm +;;;; RTL Common Subexpression Elimination: Expression Predicates + +* rcseht.scm +;;;; RTL Common Subexpression Elimination: Hash Table Abstraction +;;; package: (compiler rtl-cse) + +* rcserq.scm +;;;; RTL Common Subexpression Elimination: Register/Quantity +Abstractions + +* rcsesr.scm +;;;; RTL Common Subexpression Elimination: Stack References + +* rdebug.scm +;;;; RTL Optimizer Debugging Output + +* rdflow.scm +;;;; RTL Dataflow Analysis +;;; package: (compiler rtl-optimizer rtl-dataflow-analysis) + +* rerite.scm +;;;; RTL Rewriting +;;; package: (compiler rtl-optimizer rtl-rewriting) + +* rinvex.scm +;;;; RTL Invertible Expression Elimination +;;; package: (compiler rtl-optimizer invertible-expression-elimination) + +* rlife.scm +;;;; RTL Register Lifetime Analysis +;;; Based on the GNU C Compiler + +* rtlcsm.scm +;;;; RTL Common Suffix Merging \ No newline at end of file diff --git a/v8/src/compiler/documentation/notes.txt b/v8/src/compiler/documentation/notes.txt new file mode 100644 index 000000000..9087d33ed --- /dev/null +++ b/v8/src/compiler/documentation/notes.txt @@ -0,0 +1,147 @@ + + + Notes on potential compiler improvements + + +* The analysis which generates `block-stack-link' could be improved. +Currently, it fails in the case where the procedure is always invoked +as a subproblem, but there are multiple possible continuations. The +reason is subtle: we need to know that the `continuation/offset' of +all of the continuations is the same (as well as the frame-size and +the closing-block). Unfortunately, the computation of the offset +depends on the subproblem ordering, which depends on the stack-link +(to decide whether or not to use static links). Catch-22. Probably +this can be solved. 
+ +* Pathological case of "takr.scm" can perhaps be solved by integrating +simapp and outer into one pass. By handling "passed-in" nodes before +other nodes, and not making links to such nodes, the explosion of +useless dataflow information would be avoided. However, this affects +the static-link analysis, which looks at BOTH the "passed-in" bit as +well as the set of values. Think of some way to make this degrade +properly. + +* Make the static-link analysis more sophisticated so that it uses +static links whenever the current strategy would require going through +at least two links. This sometimes happens when the parent must be +located through the closing block of the continuation. In this case +it is probably better to add a redundant static link for speed in +lookup. + +* When tail-recursing into an internal procedure, if the procedure has +no free variables, we can erase the calling frame. In the simplest +case, this means that such a procedure is actually an external +procedure. However, we could get more sophisticated and notice that +it was OK to delete some of the ancestor stack frames but not others. + +* The code generated by the rewrite rule for disjunctions demonstrates +that the decision about whether or not to use registers for LET +parameters does not depend on the entire body of the LET. In this +case, the predicate parameter can ALWAYS be register allocated, +independent of the complexity of the alternative, because it is unused +once the decision has been made in favor of the alternative. This can +be generalized to handle more complex cases. + +* Change CFG implementation so that `hook' objects are just partially +connected edges. I think that noop nodes can then be eliminated in +favor of disconnected edges. This will also solve a potential problem +where deletion of the noop nodes at the entry points of continuations +leaves random hooks scattered around in various subproblems. 
+ +* Many closures are never invoked through their external entry points. +For such closures, the external entry point and associated code need +never be generated. Also, the closure object need not contain a code +pointer. This is one step closer to just using the closure frame +pointer in place of the closure. + +* Perform dead-code-elimination at the same time as constant folding. +Be thorough, deleting all nodes associated with all code that is +eliminated. This is tricky but pays off handsomely later on. Also, +doing it after the dataflow but before the rest of the analysis +greatly reduces the amount of details that have to be kept track of +during deletion. + +ALSO: note that removal of code to hack known predicates in "rgretn" +may make something like this necessary for simple cases. + +Subsequent note: performing dead code elimination prior to subproblem +ordering has a problem in that there are cfg fragments in the +subproblems with invisible pointers into the node structure. We can't +delete nodes unless we know about these pointers, so we must do dead +code elimination after subproblem ordering. + +* Now that RTL generator does not generate temporaries for quantities +that are immediately pushed, tested, etc., we will need to modify the +CSE to generate temporaries for the cases where those quantities are +found in multiple places. Hopefully this won't break the world. + +* The interning of SCode variable objects (for explicit lookup) is +done on a per-block basis. It should be changed so that stack blocks +are skipped and the interning is done on the nearest IC block. + +* Fixnum operations + +** Is signed bit-field extraction faster than current strategy if the +operand is in memory? + +** In the case of addition of two boxed fixnums to a boxed result, no +unboxing is needed on the operands provided the result is boxed in the +usual way. + + + Items that have been processed + + +* Introduction of inline-coded continuations (i.e. 
continuations of +type UNIQUE or SIMPLE) has invalidated the current method of +maintaining the frame pointer offset. The reason is that the body of +such a continuation thinks that the frame pointer knows where its +frame is, while the offset in fact refers to some ancestor of that +frame. I think that ignoring the frame of such a continuation in +`find-block' will produce the desired effect. + +* JOIN type blocks aren't needed for offset, but they ARE needed to +prevent continuations from being classified as UNIFORM when they +aren't. + +* To do `block-parent' operation on a "real" block, must skip any +intervening JOIN blocks to find the next "real" block. + +* `generator/subproblem' has code to mark frame-size of a join block +if the continuation is closed in one. That needs to be moved +elsewhere? + +* Theory: JOIN blocks are always invisible _except_ when needed to +compute a frame pointer offset. This means: + +** `find-block' and friends in "emodel" need to know about them. Also +the associated `stack-block-parent-locative' and similar +constructions. + +** `procedure-closure-block' now refers to the previous +`block-parent'. The closing code must refer to `block-%parent' to get +the lower-level closing block. + +** `block->join/distance' in "rgretn" needs to learn about them. + +* (implemented 8/88 -- cph) The code in "rgretn" should be modified as +follows. At a return point, if the continuation is known, then we can +just jump to the continuation, as long as we set things up correctly +based on the operator class of the continuation. This might mean, for +example, that we throw away the return address on the stack because we +know that it has a certain value. In practice, this can occur when we +supply a continuation to a combination that goes to one of two +procedures. The procedure in which the return appears is ONLY invoked +with this continuation, while the other procedure is sometimes invoked +with another continuation. 
Thus we must push the return address, +because we don't know which procedure we're invoking, but at return +time it isn't needed. + +* Some procedures that are being considered closures can easily be +open external. Each of the free variables must satisfy one of the +following criteria: (1) it has a known value, or (2) it is bound in +the IC block being used for cached references. This optimization will +make an enormous performance improvement on programs that consist of +many procedures closed in a compiled block, with a few external +closure entry points, because it will allow most of the internal +procedures to be open. Currently they will all become closures. diff --git a/v8/src/compiler/documentation/porting.guide b/v8/src/compiler/documentation/porting.guide new file mode 100644 index 000000000..0e596681b --- /dev/null +++ b/v8/src/compiler/documentation/porting.guide @@ -0,0 +1,3751 @@ + Emacs: Please use -*- Text -*- mode. Thank you. + +$Id: porting.guide,v 1.1 1995/08/05 16:25:13 adams Exp $ + + +Copyright (c) 1991-1993 Massachusetts Institute of Technology + + + LIAR PORTING GUIDE + *DRAFT* + + + +Notes: + +This porting guide applies to Liar version 4.91, but most of the +relevant information has not changed for a while, nor is it likely to +change in major ways any time soon. + +This is an early version of this document, and the order of +presentation leaves a lot to be desired. In particular, the document +does not follow a monotonic progression, but is instead organized in a +dictionary-like manner. We recommend that you read through the whole +document twice since some important details, apparently omitted, may +have their explanation later in the document. When reading the +document for the second time, you will have an idea of where this +other information is to be found, if it is present at all. We have +attempted to insert sufficient forward pointers to make the first +reading bearable, but we may have missed some. 
+
+This document implicitly assumes that you are trying to build the
+compiler under Unix. The only compiler sources that depend on Unix
+are the sources that contain the pathnames of other files.
+The syntax could easily be changed for other file systems.
+This document uses Unix pathname syntax and assumes a hierarchical
+file system, but it should be easy to map these directories to a
+different file system. The DOS runtime library accepts forward
+slashes (#\/) as a substitute for backward slashes (#\\), so the
+scripts are shared between Unix and DOS.
+
+This document also assumes that you are familiar with MIT Scheme, C,
+and the C preprocessor. It does not describe Liar in detail, and it
+does not cover many machine-independent portions at all. It is
+intended to guide a programmer familiar with (MIT) Scheme and C in the
+task of porting the compiler to a new architecture, not in modifying
+the compiler in any other way.
+
+For questions on Liar not covered by this document, or questions about
+this document, contact ``liar-implementors@zurich.ai.mit.edu''.
+
+Text tagged by ==> is intended primarily for the compiler developers.
+
+Good luck!
+
+			Acknowledgments
+
+Liar is the work of many people. The current version is mostly the
+effort of Chris Hanson and Bill Rozas, with significant contributions
+from Mark Friedman and Jim Miller. Arthur Gleckler, Brian LaMacchia,
+and Henry Wu have also contributed to the current version of Liar.
+Many other people have offered suggestions and criticisms.
+
+The current Liar might never have existed had it not been for the
+efforts and help of the now-extinct BBN Butterfly Lisp group. That
+group included Don Allen, Seth Steinberg, Larry Stabile, and Anthony
+Courtemanche. Don Allen, in particular, babysat computers to
+painstakingly bootstrap the first version of the then-new Liar.
+
+Many of the ideas and algorithms used in Liar, particularly at the RTL
+level, are taken from the GNU C compiler, written by Richard Stallman
+and many others.
+
+This document was written by Bill Rozas, with modifications and hints
+from the people listed above. The section on the MIT Scheme package
+system was written by Arthur Gleckler.
+
+	  0. Introduction and a brief walk through Liar.
+
+Liar translates Scode, as produced by the procedure SYNTAX or by the
+file syntaxer (SF, for syntax file), into compiled code objects. The
+Scode is translated into a sequence of languages, the last of which
+is the binary representation of the compiled code.
+
+The sequence of external languages manipulated is
+
+	Characters --READ-->
+	S-Expressions --SYNTAX-->
+	Scode --COMPILE-SCODE-->
+	compiled code objects.
+
+Liar is a multi-pass compiler, where each major pass has multiple
+subpasses. Many of the subpasses do not manipulate the whole code
+graph, but instead follow threads that link the relevant parts of the
+graph.
+
+COMPILE-SCODE is the main entry point to Liar, although CBF (for
+compile bin file) is the usual entry point. CBF uses COMPILE-SCODE,
+and assumes that the code has been syntaxed by SF producing a .bin
+file, and dumps the resulting compiled code into a .com file. CF (for
+compile file) invokes SF and then CBF on a file name argument.
+
+The internal sub-languages used by Liar are:
+
+	Scode --FGGEN-->
+	Flow-graph --RTLGEN-->
+	RTL (Register Transfer Language) --LAPGEN-->
+	LAP (Lisp assembly program) --ASSEMBLER-->
+	bits --LINK-->
+	compiled code object.
+
+where FGGEN, etc., are some of the major passes of the compiler.
+
+The remaining major passes are FGOPT (the flow-graph optimizer), and
+RTLOPT (the RTL-level optimizer). RTL-level register allocation is
+performed by RTLOPT, and hardware-level register allocation is
+performed by LAPGEN. ASSEMBLER branch-tensions the output code.
+Branch-tensioning is described in a later section.
LINK constructs a +Scheme compiled code object from the bits representing the code and +the fixed data that the compiled code uses at runtime. + +compiler/toplev.scm contains the top-level calls of the compiler and +its pass structure. + +The ``.com'' files contain compiled code objects, which are linked +further at load time. + + 0.1. Liar's package structure + +This section assumes that you are familiar with the MIT Scheme package +system. If you are not, there is a small description in an appendix +to this document. + +The package structure of the compiler reflects the pass structure and +is specified in compiler/machines/port/comp.pkg, where port is the +name of a machine (bobcat, vax, spectrum, mips, i386, alpha, etc.). +The major packages are: + +(COMPILER): + Utilities and data structures shared by most of the compiler. + +(COMPILER MACROS): + Syntax extensions used by the compiler to define language +translation rules. + +(COMPILER TOP-LEVEL): + Top level pass structure of the compiler. + +(COMPILER FG-GENERATOR): + This package contains the flow-graph generator, FGGEN. + +(COMPILER FG-OPTIMIZER): + This package contains the flow-graph analyzer and optimizer, +FGOPT. It has many sub-packages to contain the individual sub-passes. + +(COMPILER RTL-GENERATOR): + This package contains the flow-graph to RTL translator, +RTLGEN. It contains a few sub-packages for the major kinds of +flow-graph operations. + +(COMPILER RTL-OPTIMIZER): + This package contains most of the RTL-level optimizer, RTLOPT. +It has various sub-packages corresponding to some of its sub-passes. + +(COMPILER RTL-CSE): + This package contains the RTL-level common (redundant) +subexpression eliminator pass of the RTL-level optimizer. + +(COMPILER LAP-SYNTAXER): + This package contains most of the machine-dependent parts of +the compiler and the back end utilities. In particular, it contains +the RTL -> LAP translation rules, and the LAP -> bits translation +rules, i.e. 
the LAPGEN and ASSEMBLER passes respectively. It has some +sub-packages for various major utilities (linearizer, map-merger, +etc.). + +(COMPILER ASSEMBLER): + This package contains most of the machine-independent portion of +the assembler. In particular, it contains the bit-assembler, i.e. +the portion of the assembler that accumulates the bit strings produced +by ASSEMBLER and performs branch-tensioning on the result. + +(COMPILER DISASSEMBLER): + This package contains the disassembler. It is not needed for +ordinary compiler operation, but is useful for low-level debugging, +and debugging of the compiler and assembler. + + 0.2. Liar's sources' directory structure + +The directory structure loosely reflects the pass structure of the +compiler. compiler/machines/port/comp.pkg declares the packages and +the files that constitute them. + +compiler/back: + This directory contains the machine-independent portion of the +back end. It contains bit-string utilities, symbol table utilities, +label management procedures, the hardware register allocator, and the +top-level assembler calls. + +compiler/base: + This directory contains common utilities used by the whole +compiler, and the top level procedures provided by the compiler. + +compiler/etc: + This directory contains utilities used for cross-compiling, +and checking re-compilations. + +compiler/fggen: + This directory contains the front end of the compiler. The +code in this directory translates Scode into a flow-graph used by the +analyzer and optimizer. + +compiler/fgopt: + This directory contains the flow-graph analyzer and optimizer. + +compiler/rtlbase: + This directory contains utilities used by the RTL generator and +optimizer. + +compiler/rtlgen: + This directory contains the code that translates the +flow-graph into register transfer language (RTL). + +compiler/rtlopt: + This directory contains the RTL-level optimizer. 
It contains +code to perform lifetime analysis, redundant subexpression +elimination, elimination of dead code, etc. + +compiler/machines: + This directory contains a subdirectory for each port of the +compiler. Each of these subdirectories contains the port (machine) +dependent files of the compiler. + +compiler/machines/port: + This directory contains the definition of machine parameters, +the assembler rules, the disassembler, and RTL to assembly-language +rules for the port. + +All machine-dependent files are in compiler/machines/port and this is +the only directory that needs to be written to port the compiler to a +new architecture. + + 1. Liar's runtime model. + +Liar does not open code (inline) all operations that the code would +need to execute. In particular, it leaves error handling and +recovery, interrupt processing, initialization, and invocation of +unknown procedures, to a runtime library written in assembly language. + +Although this runtime library need not run in the context of the +CScheme interpreter, currently the only implementation of this library +runs from the interpreter and uses it for many of its operations. + +In other words, code generated by Liar does not depend on the +interpreter directly, but indirectly through the runtime library. It +does depend on the ability to invoke CScheme primitives at runtime, +some of which (eval, etc.) require the interpreter to be present. It +should be possible, however, to provide an alternate runtime library +and primitive set that would allow code produced by Liar to run +without the interpreter being present. (Foot) We often toy with this +idea. + +On the other hand, since the only instance of the runtime library is +that supplied by the interpreter, Liar currently assumes that the +Scheme object representation is the same as that used by the +interpreter, but this is relatively well abstracted and should not be +hard to change. (Foot) Famous last words. 
+ +The runtime library is currently implemented by +microcode/cmpaux-port.m4 and microcode/cmpint.c . The files +cmpaux.txt and cmpint.txt document these files. The documentation +files may be found in the microcode or the documentation directories. + +microcode/cmpaux-port.m4 is an assembly language machine-dependent file +that allows compiled Scheme to call the C-written library routines and +vice versa. It is described in cmpaux.txt. + +microcode/cmpint.c defines the library in a machine-independent way, +but requires some information about the port and this is provided in +microcode/cmpint2.h, a copy (or link) of the appropriate +microcode/cmpint-port.h file. The microcode/cmpint-port.h files are +described in cmpint.txt . + +cmpint.txt also describes many of the data structures that the +compiled code and runtime library manipulate, and defines some of the +concepts needed to understand the compiler. + +The rest of this document assumes that you are using the runtime +library provided by the CScheme interpreter. If you wish to use Liar +as a compiler for stand-alone programs, a lot of work needs to be +done, and this work is not described here. Perhaps we will do it in +the future. + +If you have not yet read cmpaux.txt and cmpint.txt, please do so +before reading the rest of this document. + +You should probably also read [1] and [2] for a discussion of some of +the implementation issues. + + 2. Preliminary Observations + + 2.1. Constraints on architectures to which Liar can be ported: + +- Liar assumes that the target machine is a general-register machine. +That is, operations are based on processor registers, and there is a +set of general-purpose registers that can be used interchangeably. It +would be hard to port Liar to a pure stack machine, a graph-reduction +engine, a Turing machine, or a 4-counter machine. However, the +register set required is not huge. 
Liar has been ported to the +386/486 architecture which only has eight registers, four of which are +reserved for implementation quantities (e.g. stack pointer and free +pointer) and four of which are left to the register allocator. + +- Liar currently assumes that floating-point registers and integer +registers are separate or the same size. In other words, currently +Liar cannot handle quantities that need multiple registers to hold +them. For example, on the DEC VAX and the Motorola 88100, there is a +single set of registers, and double floating point values (the only +kind used by Scheme) take two consecutive integer registers. The +register allocator in Liar does not currently handle this situation, +and thus, floating-point operations are not currently open-coded on +the VAX. + +- Liar assumes that the target machine has an address space that is +flat enough that all Scheme objects can be addressed uniformly. In +other words, segmented address spaces with segments necessarily +smaller than the Scheme runtime heap (i.e. Intel 286) will make Liar +difficult to port. + +- Liar assumes that instructions and data can coexist in the same +address space, and that new code objects that contain machine +instructions can be dynamically allocated from and written to the heap +(memory pool) used to allocate all other Scheme objects. This +assumption in Liar conflicts with some current hardware that has +programmer-visible separate (split) data and instruction caches -- +that is, there are two different caches, one used by the processor for +instruction references and the other for data references, and storing +data into memory only updates the data cache, but not the instruction +cache, and perhaps not even memory. Most of the problems this causes +can be resolved if the user is given enough control over the hardware +caches, i.e. some way to flush or synchronize them. 
Furthermore, a
+true Harvard architecture, with separate code and data memories, would
+be hard to accommodate without relatively major changes. At some
+point in the future we may write a C back end for Liar that handles
+this case, since C code space and data space are typically kept
+separate by the operating system. Whatever technique the C back end
+may use can probably be emulated by architectures with such a strong
+division, although it is likely to be expensive.
+
+	2.2. Some implementation decisions that may make your job
+harder or impair the quality of the output code:
+
+- Liar generates code that passes arguments to procedures on a stack.
+This decision especially affects the performance on load-store
+architectures, common these days. Liar may be changed in the future
+to generate code that passes arguments in registers, because most
+modern machines have large register sets and memory-based operations
+are slower than register-based operations even when the memory
+locations have been cached.
+
+- Liar assumes that pushing and popping elements from a stack is
+cheap. Currently Liar does not attempt to bump the stack pointer once
+per block of operations, but instead bumps it once per item. This is
+expensive on many modern machines where pre- and post-incrementing are
+not supported by the hardware. This may also change in the
+not-too-distant future.
+
+- Liar assumes that it is cheap to compute overflow conditions on
+integer arithmetic operations. Generic arithmetic primitives have the
+frequent fixnum (small integer) case open-coded, and the overflow and
+non-fixnum cases coded out of line, but this depends on the ability of
+the code to detect and branch on overflow conditions cheaply. This is
+not true of some modern machines, notably MIPS processors. If your
+processor does not make branching on such conditions reasonably cheap,
+you may have to use code similar to that used in the MIPS port.
The +MIPS processor has trapping and non-trapping arithmetic instructions. +The trapping arithmetic instructions trap on overflow, but the trap +recovery code is typically so expensive that the generated code +computes the overflow conditions explicitly. + +- Liar assumes that extracting, inserting, and comparing bit-fields is +relatively cheap. The current object representation for Liar +(compatible with the interpreter) consists of using a number of bits +(usually 6) in the most significant bit positions of a machine word as +a type tag, and the rest as the datum, usually an encoded address. +Not only must extracting, comparing, and inserting these tags be +cheap, but decoding the address must be cheap as well. These +operations are relatively cheap on architectures with bit-field +instructions, but more expensive if they must be emulated with bitwise +boolean operations and shifts, as on the MIPS R3000. Decoding a datum +into an address may involve inserting segment bits in some of the +positions where the tag is placed, further increasing the dependency +on cheap bit-field manipulation. + +- The CScheme interpreter uses a particularly poor representation for +fixnums, forcing Liar's hand. Fixnums are suitably small integers. +They are immediate objects with a particular tag. This tag was not +wisely chosen, making fixnum operations more expensive than they need +to be. This tag may be changed in the future. + +- The CScheme interpreter manipulates a stack that grows in a fixed +direction (from higher to lower addresses). On many modern machines, +there are no special instructions to deal with the stack, so the +decision is arbitrary. On some machines, however, there are special +instructions to pop and push elements on the stack. Liar may not be +able to use these instructions if the machine's preferred direction of +stack growth does not match the interpreter's. + + 2.3. Emulating an existing port. 

The simplest way to port Liar is to find an architecture to which
Liar has already been ported that is sufficiently similar to the
target architecture that most of the code can be written by copying
or trivially translating existing code.  In particular, if the
architectures are really close, there may be no need for additional
architecture-specific tuning.

The compiler is primarily developed on Motorola MC68020 processors,
so this is the best-tuned version; the other ports are not very well
tuned or not tuned at all.  If you improve an existing port, please
share the improvements by notifying liar-implementors.

- If you have a Vax-like CISC machine, you can try starting from the
Vax, the Motorola MC68020, or the i386 ports.  The Vax and i386 ports
were written by starting from the MC68020 port.  This is probably the
best solution for architectures like the NS32000, and perhaps even
the IBM 370.

- If you have an ``enlarged'' RISC processor, with some complex
addressing modes and bit-field instructions, you may want to start by
looking at the Spectrum (HP Precision Architecture) port.  This is
probably a good starting point for the Motorola 88000 and for the IBM
RS/6000.

- If you have a bare-bones RISC processor, similar to a MIPS R3000,
you may want to start from the MIPS port.  Since the MIPS R3000 is a
minimalist architecture, techniques that work for it carry over to
almost all other RISCs, so its port may well be a good starting point
for any of them, including the Sparc.  The MIPS port used the
Spectrum port as its model, and the Alpha port used the MIPS port as
its model.

- If you have a machine significantly different from those listed
above, you are out of luck and will have to write a port from
scratch.

For example, the port to the Intel 386+387/486 uses some of the
concepts and code from ports to other CISCs, but because the 387 has
a floating-point stack instead of floating-point registers, its
floating-point stack management is different (and not very good).

Of course, no architecture is identical to any other, so you may want
to mix and match ideas from many of the ports already done, and it is
probably a good idea for you to compare how the various ports solve
the various problems.

        3. Compiler operation, LAPGEN rules and ASSEMBLER rules.

The front end of the compiler translates Scode into a flow-graph that
is then translated into RTL.  The back end does machine-independent
optimization on the RTL, generates assembly language (in LAP format)
from the RTL, and assembles the resulting bits.

Although RTL is a machine-independent language, the particular RTL
generated for a given program will vary from machine to machine.  The
RTL can vary in the following ways:

- RTL is a language for manipulating the contents of conceptual
registers.  RTL registers are divided into ``pseudo-registers'' and
``machine registers''.  Machine registers represent physical hardware
registers, some of which have been reserved and given fixed meanings
by the port (stack pointer, value register, etc.), while
pseudo-registers represent conceptual locations that contain
quantities that will need physical registers or memory locations to
hold them in the final translation.  An RTL pseudo-register can be
mapped to any number of physical registers in the final translation,
and may ``move'' between physical registers.  In order to make the
RTL more homogeneous, the RTL registers are not distinguished
syntactically, but are instead distinguished by their value range.
Machine registers are represented as the N lowest-numbered RTL
registers (where N is the number of hardware registers), and all
others are pseudo-registers.
Since some RTL instructions explicitly
mention machine registers, and these (and their numbers) vary from
architecture to architecture, the register numbers in an RTL program
will vary depending on the back end in use.  Machine registers may be
divided into separate classes (e.g. address, data, and floating-point
registers) that can contain different types of values.
Pseudo-registers are not distinguished a priori, but the values
stored in them must be consistent.  For example, if a floating-point
value is stored into a particular pseudo-register, the register can
only be mapped to floating-point machine registers, and
non-floating-point values cannot be stored in it.

- RTL assumes a load-store architecture, but can accommodate
architectures that allow memory operands and rich addressing modes.
RTL is constructed by generating statements that include relatively
complex expressions.  These expressions may represent multiple memory
indirections or other operations.  An RTL simplifier runs over this
initial RTL, assigning these intermediate quantities to new
pseudo-registers and rewriting the original statements to manipulate
the original and new pseudo-registers.  Typically this simplification
results in a sequence of assignments to pseudo-registers with a
single operation per assignment, in which the only memory operations
are load and store.  However, this simplification pass is controlled
by the port.  The port supplies a set of rewriting rules to the
simplifier that cause it to leave more complex expressions untouched,
or to simplify them in different ways, depending on the availability
of memory operands or richer addressing modes.  Since these rules
vary from port to port, the final RTL differs for the different
ports.
The simplification process is also +controlled by the availability of various rules in the port, and ports +for richer instruction sets may require less simplification because +hardware instructions and addressing modes that encode more +complicated RTL patterns are directly available. + +- The open coding (inlining) of Scheme primitives is +machine-dependent. On some machines, for example, there is no +instruction to multiply integers, and it may not be advantageous to +open code the multiplication primitive. The RTL for a particular +program may reflect the set of primitive operations that the back end +for the port can open code. + +The resulting RTL program is represented as a control flow-graph where +each of the nodes has an associated list of RTL statements. The edges +in the graph correspond to conditional and unconditional branches in +the code, and include a low-level predicate used to choose between the +alternatives. The graph is linearized after the instructions have +been translated to LAP. There is a debugging RTL linearizer used by +the RTL output routine. + +Besides assignments and tests, the RTL has some higher level +statements that correspond to procedure headers, continuation (return +address) headers, etc. Thus an RTL program is made mostly of register +to register operation statements, a few conditional tests, and a few +higher-level ``glue'' statements. + +Once a program has been translated to RTL, the RTL code is optimized +in a machine-independent way by minimizing the number of RTL +pseudo-registers used, removing redundant subexpressions, eliminating +dead code, and by using various other techniques. + +The RTL program is then translated into a Lisp-format +assembly-language program (LAP). Hardware register allocation occurs +during this translation. 
The register allocator is +machine-independent and can accommodate different register classes, +but does not currently accommodate register pairs (this is why floating +point operations are not currently open coded on the Vax). + +The register allocator works by considering unused machine registers +(those not reserved by the port) to be a cache for the +pseudo-registers. Thus a particular pseudo-register may map into +multiple machine registers of different types, and these aliases are +invalidated as the pseudo-registers are written or the corresponding +machine registers reused. Thus the most basic facility that the +register allocator provides is a utility to allocate an alias of a +particular type for a given pseudo-register. + +The port defines the types and numbers of machine registers and the +subset that is available for allocation, and the register allocator +manages the associations between the pseudo-registers and their +aliases and the set of free machine registers. The register allocator +also automatically spills the contents of machine registers to memory +when pressed for machine registers, and reloads the values when +necessary. + +Thus the resulting LAP program is the collection of the code issued by +the rules that translate RTL into LAP, the instructions issued behind +the scenes by the register allocator, and the instructions used to +linearize the control flow graph. + +The back end provides a set of rules for the translation of RTL to +LAP, and a set of procedures that the register allocator and the +linearizer use to generate such instructions. The rules are written +using a special syntax that creates entries in a data base used by a +pattern matcher to translate the RTL into LAP. + +The linear LAP is then translated into binary form by using the same +pattern matcher with a different set of rules. These rules define the +translation between assembly language and machine language for the +architecture. 
Most of these rules output bit strings to be collected +together, but some output a set of directives to the bit-level +assembler to define labels, or choose between alternative encoding of +the fields depending on the final value of a displacement. These +alternative encodings are typically used for PC-relative quantities. + +The machine-independent bit-assembler collects all the bits together +and keeps track of a virtual program counter used to determine the +distance between instruction fields. A relaxation process is used to +reduce the size of the resulting encoding (to tension branches, i.e. +to choose the smallest encoding that will do the job when there are +alternatives). + +Since most of the LAPGEN rules generate almost fixed assembly language, +where the only difference is the register numbers, most of the LAP to +bits translation can be done when the compiler is compiled. A +compiler switch, ``COMPILER:ENABLE-EXPANSION-DECLARATIONS?'' allows this +process to take place. This mechanism has not been used for a while, +however, because the resulting compiler was, although somewhat faster, +considerably bigger, so this switch may not currently work. + +Several other compiler parameters and switches control various aspects +of the operation of the back end. Most parameters and switches are +machine independent, and are defined in compiler/base/switch.scm . +The remaining parameters and switches are defined in +compiler/machines/port/machin.scm. All compiler parameters and +switches are exported to the Scheme global package for easy +manipulation. + +The following switches are of special importance to the back end +writer: + +* COMPILER:COMPILE-BY-PROCEDURES? This switch controls whether the +compiler should compile each top-level lambda expression independently +or compile the whole input program (or file) as a block. It is +usually set to true, but must be set to false for cross-compilation. +The cross-compiler does this automatically. 
(Foot) The reason for +this is that the gc offset words in compiled entry points may differ +for the source and target machines, and thus the source machine's +garbage collector may be confused by the target machine's compiled +entry points. This is circumvented by having the cross compiler +generate a single compiled code block object, manipulated and dumped +as a vector object (instead of as an entry point). The final entry +points are generated by cross-compile-bin-file-end running interpreted +on the target machine. + +* COMPILER:OPEN-CODE-PRIMITIVES? This switch controls whether Liar +will open code (inline) MIT Scheme primitives. It is usually set to +true and should probably be left that way. On the other hand, it is +possible to do a lot less work in porting the compiler by not +providing the open coding of primitives and turning this switch off. + Some of the primitives are open coded by the machine-independent +portion of the compiler, since they depend only on structural +information, and not on the details of the particular architecture. +In other words, CAR, CONS, and many others can be open-coded in a +machine-independent way since their open codings are performed +directly in the RTL. Turning this switch to false would prevent the +compiler from open coding these primitives as well. + +* COMPILER:GENERATE-RTL-FILES? and COMPILER:GENERATE-LAP-FILES? These +are mostly compiler debugging switches. They control whether the +compiler will issue .rtl and .lap files for every file compiled. The +.rtl file will contain the RTL for the program, and the .lap file will +contain the input to the assembler. Their usual value is false. + +* COMPILER:INTERSPERSE-RTL-IN-LAP? This is another debugging switch. +If turned on, and COMPILER:GENERATE-LAP-FILES? is also on, the lap +output file includes the RTL statements as comments preceding their +LAP translations. Its usual value is true. + ==> RTL predicates are not included, making the control-flow hard to +follow. 
This should be fixed.

* COMPILER:OPEN-CODE-FLOATING-POINT-ARITHMETIC?  This switch is
defined in compiler/machines/port/machin.scm and determines whether
floating-point primitives can and should be open-coded by the
compiler.  If the port provides open codings for them, it should be
set to true, otherwise to false.

* COMPILER:PRIMITIVES-WITH-NO-OPEN-CODING  This parameter is defined
in compiler/machines/port/machin.scm.  It contains a list of
primitive names that the port cannot open code.  Currently there is
no simple list of all the primitives that Liar can open-code.  The
list is implicit in the code contained in rtlgen/opncod.scm.
  ==> The last two parameters should probably be combined and
inverted.  COMPILER:PRIMITIVES-WITH-OPEN-CODINGS should replace both
of the above.  This has the advantage that if the RTL level is taught
how to deal with additional primitives, but not all ports have open
codings for them, there is no need to change all the machin.scm
files, only those for which the open coding has been provided.

        4. Description of the machine-specific files

The following is the list of files that usually appears in the port
directory.  The files can be organized differently for each port, but
it is probably easiest if the same pattern is kept.  In particular,
the best way to write most of them is by editing the corresponding
files from an existing port.  Keeping the structure identical will
make writing decls.scm, comp.pkg, and comp.sf straightforward, and
will make future updates easier to track.

A useful thing to do when writing new port files is to keep track of
the original version from which you started, and, additionally, of
the version on which your original is based.
For example, if
you use machines/mips/assmd.scm as a model for your version, you
would find in it something like
    $ Header: assmd.scm,v 1.1 90/05/07 04:10:19 GMT jinx Exp $
    $MC68020-Header: assmd.scm,v 1.36 89/08/28 18:33:33 GMT cph Exp $
In order to allow an easier merge in the future, it would be good if
you transformed this header into
    $ Header $
    $mips-Header: assmd.scm,v 1.1 90/05/07 04:10:19 GMT jinx Exp $
    $MC68020-Header: assmd.scm,v 1.36 89/08/28 18:33:33 GMT cph Exp $
The new $ Header $ line would be used by RCS to keep track of the
versions of your port, and the others could be used to find updates
to the originals that would make updating your port easier.

        4.1. Compiler building files:

* comp.pkg:
        This file describes the Scheme package structure of the
compiler, the files loaded into each package, and the names exported
and imported by each package.
        To write this file, copy the corresponding file from an
existing port, change the name of the port (e.g. mips -> sparc), and
add or remove files as appropriate.  You should only need to add or
remove assembler and LAPGEN files.

* comp.cbf:
        This file is a script that can be used to compile the
compiler from scratch.  You can copy this file from another port and
change the port name.  There is more information in a later section
about how to build the compiler.

* comp.sf:
        This file is a script that is used to pre-process the
compiler sources before they are loaded to be interpreted or
compiled.  You should be able to copy the file from an existing port
and replace the name of the port.  You should also edit the names of
the instruction files in the assembler instruction database section,
although this section is no longer used by default.

The previous three files should be copied or linked to the top-level
compiler directory.  That is, compiler/comp.pkg should be a
(preferably symbolic) link to, or a copy of,
compiler/machines/port/comp.pkg .

* comp.con, comp.ldr, comp.bcon, and comp.bldr:
        These files are generated by the CREF subsystem from the
information in the cref.pkg file.  The .bcon and .bldr files are
binary versions of the others, which are Scheme sources.  The .con
file contains the ``connectivity code'', that is, the code to create
and link the package objects specified in the .pkg file.  The .ldr
file contains the ``loading code'', that is, the code to load the
source files into the appropriate packages and, in theory, to
initialize the packages.  The CREF subsystem also generates a
comp.cref file that includes cross-reference information.  It is
useful to examine this file to find unbound references (often typos).

* make.scm:
        This file is used to load the compiler on top of a runtime
system that has the file syntaxer (SF) loaded, and defines the
version of the compiler.  The list of files does not appear here
because comp.pkg already declares them; CREF builds the comp.con and
comp.ldr files that contain this information, and they are loaded by
make.scm.

* decls.scm:
        This file defines the pre-processing dependencies between the
various source files.  There are three kinds of pre-processing
dependencies:
 - Syntactic: Different files need to be processed in different
syntax tables that define the macros used by the files.
 - Integrations: Different files import integrable (inline)
definitions from other files, and must be processed in the right
sequence in order to obtain the maximum effect from the integrations
(mostly because of transitive steps).
 - Expansions: Certain procedures can be expanded at compiler
pre-processing time into accumulations of simpler calls.  This is how
the assembly language in the LAPGEN rules can be translated into bits
at compiler pre-processing time.  The files that define the
pre-processing-time expansion functions must be loaded in order to
process those files that use the procedures that can be expanded.
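Ordering files under dependencies like these is essentially a
topological sort.  The following toy C sketch, over an invented
four-file graph, shows the idea (the real mechanism is Scheme code in
decls.scm, which additionally copes with circular dependencies by
processing some files more than once):

```c
#define NFILES 4

/* dep[i][j] != 0 means file j must be processed after file i.
   Here file 0 feeds files 1 and 2, which both feed file 3. */
static const int dep[NFILES][NFILES] = {
    {0, 1, 1, 0},
    {0, 0, 0, 1},
    {0, 0, 0, 1},
    {0, 0, 0, 0},
};

static int order[NFILES];       /* filled with a valid processing order */

static void topo_sort(void)
{
    int indeg[NFILES] = {0}, done[NFILES] = {0};
    for (int i = 0; i < NFILES; i++)        /* count incoming edges */
        for (int j = 0; j < NFILES; j++)
            indeg[j] += dep[i][j];
    for (int k = 0; k < NFILES; k++) {
        int v = 0;
        while (done[v] || indeg[v] != 0)    /* next file with no */
            v++;                            /* unprocessed prerequisites */
        order[k] = v;
        done[v] = 1;
        for (int j = 0; j < NFILES; j++)    /* release its dependents */
            indeg[j] -= dep[v][j];
    }
}
```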

decls.scm builds a database of the dependencies.  This database is
topologically sorted by some of the code in decls.scm itself in order
to determine the processing order.  Since there are circularities in
the integration dependencies, some of the files are processed
multiple times, but the mechanism in decls.scm takes care of doing
this correctly.  You should be able to edit the version from another
port in the appropriate way.  Mostly you will need to rename the port
(e.g. mips -> sparc) and add or delete instruction and rule files as
needed.

  ==> decls.scm should probably be split into two sections: the
machine-independent dependency management code, and the actual
declaration of the dependencies for each port.  This would allow us
to share more of the code, and make the task of rewriting it less
daunting.

        4.2. Miscellaneous files:

* rgspcm.scm:
        This file declares a set of primitives that can be coded by
invoking runtime library procedures.  This file is no longer
machine-dependent, since the portable library has made all the sets
identical.  It lives in machines/port for historical reasons, and
should probably move elsewhere.  Obviously, you can just copy it from
another port.
  ==> Let's move it or get rid of it!

* rulrew.scm:
        This file defines the simplifier rules that allow more
efficient use of the hardware's addressing modes and other
capabilities.  The rules use the same syntax as the LAPGEN rules, but
belong in the (rule) rewriting database.  Although these rules are
machine-dependent, it should be straightforward to emulate what other
ports have done in order to arrive at a working set.  Moreover, it is
possible to start out with an empty set and add rules only as
inefficiencies are discovered in the output assembly language.  These
rules manipulate RTL expressions by using the procedures defined in
compiler/rtlbase/rtlty1.scm and compiler/rtlbase/rtlty2.scm.

* lapopt.scm:
        This file defines a LAP-level peephole optimizer.
It is currently
used only in the MIPS port, to reduce the number of NOPs in the
``delay slots'' of load instructions.  The instructions in each
LAP-level basic block are passed to optimize-linear-lap, which
outputs the new sequence of instructions corresponding to the basic
block.  Currently all ports (except the MIPS port) implement this
procedure as the identity procedure.

* machin.scm:
        This file defines architecture and port parameters needed by
various parts of the compiler.  The following is the current list of
the primary parameters.  The definitions of derived parameters not
mentioned here should be copied verbatim from existing ports.  Some
of these parameters are not currently in use, but all should be
provided for completeness.

- USE-PRE/POST-INCREMENT?: Should be true or false depending on
whether the architecture has addressing modes that update the base
address.  It is true on the MC68020, Vax, i386, and HP-PA, and false
on the MIPS and Alpha.

- ENDIANNESS: Should be the symbol LITTLE if an address, when used as
a byte address, refers to the least significant byte of the long-word
addressed by it.  It should be BIG if it refers to the most
significant byte of the long-word.  The compiler has not been ported
to any machine where the quantum of addressability is not an 8-bit
byte, so the notion may not apply to those.

- ADDRESSING-GRANULARITY: How many bits are addressed by the
addressing quantum.  That is, incrementing an address by 1 will bump
the address to point past this number of bits.  Again, the compiler
has not been ported to any machine where this value is not 8.

- SCHEME-OBJECT-WIDTH: How many bits are taken up by a Scheme object.
This should be the number of bits in a C ``unsigned long'', since
Scheme objects are declared as such by the portable runtime library.

- SCHEME-TYPE-WIDTH: How many bits at the most-significant end of a
Scheme object are taken up by the type tag.
The value of
TYPE_CODE_LENGTH in the microcode must match this value.  The value
is currently 6 for systems with a compiler and 8 for systems without
one.

- ADDRESS-UNITS-PER-PACKED-CHAR: This parameter defines by how much
to increment an address to make it point to the next character in a
string.  The compiler has not been ported to any configuration where
this is not 1, but such a configuration may arise if 16-bit
characters are used in the future.

- FLONUM-SIZE: This is the ceiling of the ratio of the size of a C
``double'' to the size of a C ``unsigned long''.  It reflects how
many Scheme units of memory (measured in Scheme objects) the data in
a Scheme floating-point object will take.

- FLOAT-ALIGNMENT: This value defines the bit-alignment constraints
for a C ``double''.  It must be a multiple of scheme-object-width.
If floating-point values can only be stored at even long-word
addresses, for example, this value should be twice
scheme-object-width.

- SIGNED-FIXNUM/UPPER-LIMIT: This parameter should be derived from
others, but is specified as a constant due to a shortcoming of the
compiler pre-processing system (EXPT is not constant-folded).  Use
the commented-out expression to derive the value for your port.  All
values that should be derived but are instead specified as constants
are tagged by a comment containing ``***''.

- STACK->MEMORY-OFFSET: This procedure is provided to accommodate
stacks that grow in either direction, but we have not tested any port
in which the stack grows towards larger addresses, because the
CScheme interpreter imposes its own direction of growth.  It should
probably be copied verbatim.

- EXECUTE-CACHE-SIZE: This should match EXECUTE_CACHE_ENTRY_SIZE in
microcode/cmpint-port.h, and is explained in cmpint.txt.
  ==> We should probably rename one or the other so that the names
match.

The following parameters describe to the front end the format of
closures containing multiple entry points.
Closures are described in +some detail in cmpint.txt and in section 5.3.3. Very briefly, a +closure is a procedure object that contains a code pointer and a set +of free variable locations or values. + +- CLOSURE-OBJECT-FIRST-OFFSET: This procedure takes a single argument, +the number of entry points in a closure object, and computes the +distance in long-words between the first long-word in the closure +object, and the first long-word containing a free variable. This is +the number of long-words taken up by the closure object's header, and +the code to represent N closure entry points. + +- CLOSURE-FIRST-OFFSET: This procedure takes two arguments, the number +of entry points in a closure object, and the index of one of them, the +first being zero. It computes the distance between that entry's +environment pointer and the first free variable in the closure object. +The entry's environment pointer will be the address of the entry point +itself if closure entry points are always aligned on long-word +boundaries, or the address of the first entry point if they are not. + +- CLOSURE-ENTRY-DISTANCE: This procedure is given the number of entry +points in a closure object, and the indices for two of its entry +points, and computes the number of bytes that separate the two entry +points in the closure object. This distance should be a multiple of +the parameter COMPILED_CLOSURE_ENTRY_SIZE described in cmpint.txt and +defined in microcode/cmpint-port.h. + +- CLOSURE-ENVIRONMENT-ADJUSTMENT: This procedure takes two parameters, +the number of entry points in a closure object, and the index of one +of them. It computes the number of bytes that must be added to the +entry point's address to result in the entry point's environment +pointer. If entry points are always aligned on long-word boundaries, +this number should always be zero, otherwise it should be the distance +to the first (lowest addressed) entry point. 
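As a purely hypothetical illustration of the relationships among
these procedures, assume a layout with a one-long-word header, two
long-words of code per entry point, 4-byte long-words, and
long-word-aligned entries; the real values are port-specific and come
from cmpint-port.h:

```c
/* Hypothetical closure layout (not any real port's):
     word 0:             header
     words 1 .. 2*n:     n entry points, 2 long-words of code each
     words 2*n+1 ...:    free-variable slots                        */
#define HEADER_WORDS 1
#define ENTRY_WORDS  2
#define WORD_BYTES   4

/* Analogue of CLOSURE-OBJECT-FIRST-OFFSET: long-words from the start
   of the object to the first free variable. */
static int closure_object_first_offset(int nentries)
{
    return HEADER_WORDS + nentries * ENTRY_WORDS;
}

/* Analogue of CLOSURE-FIRST-OFFSET: long-words from entry i's
   environment pointer (the entry point itself, since entries are
   long-word aligned here) to the first free variable. */
static int closure_first_offset(int nentries, int i)
{
    return (nentries - i) * ENTRY_WORDS;
}

/* Analogue of CLOSURE-ENTRY-DISTANCE: bytes between entries i and j. */
static int closure_entry_distance(int nentries, int i, int j)
{
    (void)nentries;     /* uniform spacing makes the count irrelevant */
    return (j - i) * ENTRY_WORDS * WORD_BYTES;
}

/* Analogue of CLOSURE-ENVIRONMENT-ADJUSTMENT: zero here, because the
   aligned entry point is its own environment pointer. */
static int closure_environment_adjustment(int nentries, int i)
{
    (void)nentries; (void)i;
    return 0;
}
```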
+ +The remaining code in machin.scm describes the register set of the +architecture and defines the register conventions imposed by the port. +These conventions must match the expectations of +microcode/cmpaux-port.m4 described in cmpaux.txt. + +Machine registers are assigned a contiguous range of non-negative +integers starting from zero. Typically symbolic names are given to +each of these integers for use in some of the rules, especially those +dealing with the assembly language interface. + +- NUMBER-OF-MACHINE-REGISTERS should be the number of machine registers, +i.e. one greater than the number assigned to the last machine register. + +- NUMBER-OF-TEMPORARY-REGISTERS is the number of reserved memory +locations used for storing the contents of spilled pseudo-registers. + +Liar requires certain fixed locations to hold various implementation +quantities such as the stack pointer, the heap (free memory) pointer, +the pointer to the runtime library and interpreter's ``register'' +array, and the dynamic link ``register''. Typically each of these +locations is a fixed machine register. In addition, a processor +register is typically reserved for returned values and another for +holding a bit-mask used to clear type tags from objects (the pointer +or datum mask). All of these registers should be given additional +symbolic names. + + ==> What is MACHINE-REGISTER-KNOWN-VALUE used for? It would seem that +the datum mask is a known value, but... Currently all the ports seem +to have the same definition. + +The contents of pseudo-registers are divided into various classes to +allow some consistency checking. Some machine registers always +contain values in a fixed class (e.g. floating point registers and the +register holding the datum mask). + +- MACHINE-REGISTER-VALUE-CLASS is a procedure that maps a register to +its inherent value class. The main value classes are +value-class=object, value-class=address, and value-class=float. 

The registers allocated for the special implementation quantities
have fixed value classes.  The remaining registers, managed by the
compiler's register allocator, may be generic (value-class=word) or
may allow only certain values to be stored in them
(value-class=float, value-class=address, etc.).

Most of the remainder of compiler/machines/port/machin.scm is a set
of procedures that return and compare the port's chosen locations for
various operations.  Some of these operations are no longer used by
the compiler, and reflect a previous reliance on the interpreter to
accomplish certain environment operations.  These operations are now
handled by invoking the appropriate primitives rather than by using
special entry points in the runtime library.  Under some compiler
switch settings the older methods for handling these operations can
be re-activated, but this never worked completely, and may no longer
work at all.

- RTL:MACHINE-REGISTER? should return a machine register for those
special RTL registers that have been allocated to fixed registers,
and false otherwise.

- RTL:INTERPRETER-REGISTER? should return the long-word offset into
the runtime library's memory ``register'' array for those special RTL
registers not allocated to fixed registers, and false otherwise.

- RTL:INTERPRETER-REGISTER->OFFSET signals an error when the special
RTL register has been allocated to a fixed register (and thus has no
offset in the array), and otherwise returns the long-word offset into
the register array.

- RTL:CONSTANT-COST is a procedure that computes some metric of how
expensive it is to generate a particular constant.  If the constant
is cheaply reconstructed, the register allocator may decide to flush
it (rather than spill it to memory) and re-generate it the next time
it is needed.  The best estimate is the number of cycles that
constructing the constant would take, but the number of bytes of
instructions can be used instead.

- COMPILER:OPEN-CODE-FLOATING-POINT-ARITHMETIC?
and +COMPILER:PRIMITIVES-WITH-NO-OPEN-CODING have been described in the +section on compiler switches and parameters. + + 4.3. LAPGEN files: + +The following files control the RTL -> LAP translation. They define +the rules used by the pattern matcher to perform the translation, and +procedures used by the register allocator and linearizer to connect +the code that results from each rule. The rules, and how to write +them, are described further in a later section. + +The rule set is partitioned into multiple subsets. This is not +necessary, but makes re-compiling the compiler faster and reduces the +memory requirements of the compiler. The partition can be done in a +different way, but is probably best left as uniform as possible +between the different ports to facilitate comparison and updating. + +The LAPGEN (RTL->LAP) rules are separated into two different data +bases. The larger is the statement data base, used to translate whole +RTL instructions. The smaller is the predicate data base, used to +translate decisions to branch between the RTL basic blocks. + +* lapgen.scm: + This file does not define any rules, but provides a set of +utilities for the back end. It provides utilities for the rules, +typically procedures for generating code that manipulates the object +representation, additional entry points to the register allocator that +are better suited to the port, and the interface procedures for the +register allocator and the linearizer. + +The following definitions constitute the register allocator interface +and must be provided by lapgen.scm: + +- AVAILABLE-MACHINE-REGISTERS is a list of the RTL register numbers +corresponding to those registers that the register allocator should +manage. This should include all machine registers except those +reserved by the port. + +- SORT-MACHINE-REGISTERS is a procedure that reorders a list of +registers into the preferred allocation order. + ==> Is this right? 
+ +- REGISTER-TYPE is a procedure that maps RTL register numbers to their +inherent register types (typically GENERAL and FLOAT). + +- REGISTER-TYPES-COMPATIBLE? is a boolean procedure that decides +whether two registers can hold the same range of values. + +- REGISTER-REFERENCE maps RTL register numbers into register +references, i.e. pieces of assembly language used to refer to those +registers. + +- REGISTER->REGISTER-TRANSFER issues code to copy the contents of one +RTL register into another. + +- REFERENCE->REGISTER-TRANSFER issues code to copy the contents of a +machine register described by its reference into a given RTL register. + +- PSEUDO-REGISTER-HOME maps RTL registers to a fragment of assembly +language used to refer to the memory location into which they will be +spilled if necessary. This is typically a location (or set of +locations) in the Scheme ``register'' array. + +- HOME->REGISTER-TRANSFER generates code that copies the contents of +an RTL register's home (its spill location) into a machine register. + +- REGISTER->HOME-TRANSFER generates code that copies the contents of +an RTL register, currently held in a machine register, into its memory +home. + +The following definitions constitute the linearizer interface, and +must be provided by lapgen.scm: + +- LAP:MAKE-LABEL-STATEMENT generates an assembly language directive +that defines the specified label. + +- LAP:MAKE-UNCONDITIONAL-BRANCH generates a fragment of assembly +language used to unconditionally transfer control to the specified +label. + +- LAP:MAKE-ENTRY-POINT generates a fragment of assembly language used +to precede the root of the control flow graph. Its output should use +the assembler directive ENTRY-POINT and generate format and GC words +for the entry point. + +The rest of the code in lapgen.scm is a machine-specific set of utilities +for the LAPGEN rules. Some of the more common procedures are +described in the section that covers the rules. 
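+
+For concreteness, on a hypothetical load-store machine the linearizer
+interface might be sketched as follows (the B mnemonic and the
+MAKE-EXTERNAL-LABEL helper that emits the format and GC words are
+assumptions of this example, not requirements of the interface):
+
+        (define (lap:make-label-statement label)
+          (LAP (LABEL ,label)))
+
+        (define (lap:make-unconditional-branch label)
+          (LAP (B (@PCR ,label))))             ; hypothetical mnemonic
+
+        (define (lap:make-entry-point label block-start-label)
+          block-start-label                    ; ignored in this sketch
+          (LAP (ENTRY-POINT ,label)
+               ;; assumed port-local helper emitting format/GC words
+               ,@(make-external-label expression-code-word label)))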
+
+Of special interest are the utility procedures for manipulating
+pc-relative addresses and loads on RISC machines.  RISC machines
+typically have only pc-relative branch instructions, but no
+pc-relative loads or pc-relative load-effective-address instructions.
+On the other hand, they usually have a branch-and-link instruction
+that performs a pc-relative branch and stores the return address in
+a processor register.  This instruction can be used (by branching to
+the next instruction) to obtain its own address, from which
+pc-relative addresses and loads can then be computed.  The MIPS back
+end currently implements a simple pc-relative address caching scheme
+that attempts to reduce the number of such branches by re-using the
+values produced by previous branches if they are still available.
+This code can be suitably modified to work on most RISC
+architectures.
+
+* rules1.scm:
+        This file contains RTL statement rules for simple register assignments
+and operations.  In particular, it contains the rules for constructing
+and destructuring Scheme objects, allocating storage, and memory <->
+register transfers.
+
+* rules2.scm:
+        This file contains RTL predicate rules for simple equality
+predicates (EQ-TEST, TYPE-TEST).
+
+* rules3.scm:
+        This file contains RTL statement rules for control-flow
+statements like continuation (return address) invocation, several
+mechanisms for invoking procedures, stack reformatting prior to
+invocation, procedure headers, closure object allocation, expression
+headers and declaring the data segment of compiled code blocks for
+assembly.  See [1] for some background information on stack
+reformatting, and [2] for a discussion of how calls to (the values of)
+free variables are handled by Liar.
+
+* rules4.scm:
+        This file contains RTL statement rules for runtime library
+routines that handle manipulation of variables in first class
+environments.  Many of these rules are no longer used by the compiler
+unless some switch settings are changed.
See [2] for a discussion of +how Liar handles references to free variables. + +* rulfix.scm: + This file contains statement and predicate rules for +manipulating fixnums (small integers represented in immediate +form). The rules handle tagging and de-tagging fixnum objects, +arithmetic on them, comparison predicates, and overflow tests. + +* rulflo.scm: + This file contains statement and predicate rules for +manipulating flonums (floating point data in boxed form). The rules +handle boxing and un-boxing of flonums, arithmetic on them, and +comparison predicates. + + 4.4. Assembler files: + +* assmd.scm: + This file defines the following machine-dependent parameters +and utilities for the bit-level assembler: + +- MAXIMUM-PADDING-LENGTH: If instructions are not always long-word +aligned, the maximum distance in bits between the end of an +instruction and the next (higher) long-word boundary. + +- PADDING-STRING: A bit-string used for padding the instruction block +to a long-word boundary. If possible, it should encode a HALT or +ILLEGAL instruction. The length of this bit-string should evenly +divide maximum-padding-length. + +- BLOCK-OFFSET-WIDTH: This should be the size in bits of format_word +described in cmpint.txt. It should be 16 for all byte-addressed +machines where registers hold 32 bits. + +- MAXIMUM-BLOCK-OFFSET: The maximum byte offset that can be encoded in +block-offset-width bits. This depends on the encoding described in +cmpint.txt. The least significant bit is always used to indicate +whether this block offset points to the start of the object or to +another block offset, so the range may be smaller than the obvious +value. Furthermore, if instruction alignment constraints are tighter +than byte boundaries, this range may be larger. For example, if +instructions always start on even long-word boundaries, the bottom two +bits (always zero) are encoded implicitly, and the range is +accordingly larger. 
+
+- BLOCK-OFFSET->BIT-STRING: This procedure is given a byte offset and
+a boolean flag indicating whether this is the offset to the start of a
+compiled code block or to another block-offset, and returns the
+encoded value of this offset.
+
+- MAKE-NMV-HEADER: This procedure is given the size in long-words of a
+block of instructions, and constructs the non-marked-vector header
+that must precede the instructions in memory in order to prevent the
+garbage collector from examining the data as Scheme objects.  This
+header is just an ``object'' whose type tag is manifest-nm-vector
+(TC_MANIFEST_NM_VECTOR in the microcode) and whose datum is the size
+in long-words (excluding the header itself).
+
+The following three parameters define how instruction fields are to be
+assembled in memory depending on the ``endianness'' (byte ordering) of
+the architecture.  You should be able to use the MC68020 (big endian)
+or the Vax (little endian) version, or the MIPS version, which is
+conditionalized for both possibilities since MIPS processors can be
+configured either way.
+
+- INSTRUCTION-INSERT! is a procedure that, given a bit-string encoding
+instruction fields, a larger bit-string into which the smaller should
+be inserted, a position within the larger one, and a continuation,
+inserts the smaller bit-string into the larger at the specified
+position, and returns the new bit position at which the immediately
+following instruction field should be inserted.
+
+- INSTRUCTION-INITIAL-POSITION is a procedure that, given a bit-string
+representing a segment of compiled code, returns the bit-string
+position at which instruction-insert! should insert the first
+instruction.
+
+- INSTRUCTION-APPEND is a procedure that, given the bit-strings
+encoding successive (fields of) instructions, produces the bit-string
+that corresponds to their concatenation in the correct order.
+
+* coerce.scm:
+        This file defines a set of coercion procedures.
These
+procedures are used to fill fields in instructions.  Each coercion
+procedure checks the range of its argument and produces a bit string
+of the appropriate length encoding the argument.  Most coercions will
+coerce their signed or unsigned argument into a bit string of the
+required fixed length.  On some machines (e.g. HP PA), some coercions
+may permute the bits appropriately.
+
+* insmac.scm:
+        This file defines machine-specific syntax used in the assembler,
+and the procedure PARSE-INSTRUCTION, invoked by the syntax expander
+for DEFINE-INSTRUCTION to parse the body of each of the instruction
+rules.  This code is typically complex and you are encouraged to
+emulate one of the existing ports in order to reuse its code.
+
+The ports use the following syntaxes for describing instructions in
+machine language:
+
+- Spectrum and MIPS:
+        (LONG (<width 1> <value 1> <coercion type 1>)
+              (<width 2> <value 2> <coercion type 2>)
+              ...
+              (<width n> <value n> <coercion type n>))
+where all the widths must add up to an even multiple of 32.
+
+- Vax:
+Instruction descriptions are made of arbitrary sequences of the
+following field descriptors:
+        (BYTE (<width 1> <value 1> <coercion type 1>)
+              (<width 2> <value 2> <coercion type 2>)
+              ...
+              (<width n> <value n> <coercion type n>))
+        (OPERAND <size> <value>)
+        (DISPLACEMENT (<width> <value>))
+The total width of each of these field descriptors must add up to a
+multiple of 8.
+BYTE is used primarily for instruction opcodes.
+OPERAND is used for general addressing modes.
+DISPLACEMENT is used for PC-relative branch displacements.
+
+- MC68020:
+        (WORD (<width 1> <value 1> <coercion type 1> <size 1>)
+              (<width 2> <value 2> <coercion type 2> <size 2>)
+              ...
+              (<width n> <value n> <coercion type n> <size n>))
+where all the widths must add up to an even multiple of 16.
+Size refers to immediate operands to be encoded in the instruction,
+and is omitted when irrelevant.
+ +A missing coercion type means that the ordinary unsigned coercion (for +the corresponding number of bits) should be used. + +Additionally, each of these ports provides a syntax for specifying +instructions whose final format must be determined by the +branch-tensioning algorithm in the bit assembler. The syntax of these +instructions is usually + (VARIABLE-WIDTH (<name> <expression>) + ((<low-1> <high-1>) + <instruction-specifier-1>) + ((<low-2> <high-2>) + <instruction-specifier-2>) + ... + ((() ()) + <instruction-specifier-n>)) + +Each instruction specifier is an ordinary (i.e. not VARIABLE-WIDTH) +instruction specifier. NAME is a variable to be bound to the +bit-assembly-time value of EXPRESSION. Each of the ranges +<low-1>-<high-1> <low-2>-<high-2>, etc. must be properly nested in the +next, and () specifies no bound. The final format chosen is that +corresponding to the lowest numbered range containing the value of +<expression>. Successive instruction specifiers must yield +instructions of non-decreasing lengths for the branch tensioner to +work correctly. The MC68020 port uses GROWING-WORD instead of +VARIABLE-WIDTH as the keyword for this syntax. + ==> This should probably be changed. + +* inerly.scm: + This file provides alternative expanders for the +machine-specific syntax. These alternative expanders are used when +the assembly language that appears in the LAPGEN rules is assembled +(early) at compiler pre-processing time. That is, the procedures +defined in this file are only used if +COMPILER:ENABLE-EXPANSION-DECLARATIONS? is set to true. If you reuse +the code in insmac.scm from another port, you should be able to reuse +the inerly.scm file from the same port. Alternatively, you can write +a dummy version of this code and require +COMPILER:ENABLE-EXPANSION-DECLARATIONS? to be always false. This +switch defaults to false, currently. The Spectrum and MIPS versions +currently have dummy versions of this code. 
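+
+To make the VARIABLE-WIDTH syntax concrete, a hypothetical
+pc-relative branch with a short one-word form and a longer two-word
+form might be described as follows (the opcodes, field layouts, and
+the HIGH-16 and LOW-16 helper procedures are invented for this
+example):
+
+        (VARIABLE-WIDTH (disp (- target *PC*)) ; TARGET from the pattern
+          ((#x-8000 #x7FFF)
+           ;; short form: displacement fits in 16 bits
+           (LONG (16 #x1234)                   ; invented opcode
+                 (16 disp SIGNED)))
+          ((() ())
+           ;; long form: split DISP across two instruction words
+           (LONG (16 #x5678)                   ; invented opcode
+                 (16 (high-16 disp))
+                 (16 #x9ABC)                   ; invented opcode
+                 (16 (low-16 disp)))))
+
+Note that the short range is nested in the unbounded (() ()) range,
+and that the second form is no shorter than the first, as the branch
+tensioner requires.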
+
+* insutl.scm:
+        This file defines machine-specific rule qualifiers and
+transformers.  It is often used to define addressing-mode filters and
+handling procedures for architectures with general addressing modes.
+This file does not exist in the Spectrum port because all the relevant
+code has been placed in instr1.scm, and the MIPS port has no
+machine-specific qualifiers and transformers.  Qualifiers and
+transformers are described further in the chapter on the syntax of
+translation rules.
+
+* instr<n>.scm:
+        These files define the instruction set of the architecture by
+using the syntax defined in insmac.scm and inerly.scm.  There can be
+as many or as few of these files as desired by whoever writes the
+assembler.  They are usually split according to the size of the files
+or along the divisions in the architecture manual.  Not all
+instructions in the architecture need to be listed here -- only those
+actually used by the back end in the LAPGEN rules and utility procedures.
+Privileged/supervisory instructions, BCD (binary coded decimal)
+instructions, COBOL-style EDIT instructions, etc., can probably be
+safely ignored.
+
+        4.5. Disassembler files:
+
+        The disassembler is almost completely machine dependent.  For
+many machines, a reasonable disassembler could be derived from the
+description of the instruction set used to assemble programs.  The Vax
+disassembler is essentially constructed this way.  Unfortunately this
+has not been generalized, and currently each port has its own
+disassembler, often duplicating information contained in the
+assembler.
+
+The disassembler is not necessary for the operation of the compiler
+proper.  It is, however, a good debugging tool.  You can bring the
+compiler up without a disassembler by providing stubs for the
+procedures referenced in dassm2.
+
+* dassm1.scm:
+        This file contains the top-level of the disassembler.  It is
+not machine-dependent, and should probably be moved to another directory.
+        ==> Is compiler/back the right place for this?
+
+* dassm2.scm:
+        This file contains various utilities for the disassembler.  In
+particular, it contains the definitions of
+
+- COMPILED-CODE-BLOCK/BYTES-PER-OBJECT
+- COMPILED-CODE-BLOCK/OBJECTS-PER-PROCEDURE-CACHE
+- COMPILED-CODE-BLOCK/OBJECTS-PER-VARIABLE-CACHE
+        These parameters specify various relative sizes.
+        ==> Shouldn't these be in machin.scm?  The first two have
+counterparts there, and the last is always 1.
+
+- DISASSEMBLER/READ-VARIABLE-CACHE
+- DISASSEMBLER/READ-PROCEDURE-CACHE
+        These procedures are used to extract free variable information from
+a linked compiled code block.  Variable caches are maintained as
+native addresses (i.e. no tag bits), and procedure (execute) caches
+contain absolute jump instructions that must be decoded to extract the
+address of the called procedure.  Appropriate type bits must be added
+to both values before they are returned.
+
+This file also contains a state machine that allows the disassembler
+to display data appearing in the instruction stream in an appropriate
+format (gc and format words, mainly), and heuristics for displaying
+addressing modes and PC-relative offsets in a more legible form.
+
+The output of the disassembler need not be identical to the input of
+the assembler.  The disassembler is used almost exclusively for
+debugging, and additional syntactic hints make it easier to read.
+
+* dassm3.scm:
+        This file contains the code to disassemble one instruction at
+a time.  It is completely machine dependent at this time, and any old
+way of doing it is fine.
+
+* dinstr<n>.scm:
+        In the VAX port, these are copies of (or links to) the
+instr<n>.scm files.  They are processed with a different syntax table
+to construct the disassembler tables instead of the assembler tables.
+
+* dsyn.scm:
+        In the VAX port, this file provides the alternative expansion
+of DEFINE-INSTRUCTION used to construct the disassembler tables
+instead of the assembler rule data base.
+
+        5. All about rules
+
+There are three subsystems in Liar that use rule-based languages.
+They are the RTL simplifier, LAPGEN (RTL->LAP translation), and the
+assembler.  The assembler need not be rule-based, but given the
+availability of the rule language, using the rule mechanism may be the
+easiest way to write it.
+
+        5.1. Rule syntax
+
+The assembler rules use a somewhat different syntax from the rest and
+will be described later.
+
+The rest of the rules are defined in the following way:
+
+        (DEFINE-RULE <rule-database>
+          <rule pattern>
+          <qualifier>           ; optional
+          <rule body>)
+
+* <rule-database> is an expression evaluating to a rule database.
+It should be one of STATEMENT, PREDICATE, or REWRITING.
+
+* <rule pattern> is a list that represents the pattern to match.
+Variables in the pattern are written by using the ``?'' syntax.
+For example,
+
+- (hello) matches the constant list (hello).
+
+- (? thing) matches anything, and THING is bound in <qualifier> and
+<rule body> to whatever was matched.
+
+- (hello (? person)) matches a list of two elements whose first
+element is the symbol HELLO, and whose second element can be anything.
+The variable PERSON will be bound in <qualifier> and <rule body> and
+will have as its value the second element of the list matched.
+Thus it would match (hello bill), with PERSON bound to the symbol
+BILL; (hello (bill rozas)) would also match, with PERSON bound to the
+list (BILL ROZAS).
+
+- (hello . (? person)) matches a list of one or more elements whose
+first element is the symbol HELLO.  PERSON is bound to the rest of the
+list.
+Thus (hello my dog likes frankfurters) would match and PERSON would be
+(MY DOG LIKES FRANKFURTERS).  (hello (my dog)) would match, and PERSON
+would be ((MY DOG)).
+
+Variable syntax is further described below.
+
+* <qualifier> is (QUALIFIER <expression>) where <expression> evaluates
+to a boolean and further filters matches.  If the qualifier expression
+evaluates to false, the rule is not fired.
Otherwise it is. +For example, + (DEFINE-RULE <some database> + (multiple (? number) (? divisor)) + (QUALIFIER (and (number? number) + (number? divisor) + (zero? (remainder number divisor)))) + <rule body>) + +will match (MULTIPLE 14 7) and (MULTIPLE 36 4), but will not match +(MULTIPLE FOO 3), (MULTIPLE 37 4), (MULTIPLE 2), (MULTIPLE 14 2 3), +nor (HELLO 14 7). +Rule qualifiers are optional. + +* <rule body> is an arbitrary Lisp expression whose value is the +translation determined by the rule. It will typically use the +variables bound by ``?'' to perform the translation. The statement +and predicate rules use the LAP macro to generate sequences of +assembly language instructions. + +The assembler rules use the following syntax: + + (DEFINE-INSTRUCTION <opcode> + (<pattern1> <qualifier1> <body1>) + (<pattern2> <qualifier2> <body2>) + ... + ) + +Where <opcode> is the name of the instruction, and the patterns will +be matched against the cdr of lists whose car is <opcode>. +The <patterns>, <qualifiers>, and <bodies> are as in the RTL rules, +except that there are typically no qualifiers, and the bodies are +typically written in a special syntax defined in +compiler/machines/port/insmac.scm and described in section 4.4. + +For example, + (DEFINE-INSTRUCTION ADD + (((R (? target)) (R (? reg1)) (R (? reg2))) + (WORD (6 #x24) + (5 target) + (5 reg1) + (5 reg2) + (11 0))) + (((R (? target)) (R (? reg)) (& (? constant))) + (WORD (6 #x23) + (5 target) + (5 reg) + (16 constant SIGNED)))) +would match (ADD (R 1) (R 2) (R 3)) and (ADD (R 7) (R 22) (& 257)), +firing the corresponding body. + +The bodies are defined in terms of the WORD syntax defined in +insmac.scm, and the ``commas'' used with the pattern variables in the +rule bodies are a consequence of the WORD syntax. The meaning of the +commas is identical to the meaning of the commas in a ``backquote'' +Scheme expression, and is briefly described in section 5.3.1. + + 5.2. Rule variable syntax. 
+
+Although qualifiers and the simple variable syntax shown are
+sufficient, some additional variable syntax is available for common
+patterns.  Moreover, the early matcher (used when
+COMPILER:ENABLE-EXPANSION-DECLARATIONS? is true) cannot currently
+handle qualifiers but can handle the additional variable syntax,
+which can supplant most qualifiers.  The early matcher is used only
+on the assembler rules, so if you want to use it, you only need to
+use the restricted language when writing those rules.
+
+The complete variable syntax is as follows:
+
+* (? <name>)  This syntax matches anything in that position of the
+potential instance, and binds <name> to the sub-structure matched.
+
+* (? <name> <transform>)  This syntax matches anything in that position
+of the potential instance as long as <transform> returns non-false on
+the sub-structure matched.  <name> is bound to the result returned by
+<transform>.  For example,
+        (? q (lambda (obj) (and (number? obj) (* 2 obj))))
+will match 2, with Q bound to 4, but will not match FOO.
+
+* (? <name1> <transform> <name2>)  <name1> and <transform> have the same
+meaning as in the previous syntax, and this syntax matches exactly the
+same objects, but provides the additional convenience of binding
+<name2> to the sub-structure matched, before the transformation.
+For example,
+        (? q (lambda (obj)
+               (and (pair? obj)
+                    (number? (car obj))
+                    (- (car obj) 23)))
+           z)
+will match (2 . HELLO), with Q bound to -21 and Z bound to
+(2 . HELLO); it will not match 34 or (HELLO . 2).
+
+        ==> The pattern parser seems to understand (?@ <name>) as well, but
+this syntax is used nowhere.  The early parser does not understand it.
+Should it be flushed?
+
+        5.3. Writing statement rules.
+
+Statement rules provide the translation between RTL instructions and
+fragments of assembly language.
Most RTL instructions are +assignments, where an RTL register is written with the contents of a +virtual location or the result of some operation. + + 5.3.1. Output of the statement rules + +The output of the statement rules is a fragment of assembly language +written in the syntax expected by the LAP assembler. The fragments, +containing any number of machine instructions, are constructed by +using the LAP macro, built on top of Scheme's QUASIQUOTE (back-quote). +Within a LAP form, you can use UNQUOTE (comma) and UNQUOTE-SPLICING +(comma at-sign) to tag subexpressions that should be evaluated and +appended. For example, + (LAP (MOV L ,r1 ,r2) + (ADD L ,r3 ,r2)) +constructs a fragment with two instructions in it where the values of +r1, r2, and r3 are substituted in the instructions. +The code + (LAP (MOV L ,r1 ,r2) + ,@(generate-test r2)) +constructs a fragment whose first instruction is a MOV instruction, +and the rest is the fragment returned by generate-test. + +The INST macro is similar to LAP but constructs a single instruction. +It should not be used unless necessary (i.e. in +LAP:MAKE-LABEL-STATEMENT), since you may find yourself later wanting +to change a single instruction into a fragment in a utility procedure, +and having to find every use of the procedure. + + ==> We should change the linearizer to expect +LAP:MAKE-LABEL-STATEMENT to return a fragment, and do away with INST. + +An additional macro, INST-EA, is provided to construct a piece of +assembly language representing an addressing mode. For example, +INST-EA is used by the following procedure in the Vax back-end: + + (define (non-pointer->ea type datum) + (if (and (zero? type) + (<= 0 datum 63)) + (INST-EA (S ,datum)) + (INST-EA (&U ,(make-non-pointer-literal type datum))))) + +where non-pointer->ea may be used in + + (LAP (MOV L ,(non-pointer->ea <type> <datum>) + ,(any-register-reference target))) + +INST-EA is superfluous on machines without general addressing modes +(i.e. 
load-store architectures). + +Each port provides a procedure, named REGISTER-REFERENCE, that maps +between RTL machine registers and the assembly language syntax used to +refer to the registers. It uses INST-EA to build such references. + +The macros LAP, INST, and INST-EA, besides providing the functionality +of QUASIQUOTE, also provide a hook for the compiler pre-processing +time assembly of the code generated by the rules. + + 5.3.2. Hardware register allocation + +Hardware register allocation occurs during the RTL->LAP translation. +The rules, besides generating assembly language, invoke utilities +provided by the register allocator to reserve and free hardware +registers on which the operations can be performed. + +Hardware registers are often divided into different non-overlapping +types that are used in different operations. For example, modern +hardware typically has a set of integer registers and a set of +floating point registers. Address operations typically require +operands in integer registers, while floating point operations +typically require floating point registers. On some machines, notably +the Motorola 68K family, the integer register set is further +subdivided into types with specific operations (address and data). + +The register allocator manipulates RTL registers. RTL registers are +just small integers. The low end of the valid range of RTL registers +is used to represent the physical registers of the processor (called +machine registers), and the rest of the numbers represent virtual +(pseudo) registers. The core allocator operations are given an RTL +register number and a register type, and return a suitable machine +register to be used for the operation. + +A machine register that holds the value of a pseudo register is called +an ``alias'' for the pseudo register. A pseudo register may have many +valid aliases simultaneously, usually of different types. 
Any +assignment to the pseudo register will invalidate all aliases but one, +namely the machine register actually written, rather than copy the new +value into all the previous aliases. Thus source references and +destination references have different effects, and are handled by +different procedures in the register allocator. + +Pseudo registers have associated homes, memory locations that hold +their values when the machine registers are needed for other purposes. +Most pseudo registers are never written to their homes, since a pseudo +register's value is usually kept in machine register aliases until the +pseudo register is dead, i.e. until its value is no longer needed. A +pseudo register's aliases can be reused for other purposes if there +are other remaining aliases or this is the last reference to the +pseudo register. An alias that can be reused is a ``reusable'' alias. +Occasionally, the value of a pseudo register may be transferred to the +register's home and the last alias invalidated, if the register +allocator is running out of registers. This is called ``spilling'' a +register. + +The register allocator maintains a table of associations, called the +``register map'', that associates each pseudo register with its valid +aliases, and each machine register with the pseudo register whose +value it holds (if any). The register allocator routines modify the +register map after aliases are requested and invalidated, and they +generate assembly language instructions to perform the necessary data +motion for spilling and re-loading at run time. These instructions +are usually inserted before the code output of the RTL rule in +execution. + +If you have chosen your RTL register numbers for machine registers so +that they match the hardware numbers, and your assembly language does +not distinguish between references to a register and other fields, you +can ignore register references and use the RTL register numbers +directly. 
This is commonly the case when using integer registers in +load-store architectures. + +As a convenience, the register allocator also provides operations that +manipulate register references. A register reference is a fragment of +assembly language, typically a register addressing mode for general +register machines, that when inserted into a LAP instruction, denotes +the appropriate register. For example, on the Motorola MC68020, +physical register A3 is represented as RTL register number 11, and a +register reference for it would be ``(A 3)''. RTL pseudo register 44 +may at some point have RTL machine register 11 as its only +address-register alias. At that time, (REGISTER-ALIAS 44 'ADDRESS) +would return 11. + +The interface to the register allocator is defined in +compiler/back/lapgn2.scm. Not all ports use all of the procedures +defined there. Often a smaller subset is sufficient depending on +whether there are general addressing modes, etc. A list of the most +frequently used follows: + +* REGISTER-ALIAS expects an RTL register and a register type, and +returns a machine register of the specified type that is a valid alias +for that RTL register if there is one, or false if there is none. +This procedure should only be used for source operand RTL registers. +If the register type is false, then REGISTER-ALIAS will return any +valid alias. + +* LOAD-ALIAS-REGISTER! is like REGISTER-ALIAS but always returns a +machine register, allocating one of the specified type if necessary. +This procedure should only be used for source operand RTL registers. + +* REFERENCE-ALIAS-REGISTER! performs the same action as +LOAD-ALIAS-REGISTER! but returns a register reference instead of an +RTL register number. + +* ALLOCATE-ALIAS-REGISTER! expects an RTL register and a register +type, and returns a machine register of the specified type that is the +only alias for the RTL register and should be written with the new +contents of the RTL register. ALLOCATE-ALIAS-REGISTER! 
is used to +generate aliases for target RTL registers. + +* REFERENCE-TARGET-ALIAS! performs the same action as +ALLOCATE-ALIAS-REGISTER! but returns a register reference instead of +an RTL register number. See CLEAR-REGISTERS! below. + +* STANDARD-REGISTER-REFERENCE expects an RTL register, a register +type, and a boolean. It will return a reference for an alias of the +specified register containing the current value of the RTL register. +This reference will be of the specified type if the boolean is false, +or sometimes of other types if the boolean is true. In other words, +the boolean argument determines whether other types are acceptable, +although not desirable. The register type may be false, specifying +that there really is no preference for the type, and any reference is +valid. STANDARD-REGISTER-REFERENCE should be used only for source +operands (i.e. those that already contain data), and may return a +memory reference for those machines with general addressing modes if +there is no preferred type and alternates are acceptable. + +* MOVE-TO-ALIAS-REGISTER! expects a source RTL register, a register +type, and a target RTL register. It returns a new alias for the +target of the specified type containing a copy of the current contents +of the source. Often this is accomplished by choosing an alias of the +source that already contains the correct data and making it the only +alias for target. MOVE-TO-ALIAS-REGISTER! attempts to reuse an alias +for the source register. + +* MOVE-TO-TEMPORARY-REGISTER! expects a source RTL register and a +register type and returns an appropriate register containing a copy of +the source. The register is intended for temporary use, that is, use +only within the code generated by the expansion of the current RTL +instruction, and as such it should not be permanently recorded in the +register map. The register becomes automatically freed for subsequent +RTL instructions. MOVE-TO-TEMPORARY-REGISTER! 
attempts to reuse an +alias for the source register. + +* REUSE-PSEUDO-REGISTER-ALIAS! expects an RTL register, a register +type, and two procedures. It attempts to find a reusable alias for +the RTL register of the specified type, and invokes the first +procedure giving it the alias if it succeeds, or the second procedure +with no arguments if it fails. MOVE-TO-ALIAS-REGISTER! and +MOVE-TO-TEMPORARY-REGISTER! use REUSE-PSEUDO-REGISTER-ALIAS! but +occasionally neither meets the requirements. + +* NEED-REGISTER! expects an RTL machine register and informs the +register allocator that the rule in use requires that register so it +should not be available for subsequent requests while translating the +current RTL instruction. The register is available for later RTL +instructions unless the relevant rules invoke NEED-REGISTER! again. +The procedures described above that allocate and assign aliases and +temporary registers call NEED-REGISTER! behind the scenes, but you will +need to invoke it explicitly when calling out-of-line routines. + +* LOAD-MACHINE-REGISTER! expects an RTL register and an RTL machine +register and generates code that copies the current value of the RTL +register to the machine register. It is used to pass arguments in +fixed registers to out-of-line code, typically in the compiled code +runtime library. + +* ADD-PSEUDO-REGISTER-ALIAS! expects an RTL pseudo-register and an +available machine register (no longer an alias), and makes the +specified machine register an alias for the pseudo-register. + +* CLEAR-REGISTERS! expects any number of RTL registers and clears them +from the register map, preserving their current contents in memory if +needed. It returns the code that will perform the required motion at +runtime. It should be used before invoking LOAD-MACHINE-REGISTER! to +ensure that the potentially valid previous contents of the machine +register have been saved. + +* CLEAR-MAP! 
deletes all aliases from the register map, pushing data that is
+held only in aliases into their memory homes if needed.  This
+procedure returns an assembly language code fragment, and is typically
+used before invoking out-of-line code.
+
+* DELETE-DEAD-REGISTERS! informs the register allocator that RTL
+pseudo registers whose contents will not be needed after the current
+RTL instruction can be eliminated from the register map and their
+aliases subsequently used for other purposes.
+
+Most of the rules are actually written in terms of machine-specific
+procedures that invoke the procedures listed above in fixed ways.
+Rule bodies typically match the following code pattern:
+
+    (let* ((rs1 (standard-source source1))
+           (rs2 (standard-source source2))
+           (rt (standard-target target)))
+      (LAP ...))
+
+where STANDARD-SOURCE and STANDARD-TARGET are machine-specific
+procedures.  The reason for the use of LET* (instead of LET) is given
+below.
+
+On a machine with general addressing modes and memory operands, we
+might provide their definitions as follows:
+
+    (define (standard-source rtl-reg)
+      (standard-register-reference rtl-reg 'GENERAL true))
+    (define (standard-target rtl-reg)
+      (delete-dead-registers!)
+      (reference-target-alias! rtl-reg 'GENERAL))
+
+while on a load-store architecture we might define them as follows:
+
+    (define (standard-source rtl-reg)
+      (load-alias-register! rtl-reg 'GENERAL))
+    (define (standard-target rtl-reg)
+      (delete-dead-registers!)
+      (allocate-alias-register! rtl-reg 'GENERAL))
+
+                        - VERY IMPORTANT: -
+
+This example brings up the cardinal rule of RTL assignments:
+
+    Any rule that writes into an RTL pseudo-register MUST invoke
+    DELETE-DEAD-REGISTERS! after allocating aliases for the necessary
+    sources but before allocating an alias for the target.
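The cardinal rule can be illustrated with a toy model of the alias
map.  This is a hypothetical Python sketch, not the real allocator in
compiler/back/lapgn2.scm: the class, its method names, and the
two-register machine are all invented for illustration.

```python
class AliasMap:
    """Toy model: which machine register currently holds each RTL pseudo."""

    def __init__(self, machine_regs):
        self.free = list(machine_regs)  # unallocated machine registers
        self.alias = {}                 # RTL pseudo-register -> machine register

    def standard_source(self, rtl_reg):
        # A source operand must have an alias holding its current value.
        if rtl_reg not in self.alias:
            self.alias[rtl_reg] = self.free.pop()
        return self.alias[rtl_reg]

    def delete_dead_registers(self, live):
        # Release the aliases of pseudo-registers that die here.
        for dead in [r for r in self.alias if r not in live]:
            self.free.append(self.alias.pop(dead))

    def standard_target(self, rtl_reg, live):
        # Cardinal rule: free dead aliases first, THEN allocate the target,
        # so a dead source's register can be recycled for the result.
        self.delete_dead_registers(live)
        self.alias[rtl_reg] = self.free.pop()
        return self.alias[rtl_reg]

# (ASSIGN (REGISTER 3) (op (REGISTER 1) (REGISTER 2))) on a machine with
# only two registers, where pseudos 1 and 2 are dead after the instruction.
m = AliasMap(["r0", "r1"])
rs1 = m.standard_source(1)
rs2 = m.standard_source(2)
rt = m.standard_target(3, live={3})   # recycles a register freed above
```

Allocating the target before deleting the dead registers would fail in
this example: with both machine registers still tied to pseudos 1 and
2, there would be nothing left to allocate.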
+
+If this is not done, the register allocator may decide to spill
+no-longer-valid data into memory, which will probably confuse the
+compiler or cause garbage collection problems later.  If it is done
+too early, the last valid alias for a source operand may have been
+reused in the interim, and the compiler will assume that the source
+quantity is contained in memory and will often generate code that
+fetches and operates on garbage.
+
+The example above uses LET* instead of LET.  LET would not work in the
+above example because Scheme does not specify the order of argument
+evaluation, and Liar chooses arbitrary orders, so the
+DELETE-DEAD-REGISTERS! implicit in STANDARD-TARGET might be called too
+early, possibly causing STANDARD-SOURCE to fail.
+
+MOVE-TO-ALIAS-REGISTER! invokes DELETE-DEAD-REGISTERS! because it
+simultaneously allocates an alias for a source and for a target.
+Thus, if there are other source operands, their aliases must be
+allocated before MOVE-TO-ALIAS-REGISTER! is invoked.
+
+                 5.3.3. Invocation rules, etc.
+
+The meaning and intent of most statement rules in an existing port are
+readily apparent.  The more arcane rules have to do with procedures
+and the representation of numbers.  What follows is a description of
+some of the more obscure rules related to procedures and some of the
+implementation concepts required to understand them.
+
+In the invocation rules, FRAME-SIZE is the number of arguments passed
+in the call (often plus one), and there is often more than one rule
+with the same keyword, typically to handle the common cases (small
+FRAME-SIZE) more efficiently.
+
+Various of the rules specify the number of arguments that the
+resulting procedure will accept.  The range is described in terms of
+two parameters, MIN and MAX:
+  - MIN is always positive and it is one greater than the smallest
+number of arguments allowed.
+  - MAX may be positive or negative.  If positive, it is one greater
+than the largest number of arguments allowed.  If negative, it
+indicates that the procedure will accept an unbounded number of
+arguments, and the absolute value of MAX, minus (MIN + 1), is the
+number of positional optional parameters.  Either way, the absolute
+value of MAX is the size of the procedure's call frame counting the
+procedure itself.
+  These two values are encoded in the format word of the resulting
+procedures so that dynamic APPLY can check the number of arguments
+passed and reformat the stack frame appropriately.
+  Non-positive MINs are used to indicate that the compiled entry point
+is not a procedure, but a return address, a compiled expression, or a
+pointer to an internal label.
+
+The CONS-CLOSURE rules will dynamically create some instructions in
+the runtime heap, and these instructions must be visible to the
+processor's instruction fetch unit.  If the instruction and data
+caches are not automatically kept consistent by the hardware,
+especially for newly addressed memory, the caches must be explicitly
+synchronized by the Scheme system.  On machines where the programmer
+is given no control over the caches, this will be very hard to do.  On
+machines where the control is minimal or flushing is expensive, the
+following solution can be used to amortize the cost:
+
+The CONS-CLOSURE rules can generate code to allocate a closure from a
+pre-allocated pool and invoke an out-of-line routine to refill the
+pool when it is empty.  The routine allocates more space from the
+heap, initializes the instructions, and synchronizes the caches.
+
+Since the real entry points are not known until the closure objects
+are created, instead of using absolute jumps to the real entry points,
+the pre-allocated closures can contain jumps to a fixed routine that
+will extract the real entry point from the word pointed at by the
+return address and invoke it.  In other words, the code inserted in
+closure objects will be
+
+        jsr     fixed-routine
+        <storage for real-entry-point>
+
+and fixed-routine, written in assembly language, will do something like
+
+        load    0(return-address),rtemp
+        jmp     0(rtemp)
+
+The 68040 version of the Motorola 68000 family port uses this trick
+because the 68040 cache is typically configured in copyback mode, and
+synchronizing the caches involves an expensive supervisor (OS) call.
+The Alpha back-end also uses this trick because the caches can be
+synchronized only by using the CALL_PAL IMB instruction, which flushes
+the complete instruction cache, therefore implying a large restart
+cost.  The Alpha version of this code is currently better than the
+68040 version, so you should probably emulate that version.
+
+* (INVOCATION:UUO-LINK (? frame-size) (? continuation) (? name))
+  This rule is used to invoke a procedure named by a free variable.
+It is the rule used to generate a branch to an execute cache as
+described in cmpint.txt.  The rule should allocate a new execute cache
+in the compiled code block by using FREE-UUO-LINK-LABEL, and should
+then branch to the instruction portion of the execute cache.
+FRAME-SIZE is the number of arguments passed in the call, plus one.
+
+* (INVOCATION:GLOBAL-LINK (? frame-size) (? continuation) (? name))
+  This rule is identical to the previous one, except that the free
+variable must be looked up in the global environment.  It is used to
+improve the expansion of some macros that insert explicit references
+to the global environment (e.g. the expansion of FLUID-LET uses
+(ACCESS DYNAMIC-WIND #F) as the operator of a call).
+
+* (INVOCATION-PREFIX:MOVE-FRAME-UP (? frame-size) (? address))
+  This rule is used to shift call frames on the stack to maintain
+proper tail recursion.  ADDRESS specifies where to start pushing the
+frame.  It should be a pointer into the used portion of the stack,
+i.e. point to a higher address.
+
+For example, assume that what follows depicts the stack before
+        (INVOCATION-PREFIX:MOVE-FRAME-UP 3 addr)
+
+        |       ...                     |
+        |                               |
+        +-------------------------------+
+        |       <value n>               |
+addr -> +-------------------------------+
+        |                               |         direction of
+        |                               |         stack growth
+        |                               |
+        |       ...                     |              |
+        |                               |              |
+        |                               |              V
+        |                               |
+        +-------------------------------+
+        |       <value 3>               |
+        +-------------------------------+
+        |       <value 2>               |
+        +-------------------------------+
+        |       <value 1>               |
+spbf -> +-------------------------------+
+
+Where spbf is the contents of the stack pointer register.
+After the invocation prefix, it will look as follows:
+
+        |       ...                     |
+        |                               |
+        +-------------------------------+
+        |       <value n>               |         direction of
+addr -> +-------------------------------+         stack growth
+        |       <value 3>               |
+        +-------------------------------+              |
+        |       <value 2>               |              |
+        +-------------------------------+              V
+        |       <value 1>               |
+spaf -> +-------------------------------+
+
+The stack pointer register will now contain the value of spaf.
+
+* (INVOCATION-PREFIX:DYNAMIC-LINK (? frame-size) (? address-1) (? address-2))
+  This rule is similar to the INVOCATION-PREFIX:MOVE-FRAME-UP rule,
+but is used when the destination of the frame is not known at compile
+time.  The destination depends on the continuation in effect at the
+time of the call, and the section of the stack that contains enclosing
+environment frames for the called procedure.  Two addresses are
+specified and the one that is closest to the current stack pointer
+should be used, that is, the target address is the numerically smaller
+of the two addresses since the Liar stack grows towards smaller
+addresses.
+  ==> This rule need not exist in the RTL.  It could be
+expanded into a comparison and a use of
+INVOCATION-PREFIX:MOVE-FRAME-UP with a computed address.
+
+* (ASSIGN (REGISTER (? target))
+          (CONS-CLOSURE (ENTRY:PROCEDURE (? procedure-label))
+                        (? min) (? max) (? size)))
+  This rule issues the code to create a closure object whose real
+entry point is PROCEDURE-LABEL, that will accept a number of arguments
+specified by MIN and MAX, and that will have storage for SIZE free
+variables.  The free variable storage need not be initialized since it
+will be written immediately by subsequent RTL instructions.  The entry
+point of the resulting closure object should be written to RTL
+register TARGET.  The format of closure objects is described in
+cmpint.txt.
+
+* (ASSIGN (REGISTER (? target))
+          (CONS-MULTICLOSURE (? nentries) (? size) (? entries)))
+  This rule is similar to the previous rule, but issues code to
+allocate a closure object with NENTRIES entry points.  ENTRIES is a
+vector of entry-point descriptors, each being a list containing a
+label, a min, and a max as in the previous rule.  TARGET receives the
+compiled code object corresponding to the first entry.
+
+* (OPEN-PROCEDURE-HEADER (? label-name))
+  This rule and its siblings are used to generate the entry code to
+procedures and return addresses.  On entry to procedures and
+continuations, a gc/interrupt check is performed, and the appropriate
+routine in the runtime library is invoked if necessary.  This check is
+performed by comparing the memory Free pointer to the compiled code's
+version of the MemTop pointer.  The low-level interrupt handlers
+change the MemTop pointer to guarantee that such comparisons will fail
+in the future.  A standard header generates the following code:
+
+        (LABEL gc-label)
+        <code to invoke the runtime library>
+        <format and gc words for the entry point>
+        (LABEL label-name)
+        <branch to gc-label if Free >= MemTop>
+
+Each kind of header invokes a different runtime library utility.  In
+addition, procedures that expect dynamic links must guarantee that the
+dynamic link is preserved around the execution of the interrupt
+handler.  This is accomplished by passing the contents of the dynamic
+link register to the appropriate runtime library utility.
+
+* (CLOSURE-HEADER (? label-name) (? nentries) (? num-entry))
+  NENTRIES is the number of entry points in the closure object, and
+NUM-ENTRY is the zero-based index for this entry point.  Closure
+headers are similar to other procedure headers but also have to
+complete the hand-shake initiated by the instructions stored in the
+closure objects so that the closure object appears on top of the
+stack.  On architectures where it is necessary, they also have to map
+closure objects to their canonical representatives, and back when
+backing out because of interrupts or garbage collection.
+
+The file compiler/machines/port/rules3.scm contains most of these
+procedure-related rules.  It also contains three procedures that
+generate assembly language and are required by the compiler.  These
+procedures are used to generate initialization code for compiled code
+blocks.
+
+Compiled code blocks have two sections, a code section that contains
+the instructions, and a ``constants'' section that contains Scheme
+objects referenced by the code (e.g. quoted lists and symbols), the
+free variable caches for the code, the debugging information
+descriptor (more on this later), and the environment where the free
+variables in the code must be referenced.  This environment is not
+known at compile time, so the compiler allocates a slot in the
+constants section for it, but the code itself must store it on first
+entry.  In addition, the linker is invoked on first entry to look up
+the free variables and fill the variable caches with their correct
+contents.  The compiler allocates enough space for each free variable
+cache and initializes the space with the information required by the
+linker to patch the reference.  This information consists of the name
+of the free variable in addition to the number of actual arguments
+passed (plus one) for execute references.
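To make the linker's job concrete, here is a hypothetical Python
sketch of linking a single execute cache.  The two-word cache layout
and every name in it are invented for illustration; the real layouts
are dictated by microcode/cmpint-port.h.

```python
def link_execute_cache(cache, environment):
    """Patch one execute cache from its pre-link to its post-link state."""
    # Before linking, the cache holds the information the compiler left:
    # the free variable's name and the frame size (arguments passed + 1).
    name, frame_size = cache
    entry_point = environment[name]
    # After linking, the name word is replaced by a branch to the
    # procedure's entry point; the frame size survives so the cache can
    # be re-linked after an incremental definition or assignment.
    return [("JMP", entry_point), frame_size]

environment = {"vector-ref": "<entry point of VECTOR-REF>"}
linked = link_execute_cache(["vector-ref", 3], environment)
```

The surviving frame-size word is what lets dynamic re-linking check
the number of arguments again after the variable is redefined.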
+ +If COMPILER:COMPILE-BY-PROCEDURES? is true, the compiler will generate +multiple compiled code blocks, one corresponding to each top-level +lambda expression. Each of these must be initialized and linked, but +instead of initializing them on first entry, the root compiled code +block links all of them when it is entered. + +The linker (a runtime library utility) expects three arguments: + The address of the first word of the compiled code block, the word +containing the GC vector header for the compiled code block. + The address of the first linker section in the constants area of the +compiled code block. The linker sections contain the free variable +caches and are all contiguous. + The number of linker sections in the compiled code block. + +* (GENERATE/QUOTATION-HEADER env-label free-label n-sections) + This procedure generates the code that initializes the environment +slot at location labeled ENV-LABEL. The environment is fetched from +the interpreter's environment register. It also generates code to +invoke the linker on the executing compiled code block. The first +word of the compiled code block is labeled by the value of +*BLOCK-LABEL*, the first linker section is labeled by FREE-LABEL, and +the number of linker sections is N-SECTIONS. + +* (GENERATE/REMOTE-LINK label env-offset free-offset n-sections) + This procedure is similar to GENERATE/QUOTATION-HEADER but is used +to generate code that initializes and links a different compiled code +block. It is used to generate the code to insert into the root +compiled code block to link each of the other compiled code blocks +generated when COMPILER:COMPILE-BY-PROCEDURES? is true. 
+
+  LABEL is a label in the current block's constants section where the
+pointer to the code block to be linked is stored,
+  ENV-OFFSET is the vector offset in the other code block where the
+environment of evaluation should be stored,
+  FREE-OFFSET is the vector offset of the first linker section in the
+other compiled code block, and
+  N-SECTIONS is the number of linker sections in the other block.
+
+* (GENERATE/CONSTANTS-BLOCK consts reads writes execs global-execs statics)
+  This procedure generates the assembler pseudo-ops used to generate
+the constants and linker section for a compiled code block.  This
+section consists of:
+  - The constant objects (e.g. quoted lists) referenced by the code.
+  - The read variable caches used by the code.
+  - The write variable caches used by the code.
+  - The execute variable caches used by the code.
+  - The global execute variable caches used by the code.
+  - The locations for static variables.
+  - A slot for the debugging information descriptor generated by the
+compiler.
+  - A slot for the environment where the code is linked.
+
+Each word of storage in the constants block is allocated by using a
+SCHEME-OBJECT assembler pseudo-op, and the order in which the words
+appear is the same as the order in which they appear in the final
+object.  The linker sections (free variable cache sections) must be
+contiguous, and each has a one-word header describing the kind of
+section and its length.  The environment slot must be the last word
+in the compiled code block, immediately preceded by the debugging
+information descriptor.  Each SCHEME-OBJECT directive takes a label
+and the initial contents of the slot.
+
+This procedure is almost machine-independent, and you should be able
+to trivially modify an existing version.  The only machine dependence
+is the layout and size of the storage allocated for each execute cache
+(uuo link).  This machine-dependence consists entirely of the
+definition of the TRANSMOGRIFLY procedure.
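Since TRANSMOGRIFLY is pure list reformatting, its behavior (whose
input and output formats are spelled out below) can be previewed with
a hypothetical Python rendition.  The one-filler-word layout and the
dummy-label naming scheme are invented; a real port must match its own
cmpint-port.h layout exactly.

```python
def transmogrifly(uuos):
    """Flatten ((name (frame-size . label) ...) ...) into one tuple per word."""
    words = []
    for name, *refs in uuos:
        for frame_size, label in refs:
            dummy = "dummy-" + label
            words.append((name, label))         # free variable name, labeled
            words.append((False, dummy))        # assumed optional filler word
            words.append((frame_size, dummy))   # arity + 1; survives linking
    return words

caches = [("f", (2, "label-f-2")),
          ("g", (3, "label-g-3"), (4, "label-g-4"))]
words = transmogrifly(caches)
```

Each execute cache here occupies three words, so this hypothetical
port's EXECUTE_CACHE_ENTRY_SIZE would have to be 3.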
+
+TRANSMOGRIFLY takes a list of the following form:
+
+  ((free-variable-1 (frame-size-1-1 . label-1-1)
+                    (frame-size-1-2 . label-1-2)
+                    ...)
+   (free-variable-2 (frame-size-2-1 . label-2-1)
+                    (frame-size-2-2 . label-2-2)
+                    ...)
+   ...)
+
+This list is interpreted as follows: an execute cache for calls to
+FREE-VARIABLE-1 with frame size FRAME-SIZE-1-1 (number of arguments
+plus one) must be created, and labeled LABEL-1-1, similarly for
+        <FREE-VARIABLE-1, FRAME-SIZE-1-2, LABEL-1-2>,
+        <FREE-VARIABLE-2, FRAME-SIZE-2-1, LABEL-2-1>, etc.
+
+Assuming that the initial layout of an execute cache is
+
+        free variable name      ; labeled word
+        false                   ; optional storage (e.g. for branch delay slot)
+        frame size of call      ; arity + 1
+
+TRANSMOGRIFLY will return a list of the following form:
+
+  ((free-variable-1 label-1-1)
+   (#f dummy-label-1-1)         ; optional word(s)
+   (frame-size-1-1 dummy-label-1-1)
+
+   (free-variable-1 label-1-2)
+   (#f dummy-label-1-2)         ; optional word(s)
+   (frame-size-1-2 dummy-label-1-2)
+
+   ...
+
+   (free-variable-2 label-2-1)
+   (#f dummy-label-2-1)         ; optional word(s)
+   (frame-size-2-1 dummy-label-2-1)
+
+   ...)
+
+There may be any number of optional words, but the layout must match
+that expected by the macros defined in microcode/cmpint-port.h.  In
+particular, the length in longwords must match the definition of
+EXECUTE_CACHE_ENTRY_SIZE in microcode/cmpint-port.h, and the definition
+of EXECUTE-CACHE-SIZE in compiler/machines/port/machin.scm.
+
+Furthermore, the instructions that the linker will insert should
+appear at the word labeled by LABEL-N-M, and should not overwrite the
+relevant part of FRAME-SIZE-N-M, since the frame size will be needed
+when re-linking after an incremental definition or assignment.
+
+The output format of TRANSMOGRIFLY is the input format for the read
+and write execute cache sections.  The procedure DECLARE-CONSTANTS,
+local to GENERATE/CONSTANTS-BLOCK, reformats such lists into the final
+SCHEME-OBJECT directives and tacks on the appropriate linkage section
+headers.
+
+                    5.3.4. Fixnum rules.
+
+Scheme's generic arithmetic primitives cannot be open-coded fully for
+space reasons.  Most Scheme code that manipulates numbers manipulates
+small integers used as counters, vector indices, etc., and using
+out-of-line arithmetic procedures to operate on them would make the
+code too slow.  The compromise, therefore, is to open-code the common
+small integer case, and to handle the rest out of line.  This, of
+course, does not perform particularly well for the other common case
+of floating point data.
+
+Scheme integers are represented in two formats.  The most common,
+fixnum representation, uses the datum field of the objects to directly
+encode the values.  The other format, bignum representation, stores
+the values in multiple words in memory, and the datum is a pointer to
+this storage.  Scheme generic arithmetic procedures will generate
+fixnums whenever possible, resorting to bignums when the value exceeds
+the range that can be represented in fixnum format.
+
+Since the open-codings provided for the compiler only handle fixnums,
+these open-codings must also detect when the result will not fit in a
+fixnum in order to invoke the out-of-line utility that will handle
+them correctly.
+
+Most hardware provides facilities for detecting and branching if an
+integer operation overflows.  Fixnums cannot use these facilities
+directly, because of the tag bits at the high end of the word.  To be
+able to use these facilities (and get the sign bit in the right
+place), Scheme fixnums are converted to an internal format before they
+are operated on, and converted back to Scheme object format before
+storing them in memory or returning them as values.
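The two conversions, and the reason the internal format lets the
hardware's overflow detection work, can be sketched in Python (the
shift-based format itself is described next).  The 32-bit word, the
6-bit tag field, and the fixnum type code used here are assumptions
for illustration only; the real widths are machine-dependent.

```python
WORD_BITS = 32
TAG_BITS = 6
MASK = (1 << WORD_BITS) - 1
FIXNUM_TAG = 0b011010            # assumed type code, for illustration only

def object_to_fixnum(datum):
    # Shift left so the fixnum's sign bit lands in the word's sign bit;
    # the low TAG_BITS bits are left holding zeros.
    return (datum << TAG_BITS) & MASK

def fixnum_to_object(fixnum):
    # Shift back down and install the type code in the high bits.
    return (FIXNUM_TAG << (WORD_BITS - TAG_BITS)) | (fixnum >> TAG_BITS)

def add_overflows(a, b):
    # In the internal format, plain two's-complement addition applies, and
    # ordinary signed-overflow detection works: overflow occurred iff the
    # operands agree in sign and the result disagrees with them.
    total = (a + b) & MASK
    sign = 1 << (WORD_BITS - 1)
    return bool(~(a ^ b) & (a ^ total) & sign)

small = object_to_fixnum(1)
big = object_to_fixnum((1 << 25) - 1)    # largest assumed positive fixnum
```

Adding `big` and `small` overflows the assumed 26-bit fixnum range, which
is exactly the case where the open-coding must call out of line to build
a bignum.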
+ +In this internal format, the value has been shifted left so that the +fixnum sign-bit coincides with the integer sign bit, and a number of +bits in the least-significant end of the word hold zeros. The shift +amount is the length of the type-tag field. + +The rules + (ASSIGN (REGISTER (? target)) (OBJECT->FIXNUM (REGISTER (? source)))) +and + (ASSIGN (REGISTER (? target)) (FIXNUM->OBJECT (REGISTER (? source)))) +perform this translation. + +The open-coding of fixnum arithmetic assumes that the sources and the +result are in this format. This format is good for value comparisons, +addition, subtraction, and bitwise logical operations, but must be +transformed for multiplication, division, and shifting operations. + +In addition to open-coding fixnum operations within generic +arithmetic, fixnum primitives can be invoked directly, and the code +can be open coded as well. Under these circumstances, the result will +not be checked for overflow, and the code generated can be quite +different. The RTL instructions that perform fixnum arithmetic have a +boolean flag that specifies whether overflow conditions should be +generated or not. + +The compiler does not generally require fixnum arithmetic to be open +coded. If the names of all the fixnum primitives are listed in +COMPILER:PRIMITIVES-WITH-NO-OPEN-CODING, all of them will be handled +by issuing code to invoke them out of line. + +There is one exception to this, however. The following rules MUST be +provided: + + (ASSIGN (REGISTER (? target)) + (FIXNUM-2-ARGS MULTIPLY-FIXNUM + (OBJECT->FIXNUM (CONSTANT 4)) + (OBJECT->FIXNUM (REGISTER (? source))) + #F)) + + (ASSIGN (REGISTER (? target)) + (FIXNUM-2-ARGS MULTIPLY-FIXNUM + (OBJECT->FIXNUM (REGISTER (? source))) + (OBJECT->FIXNUM (CONSTANT 4)) + #F)) + +The reason is that VECTOR-REF and VECTOR-SET! translate into a +sequence that uses these patterns when the index is not a compile-time +constant. Of course, you can include VECTOR-REF and VECTOR-SET! 
in +compiler:PRIMITIVES-WITH-NO-OPEN-CODING to avoid the problem +altogether, but this is probably not advisable. + + 5.3.5. Rules used to invoke the runtime library + +Some of the rules issue code that invokes the runtime library. The +runtime library is invoked through a primary entry point, +SCHEME-TO-INTERFACE, typically directly accessible through a dedicated +processor register. SCHEME-TO-INTERFACE expects at least one and up +to five arguments. The first argument is the index of the runtime +library service to invoke, and the rest are the parameters to the +service routine. These arguments are passed in fixed locations, +typically registers. Runtime library utilities return their values +(if any) in the compiler's value register. The following is a typical +example of such an invocation where INVOKE-INTERFACE expects the index +of a utility, and generates the code that writes the index into the +appropriate location and jumps to SCHEME-TO-INTERFACE. + + (define-rule statement + (INVOCATION:APPLY (? frame-size) (? continuation)) + (LAP ,@(clear-map!) + ,@(load-rn frame-size 2) + (MOV L (@R+ 14) (R 1)) + ,@(invoke-interface code:compiler-apply))) + +The code names are typically defined in +compiler/machines/port/lapgen.scm. + +Many of the utilities expect return addresses as their first argument, +and it is convenient to define a procedure, INVOKE-INTERFACE-JSB +(sometimes called LINK-TO-INTERFACE) which receives an index but +leaves the appropriate return address in the first argument's +location. INVOKE-INTERFACE-JSB can be written by using +INVOKE-INTERFACE (and SCHEME-TO-INTERFACE), but given the frequency of +this type of call, it is often written in terms of an alternate entry +point to the runtime library (e.g. SCHEME-TO-INTERFACE-JSB). + +An example of a more complicated call to the runtime library is + (define-rule statement + (INTERPRETER-CALL:CACHE-ASSIGNMENT (? extension) (? value)) + (QUALIFIER (and (interpreter-call-argument? 
extension)
+                    (interpreter-call-argument? value)))
+    (let* ((set-extension
+            (interpreter-call-argument->machine-register! extension r2))
+           (set-value
+            (interpreter-call-argument->machine-register! value r3))
+           (clear-map (clear-map!)))
+      (LAP ,@set-extension
+           ,@set-value
+           ,@clear-map
+           ,@(invoke-interface-jsb code:compiler-assignment-trap))))
+where INTERPRETER-CALL-ARGUMENT->MACHINE-REGISTER! invokes
+CLEAR-REGISTERS! and NEED-REGISTER! besides performing the assignment.
+
+For very frequent calls, the assembly language part of the runtime
+library can provide additional entry points.  The calling convention
+for these would be machine-dependent, but frequently they take
+arguments in the same way that SCHEME-TO-INTERFACE and
+SCHEME-TO-INTERFACE-JSB take them, but avoid passing the utility
+index, and may do part or all of the work of the utility in assembly
+language instead of invoking the portable C version.  Many of the
+ports have out-of-line handlers for generic arithmetic, with the
+common fixnum/flonum cases handled there.
+
+The following is a possible specialized version of apply
+where the special entry point expects the procedure argument on the
+stack rather than in a fixed register:
+
+  (define-rule statement
+    (INVOCATION:APPLY (? frame-size) (? continuation))
+    (LAP ,@(clear-map!)
+         ,@(load-rn frame-size 2)
+         (JMP ,entry:compiler-apply)))
+
+The procedure object will have been pushed on the stack by earlier
+code.
+
+                  5.4. Writing predicate rules.
+
+Predicate rules are used to generate code to discriminate between
+alternatives at runtime.  The code generated depends on the
+conditional branch facilities of the hardware at hand.  There are two
+main ways in which architectures provide conditional branching
+facilities:
+
+* condition codes.  Arithmetic instructions compute condition codes
+that are stored in hardware registers.  These hardware registers may
+be targeted explicitly by the programmer or implicitly by the
+hardware.  Conditional branch instructions determine whether to branch
+or not depending on the contents of the condition registers at the
+time the branch instruction is executed.  These condition registers
+may be named explicitly by the instructions, or assumed implicitly.
+
+* compare-and-branch instructions.  The instruction set includes
+instructions that compare two values (or a value against 0) and branch
+depending on the comparison.  The results of the comparison are not
+stored in special or explicit registers, since they are used
+immediately, by the instruction itself, to branch to the desired
+target.
+
+Liar accommodates both models for branching instructions.
+Predicate rules generate code that precedes the actual branches, and
+then invoke the procedure SET-CURRENT-BRANCHES!, informing it of the
+code to generate to branch to the target.
+Depending on the model, the prefix code may be empty, and all the code
+may appear in the arguments to SET-CURRENT-BRANCHES!
+
+SET-CURRENT-BRANCHES! expects two procedures as arguments.  Each of
+them receives a label as an argument, and is supposed to generate code
+that branches to the label if the predicate condition is true (first
+argument) or false (second argument).  Both options are provided
+because linearization of the control-flow graph occurs after LAP
+generation, and it is therefore not known when the predicate rule is
+fired which of the two possible linearizations will be chosen.
+
+Thus on an architecture with condition codes, the rule will return the
+code that performs the comparison, targeting the appropriate
+condition-code registers (if they are not implicit), and the arguments
+to SET-CURRENT-BRANCHES! will just generate the conditional-branch
+instructions that use the generated condition codes.
+
+On an architecture with compare-and-branch instructions, the code
+returned by the rule body will perform any work needed before the
+compare-and-branch instructions, and the arguments to
+SET-CURRENT-BRANCHES! will generate the compare-and-branch
+instructions.
+
+For example, on the DEC Vax, a machine with implicit condition codes,
+where compare (and most) instructions set the hidden condition-code
+register, a predicate rule could be as follows:
+  (define-rule predicate
+    (EQ-TEST (REGISTER (? register-1)) (REGISTER (? register-2)))
+    (set-current-branches!
+     (lambda (label)
+       (LAP (B EQL (@PCR ,label))))
+     (lambda (label)
+       (LAP (B NEQ (@PCR ,label)))))
+    (LAP (CMP L ,(any-register-reference register-1)
+              ,(any-register-reference register-2))))
+The prefix code performs the comparison.  The arguments to
+SET-CURRENT-BRANCHES! branch depending on the result.
+
+On the HP Precision Architecture (Spectrum), a machine with
+compare-and-branch instructions, the same rule would be written as
+follows:
+  (define-rule predicate
+    ;; test for two registers EQ?
+    (EQ-TEST (REGISTER (? source1)) (REGISTER (? source2)))
+    (let* ((r1 (standard-source! source1))
+           (r2 (standard-source! source2)))
+      (set-current-branches!
+       (lambda (label)
+         (LAP (COMB (EQ) ,r1 ,r2 (@PCR ,label))
+              (NOP ())))                ; handle delay slot
+       (lambda (label)
+         (LAP (COMB (LTGT) ,r1 ,r2 (@PCR ,label))
+              (NOP ()))))               ; handle delay slot
+      (LAP)))
+There is no prefix code, and the arguments to SET-CURRENT-BRANCHES!
+perform the comparison and branch.
+
+The (OVERFLOW-TEST) predicate condition does not fit this model
+neatly.  The current compiler issues overflow tests when open-coding
+generic arithmetic.  Fixnum overflow implies that bignums should be
+used for the result, and this predicate is used to conditionally
+invoke out-of-line utilities.
+
+The problem is that the decomposition of the code assumes that the
+result of the overflow test is stored implicitly by the code that
+generates the arithmetic instructions, and that this condition can be
+later used for branching by the code generated for (OVERFLOW-TEST).
The code for the test will be adjacent to the code for the
corresponding arithmetic operation, but the compiler assumes that the
condition can be passed implicitly between these adjacent
instructions.  This decomposition only matches hardware with condition
codes.

Hardware with compare-and-branch instructions can be accommodated by
explicitly computing conditions into a hardware register reserved for
this purpose; the code generated by the predicate rule can then
branch according to the contents of this register.  On these machines,
the arithmetic operator will not only generate the desired result, but
will set or clear a fixed register according to whether the
computation overflowed or not.  The predicate code will then branch
when the fixed register contains a non-zero value for the first
linearization choice, or zero for the other possibility.

This problem is particularly acute on MIPS processors.  The MIPS
architecture provides no overflow flag, so the overflow condition must
be computed by examining the inputs and outputs of the arithmetic
instructions.  Conditional branches are used just to store the correct
overflow condition in a register, and the code generated for the
overflow test will then branch again depending on the value stored.
This makes the code generated by the open-coding of generic arithmetic
quite large, containing multiple branches.

The Spectrum port solves this problem a little differently.  On the
Spectrum, arithmetic instructions can conditionally cause the
following instruction to be skipped.  Since the code generated by
(OVERFLOW-TEST) is guaranteed to follow the code generated by the
arithmetic operation, the last instruction generated by the arithmetic
operation conditionally skips if there is no overflow.
The
(OVERFLOW-TEST) code generates an unconditional branch for the first
linearization choice, and an unconditional skip and an unconditional
branch for the alternative linearization.

A more efficient solution, currently employed in the MIPS port
(version 4.87 or later), depends on the fact that the RTL instruction
immediately preceding an RTL OVERFLOW-TEST encodes the arithmetic
operation whose overflow condition is being tested.  Given this
assumption (that the arithmetic operation producing the overflow
condition and the test of that condition are adjacent), the rule for
OVERFLOW-TEST need not generate any code, and the rule for the
arithmetic operation can generate both the prefix code and invoke
SET-CURRENT-BRANCHES! as appropriate.  This is possible because the
RTL encoding of arithmetic operations includes a boolean flag that
specifies whether the overflow condition is desired or not.

	 6. Suggested ordering of tasks.

The task of porting the compiler requires a lot of work.  In the past,
it has taken approximately three full weeks for a single person
knowledgeable about MIT Scheme and the compiler, working without
documentation.  This guide was written after the first three ports.

One unfortunate aspect is that a lot of mechanism must be in place
before most of the compiler can be tried out.  In other words, a lot
of code needs to be written before small pieces can be tested, and the
compiler is not organized so that parts of it can be run
independently.

Note also that cmpint-port.h, machin.scm, rules3.scm, and
cmpaux-port.m4 are very intertwined, and you may often have to iterate
while writing them until you converge on a final design.

Keeping all this in mind, here is a suggested ordering of the tasks:

	 6.1. Learn the target instruction set well.
In particular, pay close attention to the branch and jump instructions
and to the facilities available for controlling the processor caches
(if necessary).  You may need to find out what facilities the
operating system provides if the instructions that control the cache
are privileged instructions.

	 6.2. Write microcode/cmpint-port.h:

cmpint.txt documents most of the definitions that this file must
provide.

	 6.2.1. Design the trampoline code format.  Trampolines are used to
invoke C utilities indirectly.  In other words, Scheme code treats
trampolines like compiled Scheme entry points, but they immediately
invoke a utility to accomplish their task.  Since
return-to-interpreter is implemented as a trampoline, you will need to
get this working before you can run any compiled code at all.

	 6.2.2. Design the closure format and the execute cache format.  This
is needed to get the Scheme part of the compiler up AND to get the
compiled code interface in the microcode working.  Try to keep the
number of instructions low, since closures and execute caches are very
common.

	 6.2.3. Design the interrupt check instructions that are executed on
entry to every procedure, continuation, and closure.  Again, try to
keep the number of instructions low, and attempt to make the
non-interrupting case fast at the expense of the case when interrupts
must be processed.  Note that when writing the Scheme code to generate
the interrupt sequences, you can use the ADD-END-OF-BLOCK-CODE!
procedure to make sure that the interrupt sequence does not confuse
your hardware's branch prediction strategy.

	 6.2.4. Given all this, write cmpint-port.h.  Be especially careful
with the code used to extract and insert absolute addresses into
closures and execute caches.  A bug in this code would typically
manifest itself much later, after a couple of garbage collections.
During this process you will be making decisions about which registers
will be fixed by the port, namely the stack pointer, the free pointer,
the register block pointer, and at least one register holding the
address of a label used to get back to C, typically
scheme_to_interface.

	 6.3. Write machin.scm:

Most of the definitions in this file have direct counterparts in, or
are direct consequences of, the code in microcode/cmpint-port.h, so it
will mostly be a matter of re-coding the definitions in Scheme rather
than C.

In particular, you will have to decide how registers are going to be
used and split between your C compiler and Liar.  If your architecture
has a large register set, you can let C keep those registers to which
it assigns a fixed meaning (stack pointer, frame pointer, global
pointer), and use the rest for Liar.  If your machine has few
registers or you feel more ambitious, you can give all the registers
to Liar, but the code for transferring control between the two
languages in cmpaux-port.m4 will become more complex.  Either way, you
will need to choose appropriate registers for the Liar fixed registers
(stack pointer, free pointer, register block pointer, dynamic link
register, and, optionally, datum mask, return value register, memtop
register, and scheme_to_interface address pointer).

	 6.4. Write the assembler:

You can write the assembler any old way you want, but it is easier to
use the branch tensioner and the rest of the facilities if you use the
same conventions that the existing assemblers use.  In particular,
with any luck, you will be able to copy inerly.scm, insmac.scm, and
parts of assmd.scm verbatim from an existing port, and for most
machines, coerce.scm is straightforward to write.

assmd.scm defines procedures that depend only on the endianness of the
architecture.
You may want to start with the MIPS version, since that
version accommodates both endianness possibilities, as MIPS processors
can be configured either way.  If your processor has a fixed
endianness, you can prune the inappropriate code.  The block-offset
definitions must agree with those in microcode/cmpint-port.h, and the
padding definitions are simple constants.

Assuming that you decide to use the same structure as existing
assemblers, you may need to write parsers for addressing modes if your
machine has them.  You can use the versions in the MC68020 (bobcat),
Vax, and i386 (Intel 386) ports for guidance.  Addressing modes are
described by a set of conditions under which they are valid, and some
output code to issue.  The higher-level code that parses instructions
in insmac.scm must decide where the bits for the addressing modes
appear.  The MC68020 version divides the code into two parts: the part
that is inserted into the opcode word of the instruction (further
subdivided into two parts), and the part that follows the opcode word
as an extension.  The Vax version produces all the bits at once, since
addressing modes are not split on that architecture.  You should write
the addressing-mode definitions in port/insutl.scm, plus any
additional transformers that the instruction set may require.

Once you have the code for the necessary addressing modes and
transformers (if any), and the parsing code for their declarations in
port/insmac.scm, writing the instr<n>.scm files should not be hard.
Remember to include pseudo-opcodes for inserting constants in the
assembly language, and for declaring external labels so that the
gc-offset to the beginning of the compiled code block will be inserted
correctly.  See, for example, the definition of the EXTERNAL-LABEL
pseudo-opcode in machines/mips/instr1.scm, and its use in
machines/mips/rules3.scm.

	 6.5.
Write the LAPGEN rules:

You will need to write lapgen.scm, rules1.scm, rules2.scm, rules3.scm,
and parts of rules4.scm.  Most of rules4.scm is not used by the
compiler with the ordinary switch settings, and that code may no
longer work in any of the ports; rulfix.scm and rulflo.scm are only
necessary to open-code fixnum and flonum arithmetic.  A good way to
reduce the amount of code needed at first is to turn primitive open
coding off and ignore rulfix.scm and rulflo.scm.

Lapgen.scm need not include the shared code used to deal with fixnums
and flonums, but will require the rest, especially the code used to
invoke utilities in the compiled code interface.

rules1.scm and rules2.scm are relatively straightforward, since the
RTL instructions whose translations are provided there typically map
easily into machine instructions.

rules4.scm need only have the INTERPRETER-CALL:CACHE-??? rules, and
these are simple invocations of runtime library routines which you can
emulate from existing ports.

rules3.scm is an entirely different matter.  It is probably the
hardest file to write when porting the compiler.  The most complicated
parts to understand, and write, are the closure code, the invocation
prefix code, and the block assembly code.

  The block assembly code can be taken from another port.  You will
only have to change how the transmogrifly procedure works to take into
account the size and layout of un-linked execute caches.

  The invocation prefix code is used to adjust the stack pointer, and
move a frame in the stack prior to a call, to guarantee proper tail
recursion.  The frame moved is the one pointed at by the stack
pointer, and it may be moved a distance known at compile time
(invocation-prefix:move-frame-up rules) or a distance that cannot be
computed statically (invocation-prefix:dynamic-link rules).
The +move-frame-up rules are simple, but you should remember that the +starting and ending locations for the frame may overlap, so you must +ensure that data is not overwritten prematurely. The dynamic link +rules are similar to the move-frame-up rules (and typically share the +actual moving code) but must first decide the location where the frame +should be placed. This is done by comparing two possible values for +the location, and choosing the value closest to the current stack +pointer (i.e. numerically lower since the stack grows towards smaller +addresses). Again, the source and target locations for the frame may +overlap, so the generated code must be careful to move the data in +such a way that no data will be lost. + + The closure code is the most painful to write. When writing +cmpint-port.h you decided what the actual code in closure entries +would be, and the code for closure headers is a direct consequence of +this. The combination of the instructions in a closure object, the +helper instructions in assembly language (if any), and the +instructions in the closure header must ultimately push the closure +object (or its canonical representative) on the stack as if it were +the last argument to the procedure, and pending interrupts (and gc) +must be checked on entry to the closure. The interrupt back-out code +is different from the ordinary procedure interrupt back-out code +because the procedure object (the closure or its representative) is on +top of the stack. + + The cons-closure rules are used to allocate closure objects from the +runtime heap. Some of this allocation/initialization may be done out +of line, especially if ``assembling'' the appropriate instructions on +the fly would require a lot of code. In addition, you may have to +call out-of-line routines to synchronize the processor caches or +block-allocate multiple closure entries. + + 6.6. 
Write stubs for remaining port files:

rgspcm.scm and dassm1.scm can be copied verbatim from any other port.

lapopt.scm only needs to define an identity procedure.

rulfix.scm, rulflo.scm, and rulrew.scm need not define any rules,
since you can initially turn off open coding of primitive operators.

dassm2.scm and dassm3.scm need not be written at first, but they are
useful for debugging the assembler (since disassembling some code
should produce code equivalent to the input to the assembler) and for
examining compiler output when you forgot to make it output the LAP.

	 6.7. Write the compiler-building files:

make.scm and comp.cbf should be slightly modified copies of the
corresponding files in another port.

comp.sf and decls.scm can be essentially copied from another port, but
you will need to change the pathnames to refer to your port directory
instead of the one you copied them from; in addition, you may have to
add or remove instr<n> and other files as appropriate.

	 6.8. Write microcode/cmpaux-port.m4:

cmpaux.txt documents the entry points that this file must provide.
You need not use m4, but it is convenient for conditionalizing the
code for debugging and for different type-code sizes.  If you decide
not to use it, you should call your file cmpaux-port.s.

	 6.8.1. Determine your C compiler's calling convention.  Find out
which registers have fixed meanings, which are supposed to be saved by
callees if written, and which are supposed to be saved by callers if
they contain useful data.

	 6.8.2. Find out how C code returns scalars and small C structures.
If the documentation for the compiler does not describe this, you can
write a C program consisting of two procedures, one of which returns a
two-word (two int) struct to the other, and examine the assembly
language produced by the compiler.

	 6.8.3. Design how Scheme compiled code will invoke the C utilities.
Decide where the parameters (maximum of four) to the utilities will be
passed (preferably wherever C procedures expect arguments), and where
the utility index will be passed (preferably in a C caller-saves
register).

	 6.8.4. Given all this, write a minimalist cmpaux-port.m4.  In other
words, write those entry points that are absolutely required
(C_to_interface, interface_to_C, interface_to_scheme, and
scheme_to_interface).  Be especially careful with the code that
switches between calling conventions and register sets.
C_to_interface and interface_to_scheme must switch from the C
conventions to the Liar conventions, while scheme_to_interface must
switch the other way.  interface_to_C must return from the original
call to C_to_interface.  Make sure that C code always sees a valid C
register set and that code compiled by Liar always sees a valid Scheme
register set.

	 6.9. After the preliminary code works:

Once the compiler is up enough to successfully compile moderately
complex test programs, and the compiled code and the interface have
been tested by running the code, you will probably want to go back and
write the files that were skipped over.  In particular, you should
definitely write rulfix.scm and rulrew.scm, and, if at all possible,
rulflo.scm and the disassembler.

	 7. Building and testing the compiler.

Once the port files have been written, you are ready to build and test
the compiler.  The first step is to build an interpreted compiler and
run simple programs.  Most simple bugs will be caught by this.

	 7.1. Re-building scheme.

You need to build a version of the microcode with the compiled code
interface (portable runtime library) in it.  Besides writing
cmpint-port.h and cmpaux-port.m4, you will need to do the following:

- Copy (or link) cmpint-port.h to cmpint2.h.

- Modify m.h to use 6-bit-long type tags (rather than the default 8)
if you did not do this when you installed the microcode.
If you do +this, you will not be able to load .bin files created with 8 bit type +tags. You can overcome this problem by using the original .psb files +again to regenerate the .bin files. Alternatively, you can use a +version of Bintopsb compiled with 8-bit tags to generate new .psb +files, and a version of Psbtobin compiled with 6-bit tags to generate +the new .bin files. Anotheroption is to bring the compiler up using 8 +bit tags, but you may run out of address space. The simplest way to +specify 6-bit type tags is to add a definition of C_SWITCH_MACHINE +that includes -DTYPE_CODE_LENGTH=6 . Be sure to add any m4 switches +that you may need so that the assembly language will agree on the +number of tag bits if it needs it at all. If your version of m4 does +not support command-line definitions, you can use the s/ultrix.m4 +script to overcome this problem. Look at the m/vax.h and s/ultrix.h +files for m4-related definitions. + ==> We should just switch the default to 6 bits and be done with it. + +- Modify ymakefile to include the processor dependent section that +lists the cmpint-port.h and cmpaux-port.m4 files. You can emulate the +version for any other compiler port. It is especially important that +the microcode sources be compiled with HAS_COMPILER_SUPPORT defined. + +- Remove (or save elsewhere) all the .o files, scheme.touch, and +scheme, the linked scheme microcode. + +- Do ``make xmakefile;make -f xmakefile scheme'' to generate a new +linked microcode. + +Once you have a new linked microcode, you need to regenerate the +runtime system image files even if you have not changed the length of +the type tags. This is done as follows: + +- Re-generate a runtime.com (actually runtime.bin) image file by +invoking scheme with the options ``-large -fasl make.bin'' while +connected to the runtime directory, and then typing + (disk-save "<lib directory pathname>/runtime.com") +at the Scheme prompt. 
+ +- You should probably also generate a runtime+sf.com file by typing + (begin + (cd "<sf directory pathname>") + (load "make") + (disk-save "<lib directory pathname>/runtime+sf.com")) +at the Scheme prompt. + +You also need to have a working version of cref. This can be done by +invoking scheme with the options ``-band runtime+sf.com'', and then +typing + (begin + (cd "<cref directory pathname>") + (load "cref.sf")) +at the Scheme prompt. + +If this errors because of the lack of a ``runtim.glob'' file, try it +again after executing + (begin + (cd "<runtime directory pathname>") + (load "runtim.sf")) + + 7.2. Building an interpreted compiler. + +Once you have a new microcode, compatible runtime system, and ready +cref, you can pre-process the compiler as follows: + +- Copy (or link) comp.pkg, comp.sf, and comp.cbf from the +compiler/machines/port directory to the compiler directory. + +- For convenience, make a link from compiler/machines/port to +compiler/port. + +- Invoke scheme with the ``-band runtime+sf.com'' option, and then +execute + (begin + (cd "<compiler directory pathname>") + (load "comp.sf")) +This will take quite a while, and pre-process some of the files twice. +At the end of this process, you should have a .bin file for each of +the .scm files, a .ext file for some of them, and a bunch of +additional files in the compiler directory (comp.con, comp.ldr, +comp.bcon, comp.bldr, comp.glob, comp.free, comp.cref). + +It is a good idea to look at the comp.cref file. This is a +cross-reference of the compiler and may lead you to find typos or +other small mistakes. The first section of the cref file (labeled +``Free References:'') lists all variables that are not defined in the +compiler or the runtime system. The only variables that should be in +this list are SF, and SF/PATHNAME-DEFAULTING. The ``Undefined +Bindings:'' section lists those variables defined in the runtime +system and referenced freely by the compiler sources. 
The remainder
of the cref file lists the compiler packages and the cross-reference
of the procedures defined by each.

- Load up the compiler.  Invoke scheme with the options
``-compiler -band runtime+sf.com'', and then type
    (begin
      (cd "<compiler directory pathname>")
      (load "port/make")
      (disk-save "<lib directory pathname>/compiler.com"))

You should then be able to invoke the compiler by giving scheme the
``-compiler'' option, and use it by invoking CF.

	 7.3. Testing the compiler.

There is no comprehensive test suite for the compiler.  There is,
however, a small test suite that is likely to catch gross errors.  The
files for the test suite are in compiler/etc/tests.  Each file
contains a short description of how it can be used.

Make sure, in particular, that you test the closure code thoroughly,
especially if closure allocation hand-shakes with out-of-line code to
accommodate the CPU's caches.

A good order to try the test suite in is

	three.scm
	expr.scm
	pred.scm
	close.scm
	blast.scm
	reverse.scm
	arith.scm
	bitwse.scm
	fib.scm
	vector.scm
	reptd.scm
	lexpr.scm
	klexpr.scm

	close2.scm
	prim.scm
	free.scm
	uvuuo.scm
	link.scm
	uuo.scm
	unv.scm

	tail.scm
	y.scm
	sort/*.scm (see sort/README for a description)

  The programs in the first list test various aspects of code
generation.
  The programs in the second list test the handling of various dynamic
conditions (e.g. error recovery).
  The programs in the third list are somewhat larger, and register
allocation bugs, etc., are more likely to show up in them.

A good idea at the beginning is to turn COMPILER:GENERATE-RTL-FILES?
and COMPILER:GENERATE-LAP-FILES? on and examine the resulting files
for plausibility.  If you have ported the disassembler as well, you
should try disassembling some files and comparing the output to the
input LAP.  They won't be identical, but they should be similar.
The disassembler can
be invoked as follows:

  (compiler:write-lap-file "<pathname of .com file>") ; writes a .lap file.
  (compiler:disassemble <compiled entry point>)       ; writes on the screen.

The .lap filename extension is used by COMPILER:WRITE-LAP-FILE and by
the compiler when COMPILER:GENERATE-LAP-FILES? is true, so you may
want to rename the .lap file generated by the compiler to avoid
overwriting it when using COMPILER:WRITE-LAP-FILE.

Various runtime system files also make good tests.  In particular, you
may want to try list.scm, vector.scm, and arith.scm.  You can try them
by loading them and invoking procedures defined in them, but you must
execute
    (initialize-microcode-dependencies!)
after loading arith.com and before invoking procedures defined there.

	 7.4. Compiling the compiler.

The real test of the compiler comes when it is used to compile itself
and the runtime system.  Re-compiling the system is a slow process
that can take a few hours even with a compiled compiler on a fast
machine.  Compiling the compiler with an interpreted compiler would
probably take days.

There are two ways to speed up the process:

* Cross-compiling:

If you have access to machines on which the compiler already runs, you
can cross-compile the sources using a compiled compiler.  This method
is somewhat involved, because you will need binaries for both
machines, since neither can load or dump the other's .bin files.

Imagine that you have a Vax, and you are porting to a Sparc.  You will
need to pre-process and compile the Sparc's compiler on the Vax to use
it as a cross-compiler.
This can be done by following the same +pattern that you used to generate the interpreted compiler on the +Sparc, but running everything on the Vax, and then compiling the +cross-compiler on the Vax by running scheme with the ``-compiler'' +option, and typing + + (begin + (cd "<sparc compiler directory>") + (load "comp.cbf") + (disk-restore "runtime+sf.com")) + + ;; After the disk-restore + + (begin + (load "make") + (in-package (->environment '(compiler)) + (set! compiler:cross-compiling? true)) + (disk-save "sparccom.com")) + +to produce a cross-compiler band called "sparccom.com". + +Once you have the cross-compiler, you can use CROSS-COMPILE-BIN-FILE +to generate .moc files. The .moc files can be translated to .psb +files on the Vax. These .psb files can in turn be translated to .moc +files on the Sparc, and you can generate the final .com files by using +CROSS-COMPILE-BIN-FILE-END defined in compiler/base/crsend. +compiler/base/crsend can be loaded on a plain runtime system (i.e. +without SF or a compiler). You will probably find the following +idioms useful: + (for-each cross-compile-bin-file (directory-read "<some dir>/*.bin")) + (for-each cross-compile-bin-file-end (directory-read "<some dir>/*.moc")). + +To translate the original .moc files to .psb files, you should use +microcode/Bintopsb on the Vax as follows: + + Bintopsb ci_processor=?? <foo.moc >foo.psb + +where the value of ci_processor should be the value of +COMPILER_PROCESSOR_TYPE defined in microcode/cmpint-port.h. + +You can then generate the target .moc files by using +microcode/Psbtobin on the Sparc as follows: + + Psbtobin allow_cc <foo.psb >foo.moc + +* Distributing the task over several machines: + +You can use more than one machine to compile the sources. If the +machines do not share a file system, you will have to pre-partition +the job and generate a script for each machine. If the machines share +a (network) file system, you can try to use compiler/etc/xcbfdir. 
+This file defines two procedures, COMPILE-DIRECTORY, and +CROSS-COMPILE-DIRECTORY, that use a simple-minded protocol based on +creating .tch files to reserve files to compile, and can therefore be +run on many machines simultaneously without uselessly repeating work +or getting in each other's way. + +These two methods are not exclusive. We typically bring up the +compiler on a new machine by distributing the cross-compilation job. + +The compiler and the cross-compiler use a lot of memory while running, +and virtual memory is really no substitute for physical memory. You +may want to increase your physical memory limit on those systems where +this can be controlled (e.g. under BSD use the ``limit'' command). If +your machines don't have much physical memory, or it is too painful to +increase your limit, i.e. you have to re-compile or re-link the +kernel, you may want to use microcode/bchscheme instead of +microcode/scheme. Bchscheme uses a disk file for the spare heap, +rather than a region of memory, putting the available memory to use at +all times. + + 7.5. Compiler convergence testing. + +Once you have a compiled compiler, you should run the same test suite +that you ran with the interpreted compiler. Once you have some degree +of confidence that the compiled compiler works, you should make sure +that it can correctly compile itself and the runtime system. This +re-compilation can manifest second-order compiler bugs, that is, bugs +in the compiler that cause it to compile parts of itself incorrectly +without crashing, so that programs compiled by this +incorrectly-compiled compiler fail even though these programs did not +fail when compiled by the original compiler. + +Of course, you can never really tell if the compiler has compiled +itself successfully. You can only tell that it is not obviously wrong +(i.e. it did not crash). Furthermore, there could be higher-order bugs +that would take many re-compilations to find. 
However, if the binaries
produced by two successive re-compilations are identical, further
re-compilations would keep producing identical binaries, and no
additional bugs will be found this way.  Moreover, if the compiler
and system survive a couple of re-compilations, the compiler is likely
to be able to compile most programs correctly.

To run this compiler convergence test, you need to re-compile the
compiler.  In order to do this, you need to move the .com files from
the source directories so that COMPILE-DIRECTORY and
RECOMPILE-DIRECTORY will not skip all the files (they avoid compiling
those already compiled).  The simplest way to move all these files is
to type ``make stage1'' at your shell in the source directories
(runtime, sf, cref, and compiler).  This command will create a STAGE1
subdirectory in each of the source directories, and move all the .com
and .binf files there.  You can then use compiler/comp.cbf, or
compiler/etc/xcbfdir and RECOMPILE-DIRECTORY, to regenerate the
compiler.

If you generated the stage1 compiled compiler by running the compiler
interpreted, the new .com files should match the stage1 .com files.
If you generated the stage1 compiler by cross-compilation, they will
not: the cross-compiler turns COMPILER:COMPILE-BY-PROCEDURES? off,
while the default setting is on.  In the latter case, you will want to
generate one more stage to check for convergence, i.e. execute ``make
stage2'' in each source directory and re-compile once more, at each
stage using the compiler produced by the previous stage.

Once you have two stages that you think should have identical
binaries, you can use COMPARE-COM-FILES, defined in
compiler/etc/comcmp, to compare the binaries.  The simplest way to use
it is to also load compiler/etc/comfiles and then use the CHECK-STAGE
procedure.
+ + (check-stage "STAGE2" '("runtime" "sf" "compiler/base")) + +will compare the corresponding .com files from runtime and +runtime/STAGE2, sf and sf/STAGE2, and compiler/base and +compiler/base/STAGE2. + +If nothing is printed, the binaries are identical. Otherwise some +description of the differences is printed. COMPARE-COM-FILES does not +check for isomorphism of Scode objects, so any sources that reference +Scode constants (e.g. runtime/advice.scm) will show some differences +that can safely be ignored. Generally, differences in constants can +be ignored, but length and code differences should be understood. The +code in question can be disassembled to determine whether the +differences are real or not. + +While testing the compiler, in addition to checking for the correct +operation of the compiled code, you should also watch out for crashes +and other forms of unexpected failure. In particular, hardware traps +(e.g. segmentation violations, bus errors, illegal instructions) +occurring during the re-compilation process are a good clue that there +is a problem somewhere. + + 8. Debugging. + +The process of porting a compiler, due to its complexity, is unlikely +to proceed perfectly. Things are likely to break more than once while +running the compiler and testing the compiled code. + +Debugging a compiler is not trivial, because often the failures +(especially after a while) will not manifest themselves until days, +weeks, or months after the compiler was released, at which point the +context of debugging the compiler has been swapped out by the +programmer. Second-order compiler bugs do not make things any easier. + +Liar does not have many facilities to aid in debugging. This section +mentions some of the few, and some techniques to use with +assembly-language debuggers (gdb, dbx, or adb). + +The main assumption in this section is that the front end and other +machine-independent parts of the compiler work correctly. 
Of course, +this cannot be guaranteed, but in all likelihood virtually all of the +bugs that you will meet when porting the compiler will be in the new +machine-specific code. + +If you need to examine some of the front-end data structures, you may +want to use the utilities in base/debug.scm which is loaded in the +compiler by default. In particular, you will want to use PO (for +print-object) to examine compiler data structures, and +DEBUG/FIND-PROCEDURE to map procedure names to the data structures +that represent the procedures, or more correctly, the lambda +expressions. + + 8.1. Preliminary debugging of the compiled code interface. + +The first item of business, after the microcode interface +(cmpaux-port.m4 and cmpint-port.h) has been written, is to guarantee +that properly constructed compiled code addresses do not confuse the +garbage collector. This can be done before writing any of the +remaining files, but you must have rebuilt the microcode and the +runtime.com band. + +A simple test to run is the following: + + (define foo + ((make-primitive-procedure 'COERCE-TO-COMPILED-PROCEDURE) + (lambda (x y) + (+ x y)) + 2)) + + (gc-flip) + (gc-flip) + (gc-flip) + +If the system does not crash or complain, in all likelihood the +garbage collector can now properly relocate compiled code objects. + +This object can also be used to test parts of the compiled code +interface. FOO is bound to a trampoline that will immediately revert +back to the interpreter when invoked. + +The next test is to determine that FOO works properly. You can follow +the execution of FOO by using a debugger and placing breakpoints at + + cmpint.c:apply_compiled_procedure, + cmpaux-port.s:C_to_interface, + cmpaux-port.s:scheme_to_interface (or trampoline_to_interface if it +is written), + cmpint.c:comutil_operator_apply_trap, + cmpint.c:comutil_apply, and + cmpaux-port.s:interface_to_C + +and then evaluating (FOO 3 4). 

When setting the breakpoints, remember that C_to_interface,
scheme_to_interface, and interface_to_scheme are not proper C
procedures, so you should use the instruction-level breakpoint
commands or formats, not the C procedure breakpoint commands or
formats. If you are using adb, this is moot, since adb is purely an
assembly-language debugger. If you are using gdb, you should use
``break *&C_to_interface'' instead of ``break C_to_interface''. If
you are using dbx, you will want to use the ``stopi'' command instead
of the ``stop'' command to set breakpoints in the assembly language
routines.

Make sure that the arguments to comutil_operator_apply_trap look
plausible and that the registers have the appropriate contents when
going into Scheme code and back into C. In particular, you probably
should examine the contents of the registers right before jumping into
the trampoline code, and single step the trampoline code until you get
back to scheme_to_interface.

In order to parse the Scheme objects, you may want to keep a copy of
microcode/type.names handy. This file contains the names of all the
Scheme type tags and their values as they appear in the most
significant byte of the word, both when type tags are 8 bits long and
when they are 6 bits long. Remember that you may have to insert
segment bits into addresses in order to examine memory locations.

You should also make sure that an error is signalled when FOO is
invoked with the wrong number of arguments, and that the system
correctly recovers from the error (i.e., it gives a meaningful error
message and an error prompt, and resets itself when you type ^G).

This test exercises most of the required assembly language code. The
only entry point not exercised is interface_to_scheme.

        8.2. Debugging the assembler.

Assuming that the compiler generates correctly formatted compiled code
objects, fasdump should be able to dump them out without a problem.

If you have problems when dumping the first objects, and assuming that
you ran the tests in section 8.1., then in all likelihood the block
offsets are not computed correctly. You should probably re-examine
the rule for the EXTERNAL-LABEL pseudo operation, and the block-offset
definitions in machines/port/assmd.scm.

Once you can dump compiled code objects, you should test the
assembler. A simple, but somewhat inconvenient, way of doing this is
to use adb as a disassembler as follows:

Scheme binary (.bin and .com) files have a 50-longword header that
contains relocation information. The longword that follows
immediately is the root of the dumped object.

If COMPILER:COMPILE-BY-PROCEDURES? is false, the compiler dumps a
compiled entry point directly, so the format word for the first entry
is at longword location 53 (* 4 = 0xd4), and the instructions follow
immediately.

If COMPILER:COMPILE-BY-PROCEDURES? is true, the compiler dumps an
Scode object that contains the first entry point as the ``comment
expression''. The format word for the first entry point is then at
longword location 55 (* 4 = 0xdc), and the instructions for the
top-level block follow immediately.

Thus, assuming that there are four bytes per Scheme object (unsigned
long in C), and that foo.com was dumped by the compiler with
COMPILER:COMPILE-BY-PROCEDURES? set to false, the following would
disassemble the first 10 instructions of the generated code:

        adb foo.com
        0xd8?10i

If COMPILER:COMPILE-BY-PROCEDURES? was set to true, the following
would accomplish the same task:

        adb foo.com
        0xe0?10i

You can use adb in this way to compare the input assembly language to
its binary output. Remember that you can obtain the input assembly
language by using the COMPILER:GENERATE-LAP-FILES? switch.

        8.3. Setting breakpoints in Scheme compiled code.

Compiled code is not likely to work correctly at first even after the
compiler stops signalling errors.
In general, when you find that compiled code executes incorrectly, you
should try to narrow it down as much as possible by trying the
individual procedures, etc., in the code, but ultimately you may need
the ability to set instruction-level breakpoints and single-step
instructions in compiled code.

A problem peculiar to systems in which code is relocated on the fly is
that you cannot, in general, obtain a permanent address for a
procedure or entry point. The code may move at every garbage
collection, and if you set a machine-level breakpoint with a Unix
debugger, and then the code moves, you will probably get spurious
traps when re-running the code. Unix debuggers typically replace some
instructions at the breakpoint location with instructions that will
cause a specific trap, and then look up the trapping location in some
table when the debugged process signals the trap.

One way around this problem is to ``purify'' all compiled Scheme code
that you will be setting breakpoints in. If you purify the code, it
will move into ``constant space'' and remain at a constant location
across garbage collections. The PURIFY procedure expects an object
to purify as the first argument, a boolean flag specifying whether the
object should be moved into constant space (if false) or pure space
(if true) as a second argument, and a boolean flag specifying whether
purification should occur immediately (if false) or be delayed until
the next convenient time (if true) as a third argument. You should
use

        (purify <object> false false)

when moving compiled code objects to constant space for debugging
purposes. Alternatively, you can specify that you want the code to be
purified when you load it by passing appropriate arguments to LOAD.
Since LOAD delays the actual purification, you will need to invoke
GC-FLIP twice to flush the purification queue.
+ +At any rate, setting the actual breakpoints is not completely trivial, +since you must find the virtual address of the instructions, and then +use them with your assembly-language debugger. + +The simplest way to do this is to get Scheme to print the datum of +the entry points for you, and then type one of Scheme's interrupt +characters to gain the debugger's attention and set the breakpoint. +Continuing from the debugger will allow you to type further +expressions to Scheme. + +Imagine, for example, that we have compiled the runtime file list.scm +and some of the procedures in it, namely MEMBER and MEMQ, do not work +properly. After purifying the code, you can type ``memq'' at the +read-eval-print loop and it will respond with something like + ;Value 37: #[compiled-procedure 37 ("list" #x5A) #x10 #x10FE880] + +This specifies that MEMQ is bound to an ordinary compiled procedure +(not a closure), that it was originally compiled as part of file +``list'' and it was part of compiled code block number 90 (= #x5a) in +that file. The current datum of the object is #x10FE880 (this is the +address without the segment bits if any), and the offset to the +beginning of the containing compiled code block is #x10. Thus you +could then gain the attention of the debugger and set a breakpoint at +address 0x10fe880 (remember to add the segment bits if necessary) and +after continuing back into Scheme, use MEMQ normally to trigger the +breakpoint. + +The case with MEMBER is similar. Typing ``member'' at the +read-eval-print loop will cause something like + ;Value 36: #[compiled-closure 36 ("list" #x56) #x5C #x10FE484 #x1180DF8] +to be printed. 

This specifies that MEMBER is bound to a compiled closure, originally
in compiled code block number 86 (= #x56) in file ``list'', that the
entry point to the closure is at datum #x1180DF8, that the entry point
shared by all closures of the same lambda expression (the ``real''
entry point) is at datum #x10FE484, and that this entry point is at
offset #x5C of its containing compiled code block.

Thus if you want to single step the closure code (a good idea when you
try them at first), you would want to set a breakpoint at address
#x1180DF8 (plus appropriate segment bits), and if you want to single
step or examine the real code, then you should use address #x10FE484.
If you purified the code when you loaded it, the real code would be
pure, but the closure itself would not be, since it was not a part of
the file being loaded (closures are created dynamically). Thus,
before setting any breakpoints in a closure, you should probably
purify it as specified above, and obtain its address again, since it
would have moved in the meantime.

For example, if you are using adb on an HP-PA (where the top two bits
of a data segment address are always 01, and thus the top nibble of a
Scheme object's address is always 4), assuming that the interpreter
printed the above addresses,

0x41180df8:b would set a breakpoint in the MEMBER closure,
0x410fe484:b would set a breakpoint at the start of the code shared
             by MEMBER and all closures of the same lambda expression,
0x410fe880:b would set a breakpoint at the start of MEMQ.

If you are using gdb on a Motorola MC68020 machine, with no segment
bits for the data segment, the equivalent commands would be

break *0x1180df8 for a breakpoint in the MEMBER closure,
break *0x10fe484 for a breakpoint in MEMBER's shared code,
break *0x10fe880 for a breakpoint in MEMQ.

If you are using dbx, you will need to use a command like
``stopi at 0x10fe484'' to achieve the same effect.

        8.4.
Examining arguments to Scheme procedures. + +Commonly, after setting a breakpoint at some interesting procedure, +you will want to examine the arguments. Currently, Liar passes all +arguments on the stack. The Scheme stack always grows towards +decreasing addresses, and arguments are pushed from left to right. + +On entry to a procedure, the stack frame must have been reformatted so +that optional arguments have been defaulted, and tail (lexpr) +arguments have been collected into a list (possibly empty). Thus on +entry to an ordinary procedure's code the stack pointer points to the +rightmost parameter in the lambda list, and the rest of the parameters +follow at increasing longword addresses. + +This is also the case on entry to a closure object's instructions, but +by the time the shared code starts executing the body of the lambda +expression, the closure object itself (or its canonical +representative) will be on top of the stack with the arguments +following in the standard format. On entry to a closure's shared +code, the stack will contain the arguments in the standard format, but +the closure object will typically be in the process of being +constructed and pushed, depending on the distribution of the task +between the instructions in the closure object, the optional helper +instructions in assembly language, and the instructions in the closure +header. + +If you are using adb, you can use the following commands: + $r displays the names and contents of the processor + registers, + <r23=X displays the contents of the r23 register in hex, + 0x4040c/4X displays four longwords in memory at increasing + addresses starting with 0x4040c. + +When using gdb, you can achieve the same effect by using the following +commands: + info reg displays the names and contents of the processor + registers, + p/x $gr23 displays the contents of processor register gr23, + x/4wx 0x4040c displays four longwords in memory starting at + address 0x4040c. 
+ +If you are using dbx, you can use the following commands: + print $l4 to display the contents of processor register l4, + 0x4040c/4X to display four longwords in memory starting at + address 0x4040c. + + 8.5. Tracing the call stack. + +Procedures compiled by Liar receive additional implicit arguments used +to communicate the lexical environment and the implicit continuation. +Most procedures receive a return address argument that points to the +procedure that is waiting for the one at hand to finish. Some +procedures receive a static link argument. Static links are pointers +into other frames in the stack, used to thread together environments +when the distance between lexically nested frames cannot be statically +computed. Some procedures receive a dynamic link argument. Dynamic +links are pointers to return addresses on the stack, and are used when +the compiler cannot determine statically the distance on the stack +between the last argument and the return address for a procedure. +Dynamic links are rarely needed, and static links are needed only +somewhat more frequently. Only one of the programs in the test suite +uses dynamic and static links, and it was carefully constructed to +make the compiler generate code that uses them. All externally +callable procedures, including all those ``closed'' at top level, +expect return addresses and no other links. + +In general, it is impossible to find out what procedure called another +in Scheme, due to tail recursion. Procedures whose last action is a +call to another procedure, and whose local frames are not part of the +environment of their callees, will have their frames popped off the +stack before the callee is entered, and there will be no record left +of their execution. The interpreter uses a cache of previous frames +(called the history) to provide additional debugging information. 

On the other hand, most calls are not in tail position, and a return
address will be ``passed'' to the callee indicating who the caller
was. Occasionally the static and dynamic links will have to be traced
to find the return address, but this is rare.

The following assumes that we are dealing with an ordinary procedure
that expects a return address.

The return address is passed on the stack, immediately below the
leftmost argument. It is a compiled-code entry object whose datum is
the encoded address of the instruction at which the ``caller'' should
be reentered. If the procedure was called from the interpreter, the
``caller'' will be a special trampoline (return_to_interpreter) that
will give control back to the interpreter by using the compiled code
interface. There is a little hanky-panky in the interpreter to
guarantee that interpreted code ``tail recursing'' into compiled code
and vice versa will not push spurious return_to_interpreter return
addresses and return_to_compiled_code interpreter return codes that
would break proper tail recursion, but you need not concern yourself
with this.

If your debugger allows you to dynamically call C procedures, and it
is not hopelessly confused by the Scheme register set, you can use the
C procedure ``compiled_entry_filename'' to determine the filename that
a return address (or other compiled entry) belongs to. Its only
argument should be the compiled entry object whose origin you want to
find. Unfortunately, debuggers are often confused by the register
assignments within Scheme compiled code, precisely when you need them
most.
You can bypass this problem the following way:

On entry to a procedure whose return address you wish to examine,
write down the return address object, change the compiled code's
version of MemTop so that the comparison with Free will fail and the
code will take an interrupt, set a breakpoint in the runtime library
routine ``compiler_interrupt_common'', and continue the code. When
the new breakpoint is hit, you can use ``compiled_entry_filename'' to
examine the return address you had written down.

Here is how to do this the hard way, which you will have to resort to
often:

Compiled code blocks generated by the compiler always contain two
special locations. The last location in the compiled code block
holds the environment where the compiled code block was loaded
(after the code has been evaluated). The immediately preceding
location contains a debugging information descriptor. This descriptor
is either a string, namely the name of the file where the compiler
dumped the debugging information for the block, or a pair whose car is
the debugging information filename and whose cdr is the block number
(a fixnum) of the compiled code block in the file, or the debugging
information itself if the compiled code block was generated in core
and not dumped.

Given that the word immediately preceding an external entry point
always contains a gc-offset from the entry point to the first word of
the compiled code block (i.e. the vector header), or to another
external entry point if the distance is too large, it is a simple,
though involved, matter to find this debugging information. Note
that all return addresses, full closures, and top-level procedures are
external entry points.

For example, imagine that the return address for a procedure is
0xa08fe2ee.
Furthermore, assume that we are running on a Motorola
MC68020 with four bytes per longword, no segment bits, and for which
cmpint-mc68k.h defines PC_ZERO_BITS to be 1.

Extracting the word at address 0x8fe2ea (four bytes before the entry
point) will yield the format longword, which consists of the format
field in its high halfword and the encoded gc offset field in the
lower halfword.

The gdb command ``x/2hx 0x8fe2ea'' will print these two halfwords,
and let's assume that the output is
        0x8fe2ea <fpa_loc+10445546>:    0x8280  0x003e

The adb (and dbx) command ``0x8fe2ea/2x'' should yield similar output.

This confirms that the object in question is a return address because
the most significant bit of the format word is a 1, and it would be a
0 for a procedure. The encoded gc offset is 0x3e. GC offsets and
format words are described in detail in cmpint.txt.

Since the least significant bit of the GC offset is 0, it points
directly to the vector header of a compiled code block. The real
offset is
        ((0x3e >> 1) << PC_ZERO_BITS) = 0x3e

Thus the compiled code block starts at location
        0x8fe2ee-0x3e = 0x008fe2b0

Examining the first two words at this address (using the gdb command
``x/2wx 0x008fe2b0'' or the adb and dbx command ``0x8fe2b0/2X'') we
see
        0x8fe2b0 <fpa_loc+10445488>:    0x00000028      0x9c00001d

The first word is an ordinary vector header, and the second a
non-marked vector header used to inform the GC that 0x1d longwords of
binary data, the actual instructions, follow.

The last location in the vector, containing the environment, is at
address
        0x8fe2b0+4*0x28 = 0x8fe350

Examining the preceding adjacent location and this one (using gdb's
``x/2wx 0x8fe34c'' or a similar command for a different debugger)
will yield
        0x8fe34c <fpa_loc+10445644>:    0x0498ef9c      0x4898e864

The second object is the loading environment, and the first object is
the debugging information, in this case a pair.
This pair can be examined (using gdb's ``x/2wx 0x98ef9c'' or an
analogous command for a different debugger) to yield
        0x98ef9c <fpa_loc+11038620>:    0x789ac5ec      0x6800001c

The first object is a string, and the second a fixnum, indicating that
the return address at hand belongs to the compiled code block numbered
0x1c in the file whose name is that string.

Scheme strings have two longwords of header, followed by an ordinary C
string that includes a null terminating character; thus the C string
starts at address 0x9ac5ec+4*2=0x9ac5f4, and the gdb command
        ``x/s 0x9ac5f4'' or the adb and dbx command ``0x9ac5f4/s''
will display something like:
        0x9ac5f4 <fpa_loc+11159028>:
        (char *) 0x9ac5f4 "/usr/local/lib/mit-scheme/SRC/runtime/parse.binf"

Thus the return address we are examining is at offset 0x3e in compiled
code block number 0x1c of the runtime system file ``parse.com''.

If the disassembler is available, you can then use
        (compiler:write-lap-file "parse")
to find out what this return address is, or, if you compiled (or
re-compile) parse.scm generating lap files, you can probably guess
which return address is at offset 0x3e (the input lap files do not
contain computed offsets, since these are computed on final assembly).

This interaction would remain very similar for other machines and
compiled entry points, given the same or similar debuggers. The
variations would be the following:

- Segment bits might have to be added to the object datum components
to produce addresses. For example, on the HP-PA, with segment bits 01
at the most significant end of a word, the C string for Scheme string
object 0x789ac5ec would start at address 0x409ac5ec+8=0x409ac5f4,
instead of at address 0x9ac5ec+8=0x9ac5f4.

- The gc offset might be computed differently, depending on the value
of PC_ZERO_BITS. For example, on a Vax, where PC_ZERO_BITS has the
value 0, an encoded offset of value 0x3e would imply a real offset of
value (0x3e >> 1)=0x1f.
On a MIPS R3000, where PC_ZERO_BITS is 2, the +same encoded offset would encode the real offset value + ((0x3e >> 1) << 2)=0x7c. In addition, if the low order bit of the +encoded gc offset field were 1, a new gc offset would have to be +extracted, and the process repeated until the beginning of the block +was reached. + +- The constant offsets added to various addresses (e.g. that added to +a string object's address to obtain the C string's address) would vary +if the number of bytes per Scheme object (sizeof (unsigned long)) in C +were not 4. + +- Not all compiled entry points have debugging information descriptors +accessible the same way. Trampolines don't have them at all, and +closures have them in the shared code, not in the closure objects. To +check whether something is a trampoline, you can check the format +field (most trampolines have 0xfffd) or verify that the instructions +immediately call the compiled code interface. Closure objects have +type code MANIFEST-COMPILED-CLOSURE instead of MANIFEST-VECTOR in the +length word of the compiled code block. Once you obtain the real +entry point for a closure, you can use the same method to find out the +information about it. + + 8.6. Things to watch out for. + +The worst bugs to track are interrupt and garbage-collection related. +They will often make the compiled code crash at seemingly random +points, and are very hard to reproduce. A common source of this kind +of bug is a problem in the rules for procedure headers. Make sure +that the rules for the various kinds of procedure headers generate the +desired code, and that the desired code operates correctly. You can +test this explicitly by using an assembly-language debugger to set +breakpoints at the entry points of various kinds of procedures. When +the breakpoints are reached, you can bump the Free pointer to a value +larger than MemTop, so that the interrupt branch will be taken. If +the code continues to execute correctly, you are probably safe. 
You should especially check procedures that expect dynamic links, for
these must be saved and restored correctly. Closures should also be
tested carefully, since they need to be reentered correctly, and the
closure object on the stack may have to be de-canonicalized.
Currently C_to_interface and interface_to_scheme must copy the
interpreter's value register into the compiler's value register, and
must extract the address of this value and store it in the dynamic
link register.

Register allocation bugs also manifest themselves in unexpected ways.
If you forget to use NEED-REGISTER! on a register used by a LAPGEN
rule, or if you allocate registers for the sources and target of a
rule in the wrong order (remember the cardinal rule!), you may not
notice for a long time, but some poor program will. If this happens,
you will be lucky if you can find and disassemble a relatively small
procedure that does not operate properly, but typically the only
notice you will get is when Scheme crashes in an unrelated place.
Fortunately, this type of bug is reproducible. In order to find the
incorrectly compiled code, you can use binary search on the sources by
mixing interpreted and compiled binaries. When loading the compiler,
.bin files will be used for those files for which the corresponding
.com file does not exist. Thus you can move .com files in and out of
the appropriate directories, reload, and test again. Once you
determine the procedure in which the bug occurs, re-compiling the
module and examining the resulting RTL and LAP programs should lead to
identification of the bug.

        9. Bibliography

1. "Efficient Stack Allocation for Tail-Recursive Languages" by Chris
Hanson, in Proceedings of the 1990 ACM Conference on Lisp and
Functional Programming.

2. "Free Variables and First-Class Environments" by James S. Miller
and Guillermo J. Rozas, in Lisp and Symbolic Computation, 4, 107-141,
1991, Kluwer Academic Publishers.

3.
"MIT Scheme User's Manual for Scheme Release 7.1" by Chris Hanson, +distributed with MIT CScheme version 7.1. + +4. "MIT Scheme Reference Manual for Scheme Release 7.1" by Chris +Hanson, distributed with MIT CScheme version 7.1. + +5. "Taming the Y Operator" by Guillermo J. Rozas, in Proceedings of +the 1992 ACM Conference on Lisp and Functional Programming. + + A.1. MIT Scheme package system + +The MIT Scheme package system is used to divide large programs into +separate name spaces which are then ``wired together''. A large +program, like the runtime system, Edwin, or the compiler, has many +files and variable names all of which must exist at the same time +without conflict. The package system is a prototype system to +accomplish this separation, but will probably be replaced once a +better module system is developed. + +Currently, each package corresponds, at runtime, to a Scheme +environment. Environments have their usual, tree-shaped structure, +and packages are also structured in a tree, but the trees need not be +isomorphic, although they often are. + +Each package is given a name, e.g.: + + (compiler reference-contexts) + +whose corresponding environment can be found using the procedure +->ENVIRONMENT: + + (->environment '(compiler reference-contexts)) ;; Call this CR + +By convention, this package corresponds to an environment below the + + (->environment '(compiler)) ;; Call this C + +environment, and therefore CR contains all variables defined in C, as +well as those specifically defined in CR. The package name ``()'' +corresponds to the system global environment. 

The package structure for the compiler is defined in the file

        <compiler-directory>/machines/<machine-name>/comp.pkg

In that file, each package has a description of the form:

        (define-package <NAME>
          (files <FILES>)
          (parent <PACKAGE-NAME>)
          (export <PACKAGE-NAME> <VARIABLES>)
          (import <PACKAGE-NAME> <VARIABLES>))

where <FILES> are the names of the files that should be loaded into
package <NAME>.

        (parent <PACKAGE-NAME>)

declares the package whose name is <PACKAGE-NAME> to be the parent
package of <NAME>. Lexical scoping will make all variables visible in
<PACKAGE-NAME> also visible in <NAME>.

The EXPORT and IMPORT declarations are used to describe cross-package
links. A package may export any of its variables to any other
package using EXPORT; these variables will appear in both packages
(environments), and any side effect to one of these variables in
either package will be immediately visible in the other package.

Similarly, a package may import any of another package's variables
using IMPORT. Any number (including zero) of IMPORT and EXPORT
declarations may appear in any package declaration.

Here is an example package declaration, drawn from the compiler:

        (define-package (compiler top-level)
          (files "base/toplev"
                 "base/crstop")
          (parent (compiler))
          (export ()
                  cf
                  compile-bin-file
                  compile-procedure
                  compile-scode
                  compiler:reset!
+ cross-compile-bin-file + cross-compile-bin-file-end) + (export (compiler fg-generator) + compile-recursively) + (export (compiler rtl-generator) + *ic-procedure-headers* + *rtl-continuations* + *rtl-expression* + *rtl-graphs* + *rtl-procedures*) + (export (compiler lap-syntaxer) + *block-label* + *external-labels* + label->object) + (export (compiler debug) + *root-expression* + *rtl-procedures* + *rtl-graphs*) + (import (runtime compiler-info) + make-dbg-info-vector) + (import (runtime unparser) + *unparse-uninterned-symbols-by-name?*)) + +The read-eval-print loop of Scheme evaluates all expressions in the +same environment. It is possible to change this environment using the +procedure GE, e.g.: + + (ge (->environment '(compiler top-level))) + +To find the package name of the current read-eval-print loop +environment, if there is one, evaluate: + + (pe) + +The package system is currently completely static; it is difficult to +create packages and wire them together on the fly. If you find that +you need to temporarily wire a variable to two different environments +(as you would do with an IMPORT or EXPORT declaration), use the +procedure ENVIRONMENT-LINK-NAME: + + (environment-link-name <TO-ENVIRONMENT> + <FROM-ENVIRONMENT> + <VARIABLE-SYMBOL-NAME>) + +For example, to make WRITE-RESTARTS, originally defined in the +(runtime debugger) package, also visible in the (edwin debugger) +package, evaluate: + + (environment-link-name (->environment '(edwin debugger)) + (->environment '(runtime debugger)) + 'write-restarts) diff --git a/v8/src/compiler/documentation/safety.txt b/v8/src/compiler/documentation/safety.txt new file mode 100644 index 000000000..c037825b5 --- /dev/null +++ b/v8/src/compiler/documentation/safety.txt @@ -0,0 +1,226 @@ +-*- Text -*- + +$Header: /Users/cph/tmp/foo/mit-scheme/mit-scheme/v8/src/compiler/documentation/safety.txt,v 1.1 1995/08/05 16:25:13 adams Exp $ + + COMPILER SAFETY INFORMATION + Liar versions 4.77 and later + +This article 
describes how to control the compilation process in order to achieve
the desired mix of safety, debuggability, and speed.

The task of the native-code compiler is to translate a source (or
Scode) program into the native machine language in order to make the
program run faster than when interpreted.

Although a straightforward translation speeds the program up
significantly, much of the achievable performance comes from
optimizations that the compiler can perform after statically analyzing
the program text. There is a limit, however, to the extent of the
information that can be collected statically, and, in order to achieve
higher performance (often desired, occasionally necessary), the
compiler can be directed to assume additional information that is not
apparent from analysis of the program text.

Compilation switches are (global) variables whose values when the
compiler is run determine how the compilation proceeds. Some of the
switches provide information that cannot be deduced statically; they
allow the relaxation of some runtime consistency checking, and control
the collection of information to be displayed when an error is
detected and signalled. Relaxing the runtime constraints often makes
the generated code smaller and faster, but may cause problems if the
program being compiled has not been fully debugged, or is invoked with
inappropriate arguments at run time.

Safety (correctness) can primarily be compromised by eliminating
checks that the program should perform at runtime. These checks
are divided into a few categories:

- Heap availability checks. Programs need to invoke the storage
manager (garbage collector) when they need more memory than is
available. Each time that storage is needed, its availability should
be checked. If this is not done, the system may be damaged.

- Stack availability checks.
Storage is divided into a heap used to
+allocate objects with indefinite extent, and a stack used for
+procedure call frames with dynamic extent
+(call-with-current-continuation copies the stack when invoked).
+Availability of storage must be checked in the appropriate region. A
+very deep recursion may cause the stack to overflow, and this
+condition must be checked in order to avoid overwriting other regions
+of memory.
+
+- Type checks. Scheme is a strongly (albeit dynamically) typed
+language. Operations are only defined on certain types of objects,
+and a program is in error if it attempts to operate on the wrong type
+of data.
+
+- Range checks. The type of some arguments to a procedure may be
+correct, but there may be further restrictions on them which may not
+be satisfied. For example, vector and string indices must be
+non-negative integers smaller than the length of the vector or string,
+filenames represented as strings must denote existing files with the
+appropriate protection when the files are going to be opened for
+reading, etc.
+
+These checks obviously require extra code compared to the code that
+could be generated assuming that no violations occur at runtime. This
+code takes space, and time to execute; furthermore, it may cause other
+performance degradation with respect to the version where no
+violations are guaranteed to occur. This additional degradation
+arises because the compiler is prevented from making better register
+assignments or reusing the results of previous computations.
+
+For a translation to be safe, i.e. completely correct, all these
+checks must be performed at runtime except in those situations where
+the compiler can prove that violations cannot occur. These situations
+are very rare, so for most programs, most checks would be included in
+the code generated by the compiler. 
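To make the cost of these checks concrete, here is a sketch, written
in Scheme rather than the machine code the compiler actually emits, of
the type and range checks that a fully safe VECTOR-REF must perform;
the name CHECKED-VECTOR-REF and the error messages are illustrative
only:

```scheme
;; Illustrative sketch: the checks a safe open coding of VECTOR-REF
;; performs before the raw memory access on the last line.  An unsafe
;; translation would reduce the whole call to that last access.
(define (checked-vector-ref v i)
  (if (not (vector? v))                         ; type check on V
      (error "Not a vector:" v))
  (if (not (and (exact-nonnegative-integer? i)  ; type check on I
                (< i (vector-length v))))       ; range check on I
      (error "Index out of range:" i))
  (vector-ref v i))                             ; the raw access
```

Eliminating the two IF tests saves space and time directly, and also
frees the compiler to keep, say, the vector's length or base address
in a register across nearby accesses.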
+
+The MIT Scheme compiler treats each of these consistency checks as
+follows:
+
+- Heap availability checks. Heap availability is currently not checked
+on every allocation, but instead is checked when allocating large
+blocks of storage, and otherwise checked frequently, typically on
+entry to procedures and continuations. The storage manager reserves a
+block of storage past the logical end of storage in order to allow
+this scheme to work. This scheme is, however, unsafe. It is possible,
+but unlikely, to write programs that, after being compiled, will
+overflow the heap and cause the system to crash at runtime. The
+current heuristic has not been observed to fail, but future versions
+of the compiler will improve matters by allowing more careful code
+generation, and/or limiting the amount of allocation between checks to
+the size of the storage manager's overflow buffer.
+
+- Stack availability checks: Stack availability is currently not
+checked at all by compiled code. A very deep or infinite recursion
+will cause the system to crash. This WILL be fixed in the near
+future.
+
+- Type checks and range checks: A Scheme program can be considered to
+be a set of calls to primitive operations and some higher-level glue
+that pieces them together. The higher-level glue does not directly
+manipulate objects, but instead passes them around to the various
+primitives in a controlled fashion. Thus type and range checks are
+not needed in the higher-level glue, but only in the primitives
+themselves. There are various switches that control how primitives
+are treated by the compiler, and they provide the main form of user
+control of the safety of compiled code.
+
+         Control of the open coding (in-lining) of primitives
+
+Primitives may be open-coded or called out of line. The out-of-line
+versions are safe, i.e. they perform all pertinent consistency checks.
+The compilation switches listed below control how the primitives are
+open coded. 
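The earlier point that checks live in the primitives rather than in
the higher-level glue can be seen in even a tiny procedure; a sketch
for illustration (SECOND is a made-up name):

```scheme
;; SECOND is pure glue: it never inspects L itself, it only passes
;; values between the primitives CDR and CAR.  All the type checking
;; the call needs therefore already lives inside those two primitives,
;; so the glue itself requires no checks of its own.
(define (second l)
  (car (cdr l)))
```

With CAR and CDR called out of line, (second 3) signals an error; with
unchecked open codings of the two primitives, the same call silently
reads two arbitrary memory words instead.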
+
+Some important considerations:
+
+- Under all possible settings of the switches described below, any
+generated code corresponding to a primitive call, whether open coded
+or not, will operate correctly on correct inputs.
+
+- If the compiler does not know that the operator of a combination is
+a primitive procedure, it will not open code it. In particular, if
+the compiler does not know that a variable is bound to a particular
+primitive procedure, no combinations with that variable as the
+operator will be open coded. Usually the compiler is informed of such
+constant bindings by making use of declarations like
+USUAL-INTEGRATIONS. See the documentation for sf for additional
+information on declarations.
+
+- The compiler will not make an unsafe program safe, i.e. safe
+translation does not compensate for unsafe programs.
+
+This article describes whether and when the translation of the program
+into native code will reduce the safety of the program (as compared to
+the interpreted version), but there is no realistic way to increase
+its safety. A program may be inherently unsafe if it uses inherently
+unsafe primitives inappropriately.
+
+Some primitives of the MIT Scheme system are inherently unsafe. They
+are used for system maintenance and low-level system operation, but,
+like everything else in the system, they are available to users.
+Their use should be avoided except on rare occasions. Using them
+arbitrarily may cause the system to crash, or worse, damage it in
+subtle ways that will produce spurious wrong results or later crashes.
+There is nothing the compiler can effectively do to prevent this,
+since any other action might change the meaning of the program on
+correct inputs.
+
+- The switches listed below are not orthogonal. Their meaning
+sometimes depends on the settings of the other switches.
+
+The following compilation switches affect the open-coding of
+primitives:
+
+
+        COMPILER:OPEN-CODE-PRIMITIVES? 
+
+This N-ary switch can take several values as described below. Two of
+the values (true and false) are booleans; the rest are symbols.
+
+Note that if a primitive call is open coded when a given switch
+setting is used, it will also be open coded with the settings that
+appear below it in the list.
+
+The possible values for this switch are:
+
+-- false: No primitive calls are open-coded. All primitives are
+called out-of-line and the code is fully safe.
+
+-- CORRECT: Open code only those primitive calls whose corresponding
+code is always correct, and therefore safe.
+
+-- INNOCUOUS: Open code primitive calls whose corresponding code is
+correct when given appropriate arguments, and will not crash
+immediately when given inappropriate arguments. Primitive calls may
+return values when they should have signalled an error, but the values
+returned are relatively innocuous: they are guaranteed to be valid
+Scheme objects. The overall program or the system may still fail,
+since these incorrect values may cause the program to take the wrong
+branches later and end up in unsafe or unexpected code that it would
+never have executed had the errors been signalled. Damage to the
+system is unlikely.
+
+-- ALLOW-READS: Open code even if arbitrary memory locations may be
+read when given inappropriate arguments. This may cause a memory trap
+if the location is read-protected by the operating system, or the
+resulting address is not valid (e.g. not aligned properly), and may
+cause the garbage collector or other parts of the program and system
+to crash if the data stored at the location read is not a valid object
+but looks like one. If the extracted data is only used temporarily
+and never stored in long-lived data structures or environments,
+damage to the system is unlikely.
+
+-- ALLOW-WRITES: Open code even if arbitrary memory locations may be
+written. 
This may cause an immediate failure if the location is not
+writable, or other problems if the integrity of some data is destroyed
+causing (often obscure) errors or crashes later.
+
+-- true: Open code all primitive calls (that the compiler is capable of
+open-coding) without regard for safety.
+
+        COMPILER:GENERATE-TYPE-CHECKS?
+        COMPILER:GENERATE-RANGE-CHECKS?
+
+These boolean switches control whether type or range checks should be
+issued. The code generated is longer and slower when they are. Note
+that a primitive call that would not fall in the CORRECT setting of
+COMPILER:OPEN-CODE-PRIMITIVES? if these checks were not issued, might
+very well fall in it when they are. For most intents and purposes,
+turning both of these switches on bumps COMPILER:OPEN-CODE-PRIMITIVES?
+to ALLOW-WRITES unless it is false.
+
+
+        COMPILER:PRIMITIVE-ERRORS-RESTARTABLE?
+
+This boolean switch controls how errors will be signalled if they are
+detected at runtime due to incorrect arguments found by checks in the
+open coding of primitive calls. If set to true, the code will be
+longer and slower, but will provide the maximum amount of debugging
+information, and in addition, the primitive call may be bypassed and
+the computation restarted as if it had completed successfully. If set
+to false, the code may be noticeably smaller and faster, but there may
+be less debugging information and some restarting ability may be lost.
diff --git a/v8/src/compiler/documentation/todo.txt b/v8/src/compiler/documentation/todo.txt
new file mode 100644
index 000000000..37850cc02
--- /dev/null
+++ b/v8/src/compiler/documentation/todo.txt
@@ -0,0 +1,158 @@
+
+
+           Things left to do before releasing the compiler
+
+* Type checking for inline coded primitives.
+
+* Implement debugging features. Solve absolute pathname problems. 
+
+
+
+                 Items that have been processed
+
+
+* Bug in definitions in IC code: if there is a GC during the
+definition, the ENV register in the compiler's register block is not
+copied, so the next time it is referenced it is in oldspace. One
+possible fix is to rewrite definitions as calls to the primitive. The
+other non-caching environment operations may have the same problem;
+this should be looked into.
+
+Ordinary primitives should NOT have this problem since they supposedly
+push the environment into the continuation and then restore it on
+invocation of the continuation.
+
+FIXED: The compiler no longer uses the "short" definition hook by
+default. It actually calls the primitives. -- Jinx
+
+
+* Should notify users that the framing of IC blocks will be changed by
+the rewrite rules for such things as disjunctions and internal
+definition for value.
+
+FIXED: I don't think this is true by default any more. The rewriting
+of first class environment code in the front end preserves framing
+while allowing better code generation. -- Jinx
+
+
+* Update "rtlgen/rgproc".
+
+* Write method for `unassigned-test' in "rtlgen/rgrval".
+
+* Write `make-rtl-continuation', `make-rtl-expr', and
+`make-rtl-procedure'.
+
+* `Temporary' objects are used in rgcomb, rgrval, and rgstmt.
+Change this code to use pseudo-registers.
+
+* "rgretn" refers to `pop-all-stack-frames', which is not
+written.
+
+* "rgraph" collects continuations in the current rgraph object. Is
+this still what we want to do? If so, somebody must do the
+accumulation.
+
+* Subproblem redesign: Attempt to change fggen so that there
+are two kinds of subproblems -- those that explicitly invoke
+continuations, and those that do not. These correspond to "canonical"
+and "rvalue" subproblems, respectively. Delay the consing of the
+continuation objects associated with the subproblem until it is known
+that it must be canonical. 
This introduces a problem: the
+"subproblem-register" and "subproblem-type" will now be undefined on
+"rvalue" subproblems, and these concepts must be generalized to work
+in this new context. Also, the operation "set-continuation/rtl!" is
+used on subproblems, and must be changed in this case. All of these
+problems have to do solely with the RTL generator.
+
+* Separate applications from their subproblems. Create a new
+node type "parallel" which contains the subproblems. Doubly link the
+parallel node to the application node so we get the same relationship
+as at present. Then, during subproblem ordering, edit the CFG to
+place the application node in the correct place, which normally will
+be in one of the continuations of one of the subproblems.
+
+Note that this implies a somewhat complicated CFG edit.
+
+* Note that after a continuation's CFG has been edited (e.g.
+using continuation/next-hooks), the value of continuation/scfg is no
+longer correct. This is because it is not updated. It's not obvious
+what should be done here.
+
+There is no good reason to keep the scfg of a continuation around. A
+properly formed continuation (or procedure, for that matter) has no
+"next-hooks" in its body since all of the exit points are
+applications. Also, the only kinds of continuations that we want to
+glue anything to are those whose bodies are fg-noop nodes whose "next"
+is not yet connected. If we try to glue to anything else, it is an
+error.
+
+* Rewrite rule for LAMBDA body screws up mutual recursion if
+the body contains any non-constant-valued definitions. The LET which
+is created should be rewritten so that it goes around the LETREC
+bindings rather than inside them.
+
+* Change RTL generator to pass "offset" value around explicitly.
+
+* Flush JOIN blocks as these will no longer be used.
+
+* Be more careful about the code generation for applications whose
+operators are "simple". 
If a program is known to be a loop, then both
+the call and return for that loop will generate links in the RTL
+graph, causing a real loop to appear in the graph. Later passes of
+the compiler are assuming that there are no loops!
+
+Right now only "simple" return statements are turned into links, but
+it is desirable to convert "simple" call statements as well, provided
+that they aren't loops. A simple heuristic that wins is to only
+convert calls that are "simple" and whose operator is not called
+from elsewhere. This will optimize LET, the most important case,
+without introducing loops.
+
+Unfortunately this is not easy to do in RTL because of the invocation
+prefixes: prefixes other than NULL require some extra work at the call
+point. Unfortunately the prefixes are needed to make particular
+invocations work right, e.g. `rtl:make-invocation:lookup'. Probably
+should eliminate the "prefix" concept for all except those invocations
+that need it, replacing prefixes by some explicit code to perform the
+action required.
+
+For now: implement fall-through in the LAP generator, by noticing it
+at linearization time.
+
+* Try to rewrite `invocation-prefix/erase-to' in "rtlgen/rgcomb" to
+use the `block-stack-link'.
+
+* I'm not convinced that the operator class analysis is useful any
+more. This should be checked out and flushed if desirable.
+
+* Update the references to `make-vector-tag' in $zfront/rcseht and
+$zfront/rcserq to have the extra argument.
+
+* Write `combination/inline?' and the primitive inlining code.
+
+* The environment register is not being saved in the continuation of
+a subproblem call from an IC procedure.
+
+* Some memoization is desirable for the entry nodes on SIMPLE
+continuations, because they are being generated explicitly.
+
+* Probably the computations involving `lvalue/source-set' want to be
+made more efficient. It's also possible that they will be computed
+more than once. 
+ +* CSE will have to be changed back to do modelling of the stack again. + +* Change handling of dynamic links so that the link register is saved +when calling an unknown place, and is assumed to contain nothing at +external entries. The simplest implementation of this is to assume +that nothing is in the link register at external entries, and to save +it on calls to external procedures. Later we can optimize it better. +This strategy allows us to coexist with the current compiled code. + +* Implement the data-structure discarding code. + +* The call and return code is not taking into account the cases where +continuations are closed in IC blocks. This may complicate things +somewhat. I'd prefer to leave this until I can see some output. + +* Implement open-coding for `vector'. -- 2.25.1