From f3cef11eff02bdb54789e7fe82631c3e1e7830de Mon Sep 17 00:00:00 2001 From: "Guillermo J. Rozas" Date: Tue, 5 Mar 1991 20:54:36 +0000 Subject: [PATCH] Merge in some of Arthur's and Markf's comments.  Tag the rest. --- v7/src/compiler/documentation/porting.guide | 123 +++++++++++++------- 1 file changed, 78 insertions(+), 45 deletions(-) diff --git a/v7/src/compiler/documentation/porting.guide b/v7/src/compiler/documentation/porting.guide index 69e2d811d..d7ba7061f 100644 --- a/v7/src/compiler/documentation/porting.guide +++ b/v7/src/compiler/documentation/porting.guide @@ -1,6 +1,6 @@ Emacs: Please use -*- Text -*- mode. Thank you. -$Header: /Users/cph/tmp/foo/mit-scheme/mit-scheme/v7/src/compiler/documentation/porting.guide,v 1.16 1991/03/01 02:06:56 jinx Exp $ +$Header: /Users/cph/tmp/foo/mit-scheme/mit-scheme/v7/src/compiler/documentation/porting.guide,v 1.17 1991/03/05 20:54:36 jinx Exp $ Copyright (c) 1991 Massachusetts Institute of Technology @@ -43,6 +43,12 @@ Text tagged by ==> is intended primarily for the compiler developers. Good luck! +[*Markf: A section outlining a procedure to use for actually doing +the port (what should be done and when, how to debug ...) would be +useful] + +[*Markf: a discussion (or at least a mention) of the stuff in +base/debug.scm would be useful] Acknowledgments @@ -68,11 +74,11 @@ from the people listed above. 0. Introduction and a brief walk through Liar. Liar translates Scode as produced by the procedure SYNTAX, or by the -file syntaxer (SF) into compiled code objects. The Scode is -translated into a sequences of languages, the last of which is the -binary representation of the compiled code. +file syntaxer (SF, for syntax file) into compiled code objects. The +Scode is translated into a sequence of languages, the last of which +is the binary representation of the compiled code. 
-The sequence of languages manipulated is +The sequence of external languages manipulated is Characters --READ--> S-Expressions --SYNTAX--> @@ -84,10 +90,11 @@ subpasses. Many of the subpasses do not manipulate the whole code graph, but instead follow threads that link the relevant parts of the graph. -COMPILE-SCODE is the main entry point to Liar, although CF is the -usual entry point. CF uses COMPILE-SCODE, and assumes that the code -has been syntaxed by SF producing a .bin file, and dumps the resulting -compiled code into a .com file. +COMPILE-SCODE is the main entry point to Liar, although CBF (for +compile bin file) is the usual entry point. CBF uses COMPILE-SCODE, +and assumes that the code has been syntaxed by SF producing a .bin +file, and dumps the resulting compiled code into a .com file. CF (for +compile file) invokes SF and then CBF on a file name argument. The internal sub-languages used by Liar are: @@ -103,19 +110,24 @@ where FGGEN, etc., are some of the major passes of the compiler. The remaining major passes are FGOPT (the flow-graph optimizer), and RTLOPT (the RTL-level optimizer). RTL-level register allocation is performed by RTLOPT, and hardware-level register allocation is -performed by LAPGEN. Branch-tensioning of the output code is -performed by ASSEMBLER. LINK constructs a Scheme compiled code object -from the bits representing the code and the fixed data that the -compiled code uses at runtime. +performed by LAPGEN. ASSEMBLER branch-tensions the output code. +Branch-tensioning is described in a later section. LINK constructs a +Scheme compiled code object from the bits representing the code and +the fixed data that the compiled code uses at runtime. compiler/toplev.scm contains the top-level calls of the compiler and its pass structure. 0.1. Liar's package structure -The package structure of the compiler reflects the pass structure. -The package structure is specified in compiler/machines/port/comp.pkg. 
-The major packages are: +[*Arthur: What is a package and what are the basic commands for moving +between packages? Give a brief introduction to the structure of .pkg +files (forward pointer). At least tell where to find this information.] + +The package structure of the compiler reflects the pass structure and +is specified in compiler/machines/port/comp.pkg, where port is the +name of a machine (vax, mips, spectrum, bobcat, sparc, etc.). The +major packages are: (COMPILER): Utilities and data structures shared by most of the compiler. @@ -217,16 +229,16 @@ compiler/machines/port: the assembler rules, the disassembler, and RTL to assembly-language rules for the port. -All machine-dependent files are in compiler/machines/port and is the -only directory that needs to be written to port the compiler to a new -architecture. +All machine-dependent files are in compiler/machines/port, and this is +the only directory that needs to be written to port the compiler to a +new architecture. 1. Liar's runtime model. Liar does not open-code all operations that the code would need to execute. In particular, it leaves error handling and recovery, -interrupt processing, and initialization, to a runtime library written -in assembly language. +interrupt processing, initialization, and invocation of unknown +procedures, to a runtime library written in assembly language. Although this runtime library need not run in the context of the CScheme interpreter, currently the only implementation of this library @@ -287,6 +299,7 @@ other words, segmented address spaces with segments necessarily smaller than the Scheme runtime heap will make Liar very hard or inefficient to port. +[*Markf: Insert short description of the assumptions in what follows:] - Liar assumes that code and data can coexist in the same address space. 
In other words, a true Harvard architecture, with separate code and data spaces, would be hard to support without relatively @@ -437,6 +450,9 @@ expressions untouched, or to be simplified in different ways, depending on the availability of memory operands or richer addressing modes. Since these rules vary from port to port, the final RTL differs for the different ports. +[*Markf: note also that the simplification is constrained by +the kinds of RTL expressions that the LAP rules for a particular port +will accept.] - The open coding of Scheme primitives is port-dependent. On some machines, for example, there is no instruction to multiply integers, @@ -460,7 +476,7 @@ higher-level ``glue'' statements. Once a program has been translated to RTL, the RTL code is optimized in a machine-independent way by minimizing the number of RTL - ) ( ) @@ -1094,8 +1115,8 @@ A missing coercion type means that the ordinary unsigned coercion (for the corresponding number of bits) should be used. Additionally, each of these ports provides a syntax for specifying -instructions whose final format must be determined by the branch -tensioning algorithm in the bit assembler. The syntax of these +instructions whose final format must be determined by the +branch-tensioning algorithm in the bit assembler. The syntax of these instructions is usually (VARIABLE-WIDTH ( ) (( ) @@ -1200,7 +1221,7 @@ addressing modes and PC-relative offsets in a more legible form. Note that the output of the disassembler need not be identical to the input of the assembler. The disassembler is used almost exclusively -fore debugging, and additional syntactic hints make it easier to read. +for debugging, and additional syntactic hints make it easier to read. * dassm3.scm: This file contains the code to disassemble one instruction at @@ -1246,7 +1267,7 @@ For example, - (hello) matches the constant list (hello) -- (? thing) matches anything, and THING is bound in and to whatever was matched. - (hello (? 
person)) matches a list of two elements whose first @@ -1319,6 +1340,8 @@ firing the corresponding body. The bodies are defined in terms of the WORD syntax defined in insmac.scm, and the ``commas'' used with the pattern variables in the rule bodies are a consequence of the WORD syntax. +[*Arthur: Refer to backquote syntax for more information? Forward +pointer to 5.3.1.] 5.2 Rule variable syntax. @@ -1451,13 +1474,14 @@ This procedure should only be used for source RTL registers. REFERENCE-ALIAS-REGISTER! performs the same action but returns a register reference instead of an RTL register number. -* ALLOCATE-ALIAS-REGISTER! expects and RTL register and a register +* ALLOCATE-ALIAS-REGISTER! expects an RTL register and a register type, and returns a machine register of the specified type that is the only alias for the RTL register and should be written with the new contents of the RTL register. ALLOCATE-ALIAS-REGISTER! is used to generate aliases for target RTL registers. REFERENCE-TARGET-ALIAS! performs the same action but returns a register reference instead of an RTL register number. +[*Arthur: Include forward reference to CLEAR-REGISTERS!] * STANDARD-REGISTER-REFERENCE expects an RTL register, a register type, and a boolean. It will return a reference for an alias of the @@ -1484,6 +1508,7 @@ register type and returns an appropriate register containing a copy of the source. The register is intended for temporary use, and MOVE-TO-TEMPORARY-REGISTER! attempts to reuse an existing alias for the source RTL register. +[*Markf: What does temporary mean?] * REUSE-PSEUDO-REGISTER-ALIAS! expects an RTL register, a register type, and two continuations. It attempts to find a reusable alias for @@ -1493,19 +1518,23 @@ continuation with no arguments if it fails. MOVE-TO-ALIAS-REGISTER! and MOVE-TO-TEMPORARY-REGISTER! are written in terms of REUSE-PSEUDO-REGISTER-ALIAS! but occasionally neither meets the requirements. +[*Markf: continuations? really?] * NEED-REGISTER! 
expects and RTL machine register and informs the -register allocator that the rule being expanded requires the use of -that register so it should not be available for subsequent requests. -The procedures described above that allocate and assign aliases call -NEED-REGISTER! behind the scenes, but you may occasionally need to -invoke it explicitly. +register allocator that the rule in use requires that register so it +should not be available for subsequent requests while translating the +current RTL statement or expression. The register is available for +later RTL statements or expressions (unless the appropriate rules +invoke NEED-REGISTER! all over). The procedures described above that +allocate and assign aliases call NEED-REGISTER! behind the scenes, +but you may occasionally need to invoke it explicitly. * LOAD-MACHINE-REGISTER! expects an RTL register and an RTL machine register and generates code that copies the current value of the RTL register to the machine register. It is used to pass arguments on registers to out-of-line code, typically in the compiled code runtime library. +[*Markf: Explain the register map.] * CLEAR-REGISTERS! expects any number of RTL registers and clears them from the register map, pushing their current contents to memory if @@ -1635,6 +1664,7 @@ free variables. The free variable storage need not be initialized since it will be by subsequent RTL instructions. The entry point of the resulting closure object should be written to RTL register TARGET. The format of closure objects is described in microcode/cmpint.txt. +[*Arthur: From where did the "-1"s come?] Note that CONS-CLOSURE will dynamically create some new instructions on the runtime heap, and that these instructions must be visible to @@ -2059,7 +2089,7 @@ the s/ultrix.m4 script to overcome this problem. Look at the m/vax.h and s/ultrix.h files for m4-related definitions. ==> We should just switch the default to 6 bits and be done with it. 
-- Modify ymakefile to include the a processor dependent section that +- Modify ymakefile to include the processor-dependent section that lists the cmpint-port.h and cmpaux-port.m4 files. You can emulate the version for any other compiler port. It is especially important that the microcode sources be compiled with HAS_COMPILER_SUPPORT defined. @@ -2100,6 +2130,7 @@ again after executing (begin (cd "") (load "runtim.sf")) +[*Arthur: Is this still necessary?] 6.2 Building an interpreted compiler @@ -2179,7 +2210,7 @@ A good order to try them is sort/*.scm The programs in the first list test various aspects of code generation. -The programs in the first list test the handling of various dynamic +The programs in the second list test the handling of various dynamic conditions (i.e. error recovery). The programs in the third list are somewhat larger, and register allocation bugs, etc., are more likely to show up in them. @@ -2187,7 +2218,7 @@ allocation bugs, etc., are more likely to show up in them. A good idea at the beginning is to turn COMPILER:GENERATE-RTL-FILES? and COMPILER:GENERATE-LAP-FILES? on and compare them for plausibility. If you have ported the disassembler as well, you should try -disassembling some files and comparing them to then input LAP. They +disassembling some files and comparing them to the input LAP. They won't be identical, but they should be similar. Various runtime system files also make good tests. In particular, you @@ -2213,7 +2244,7 @@ you can cross-compile the sources using a compiled compiler. This method is somewhat involved because you will need binaries for both machines, since neither can load or dump the other's .bin files. -Say that you have a Vax, and you are porting to a Sparc. You will +Imagine that you have a Vax, and you are porting to a Sparc. You will need to pre-process and compile the Sparc's compiler on the Vax to use it as a cross-compiler. 
This can be done by following the same pattern that you used to generate the interpreted compiler on the @@ -2317,7 +2348,8 @@ If you generated the stage1 compiler by cross-compilation, they will not. The cross-compiler turns COMPILER:COMPILE-BY-PROCEDURES? off, while the default setting is on. In the latter case, you want to generate one more stage to check for convergence, i.e. execute ``make -stage2'' in each source directory, and re-compile once more. +stage2'' in each source directory, and re-compile once more, at each +stage using the compiler produced by the previous stage. Once you have two stages that you think should have identical binaries, you can use COMPARE-COM-FILES, defined in @@ -2358,10 +2390,11 @@ adb) to set breakpoints at the entry points of various kinds of procedures. When the breakpoints are reached, you can bump the Free pointer to a value larger than MemTop, so that the interrupt branch will be taken. If the code continues to execute correctly, you are -probably safe. You should especially procedures that expect dynamic -links since they must be saved and restored correctly. Closures -should also be tested carefully, since they need to be reentered -correctly, and the closure object on the stack may have to be bumped. +probably safe. You should especially check procedures that expect +dynamic links, since these must be saved and restored correctly. +Closures should also be tested carefully, since they need to be +reentered correctly, and the closure object on the stack may have to +be bumped. Register allocation bugs also manifest themselves in unexpected ways. If you forget to use NEED-REGISTER! on a register used by a LAPGEN -- 2.25.1