Initial revision

author Nick Papadakis <edu/mit/csail/zurich/nick>

Fri, 29 Oct 1993 23:02:49 +0000 (23:02 +0000)

committer Nick Papadakis <edu/mit/csail/zurich/nick>

Fri, 29 Oct 1993 23:02:49 +0000 (23:02 +0000)
diff --git a/v7/src/compiler/documentation/INSTALL b/v7/src/compiler/documentation/INSTALL

new file mode 100644

index 0000000..84e2f21
--- /dev/null
+++ b/v7/src/compiler/documentation/INSTALL
@@ -0,0 +1,366 @@
+-*-Text-*-
+
+              Installation Notes for Liar version 4.9
+
+
+Liar, the CScheme compiler, is available for the following computers:
+
+       Sun 3
+       HP 9000 series 300 (except model 310)
+
+These are 68020 based machines.  Ports for 68000/68010 machines and
+the Vax will be available in the future.
+
+For bug reports send computer mail to
+
+    BUG-LIAR@ZURICH.AI.MIT.EDU (on the Arpanet/Internet)
+
+or US Snail to
+
+    Scheme Team
+    c/o Prof. Hal Abelson
+    545 Technology Sq. rm 410
+    Cambridge MA 02139
+
+* The compiler is distributed as four compressed tar files, as
+follows:
+
+** "dist6.2.1-tar.Z" is release 6.2.1 of CScheme.  This is required
+for using the compiler.  It is installed in the usual way except for
+one small change to the microcode needed to support compiled code.
+This tar file contains about 5.1 Mbyte of data when unloaded.
+
+** "liar4.9b-tar.Z" contains the binary files for the compiler.  This
+includes a ".bin" file (SCode binary, for the interpreter) and a
+".com" file (native code compiler output) for each source file in the
+compiler.  It also contains a few other files used to construct the
+compiler from the binary files.  This tar file contains about 3 Mbyte
+of data when unloaded.
+
+** "liar4.9s-tar.Z" contains the source files for the compiler.  It
+also includes a TAGS table.  This tar file contains about 1.2 Mbyte of
+data when unloaded.
+
+** "liar4.9d-tar.Z" contains some debugging files.  There is one
+".binf" file corresponding to each ".com" file in the compiler.  Given
+both of these files, the compiler can generate a symbolic assembly
+language listing of the compiled code.  In future releases, these
+debugging files will also support debugging tools for parsing the
+stack and examining compiled code environment structures.  This tar
+file contains about 4.5 Mbyte of data when unloaded.
+\f
+* Installation of the compiler.  Installation requires about 17-20
+Mbyte of disk space.  This is conservative and could be reduced with
+some knowledge of what is needed and what is not.
+
+** The first step in installation is building CScheme.  Follow the
+instructions included in the release, except that the file
+"makefiles/sun" or "makefiles/hp200" (as appropriate) must be edited
+as follows.  Look for the following lines in that file:
+
+    # Compiled code interface files.
+    # These defaults are just stubs.
+
+    CSRC  = compiler.c
+    CFILE = compiler.oo
+    D_CFILE = compiler.do
+    F_CFILE = compiler.fo
+    CFLAG =
+    GC_HEAD_FILES= gccode.h
+
+edit these lines to read as follows:
+
+    # Compiled code interface files.
+
+    CSRC  = cmp68020.s
+    CFILE = cmp68020.o
+    D_CFILE = cmp68020.o
+    F_CFILE = cmp68020.o
+    CFLAG = -DCMPGCFILE=\"cmp68kgc.h\"
+    GC_HEAD_FILES= gccode.h cmp68kgc.h
+
+    .s.o: ; as -o $*.o $*.s
+
+After this is done, connect to the microcode subdirectory and execute
+the following
+
+    cp cmp68020.s-<sys> cmp68020.s
+
+where <sys> is "sun" if you are running on a Sun 3, or "hp" if you are
+running on an HP 9000 series 300.  NOTE: the file "cmp68020.s-src" is
+the source file from which the other two were built.  It was processed
+by m4 on an HP machine to create "cmp68020.s-hp", then that file was
+processed by a custom conversion program (courtesy of the
+butterfly-lisp hackers at BBN) to produce "cmp68020.s-sun".
+
+Once these changes have been made, finish the installation process in
+the normal way.
+
+**** Note that on Sun workstations, assembling "cmp68020.s" will
+produce the following messages (harmless, despite the word "error"):
+
+as: error (cmp68020.s:1432): Unqualified forward reference
+as: error (cmp68020.s:1435): Unqualified forward reference
+as: error (cmp68020.s:1444): Unqualified forward reference
+
+Also, on older versions of Sun software (before release 3.4) you may
+not be able to assemble this file at all.  For that case, we have
+included the file "cmp68020.o-sun" which is the output of the
+assembler on a 3.4 system.  Copy that file to "cmp68020.o" and touch
+it to make sure it is newer than the source file.
+\f
+** The next step in installation is unloading the Liar tar files.  The
+tar files may be unloaded wherever you like.  When unloaded, they will
+create a directory "liar4.9" under the directory to which you are
+connected.
+
+Note that only "liar4.9b-tar.Z" need be unloaded in order to perform
+the rest of the installation.
+
+In what follows, let $LIAR stand for the name of the directory in
+which the compiler is loaded, and let $SCHEME stand for the name of
+the directory in which the interpreter is loaded.
+
+** After having unloaded the files, and after CScheme has been built
+and installed, do the following:
+
+    cd $SCHEME
+    mv $LIAR/runtime/* runtime
+    mv $LIAR/sf/* sf
+    cd runtime
+    scheme -fasl cmp-runmd.bin < $LIAR/etc/mkrun.scm
+
+This transfers a number of compiled files to the Scheme runtime system
+directory, and constructs a new version of the runtime system, named
+"scheme.com", which is partially compiled.  After this has been done,
+you may discard all of the ".com" files in the runtime system
+directory.  If you want the new runtime system to be the default,
+rename it to "scheme.bin".
+
+**** Note: because this is a beta release, the compiled runtime system
+"scheme.com" is likely to have bugs.  If you intend to use it by
+default, we suggest you retain the original (interpreted) runtime
+system "scheme.bin" by renaming it to something else.
+
+** Next, do the following:
+
+    scheme -constant 510 -heap 500 -band $SCHEME/runtime/scheme.com
+
+This starts up the scheme interpreter with a large constant space and
+heap, using the partially compiled runtime system.  After the
+interpreter has started, type the following expression at it:
+
+    (begin (%cd "$LIAR")
+          (load "machines/bobcat/make" system-global-environment)
+          (disk-save "$SCHEME/runtime/compiler.com"))
+
+It will load two files, then ask the question "Load compiled?".  Type
+Y, which means to build the compiler using compiled code.  If you type
+N, the compiler will be run interpretively, which is about a factor of
+10 slower than the compiled version.
+
+After you answer the question, it will load and evaluate approximately
+100 files.  This will take several minutes.  When it is done, you are
+returned to the interpreter.  At this point, a new band will have been
+created, called "$SCHEME/runtime/compiler.com", which contains the
+compiler.  All the other files in the $LIAR directory may be
+discarded, if you wish, since only "compiler.com" is needed to run the
+compiler.
+\f
+* Using the compiler.
+
+** Loading.  The compiler band, "compiler.com", is used by starting
+Scheme and specifying that file using the "-band" option.  You must
+also use the "-constant" option to specify that the constant space is
+at least 510, and it is recommended that "-heap" be set to at least
+500.  For medium to large compilations, a heap size of 700 or
+more may be needed; at MIT we typically use 1000 to be safe.
+
+Alternatively, the switch "-compiler" specifies constant 510, heap
+500, and the compiler band.
+
+** Memory usage.  Note that the total memory used by Scheme in this
+configuration is substantial!  With a heap of 1000 and a constant
+space of 510, the memory used is (* 4 (+ 510 (* 2 1000))), or about 10
+Mbyte.  For many computers this is a ridiculous figure, and Scheme
+will die a slow death due to paging.  Using a heap of 500 reduces this
+to about 6 Mbytes, but that is still quite a lot.
+
+For machines with small memories, using the `bchscheme' version of the
+microcode will be helpful.  This program, which is made by connecting
+to "$SCHEME/microcode" and typing "make bchscheme", does its garbage
+collection to a disk file, thus requiring only one heap in the virtual
+address space.  This reduces the overall memory requirements for the
+above examples to 6 Mbyte and 4 Mbyte, respectively.  The savings of 4
+and 2 Mbytes (respectively) will be allocated in the file system rather
+than in virtual memory.
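+
+As a rough check of the figures above, the arithmetic can be written
+out in Scheme.  This is only a sketch: the procedure names are made up
+for illustration, and it assumes (as the formula above implies) that
+the "-constant" and "-heap" arguments count blocks of 1024 words of 4
+bytes each, so that the results are in Kbytes.
+
+    ;; Ordinary microcode: constant space plus two heaps.
+    (define (scheme-kbytes constant heap)
+      (* 4 (+ constant (* 2 heap))))
+
+    ;; bchscheme: constant space plus one heap in virtual memory.
+    (define (bchscheme-kbytes constant heap)
+      (* 4 (+ constant heap)))
+
+    (scheme-kbytes 510 1000)        ; 10040, about 10 Mbyte
+    (scheme-kbytes 510 500)         ; 6040, about 6 Mbyte
+    (bchscheme-kbytes 510 1000)     ; 6040, about 6 Mbyte
+    (bchscheme-kbytes 510 500)      ; 4040, about 4 Mbyte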
+
+This may seem like a complicated way of doing virtual memory
+management, but in fact it performs significantly better than paging
+on machines with small amounts of RAM.  This is because the GC
+algorithm uses the disk much more efficiently than the paging system
+will be able to.
+
+** Compilation.  The following global definitions are available for
+calling the compiler:
+
+
+(COMPILE-BIN-FILE FILENAME #!OPTIONAL OUTPUT-FILENAME)
+
+Compiles a binary SCode file, producing a native code file.  FILENAME
+should refer to a file which is the output of the SF program (see
+"$SCHEME/documentation/user.txt" for a description of SF).  The type
+of the input file defaults to ".bin".
+
+OUTPUT-FILENAME, if given, is where to put the output file.  If no
+output filename is given, the output filename defaults to the input
+filename, except with type ".com".  If it is a directory specification
+(on unix, this means if it has a trailing "/"), then the output
+filename defaults as usual, except that it goes in that directory.
+
+This is similar to the operation of SF.  Also, like SF, the input
+filename may be a list of filenames, in which case they are all
+compiled in order.
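+
+For example, assuming SF output files "foo.bin" and "bar.bin" and a
+directory "out/" (all three names are hypothetical), the cases
+described above look like this:
+
+    (compile-bin-file "foo")             ; foo.bin -> foo.com
+    (compile-bin-file "foo" "out/")      ; foo.bin -> out/foo.com
+    (compile-bin-file '("foo" "bar"))    ; foo, then bar, in order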
+\f
+
+(COMPILE-PROCEDURE PROCEDURE)
+
+Compiles a compound procedure, given as its argument, and returns a
+new procedure which is the compiled form.  This does not perform side
+effects on the environment, so if one wished to compile MAP, for
+example, and install the compiled form, it would be necessary to say
+
+    (set! map (compile-procedure map))
+
+
+(COMPILER:WRITE-LAP-FILE FILENAME)
+
+This procedure generates a "LAP" disassembly file (LAP stands for Lisp
+Assembly Program, a traditional name for assembly language written in
+a list notation) from the output of COMPILE-BIN-FILE.  If filename is
+"foo", then it looks for "foo.com" and disassembles that, producing a
+file "foo.lap".  If, in addition, the file "foo.binf" exists, it will
+use that information to produce a disassembly which contains symbolic
+names for all of the labels.  This second form is extremely useful for
+debugging.
+
+
+(COMPILE-DIRECTORY DIRECTORY #!OPTIONAL OUTPUT-DIRECTORY FORCE?)
+
+Finds all of the ".bin" files in DIRECTORY whose corresponding ".com"
+files either do not exist or are older, and recompiles them.
+OUTPUT-DIRECTORY, if given, specifies a different directory to look in
+for the ".com" files.  FORCE?, if given and not #F, means recompile
+even if the output files appear up to date.
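+
+For example, with hypothetical directories "src/" and "obj/":
+
+    ;; Recompile only ".bin" files whose ".com" is missing or older.
+    (compile-directory "src/")
+
+    ;; Same, but look for the ".com" files in "obj/".
+    (compile-directory "src/" "obj/")
+
+    ;; FORCE? is not #F: recompile even if files appear up to date.
+    (compile-directory "src/" "obj/" #t)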
+\f
+* Debugging compiled code.  At present the debugging tools are
+practically nonexistent.  What follows is a description of the lowest
+level support, which is clumsy to use but which is adequate if you
+have a moderate understanding of the compiled code.  This is one of
+the prices of beta test!  Before release we will have user-level
+debugging tools.
+
+There are two basic kinds of errors: fatal and non-fatal.  Fatal
+errors are things like segmentation violations and bus errors, and
+when these occur the only method of debugging is to use an assembly
+language debugger such as `adb' or `gdb'.  Debugging these errors is
+complicated and will not be described here.
+
+** Non-fatal errors can be debugged from Scheme.  Here is the method:
+the file "$LIAR/etc/stackp.bin" contains a simple stack parser that
+will allow you to display the Scheme stack, and refer to any of the
+items in the stack by offset number.  Loading this file (into the
+global environment, for example), defines two useful procedures:
+
+(RCD FILENAME) writes a file containing a description of the current
+stack.  When an error has occurred, the current stack contains the
+continuation of the error, which is the information you want to see.
+Each line of the file contains an offset number and the printed
+representation of an object (the latter is truncated to fit on one
+line).
+
+(RCR OFFSET) returns the object corresponding to OFFSET from the
+current stack.  Thus, after using RCD to see the stack, RCR will get
+you pointers to any of the objects.
+
+Given these procedures, you can look at the compiled code stack
+frames, and possibly (with some skill) figure out what is happening.
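+
+A session might look like the following sketch.  The file name
+"stack.txt" and the offset 12 are made up for illustration, and $LIAR
+stands for the compiler directory as before.
+
+    (load "$LIAR/etc/stackp.bin" system-global-environment)
+    (rcd "stack.txt")           ; after an error: dump the stack
+    (define obj (rcr 12))       ; fetch the object listed at offset 12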
+\f
+** Compiled code object manipulators.  Another set of useful
+procedures, built into the runtime system and defined in the file
+"$SCHEME/runtime/ustruc.scm", will allow you to manipulate various
+compiled code objects:
+
+(COMPILED-PROCEDURE-ENTRY PROCEDURE) returns the entry point of the
+compiled procedure PROCEDURE.  This entry point is an object whose
+type is COMPILED-EXPRESSION.
+
+(COMPILED-CODE-ADDRESS? OBJECT) is true of both COMPILED-EXPRESSION
+objects and COMPILER-RETURN-ADDRESS objects.
+
+(COMPILED-CODE-ADDRESS->BLOCK COMPILED-CODE-ADDRESS) returns the
+compiled code block to which that address refers.  The procedure
+COMPILED-CODE-BLOCK/DEBUGGING-INFO will tell you the name of the
+".binf" file corresponding to that compiled code block, if the
+compiled code was generated by COMPILE-BIN-FILE.
+
+(COMPILED-CODE-ADDRESS->OFFSET COMPILED-CODE-ADDRESS) returns the
+offset, in bytes, of that address from the beginning of the compiled
+code block.  NOTE: this offset is the SAME offset as that shown in the
+disassembly listing!  Thus, given any compiled code address, you can
+figure out both what file it corresponds to, plus what label in the
+disassembly file it points at.  This is the basic information you need
+to understand the stack.
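+
+Putting these together, a small helper (hypothetical, not part of the
+runtime system) can turn an object taken from the stack into the two
+facts described above.  It assumes that
+COMPILED-CODE-BLOCK/DEBUGGING-INFO takes the block as its only
+argument; check "ustruc.scm" before relying on that.
+
+    (define (locate-compiled-address object)
+      (if (compiled-code-address? object)
+          (let ((block (compiled-code-address->block object)))
+            ;; ".binf" information, and byte offset into the listing
+            (cons (compiled-code-block/debugging-info block)
+                  (compiled-code-address->offset object)))
+          #f))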
+
+There are several other procedures defined for manipulating these
+objects -- see the source code for details.  What follows is a brief
+description of the object formats to aid debugging.
+\f
+** Compiled Code Blocks.  Compiled code blocks are "partially marked"
+vectors.  The first part of a compiled code block is "non-marked",
+which means that the GC copies it but does not look through it for
+pointers.  This part is used to hold the compiled code.  The second
+part is "marked", and contains constants that are referred to by the
+compiled code.  These constants are ordinary Scheme objects and must
+be traced by the GC in the usual way.
+
+The disassembly listing shows the compiled code block in the same
+format that it is laid out in memory, with offsets in bytes from the
+beginning of the block.  The header of the block is 8 bytes, so the
+disassembly listing starts at offset 8.  The code and constants
+sections are displayed separately, in slightly different formats.
+
+** Procedure Entry Points.  The entry point of a procedure can be
+found in the LAP file by looking for a label with the same name as the
+procedure, concatenated with some positive integer.  Unnamed lambda
+expressions will be lambda-<n> for some <n>.  Closed procedures (i.e.
+those procedures which have an external representation) have two entry
+points, whose labels differ only in the concatenated integer.  The
+first entry point is responsible for checking the number of arguments,
+and transfers control to the second entry point if that is correct.
+
+** Stack Frames.  The normal stack frame for a closed procedure is
+constructed by pushing the return address, then all the arguments
+right to left, then the procedure.  If the procedure has internal
+definitions, then these are pushed on the stack on top of that in some
+unspecified order.  Internal procedures, when invoked, may either
+extend the closure's frame or create new frames.  The rules for this
+are complicated and far beyond the scope of this document.  However,
+two special types of stack pointers may be used when the closure's
+frame is extended.
+
+The first of these is a "static link".  This is a pointer into the
+stack which is used when a sub-frame needs to refer to bindings in
+some parent frame, but the compiler was unable to determine where that
+parent frame was at compile time.  The other type is a "dynamic link",
+which points to where the return address for the current procedure is
+located in the stack.  Because of tail recursion considerations, the
+compiler cannot always determine this at compile time, and in those
+cases dynamic links are used.  The dynamic link is normally kept in
+register A4, and pushed and popped off the stack at appropriate times.
+
+Note that internal procedures evaluate and push their arguments in a
+completely unspecified order.  Thus if your program depends on the
+fact that the interpreter evaluates arguments from right to left, you
+might be screwed, since the compiler chooses whatever order seems most
+efficient or convenient.
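+
+For example, the following sketch (all names are illustrative) shows
+a call whose result depends on the order of argument evaluation, and
+a rewrite that does not:
+
+    (define (make-counter)
+      (let ((n 0))
+        (lambda () (set! n (+ n 1)) n)))
+
+    ;; (2 1) if the arguments are evaluated right to left; other
+    ;; orders give a different result.
+    (define (order-dependent)
+      (let ((next! (make-counter)))
+        (list (next!) (next!))))
+
+    ;; LET* sequences the calls explicitly, so this is always (1 2).
+    (define (order-independent)
+      (let ((next! (make-counter)))
+        (let* ((a (next!))
+               (b (next!)))
+          (list a b))))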
diff --git a/v7/src/compiler/documentation/TASKS b/v7/src/compiler/documentation/TASKS

new file mode 100644

index 0000000..caadf6b
--- /dev/null
+++ b/v7/src/compiler/documentation/TASKS
@@ -0,0 +1,234 @@
+-*-Text-*-
+
+Task list for compiler.  The list is separated into classes of
+decreasing priority.  Add new entries at the end of the appropriate
+class.
+
+Each entry should start with an open/close bracket.  "Claim" a
+particular task by putting your uname in the brackets.  When the task
+is done, put DONE in the brackets.
+
+
+---- Class 1 (required for release) ----
+
+[DONE] Fix keyword bug in pattern matcher.
+
+[DONE] Open code computed vector operations.
+
+[DONE] Open code computed string, and bit-string operations.  (1-2
+days)
+
+[DONE] Open code generic arithmetic.  (1 week)
+
+[DONE] Open code flonum arithmetic.  (6 weeks)
+
+[Partly done] Fix dataflow analyzer explosion on takr.
+Handled by compiling by procedures.  Not really taken care of, but
+solves the problem in practice.
+
+[] Stack overflow checks.  
+This can be done accurately or heuristically:
+To do it accurately we must compute the maximum number of pushes for a
+procedure (not transitively) and (if not zero) check at entry whether
+that much space is available.
+To do it heuristically we only need to find those procedures that can
+call themselves (indirectly) in subproblem position and check whether
+we've exceeded the buffer at entry.  The other procedures will push an
+arbitrarily large, but finite amount.  Given a sufficiently large
+overpush buffer, the heuristic test should be sufficient.
+
+[] New closure/trampoline implementation to alleviate caching
+problems:
+  Closures can be pre-allocated in chunks at HeapTop, growing
+towards Free.  The closures have fixed instructions to jump to a
+simple assembly language routine that grabs the real entry point from
+the closure and invokes it through a register.  In this way the
+instructions are never modified and the cache need only be flushed
+rarely.  For example, the pre-allocated closures at the top of memory
+would look like
+
+<header>
+jsr n(a6)
+<entry point of code>
+<pointer to closure's variable area>
+
+and n(a6) would be
+
+mov.l  (sp),a0
+subq.l &4,(sp)         ; bump back to closure
+ori.b  &tc_compiled_entry,(sp)
+mov.l  (a0),a0         ; get real entry point
+jmp    (a0)            ; go to closure
+
+
+---- Class 2 (highly desirable or good cost/payoff ratio) ----
+
+[DONE] Reorder arguments in tail recursive calls and push the minimum
+number of temporaries.  (3 weeks)
+
+[] Reduce the number of interrupt checks and move them to continuation
+invocation rather than continuation entry.  The call graph is already
+computed.  There is no need for an entry gc check if the procedure
+does not call itself.  There is no need for a continuation check if
+the continuation cannot ultimately return to itself.  We may want to
+add gc checks anyway if we are consing more than a small fixed amount.
+A different problem is determining when to gc check in the middle of a
+basic block.  AAB's code probably generates humongous basic blocks
+which may require interruptability. (3 weeks)
+
+[DONE] Self consistent closing.  This includes dropping parent frame
+for open externals (and maybe static links) when the procedure does
+not need them.  Effective closures.  (3 weeks)
+
+[Partly done] Open code compiler apply.  (3 days)
+The 68K version has quick handlers for common arities.
+
+[] Open code or provide special handlers for common "unsafe"
+primitives such as apply, force, eval, with-interrupt-mask, etc.  (3
+days)
+
+[DONE] Teach the uuo linker about entities so it can do a direct jump. (3
+days).
+
+[] Speed up some bit string operations.  (3 days?)
+
+[] Cache compatible compiled versions of procedures in loops, and
+invoke them cheaply, using a computed jump.  (Use declarations, 1
+week.)
+
+[] Optimize I/O procedures (e.g. read, write) by supplying correct
+default port argument.  Perhaps call lower-level operation which does
+no defaulting.  (3 days)
+
+
+---- Class 3 (less desirable but cheap) ----
+
+[] Better linearization in loops.  (3-4 days)
+
+[OBSOLETE] Make top level (constant) definitions be handled by the
+linker to eliminate code space.  (2 weeks?)  Compilation by procedures
+obsoletes this.  The top level code, which performs the definitions,
+is not purified, so it is GCd.
+
+[] Assignments should do better.  No need to cellify if the variable
+is never closed over either by a procedure or by a continuation.
+Currently we can easily tell the procedure story, but can't tell the
+continuation story since the closing analysis is asymmetric.  Maybe do
+the analysis on continuations as well only for this job.  Lvalues
+already have a field to determine whether they are "closed over".  We
+may need a notion of a continuation "ultimately exported".  (10 days)
+
+[] Add an fg optimization phase that reduces the strength of the
+continuation types: After simapp and outer, previously unknown-type
+continuations may have known types and some of the work can be avoided
+if the type is effect or predicate.  (2 weeks)
+
+[] Better code generation for many cases of computed jump: Many of
+them turn into
+       <test>
+       pea     entry
+       move.b  &tc_entry,(sp)
+       bra     merge
+       ...  merge
+       clr.b   (sp)
+       rts
+
+which can obviously be improved to
+       
+       <test>
+       bra     entry
+
+and merge may not be necessary at all.  (1 week)
+
+[DONE] Teach the UUO linker about primitives.  In this way, users who
+don't know about declarations may get a little better performance when
+their code references CAR (etc) freely.  This requires making "unsafe"
+primitives back out correctly when invoked from compiled code.  1 week.
+
+
+---- Class 4 (expensive or long term) ----
+
+[] Register variables in tight loops.  (summer)
+
+[] Loop unrolling in tight loops.  (2 weeks)
+
+[] Remove type codes from continuation via microcode stack parser.  A
+different possibility is to have a hybrid high/low tag approach where
+fixnums and compiled entries differ only in the low tags.  Through
+alignment constraints the code could always be tagged automatically.
+(summer)
+
+[] Redo the variable cache stuff to avoid or simplify trap tests.
+Right now the main obstacle to this is "unassigned".  Assignments
+could become expensive (because unassigning a variable would be
+expensive), or we could make assignment never be able to unassign a
+variable.  (3 weeks)
+
+[DONE] Improve optional arguments by bypassing apply in some cases where
+the frame needs to be reformatted by inserting optionals.  (3 days)
+
+[DONE] Rewrite the back end.  Currently it behaves quadratically on the
+length of the input.  It should be linear!  (4 weeks)
+It was a bug in the symbol table stuff by which many labels hashed to
+the same bucket and searches became linear rather than constant time.
+
+[DONE] Multi closing: Divide closures into non-overlapping sets which can
+share the structure of the closure.  In many cases, procedures are
+closed by contagion, and their free variables could be added to the
+closure that caused them to be closed.  Many of these don't even need
+code pointers.  (6 weeks)
+
+[] Write a recognizer for downward funargs and for cps-like code to
+avoid closing non passed-out procedures passed as arguments.  This
+would make cps-style multiple values generate pretty good code.  (6
+weeks)
+
+[] Improve the closure analyzer to close less often for compatibility:
+if all the possible operators have their closing limits in a linear
+chain, we can always leave all the stuff around that the innermost
+possible operator needs, and dynamically pop the appropriate amount of
+stuff if the operator is not the innermost one.  (4 weeks)
+
+
+---- Class 5 (very long term) ----
+
+[DONE] Side effect analysis.  Remove extraneous calls.
+
+[DONE] Value analysis.  Remove busy noops (cdr-loops, etc).
+
+[] Make a triviality analyzer that tells the code generator to inline
+code simple procedures even if used in more than one place.  As a
+special case of this, add a piece of code that decides when eta
+conversion is beneficial and propagates the results through the value
+graph.  This is important for some versions of the Y operator.  { A
+very simple version of this already done in the value analysis. }
+
+[] Reverse the order of arguments on the stack.  In this way
+listifying and defaulting optionals becomes much simpler since there
+is never a need to open a gap.
+
+[OBSOLETE] Write a static linker which when given multiple code
+objects to be loaded in the same environment, produces a new code
+object to be loaded in that environment but in which all
+cross-references have been linked.  This needs definitions to be
+written in a "parseable" format.  If an option that would produce code
+for creation of the environment and initialization of the program was
+provided, and the runtime system was restructured into a library from
+which the linker could selectively link, the compiler would become
+stand-alone.  (summer) 
+       There is a better way to get a stand-alone compiler, with no
+changes to the compiler!  A modified fasload can be written that
+`nulls' the environment object from compiled code blocks and the cache
+lists kept around for incremental definition.  The dumped code will
+only have those procedures needed (or modules needed if not compiled
+by procedures), and will not share environment structure with the
+load-time environment.  The primitives referenced by the code will be
+the only ones needed for the stand-alone version.
+
+
+---- Class 6 (idle thoughts) ----
+
+[] When handling disjunctions in the front end, if the predicate
+expression is something known to return a boolean value (such as
+`eq?'), then there is no need to generate a variable to hold the
+value.
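+
+As an illustration (a sketch of the idea, not the front end's actual
+output; the names are made up), the disjunction (or (eq? a b) c) in
+general needs a temporary, but not when the predicate is known to
+return a boolean:
+
+    ;; General expansion: hold the predicate's value in a variable.
+    (define (disjunction/general a b c)
+      (let ((temp (eq? a b)))
+        (if temp temp c)))
+
+    ;; `eq?' returns a boolean, so the true branch is a constant.
+    (define (disjunction/boolean a b c)
+      (if (eq? a b) #t c))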
diff --git a/v7/src/compiler/documentation/facts.txt b/v7/src/compiler/documentation/facts.txt

new file mode 100644

index 0000000..1747e47
--- /dev/null
+++ b/v7/src/compiler/documentation/facts.txt
@@ -0,0 +1,28 @@
+Some useful facts:
+
+* A canonical subproblem has a `continuation' component that satisfies
+the predicate `continuation?'.  The `prefix' component is never null,
+and always terminates in a return whose operator is `continuation', or
+a combination with that continuation.  The `rvalue' component is
+always a reference to the parameter of `continuation'.
+
+* A non-canonical subproblem has a `continuation' component that is a
+virtual continuation; this continuation is never reified.  The
+`rvalue' component is arbitrary.  The `prefix' component may or may
+not be null.
+
+* Every non-canonical subproblem is eventually translated into a
+virtual return node.  The exception to this rule is that subproblems
+whose values are unused or known are usually translated into null
+sequences.  In either case the prefix is output.
+
+* A continuation which is the operator of the
+`application-continuation-push' of some application satisfies
+`continuation/always-known-operator?'.  Furthermore, it has precisely
+one application: the one of which it is the associated
+continuation-push.
+
+* A return node can only have a `continuation-push' if it was created
+by `combination/constant!' (i.e. was the result of constant-folding).
+In this case the `continuation-push' is guaranteed to be of type
+`effect', so that the continuation is not pushed at all.
diff --git a/v7/src/compiler/documentation/files.txt b/v7/src/compiler/documentation/files.txt

new file mode 100644

index 0000000..1f0a75a
--- /dev/null
+++ b/v7/src/compiler/documentation/files.txt
@@ -0,0 +1,197 @@
+================================================================
+       compiler/back:
+================================================================
+This directory contains the machine-independent portion of the back
+end.  It contains bit-string utilities, symbol table utilities, label
+management procedures, the hardware register allocator, and the
+top-level assembler calls.
+
+* asmmac.scm
+;;;; Assembler Syntax Macros
+
+* asutl.scm
+;;;; Assembler Utilities
+;;; package: (compiler)
+
+* bittop.scm
+;;;; Assembler Top Level
+;;; package: (compiler assembler)
+
+* bitutl.scm
+;;;; Assembler utilities
+;;; package: (compiler assembler)
+
+* insseq.scm
+;;;; Lap instruction sequences
+
+* lapgn1.scm
+;;;; LAP Generator: top level
+;;; package: (compiler lap-syntaxer)
+
+* lapgn2.scm
+;;;; LAP Generator: High-Level Register Assignment
+
+* lapgn3.scm
+;;;; LAP Generator
+;;; package: (compiler lap-syntaxer)
+
+* linear.scm
+;;;; LAP linearizer
+;;; package: (compiler lap-syntaxer linearizer)
+
+* mermap.scm
+;;;; LAP Generator: Merge Register Maps
+
+* regmap.scm
+;;;; Register Allocator
+;;; package: (compiler lap-syntaxer)
+
+* syerly.scm
+;;;; Syntax time instruction expansion
+
+* symtab.scm
+;;;; Symbol Tables
+;;; package: (compiler assembler)
+
+* syntax.scm
+;;;; LAP Syntaxer
+\f
+================================================================
+       compiler/rtlbase:
+================================================================
+       
+This directory contains utilities used by the RTL generator and
+optimizer.
+
+* regset.scm
+;;;; RTL Register Sets
+
+* rgraph.scm
+;;;; Program Graph Abstraction
+
+* rtlcfg.scm
+;;;; RTL CFG Nodes
+
+* rtlcon.scm
+;;;; Register Transfer Language: Complex Constructors
+;;; package: (compiler)
+
+* rtlexp.scm
+;;;; Register Transfer Language: Expression Operations
+;;; package: (compiler)
+
+* rtline.scm
+;;;; RTL linearizer
+
+* rtlobj.scm
+;;;; Register Transfer Language: Object Datatypes
+
+* rtlreg.scm
+;;;; RTL Registers
+
+* rtlty1.scm
+* rtlty2.scm
+;;;; Register Transfer Language Type Definitions
+;;; package: (compiler)
+
+* valclass.scm
+;;;; RTL Value Classes (? a hierarchy, right?)
+\f
+================================================================
+       compiler/rtlgen:
+================================================================
+
+This directory contains the code that translates the flow-graph into
+register transfer language (RTL).
+
+* fndblk.scm
+;;;; RTL Generation: Environment Locatives
+;;; package: (compiler rtl-generator find-block)
+
+* fndvar.scm
+;;;; RTL Generation: Variable Locatives
+;;; package: (compiler rtl-generator)
+
+* opncod.scm
+;;;; RTL Generation: Inline Combinations
+;;; package: (compiler rtl-generator combination/inline)
+
+* rgcomb.scm
+;;;; RTL Generation: Combinations
+;;; package: (compiler rtl-generator generate/combination)
+
+* rgproc.scm
+;;;; RTL Generation: Procedure Headers
+;;; package: (compiler rtl-generator generate/procedure-header)
+
+* rgretn.scm
+;;;; RTL Generation: Return Statements
+
+* rgrval.scm
+;;;; RTL Generation: RValues
+;;; package: (compiler rtl-generator generate/rvalue)
+
+* rgstmt.scm
+;;;; RTL Generation: Statements
+;;; package: (compiler rtl-generator)
+
+* rtlgen.scm
+;;;; RTL Generation
+;;; package: (compiler rtl-generator)
+\f
+================================================================
+       compiler/rtlopt:
+================================================================
+
+This directory contains the RTL-level optimizer.  It contains code to
+perform lifetime analysis, redundant subexpression elimination,
+elimination of dead code, etc.
+
+* ralloc.scm
+;;;; Register Allocation
+
+* rcompr.scm
+;;;; RTL Compression
+
+* rcse1.scm
+;;;; RTL Common Subexpression Elimination: Codewalker
+;;; package: (compiler rtl-cse)
+
+* rcse2.scm
+;;;; RTL Common Subexpression Elimination
+
+* rcseep.scm
+;;;; RTL Common Subexpression Elimination: Expression Predicates
+
+* rcseht.scm
+;;;; RTL Common Subexpression Elimination: Hash Table Abstraction
+;;; package: (compiler rtl-cse)
+
+* rcserq.scm
+;;;; RTL Common Subexpression Elimination: Register/Quantity Abstractions
+
+* rcsesr.scm
+;;;; RTL Common Subexpression Elimination: Stack References
+
+* rdebug.scm
+;;;; RTL Optimizer Debugging Output
+
+* rdflow.scm
+;;;; RTL Dataflow Analysis
+;;; package: (compiler rtl-optimizer rtl-dataflow-analysis)
+
+* rerite.scm
+;;;; RTL Rewriting
+;;; package: (compiler rtl-optimizer rtl-rewriting)
+
+* rinvex.scm
+;;;; RTL Invertible Expression Elimination
+;;; package: (compiler rtl-optimizer invertible-expression-elimination)
+
+* rlife.scm
+;;;; RTL Register Lifetime Analysis
+;;;  Based on the GNU C Compiler
+
+* rtlcsm.scm
+;;;; RTL Common Suffix Merging
+\ No newline at end of file
diff --git a/v7/src/compiler/documentation/notes.txt b/v7/src/compiler/documentation/notes.txt
new file mode 100644
index 0000000..9087d33
--- /dev/null
+++ b/v7/src/compiler/documentation/notes.txt
@@ -0,0 +1,147 @@
+
+
+              Notes on potential compiler improvements
+
+
+* The analysis which generates `block-stack-link' could be improved.
+Currently, it fails in the case where the procedure is always invoked
+as a subproblem, but there are multiple possible continuations.  The
+reason is subtle: we need to know that the `continuation/offset' of
+all of the continuations is the same (as well as the frame-size and
+the closing-block).  Unfortunately, the computation of the offset
+depends on the subproblem ordering, which depends on the stack-link
+(to decide whether or not to use static links).  Catch-22.  Probably
+this can be solved.
+
+* Pathological case of "takr.scm" can perhaps be solved by integrating
+simapp and outer into one pass.  By handling "passed-in" nodes before
+other nodes, and not making links to such nodes, the explosion of
+useless dataflow information would be avoided.  However, this affects
+the static-link analysis, which looks at BOTH the "passed-in" bit as
+well as the set of values.  Think of some way to make this degrade
+properly.
+
+* Make the static-link analysis more sophisticated so that it uses
+static links whenever the current strategy would require going through
+at least two links.  This sometimes happens when the parent must be
+located through the closing block of the continuation.  In this case
+it is probably better to add a redundant static link for speed in
+lookup.
+
+* When tail-recursing into an internal procedure, if the procedure has
+no free variables, we can erase the calling frame.  In the simplest
+case, this means that such a procedure is actually an external
+procedure.  However, we could get more sophisticated and notice that
+it was OK to delete some of the ancestor stack frames but not others.
+
+* The code generated by the rewrite rule for disjunctions demonstrates
+that the decision about whether or not to use registers for LET
+parameters does not depend on the entire body of the LET.  In this
+case, the predicate parameter can ALWAYS be register allocated,
+independent of the complexity of the alternative, because it is unused
+once the decision has been made in favor of the alternative.  This can
+be generalized to handle more complex cases.
+
+* Change CFG implementation so that `hook' objects are just partially
+connected edges.  I think that noop nodes can then be eliminated in
+favor of disconnected edges.  This will also solve a potential problem
+where deletion of the noop nodes at the entry points of continuations
+leaves random hooks scattered around in various subproblems.
+
+* Many closures are never invoked through their external entry points.
+For such closures, the external entry point and associated code need
+never be generated.  Also, the closure object need not contain a code
+pointer.  This is one step closer to just using the closure frame
+pointer in place of the closure.
+
+* Perform dead-code-elimination at the same time as constant folding.
+Be thorough, deleting all nodes associated with all code that is
+eliminated.  This is tricky but pays off handsomely later on.  Also,
+doing it after the dataflow but before the rest of the analysis
+greatly reduces the amount of details that have to be kept track of
+during deletion.
+
+ALSO: note that removal of code to hack known predicates in "rgretn"
+may make something like this necessary for simple cases.
+
+Subsequent note: performing dead code elimination prior to subproblem
+ordering has a problem in that there are cfg fragments in the
+subproblems with invisible pointers into the node structure.  We can't
+delete nodes unless we know about these pointers, so we must do dead
+code elimination after subproblem ordering.
+
+* Now that RTL generator does not generate temporaries for quantities
+that are immediately pushed, tested, etc., we will need to modify the
+CSE to generate temporaries for the cases where those quantities are
+found in multiple places.  Hopefully this won't break the world.
+
+* The interning of SCode variable objects (for explicit lookup) is
+done on a per-block basis.  It should be changed so that stack blocks
+are skipped and the interning is done on the nearest IC block.
+
+* Fixnum operations
+
+** Is signed bit-field extraction faster than current strategy if the
+operand is in memory?
+
+** In the case of addition of two boxed fixnums to a boxed result,  no
+unboxing is needed on the operands provided the result is boxed in the
+usual way.
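
The boxed-addition observation above can be sketched in C.  This is
only an illustration under assumed parameters: a 32-bit object with a
6-bit type code in the most significant bits and a 26-bit datum, and a
made-up TC_FIXNUM value.  None of the names come from the compiler
sources.  Overflow detection is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (not taken from the source): 32-bit objects with a
   6-bit type code in the most significant bits and a 26-bit datum. */
#define DATUM_BITS 26
#define DATUM_MASK ((UINT32_C(1) << DATUM_BITS) - 1)
#define TC_FIXNUM  UINT32_C(0x1A)   /* hypothetical type code */

/* Box a datum "in the usual way": keep the low 26 bits, set the tag. */
uint32_t box_fixnum(uint32_t datum)
{
    return (TC_FIXNUM << DATUM_BITS) | (datum & DATUM_MASK);
}

/* Sign-extend the 26-bit datum of a boxed fixnum. */
int32_t unbox_fixnum(uint32_t object)
{
    uint32_t datum = object & DATUM_MASK;
    if (datum & (UINT32_C(1) << (DATUM_BITS - 1)))
        return (int32_t)datum - (int32_t)(UINT32_C(1) << DATUM_BITS);
    return (int32_t)datum;
}

/* Add the tagged words directly.  The two tags add up to garbage in
   the top 6 bits, but boxing overwrites those bits, and the low 26
   bits of the sum do not depend on them. */
uint32_t add_boxed_fixnums(uint32_t a, uint32_t b)
{
    return box_fixnum(a + b);
}
```

The point of the sketch is that add_boxed_fixnums never masks or
shifts its operands; the only boxing work is the one that would be
done on the result anyway.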
+\f
+
+                   Items that have been processed
+
+
+* Introduction of inline-coded continuations (i.e. continuations of
+type UNIQUE or SIMPLE) has invalidated the current method of
+maintaining the frame pointer offset.  The reason is that the body of
+such a continuation thinks that the frame pointer knows where its
+frame is, while the offset in fact refers to some ancestor of that
+frame.  I think that ignoring the frame of such a continuation in
+`find-block' will produce the desired effect.
+
+* JOIN type blocks aren't needed for offset, but they ARE needed to
+prevent continuations from being classified as UNIFORM when they
+aren't.
+
+* To do `block-parent' operation on a "real" block, must skip any
+intervening JOIN blocks to find the next "real" block.
+
+* `generator/subproblem' has code to mark frame-size of a join block
+if the continuation is closed in one.  That needs to be moved
+elsewhere?
+
+* Theory: JOIN blocks are always invisible _except_ when needed to
+compute a frame pointer offset.  This means:
+
+** `find-block' and friends in "emodel" need to know about them.  Also
+the associated `stack-block-parent-locative' and similar
+constructions.
+
+** `procedure-closure-block' now refers to the previous
+`block-parent'.  The closing code must refer to `block-%parent' to get
+the lower-level closing block.
+
+** `block->join/distance' in "rgretn" needs to learn about them.
+
+* (implemented 8/88 -- cph) The code in "rgretn" should be modified as
+follows.  At a return point, if the continuation is known, then we can
+just jump to the continuation, as long as we set things up correctly
+based on the operator class of the continuation.  This might mean, for
+example, that we throw away the return address on the stack because we
+know that it has a certain value.  In practice, this can occur when we
+supply a continuation to a combination that goes to one of two
+procedures.  The procedure in which the return appears is ONLY invoked
+with this continuation, while the other procedure is sometimes invoked
+with another continuation.  Thus we must push the return address,
+because we don't know which procedure we're invoking, but at return
+time it isn't needed.
+
+* Some procedures that are being considered closures can easily be
+open external.  Each of the free variables must satisfy one of the
+following criteria: (1) it has a known value, or (2) it is bound in
+the IC block being used for cached references.  This optimization will
+make an enormous performance improvement on programs that consist of
+many procedures closed in a compiled block, with a few external
+closure entry points, because it will allow most of the internal
+procedures to be open.  Currently they will all become closures.
diff --git a/v7/src/compiler/documentation/todo.txt b/v7/src/compiler/documentation/todo.txt
new file mode 100644
index 0000000..37850cc
--- /dev/null
+++ b/v7/src/compiler/documentation/todo.txt
@@ -0,0 +1,158 @@
+
+
+          Things left to do before releasing the compiler
+
+* Type checking for inline coded primitives.
+
+* Implement debugging features.  Solve absolute pathname problems.
+
+\f
+
+                   Items that have been processed
+
+
+* Bug in definitions in IC code: if there is a GC during the
+definition, the ENV register in the compiler's register block is not
+copied, so the next time it is referenced it is in oldspace.  One
+possible fix is to rewrite definitions as calls to the primitive.  The
+other non-caching environment operations may have the same problem;
+this should be looked into.
+
+Ordinary primitives should NOT have this problem since they supposedly
+push the environment into the continuation and then restore it on
+invocation of the continuation.
+
+FIXED: The compiler no longer uses the "short" definition hook by
+default.  It actually calls the primitives.  -- Jinx
+
+
+* Should notify users that the framing of IC blocks will be changed by
+the rewrite rules for such things as disjunctions and internal
+definition for value.
+
+FIXED: I don't think this is true by default any more.  The rewriting
+of first class environment code in the front end preserves framing
+while allowing better code generation. -- Jinx
+
+
+* Update "rtlgen/rgproc".
+
+* Write method for `unassigned-test' in "rtlgen/rgrval".
+
+* Write `make-rtl-continuation', `make-rtl-expr', and
+`make-rtl-procedure'.
+
+* `Temporary' objects are used in rgcomb, rgrval, and rgstmt.
+Change this code to use pseudo-registers.
+
+* "rgretn" refers to `pop-all-stack-frames', which is not
+written.
+
+* "rgraph" collects continuations in the current rgraph object.  Is
+this still what we want to do?  If so, somebody must do the
+accumulation.
+
+* Subproblem redesign: Attempt to change fggen so that there
+are two kinds of subproblems -- those that explicitly invoke
+continuations, and those that do not.  These correspond to "canonical"
+and "rvalue" subproblems, respectively.  Delay the consing of the
+continuation objects associated with the subproblem until it is known
+that it must be canonical.  This introduces a problem: the
+"subproblem-register" and "subproblem-type" will now be undefined on
+"rvalue" subproblems, and these concepts must be generalized to work
+in this new context.  Also, the operation "set-continuation/rtl!" is
+used on subproblems, and must be changed in this case.  All of these
+problems have to do solely with the RTL generator.
+
+* Separate applications from their subproblems.  Create a new
+node type "parallel" which contains the subproblems.  Doubly link the
+parallel node to the application node so we get the same relationship
+as at present.  Then, during subproblem ordering, edit the CFG to
+place the application node in the correct place, which normally will
+be in one of the continuations of one of the subproblems.
+
+Note that this implies a somewhat complicated CFG edit.
+
+* Note that after a continuation's CFG has been edited (e.g.
+using continuation/next-hooks), the value of continuation/scfg is no
+longer correct.  This is because it is not updated.  It's not obvious
+what should be done here.
+
+There is no good reason to keep the scfg of a continuation around.  A
+properly formed continuation (or procedure, either) has no
+"next-hooks" in its body since all of the exit points are
+applications.  Also, the only kinds of continuations that we want to
+glue anything to are those whose bodies are fg-noop nodes whose "next"
+is not yet connected.  If we try to glue to anything else, it is an
+error.
+
+* Rewrite rule for LAMBDA body screws up mutual recursion if
+the body contains any non-constant-valued definitions.  The LET which
+is created should be rewritten so that it goes around the LETREC
+bindings rather than inside them.
+
+* Change RTL generator to pass "offset" value around explicitly.
+
+* Flush JOIN blocks as these will no longer be used.
+
+* Be more careful about the code generation for applications whose
+operators are "simple".  If a program is known to be a loop, then both
+the call and return for that loop will generate links in the RTL
+graph, causing a real loop to appear in the graph.  Later passes of
+the compiler are assuming that there are no loops!
+
+Right now only "simple" return statements are turned into links, but
+it is desirable to convert "simple" call statements as well, provided
+that they aren't loops.  A simple heuristic that wins is to only
+convert calls that are "simple" and whose operator is not called
+from elsewhere.  This will optimize LET, the most important case,
+without introducing loops.
+
+Unfortunately this is not easy to do in RTL because of the invocation
+prefixes: prefixes other than NULL require some extra work at the call
+point.  Unfortunately the prefixes are needed to make particular
+invocations work right, e.g. `rtl:make-invocation:lookup'.  Probably
+should eliminate the "prefix" concept for all except those invocations
+that need it, replacing prefixes by some explicit code to perform the
+action required.
+
+For now: implement fall-through in the LAP generator, by noticing it
+at linearization time.
+
+* Try to rewrite `invocation-prefix/erase-to' in "rtlgen/rgcomb" to
+use the `block-stack-link'.
+
+* I'm not convinced that the operator class analysis is useful any
+more.  This should be checked out and flushed if desirable.
+
+* Update the references to `make-vector-tag' in $zfront/rcseht and
+$zfront/rcserq to have the extra argument.
+
+* Write `combination/inline?' and the primitive inlining code.
+
+* The environment register is not being saved in the continuation of
+a subproblem call from an IC procedure.
+
+* Some memoization is desirable for the entry nodes on SIMPLE
+continuations, because they are being generated explicitly.
+
+* Probably the computations involving `lvalue/source-set' want to be
+made more efficient.  It's also possible that they will be computed
+more than once.
+
+* CSE will have to be changed back to do modelling of the stack again.
+
+* Change handling of dynamic links so that the link register is saved
+when calling an unknown place, and is assumed to contain nothing at
+external entries.  The simplest implementation of this is to assume
+that nothing is in the link register at external entries, and to save
+it on calls to external procedures.  Later we can optimize it better.
+This strategy allows us to coexist with the current compiled code.
+
+* Implement the data-structure discarding code.
+
+* The call and return code is not taking into account the cases where
+continuations are closed in IC blocks.  This may complicate things
+somewhat.  I'd prefer to leave this until I can see some output.
+
+* Implement open-coding for `vector'.
diff --git a/v7/src/compiler/machines/alpha/TODO b/v7/src/compiler/machines/alpha/TODO
new file mode 100644
index 0000000..fadf1c4
--- /dev/null
+++ b/v7/src/compiler/machines/alpha/TODO
@@ -0,0 +1,10 @@
+- Debug disassembler.
+
+- Update disassembler to match structure of others (not so many
+  assignments in dassm2.).
+
+- Eliminate warning from lapgen about 64-bit constants.
+
+- Teach lapgen how to generate #x1A00000000010000 and similar things.
+
+- Add stack check option.
diff --git a/v7/src/compiler/machines/spectrum/TODO b/v7/src/compiler/machines/spectrum/TODO
new file mode 100644
index 0000000..27bfad3
--- /dev/null
+++ b/v7/src/compiler/machines/spectrum/TODO
@@ -0,0 +1,105 @@
+Optimizations:
+
+A:     - Done
+
+5510       (ldi () #x32 8)
+5514       (ldi () 0 9)
+5518       (dep () 8 5 6 9)
+
+Could be done with a single ldil instruction.
+
+It comes from the sequence
+
+(assign (register #x30) (machine-constant #x32))
+(assign (register #x31) (machine-constant 0))
+(assign (register #x32) (cons-pointer (register #x30) (register #x31)))
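
The arithmetic behind item A can be checked with a small C sketch.
It assumes a 32-bit word with a 6-bit type code in the most
significant bits (which is what the "dep ... 5 6" field suggests) and
relies on ldil loading a 21-bit immediate into the upper 21 bits while
clearing the low 11.  The function names are made up for the
illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TYPE_CODE_BITS 6    /* assumed: 6-bit type code, top of a 32-bit word */
#define LDIL_LOW_BITS  11   /* ldil sets the upper 21 bits and clears the low 11 */

/* What the ldi/ldi/dep sequence computes: the type code deposited
   into the most significant bits of the datum word. */
uint32_t cons_pointer(uint32_t type_code, uint32_t datum)
{
    uint32_t datum_mask = UINT32_MAX >> TYPE_CODE_BITS;
    return (type_code << (32 - TYPE_CODE_BITS)) | (datum & datum_mask);
}

/* A constant can be loaded by ldil alone when its low 11 bits are zero. */
int fits_single_ldil(uint32_t word)
{
    return (word & ((UINT32_C(1) << LDIL_LOW_BITS) - 1)) == 0;
}

/* The 21-bit immediate field for that ldil. */
uint32_t ldil_immediate(uint32_t word)
{
    assert(fits_single_ldil(word));
    return word >> LDIL_LOW_BITS;
}
```

Under these assumptions, type code #x32 with datum 0 gives the word
#xC8000000, whose low 11 bits are zero, so one ldil is enough.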
+
+B:     - Done
+
+       (ldi () #x28 7)
+       (bl () 1 (@pco 0))
+       (dep () 0 #x1F 2 1)
+       (ldo () (offset (- continuation-695 *pc*) 0 1) 6)
+       (dep () 7 5 6 6)
+
+No need for ldi/dep, can be done with depi.
+
+It comes from sequence
+
+(assign (register #x33) (machine-constant #x28))
+(assign (register #x34) (entry:continuation #[uninterned-symbol 482 continuation-695]))
+(assign (register #x35) (cons-pointer (register #x33) (register #x34)))
+
+C:
+
+       (bl () 1 (@pco 0))
+       (dep () 0 #x1F 2 1)
+       (ldo () (offset (- continuation-695 *pc*) 0 1) 6)
+
+can become
+
+       (bl () 1 (@pco 0))
+       (ldo () (offset (- (- continuation-695 (+ *pc* 4)) privilege-bits) 0 1) 6)
+
+assuming that privilege bits are constant.
+
+D:     - Done
+
+       (ldi () #x28 #xA)
+       (stw () #x1F (offset #xC8 0 4))
+       (ldil () #x68000 8)
+       (ldo () (offset #x18 0 8) 8)
+       (stwm () 8 (offset 4 0 #x15))
+       (ldil () #x2020 8)
+       (ldo () (offset 4 0 8) 8)
+       (stwm () 8 (offset 4 0 #x15))
+       (bl () 1 (@pco 0))
+       (dep () 0 #x1F 2 1)
+       (ldo () (offset (- lambda-1814 *pc*) 0 1) 1)
+       (ble () (offset #x64 4 3))
+
+[Closure consing code]
+
+This can be shortened, and the ldi/dep can become a depi.
+
+E:
+
+4DC4       (ldi () #x36 8)
+4DC8       (copy () #x15 9)
+4DCC       (dep () 8 5 6 9)
+
+No need for ldi/dep.  Can be done with depi. (as long as free).
+
+F:
+
+(flo:- 0.0 x)
+uses a scheme object for 0.0
+fpr0 (the status register) reads as 0.0 (except for stores), a rule
+should use this.
+
+G:     - Done
+
+Introduce new macro instructions
+COMIBTN
+COMIBFN
+COMBN
+
+which work like the versions without N except that they always nullify
+the following instruction.  The branch tensioner knows the sign of the
+displacement and can therefore insert the NOP when necessary.
+
+H:
+
+Hooks are invoked by the following sequence:
+
+       BLE n(4,scheme_to_interface_ble_reg)
+       NOP
+
+Why?  The NOP should go away, and the hooks should use -4(4,31)
+
+No.  The sequence must be uniform, and the NOP allows for further
+optimization.  If the sequence were BLE,n, there would be no way
+to improve it.
+
+Note that hooks that don't return (e.g. +) can use BE,n .
+
diff --git a/v7/src/edwin/README b/v7/src/edwin/README
new file mode 100644
index 0000000..348674b
--- /dev/null
+++ b/v7/src/edwin/README
@@ -0,0 +1,22 @@
+Notes on the Edwin sources:
+
+If you want to add a new file to the source tree, you need to 
+modify the following three files:
+
+       decls.scm
+       ed-ffi.scm
+       edwin.pkg
+
+If the file should be loaded into the default edwin band, you must also
+edit the file:
+
+       edwin.ldr
+
+If the file is to be autoloaded, you must edit the file:
+
+       loadef.scm
+
+So, in either case, you have to change four files in /scheme/src/edwin.
+Of course, you also have to put a copy of your file in
+/scheme/src/edwin, and you have to put links to new-file.scm in
+/scheme/300/edwin and /scheme/800/edwin.
+\ No newline at end of file
diff --git a/v7/src/microcode/TODO b/v7/src/microcode/TODO
new file mode 100644
index 0000000..034b527
--- /dev/null
+++ b/v7/src/microcode/TODO
@@ -0,0 +1,148 @@
+-*-Text-*-
+$Header: /Users/cph/tmp/foo/mit-scheme/mit-scheme/v7/src/microcode/TODO,v 1.1 1993/10/29 23:00:51 nick Exp $
+
+       Things to do to the C microcode:
+
+MINOR (although not necessarily painless):
+
+* Adopt a naming convention for primitives to mitigate the name conflict
+problem with user-definable names.
+
+* Make the microcode intern the empty string and the empty vector, so
+EQV? does not have to do extra work.  Note that fasdump/fasload have
+to be careful about this.  This may not be desirable for strings,
+given the existence of SET-STRING-LENGTH!
+
+* Check that the microcode backs out of assignment correctly when it
+needs to gc.
+
+* Implement multiple values.
+
+* Change fasdump in scheme and bchscheme to fix the heap image before
+writing the file.  In this way, if there is a trap during the writing
+operation, the system can recover because there will be no broken hearts
+in memory.
+
+* Make Microcode_Termination close all files (including the photo
+file) if there is no termination handler.
+
+* Clean up the OS dependent stuff.  Add a new error,
+ERR_UNIMPLEMENTED_PRIMITIVE, which is signalled by the missing
+procedures.  Divide the procedures into the ones which can signal such
+an error, and the ones that must return a fake value (System_Clock,
+for example).  NOTE: The error has been added, we must now examine all
+the primitives.
+
+* Improve vms.c.  Implement many of the missing procedures.
+
+* Make the communication between OS_file_open and the appropriate
+primitive better: if OS_file_open fails it can be because the file
+does not exist, or because of some OS limitation on the number of
+files open at any given time.
+
+* Add the GNU emacs directory stuff to unix.c, and maybe to vms.c.
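
For the OS_file_open item in the list above, one possible shape for a
richer result is sketched below.  It is a minimal illustration on a
POSIX system, not the microcode's interface: the enum, its member
names, and open_for_input are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical result codes; the names are not from the microcode. */
enum open_status {
    OPEN_OK,
    OPEN_NO_SUCH_FILE,      /* ENOENT */
    OPEN_TOO_MANY_FILES,    /* EMFILE or ENFILE */
    OPEN_OTHER_FAILURE
};

/* Open PATH read-only; store the descriptor in *FD on success and
   report why the open failed otherwise. */
enum open_status open_for_input(const char *path, int *fd)
{
    assert(path != NULL && fd != NULL);
    *fd = open(path, O_RDONLY);
    if (*fd >= 0)
        return OPEN_OK;
    switch (errno) {
    case ENOENT: return OPEN_NO_SUCH_FILE;
    case EMFILE:
    case ENFILE: return OPEN_TOO_MANY_FILES;
    default:     return OPEN_OTHER_FAILURE;
    }
}
```

A primitive built on something like this could signal a different
error for a missing file than for descriptor exhaustion.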
+\f
+MAJOR (or very painful):
+
+* Look at all instances of Microcode_Termination to determine whether
+they are synchronous or not.  If not synchronous, the stack may be in
+a bad state and the termination handler may be screwed up.  In
+particular, we may be in the middle of compiled code.  This might want
+to be merged with the trap mechanism in unix.c.
+
+* Redesign fasdump so pure load can be implemented.
+
+* Fix purify so it does an indirect transport of the procedure stored
+in an environment.  Thus purifying an environment would have the
+effect of purifying the code that created it.  Maybe the values should
+be purified too.
+
+* Write complete garbage collect.  This should be easy if the
+mechanism used in bchscheme is used.
+
+* Fix purify in bchscheme.  Currently, if the object is too large, it
+crashes.
+
+* Rewrite purify to avoid the double garbage collection.  It can use
+the same hack that fasdump uses, namely build a table of fixups in the
+spare heap while it conses in constant space.  If it reaches the end
+of constant space, it backs out by using the table of fixups.
+
+* Fix the way weak pairs are treated by fasdump, and by fasdump and
+purify in bchscheme.  They should not be treated like normal pairs.
+
+* Design and implement a better microcode error facility.  Adding new
+errors and parsers is a pain in the neck, and often the only
+interesting thing is the message the microcode wants to provide.
+
+* Eliminate all fprintf(stderr, <mumble>).  This can be achieved by
+having a message facility available for the microcode.
+
+* Split fixnum types: +fixnum is 0, -fixnum is -1, null is ?.
+Check for implicit 0 type.  Make manifest header be +fixnum.
+
+* Change the representation of compiled procedures for faster external
+calls.
+
+* Hack GC and related programs so that variable caches and uuo links
+need no type code for faster free variable reference and calls.
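
The "table of fixups" idea in the purify item above amounts to an
undo log.  The following C sketch shows the general technique only;
the struct layout and function names are invented, and the real table
would live in the spare heap rather than in a caller-supplied array.

```c
#include <assert.h>
#include <stddef.h>

/* One fixup: where a word was overwritten and what it held before. */
struct fixup {
    long *address;
    long  old_value;
};

/* Hypothetical fixed-size table of fixups. */
struct fixup_table {
    struct fixup *entries;
    size_t        count;
    size_t        capacity;
};

/* Overwrite *ADDRESS, remembering the old contents.  Returns 0 if
   the table is full, in which case nothing is changed. */
int write_with_fixup(struct fixup_table *table, long *address, long new_value)
{
    assert(table != NULL && address != NULL);
    if (table->count == table->capacity)
        return 0;
    table->entries[table->count].address = address;
    table->entries[table->count].old_value = *address;
    table->count += 1;
    *address = new_value;
    return 1;
}

/* Back out: undo every recorded write, newest first. */
void back_out(struct fixup_table *table)
{
    assert(table != NULL);
    while (table->count > 0) {
        table->count -= 1;
        *table->entries[table->count].address =
            table->entries[table->count].old_value;
    }
}
```

Undoing newest-first matters when the same word is overwritten more
than once; the oldest recorded value is the one that ends up restored.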
+\f
+       Things done to the C microcode:
+
+* Clean up variable reference code.  Many changes here:
+- Single trap mechanism so the microcode does not have to check more
+than one thing.  Implement unbound, unassigned, and dangerous in terms of this.
+- Clean up aux variable compilation: variables should not go into pure
+space, and then all the kludges about compilation can go away.
+- Eliminate the procedure name slot from the variable reference code.
+It should still be there for debugging, but not visible.  This also
+removes the extra test for assignment.
+       Jinx 4/2/87
+Variables can always go into pure space now because of the way aux
+compilation is done (depth, offset).
+
+* Fix `access' code so that it continues correctly when the variable
+is unbound or unassigned.  This is because the value of the access'
+environment field is not being pushed on the stack at the time of the
+error, so there is no way to continue.  There are probably some other
+similar bugs -- this one is likely to be caused by the fact that it
+requires a non-standard stack frame, making it slightly painful to
+implement.
+       Jinx 4/2/87
+
+* Set up the cached variable stuff so that assignments and references
+can be separated. 
+       Jinx 10/5/87
+
+* Remove danger bit. 
+       Jinx 10/9/87
+
+* Change various places that are signalling interrupts to use the
+macro `Request_Interrupt'.
+* Examine usage of `New_Compiler_Memtop' to determine if it is being
+used similarly.
+Eliminated.  There are new macros in interrupt.h .
+       Jinx 11/17/87
+
+* Make fasdump dump only those primitives referenced in the file.
+Maybe dump some arity information?  Once this is done, both kinds of
+primitives can be merged.
+       Jinx 11/17/87
+
+* Change primitives to use uniform mechanism like external primitive
+mechanism.
+       Jinx 11/17/87
+
+* Change the internal representation of primitives.  Instead of being
+just the primitive number, the high 12 bits of the datum can be the
+primitive number and the low 12 bits can be the primitive number if
+implemented, MAX_PRIMITIVE + 1 otherwise.  Then the primitive
+procedure table can be grown by 1 (with an error procedure) so that
+when invoking primitives the masking will automatically cause an error
+procedure to be invoked if the primitive is not implemented, without
+comparing against MAX_PRIMITIVE.
+       Jinx 12/4/87
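
The primitive representation described in the item above can be
sketched in C as follows.  This is a toy model, not the microcode:
MAX_PRIMITIVE is set to 2 for the example, the primitives take a
single int, and the error procedure simply returns a marker value
instead of signalling.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PRIMITIVE 2u             /* toy value: primitives 0..2 are implemented */
#define INDEX_BITS    12
#define INDEX_MASK    ((UINT32_C(1) << INDEX_BITS) - 1)

typedef int (*primitive_procedure)(int argument);

int prim_identity(int x) { return x; }
int prim_increment(int x) { return x + 1; }
int prim_negate(int x) { return -x; }
/* Stands in for the error procedure; a real one would signal. */
int prim_unimplemented(int x) { (void)x; return -9999; }

/* Table grown by one: slot MAX_PRIMITIVE + 1 holds the error procedure. */
primitive_procedure primitive_table[MAX_PRIMITIVE + 2] = {
    prim_identity, prim_increment, prim_negate, prim_unimplemented
};

/* High 12 bits: the primitive's number.  Low 12 bits: the table index,
   which is the same number if implemented, MAX_PRIMITIVE + 1 otherwise. */
uint32_t make_primitive_datum(uint32_t number)
{
    uint32_t index = (number <= MAX_PRIMITIVE) ? number : MAX_PRIMITIVE + 1;
    assert(number <= INDEX_MASK);
    return (number << INDEX_BITS) | index;
}

/* Invocation only masks out the index; there is no comparison
   against MAX_PRIMITIVE on this path. */
int invoke_primitive(uint32_t datum, int argument)
{
    return primitive_table[datum & INDEX_MASK](argument);
}

/* The original number stays recoverable from the high bits. */
uint32_t primitive_number(uint32_t datum)
{
    return datum >> INDEX_BITS;
}
```

The range check moves to the point where the datum is built, so the
common invocation path pays only for a mask and an indexed call.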
+
+* Improve compiled code interface to primitives.  Make them be
+expensive on backout, not on normal call.
+       Jinx 12/4/87
+\ No newline at end of file
diff --git a/v7/src/microcode/getpgsz.h b/v7/src/microcode/getpgsz.h
new file mode 100644
index 0000000..32adae6
--- /dev/null
+++ b/v7/src/microcode/getpgsz.h
@@ -0,0 +1,25 @@
+#ifdef BSD
+#ifndef BSD4_1
+#define HAVE_GETPAGESIZE
+#endif
+#endif
+
+#ifndef HAVE_GETPAGESIZE
+
+#include <sys/param.h>
+
+#ifdef EXEC_PAGESIZE
+#define getpagesize() EXEC_PAGESIZE
+#else
+#ifdef NBPG
+#define getpagesize() (NBPG * CLSIZE)
+#ifndef CLSIZE
+#define CLSIZE 1
+#endif /* no CLSIZE */
+#else /* no NBPG */
+#define getpagesize() NBPC
+#endif /* no NBPG */
+#endif /* no EXEC_PAGESIZE */
+
+#endif /* not HAVE_GETPAGESIZE */
+
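
An expression-valued macro such as the NBPG * CLSIZE fallback is
sensitive to operator precedence at the point of use.  The sketch
below shows the effect with stand-in constants; DEMO_NBPG,
DEMO_CLSIZE, and both rounding functions are invented for the
illustration and are not part of the header.

```c
#include <assert.h>

/* Illustrative values only; real ones come from <sys/param.h>. */
#define DEMO_NBPG   512
#define DEMO_CLSIZE 2

#define PAGESIZE_BARE()   DEMO_NBPG * DEMO_CLSIZE
#define PAGESIZE_PAREN()  (DEMO_NBPG * DEMO_CLSIZE)

/* Intended to round SIZE down to a multiple of the page size. */
unsigned long round_down_bare(unsigned long size)
{
    /* Expands to size - size % 512 * 2, i.e. size - ((size % 512) * 2). */
    return size - size % PAGESIZE_BARE();
}

unsigned long round_down_paren(unsigned long size)
{
    /* Expands to size - size % (512 * 2). */
    return size - size % PAGESIZE_PAREN();
}
```

With these values the parenthesized form rounds 1500 down to 1024,
while the bare form yields 548, which is not page-aligned.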
diff --git a/v7/src/microcode/ntutl/mincer.ico b/v7/src/microcode/ntutl/mincer.ico
new file mode 100644
index 0000000..db85d5e
Binary files /dev/null and b/v7/src/microcode/ntutl/mincer.ico differ
diff --git a/v8/src/microcode/TODO b/v8/src/microcode/TODO
new file mode 100644
index 0000000..8336a0c
--- /dev/null
+++ b/v8/src/microcode/TODO
@@ -0,0 +1,148 @@
+-*-Text-*-
+$Header: /Users/cph/tmp/foo/mit-scheme/mit-scheme/v8/src/microcode/TODO,v 1.1 1993/10/29 23:00:51 nick Exp $
+
+       Things to do to the C microcode:
+
+MINOR (although not necessarily painless):
+
+* Adopt a naming convention for primitives to mitigate the name conflict
+problem with user-definable names.
+
+* Make the microcode intern the empty string and the empty vector, so
+EQV? does not have to do extra work.  Note that fasdump/fasload have
+to be careful about this.  This may not be desirable for strings,
+given the existence of SET-STRING-LENGTH!
+
+* Check that the microcode backs out of assignment correctly when it
+needs to gc.
+
+* Implement multiple values.
+
+* Change fasdump in scheme and bchscheme to fix the heap image before
+writing the file.  In this way, if there is a trap during the writing
+operation, the system can recover because there will be no broken hearts
+in memory.
+
+* Make Microcode_Termination close all files (including the photo
+file) if there is no termination handler.
+
+* Clean up the OS dependent stuff.  Add a new error,
+ERR_UNIMPLEMENTED_PRIMITIVE, which is signalled by the missing
+procedures.  Divide the procedures into the ones which can signal such
+an error, and the ones that must return a fake value (System_Clock,
+for example).  NOTE: The error has been added, we must now examine all
+the primitives.
+
+* Improve vms.c.  Implement many of the missing procedures.
+
+* Make the communication between OS_file_open and the appropriate
+primitive better: if OS_file_open fails it can be because the file
+does not exist, or because of some OS limitation on the number of
+files open at any given time.
+
+* Add the GNU emacs directory stuff to unix.c, and maybe to vms.c.
+\f
+MAJOR (or very painful):
+
+* Look at all instances of Microcode_Termination to determine whether
+they are synchronous or not.  If not synchronous, the stack may be in
+a bad state and the termination handler may be screwed up.  In
+particular, we may be in the middle of compiled code.  This might want
+to be merged with the trap mechanism in unix.c.
+
+* Redesign fasdump so pure load can be implemented.
+
+* Fix purify so it does an indirect transport of the procedure stored
+in an environment.  Thus purifying an environment would have the
+effect of purifying the code that created it.  Maybe the values should
+be purified too.
+
+* Write complete garbage collect.  This should be easy if the
+mechanism used in bchscheme is used.
+
+* Fix purify in bchscheme.  Currently, if the object is too large, it
+crashes.
+
+* Rewrite purify to avoid the double garbage collection.  It can use
+the same hack that fasdump uses, namely build a table of fixups in the
+spare heap while it conses in constant space.  If it reaches the end
+of constant space, it backs out by using the table of fixups.
+
+* Fix the way weak pairs are treated by fasdump, and by fasdump and
+purify in bchscheme.  They should not be treated like normal pairs.
+
+* Design and implement a better microcode error facility.  Adding new
+errors and parsers is a pain in the neck, and often the only
+interesting thing is the message the microcode wants to provide.
+
+* Eliminate all fprintf(stderr, <mumble>).  This can be achieved by
+having a message facility available for the microcode.
+
+* Split fixnum types: +fixnum is 0, -fixnum is -1, null is ?.
+Check for implicit 0 type.  Make manifest header be +fixnum.
+
+* Change the representation of compiled procedures for faster external
+calls.
+
+* Hack GC and related programs so that variable caches and uuo links
+need no type code for faster free variable reference and calls.
+\f
+       Things done to the C microcode:
+
+* Clean up variable reference code.  Many changes here:
+- Single trap mechanism so the microcode does not have to check more
+than one thing.  Implement unbound, unassigned, and dangerous in terms of this.
+- Clean up aux variable compilation: variables should not go into pure
+space, and then all the kludges about compilation can go away.
+- Eliminate the procedure name slot from the variable reference code.
+It should still be there for debugging, but not visible.  This also
+removes the extra test for assignment.
+       Jinx 4/2/87
+Variables can always go into pure space now because of the way aux
+compilation is done (depth, offset).
+
+* Fix `access' code so that it continues correctly when the variable
+is unbound or unassigned.  This is because the value of the access'
+environment field is not being pushed on the stack at the time of the
+error, so there is no way to continue.  There are probably some other
+similar bugs -- this one is likely to be caused by the fact that it
+requires a non-standard stack frame, making it slightly painful to
+implement.
+       Jinx 4/2/87
+
+* Set up the cached variable stuff so that assignments and references
+can be separated.
+       Jinx 10/5/87
+
+* Remove danger bit. 
+       Jinx 10/9/87
+
+* Change various places that are signalling interrupts to use the
+macro `Request_Interrupt'.
+* Examine usage of `New_Compiler_Memtop' to determine if it is being
+used similarly.
+Eliminated.  There are new macros in interrupt.h.
+       Jinx 11/17/87
+
+* Make fasdump dump only those primitives referenced in the file.
+Maybe dump some arity information?  Once this is done, both kinds of
+primitives can be merged.
+       Jinx 11/17/87
+
+* Change primitives to use uniform mechanism like external primitive
+mechanism.
+       Jinx 11/17/87
+
+* Change the internal representation of primitives.  Instead of being
+just the primitive number, the high 12 bits of the datum can be the
+primitive number and the low 12 bits can be the primitive number if
+implemented, MAX_PRIMITIVE + 1 otherwise.  Then the primitive
+procedure table can be grown by 1 (with an error procedure) so that
+when invoking primitives the masking will automatically cause an error
+procedure to be invoked if the primitive is not implemented, without
+comparing against MAX_PRIMITIVE.
+       Jinx 12/4/87
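
The masking scheme in the item above can be sketched as follows, assuming a 24-bit datum whose high 12 bits hold the primitive number and whose low 12 bits hold the table index. `MAX_PRIMITIVE` is the name used in the note; its value here, the table contents, and the helper names (`make_primitive_datum`, `apply_primitive`, `unimplemented_calls`) are hypothetical.

```c
#include <assert.h>

#define INDEX_BITS 12
#define INDEX_MASK ((1u << INDEX_BITS) - 1)

/* Illustrative value: primitives 0..MAX_PRIMITIVE are implemented. */
#define MAX_PRIMITIVE 2

typedef int (*primitive_procedure)(int);

static int unimplemented_calls = 0;

static int prim_identity(int x)  { return x; }
static int prim_increment(int x) { return x + 1; }
static int prim_negate(int x)    { return -x; }

/* Stands in for signalling an "unimplemented primitive" error. */
static int prim_unimplemented(int x)
{
    (void) x;
    unimplemented_calls += 1;
    return 0;
}

/* The table is grown by one slot: index MAX_PRIMITIVE + 1 holds the
   error procedure. */
static primitive_procedure primitive_table[MAX_PRIMITIVE + 2] = {
    prim_identity, prim_increment, prim_negate, prim_unimplemented
};

/* The implemented/unimplemented decision is made once, when the datum
   is built, rather than on every call. */
static unsigned make_primitive_datum(unsigned number)
{
    unsigned index = (number <= MAX_PRIMITIVE) ? number : MAX_PRIMITIVE + 1;
    assert(number <= INDEX_MASK);
    return (number << INDEX_BITS) | index;
}

static unsigned primitive_number(unsigned datum)
{
    return datum >> INDEX_BITS;
}

/* Invocation only masks and indexes; there is no comparison against
   MAX_PRIMITIVE on this path. */
static int apply_primitive(unsigned datum, int argument)
{
    return primitive_table[datum & INDEX_MASK](argument);
}
```

The high 12 bits keep the true primitive number available (for example to name the primitive in an error message) even when the low 12 bits redirect the call to the error slot.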
+
+* Improve compiled code interface to primitives.  Make them be
+expensive on backout, not on normal call.
+       Jinx 12/4/87
+\ No newline at end of file
v7/src/compiler/documentation/INSTALL	[new file with mode: 0644]
v7/src/compiler/documentation/TASKS	[new file with mode: 0644]
v7/src/compiler/documentation/facts.txt	[new file with mode: 0644]
v7/src/compiler/documentation/files.txt	[new file with mode: 0644]
v7/src/compiler/documentation/notes.txt	[new file with mode: 0644]
v7/src/compiler/documentation/todo.txt	[new file with mode: 0644]
v7/src/compiler/machines/alpha/TODO	[new file with mode: 0644]
v7/src/compiler/machines/spectrum/TODO	[new file with mode: 0644]
v7/src/edwin/README	[new file with mode: 0644]
v7/src/microcode/TODO	[new file with mode: 0644]
v7/src/microcode/getpgsz.h	[new file with mode: 0644]
v7/src/microcode/ntutl/mincer.ico	[new file with mode: 0644]
v8/src/microcode/TODO	[new file with mode: 0644]