mit-scheme.git
5 years agoFix indexing of remote links.
Taylor R Campbell [Tue, 22 Jan 2019 09:01:33 +0000 (09:01 +0000)]
Fix indexing of remote links.

5 years agoFix byte ordering in GENERATE/NSECTS.
Taylor R Campbell [Tue, 22 Jan 2019 03:47:47 +0000 (03:47 +0000)]
Fix byte ordering in GENERATE/NSECTS.

5 years agoFix register choices in GENERATE/REMOTE-LINKS.
Taylor R Campbell [Tue, 22 Jan 2019 03:47:35 +0000 (03:47 +0000)]
Fix register choices in GENERATE/REMOTE-LINKS.

5 years agoFix sense of INVOCATION-PREFIX:DYNAMIC-LINK choice.
Taylor R Campbell [Tue, 22 Jan 2019 03:47:03 +0000 (03:47 +0000)]
Fix sense of INVOCATION-PREFIX:DYNAMIC-LINK choice.

5 years agoFix reference to constant section in GENERATE/REMOTE-LINKS.
Taylor R Campbell [Tue, 22 Jan 2019 01:55:18 +0000 (01:55 +0000)]
Fix reference to constant section in GENERATE/REMOTE-LINKS.

5 years agoSign-extend PC-relative branch target.
Taylor R Campbell [Mon, 21 Jan 2019 23:37:32 +0000 (23:37 +0000)]
Sign-extend PC-relative branch target.
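
For readers unfamiliar with the encoding: an A64 B or BL instruction keeps
its target as a signed 26-bit count of 4-byte instructions in bits [25:0].
A minimal sketch of the decode in C, assuming nothing about the actual
microcode; the name branch_byte_offset is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset encoded in an A64 B/BL instruction word (hypothetical helper). */
int64_t
branch_byte_offset (uint32_t insn)
{
  /* Bits [30:26] are 00101 for both B and BL. */
  assert ((insn & 0x7C000000u) == 0x14000000u);
  uint32_t imm26 = insn & 0x03FFFFFFu;
  /* Sign-extend from 26 bits: flip the sign bit, then subtract its weight. */
  int64_t words = (int64_t) (imm26 ^ 0x02000000u) - 0x02000000;
  /* imm26 counts instructions, and each instruction is 4 bytes. */
  return words * 4;
}
```

With this, 0x17FFFFFF (a branch to the previous instruction) decodes to -4;
without the sign extension it would decode as a large forward offset.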

5 years agoFix indexing in MOVE-FRAME-UP code: objects, not bytes, here.
Taylor R Campbell [Mon, 21 Jan 2019 22:39:29 +0000 (22:39 +0000)]
Fix indexing in MOVE-FRAME-UP code: objects, not bytes, here.

And with this, the cold load completes on aarch64!

5 years agoFix large application setup.
Taylor R Campbell [Mon, 21 Jan 2019 22:39:11 +0000 (22:39 +0000)]
Fix large application setup.

5 years agoTeach cmpintmd to flush the instruction cache on aarch64.
Taylor R Campbell [Mon, 21 Jan 2019 20:59:02 +0000 (20:59 +0000)]
Teach cmpintmd to flush the instruction cache on aarch64.

5 years agoFix argument to PUSH_D_CACHE_REGION.
Taylor R Campbell [Mon, 21 Jan 2019 20:53:14 +0000 (20:53 +0000)]
Fix argument to PUSH_D_CACHE_REGION.

Takes startptr/count, not startptr/endptr.

This was not an issue before because until aarch64, the only extant
port that even used this, i386, ignored the argument as a macro and
flushed the entire cache.
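
The mismatch is easy to picture with a sketch. Everything here is an
assumption made for illustration: the names region_end and flush_region are
hypothetical, the count is taken to be in 64-bit words, and the GCC/Clang
builtin __builtin___clear_cache stands in for whatever the port really does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* End of a region described as start pointer + word count (hypothetical). */
uint64_t *
region_end (uint64_t *start, size_t n_words)
{
  return start + n_words;
}

/* Flush a region given as (start, count).  Passing an end pointer where the
   count belongs would describe a wildly wrong region.  */
void
flush_region (uint64_t *start, size_t n_words)
{
  assert (start != NULL);
  /* The builtin itself wants (begin, end), so convert at the last moment. */
  __builtin___clear_cache ((char *) start,
                           (char *) region_end (start, n_words));
}
```

A caller holding start and end pointers would pass (size_t) (end - start) as
the count rather than end itself.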

5 years agoFix branch instruction in uuo link stub.
Taylor R Campbell [Mon, 21 Jan 2019 19:06:38 +0000 (19:06 +0000)]
Fix branch instruction in uuo link stub.

5 years agoTweak read/write_compiled_closure_target for clarity and assertions.
Taylor R Campbell [Mon, 21 Jan 2019 19:06:20 +0000 (19:06 +0000)]
Tweak read/write_compiled_closure_target for clarity and assertions.

5 years agoFix cache-assignment code generation.
Taylor R Campbell [Mon, 21 Jan 2019 19:06:02 +0000 (19:06 +0000)]
Fix cache-assignment code generation.

5 years agoFix case.
Taylor R Campbell [Mon, 21 Jan 2019 19:05:51 +0000 (19:05 +0000)]
Fix case.

5 years agoFix LSR instruction encoding.
Taylor R Campbell [Mon, 21 Jan 2019 01:20:14 +0000 (01:20 +0000)]
Fix LSR instruction encoding.

5 years agoFix scale->shift.
Taylor R Campbell [Mon, 21 Jan 2019 00:37:29 +0000 (00:37 +0000)]
Fix scale->shift.

5 years agoFix read/write_compiled_closure_target.
Taylor R Campbell [Sun, 20 Jan 2019 21:36:42 +0000 (21:36 +0000)]
Fix read/write_compiled_closure_target.

Byte offsets, not object or instruction word offsets.

5 years agoFix comment.
Taylor R Campbell [Sun, 20 Jan 2019 20:10:39 +0000 (20:10 +0000)]
Fix comment.

5 years agoFix PC-relative calculations to work entirely in newspace.
Taylor R Campbell [Sun, 20 Jan 2019 00:19:13 +0000 (00:19 +0000)]
Fix PC-relative calculations to work entirely in newspace.

5 years agoFix read/write_compiled_closure_target offsets.
Taylor R Campbell [Sun, 20 Jan 2019 00:18:55 +0000 (00:18 +0000)]
Fix read/write_compiled_closure_target offsets.

5 years agoAllow non-branch in cc_return_address_to_entry_address.
Taylor R Campbell [Sat, 19 Jan 2019 23:57:34 +0000 (23:57 +0000)]
Allow non-branch in cc_return_address_to_entry_address.

This happens for trampolines.  Maybe this should be a special case.

5 years agoFix scaling of PC offsets: they're byte offsets, not word offsets.
Taylor R Campbell [Sat, 19 Jan 2019 23:57:08 +0000 (23:57 +0000)]
Fix scaling of PC offsets: they're byte offsets, not word offsets.

5 years agoFix some symbol sizing.
Taylor R Campbell [Sat, 19 Jan 2019 23:56:55 +0000 (23:56 +0000)]
Fix some symbol sizing.

5 years agoTidy up interface_to_C.
Taylor R Campbell [Sat, 19 Jan 2019 23:56:45 +0000 (23:56 +0000)]
Tidy up interface_to_C.

5 years agoNote there is a way to do negative offsets.
Taylor R Campbell [Sat, 19 Jan 2019 23:56:31 +0000 (23:56 +0000)]
Note there is a way to do negative offsets.

5 years agoMake C_to_interface go through interface_to_scheme.
Taylor R Campbell [Sat, 19 Jan 2019 22:43:03 +0000 (22:43 +0000)]
Make C_to_interface go through interface_to_scheme.

This way C_to_interface sets up VAL, which is necessary in case it is
invoking a continuation.

5 years agoFix encoding of ROR and EXTR instructions.
Taylor R Campbell [Sat, 19 Jan 2019 21:20:47 +0000 (21:20 +0000)]
Fix encoding of ROR and EXTR instructions.

5 years agoLoad UARG2, don't clobber UARG1, in apply hooks.
Taylor R Campbell [Sat, 19 Jan 2019 20:51:56 +0000 (20:51 +0000)]
Load UARG2, don't clobber UARG1, in apply hooks.

5 years agoFix calculation of hook instruction address.
Taylor R Campbell [Sat, 19 Jan 2019 20:51:44 +0000 (20:51 +0000)]
Fix calculation of hook instruction address.

5 years agoFix order of arguments to load-tagged-immediate.
Taylor R Campbell [Sat, 19 Jan 2019 18:33:01 +0000 (18:33 +0000)]
Fix order of arguments to load-tagged-immediate.

5 years agoFix reversed byte order branches in read_uuo_frame_size.
Taylor R Campbell [Sat, 19 Jan 2019 08:03:54 +0000 (08:03 +0000)]
Fix reversed byte order branches in read_uuo_frame_size.

5 years agoFix extraction of PC offset from branch instruction.
Taylor R Campbell [Sat, 19 Jan 2019 08:03:41 +0000 (08:03 +0000)]
Fix extraction of PC offset from branch instruction.

5 years agoFix format word padding and tweak block offsets.
Taylor R Campbell [Sat, 19 Jan 2019 08:02:50 +0000 (08:02 +0000)]
Fix format word padding and tweak block offsets.

We already arranged for all entries to be 64-bit aligned, so we might
as well take advantage of that in block offsets.

5 years agoFix uuo link and trampoline instructions.
Taylor R Campbell [Fri, 18 Jan 2019 08:15:28 +0000 (08:15 +0000)]
Fix uuo link and trampoline instructions.

5 years agoMake interface_to_scheme match reality, not sensibility.
Taylor R Campbell [Fri, 18 Jan 2019 07:13:32 +0000 (07:13 +0000)]
Make interface_to_scheme match reality, not sensibility.

Should change cmpint.c so we pass a separate dispatch routine in for
entries and continuations, but that requires changing all the
cmpauxen at once.

5 years agoCompiler oughta agree with cmpauxmd about what register is stack pointer.
Taylor R Campbell [Fri, 18 Jan 2019 07:13:15 +0000 (07:13 +0000)]
Compiler oughta agree with cmpauxmd about what register is stack pointer.

5 years agoSimplify format words: make them always be instruction words.
Taylor R Campbell [Fri, 18 Jan 2019 07:03:11 +0000 (07:03 +0000)]
Simplify format words: make them always be instruction words.

No need for endianness conditionalization.
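
One plausible reading of why this works: A64 instruction words are always
stored little-endian, whatever the data endianness, so anything laid out as
an instruction word has a single byte order.  A sketch of a host-independent
read in C; read_insn is a hypothetical name, not the microcode's.

```c
#include <assert.h>
#include <stdint.h>

/* Read one 32-bit A64 instruction word from memory, byte by byte, so the
   result does not depend on the host's data endianness.  */
uint32_t
read_insn (const uint8_t *p)
{
  assert (p != 0);
  return (uint32_t) p[0]
         | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[3] << 24);
}
```

The bytes 1F 20 03 D5 read back as 0xD503201F, the A64 NOP.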

5 years agoFix passage of dynamic-link. Only machine register, not regblock.
Taylor R Campbell [Fri, 18 Jan 2019 06:23:00 +0000 (06:23 +0000)]
Fix passage of dynamic-link.  Only machine register, not regblock.

5 years agoAssert block offset is zero.
Taylor R Campbell [Fri, 18 Jan 2019 06:22:18 +0000 (06:22 +0000)]
Assert block offset is zero.

5 years agoAdd a TODO.
Taylor R Campbell [Wed, 16 Jan 2019 04:48:27 +0000 (04:48 +0000)]
Add a TODO.

5 years agoTeach ucode identify about aarch64.
Taylor R Campbell [Wed, 16 Jan 2019 04:47:27 +0000 (04:47 +0000)]
Teach ucode identify about aarch64.

Also make this always return a string here, so it doesn't crash on
boot if it hasn't been taught about new compiled code types.

5 years agoSave an instruction in multiplication with CSETM.
Taylor R Campbell [Wed, 16 Jan 2019 04:47:13 +0000 (04:47 +0000)]
Save an instruction in multiplication with CSETM.

5 years agoTweak some register numbering to reduce a bit of code.
Taylor R Campbell [Wed, 16 Jan 2019 04:47:00 +0000 (04:47 +0000)]
Tweak some register numbering to reduce a bit of code.

5 years agoFix register block indexing: no hooks in the register block here.
Taylor R Campbell [Wed, 16 Jan 2019 04:46:17 +0000 (04:46 +0000)]
Fix register block indexing: no hooks in the register block here.

5 years agoFix add/sub immediate syntax and criterion.
Taylor R Campbell [Tue, 15 Jan 2019 17:27:45 +0000 (17:27 +0000)]
Fix add/sub immediate syntax and criterion.

5 years agoUse a temporary if necessary in AFFIX-TYPE.
Taylor R Campbell [Tue, 15 Jan 2019 16:37:11 +0000 (16:37 +0000)]
Use a temporary if necessary in AFFIX-TYPE.

5 years agoDraft aarch64 cmpauxmd.
Taylor R Campbell [Tue, 15 Jan 2019 16:29:02 +0000 (16:29 +0000)]
Draft aarch64 cmpauxmd.

5 years agoFix push order in move-frame-up / dynamic-link.
Taylor R Campbell [Tue, 15 Jan 2019 03:48:25 +0000 (03:48 +0000)]
Fix push order in move-frame-up / dynamic-link.

5 years agoFix some instruction syntax bugs.
Taylor R Campbell [Tue, 15 Jan 2019 03:20:21 +0000 (03:20 +0000)]
Fix some instruction syntax bugs.

- Specify target _and_ source -- we're not x86 here.
- Specify operand size.
- Specify multipliers correctly.

5 years agoAvoid REGISTER-COPY-IF-AVAILABLE and TEMPORARY-COPY-IF-AVAILABLE.
Taylor R Campbell [Tue, 15 Jan 2019 03:19:18 +0000 (03:19 +0000)]
Avoid REGISTER-COPY-IF-AVAILABLE and TEMPORARY-COPY-IF-AVAILABLE.

These give out register references, which are a pain.  Just use
REUSE-PSEUDO-REGISTER-IF-AVAILABLE! to get the machine register
number.

5 years agoDisable floating-point vector primitives too.
Taylor R Campbell [Tue, 15 Jan 2019 03:18:32 +0000 (03:18 +0000)]
Disable floating-point vector primitives too.

Until we have open-coded floating-point arithmetic.

5 years agoMake RTL:CONSTANT-COST always return positive.
Taylor R Campbell [Tue, 15 Jan 2019 03:17:35 +0000 (03:17 +0000)]
Make RTL:CONSTANT-COST always return positive.

Otherwise CSE might substitute constants for registers where at best
it's not helpful and at worst we don't have rules for it.

5 years agoFix up some instruction descriptions.
Taylor R Campbell [Tue, 15 Jan 2019 03:15:35 +0000 (03:15 +0000)]
Fix up some instruction descriptions.

- Migrate some things with citations and updates to instr1.scm.
- No need for `(evaluation ,terms) in fixed-width instructions.
- Fix some missing or duplicated bits.
- Add some more instructions.

5 years agoUmptuple-check that instruction widths sum to multiples of 32 bits.
Taylor R Campbell [Tue, 15 Jan 2019 03:14:40 +0000 (03:14 +0000)]
Umptuple-check that instruction widths sum to multiples of 32 bits.
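
The check itself is simple to state.  A sketch in C, not the assembler's
actual Scheme code; widths_fill_words is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Nonzero if the field widths (in bits) add up to a whole, nonzero number
   of 32-bit instruction words.  */
int
widths_fill_words (const unsigned *widths, size_t n)
{
  assert (widths != 0 || n == 0);
  unsigned total = 0;
  for (size_t i = 0; i < n; i++)
    total += widths[i];
  return total != 0 && total % 32 == 0;
}
```

For instance, the A64 ADD (immediate) fields sf, op, S, 100010, sh, imm12,
Rn, Rd have widths 1, 1, 1, 6, 1, 12, 5, 5, which sum to 32; dropping the
sh bit leaves 31 and the check fails.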

5 years agoPut something in these stub files so they compile as code.
Taylor R Campbell [Tue, 15 Jan 2019 03:12:46 +0000 (03:12 +0000)]
Put something in these stub files so they compile as code.

Otherwise the portable fasdumper barfs trying to fasdump a pathname.

5 years agoUpdate config.guess and config.sub so they recognize aarch64.
Taylor R Campbell [Tue, 15 Jan 2019 03:12:25 +0000 (03:12 +0000)]
Update config.guess and config.sub so they recognize aarch64.

5 years agoFix configure goo for aarch64 with byte order specified.
Taylor R Campbell [Tue, 15 Jan 2019 03:11:36 +0000 (03:11 +0000)]
Fix configure goo for aarch64 with byte order specified.

5 years agoBlock offset units are instructions, not bytes, so we get two more bits.
Taylor R Campbell [Tue, 15 Jan 2019 03:09:58 +0000 (03:09 +0000)]
Block offset units are instructions, not bytes, so we get two more bits.
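
The arithmetic behind the two extra bits: every A64 instruction is 4 bytes,
so an offset counted in instructions reaches four times as far as the same
field counted in bytes.  A sketch in C; the 16-bit field width and both
function names are assumptions for illustration, not the real format.

```c
#include <assert.h>
#include <stdint.h>

#define INSN_BYTES 4u   /* every A64 instruction is 4 bytes */

/* Store a byte offset in a (hypothetical) 16-bit field, in instruction
   units.  */
uint16_t
encode_block_offset (uint32_t byte_offset)
{
  assert (byte_offset % INSN_BYTES == 0);
  assert (byte_offset / INSN_BYTES <= UINT16_MAX);
  return (uint16_t) (byte_offset / INSN_BYTES);
}

/* Recover the byte offset from the field.  */
uint32_t
decode_block_offset (uint16_t field)
{
  return (uint32_t) field * INSN_BYTES;
}
```

Under these assumptions the field covers offsets up to 262140 bytes, where
a byte-unit field of the same width would stop at 65535.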

5 years agoVarious work to get this going.
Taylor R Campbell [Mon, 14 Jan 2019 07:43:42 +0000 (07:43 +0000)]
Various work to get this going.

Enough to compile and assemble advice.scm, the first file in the
runtime.  Still a ways from doing anything.

5 years agoTeach assembler about MODULO.
Taylor R Campbell [Mon, 14 Jan 2019 07:44:17 +0000 (07:44 +0000)]
Teach assembler about MODULO.

XXX Should maybe do EUCLIDEAN-REMAINDER or the full gamut of division
operators, but this is all I need for now.
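
For reference, the two operators named above differ only when the divisor
is negative.  A sketch in C with hypothetical function names, built on C's
truncating % operator; this is not the assembler's code.

```c
#include <assert.h>

/* Floor modulo, like Scheme's MODULO: the result has the divisor's sign. */
long
floor_modulo (long n, long d)
{
  assert (d != 0);
  long r = n % d;               /* C's % truncates toward zero */
  return (r != 0 && (r < 0) != (d < 0)) ? r + d : r;
}

/* Euclidean remainder: the result always satisfies 0 <= r < |d|. */
long
euclidean_remainder (long n, long d)
{
  assert (d != 0);
  long r = n % d;
  if (r < 0)
    r += (d < 0) ? -d : d;
  return r;
}
```

With a positive divisor the two agree: both map (-7, 2) to 1.  With a
negative divisor they part ways: floor_modulo (7, -2) is -1, while
euclidean_remainder (7, -2) is 1.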

5 years agoReport bad expressions more clearly.
Taylor R Campbell [Mon, 14 Jan 2019 07:44:05 +0000 (07:44 +0000)]
Report bad expressions more clearly.

5 years agoFill in some more files, add some build goo, fix some bugs.
Taylor R Campbell [Sun, 13 Jan 2019 22:52:06 +0000 (22:52 +0000)]
Fill in some more files, add some build goo, fix some bugs.

Invent a way to do assembler macros so we can do legible branch
tensioning rules and reuse ADRP/ADD patterns.
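
As background on the ADRP/ADD pattern: ADRP adds a signed count of 4 KiB
pages to the page of the PC, and a following ADD supplies the low 12 bits
of the target.  A sketch of the split in C; the struct and function names
are hypothetical and this is not the assembler macro itself.

```c
#include <assert.h>
#include <stdint.h>

struct adrp_add
{
  int64_t page_delta;           /* immediate for ADRP, in 4 KiB pages */
  uint32_t lo12;                /* immediate for the following ADD */
};

/* Split a target address into the two immediates, relative to pc. */
struct adrp_add
split_adrp_add (uint64_t pc, uint64_t target)
{
  struct adrp_add s;
  s.page_delta = (int64_t) (target >> 12) - (int64_t) (pc >> 12);
  s.lo12 = (uint32_t) (target & 0xFFF);
  /* ADRP's immediate is a signed 21-bit page count, about +/- 4 GiB. */
  assert (s.page_delta >= -(1 << 20) && s.page_delta < (1 << 20));
  return s;
}

/* Recompute the address the instruction pair would produce. */
uint64_t
apply_adrp_add (uint64_t pc, struct adrp_add s)
{
  uint64_t page = (pc & ~(uint64_t) 0xFFF) + ((uint64_t) s.page_delta << 12);
  return page + s.lo12;
}
```

Note that the low 12 bits of the PC drop out entirely, which is what makes
the pair reusable as a pattern.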

5 years agoDraft aarch64 back end.
Taylor R Campbell [Sun, 13 Jan 2019 06:08:23 +0000 (06:08 +0000)]
Draft aarch64 back end.

Nowhere near completion yet, long TODO list, not compile-tested, &c.
Not sure if I'll find any more copious spare time to work on this for
a while.

5 years agoFix multiplication and division by purely imaginary numbers.
Taylor R Campbell [Tue, 20 Aug 2019 03:40:24 +0000 (03:40 +0000)]
Fix multiplication and division by purely imaginary numbers.

That is, complex numbers whose real part is exact zero.

5 years agoTest multiplication and division by +i and -i.
Taylor R Campbell [Tue, 20 Aug 2019 03:13:51 +0000 (03:13 +0000)]
Test multiplication and division by +i and -i.

We do not currently follow Kahan's recommendations that when the real
part is exactly zero, the arithmetic be done by negation rather than
multiplication.
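
A small illustration of why the distinction matters, written in C with
IEEE 754 doubles rather than Scheme's exact zero; the type and function
names are hypothetical and this is not the runtime's code.

```c
#include <assert.h>
#include <math.h>

struct cplx { double re, im; };

/* Textbook product (0 + 1i) * z, using real multiplications. */
struct cplx
mul_i_naive (struct cplx z)
{
  struct cplx r;
  r.re = 0.0 * z.re - 1.0 * z.im;
  r.im = 0.0 * z.im + 1.0 * z.re;
  return r;
}

/* i * (a + bi) = -b + ai: a swap and a negation, no multiplication. */
struct cplx
mul_i_exact (struct cplx z)
{
  struct cplx r;
  r.re = -z.im;
  r.im = z.re;
  return r;
}

/* With an infinite real part, 0 * inf poisons the naive result. */
void
check_mul_i (void)
{
  struct cplx z = { INFINITY, 1.0 };
  assert (isnan (mul_i_naive (z).re));
  assert (mul_i_exact (z).re == -1.0);
  assert (isinf (mul_i_exact (z).im));
}
```

For finite inputs the two agree up to the sign of zero; the difference shows
up at infinities, NaNs, and signed zeros, which are the edge cases the tests
above target.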

5 years agoFix edge cases in ANGLE.
Taylor R Campbell [Tue, 20 Aug 2019 03:03:25 +0000 (03:03 +0000)]
Fix edge cases in ANGLE.

5 years agoExpand edge cases for ANGLE.
Taylor R Campbell [Tue, 20 Aug 2019 02:51:27 +0000 (02:51 +0000)]
Expand edge cases for ANGLE.

Based on Kahan's `Much Ado about Nothing's Sign Bit' paper.  We screw
up some zero edge cases.

5 years agoFix references incorrectly marked with EVR().
Chris Hanson [Mon, 19 Aug 2019 22:33:00 +0000 (15:33 -0700)]
Fix references incorrectly marked with EVR().

5 years ago`x ... ...' is busted in syntax-rules.
Taylor R Campbell [Sat, 17 Aug 2019 13:54:34 +0000 (13:54 +0000)]
`x ... ...' is busted in syntax-rules.

5 years agoMerge branch 'riastradh-20181220-closentry-v12'
Taylor R Campbell [Fri, 16 Aug 2019 05:02:00 +0000 (05:02 +0000)]
Merge branch 'riastradh-20181220-closentry-v12'

5 years agoTweak logit1/2+ condition number plot for clarity.
Taylor R Campbell [Fri, 16 Aug 2019 04:59:52 +0000 (04:59 +0000)]
Tweak logit1/2+ condition number plot for clarity.

5 years agoFactor out common PostScript code for plotting.
Taylor R Campbell [Fri, 16 Aug 2019 03:54:49 +0000 (03:54 +0000)]
Factor out common PostScript code for plotting.

Should make this a little more maintainable.

5 years agoUniform code and style for plots.
Taylor R Campbell [Fri, 16 Aug 2019 02:54:44 +0000 (02:54 +0000)]
Uniform code and style for plots.

Tweak line widths a little bit to roughly match cmmi10 (Computer
Modern Math Italic 10pt) rule widths for axes, and a little thicker
for the plots themselves, for the printed manual.

5 years agoProduce 300dpi, not 72dpi, PNGs for HTML output.
Taylor R Campbell [Fri, 16 Aug 2019 02:51:41 +0000 (02:51 +0000)]
Produce 300dpi, not 72dpi, PNGs for HTML output.

5 years agoUse TLS/SSL for links to <srfi.schemers.org>.
Arthur A. Gleckler [Thu, 15 Aug 2019 20:17:00 +0000 (13:17 -0700)]
Use TLS/SSL for links to <srfi.schemers.org>.

5 years agoAdd release note.
Taylor R Campbell [Thu, 15 Aug 2019 14:24:35 +0000 (14:24 +0000)]
Add release note.

5 years agoBump COMPILER_INTERFACE_VERSION.
Taylor R Campbell [Thu, 15 Aug 2019 05:19:18 +0000 (05:19 +0000)]
Bump COMPILER_INTERFACE_VERSION.

Make attempts to use old .com files fail a little more obviously.

5 years agoSet default target to all for cross-builds too.
Taylor R Campbell [Thu, 15 Aug 2019 04:57:56 +0000 (04:57 +0000)]
Set default target to all for cross-builds too.

No need to make it default to cross-host.  If you want to separate
the cross-host/cross-target stages, you'll know to do cross-host
anyway.

5 years agoAvoid spurious fallthrough (fortunately harmless here).
Taylor R Campbell [Thu, 15 Aug 2019 04:45:27 +0000 (04:45 +0000)]
Avoid spurious fallthrough (fortunately harmless here).

5 years agoTest fma exceptions too.
Taylor R Campbell [Wed, 14 Aug 2019 01:31:56 +0000 (01:31 +0000)]
Test fma exceptions too.

5 years agoAdd fma, fused-multiply/add.
Taylor R Campbell [Tue, 13 Aug 2019 23:25:14 +0000 (23:25 +0000)]
Add fma, fused-multiply/add.

Not yet open-coded anywhere.  Will be a huge pain on x86.  No aarch64
flonum open-coding at all yet.

(Maybe flo:fast-fma? should return false if it's not open-coded...)

5 years agoUse a different reflect code number for compiled invocations. [origin/riastradh-20181220-closentry-v12]
Taylor R Campbell [Sun, 6 Jan 2019 03:59:31 +0000 (03:59 +0000)]
Use a different reflect code number for compiled invocations.

Teach the continuation parser about it.

Turns out this doesn't actually coincide with the format the v8
microcode used for APPLY-COMPILED, which also has a frame size,
presumably so arity dispatch could be done in the callee.

(Not that the v8 stuff matters these days; maybe we should just flush
those parts of conpar.scm.)

5 years agoOpen-code WITH-STACK-MARKER too.
Taylor R Campbell [Sat, 5 Jan 2019 15:53:23 +0000 (15:53 +0000)]
Open-code WITH-STACK-MARKER too.

Saves a trip through reflect-to-interface, which would break the
return address branch target prediction stack.

5 years agoShare closure interrupt labels.
Taylor R Campbell [Sat, 5 Jan 2019 06:31:35 +0000 (06:31 +0000)]
Share closure interrupt labels.

The interrupt-handling subroutine just uses the tagged entry on the
stack, so no need for a separate call for each closure.  If nothing
else this should save some code size.

Also, in open-coding of with-interrupt-mask, reuse pop-return with
interrupt checks.

5 years agoTidy up compiler utility return addresses.
Taylor R Campbell [Sat, 5 Jan 2019 03:36:51 +0000 (03:36 +0000)]
Tidy up compiler utility return addresses.

Use compiled returns for the ones that are likely to return to Scheme
like lookups and assignments, and compiled entries for the ones that
are likely to return to microcode like interrupts.

Architectures on which compiled entries and compiled returns have the
same format will see no difference: compiled code passes in an
untagged return address either way.

On amd64, where compiled entries and compiled returns are different:

- For hooks that act like leaf subroutines and never return to
  microcode, use plain CALL/RET in pairs.

- For hooks that are subroutines likely to return to Scheme
  immediately but might return to microcode in screw cases, use

        (CALL ,hook)                    ; Invoke hook with untagged ret addr...
        (JMP (@PCR ,continuation))      ; ...which jumps to formatted entry.
        (WORD ...)
        (BLOCK-OFFSET ,continuation)
        (QUAD U 0)
       (LABEL ,continuation)
        ...                             ; continuation instructions

  For the non-screw cases this keeps CALL/RET paired.

- For hooks that always defer to microcode, namely to handle
  interrupts, use

        (LEA Q (R ,rbx) (@PCR ,continuation))
        (JMP ,hook)

  Here it doesn't really matter whether the CALL/RET is paired because we're
  going to wreck the return address branch prediction stack no matter
  what, but it is convenient to have the entry address rather than
  the return address in the compiled utility.

5 years agoUse ret for returns from interface and from generic arithmetic hooks.
Taylor R Campbell [Fri, 4 Jan 2019 04:58:51 +0000 (04:58 +0000)]
Use ret for returns from interface and from generic arithmetic hooks.

Let's take advantage of the return address stack branch target
predictor rather than unceremoniously trash it, shall we?

5 years agoOpen-code with-interrupt-mask, with-interrupts-reduced.
Taylor R Campbell [Thu, 3 Jan 2019 19:10:45 +0000 (19:10 +0000)]
Open-code with-interrupt-mask, with-interrupts-reduced.

Not open-coded at the RTL level, but at the LAP level.

This way we avoid going through a return trampoline, which wrecks the
return address stack branch target predictor as long as we transition
between Scheme and C to handle trampolines.

Most of the work, of munging MEMTOP and STACK_GUARD, is relegated to
an assembly hook subroutine so the code doesn't expand too much.  The
format of the stack still uses reflect-to-interface so that this
should require no changes to the continuation parser to get the
interrupt masks right, but with an intermediate empty-frame
continuation that actually calls the assembly hook and then pops
reflect-to-interface off.

5 years agoAllow return_to_compiled_code to return to compiled entries.
Taylor R Campbell [Thu, 3 Jan 2019 03:19:54 +0000 (03:19 +0000)]
Allow return_to_compiled_code to return to compiled entries.

The earlier compiled entry/return split left various utility calls
pushing compiled entries, rather than compiled return addresses, for
continuations on the stack -- notably interrupt routines, the linker
utility, and interpreter calls.

I arranged for these all to use RETURN_TO_SCHEME_ENTRY (or
JUMP_TO_CC_ENTRY), but missed one spot: the continuations constructed
by STACK-FRAME->CONTINUATION, which use return_to_compiled_code,
which in turn expected a compiled return rather than a compiled entry
and choked.

The interrupt routines, linker utility, and interpreter calls should
all be adapted to take returns rather than entries (which is another
ABI-breaking flag day), but this will do for now.

5 years agoSave interpreter result too before anything in continuation.
Taylor R Campbell [Wed, 2 Jan 2019 23:44:09 +0000 (23:44 +0000)]
Save interpreter result too before anything in continuation.

On x86, the interpreter call result register is eax/rax, register 0,
which is also the first register we hand out for register allocation.
The continuation for an interpreter call result uses register 0, but
if the caller uses a dynamic link, the continuation first pops its
frame via the dynamic link...using a temporary register that is
guaranteed to be register 0 since it's the first one the register
allocator hands out.  The code sequence looks something like this:

;; (interpreter-call:cache-reference label-10 (register #x24) #f)
(mov q (r 2) (r 1))
(call (@ro 6 #xd0))
;; (continuation-entry label-10)
(word u #xfffc)
(block-offset label-10)
label-10:
;; (assign (register #x25) (post-increment (register 4) 1))
(pop q (r 0))
;; (assign (register #x26) (object->address (register #x25)))
(and q (r 0) (r 5))
;; (assign (offset (register 6) (machine-constant 4)) (register #x26))
(mov q (@ro 6 #x20) (r 0))
;; (assign (register #x23) (register 0))
(jmp (@pcr label-13))

On entry to the continuation, register 0 holds the value we want,
chosen as a machine alias for pseudo-register #x23 in the procedure
body, but the first thing the continuation does is pop the dynamic
link into register 0, ruining the party.

This is rather tricky to trigger because it turns out in _non-error_
cases, compiled code never asks the interpreter to evaluate a cache
reference that will return a value.  But you can trigger this by
referencing an unassigned variable and invoking a restart, which does
cause the cache reference to return a value:

;; Unassigned, so compiled code will ask interpreter for help.
(define null)

;; Recursive procedure for which the compiler uses a dynamic link.
(define (map f l)
  (let loop ((l l))
    (if (pair? l)
        (cons (f (car l)) (loop (cdr l)))
        null)))

;; Invoke the restart that will return from the cache reference with
;; a value.
(bind-condition-handler (list condition-type:unassigned-variable)
    (lambda (condition)
      condition
      (use-value '()))
  (lambda ()
    (map + '(1 2 3))))
;Value: (1 2 3 . #[false 15 #xea9c18])

Here #[false 15 #xea9c18] is the (detagged) dynamic link, a pointer
into the stack, not the result we wanted at all.

5 years agoMake entries point to _after_ the PC offset.
Taylor R Campbell [Mon, 31 Dec 2018 21:08:22 +0000 (21:08 +0000)]
Make entries point to _after_ the PC offset.

This saves a jump in closure headers, and makes non-closure entries
have a nice PC offset of 0 rather than an awkward PC offset of 8.
However, this causes all indirect calls to have an additional offset
of -8 in the addressing mode -- not clear yet how much this hurts.

WARNING: This changes the amd64 compiled code interface so that new
compiled code requires a new microcode and vice versa.  Further, you
must set compiler:cross-compiling? to #t to compile the system,
because compiled code block offsets are now in a different place
relative to compiled entries, so the native fasdumper of an old
microcode can't handle compiled entries produced by a new compiler.

5 years agoLoad the fallback into rax so caller needs no conditional branch.
Taylor R Campbell [Wed, 2 Jan 2019 06:10:52 +0000 (06:10 +0000)]
Load the fallback into rax so caller needs no conditional branch.

WARNING: This changes the amd64 compiled code interface so that new
compiled code requires a new microcode.  (However, a new microcode
should handle old compiled code without trouble, since old compiled
code treats rax as garbage at this point, and LEA does not affect
flags.)

5 years agoUse BTS to affix single-bit type tags.
Taylor R Campbell [Mon, 31 Dec 2018 20:32:37 +0000 (20:32 +0000)]
Use BTS to affix single-bit type tags.
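
The idea in miniature: when a type tag has exactly one bit set, affixing it
to an untagged datum amounts to setting one bit of the word, which BTS can
do with an 8-bit bit index, whereas OR on x86-64 cannot take a 64-bit
immediate.  A sketch in C; the 6-bit tag over a 58-bit datum and all
function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DATUM_BITS 58u  /* assumed layout: 6-bit tag above a 58-bit datum */

/* Nonzero if the tag has exactly one bit set. */
int
single_bit_tag_p (unsigned tag)
{
  return tag != 0 && (tag & (tag - 1)) == 0;
}

/* Bit index a BTS-style instruction would set for this tag. */
unsigned
tag_bit_index (unsigned tag)
{
  assert (single_bit_tag_p (tag));
  unsigned index = DATUM_BITS;
  while ((tag >>= 1) != 0)
    index++;
  return index;
}

/* Affix the tag to an untagged datum by setting that one bit. */
uint64_t
affix_single_bit_tag (uint64_t datum, unsigned tag)
{
  assert ((datum >> DATUM_BITS) == 0);
  return datum | ((uint64_t) 1 << tag_bit_index (tag));
}
```

Under the assumed layout, tag 8 corresponds to bit 61 of the word.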

5 years agoRelax register constraints for tagging rule.
Taylor R Campbell [Sat, 29 Dec 2018 21:57:47 +0000 (21:57 +0000)]
Relax register constraints for tagging rule.

No need to keep the source alive here -- use move-to-target and allow
any temporary register instead.

5 years agoSimplify hook calls.
Taylor R Campbell [Sat, 29 Dec 2018 16:40:02 +0000 (16:40 +0000)]
Simplify hook calls.

No need for CALL or the stack to be involved: just load PC-relative
address into RBX directly.  Should shave off a few bytes of code.

WARNING: This changes the amd64 compiled code interface so that new
compiled code requires a new microcode and vice versa.

5 years agoConvert x86-64 to use rax as value register.
Taylor R Campbell [Sun, 30 Dec 2018 21:28:19 +0000 (21:28 +0000)]
Convert x86-64 to use rax as value register.

WARNING: This changes the amd64 compiled code interface so that new
compiled code requires a new microcode and vice versa.

5 years agoAllow careful use of available machine registers in RTL.
Taylor R Campbell [Sun, 30 Dec 2018 21:01:00 +0000 (21:01 +0000)]
Allow careful use of available machine registers in RTL.

This will enable us to put fixed machine registers such as the value
register carefully into the RTL even if they are ordinarily available
as pseudo-register aliases for machine register allocation.

- CGEN-RINST calls TARGET-MACHINE-REGISTER! if the target of an RTL
  instruction is a machine register that is ordinarily available for
  register allocation.

- REGISTER-ALIAS declines to return any aliases reserved by
  TARGET-MACHINE-REGISTER!, until...

- DELETE-DEAD-REGISTERS! makes the target machine registers available
  again for REGISTER-ALIAS so that they can be chosen as targets.
  (However, they still won't be chosen as temporaries.)

- MOVE-TO-ALIAS-REGISTER! -- which may be used only after all other
  source registers have been chosen -- also allows the machine target
  to be used as a source alias in order to avoid unnecessary register
  motion.

- Don't propagate RTL references to available machine registers in
  common subexpression elimination or in code compression.

  Since the machine register might be allocated as an alias for another
  register, it can't be moved around.  The RTL generator ensures these
  references appear only at the beginning or end of a block where the
  machine register cannot be an alias for any live pseudo-register.

5 years agoEnsure register:value appears first or last in block.
Taylor R Campbell [Sun, 30 Dec 2018 21:01:58 +0000 (21:01 +0000)]
Ensure register:value appears first or last in block.

Either it is the first register referenced, or the last register
assigned.  This will enable us to use a machine register that is
normally available for register allocation, without having to worry
that it may be an alias for a live pseudo-register.

- In continuations that receive a value through register:value,
  create a temporary register and make the first instruction be an
  assignment of register:value to the temporary register, before the
  pop-extra.

  The RTL optimizer avoids propagating this alias so the assignment
  will stay in place, but later on, the LAP generator will take
  advantage of the alias to avoid generating additional unnecessary
  code.

- In returns that store a value in register:value, create a temporary
  register and assign it where we used to assign to register:value,
  and then store the temporary in register:value as the very last
  instruction before pop-return after any frame-popping which might
  involve temporaries.

5 years agoOptimize execute caches: avoid indirect jumps if possible.
Taylor R Campbell [Sun, 30 Dec 2018 21:26:16 +0000 (21:26 +0000)]
Optimize execute caches: avoid indirect jumps if possible.

No change to the compiled code interface: this just generates faster
code in execute caches if it can.

5 years agoGenerate per-invocation jmp instructions.
Taylor R Campbell [Sat, 29 Dec 2018 04:12:28 +0000 (04:12 +0000)]
Generate per-invocation jmp instructions.

I hypothesize that this will help the CPU's branch target predictor
be more precise than having a single jmp instruction inside an
assembly hook that actually jumps to an unknown procedure.

Empirically, this gives about a 5x speed improvement for a
microbenchmark involving unknown procedure calls:

(define x (make-vector 1))

(define (test-01 x)
  (define (g)
    (vector-set! x 0 0))
  (g)
  ((identity-procedure g)))

(define (test-10 x)
  (define (g)
    (vector-set! x 0 0))
  ((identity-procedure g))
  (g))

(define (repeat n f x)
  (show-time
   (lambda ()
     (do ((i 0 (fix:+ i 1)))
         ((fix:>= i n))
       (f x)))))

; Before:
(repeat 10000000 test-01 x)
;process time: 1420 (1370 RUN + 50 GC); real time: 1427

; After:
(repeat 10000000 test-01 x)
;process time: 290 (220 RUN + 70 GC); real time: 312

Caveat: This is on top of a bunch of other experiments.
XXX Redo this in isolation.

WARNING: This adds hooks to the amd64 compiled code interface, so new
compiled code requires a new microcode.  (However, a new microcode
should handle old compiled code just fine.)

5 years agoUse CALL/RET for pushing and returning to continuations on amd64.
Taylor R Campbell [Sat, 29 Dec 2018 01:11:00 +0000 (01:11 +0000)]
Use CALL/RET for pushing and returning to continuations on amd64.

Calls now look like:

  ;; (assign (register #x123) (cons-pointer tag (entry:continuation cont)))
  (CALL (@PCR pushed))
  (JMP (@PCR cont))
pushed:
  (OR Q (@R ,rsp) (&U ,tag))
  ...
  (JMP (@PCR uuo-link))

Returns now look like:

  ;; (pop-return)
  (AND Q (@R ,rsp) (R ,regnum:datum-mask))
  (RET)

These should happen in pairs, so that we can take advantage of the
CPU's return address branch target prediction stack rather than
abusing the indirect jump branch target predictor.
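
In C terms, the OR and AND in the sequences above do the following to the
word on top of the stack.  This is a sketch under an assumed layout (a
6-bit tag above a 58-bit datum) with hypothetical function names; it models
the bit manipulation only, not the CALL/RET pairing.

```c
#include <assert.h>
#include <stdint.h>

#define DATUM_BITS 58u  /* assumed layout: 6-bit tag above a 58-bit datum */
#define DATUM_MASK ((((uint64_t) 1) << DATUM_BITS) - 1)

/* Models the OR after CALL: tag the raw return address in place. */
uint64_t
tag_return_address (uint64_t address, unsigned tag)
{
  assert ((address & ~DATUM_MASK) == 0);
  assert (tag < 64);
  return address | ((uint64_t) tag << DATUM_BITS);
}

/* Models the AND before RET: strip the tag so RET sees a raw address. */
uint64_t
untag_return_address (uint64_t word)
{
  return word & DATUM_MASK;
}
```

The round trip is the point: whatever CALL pushed is what RET pops, so the
processor's return address predictor sees matching pairs.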

WARNING: This changes the amd64 compiled code interface, so new
compiled code requires a new microcode.  (A new microcode might be
able to handle existing compiled code just fine.)