Taylor R Campbell [Mon, 21 Jan 2019 23:37:32 +0000 (23:37 +0000)]
Sign-extend PC-relative branch target.
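For illustration, here is a minimal C sketch of the decoding involved. It assumes the target is an AArch64 B or BL instruction, and b_target_offset is a hypothetical name, not the microcode's. The 26-bit immediate counts 4-byte instruction words, so it must be sign-extended before it is scaled to bytes.

```c
#include <stdint.h>

/* Hypothetical helper: recover the signed byte displacement of an
   AArch64 B/BL instruction.  imm26 occupies bits [25:0] and counts
   4-byte instruction words.  */
static int64_t
b_target_offset (uint32_t insn)
{
  uint32_t imm26 = insn & 0x03ffffffu;
  /* Sign-extend from bit 25: flip the sign bit, then subtract it back.  */
  int64_t words = (int64_t) (imm26 ^ 0x02000000u) - 0x02000000;
  return words * 4;		/* instruction words -> bytes */
}
```

Without the sign extension, a backward branch such as 0x17ffffff would decode as a huge forward displacement instead of -4.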
Taylor R Campbell [Mon, 21 Jan 2019 22:39:29 +0000 (22:39 +0000)]
Fix indexing in MOVE-FRAME-UP code: objects, not bytes, here.
And with this, the cold load completes on aarch64!
Taylor R Campbell [Mon, 21 Jan 2019 22:39:11 +0000 (22:39 +0000)]
Fix large application setup.
Taylor R Campbell [Mon, 21 Jan 2019 20:59:02 +0000 (20:59 +0000)]
Teach cmpintmd to flush the instruction cache on aarch64.
Taylor R Campbell [Mon, 21 Jan 2019 20:53:14 +0000 (20:53 +0000)]
Fix argument to PUSH_D_CACHE_REGION.
Takes startptr/count, not startptr/endptr.
This was not an issue before because until aarch64, the only extant
port that even used this, i386, defined it as a macro that ignored its
arguments and flushed the entire cache.
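To make the start/count convention concrete, here is a minimal sketch that assumes a GCC- or Clang-compatible compiler. flush_icache_region is a hypothetical helper, not the real PUSH_D_CACHE_REGION. It takes a start pointer and a byte count; the real macro's count may use different units. It converts these to the begin/end pair expected by __builtin___clear_cache, the builtin these compilers provide for making freshly written code visible to instruction fetch on machines like aarch64.

```c
#include <stddef.h>

/* Hypothetical wrapper: START plus a byte count NBYTES, converted to
   the [begin, end) pair that __builtin___clear_cache takes.  */
static void
flush_icache_region (void *start, size_t nbytes)
{
  char *begin = start;
  /* Passing an end pointer where a count is expected would flush a
     wildly wrong range.  */
  __builtin___clear_cache (begin, begin + nbytes);
}
```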
Taylor R Campbell [Mon, 21 Jan 2019 19:06:38 +0000 (19:06 +0000)]
Fix branch instruction in uuo link stub.
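As a sketch of what emitting such a branch involves, here is an encoder for the AArch64 unconditional B instruction. encode_b is a hypothetical helper, and the stub's real layout is not shown here. The displacement is relative to the branch itself and must be word-aligned and within the 26-bit word range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoder for AArch64 B: opcode 000101 in bits [31:26],
   signed word displacement in imm26.  Returns false if OFFSET (bytes)
   is misaligned or outside the +/-128 MiB range.  */
static bool
encode_b (int64_t offset, uint32_t *insn)
{
  if (offset % 4 != 0)
    return false;
  int64_t words = offset / 4;
  if (words < -(INT64_C (1) << 25) || words >= (INT64_C (1) << 25))
    return false;
  *insn = 0x14000000u | ((uint32_t) words & 0x03ffffffu);
  return true;
}
```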
Taylor R Campbell [Mon, 21 Jan 2019 19:06:20 +0000 (19:06 +0000)]
Tweak read/write_compiled_closure_target for clarity and assertions.
Taylor R Campbell [Mon, 21 Jan 2019 19:06:02 +0000 (19:06 +0000)]
Fix cache-assignment code generation.
Taylor R Campbell [Mon, 21 Jan 2019 19:05:51 +0000 (19:05 +0000)]
Fix case.
Taylor R Campbell [Mon, 21 Jan 2019 01:20:14 +0000 (01:20 +0000)]
Fix LSR instruction encoding.
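For reference, LSR by an immediate on AArch64 is an alias of UBFM: `LSR Xd, Xn, #shift` is `UBFM Xd, Xn, #shift, #63`. The following is a minimal sketch of the 64-bit encoding. The helper name is hypothetical, and the back end's own encoder is written in Scheme.

```c
#include <stdint.h>

/* Hypothetical encoder: LSR Xd, Xn, #shift == UBFM Xd, Xn, #shift, #63.
   64-bit UBFM layout: sf=1 opc=10 100110 N=1 immr imms Rn Rd.  */
static uint32_t
encode_lsr_x (unsigned rd, unsigned rn, unsigned shift)
{
  return 0xd3400000u
    | ((shift & 0x3fu) << 16)	/* immr = shift */
    | (0x3fu << 10)		/* imms = 63 */
    | ((rn & 0x1fu) << 5)
    | (rd & 0x1fu);
}
```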
Taylor R Campbell [Mon, 21 Jan 2019 00:37:29 +0000 (00:37 +0000)]
Fix scale->shift.
Taylor R Campbell [Sun, 20 Jan 2019 21:36:42 +0000 (21:36 +0000)]
Fix read/write_compiled_closure_target.
Byte offsets, not object or instruction word offsets.
Taylor R Campbell [Sun, 20 Jan 2019 20:10:39 +0000 (20:10 +0000)]
Fix comment.
Taylor R Campbell [Sun, 20 Jan 2019 00:19:13 +0000 (00:19 +0000)]
Fix PC-relative calculations to work entirely in newspace.
Taylor R Campbell [Sun, 20 Jan 2019 00:18:55 +0000 (00:18 +0000)]
Fix read/write_compiled_closure_target offsets.
Taylor R Campbell [Sat, 19 Jan 2019 23:57:34 +0000 (23:57 +0000)]
Allow non-branch in cc_return_address_to_entry_address.
This happens for trampolines. Maybe this should be a special case.
Taylor R Campbell [Sat, 19 Jan 2019 23:57:08 +0000 (23:57 +0000)]
Fix scaling of PC offsets: they're byte offsets, not word offsets.
Taylor R Campbell [Sat, 19 Jan 2019 23:56:55 +0000 (23:56 +0000)]
Fix some symbol sizing.
Taylor R Campbell [Sat, 19 Jan 2019 23:56:45 +0000 (23:56 +0000)]
Tidy up interface_to_C.
Taylor R Campbell [Sat, 19 Jan 2019 23:56:31 +0000 (23:56 +0000)]
Note there is a way to do negative offsets.
Taylor R Campbell [Sat, 19 Jan 2019 22:43:03 +0000 (22:43 +0000)]
Make C_to_interface go through interface_to_scheme.
This way C_to_interface sets up VAL, which is necessary in case it is
invoking a continuation.
Taylor R Campbell [Sat, 19 Jan 2019 21:20:47 +0000 (21:20 +0000)]
Fix encoding of ROR and EXTR instructions.
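ROR by an immediate is an alias of EXTR with the same register as both sources: `ROR Xd, Xs, #shift` is `EXTR Xd, Xs, Xs, #shift`. The sketch below encodes the 64-bit form and models EXTR's semantics, which show why the alias is a rotation. Both helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical encoder: ROR Xd, Xs, #shift == EXTR Xd, Xs, Xs, #shift.
   64-bit EXTR layout: sf=1 op21=00 100111 N=1 o0=0 Rm imms Rn Rd.  */
static uint32_t
encode_ror_x (unsigned rd, unsigned rs, unsigned shift)
{
  return 0x93c00000u
    | ((rs & 0x1fu) << 16)	/* Rm = Rs */
    | ((shift & 0x3fu) << 10)	/* imms = shift */
    | ((rs & 0x1fu) << 5)	/* Rn = Rs */
    | (rd & 0x1fu);
}

/* Model of EXTR: take 64 bits starting at LSB from the concatenation
   N:M (N in the high half).  With N == M this is a rotate right.  */
static uint64_t
extr64 (uint64_t n, uint64_t m, unsigned lsb)
{
  lsb &= 63;
  return lsb == 0 ? m : (m >> lsb) | (n << (64 - lsb));
}
```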
Taylor R Campbell [Sat, 19 Jan 2019 20:51:56 +0000 (20:51 +0000)]
Load UARG2, don't clobber UARG1, in apply hooks.
Taylor R Campbell [Sat, 19 Jan 2019 20:51:44 +0000 (20:51 +0000)]
Fix calculation of hook instruction address.
Taylor R Campbell [Sat, 19 Jan 2019 18:33:01 +0000 (18:33 +0000)]
Fix order of arguments to load-tagged-immediate.
Taylor R Campbell [Sat, 19 Jan 2019 08:03:54 +0000 (08:03 +0000)]
Fix reversed byte order branches in read_uuo_frame_size.
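A hedged sketch of byte-order-independent decoding, assuming the word being read is an instruction word. AArch64 always fetches instructions little-endian, even when data accesses are big-endian, so assembling the bytes explicitly sidesteps any conditional on host byte order. read_insn_word is a hypothetical name, and exactly where read_uuo_frame_size finds the frame size is not stated here.

```c
#include <stdint.h>

/* Hypothetical helper: read a 32-bit AArch64 instruction word, which is
   always stored little-endian regardless of the data byte order.  */
static uint32_t
read_insn_word (const unsigned char *p)
{
  return (uint32_t) p[0]
    | ((uint32_t) p[1] << 8)
    | ((uint32_t) p[2] << 16)
    | ((uint32_t) p[3] << 24);
}
```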
Taylor R Campbell [Sat, 19 Jan 2019 08:03:41 +0000 (08:03 +0000)]
Fix extraction of PC offset from branch instruction.
Taylor R Campbell [Sat, 19 Jan 2019 08:02:50 +0000 (08:02 +0000)]
Fix format word padding and tweak block offsets.
We already arranged for all entries to be 64-bit aligned, so we might
as well take advantage of that in block offsets.
Taylor R Campbell [Fri, 18 Jan 2019 08:15:28 +0000 (08:15 +0000)]
Fix uuo link and trampoline instructions.
Taylor R Campbell [Fri, 18 Jan 2019 07:13:32 +0000 (07:13 +0000)]
Make interface_to_scheme match reality, not sensibility.
Should change cmpint.c so we pass a separate dispatch routine in for
entries and continuations, but that requires changing all the
cmpauxen at once.
Taylor R Campbell [Fri, 18 Jan 2019 07:13:15 +0000 (07:13 +0000)]
Compiler oughta agree with cmpauxmd about which register is the stack pointer.
Taylor R Campbell [Fri, 18 Jan 2019 07:03:11 +0000 (07:03 +0000)]
Simplify format words: make them always be instruction words.
No need for endianness conditionalization.
Taylor R Campbell [Fri, 18 Jan 2019 06:23:00 +0000 (06:23 +0000)]
Fix passage of dynamic-link. Only machine register, not regblock.
Taylor R Campbell [Fri, 18 Jan 2019 06:22:18 +0000 (06:22 +0000)]
Assert block offset is zero.
Taylor R Campbell [Wed, 16 Jan 2019 04:48:27 +0000 (04:48 +0000)]
Add a TODO.
Taylor R Campbell [Wed, 16 Jan 2019 04:47:27 +0000 (04:47 +0000)]
Teach ucode identify about aarch64.
Also make this always return a string here, so it doesn't crash on
boot if it hasn't been taught about new compiled code types.
Taylor R Campbell [Wed, 16 Jan 2019 04:47:13 +0000 (04:47 +0000)]
Save an instruction in multiplication with CSETM.
Taylor R Campbell [Wed, 16 Jan 2019 04:47:00 +0000 (04:47 +0000)]
Tweak some register numbering to reduce a bit of code.
Taylor R Campbell [Wed, 16 Jan 2019 04:46:17 +0000 (04:46 +0000)]
Fix register block indexing: no hooks in the register block here.
Taylor R Campbell [Tue, 15 Jan 2019 17:27:45 +0000 (17:27 +0000)]
Fix add/sub immediate syntax and criterion.
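For context, an AArch64 ADD/SUB (immediate) operand is a 12-bit unsigned value, optionally shifted left by 12. The following is a minimal sketch of that criterion. add_sub_imm_ok is a hypothetical name, and the back end's actual rule may also swap ADD and SUB to handle negative constants.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: can N be encoded as an ADD/SUB immediate?  */
static bool
add_sub_imm_ok (uint64_t n)
{
  if (n < 0x1000)
    return true;			/* imm12, LSL #0 */
  return (n & 0xfff) == 0 && (n >> 12) < 0x1000;	/* imm12, LSL #12 */
}
```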
Taylor R Campbell [Tue, 15 Jan 2019 16:37:11 +0000 (16:37 +0000)]
Use a temporary if necessary in AFFIX-TYPE.
Taylor R Campbell [Tue, 15 Jan 2019 16:29:02 +0000 (16:29 +0000)]
Draft aarch64 cmpauxmd.
Taylor R Campbell [Tue, 15 Jan 2019 03:48:25 +0000 (03:48 +0000)]
Fix push order in move-frame-up / dynamic-link.
Taylor R Campbell [Tue, 15 Jan 2019 03:20:21 +0000 (03:20 +0000)]
Fix some instruction syntax bugs.
- Specify target _and_ source -- we're not x86 here.
- Specify operand size.
- Specify multipliers correctly.
Taylor R Campbell [Tue, 15 Jan 2019 03:19:18 +0000 (03:19 +0000)]
Avoid REGISTER-COPY-IF-AVAILABLE and TEMPORARY-COPY-IF-AVAILABLE.
These give out register references, which are a pain. Just use
REUSE-PSEUDO-REGISTER-IF-AVAILABLE! to get the machine register
number.
Taylor R Campbell [Tue, 15 Jan 2019 03:18:32 +0000 (03:18 +0000)]
Disable floating-point vector primitives too.
Until we have open-coded floating-point arithmetic.
Taylor R Campbell [Tue, 15 Jan 2019 03:17:35 +0000 (03:17 +0000)]
Make RTL:CONSTANT-COST always return positive.
Otherwise CSE might substitute constants for registers where at best
it's not helpful and at worst we don't have rules for it.
Taylor R Campbell [Tue, 15 Jan 2019 03:15:35 +0000 (03:15 +0000)]
Fix up some instruction descriptions.
- Migrate some things with citations and updates to instr1.scm.
- No need for `(evaluation ,terms) in fixed-width instructions.
- Fix some missing or duplicated bits.
- Add some more instructions.
Taylor R Campbell [Tue, 15 Jan 2019 03:14:40 +0000 (03:14 +0000)]
Umptuple-check that instruction widths sum to multiples of 32 bits.
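The check itself lives in the Scheme assembler. As a rough illustration only, here is the same idea in C. widths_ok is a hypothetical name: the bit-field widths of a fixed-width instruction description must add up to a whole number of 32-bit words.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: do the N field widths sum to a nonzero multiple
   of 32 bits?  */
static bool
widths_ok (const unsigned *widths, size_t n)
{
  unsigned long total = 0;
  for (size_t i = 0; i < n; i++)
    total += widths[i];
  return total != 0 && total % 32 == 0;
}
```

For example, B is laid out as a 1-bit op, a 5-bit opcode, and a 26-bit immediate, which totals 32.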
Taylor R Campbell [Tue, 15 Jan 2019 03:12:46 +0000 (03:12 +0000)]
Put something in these stub files so they compile as code.
Otherwise the portable fasdumper barfs trying to fasdump a pathname.
Taylor R Campbell [Tue, 15 Jan 2019 03:12:25 +0000 (03:12 +0000)]
Update config.guess and config.sub so they recognize aarch64.
Taylor R Campbell [Tue, 15 Jan 2019 03:11:36 +0000 (03:11 +0000)]
Fix configure goo for aarch64 with byte order specified.
Taylor R Campbell [Tue, 15 Jan 2019 03:09:58 +0000 (03:09 +0000)]
Block offset units are instructions, not bytes, so we get two more bits.
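The arithmetic behind "two more bits" can be sketched in C. Offsets that are always multiples of 4 have two low zero bits, so storing them in 4-byte units lets the same field reach four times as far, and 8-byte units would gain three bits. encode_offset and its field_bits parameter are made up for illustration and do not reflect the real block-offset field width.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoder: store byte offset BYTES in a FIELD_BITS-wide
   field (FIELD_BITS < 64) in units of (1 << UNIT_SHIFT) bytes.  */
static bool
encode_offset (uint64_t bytes, unsigned unit_shift, unsigned field_bits,
	       uint64_t *field)
{
  uint64_t unit = UINT64_C (1) << unit_shift;
  if (bytes % unit != 0)
    return false;		/* misaligned: not representable */
  uint64_t units = bytes >> unit_shift;
  if ((units >> field_bits) != 0)
    return false;		/* too far for the field */
  *field = units;
  return true;
}
```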
Taylor R Campbell [Mon, 14 Jan 2019 07:43:42 +0000 (07:43 +0000)]
Various work to get this going.
Enough to compile and assemble advice.scm, the first file in the
runtime. Still a ways from doing anything.
Taylor R Campbell [Mon, 14 Jan 2019 07:44:17 +0000 (07:44 +0000)]
Teach assembler about MODULO.
XXX Should maybe do EUCLIDEAN-REMAINDER or the full gamut of division
operators, but this is all I need for now.
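For readers coming from C, note that Scheme's MODULO rounds the quotient toward negative infinity, so its result takes the sign of the divisor. C's `%` truncates, like Scheme's REMAINDER. EUCLIDEAN-REMAINDER, mentioned above, is never negative. The following is a minimal sketch of both, assuming d != 0 and no INT64_MIN / -1 overflow. The function names are illustrative.

```c
#include <stdint.h>

/* Scheme-style MODULO: result has the sign of D.  */
static int64_t
scheme_modulo (int64_t n, int64_t d)
{
  int64_t r = n % d;		/* truncating, i.e. REMAINDER */
  if (r != 0 && ((r < 0) != (d < 0)))
    r += d;
  return r;
}

/* Euclidean remainder: always in [0, |D|).  */
static int64_t
euclidean_remainder (int64_t n, int64_t d)
{
  int64_t r = n % d;
  if (r < 0)
    r += (d < 0) ? -d : d;
  return r;
}
```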
Taylor R Campbell [Mon, 14 Jan 2019 07:44:05 +0000 (07:44 +0000)]
Report bad expressions more clearly.
Taylor R Campbell [Sun, 13 Jan 2019 22:52:06 +0000 (22:52 +0000)]
Fill in some more files, add some build goo, fix some bugs.
Invent a way to do assembler macros so we can do legible branch
tensioning rules and reuse ADRP/ADD patterns.
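As a rough illustration of the kind of decision a branch-tensioning rule makes, here is a sketch in C. The real rules are assembler macros in Scheme, and the names here are hypothetical. AArch64 B.cond has a 19-bit word displacement (about +/-1 MiB). When that is too short, one standard fallback is to invert the condition and skip over an unconditional B, which has a 26-bit displacement (about +/-128 MiB).

```c
#include <stdint.h>

enum cond_branch_form { BCOND_SHORT, BCOND_INVERTED_OVER_B, OUT_OF_RANGE };

static int
fits_signed (int64_t words, unsigned bits)
{
  int64_t lim = INT64_C (1) << (bits - 1);
  return -lim <= words && words < lim;
}

/* OFFSET is the byte distance from the conditional branch to its
   target, assumed 4-byte aligned.  */
static enum cond_branch_form
choose_cond_branch (int64_t offset)
{
  int64_t words = offset / 4;
  if (fits_signed (words, 19))
    return BCOND_SHORT;
  /* The B sits one instruction after the inverted B.cond, so its own
     displacement is one word shorter.  */
  if (fits_signed (words - 1, 26))
    return BCOND_INVERTED_OVER_B;
  return OUT_OF_RANGE;
}
```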
Taylor R Campbell [Sun, 13 Jan 2019 06:08:23 +0000 (06:08 +0000)]
Draft aarch64 back end.
Nowhere near completion yet, long TODO list, not compile-tested, &c.
Not sure if I'll find any more copious spare time to work on this for
a while.
Taylor R Campbell [Tue, 20 Aug 2019 03:40:24 +0000 (03:40 +0000)]
Fix multiplication and division by purely imaginary numbers.
That is, complex numbers whose real part is exact zero.
Taylor R Campbell [Tue, 20 Aug 2019 03:13:51 +0000 (03:13 +0000)]
Test multiplication and division by +i and -i.
We do not currently follow Kahan's recommendations that when the real
part is exactly zero, the arithmetic be done by negation rather than
multiplication.
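To show why the exact-zero real part matters, here is a minimal sketch in C. C has no exact zero, so a separate function whose signature says "real part is exactly zero" stands in for it, and the names are illustrative. The generic formula multiplies an infinite component by zero and produces NaN. Treating y = di specially reduces x*y to a swap plus a negation and a multiplication by d, so no zero is ever multiplied in.

```c
#include <math.h>

typedef struct { double re, im; } cplx;

/* Textbook (a + bi)(c + di) = (ac - bd) + (ad + bc)i.  */
static cplx
mul_generic (cplx x, cplx y)
{
  cplx z = { x.re * y.re - x.im * y.im, x.re * y.im + x.im * y.re };
  return z;
}

/* Multiply by y = 0 + d i, where the real part is exactly zero:
   (a + bi)(di) = -bd + adi.  */
static cplx
mul_by_imaginary (cplx x, double d)
{
  cplx z = { -(x.im * d), x.re * d };
  return z;
}
```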
Taylor R Campbell [Tue, 20 Aug 2019 03:03:25 +0000 (03:03 +0000)]
Fix edge cases in ANGLE.
Taylor R Campbell [Tue, 20 Aug 2019 02:51:27 +0000 (02:51 +0000)]
Expand edge cases for ANGLE.
Based on Kahan's `Much Ado about Nothing's Sign Bit' paper. We screw
up some zero edge cases.
Chris Hanson [Mon, 19 Aug 2019 22:33:00 +0000 (15:33 -0700)]
Fix references incorrectly marked with EVR().
Taylor R Campbell [Sat, 17 Aug 2019 13:54:34 +0000 (13:54 +0000)]
`x ... ...' is busted in syntax-rules.
Taylor R Campbell [Fri, 16 Aug 2019 05:02:00 +0000 (05:02 +0000)]
Merge branch 'riastradh-20181220-closentry-v12'
Taylor R Campbell [Fri, 16 Aug 2019 04:59:52 +0000 (04:59 +0000)]
Tweak logit1/2+ condition number plot for clarity.
Taylor R Campbell [Fri, 16 Aug 2019 03:54:49 +0000 (03:54 +0000)]
Factor out common PostScript code for plotting.
Should make this a little more maintainable.
Taylor R Campbell [Fri, 16 Aug 2019 02:54:44 +0000 (02:54 +0000)]
Uniform code and style for plots.
Tweak line widths a little bit to roughly match cmmi10 (Computer
Modern Math Italic 10pt) rule widths for axes, and a little thicker
for the plots themselves, for the printed manual.
Taylor R Campbell [Fri, 16 Aug 2019 02:51:41 +0000 (02:51 +0000)]
Produce 300dpi, not 72dpi, PNGs for HTML output.
Arthur A. Gleckler [Thu, 15 Aug 2019 20:17:00 +0000 (13:17 -0700)]
Use TLS/SSL for links to <srfi.schemers.org>.
Taylor R Campbell [Thu, 15 Aug 2019 14:24:35 +0000 (14:24 +0000)]
Add release note.
Taylor R Campbell [Thu, 15 Aug 2019 05:19:18 +0000 (05:19 +0000)]
Bump COMPILER_INTERFACE_VERSION.
Make attempts to use old .com files fail a little more obviously.
Taylor R Campbell [Thu, 15 Aug 2019 04:57:56 +0000 (04:57 +0000)]
Set default target to all for cross-builds too.
No need to make it default to cross-host. If you want to separate
the cross-host/cross-target stages, you'll know to do cross-host
anyway.
Taylor R Campbell [Thu, 15 Aug 2019 04:45:27 +0000 (04:45 +0000)]
Avoid spurious fallthrough (fortunately harmless here).
Taylor R Campbell [Wed, 14 Aug 2019 01:31:56 +0000 (01:31 +0000)]
Test fma exceptions too.
Taylor R Campbell [Tue, 13 Aug 2019 23:25:14 +0000 (23:25 +0000)]
Add fma, fused-multiply/add.
Not yet open-coded anywhere. Will be a huge pain on x86. No aarch64
flonum open-coding at all yet.
(Maybe flo:fast-fma? should return false if it's not open-coded...)
Taylor R Campbell [Sun, 6 Jan 2019 03:59:31 +0000 (03:59 +0000)]
Use a different reflect code number for compiled invocations.
Teach the continuation parser about it.
Turns out this doesn't actually coincide with the format the v8
microcode used for APPLY-COMPILED, which also has a frame size,
presumably so arity dispatch could be done in the callee.
(Not that the v8 stuff matters these days; maybe we should just flush
those parts of conpar.scm.)
Taylor R Campbell [Sat, 5 Jan 2019 15:53:23 +0000 (15:53 +0000)]
Open-code WITH-STACK-MARKER too.
Saves a trip through reflect-to-interface, which would break the
return address branch target prediction stack.
Taylor R Campbell [Sat, 5 Jan 2019 06:31:35 +0000 (06:31 +0000)]
Share closure interrupt labels.
The interrupt-handling subroutine just uses the tagged entry on the
stack, so no need for a separate call for each closure. If nothing
else this should save some code size.
Also, in open-coding of with-interrupt-mask, reuse pop-return with
interrupt checks.
Taylor R Campbell [Sat, 5 Jan 2019 03:36:51 +0000 (03:36 +0000)]
Tidy up compiler utility return addresses.
Use compiled returns for the ones that are likely to return to Scheme
like lookups and assignments, and compiled entries for the ones that
are likely to return to microcode like interrupts.
Architectures on which compiled entries and compiled returns have the
same format will see no difference: compiled code passes in an
untagged return address either way.
On amd64, where compiled entries and compiled returns are different:
- For hooks that act like leaf subroutines and never return to
  microcode, use plain CALL/RET in pairs.
- For hooks that are subroutines likely to return to Scheme
  immediately but might return to microcode in screw cases, use
      (CALL ,hook)                  ; Invoke hook with untagged ret addr...
      (JMP (@PCR ,continuation))    ; ...which jumps to formatted entry.
      (WORD ...)
      (BLOCK-OFFSET ,continuation)
      (QUAD U 0)
      (LABEL ,continuation)
      ...                           ; continuation instructions
  For the non-screw cases this keeps CALL/RET paired.
- For hooks that always defer to microcode, namely to handle
  interrupts, use
      (LEA Q (R ,rbx) (@PCR ,continuation))
      (JMP ,hook)
  Here it doesn't really matter whether the CALL/RET is paired because
  we're going to wreck the return address branch prediction stack no
  matter what, but it is convenient to have the entry address rather
  than the return address in the compiled utility.
Taylor R Campbell [Fri, 4 Jan 2019 04:58:51 +0000 (04:58 +0000)]
Use ret for returns from interface and from generic arithmetic hooks.
Let's take advantage of the return address stack branch target
predictor rather than unceremoniously trash it, shall we?
Taylor R Campbell [Thu, 3 Jan 2019 19:10:45 +0000 (19:10 +0000)]
Open-code with-interrupt-mask, with-interrupts-reduced.
Not open-coded at the RTL level, but at the LAP level.
This way we avoid going through a return trampoline, which wrecks the
return address stack branch target predictor as long as we transition
between Scheme and C to handle trampolines.
Most of the work, of munging MEMTOP and STACK_GUARD, is relegated to
an assembly hook subroutine so the code doesn't expand too much. The
format of the stack still uses reflect-to-interface so that this
should require no changes to the continuation parser to get the
interrupt masks right, but with an intermediate empty-frame
continuation that actually calls the assembly hook and then pops
reflect-to-interface off.
Taylor R Campbell [Thu, 3 Jan 2019 03:19:54 +0000 (03:19 +0000)]
Allow return_to_compiled_code to return to compiled entries.
The earlier compiled entry/return split left various utility calls
pushing compiled entries, rather than compiled return addresses, for
continuations on the stack -- notably interrupt routines, the linker
utility, and interpreter calls.
I arranged for these all to use RETURN_TO_SCHEME_ENTRY (or
JUMP_TO_CC_ENTRY), but missed one spot: the continuations constructed
by STACK-FRAME->CONTINUATION, which use return_to_compiled_code,
which in turn expected a compiled return rather than a compiled entry
and choked.
The interrupt routines, linker utility, and interpreter calls should
all be adapted to take returns rather than entries (which is another
ABI-breaking flag day), but this will do for now.
Taylor R Campbell [Wed, 2 Jan 2019 23:44:09 +0000 (23:44 +0000)]
Save interpreter result too before anything in continuation.
On x86, the interpreter call result register is eax/rax, register 0,
which is also the first register we hand out for register allocation.
The continuation for an interpreter call result uses register 0, but
if the caller uses a dynamic link, the continuation first pops its
frame via the dynamic link...using a temporary register that is
guaranteed to be register 0 since it's the first one the register
allocator hands out. The code sequence looks something like this:
    ;; (interpreter-call:cache-reference label-10 (register #x24) #f)
      (mov q (r 2) (r 1))
      (call (@ro 6 #xd0))
    ;; (continuation-entry label-10)
      (word u #xfffc)
      (block-offset label-10)
    label-10:
    ;; (assign (register #x25) (post-increment (register 4) 1))
      (pop q (r 0))
    ;; (assign (register #x26) (object->address (register #x25)))
      (and q (r 0) (r 5))
    ;; (assign (offset (register 6) (machine-constant 4)) (register #x26))
      (mov q (@ro 6 #x20) (r 0))
    ;; (assign (register #x23) (register 0))
      (jmp (@pcr label-13))
On entry to the continuation, register 0 holds the value we want,
chosen as a machine alias for pseudo-register #x23 in the procedure
body, but the first thing the continuation does is pop the dynamic
link into register 0, ruining the party.
This is rather tricky to trigger because it turns out in _non-error_
cases, compiled code never asks the interpreter to evaluate a cache
reference that will return a value. But you can trigger this by
referencing an unassigned variable and invoking a restart, which does
cause the cache reference to return a value:
    ;; Unassigned, so compiled code will ask interpreter for help.
    (define null)
    ;; Recursive procedure for which the compiler uses a dynamic link.
    (define (map f l)
      (let loop ((l l))
        (if (pair? l)
            (cons (f (car l)) (loop (cdr l)))
            null)))
    ;; Invoke the restart that will return from the cache reference with
    ;; a value.
    (bind-condition-handler (list condition-type:unassigned-variable)
        (lambda (condition)
          condition
          (use-value '()))
      (lambda ()
        (map + '(1 2 3))))
    ;Value: (1 2 3 . #[false 15 #xea9c18])
Here #[false 15 #xea9c18] is the (detagged) dynamic link, a pointer
into the stack, not the result we wanted at all.
Taylor R Campbell [Mon, 31 Dec 2018 21:08:22 +0000 (21:08 +0000)]
Make entries point to _after_ the PC offset.
This saves a jump in closure headers, and makes non-closure entries
have a nice PC offset of 0 rather than an awkward PC offset of 8.
However, this causes all indirect calls to have an additional offset
of -8 in the addressing mode -- not clear yet how much this hurts.
WARNING: This changes the amd64 compiled code interface so that new
compiled code requires a new microcode and vice versa. Further, you
must set compiler:cross-compiling? to #t to compile the system,
because compiled code block offsets are now in a different place
relative to compiled entries, so the native fasdumper of an old
microcode can't handle compiled entries produced by a new compiler.
Taylor R Campbell [Wed, 2 Jan 2019 06:10:52 +0000 (06:10 +0000)]
Load the fallback into rax so caller needs no conditional branch.
WARNING: This changes the amd64 compiled code interface so that new
compiled code requires a new microcode. (However, a new microcode
should handle old compiled code without trouble, since old compiled
code treats rax as garbage at this point, and LEA does not affect
flags.)
Taylor R Campbell [Mon, 31 Dec 2018 20:32:37 +0000 (20:32 +0000)]
Use BTS to affix single-bit type tags.
Taylor R Campbell [Sat, 29 Dec 2018 21:57:47 +0000 (21:57 +0000)]
Relax register constraints for tagging rule.
No need to keep the source alive here -- use move-to-target and allow
any temporary register instead.
Taylor R Campbell [Sat, 29 Dec 2018 16:40:02 +0000 (16:40 +0000)]
Simplify hook calls.
No need for CALL or the stack to be involved: just load PC-relative
address into RBX directly. Should shave off a few bytes of code.
WARNING: This changes the amd64 compiled code interface so that new
compiled code requires a new microcode and vice versa.
Taylor R Campbell [Sun, 30 Dec 2018 21:28:19 +0000 (21:28 +0000)]
Convert x86-64 to use rax as value register.
WARNING: This changes the amd64 compiled code interface so that new
compiled code requires a new microcode and vice versa.
Taylor R Campbell [Sun, 30 Dec 2018 21:01:00 +0000 (21:01 +0000)]
Allow careful use of available machine registers in RTL.
This will enable us to put fixed machine registers such as the value
register carefully into the RTL even if they are ordinarily available
as pseudo-register aliases for machine register allocation.
- CGEN-RINST calls TARGET-MACHINE-REGISTER! if the target of an RTL
  instruction is a machine register that is ordinarily available for
  register allocation.
- REGISTER-ALIAS declines to return any aliases reserved by
  TARGET-MACHINE-REGISTER!, until...
- DELETE-DEAD-REGISTERS! makes the target machine registers available
  again for REGISTER-ALIAS so that they can be chosen as targets.
  (However, they still won't be chosen as temporaries.)
- MOVE-TO-ALIAS-REGISTER! -- which may be used only after all other
  source registers have been chosen -- also allows the machine target
  to be used as a source alias in order to avoid unnecessary register
  motion.
- Don't propagate RTL references to available machine registers in
  common subexpression elimination or in code compression.
Since the machine register might be allocated as an alias for another
register, it can't be moved around. The RTL generator ensures these
references appear only at the beginning or end of a block where the
machine register cannot be an alias for any live pseudo-register.
Taylor R Campbell [Sun, 30 Dec 2018 21:01:58 +0000 (21:01 +0000)]
Ensure register:value appears first or last in block.
Either it is the first register referenced, or the last register
assigned. This will enable us to use a machine register that is
normally available for register allocation, without having to worry
that it may be an alias for a live pseudo-register.
- In continuations that receive a value through register:value,
  create a temporary register and make the first instruction be an
  assignment of register:value to the temporary register, before
  the pop-extra.
  The RTL optimizer avoids propagating this alias so the assignment
  will stay in place, but later on, the LAP generator will take
  advantage of the alias to avoid generating additional unnecessary
  code.
- In returns that store a value in register:value, create a temporary
  register and assign it where we used to assign to register:value,
  and then store the temporary in register:value as the very last
  instruction before pop-return, after any frame-popping that might
  involve temporaries.
Taylor R Campbell [Sun, 30 Dec 2018 21:26:16 +0000 (21:26 +0000)]
Optimize execute caches: avoid indirect jumps if possible.
No change to the compiled code interface: this just generates faster
code in execute caches if it can.
Taylor R Campbell [Sat, 29 Dec 2018 04:12:28 +0000 (04:12 +0000)]
Generate per-invocation jmp instructions.
I hypothesize that this will help the CPU's branch target predictor
be more precise than having a single jmp instruction inside an
assembly hook that actually jumps to an unknown procedure.
Empirically, this gives about a 5x speed improvement for a
microbenchmark involving unknown procedure calls:
    (define x (make-vector 1))
    (define (test-01 x)
      (define (g)
        (vector-set! x 0 0))
      (g)
      ((identity-procedure g)))
    (define (test-10 x)
      (define (g)
        (vector-set! x 0 0))
      ((identity-procedure g))
      (g))
    (define (repeat n f x)
      (show-time
       (lambda ()
         (do ((i 0 (fix:+ i 1)))
             ((fix:>= i n))
           (f x)))))
    ; Before:
    (repeat 10000000 test-01 x)
    ;process time: 1420 (1370 RUN + 50 GC); real time: 1427
    ; After:
    (repeat 10000000 test-01 x)
    ;process time: 290 (220 RUN + 70 GC); real time: 312
Caveat: This is on top of a bunch of other experiments.
XXX Redo this in isolation.
WARNING: This adds hooks to the amd64 compiled code interface, so new
compiled code requires a new microcode. (However, a new microcode
should handle old compiled code just fine.)
Taylor R Campbell [Sat, 29 Dec 2018 01:11:00 +0000 (01:11 +0000)]
Use CALL/RET for pushing and returning to continuations on amd64.
Calls now look like:
      ;; (assign (register #x123) (cons-pointer tag (entry:continuation cont)))
      (CALL (@PCR pushed))
      (JMP (@PCR cont))
    pushed:
      (OR Q (@R ,rsp) (&U ,tag))
      ...
      (JMP (@PCR uuo-link))
Returns now look like:
      ;; (pop-return)
      (AND Q (@R ,rsp) (R ,regnum:datum-mask))
      (RET)
These should happen in pairs, so that we can take advantage of the
CPU's return address branch target prediction stack rather than
abusing the indirect jump branch target predictor.
WARNING: This changes the amd64 compiled code interface, so new
compiled code requires a new microcode. (A new microcode might be
able to handle existing compiled code just fine.)
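Here is a minimal sketch of the tag/untag round trip these sequences rely on. It assumes a 64-bit object layout with a 6-bit type code above a 58-bit datum, and the constants, tag value, and helper names are illustrative. Conceptually, the OR in the push sequence affixes a type code to the pushed return address, and the AND with the datum mask strips it again so that RET sees the raw address.

```c
#include <stdint.h>

/* Assumed layout: 6-bit type code in the top bits, 58-bit datum.  */
#define DATUM_BITS 58
#define DATUM_MASK ((UINT64_C (1) << DATUM_BITS) - 1)

/* Affix type code TAG to a raw return address (cf. the OR).  */
static uint64_t
tag_return_address (uint64_t addr, unsigned tag)
{
  return addr | ((uint64_t) tag << DATUM_BITS);
}

/* Strip the type code again before returning (cf. the AND).  */
static uint64_t
untag (uint64_t obj)
{
  return obj & DATUM_MASK;
}
```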
Taylor R Campbell [Fri, 28 Dec 2018 20:51:02 +0000 (20:51 +0000)]
Split compiled entries and compiled return addresses.
Reallocate tag 4 for return addresses.
This way, a compiled entry can be a pointer to a PC offset so that
we can construct closures without dynamically generating code and
wrecking the instruction cache, while a compiled return address can
be a pointer to a PC, since we never dynamically create indirections
for returns.
For now, the runtime can handle both tags for return addresses.
XXX Only done and tested on x86-64 for now. Other architectures need
to be tested. Might be worthwhile to do this on i386 too, if anyone
still cares about i386.
WARNING: This changes the compiled code interface on all
architectures, so you'll have to build a new compiler running on an
old microcode and use that to compile a new system afresh.
Taylor R Campbell [Thu, 27 Dec 2018 03:58:38 +0000 (03:58 +0000)]
Use indirection for entry points on amd64.
A compiled entry is now a tagged address A pointing to a 64-bit word
W such that A + W points to the instruction to execute.
This adds a memory indirection overhead to unknown procedure calls,
but it has the effect that consing a closure only involves writing
data memory, not instruction memory that must be reloaded into the
CPU's instruction cache.
WARNING: This changes the amd64 compiled code interface, so you'll
have to build a new compiler running on an old microcode and use that
to compile a new system afresh.
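A minimal sketch of the indirection described above, using offsets into a single byte array as stand-ins for real addresses. This is an illustration, not the microcode's code, and the names are hypothetical. An entry at A holds W, execution continues at A + W, and building a closure only requires writing W (data), never instructions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static unsigned char heap[256];	/* stand-in for memory */

/* Make the entry at offset A point at code at offset CODE: W = CODE - A.  */
static void
write_entry (size_t a, size_t code)
{
  int64_t w = (int64_t) code - (int64_t) a;
  memcpy (heap + a, &w, sizeof w);
}

/* Where does the entry at offset A jump?  A + W.  */
static size_t
entry_target (size_t a)
{
  int64_t w;
  memcpy (&w, heap + a, sizeof w);
  return (size_t) ((int64_t) a + w);
}
```

Two closures at different addresses can share the same code just by storing different W values.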
Taylor R Campbell [Thu, 20 Dec 2018 04:58:07 +0000 (04:58 +0000)]
Avoid CALL without RET for closure entries, hooks, and trampolines.
A CALL without a matching RET wrecks the CPU's return address branch
target predictor.
This is an intermediate change en route to using paired CALL/RET for
continuation pushes and pop-returns in order to take advantage of the
CPU's branch target predictor.
WARNING: This changes the format of compiled closures, and as such,
new compiled code requires a new microcode and vice versa.
Taylor R Campbell [Thu, 27 Dec 2018 03:58:50 +0000 (03:58 +0000)]
Eliminate return/entry compiled invocation pun.
There is a small cost to this. My hope is that it will be offset by:
1. distinguishing compiled entries from compiled return addresses, in
   order to enable...
2. using indirection for compiled entries so closures don't need
   dynamically generated code, and finally...
3. using direct instruction addresses for compiled return addresses
   so we can exploit the return stack branch predictor,
all of which require this change in order to function correctly.
No change to compiled code interface intended.
Taylor R Campbell [Tue, 13 Aug 2019 14:26:13 +0000 (14:26 +0000)]
Use ln -n to avoid following symlinks to directories.
Evidently -h is the BSD option for `don't follow symlinks', following
the convention of chmod and other utilities, while -n is the GNU
option; fortunately at least NetBSD ln has had -n too for GNU
compatibility for decades so I'm satisfied with -n. (Neither one is
POSIX.)