Chris Hanson [Tue, 24 Sep 2019 07:35:38 +0000 (00:35 -0700)]
Change record types to be immutable.
The primary reason for this is to make the predicates slightly faster,
eliminating the need to look up the type markers in the predicates.
Additionally, make-record-type now accepts these additional options using a
keyword list. The define-record-type macro has not been updated to support
these new options, but that will come soon. Consequently the files using these
options have been modified to use make-record-type directly.
Finally, a small tweak was needed so that multiple values are available earlier
in the cold load.
The primary advantage of this layout is that the record predicate runs in
constant time, whereas in the previous design it could take time linear in the
depth of the parent chain.
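
As a rough illustration of the constant-time claim, here is a minimal C sketch
that contrasts walking the parent chain with indexing a per-type ancestor
table. This is not the runtime's actual layout: the struct, its fields,
MAX_DEPTH, and all function names are hypothetical and exist only for this
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 8   /* hypothetical limit, for the sketch only */

/* Hypothetical type descriptor: a parent link plus a table of every
   type on the parent chain, root first, ending with the type itself. */
struct record_type {
    const struct record_type *parent;
    size_t depth;                                    /* 0 for a root type */
    const struct record_type *ancestors[MAX_DEPTH];  /* ancestors[depth] == self */
};

/* Initialize `type` as a child of `parent`, or as a root if parent is NULL. */
static void record_type_init(struct record_type *type,
                             const struct record_type *parent)
{
    type->parent = parent;
    type->depth = parent ? parent->depth + 1 : 0;
    assert(type->depth < MAX_DEPTH);
    for (size_t i = 0; i < type->depth; i++)
        type->ancestors[i] = parent->ancestors[i];
    type->ancestors[type->depth] = type;
}

/* Chain-walking predicate: cost grows with the depth of the parent chain. */
static bool is_a_linear(const struct record_type *type,
                        const struct record_type *target)
{
    for (const struct record_type *t = type; t != NULL; t = t->parent)
        if (t == target)
            return true;
    return false;
}

/* Table-indexing predicate: one bounds check and one comparison,
   regardless of how deep the parent chain is. */
static bool is_a_constant(const struct record_type *type,
                          const struct record_type *target)
{
    return target->depth <= type->depth
        && type->ancestors[target->depth] == target;
}
```

Both predicates give the same answers; only the second does a fixed amount of
work per call.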
In addition, a number of record operations have been bummed for slightly better
performance, and the layout of record types has been altered to keep track of
the type information in a way that's better organized for generating the record
operations.
There are some behavioral changes:
* This implementation is slightly incompatible with SRFI 131, since it prohibits
  a child from having a field name that's the same as one of its ancestors'
  field names.  I'll probably change this for compatibility.
* Only a root record type can have an applicability method, and that method is
  called for all sub-types of that root type.  Arguably this is reasonable
  behavior.
* Non-root fasdumpable records must have proxy markers for all of their
  component types.  Previously, only the record type stored in slot 0 needed to
  have a fasdumpable proxy.  This isn't an immediate issue since fasdumpable
  records are used very sparingly at the moment and probably won't be supported
  outside of the runtime system.
This did not manifest in my testing on NetBSD because it happened
that on NetBSD, the tospace and newspace are always separated by more
than 4 GB, so the bogus jmprel32_offset was never used during GC,
e.g.:
Open-code flonum-fma (fused multiply-add) on aarch64.
The fused multiply-subtract doesn't kick in right now for reasons I
don't understand in rcompr.scm; maybe someone who understands that
code better can help.
Cache cleared exceptions to prevent SIGFPE loop on trap.
fesetenv, as used by fixup_float_environment at the top of Interpret,
will trap any trapped and raised exceptions in the floating-point
environment it is restoring, which is bad news during a trap.
Not really sure how this managed to work in the past...
Teach continuation parser about last return code offsets.
This fixes a thirty-year-old (!) bug with creating continuations that
return into compiled code with #f as the last return code offset for
reenter-compiled-code. Manifests only with debugging enabled.
Fix units for cc_entry_to_block_offset/cc_return_to_block_offset.
It would make more sense for the compiler to generate debug data
labels in instruction units, but this is a simpler change and is what
was done in the past on machines like mips with 32-bit aligned
instructions.
Caller is interested in exceptions afterward, so it is not sensible
to deregister interest in the floating-point environment afterward.
If you really want that, surround it in flo:preserving-environment.
We can't mark everywhere the cache needs to be invalidated --
i.e., every floating-point instruction -- and it's not clear
there's any performance benefit to the cache anyway. The main
performance cost, as I recall, was swapping environments on every
thread switch, which we avoid for all threads in the default
environment.
2. The default environment initialization left the machine in a wacky
state after reset-package!, which caused many spurious exception
traps once I undid the cache. There's no need to preserve the
machine environment here; we are setting up the default
environment, after all, so the environment we're in when done
should be the default one.
Comment out bogus ADRP-ADD pseudo-instruction definition.
This will cause the compiler to fail noisily if it tries to assemble
code with sufficiently distant PC-relative addresses, which is better
than silently assembling garbage.
Apparently ADRP really does do Rd <- (PC & ~0xfff) + (imm << 12), not
PC + (imm << 12), which means it's gonna cause some trouble for the
assembler in LIAR, since it means the code needs to know its own
offset within a page of memory and the target's offset within a page
of memory.
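
To make the page arithmetic concrete, here is a small C sketch of how an
ADRP/ADD pair's operands depend on the page offsets of both the instruction and
the target. It is not LIAR's assembler code: the struct and function names are
made up for illustration, and only the ADRP formula quoted above comes from the
architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two immediates of an ADRP + ADD pair. */
struct adrp_add {
    int64_t  page_delta;  /* signed 21-bit ADRP immediate, in 4 KiB pages */
    uint32_t low12;       /* unsigned 12-bit immediate for the ADD */
};

/* Compute operands such that
     ADRP Rd, page_delta    ; Rd <- (pc & ~0xfff) + (page_delta << 12)
     ADD  Rd, Rd, low12
   leaves `target` in Rd.  Returns false if the target is out of range. */
static bool adrp_add_operands(uint64_t pc, uint64_t target,
                              struct adrp_add *out)
{
    /* The delta is between page numbers, not between byte addresses. */
    int64_t delta = (int64_t)(target >> 12) - (int64_t)(pc >> 12);
    if (delta < -(INT64_C(1) << 20) || delta >= (INT64_C(1) << 20))
        return false;
    out->page_delta = delta;
    out->low12 = (uint32_t)(target & 0xfff);
    return true;
}

/* Model of the value the ADRP + ADD pair produces at address `pc`. */
static uint64_t adrp_add_result(uint64_t pc, struct adrp_add ops)
{
    assert(ops.low12 < 0x1000);
    uint64_t base = pc & ~UINT64_C(0xfff);
    return base + ((uint64_t)ops.page_delta << 12) + ops.low12;
}
```

For example, with pc = 0x10000ffc and target = 0x10001004 the two addresses are
only 8 bytes apart, yet the ADRP immediate must be 1 because they lie on
different pages. An assembler that computes (target - pc) >> 12 would emit 0.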
This was not an issue before because, until aarch64, the only extant port
that even used this was i386, where it was a macro that ignored the argument
and flushed the entire cache.