Generate per-invocation jmp instructions.

Hypothesis: giving each call site its own jmp instruction lets the
CPU's branch target predictor be more precise than when every jump to
an unknown procedure goes through a single jmp instruction inside an
assembly hook.

Empirically, this gives about a 5x speedup in process time (1420 to
290) on a microbenchmark involving unknown procedure calls:

    (define x (make-vector 1))

    ;; Direct call to g first, then an unknown-procedure call.
    (define (test-01 x)
      (define (g)
        (vector-set! x 0 0))
      (g)
      ((identity-procedure g)))

    ;; Same two calls in the opposite order.
    (define (test-10 x)
      (define (g)
        (vector-set! x 0 0))
      ((identity-procedure g))
      (g))

    ;; Call (f x) n times and print the elapsed time.
    (define (repeat n f x)
      (show-time
       (lambda ()
         (do ((i 0 (fix:+ i 1)))
             ((fix:>= i n))
           (f x)))))

    ; Before:
    (repeat 10000000 test-01 x)
    ;process time: 1420 (1370 RUN + 50 GC); real time: 1427

    ; After:
    (repeat 10000000 test-01 x)
    ;process time: 290 (220 RUN + 70 GC); real time: 312
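
test-10 is defined above but no timing is recorded for it. As a
sketch, assuming the definitions above are already loaded, it can
presumably be measured with the same harness (no result is claimed
here):

    ; Not measured in this commit; same harness, other call order.
    (repeat 10000000 test-10 x)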

Caveat: These timings were taken on top of a bunch of other
experiments.  XXX Redo this in isolation.

WARNING: This adds hooks to the amd64 compiled code interface, so
newly compiled code requires a new microcode.  (However, a new
microcode should handle old compiled code just fine.)