Changed the instruction sequence for procedure return (and computed
jump). The code that clears the type code from a continuation now
loads the value into a register instead of modifying it in place on
the stack.

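A minimal Python model of the data-flow difference may help. It is not the actual assembly, and it cannot show the Pentium Pro timing effect. It assumes a hypothetical 32-bit object word with a 6-bit type code in the high bits and the return address in the low 26 bits. The names `make_object`, `return_old`, and `return_new` are illustrative only. The model shows that the old sequence writes the cleaned value back to the stack slot before using it, while the new one leaves that memory untouched.

```python
# Hypothetical layout (assumption): 32-bit word, 6-bit type code in the
# high bits, 26-bit datum (here, the return address) in the low bits.
TYPE_CODE_BITS = 6
DATUM_BITS = 32 - TYPE_CODE_BITS
DATUM_MASK = (1 << DATUM_BITS) - 1


def make_object(type_code, datum):
    """Pack a type code and a datum into one tagged word."""
    return (type_code << DATUM_BITS) | (datum & DATUM_MASK)


def return_old(stack):
    """Old style: clear the type code in place on the stack (a
    read-modify-write of the stack slot), then take the cleaned address."""
    stack[-1] &= DATUM_MASK
    return stack.pop()


def return_new(stack):
    """New style: load the tagged word into a 'register' (a local
    variable) and clear the type code there; the slot is never rewritten."""
    target = stack.pop()
    return target & DATUM_MASK


word = make_object(0x2B, 0x123456)
print(hex(return_old([word])), hex(return_new([word])))  # prints: 0x123456 0x123456
```

Both variants yield the same address; only the memory traffic differs.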
I have left the code using an indirect jump. An alternative is to
push the value back on the stack and do a RET. The indirect jump
seems faster, especially when returning to the same address as the
previous jump, but the branch prediction mechanisms for RET and JMP
seem quite different.

Speeds up the modified Gabriel Benchmark Suite (/scheme/8.0/src/bench)
by about 10% overall (mean New/Old time ratio 0.899 arithmetic, 0.892
geometric)! I guess this is because the Pentium Pro really doesn't
like the old read-modify-write instruction.

Test          Old     New   Ratio (New/Old)
ctak        11.59   11.54   0.996
conform      0.62    0.50   0.806
traverse     1.57    0.92   0.586
takl         0.23    0.20   0.870
peval        0.40    0.35   0.875
browse       0.59    0.56   0.949
tak          0.28    0.25   0.893
wttree       1.61    1.49   0.925
deriv        0.34    0.29   0.853
boyer        0.47    0.42   0.894
div          0.42    0.39   0.929
dderiv       0.44    0.38   0.864
cpstak       0.42    0.41   0.976
matmul1      0.27    0.27   1.000
fib          0.68    0.55   0.809
fcomp        0.61    0.54   0.885
triangle     2.89    2.36   0.817
puzzle       0.47    0.47   1.000
matmul2      0.66    0.69   1.045
destruct     0.28    0.28   1.000
arith. mean     -       -   0.899
geom. mean      -       -   0.892
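
The summary rows can be reproduced from the per-test times. The short Python sketch below recomputes each New/Old ratio (rounded to three places, as in the table) and then takes the arithmetic and geometric means of those rounded ratios. The times are copied from the table; their units are whatever the benchmark harness reported. The helper names are illustrative only.

```python
from math import exp, log

# (old, new) run times per test, copied from the table.
TIMES = {
    "ctak": (11.59, 11.54), "conform": (0.62, 0.50), "traverse": (1.57, 0.92),
    "takl": (0.23, 0.20), "peval": (0.40, 0.35), "browse": (0.59, 0.56),
    "tak": (0.28, 0.25), "wttree": (1.61, 1.49), "deriv": (0.34, 0.29),
    "boyer": (0.47, 0.42), "div": (0.42, 0.39), "dderiv": (0.44, 0.38),
    "cpstak": (0.42, 0.41), "matmul1": (0.27, 0.27), "fib": (0.68, 0.55),
    "fcomp": (0.61, 0.54), "triangle": (2.89, 2.36), "puzzle": (0.47, 0.47),
    "matmul2": (0.66, 0.69), "destruct": (0.28, 0.28),
}


def ratios(times):
    """New/old time for each test, rounded to three places."""
    return {name: round(new / old, 3) for name, (old, new) in times.items()}


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    # exp of the mean log; insensitive to which runs happen to be long.
    return exp(sum(log(x) for x in xs) / len(xs))


r = list(ratios(TIMES).values())
print(f"arith. mean {arithmetic_mean(r):.3f}")  # prints: arith. mean 0.899
print(f"geom. mean  {geometric_mean(r):.3f}")   # prints: geom. mean  0.892
```

The geometric mean is the usual choice for averaging speed ratios, because it does not depend on which version is taken as the baseline.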