Page 3 of 5
Does this seem eggregious to anyone besides me?
Posted: Sun Nov 21, 2021 12:14 pm
by BruceMcF
Here is the general 16.8 / #60 -> 16.8 algorithm in the format of the original, but using the new API scratch register space (r11-r13), on the theory that a general set-up process may be using any of r0-r10 to hold values of its own.
As with the original use of r0-r2, this shouldn't be placed in an interrupt routine.
I also noticed that the remainder is not the actual remainder but the residual remainder after calculating the 1/256th fractional part. Because of that, moving the loop indexing and exit test saves two bytes: the "rol rem" required for the first 24 iterations can be used in the final iteration to get the high bit of the residual remainder into the carry flag for rounding the final result.
A third byte may be saved by omitting the "clc" at the start of the loop. The bit it clears flows through the shifts but is not part of the final result. If carry could be either set or clear at the beginning of the process, having this garbage bit can confuse debugging, but it doesn't affect the result.
On the other hand, the "*32" pre-shift, which speeds up the process because the first five iterations are known to fail the trial subtraction when dividing by 60, adds around 11 bytes. The speed-up comes from omitting one byte from the shifting, replacing five left shifts with an equivalent three right shifts, unrolling the loop, and being able to use the accumulator for part of the shift.
.proc calculate_tick_rate: near
; X/Y = tick rate (Hz) - divide by 60 and store to zsm_steps
; use the ZP variable as tmp space
value := r11
frac := r12
rem := r13
stx value
sty value+1
stz frac
stz rem
; the first five trial subtracts will always fail ... for space optimization, just let them
ldx #25 ; 24 trial subtracts, plus 1 partial loop to shift the final result bit in.
; may be omitted
clc ; this bit will be shifted into bottom of residual remainder at end
; avoiding a possible garbage bit floating through makes it easier to debug
loop:
; Shift dividend one bit left, then into remainder byte
; In iterations 1-24, the remainder is prepared for next trial subtract
; In iterations 2-25, the 24 result bits are shifted in to replace the value.
rol frac
rol value
rol value+1
rol rem ; in last iteration, residual remainder high bit is in carry
dex
beq endloop
lda rem
sec
sbc #60
bcc loop
sta rem
bra loop
endloop:
; round up if residual remainder is >=$80
; high bit of residual remainder is already in carry
lda frac
adc #0
sta zsm_fracsteps
lda value
adc #0
sta zsm_steps
lda value+1
adc #0
sta zsm_steps+1
rts
.endproc
~~~~~~~~~~~~~~~~~~~
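A rough Python model of the shift-and-subtract loop above may help when checking it by hand. This is only a sketch: the function name and the `carry_in` parameter (standing in for the state of carry when the optional "clc" is omitted) are made up for illustration, and the four "rol" instructions are modelled as a single 32-bit rotate.

```python
def divide_by_60_16_8(value, carry_in=0):
    """Model of the 25-pass loop: 24 trial subtracts of 60 plus one final shift.

    Returns (quotient, carry, rem) as they stand on reaching endloop, where
    quotient is the 24-bit value+1:value:frac register (16.8 fixed point).
    """
    reg = (value & 0xFFFF) << 8      # value+1:value:frac, with frac = 0
    rem = 0
    carry = carry_in                 # the bit that the optional clc clears
    for passes_left in range(25, 0, -1):
        # rol frac / rol value / rol value+1 / rol rem as one 32-bit rotate
        wide = ((rem << 24) | reg) << 1 | carry
        carry = (wide >> 32) & 1
        rem = (wide >> 24) & 0xFF
        reg = wide & 0xFFFFFF
        if passes_left == 1:         # dex / beq endloop
            break
        if rem >= 60:                # sec / sbc #60 leaves carry set on success
            rem -= 60
            carry = 1
        else:                        # failed trial: carry clear, rem unchanged
            carry = 0
    return reg, carry, rem


print(divide_by_60_16_8(120))        # (512, 0, 0): 120 Hz gives $000200, i.e. 2.0
```

In this model the quotient matches floor(value * 256 / 60) whether or not carry is cleared on entry, which agrees with the garbage bit being harmless. The model also reports carry as 0 on reaching endloop for every input, because rem is below 60 before each shift and so never has bit 7 set. If that reading is right, the "adc #0" chain leaves the truncated quotient as it is. Should round-to-nearest be wanted, comparing the doubled remainder against 60 (for example "lda rem" / "cmp #60" ahead of "lda frac") looks like one way to get it, though that has not been tried on hardware here.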
For the slight speed optimization, since the first five iterations are known to fail, the set-up before the loop becomes:
; the first five trial subtracts will always fail, so pre-shift by five
; starting one byte higher (at the destination) and shifting right by three gives the same effect
stz value
stz frac
stx value+1
tya
lsr
ror value+1
ror value
lsr
ror value+1
ror value
lsr
ror value+1
ror value
sta rem
; Now do the remaining 19 (of 24) shifts, plus 1 to shift in the final result bit
ldx #20
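The claim that three right shifts from one byte higher equal five left shifts can be checked with a small Python sketch. The function names are hypothetical, and the four bytes rem:value+1:value:frac are treated as one 32-bit integer.

```python
def preshift_by_five_left(x, y):
    """Five passes of the main loop's shifts on rem:value+1:value:frac = 00:Y:X:00."""
    wide = (y << 16) | (x << 8)
    return (wide << 5) & 0xFFFFFFFF


def preshift_by_three_right(x, y):
    """The shortcut: load Y:X:00:00 (one byte higher), then shift right three times."""
    wide = (y << 24) | (x << 16)
    return wide >> 3


# Both routes leave the same 32-bit pattern, and the top byte (rem) is Y >> 3,
# which is at most 31 and therefore always below 60.
print(hex(preshift_by_five_left(0x34, 0x12)), hex(preshift_by_three_right(0x34, 0x12)))
```

Since a shift by one whole byte is just a matter of where the bytes are stored, 8 - 3 = 5 net left shifts remain, and frac never needs touching because it stays zero.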
Posted: Mon Nov 22, 2021 5:03 am
by ZeroByte
On 11/20/2021 at 1:20 PM, kliepatsch said:
I just tried a couple of things with Concerto and indeed, the jitter is not as bad as I remembered. When making music with Concerto a couple of weeks ago, the jitter was bothering me a lot. It was barely noticeable, but because I KNEW the jitter was real, it kept distracting me all the time, so I had to adapt the song tempo to some integer tick count. And thinking that at 60 Hz, jitter could be twice as bad, it simply wouldn't be fun to make non-60 Hz music. Just playing it back would probably not be as bad.
Well, since last night was "algorithm drag racing" I didn't actually implement my HZ conversion in the player, but tonight I did, and it works like a champ. I haven't made a "setspeed" function yet, but using Box16 to poke new speed values directly into RAM, it works like a champ. Once I make a function, I guess I should make a speed change feature in my Sonic demo where you can press up/dn arrow keys, and Sonic will speed up and so will the music. That'd be kind of cool.
Posted: Mon Nov 22, 2021 5:42 am
by Scott Robison
On 11/20/2021 at 9:35 PM, ZeroByte said:
I just did a comparison between my algorithm (bespoke), @BruceMcF's General algorithm, and one that @Scott Robison proposed on Discord (Galaxybrain):
Galaxybrain took my algorithm and enhanced it by using clever byte-order manipulation to cut down the number of shifts required to perform the computation.
It occurs to me I gave you generic 6502 code. Did you make it 65C02? Could save a few more bytes...
Posted: Mon Nov 22, 2021 9:20 am
by paulscottrobson
You could approximate. The tick range isn't going to be that large, I wouldn't have thought. 1/60 (0.016667) is pretty close to 1/16 (0.015625), better done here obviously as 4/256. The error is about 6.5%.
This is a bit high, high enough to be noticeable perhaps, so you could have a fudge factor or factors to add based around a variety of tick rates, given the limited range of tick in practice. You could knock up some numbers in a spreadsheet and get it near enough for practical purposes.
It's even debatable whether there's any point in representing it as a fixed point fraction, I don't think many people are going to be able to tell a difference of 1-2% in tempo.
Posted: Mon Nov 22, 2021 3:21 pm
by ZeroByte
On 11/22/2021 at 3:20 AM, paulscottrobson said:
I don't think many people are going to be able to tell a difference of 1-2% in tempo.
You're definitely correct here. From playing around with the playback speed functionality, even at lowly 60Hz resolution, you don't hear much difference in tempo when modifying the fractional step in amounts less than 0x08.
I think the accuracy might come into play a little more in programs that are trying to keep music and something else synchronized - over time, even very small error rates accumulate into noticeable quantity. Obviously I'm not trying to address this, as 8 fractional bits is not really "accurate" for such tasks either.
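To put a rough number on that accumulation, here is a small Python sketch. It assumes the player adds a 16.8 step once per 60 Hz frame and that the step is truncated (not rounded) to 8 fractional bits; the function name and the 100 Hz example rate are hypothetical.

```python
from fractions import Fraction


def drift_after(tick_rate_hz, seconds):
    """Ticks lost over `seconds` when the per-frame step is truncated to 16.8."""
    exact_step = Fraction(tick_rate_hz, 60)                 # ticks per 60 Hz frame
    stored_step = Fraction(tick_rate_hz * 256 // 60, 256)   # 8 fractional bits kept
    frames = 60 * seconds
    return float((exact_step - stored_step) * frames)


print(drift_after(100, 60))   # 9.375 ticks behind after one minute at 100 Hz
print(drift_after(120, 60))   # 0.0, since 120/60 is exactly representable
```

Under those assumptions a 100 Hz song falls a little under a tenth of a second behind per minute, which is inaudible as tempo but would show up against anything synchronized to a separate clock.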
Posted: Mon Nov 22, 2021 4:12 pm
by Ed Minchau
On 11/22/2021 at 2:20 AM, paulscottrobson said:
You could approximate. The tick range isn't going to be that large, I wouldn't have thought. 1/60 (0.016667) is pretty close to 1/16 (0.015625), better done here obviously as 4/256. The error is about 6.5%.
This is a bit high, high enough to be noticeable perhaps, so you could have a fudge factor or factors to add based around a variety of tick rates, given the limited range of tick in practice. You could knock up some numbers in a spreadsheet and get it near enough for practical purposes.
It's even debatable whether there's any point in representing it as a fixed point fraction, I don't think many people are going to be able to tell a difference of 1-2% in tempo.
You mean 1/64 rather than 1/16, but yeah, this is basically what I was saying (my idea works out to x/64 + x/1024 ~ x/60). Your idea works out to a total of 12 lines of code, just 6 copies of
LSR highbyte
ROR lobyte
So that's 24 bytes, but it could be even more efficient:
LDA lobyte
LSR highbyte
ROR A
LSR highbyte
ROR A
LSR highbyte
ROR A
LSR highbyte
ROR A
LSR highbyte
ROR A
LSR highbyte
ROR A
STA lobyte
That's 22 bytes. Then, to add 1/1024 of the original to the total and get a little closer to 1/60:
LDX highbyte
STX highbyte2
LSR highbyte2
ROR A
LSR highbyte2
ROR A
LSR highbyte2
ROR A
LSR highbyte2
ROR A
CLC
ADC lobyte
STA lobyte
LDA highbyte2
ADC highbyte
STA highbyte
This gets within 0.4% of 1/60 and totals 49 bytes.
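The arithmetic behind the two shift sequences above can be sketched in Python. The function name is made up for illustration, and the model assumes a 16-bit input and an integer-only result, with each LSR/ROR pair treated as a truncating shift of the 16-bit value.

```python
def approx_div60(x):
    """x/64 + x/1024 using truncating shifts, returned as an integer."""
    q64 = x >> 6          # six LSR/ROR pairs: x / 64
    q1024 = q64 >> 4      # four more pairs on a copy: x / 1024
    return q64 + q1024


print(1 - 60 * 17 / 1024)   # 0.00390625: 17/1024 is about 0.39% below 1/60
print(approx_div60(6000))   # 98, where the exact quotient is 100
```

The constant itself is within 0.4% of 1/60, but the bits discarded by each shift add further error on top of that in an integer-only result, which is why keeping fractional bits matters for small tick rates.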
Posted: Mon Nov 22, 2021 8:48 pm
by Fabio
Maybe you should use the fact that one full byte of movement is free.
I'll try this:
LDA highb-inp
STA midbyte2
STZ highbyte2 ; because 16:8 is a 24 bit number
lda lobyte-inp
ldy #2
:loop
ASL A
ROL midbyte2
ROL highbyte2
LSR highb-inp
ROR lobyte-inp
DEY
BNE :loop ;given priority on code compactness
CLC
ADC lobyte-inp
STA lobyte2
LDA highb-inp
ADC midbyte2
STA midbyte2
TYA ; same as LDA #0 because now Y is zero
ADC highbyte2
STA highbyte2
RTS
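A byte-level Python model of the routine above suggests it produces (x << 2) + (x >> 2), which is x * 17/1024 expressed as a 24-bit 16.8 value. This is a sketch only: the function name is hypothetical and the variables simply mirror the labels in the listing.

```python
def shift_add_16_8(x):
    """Model of the two-pass loop and the three-byte add; returns a 16.8 value."""
    hi_in, lo_in = (x >> 8) & 0xFF, x & 0xFF
    mid2, high2, a = hi_in, 0, lo_in
    for _ in range(2):
        # ASL A / ROL midbyte2 / ROL highbyte2: 24-bit shift left by one
        wide = ((high2 << 16) | (mid2 << 8) | a) << 1
        high2, mid2, a = (wide >> 16) & 0xFF, (wide >> 8) & 0xFF, wide & 0xFF
        # LSR highb-inp / ROR lobyte-inp: 16-bit shift right by one
        pair = ((hi_in << 8) | lo_in) >> 1
        hi_in, lo_in = pair >> 8, pair & 0xFF
    # CLC, then the ADC chain across lobyte2, midbyte2 and highbyte2
    total = ((high2 << 16) | (mid2 << 8) | a) + ((hi_in << 8) | lo_in)
    return total & 0xFFFFFF


print(shift_add_16_8(120))   # 510, i.e. 510/256, against an exact 512/256 = 2.0
```

Note that the input bytes are overwritten along the way, so a caller that still needs the original tick rate would have to keep its own copy.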
Posted: Tue Nov 23, 2021 12:53 am
by BruceMcF
On 11/22/2021 at 4:20 AM, paulscottrobson said:
You could approximate. The tick range isn't going to be that large, I wouldn't have thought. 1/60 (0.016667) is pretty close to 1/16 (0.015625), better done here obviously as 4/256. The error is about 6.5%.
This is a bit high, high enough to be noticeable perhaps, so you could have a fudge factor or factors to add based around a variety of tick rates, given the limited range of tick in practice. You could knock up some numbers in a spreadsheet and get it near enough for practical purposes.
It's even debatable whether there's any point in representing it as a fixed point fraction, I don't think many people are going to be able to tell a difference of 1-2% in tempo.
1/64 + 1/1024 = 0.0166015625, which is about 0.3% off. That is 4*(1/256) and (1/256)/4.
Posted: Tue Nov 23, 2021 9:35 am
by paulscottrobson
On 11/23/2021 at 12:53 AM, BruceMcF said:
1/64 + 1/1024 = 0.0166015625, which is about 0.3% off. That is 4*(1/256) and (1/256)/4.
As it's a 16-bit value you could probably just use the upper byte for the /1024 term, which would be >> 8 followed by >> 2; that would be quicker and only marginally less accurate. It might be quicker to decompose the value into 256*a + b, apply your calculation and simplify it. But I've got to go for a flu jab in a minute.
Posted: Tue Nov 23, 2021 11:43 am
by Ed Minchau
On 11/22/2021 at 5:53 PM, BruceMcF said:
1/64 + 1/1024 = 0.0166015625, which is about 0.3% off. That is 4*(1/256) and (1/256)/4.
That's what my little algorithm above does. The result isn't 16.8, but can be shifted there.