Does this seem eggregious to anyone besides me?

Posted: Sun Nov 21, 2021 12:14 pm
by BruceMcF

Putting the general 16.8 / #60 -> 16.8 algorithm into the format of the original ... but using the new API scratch register space, on the theory that a general set-up process may be using any of r0-r10 for holding values used in the process.

As with the original use of r0-r2, this shouldn't be placed in an interrupt routine.

I also noticed that since the remainder is not the actual remainder, but the residual remainder after calculating the /256th fractional part, moving the loop indexing and exit saves two bytes as the "rol rem" required for the first 24 iterations can be used in the final iteration to get the high bit of the residual remainder into the carry flag for rounding the final result.

A third byte may be saved by omitting the "clc" at the start of the loop. That carry bit flows through the shifts but never becomes part of the final result. If carry could be either set or clear on entry, the garbage bit can confuse debugging, but it doesn't affect the result.

On the other hand, the "*32" pre-shift that speeds up the process adds around 11 bytes. It works because, when dividing by 60, the first five iterations are known to fail the trial subtraction. The speed optimization comes from omitting one byte from the shifting, replacing five left shifts with an equivalent three right shifts, unrolling the loop, and being able to use the register for the shift.

.proc calculate_tick_rate: near
            ; X/Y = tick rate (Hz) - divide by 60 and store to zsm_steps
            ; use the ZP variable as tmp space

            value := r11
            frac := r12
            rem := r13

            stx value
            sty value+1
            stz frac
            stz rem
            ; the first five trial subtracts will always fail ... for space optimization, just let them
            ldx #25        ; 24 trial subtracts, plus 1 partial loop to shift the final result bit in.

            ; may be omitted
            clc            ; this bit will be shifted into bottom of residual remainder at end
                           ; avoiding a possible garbage bit floating through makes it easier to debug

loop:
            ; Shift dividend one bit left, then into remainder byte
            ; In iterations 1-24, the remainder is prepared for next trial subtract
            ; In iterations 2-25, the 24 result bits are shifted in to replace the value.
            rol frac
            rol value
            rol value+1
            rol rem        ; in last iteration, residual remainder high bit is in carry
            dex
            beq endloop
            lda rem
            sec
            sbc #60
            bcc loop
            sta rem
            bra loop

endloop:
            ; round up if residual remainder is >=$80
            ; high bit of residual remainder is already in carry
            lda frac
            adc #0
            sta zsm_fracsteps
            lda value
            adc #0
            sta zsm_steps
            lda value+1
            adc #0
            sta zsm_steps+1
            rts
.endproc
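
For anyone who wants to check the loop without an emulator, here is a rough Python model of the routine above. It is only a sketch: the function name and the carry_in parameter are made up for illustration, and the four rol instructions are treated as one 32-bit rotate through carry. In this model the residual remainder always stays below 60, so the carry out of the last "rol rem" comes out clear and the adc #0 chain leaves the truncated quotient; rounding to nearest would need the doubled remainder compared against 60 instead.

```python
def calculate_tick_rate(hz, carry_in=0):
    """Model the 25-pass loop; returns (zsm_steps, zsm_fracsteps, final_carry)."""
    reg = (hz & 0xFFFF) << 8          # value+1:value:frac with frac = 0
    rem = 0
    carry = carry_in                  # 0 when the optional clc is kept
    for x in range(25, 0, -1):        # ldx #25, counted down by dex
        # rol frac / rol value / rol value+1 / rol rem:
        # one 32-bit rotate left through carry
        wide = (rem << 24) | reg
        carry_out = wide >> 31
        wide = ((wide << 1) | carry) & 0xFFFFFFFF
        rem, reg, carry = wide >> 24, wide & 0xFFFFFF, carry_out
        if x == 1:                    # dex reaches zero: beq endloop
            break
        if rem >= 60:                 # sec / sbc #60 with no borrow
            rem -= 60                 # sta rem
            carry = 1                 # quotient bit 1
        else:
            carry = 0                 # bcc loop: quotient bit 0, rem unchanged
    result = (reg + carry) & 0xFFFFFF # the adc #0 chain
    return result >> 8, result & 0xFF, carry


print(calculate_tick_rate(60))        # whole steps, 1/256 steps, final carry
print(calculate_tick_rate(100))
```

Passing carry_in=1 models entering without the clc; the stray bit ends up in the bottom of rem and the returned values do not change.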

~~~~~~~~~~~~~~~~~~~

For the slight speed optimization since the first five iterations are known to fail:

            ; the first five trial subtracts will always fail, so pre-shift by five
            ; start at destination and shifting right by three gives the same effect
            stz value
            stz frac
            stx value+1
            tya
            lsr
            ror value+1
            ror value
            lsr
            ror value+1
            ror value
            lsr
            ror value+1
            ror value
            sta rem

            ; Now do the remaining 19 (of 24) shifts, plus 1 to shift in the final result bit
            ldx #20
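
A quick way to convince yourself that "start one byte up, then shift right three" matches five left shifts is to model the setup in Python. This is a sketch under the assumption that X holds the low byte and Y the high byte on entry; the function names are invented for the example.

```python
def preshift(hz):
    """Model the setup; returns (rem:value+1:value:frac as 32 bits, final carry)."""
    x, y = hz & 0xFF, (hz >> 8) & 0xFF
    value_lo = 0                      # stz value
    frac = 0                          # stz frac
    value_hi = x                      # stx value+1 (low byte lands one byte up)
    a = y                             # tya
    carry = 0
    for _ in range(3):
        carry, a = a & 1, a >> 1                                        # lsr
        carry, value_hi = value_hi & 1, (carry << 7) | (value_hi >> 1)  # ror value+1
        carry, value_lo = value_lo & 1, (carry << 7) | (value_lo >> 1)  # ror value
    rem = a                           # sta rem
    return (rem << 24) | (value_hi << 16) | (value_lo << 8) | frac, carry


def five_left_shifts(hz):
    """The same 32-bit register after five plain left shifts of hz * 256."""
    return ((hz & 0xFFFF) << 8) << 5


print(hex(preshift(0x1234)[0]), hex(five_left_shifts(0x1234)))
```

The carry comes out clear in the model, which matters because the main loop shifts that carry in as the fifth (always zero) quotient bit.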



Posted: Mon Nov 22, 2021 5:03 am
by ZeroByte


On 11/20/2021 at 1:20 PM, kliepatsch said:




I just tried a couple of things with Concerto and indeed, the jitter is not as bad as I remembered. When making music with Concerto a couple of weeks ago, the jitter was bothering me a lot. It was barely noticeable, but because I KNEW the jitter was real, it kept distracting me all the time, so I had to adapt the song tempo to some integer tick count. And thinking that at 60 Hz, jitter could be twice as bad, it simply wouldn't be fun to make non-60 Hz music. Just playing it back would probably not be as bad.



Well, since last night was "algorithm drag racing", I didn't actually implement my Hz conversion in the player, but tonight I did, and it works like a champ. I haven't made a "setspeed" function yet, but using Box16 to poke new speed values directly into RAM works fine. Once I make a function, I guess I should make a speed change feature in my Sonic demo where you can press up/dn arrow keys, and Sonic will speed up and so will the music. That'd be kind of cool.



Posted: Mon Nov 22, 2021 5:42 am
by Scott Robison


On 11/20/2021 at 9:35 PM, ZeroByte said:




I just did a comparison between my algorithm (bespoke), @BruceMcF's General algorithm, and one that @Scott Robison proposed on Discord (Galaxybrain):



Galaxybrain took my algorithm and enhanced it by using clever byte-order manipulation to cut down the number of shifts required to perform the computation.



It occurs to me I gave you generic 6502 code. Did you make it 65C02? Could save a few more bytes...



Posted: Mon Nov 22, 2021 9:20 am
by paulscottrobson

You could approximate. The tick range isn't going to be that large, I wouldn't have thought. 1/60 (0.016667) is pretty close to 1/16 (0.015625), better done here obviously as 4/256. The error is about 6.5%.

This is a bit high, high enough to be noticeable perhaps, so you could have a fudge factor or factors to add based around a variety of tick rates, given the limited range of tick in practice. You could knock up some numbers in a spreadsheet and get it near enough for practical purposes.

It's even debatable whether there's any point in representing it as a fixed point fraction, I don't think many people are going to be able to tell a difference of 1-2% in tempo.



Posted: Mon Nov 22, 2021 3:21 pm
by ZeroByte


On 11/22/2021 at 3:20 AM, paulscottrobson said:




I don't think many people are going to be able to tell a difference of 1-2% in tempo.



You're definitely correct here. From playing around with the playback speed functionality, even at lowly 60Hz resolution, you don't hear much difference in tempo when modifying the fractional step in amounts less than 0x08.

I think the accuracy might come into play a little more in programs that are trying to keep music and something else synchronized - over time, even very small error rates accumulate into noticeable quantity. Obviously I'm not trying to address this, as 8 fractional bits is not really "accurate" for such tasks either.



Posted: Mon Nov 22, 2021 4:12 pm
by Ed Minchau


On 11/22/2021 at 2:20 AM, paulscottrobson said:




You could approximate. The tick range isn't going to be that large, I wouldn't have thought. 1/60 (0.016667) is pretty close to 1/16 (0.015625), better done here obviously as 4/256. The error is about 6.5%.



This is a bit high, high enough to be noticeable perhaps, so you could have a fudge factor or factors to add based around a variety of tick rates, given the limited range of tick in practice. You could knock up some numbers in a spreadsheet and get it near enough for practical purposes.



It's even debatable whether there's any point in representing it as a fixed point fraction, I don't think many people are going to be able to tell a difference of 1-2% in tempo.



You mean 1/64 rather than 1/16, but yeah, this is basically what I was saying (my idea works out to x/64+ x/1024 ~ x/60).  Your idea works out to a total of 12 lines of code, just 6 copies of

LSR highbyte
ROR lobyte

So that's 24 bytes, but it could be even more efficient:

LDA lobyte
LSR highbyte
ROR A
LSR highbyte
ROR A
LSR highbyte
ROR A
LSR highbyte
ROR A
LSR highbyte
ROR A
LSR highbyte
ROR A
STA lobyte

that's 22 bytes.  Then, to add 1/1024 of the original to the total and get a little closer to 1/60,

LDX highbyte
STX highbyte2
LSR highbyte2
ROR A
LSR highbyte2
ROR A
LSR highbyte2
ROR A
LSR highbyte2
ROR A
CLC
ADC lobyte
STA lobyte
LDA highbyte2
ADC highbyte
STA highbyte

This gets within 0.4% of 1/60 and totals 49 bytes.
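
To put numbers on that, here is a small Python sketch of the two listings above. It assumes plain truncating shifts on a 16-bit input and an integer result; the function name is made up for the example.

```python
from fractions import Fraction


def div60_approx(x):
    """x/64 + x/1024 using truncating right shifts on a 16-bit value."""
    q64 = (x & 0xFFFF) >> 6           # six LSR highbyte / ROR A pairs
    q1024 = q64 >> 4                  # four more LSR highbyte2 / ROR A pairs
    return q64 + q1024


# Relative error of the ideal (untruncated) factor 1/64 + 1/1024 = 17/1024
rel_error = 1 - Fraction(17, 1024) / Fraction(1, 60)
print(rel_error, float(rel_error))

for hz in (6000, 61440):
    print(hz, div60_approx(hz), hz / 60)
```

The ideal factor is low by exactly 1/256 of the true value; the truncating shifts add a little more on top of that for small inputs.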



Posted: Mon Nov 22, 2021 8:48 pm
by Fabio

Maybe you should use the fact that one full byte of movement is free.

I'll try this:


LDA highb_inp
STA midbyte2
STZ highbyte2      ; because 16.8 is a 24-bit number
LDA lobyte_inp
LDY #2
loop:
ASL A
ROL midbyte2
ROL highbyte2
LSR highb_inp
ROR lobyte_inp
DEY
BNE loop           ; priority given to code compactness
CLC
ADC lobyte_inp
STA lobyte2
LDA highb_inp
ADC midbyte2
STA midbyte2
TYA                ; same as LDA #0 because Y is now zero
ADC highbyte2
STA highbyte2
RTS

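
In arithmetic terms the routine above builds the 16.8 result as 4*x + x/4, which is x * 256 * (1/64 + 1/1024). A minimal Python sketch of that, assuming a 16-bit input and a truncating right shift (function names are hypothetical):

```python
def div60_16_8(x):
    """Approximate x/60 as a 24-bit 16.8 value: (x << 2) + (x >> 2)."""
    x &= 0xFFFF
    return ((x << 2) + (x >> 2)) & 0xFFFFFF


def split_16_8(r):
    """Return (whole part, fraction in 1/256 units)."""
    return r >> 8, r & 0xFF


for hz in (60, 120, 6000):
    whole, frac = split_16_8(div60_16_8(hz))
    print(hz, whole, frac, whole + frac / 256)
```

Because the factor 17/1024 is slightly under 1/60, 60 Hz comes out just short of one whole step per frame in this model.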



Posted: Tue Nov 23, 2021 12:53 am
by BruceMcF


On 11/22/2021 at 4:20 AM, paulscottrobson said:




You could approximate. The tick range isn't going to be that large, I wouldn't have thought. 1/60 (0.016667) is pretty close to 1/16 (0.015625), better done here obviously as 4/256. The error is about 6.5%.



This is a bit high, high enough to be noticeable perhaps, so you could have a fudge factor or factors to add based around a variety of tick rates, given the limited range of tick in practice. You could knock up some numbers in a spreadsheet and get it near enough for practical purposes.



It's even debatable whether there's any point in representing it as a fixed point fraction, I don't think many people are going to be able to tell a difference of 1-2% in tempo.



1/64 + 1/1024 = 0.0166015625, which is about 0.3% off.  That is 4*(1/256) and (1/256)/4.



Posted: Tue Nov 23, 2021 9:35 am
by paulscottrobson


On 11/23/2021 at 12:53 AM, BruceMcF said:




1/64 + 1/1024 = 0.0166015625, which is about 0.3% off.  That is 4*(1/256) and (1/256)/4.



As it's a 16-bit value, you could probably get the 1/1024 term from the upper byte alone, which would be >> 8 and then >> 2; that would be quicker and only marginally less accurate. It might be quicker still to decompose the value into 256a + b, apply your calculation, and simplify it. But I've got to go for a flu jab in a minute.
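
A small Python check of the upper-byte idea, writing x = 256a + b. This is one reading of the suggestion, not necessarily what was intended, and the function names are invented for the example. It compares the 1/1024 term taken from the whole word against the one taken from a alone, first as a truncated integer and then with 8 fraction bits kept.

```python
def term_from_full_word(x):
    """x/1024 as a truncated integer."""
    return (x & 0xFFFF) >> 10


def term_from_high_byte(x):
    """Same term from the upper byte only: (x >> 8) >> 2."""
    a = (x & 0xFFFF) >> 8
    return a >> 2


def lost_fraction_16_8(x):
    """Loss in 1/256 units when the 16.8 term x/4 is built from a alone (64*a)."""
    x &= 0xFFFF
    a = x >> 8
    return (x >> 2) - (a << 6)


same = all(term_from_full_word(x) == term_from_high_byte(x) for x in range(65536))
worst = max(lost_fraction_16_8(x) for x in range(65536))
print(same, worst)
```

As truncated integers the two terms agree for every 16-bit input; with the fraction kept, dropping b costs less than a quarter of a step in the worst case.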



Posted: Tue Nov 23, 2021 11:43 am
by Ed Minchau


On 11/22/2021 at 5:53 PM, BruceMcF said:




1/64 + 1/1024 = 0.0166015625, which is about 0.3% off.  That is 4*(1/256) and (1/256)/4.



That's what my little algorithm above does. The result isn't 16.8, but can be shifted there.