Does this seem egregious to anyone besides me?

BruceMcF · Post by **BruceMcF** » Sun Nov 21, 2021 12:14 pm

Putting the general 16.8 / #60 -> 16.8 algorithm into the format of the original ... but using the new API scratch register space, on the theory that a general set-up process may be using any of r0-r10 for holding values used in the process.

As with the original use of r0-r2, this shouldn't be placed in an interrupt routine.

I also noticed that since the remainder is not the actual remainder, but the residual remainder after calculating the /256th fractional part, moving the loop indexing and exit saves two bytes as the "rol rem" required for the first 24 iterations can be used in the final iteration to get the high bit of the residual remainder into the carry flag for rounding the final result.

A third byte may be saved by omitting the "clc" at the start of the loop. The bit it clears flows through the shifts but is not part of the final result. If carry could be either set or clear at the beginning of the process, having this garbage bit can confuse debugging, but it doesn't affect the result.

On the other hand, the "*32" pre-shift adds around 11 bytes. It speeds up the process because, when dividing by 60, the first five iterations are known to fail the test subtraction. The speed gain comes from omitting one byte from the shifting, replacing five left shifts with an equivalent three right shifts, unrolling the loop, and being able to use the accumulator for the shift.

    .proc calculate_tick_rate: near
            ; X/Y = tick rate (Hz) - divide by 60 and store to zsm_steps
            ; use the ZP variable as tmp space
            value := r11
            frac := r12
            rem := r13
            stx value
            sty value+1
            stz frac
            stz rem
            ; the first five trial subtracts will always fail ... for space optimization, just let them
            ldx #25        ; 24 trial subtracts, plus 1 partial loop to shift the final result bit in.
            ; may be omitted
            clc            ; this bit will be shifted into bottom of residual remainder at end
                           ; avoiding a possible garbage bit floating through makes it easier to debug
    loop:
            ; Shift dividend one bit left, then into remainder byte
            ; In iterations 1-24, the remainder is prepared for next trial subtract
            ; In iterations 2-25, the 24 result bits are shifted in to replace the value.
            rol frac
            rol value
            rol value+1
            rol rem        ; in last iteration, residual remainder high bit is in carry
            dex
            beq endloop
            lda rem
            sec
            sbc #60
            bcc loop
            sta rem
            bra loop
    endloop:
            ; round up if residual remainder is >=$80
            ; high bit of residual remainder is already in carry
            lda frac
            adc #0
            sta zsm_fracsteps
            lda value
            adc #0
            sta zsm_steps
            lda value+1
            adc #0
            sta zsm_steps+1
            rts
    .endproc

~~~~~~~~~~~~~~~~~~~

For the slight speed optimization since the first five iterations are known to fail:

            ; the first five trial subtracts will always fail, so pre-shift by five
            ; starting at the destination and shifting right by three gives the same effect
            stz value
            stz frac
            stx value+1
            tya
            lsr
            ror value+1
            ror value
            lsr
            ror value+1
            ror value
            lsr
            ror value+1
            ror value
            sta rem
            ; Now do the remaining 19 (of 24) shifts, plus 1 to shift in the final result bit
            ldx #20
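For anyone who wants to sanity-check the loop without an emulator, here is a rough Python model of the two variants above. It is only a sketch of one reading of the listings: the names `divide_loop`, `tick_rate_general`, `tick_rate_preshifted` and the `rol`/`ror` helpers are made up for illustration, and X is assumed to hold the low byte of the tick rate and Y the high byte, as the `stx value` / `sty value+1` stores suggest.

```python
def rol(byte, carry):
    """ROL: rotate an 8-bit value left through carry."""
    wide = (byte << 1) | carry
    return wide & 0xFF, wide >> 8


def ror(byte, carry):
    """ROR: rotate an 8-bit value right through carry."""
    return (carry << 7) | (byte >> 1), byte & 1


def divide_loop(value_hi, value_lo, frac, rem, carry, count):
    """Model of the shift / trial-subtract loop plus the adc #0 chain.

    Returns (24-bit 16.8 result, carry seen at endloop).
    """
    for remaining in range(count, 0, -1):
        frac, carry = rol(frac, carry)
        value_lo, carry = rol(value_lo, carry)
        value_hi, carry = rol(value_hi, carry)
        rem, carry = rol(rem, carry)
        if remaining == 1:      # dex / beq endloop
            break
        if rem >= 60:           # sec / sbc #60 succeeded: carry stays set
            rem -= 60
            carry = 1
        else:                   # borrow: carry clear, rem is not stored back
            carry = 0
    steps = (value_hi << 16) | (value_lo << 8) | frac
    return (steps + carry) & 0xFFFFFF, carry


def tick_rate_general(hz):
    """stx value / sty value+1 / stz frac / stz rem / ldx #25 / clc."""
    return divide_loop(hz >> 8, hz & 0xFF, 0, 0, 0, 25)[0]


def tick_rate_preshifted(hz):
    """Three right shifts starting from the destination bytes, then ldx #20."""
    a, value_hi, value_lo, carry = hz >> 8, hz & 0xFF, 0, 0
    for _ in range(3):
        a, carry = a >> 1, a & 1                # lsr on the accumulator
        value_hi, carry = ror(value_hi, carry)
        value_lo, carry = ror(value_lo, carry)
    return divide_loop(value_hi, value_lo, 0, a, carry, 20)[0]


if __name__ == "__main__":
    for hz in (60, 120, 700):
        print(hz, hex(tick_rate_general(hz)), hex(tick_rate_preshifted(hz)))
```

In this model the two variants agree with each other and with the truncated quotient `(hz * 256) // 60` for the rates spot-checked. The carry reaching `endloop` stayed clear in those runs, since `rem` is below 60 before the last `rol rem`, so the `adc #0` chain added nothing there.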

ZeroByte · Post by **ZeroByte** » Mon Nov 22, 2021 5:03 am

On 11/20/2021 at 1:20 PM, kliepatsch said:

I just tried a couple of things with Concerto and indeed, the jitter is not as bad as I remembered. When making music with Concerto a couple of weeks ago, the jitter was bothering me a lot. It was barely noticeable, but because I KNEW the jitter was real, it kept distracting me all the time, so I had to adapt the song tempo to some integer tick count. And thinking that at 60 Hz, jitter could be twice as bad, it simply wouldn't be fun to make non-60 Hz music. Just playing it back would probably not be as bad.

Well, since last night was "algorithm drag racing" I didn't actually implement my HZ conversion in the player, but tonight I did, and it works like a champ. I haven't made a "setspeed" function yet, but using Box16 to poke new speed values directly into RAM, it works like a champ. Once I make a function, I guess I should make a speed change feature in my Sonic demo where you can press up/dn arrow keys, and Sonic will speed up and so will the music. That'd be kind of cool.

Scott Robison · Post by **Scott Robison** » Mon Nov 22, 2021 5:42 am

On 11/20/2021 at 9:35 PM, ZeroByte said:

I just did a comparison between my algorithm (bespoke), @BruceMcF's General algorithm, and one that @Scott Robison proposed on Discord (Galaxybrain):

Galaxybrain took my algorithm and enhanced it by using clever byte-order manipulation to cut down the number of shifts required to perform the computation.

It occurs to me I gave you generic 6502 code. Did you make it 65C02? Could save a few more bytes...

paulscottrobson · Post by **paulscottrobson** » Mon Nov 22, 2021 9:20 am

You could approximate. The tick range isn't going to be that large, I wouldn't have thought. 1/60 (0.016667) is pretty close to 1/16 (0.015625), better done here obviously as 4/256. The error is about 6.5%.

This is a bit high, high enough to be noticeable perhaps, so you could have a fudge factor or factors to add based around a variety of tick rates, given the limited range of tick in practice. You could knock up some numbers in a spreadsheet and get it near enough for practical purposes.

It's even debatable whether there's any point in representing it as a fixed point fraction, I don't think many people are going to be able to tell a difference of 1-2% in tempo.

ZeroByte · Post by **ZeroByte** » Mon Nov 22, 2021 3:21 pm

On 11/22/2021 at 3:20 AM, paulscottrobson said:

I don't think many people are going to be able to tell a difference of 1-2% in tempo.

You're definitely correct here. From playing around with the playback speed functionality, even at lowly 60Hz resolution, you don't hear much difference in tempo when modifying the fractional step in amounts less than 0x08.

I think the accuracy might come into play a little more in programs that are trying to keep music and something else synchronized - over time, even very small error rates accumulate into noticeable quantity. Obviously I'm not trying to address this, as 8 fractional bits is not really "accurate" for such tasks either.

Ed Minchau · Post by **Ed Minchau** » Mon Nov 22, 2021 4:12 pm

On 11/22/2021 at 2:20 AM, paulscottrobson said:

You could approximate. The tick range isn't going to be that large, I wouldn't have thought. 1/60 (0.016667) is pretty close to 1/16 (0.015625), better done here obviously as 4/256. The error is about 6.5%.

This is a bit high, high enough to be noticeable perhaps, so you could have a fudge factor or factors to add based around a variety of tick rates, given the limited range of tick in practice. You could knock up some numbers in a spreadsheet and get it near enough for practical purposes.

It's even debatable whether there's any point in representing it as a fixed point fraction, I don't think many people are going to be able to tell a difference of 1-2% in tempo.

You mean 1/64 rather than 1/16, but yeah, this is basically what I was saying (my idea works out to x/64 + x/1024 ~ x/60). Your idea works out to a total of 12 lines of code, just 6 copies of

    LSR highbyte
    ROR lobyte

So that's 24 bytes, but it could be even more efficient:

    LDA lobyte
    LSR highbyte
    ROR A
    LSR highbyte
    ROR A
    LSR highbyte
    ROR A
    LSR highbyte
    ROR A
    LSR highbyte
    ROR A
    LSR highbyte
    ROR A
    STA lobyte

That's 22 bytes. Then, to add 1/1024 of the original to the total and get a little closer to 1/60,

    LDX highbyte
    STX highbyte2
    LSR highbyte2
    ROR A
    LSR highbyte2
    ROR A
    LSR highbyte2
    ROR A
    LSR highbyte2
    ROR A
    CLC
    ADC lobyte
    STA lobyte
    LDA highbyte2
    ADC highbyte
    STA highbyte

This gets within 0.4% of 1/60 and totals 49 bytes.
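The shift-and-add listing above can be mirrored byte for byte in Python as a quick check. This is a sketch under an assumed reading of the listing: `ed_div60` and the `lsr`/`ror` helpers are hypothetical names, and the result is the integer part only (not 16.8), as the listing leaves it.

```python
def lsr(byte):
    """LSR: shift right, bit 0 goes to carry."""
    return byte >> 1, byte & 1


def ror(byte, carry):
    """ROR: rotate right through carry."""
    return (carry << 7) | (byte >> 1), byte & 1


def ed_div60(x):
    """Approximate x / 60 as (x >> 6) + (x >> 10) using 8-bit steps."""
    highbyte, a = x >> 8, x & 0xFF          # LDA lobyte
    for _ in range(6):                      # six LSR highbyte / ROR A pairs
        highbyte, carry = lsr(highbyte)
        a, carry = ror(a, carry)
    lobyte = a                              # STA lobyte: highbyte:lobyte = x >> 6
    highbyte2 = highbyte                    # LDX highbyte / STX highbyte2
    for _ in range(4):                      # four more shifts: highbyte2:A = x >> 10
        highbyte2, carry = lsr(highbyte2)
        a, carry = ror(a, carry)
    # CLC / ADC lobyte / ... / ADC highbyte: 16-bit add of the two terms
    total = ((highbyte << 8) | lobyte) + ((highbyte2 << 8) | a)
    return total & 0xFFFF


if __name__ == "__main__":
    for x in (60, 1024, 3840, 65535):
        print(x, ed_div60(x), x // 60)
```

Because 17/1024 is slightly below 1/60 and both shifts truncate, the model never returns more than the true `x // 60`.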

Fabio · Post by **Fabio** » Mon Nov 22, 2021 8:48 pm

maybe you should use the fact that one full byte of movement is free

i'll try this

    LDA highb-inp
    STA midbyte2
    STZ highbyte2             ; because 16:8 is a 24 bit number
    lda lobyte-inp
    ldy #2
    :loop
    ASL A
    ROL midbyte2
    ROL highbyte2
    LSR highb-inp
    ROR lobyte-inp
    DEY
    BNE :loop ;given priority on code compactness
    CLC
    ADC lobyte-inp
    STA lobyte2
    LDA highb-inp
    ADC midbyte2
    STA midbyte2
    TYA    ; same as LDA #0 because now Y is zero
    ADC highbyte2
    STA highbyte2
    RTS
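A byte-level Python model of the routine above shows what the "free" byte move buys: treating the 16-bit input as the low two bytes of a 24-bit 16.8 number already divides it by 256, so two left shifts give x/64 and two right shifts of the input give x/1024. This is a sketch of one reading of the listing. `fabio_steps` and the helper names are hypothetical, and the hyphenated operands are modelled as plain variables.

```python
def asl(byte):
    """ASL: shift left, bit 7 goes to carry."""
    return (byte << 1) & 0xFF, byte >> 7


def rol(byte, carry):
    """ROL: rotate left through carry."""
    wide = (byte << 1) | carry
    return wide & 0xFF, wide >> 8


def lsr(byte):
    """LSR: shift right, bit 0 goes to carry."""
    return byte >> 1, byte & 1


def ror(byte, carry):
    """ROR: rotate right through carry."""
    return (carry << 7) | (byte >> 1), byte & 1


def fabio_steps(x):
    """16.8 approximation of x / 60 as (x << 2) + (x >> 2)."""
    hi_inp, lo_inp = x >> 8, x & 0xFF
    highbyte2, midbyte2, a = 0, hi_inp, lo_inp   # byte move: x / 256 in 16.8
    for _ in range(2):                           # ldy #2
        a, carry = asl(a)                        # 24-bit left shift: toward x / 64
        midbyte2, carry = rol(midbyte2, carry)
        highbyte2, carry = rol(highbyte2, carry)
        hi_inp, carry = lsr(hi_inp)              # 16-bit right shift: toward x / 1024
        lo_inp, carry = ror(lo_inp, carry)
    total = a + lo_inp                           # CLC / ADC lobyte-inp
    lobyte2, carry = total & 0xFF, total >> 8
    total = hi_inp + midbyte2 + carry            # LDA highb-inp / ADC midbyte2
    midbyte2, carry = total & 0xFF, total >> 8
    highbyte2 = (highbyte2 + carry) & 0xFF       # TYA / ADC highbyte2
    return (highbyte2 << 16) | (midbyte2 << 8) | lobyte2


if __name__ == "__main__":
    for x in (60, 120, 700):
        print(x, hex(fabio_steps(x)), hex((x * 256) // 60))
```

At 60 Hz the model gives $0000FF against an exact $000100, which is the expected shortfall of the 17/1024 approximation.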

BruceMcF · Post by **BruceMcF** » Tue Nov 23, 2021 12:53 am

On 11/22/2021 at 4:20 AM, paulscottrobson said:

You could approximate. The tick range isn't going to be that large, I wouldn't have thought. 1/60 (0.016667) is pretty close to 1/16 (0.015625), better done here obviously as 4/256. The error is about 6.5%.

This is a bit high, high enough to be noticeable perhaps, so you could have a fudge factor or factors to add based around a variety of tick rates, given the limited range of tick in practice. You could knock up some numbers in a spreadsheet and get it near enough for practical purposes.

It's even debatable whether there's any point in representing it as a fixed point fraction, I don't think many people are going to be able to tell a difference of 1-2% in tempo.

1/64 + 1/1024 = 0.0166015625, which is about 0.3% off. That is 4*(1/256) and (1/256)/4.
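The size of the shortfall for the two approximations discussed in this thread can be worked out exactly with rational arithmetic. A small sketch, with `relative_error`, `coarse` and `fine` as illustrative names only:

```python
from fractions import Fraction

EXACT = Fraction(1, 60)


def relative_error(approx):
    """Fraction of the true 1/60 that the approximation falls short by."""
    return (EXACT - approx) / EXACT


coarse = 4 * Fraction(1, 256)            # 4*(1/256) = 1/64
fine = coarse + Fraction(1, 256) / 4     # plus (1/256)/4 = 1/1024

for name, approx in (("1/64", coarse), ("1/64 + 1/1024", fine)):
    err = relative_error(approx)
    print(f"{name} = {float(approx):.10f}, low by {err} ({float(err) * 100:.4f}%)")
```

The shortfalls come out as exactly 1/16 (6.25%) for 1/64 alone and 1/256 (roughly 0.39%) for 1/64 + 1/1024.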

paulscottrobson · Post by **paulscottrobson** » Tue Nov 23, 2021 9:35 am

On 11/23/2021 at 12:53 AM, BruceMcF said:

1/64 + 1/1024 = 0.0166015625, which is about 0.3% off. That is 4*(1/256) and (1/256)/4.

As it's a 16-bit value you could probably just use the upper byte for the /1024 term, which would be >> 8 followed by >> 2; that would be quicker and only marginally less accurate. It might be quicker to decompose the value into 256a + b, apply your calculation and simplify it. But I've got to go for a flu jab in a minute.

Ed Minchau · Post by **Ed Minchau** » Tue Nov 23, 2021 11:43 am

On 11/22/2021 at 5:53 PM, BruceMcF said:

1/64 + 1/1024 = 0.0166015625, which is about 0.3% off. That is 4*(1/256) and (1/256)/4.

That's what my little algorithm above does. The result isn't 16.8, but can be shifted there.