The code is a horrible, unpresentable mishmash of magic numbers and macros, but I'll do my best...
This is the main wait loop, which just thrashes while it waits for the VBL interrupt to set a flag:
Quote
stz cycle_counterL
stz cycle_counterH
stz vsync_trigger
:
inc cycle_counterL
bne @carry
inc cycle_counterH ; (only runs once every 256 passes; the 14-cycle figure assumes the bne above is taken and this inc is skipped, which is a small error and that's ok)
@carry:
lda vsync_trigger
beq :- ; thrash until we're in a vblank
This just waits for vsync_trigger to be set (by the interrupt) and counts up cycle_counter while it does so. From the emulator's own cycle counter, I can see that one pass through this loop takes 14 cycles, so each tick of cycle_counter represents 14 CPU cycles.
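As a sanity check on the 14-cycle figure, the per-instruction costs can be tallied by hand. This is only a sketch: it assumes cycle_counterL, cycle_counterH and vsync_trigger all live in zero page and that no branch crosses a page boundary, and it uses the standard 65C02 timings. The Python names are mine, purely for illustration.

```python
# Standard 65C02 cycle costs for the common path through the wait loop.
# Assumes all variables are in zero page and no branch crosses a page.
COMMON_PATH = [
    ("inc cycle_counterL", 5),  # INC zero page
    ("bne @carry (taken)", 3),  # low byte did not wrap, so skip the high-byte inc
    ("lda vsync_trigger", 3),   # LDA zero page
    ("beq :- (taken)", 3),      # flag still clear, go round again
]

cycles_per_pass = sum(cost for _, cost in COMMON_PATH)

# Once every 256 passes the low byte wraps: bne falls through (2 cycles
# instead of 3) and inc cycle_counterH adds 5 more.
cycles_on_wrap = cycles_per_pass - 3 + 2 + 5

# Average cost per pass when the rare wrap is included.
average_cycles = (255 * cycles_per_pass + cycles_on_wrap) / 256

print(cycles_per_pass, cycles_on_wrap, average_cycles)
```

Under those assumptions the common path comes to 14 cycles and the occasional wrap pass to 18, so the average is only a hair above 14, which is why ignoring the high-byte increment is harmless.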
Quote
; Max cycles per frame is 133,462, or 9533 thrashes around this loop
Movw arith_param_a, cycle_counter ; Copies one 16-bit ZP value to another
Stow arith_param_b, 37 ; Stores a 16-bit value into a ZP location. 37 is the high byte of 9533, scaling the CPU usage to 0..255 after division
jsr div_u16 ; Perform the division. Result is stored back in arith_param_a
So from there we know that the most we can ever count in this loop is 9533 passes. If we divide our cycle counter by 37 (you'll need to find your own divide routine somewhere), we end up with a value of roughly 0 to 255 in the low byte of the division result. It can overshoot 255 slightly on a fully idle frame, because 37 is 9533 / 256 rounded down, so the next step clamps it.
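The constants above can be reproduced with a few lines of integer arithmetic. This is a rough model rather than the actual routine: `scale_to_byte` is a hypothetical stand-in for the first div_u16 call, and the frame budget is simply the figure quoted in the code comment.

```python
MAX_CYCLES_PER_FRAME = 133_462   # figure quoted in the post
CYCLES_PER_PASS = 14             # measured cost of one wait-loop pass

# Largest value cycle_counter can reach in one frame.
max_passes = MAX_CYCLES_PER_FRAME // CYCLES_PER_PASS

# High byte of that 16-bit maximum, used as the divisor.
divisor = max_passes >> 8

def scale_to_byte(cycle_counter):
    """Hypothetical model of the first div_u16 call: returns the 16-bit quotient."""
    return (cycle_counter & 0xFFFF) // divisor

print(max_passes, divisor, scale_to_byte(max_passes))
```

With these numbers the quotient for a completely idle frame comes out slightly above 255, which is the rounding error the remap step has to deal with.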
Next up, we need to remap that byte to a 0..100 percentage value, which is:
Quote
; 0 == max CPU usage, 255 == idle, We want this inverted.
; Due to rounding errors, if the high byte of the result is > 0, then we should consider that idle.
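lda arith_param_a_H ; Did the quotient overflow past 255?
beq :+ ; No, keep the low byte as it is
lda #$ff
sta arith_param_a_L ; Yes, clamp to 255 (fully idle)
:
lda arith_param_a_L
eor #$ff ; Invert, so 0 == idle and 255 == max CPU usage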
sta arith_param_a_H ; Put result in high byte, shifting it up 8-bits
stz arith_param_a_L ; Put 0 in the low byte
; The CPU usage is now a value in 0..65280. Dividing by 652 remaps it
; to 0..100, which can be displayed as a percentage
Stow arith_param_b, 652
jsr div_u16
lda arith_param_a_L
sta cpu_usage ; Store the single-byte result somewhere
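To check that the whole remap behaves as intended, the steps above can be modelled in integer arithmetic. This is a sketch of the arithmetic only, not the assembly itself; `cpu_usage_percent` is a name I made up, and it assumes div_u16 is a plain unsigned 16-bit divide that discards the remainder.

```python
def cpu_usage_percent(cycle_counter):
    """Model of the two-division remap, using 16-bit integer arithmetic."""
    quotient = (cycle_counter & 0xFFFF) // 37   # first div_u16
    if quotient >> 8:                           # high byte non-zero: treat as idle
        idle = 0xFF
    else:
        idle = quotient & 0xFF
    busy = idle ^ 0xFF                          # eor #$ff
    scaled = busy << 8                          # result moved into the high byte
    return (scaled // 652) & 0xFF               # second div_u16, low byte only

for counter in (0, 4766, 9533):
    print(counter, cpu_usage_percent(counter))
```

A counter of 0 (the flag was already set, so no time was spent waiting) maps to 100, a full 9533 passes maps to 0, and roughly half the frame spent waiting lands one point under 50 because both divisions round down.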
There are probably cleaner or quicker ways to do this without the two divisions, but it seems to work well enough for my purposes.
I should probably also work out how many cycles the CPU usage calculation itself burns and subtract that from the result.