FX: Sum of several multiplications?

doslogo
Posts: 16
Joined: Fri Dec 20, 2024 4:26 pm

General - CX16 FX: Sum of several multiplications?

Post by doslogo »

So far I can only multiply two 16-bit values by setting the 32-bit cache directly, and then having VERA write the result to VRAM for me to fetch using the CPU.
The official documentation talks about the "accumulator" as some form of reset operation. I need addition of two multiplication results like (X*X)+(Y*Y). But there are no examples other than that single multiplication I have managed to do. How can one multiplication turn into multiple multiplications? How can their results be added before being written to VRAM?
Wavicle
Posts: 288
Joined: Sun Feb 21, 2021 2:40 am

Re: FX: Sum of several multiplications?

Post by Wavicle »

I hope I get this right; I'm going by the Verilog code for VERA:

1. Reset the accumulator (read from $9F29, DCSEL=6; or write bit 7 of $9F2C, DCSEL=2)
2. Perform first multiply
3. Add that product to the accumulator (read from $9F2A, DCSEL=6; or write bit 6 of $9F2C, DCSEL=2)
4. Perform your second multiply
5. Add that product to the accumulator (read from $9F2A, DCSEL=6; or write bit 6 of $9F2C, DCSEL=2)

The result should now have the sum of your two multiplies.

The documentation definitely needs example code for this.
DragWx
Posts: 355
Joined: Tue Mar 07, 2023 9:07 pm

Re: FX: Sum of several multiplications?

Post by DragWx »

I checked as well, and I agree with Wavicle's interpretation. I also found some points to add.


1) The 32-bit cache is always fed to the multiplier. Any change to the cache (including changes due to Cache Fill Enable mode) is immediately reflected in the multiplier's output.

2) While Multiplier Enable is 1, anything that uses the 32-bit cache as "output" (such as Cache Write Enable) will instead receive the value (Multiplier_Output +/- Accumulator_Output).

3) When you "Accumulate", the new value of Accumulator_Output will be (Multiplier_Output +/- Accumulator_Output).

4) When you "Reset Accumulator", the new value of Accumulator_Output will be 0. Because of point (2), it's always a good idea to reset the accumulator at least once in your program before you use the multiplier even if you don't plan to use the accumulator, or else you may receive unexpected results.

5) The combination of points (2) and (3) means, you don't need to accumulate after your final multiplication, just output.

6) For "+/-" mentioned above, Subtract Enable being 0 results in "+", and 1 results in "-". I believe this change is immediate (I haven't tested).

7) Accumulate and Reset Accumulator are triggered as a one-shot. That is, they trigger once with each read of $9F2A[dcsel=6] and $9F29[dcsel=6] respectively, or each time you write a '1' to the respective bits of $9F2C[dcsel=2] (you don't need to reset those bits to '0' in between).


And now an undocumented(?) feature, assuming I'm reading this schematic correctly: when you Accumulate with Multiplier Enable set to 0, the accumulator will be initialized to the current contents of the 32-bit cache.


The FX multiplier/accumulator is a DSP block on the FPGA itself, rather than an implementation in Verilog, so it's tricky to know exactly how it works unless you have the documentation for the FPGA too.
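
Point (4) can be folded into a one-time setup routine. The following is only a sketch built from the register accesses described in this thread, not tested code: the label init_fx_mult is hypothetical, and it assumes nothing else has configured the FX registers beforehand.

```asm
; Hypothetical one-time setup: enable the multiplier, then clear the accumulator.
init_fx_mult:
	lda #(2<<1)			; DCSEL=2, ADDRSEL=0
	sta $9F25
	stz $9F29			; FX_CTRL = 0 (no cache writes yet)
	lda #%00010000			; bit 4 = Multiplier Enable
	sta $9F2C
	lda #(6<<1)			; DCSEL=6
	sta $9F25
	lda $9F29			; a read here triggers Reset Accumulator (one-shot)
	stz $9F25			; back to DCSEL=0
	rts
```

Resetting once here means that a later plain multiply outputs Multiplier_Output + 0 rather than whatever the accumulator happened to hold.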
doslogo
Posts: 16
Joined: Fri Dec 20, 2024 4:26 pm

Re: FX: Sum of several multiplications?

Post by doslogo »

Thank you Wavicle and DragWx! I now have a working dot product subroutine. It takes two vectors in 1.7.8 fixed-point format and returns a scalar, also in 1.7.8 fixed point. Not counting the call overhead or setting up the values, it takes 161 cycles:

Code:

ZP_MUL_X0 = $22			; Zero Page
ZP_MUL_X1 = $24
ZP_MUL_Y0 = $26
ZP_MUL_Y1 = $28

VRAM_MULT_X = $1F9B8		; Must be aligned to 4 bytes offset in VRAM!

	; A test call to the subroutine below...
	jsr _init_mul_once
	
	lda #>$500			; vector0 x = 5.0
	sta ZP_MUL_X0+1

	lda #>$500			; vector1 x = 5.0
	sta ZP_MUL_X1+1

	lda #>$C00			; vector0 y = 12.0
	sta ZP_MUL_Y0+1

	lda #>$100			; vector1 y = 1.0
	sta ZP_MUL_Y1+1

	stz ZP_MUL_X0			; Fractions stay zero
	stz ZP_MUL_X1
	stz ZP_MUL_Y0
	stz ZP_MUL_Y1

	jsr __dot

	; ... Check result in ZP_MUL_X0 and ZP_MUL_X0+1, result should be = 37.0 ($2500)

;
; Computes the dot product ZP_MUL_X0*ZP_MUL_X1+ZP_MUL_Y0*ZP_MUL_Y1 and returns the 32-bit sum >> 8 (low 16 bits) in ZP_MUL_X0 and ZP_MUL_X0+1:
;
__dot:
	stz $9F25			; ADDRSEL 0
	lda #<VRAM_MULT_X
	sta $9F20			; Start at VRAM_MULT_X
	lda #>VRAM_MULT_X
	sta $9F21
	lda #^VRAM_MULT_X		; No increment (overwrite the first product with the second in VRAM)
	sta $9F22

	; Access cache directly
	lda #(6<<1)			; DCSEL=6		ADDRSEL=0
	sta $9F25

	; Zero the accumulator
	lda $9F29			; At DCSEL=6, reading $9F29 resets the accumulator (not FX_CTRL here)

	; First multiply
	lda ZP_MUL_X0
	ldx ZP_MUL_X0+1
	ldy ZP_MUL_X1
	sta $9F29
	stx $9F2A
	sty $9F2B
	lda ZP_MUL_X1+1
	sta $9F2C

	lda #(2<<1)			; DCSEL=2		ADDRSEL=0
	sta $9F25

	lda #%01000000			; Cache Write Enable (writes cache to any of the address ports)  Addr1=0
	sta $9F29 ;FX_CTRL

	stz $9F23			; And multiply X0*X1, store 4 bytes result in VRAM_MULT_X

	; Access cache directly
	lda #(6<<1)			; DCSEL=6		ADDRSEL=0
	sta $9F25

	; Add product to the accumulator
	lda $9F2A

	; Second multiply
	lda ZP_MUL_Y0
	ldx ZP_MUL_Y0+1
	ldy ZP_MUL_Y1
	sta $9F29
	stx $9F2A
	sty $9F2B
	lda ZP_MUL_Y1+1
	sta $9F2C

	lda #(2<<1)			; DCSEL=2		ADDRSEL=0
	sta $9F25

	stz $9F23			; Store 4 bytes result in VRAM_MULT_X

	stz $9F29 ;FX_CTRL		; No more cache writes (normal access to VRAM)

	stz $9F25			; DCSEL 0 (End of multiply FX)
	lda #<(VRAM_MULT_X+1)		; Return the dot product as a 1.7.8 fixed point value!
	sta $9F20			; Point ADDR0 at VRAM_MULT_X+1 (skipping the lowest byte gives the >> 8)
	lda #($10|^VRAM_MULT_X)		; Increment 1
	sta $9F22

	lda $9F23
	sta ZP_MUL_X0
	lda $9F23
	sta ZP_MUL_X0+1
	
	rts
	
; Once on init, enable multiplier:
_init_mul_once:
	lda #(2<<1)			; DCSEL 2
	sta $9F25

	lda #%00010000			; Multiplier Enable (cache writes to VRAM then carry the multiplier result, not the raw cache contents)
	sta $9F2C

	stz $9F29 ; FX_CTRL
	rts
Last edited by doslogo on Tue Jan 07, 2025 9:52 pm, edited 2 times in total.
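
To see why the test call should return 37.0, the arithmetic can be checked at assembly time. This is a sketch using ca65 constant expressions; all symbol names are made up for illustration, and it only models the integer arithmetic, not VERA itself.

```asm
; Assembly-time check of the test call's arithmetic (ca65 constant expressions).
; All symbol names below are illustrative only.
VEC0_X = $0500				; 5.0  (8 fractional bits)
VEC1_X = $0500				; 5.0
VEC0_Y = $0C00				; 12.0
VEC1_Y = $0100				; 1.0

PROD_X = VEC0_X * VEC1_X		; $00190000 = 25.0 with 16 fractional bits
PROD_Y = VEC0_Y * VEC1_Y		; $000C0000 = 12.0 with 16 fractional bits
DOT = PROD_X + PROD_Y			; $00250000

; Bytes 1 and 2 of the 32-bit sum are what __dot reads back from VRAM_MULT_X+1.
RESULT = (DOT >> 8) & $FFFF		; $2500 = 37.0 with 8 fractional bits

.assert RESULT = $2500, error, "dot product of the test vectors should be 37.0"
```

Multiplying two values with 8 fractional bits yields 16 fractional bits, which is why the routine drops the lowest byte of the 32-bit result.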
DragWx
Posts: 355
Joined: Tue Mar 07, 2023 9:07 pm

Re: FX: Sum of several multiplications?

Post by DragWx »

I found a few errors in your code:
  • It never actually touches Multiplier_Enable ($9F2C.4, DCSEL=2).
  • It mistakenly reads from $9F25 instead of $9F29 when trying to reset the accumulator, although it sets DCSEL correctly right beforehand.
  • It mistakenly writes to $9F25 instead of $9F29 when trying to enable (and later disable) Cache_Write_Enable, although it sets DCSEL correctly right beforehand.
And then a note: There's no need to write the product to VRAM before accumulating, simply accumulating is enough.
doslogo
Posts: 16
Joined: Fri Dec 20, 2024 4:26 pm

Re: FX: Sum of several multiplications?

Post by doslogo »

Yes, I messed up my own defines when I manually wrote out the code for the forum. I have edited the code in my previous post and added an init_once subroutine that is similar to how my game handles initialization.
Other than my leaving out the "stz VERA_FX_CTRL ; $9F29 (mainly to reset Addr1 Mode to 0)" line from the documentation's example in the init subroutine I just wrote, I think it is functional.

Do I need to set bit 6 (accumulate) in $9F2C DCSEL=2 on initialization? I don't think I am, since things are working.

Yes, staying in DCSEL=6 instead of going back to DCSEL=2, and simply adding the product to the accumulator, did the trick! Once back in DCSEL=2 after the second multiply, the Cache Write Enable bit must be turned on so that the sum of the two products is stored to VRAM. This saves a few cycles, so awesome! Thank you DragWx!
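
For readers following along, the shortened flow might look like the sketch below. It is a reconstruction from the description in this thread, not the code from the game: the label dot_no_temp_write is hypothetical, and it assumes _init_mul_once has already run and that the ZP_MUL_* and VRAM_MULT_X symbols are defined as in the earlier listing.

```asm
dot_no_temp_write:			; hypothetical name
	stz $9F25			; DCSEL=0, ADDRSEL=0
	lda #<VRAM_MULT_X
	sta $9F20
	lda #>VRAM_MULT_X
	sta $9F21
	lda #^VRAM_MULT_X		; No increment
	sta $9F22

	lda #(6<<1)			; DCSEL=6: cache access
	sta $9F25
	lda $9F29			; Read = Reset Accumulator

	lda ZP_MUL_X0			; Cache = X0 (low word), X1 (high word)
	sta $9F29
	lda ZP_MUL_X0+1
	sta $9F2A
	lda ZP_MUL_X1
	sta $9F2B
	lda ZP_MUL_X1+1
	sta $9F2C
	lda $9F2A			; Read = Accumulate (accumulator now holds X0*X1)

	lda ZP_MUL_Y0			; Cache = Y0 (low word), Y1 (high word)
	sta $9F29
	lda ZP_MUL_Y0+1
	sta $9F2A
	lda ZP_MUL_Y1
	sta $9F2B
	lda ZP_MUL_Y1+1
	sta $9F2C

	lda #(2<<1)			; DCSEL=2
	sta $9F25
	lda #%01000000			; Cache Write Enable
	sta $9F29
	stz $9F23			; Writes 4 bytes: Y0*Y1 + accumulator
	stz $9F29			; Cache writes off again

	stz $9F25			; DCSEL=0
	lda #<(VRAM_MULT_X+1)		; Skip the lowest byte (>> 8); assumes no page crossing
	sta $9F20
	lda #($10|^VRAM_MULT_X)		; Increment 1
	sta $9F22
	lda $9F23
	sta ZP_MUL_X0
	lda $9F23
	sta ZP_MUL_X0+1
	rts
```

Compared with the earlier listing, this drops one DCSEL round trip and the intermediate VRAM write. No accumulate is needed after the second multiply because the cache write already outputs the product plus the accumulator.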
DragWx
Posts: 355
Joined: Tue Mar 07, 2023 9:07 pm

Re: FX: Sum of several multiplications?

Post by DragWx »

You're welcome! :D
doslogo wrote: Tue Jan 07, 2025 9:58 pm Do I need to set bit 6 (accumulate) in $9F2C DCSEL=2 on initialization? I don't think I am, since things are working.
You don't need to, because writing a "1" to bit 6 in $9F2C[DCSEL=2] does the exact same thing as reading $9F2A[DCSEL=6]. Just use whichever one is convenient for your code.
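
Side by side, the two triggers could be written as below. This is a sketch and the two fragments are alternatives, not a sequence to run back to back. It assumes that a write to $9F2C replaces the whole register, so the write variant keeps bit 4 set to leave Multiplier Enable on.

```asm
	; Alternative A: trigger Accumulate by reading $9F2A at DCSEL=6
	lda #(6<<1)			; DCSEL=6
	sta $9F25
	lda $9F2A			; The read itself is the trigger; the value read is ignored

	; Alternative B: trigger Accumulate by writing bit 6 of $9F2C at DCSEL=2
	lda #(2<<1)			; DCSEL=2
	sta $9F25
	lda #%01010000			; bit 6 = Accumulate (one-shot), bit 4 = Multiplier Enable kept on
	sta $9F2C
```

Alternative A is the natural choice while the code is already at DCSEL=6 loading the cache; alternative B fits code that is already at DCSEL=2.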
doslogo
Posts: 16
Joined: Fri Dec 20, 2024 4:26 pm

Re: FX: Sum of several multiplications?

Post by doslogo »

It is good that there are two ways to get that.

So there are a few things to look out for when using the multiplier. First, try to minimize DCSEL changes (it needs to go from 6 to 2 to 6 to 2 and then back to 0). Second, make sure no IRQ_LINE handler changes the VERA state in the middle of a multiply, since interrupts must not be turned off or a scanline will be missed (which is important to my game). Lastly, the vertical blank interrupt can also trigger in the middle of a multiply (if the frame took too long). It is quite a lot, but manageable.
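
One way to keep an interrupt from disturbing a multiply in progress is to have the handler preserve $9F25. The fragment below is only a sketch with a hypothetical label. It assumes that $9F25 reads back the current DCSEL/ADDRSEL, that A is free to clobber at this point, and that the handler's own work stays away from ADDR0/DATA0, the FX registers, and the 32-bit cache.

```asm
line_irq:				; hypothetical handler fragment
	lda $9F25			; Save DCSEL/ADDRSEL of the interrupted code
	pha
	stz $9F25			; DCSEL=0 for the handler's own register writes

	; ... raster work here. Do not read $9F29/$9F2A at DCSEL=6:
	; those reads would reset or trigger the accumulator ...

	pla
	sta $9F25			; Restore DCSEL/ADDRSEL before returning
	; Acknowledge the interrupt and return as the rest of the handler requires
```

If the handler does need to write VRAM through DATA0, it would also have to deal with Cache Write Enable and the ADDR0 registers, which this sketch does not attempt.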