VERA FX
Posted: Wed Aug 23, 2023 7:52 pm
by DragWx
I have a silly question: was the "FX" extension for the VERA named after the SNES's "Super FX" chip? I noticed the similarities in the feature set and thought it might not be a coincidence.
Also, it's nice to finally see what this is all about after several commits on GitHub had been referring to it.
For anyone who hasn't seen, the VERA's new "FX" features provide some helper functions to accelerate drawing arbitrary lines, polygons, fills, and some simple bitmap rotation and scaling, but they're all passive functions, meaning it's up to the CPU to actually poke the VERA for each byte or pixel that needs to be written to VRAM. It's pretty neat and it seems like it'd be a fun thing to play with at some point, and I can't wait to see what people do with it.
Re: VERA FX
Posted: Thu Aug 24, 2023 1:22 am
by ahenry3068
I'm intrigued. Is this ready for flashing to hardware yet?
Re: VERA FX
Posted: Thu Aug 24, 2023 2:02 am
by DragWx
I'm not sure, it looks like there's a pre-release build available on the GitHub page, but I think it might still need some more time for an "official" release or else we would've seen an announcement here.
Re: VERA FX
Posted: Thu Aug 24, 2023 2:19 am
by Ender
Well, it's in R44 of the emulator, and I've heard of people that already have it on their hardware without issues, so I would say it's probably safe to flash it.
Re: VERA FX
Posted: Fri Aug 25, 2023 2:36 am
by FearLabs
I flashed it onto my hardware, no issues.
Re: VERA FX
Posted: Fri Aug 25, 2023 10:37 pm
by Guybrush
I have a (stupid?) question for the people who worked on VERA FX, so here it goes:
Why is there no option to read the entire 32-bit cache in one read operation, since there is an option to write it in one operation?
It would allow for near-DMA speeds when copying data within the video RAM. LDA DATA0/1 is 4 cycles, STA DATA0/1, #val is 5 cycles, which would make it possible to copy 4 bytes in just 9 cycles not accounting for loops, but let's add 3 more cycles for that, which makes it 12 cycles, 3 cycles per byte. That's pretty damn fast, and still totally under CPU control unlike traditional DMA.
32-bit cache write could stay just as it is right now, with nibble mask and everything, only a read mode would need to be added where all 4 bytes of the 32-bit cache would be loaded (they're already read from memory anyway). As for what would actually be returned to the CPU by the read operation, it could be the first byte or whatever.
Re: VERA FX
Posted: Fri Aug 25, 2023 11:25 pm
by Ed Minchau
If you're using VERA channel 0 or 1, that STA is also 4 cycles, since it's going to an absolute address.
Re: VERA FX
Posted: Sat Aug 26, 2023 12:10 am
by Guybrush
Ed Minchau wrote: ↑Fri Aug 25, 2023 11:25 pm
If you're using VERA channel 0 or 1, that STA is also 4 cycles, since it's going to an absolute address.
You're absolutely right, I was probably thinking of standard loops and indexed addressing. That means that the simple non-unrolled loop would be 11 cycles per 4 bytes, which is even better.
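The cycle arithmetic in this exchange can be checked with a small sketch. It assumes the hypothetical 32-bit cache read that Guybrush proposes (one LDA fills the cache, one STA writes 4 bytes) and uses only the figures quoted in the posts; the function and constant names are made up for illustration.

```python
# Cycle figures quoted in the thread (not measured here).
LDA_ABS = 4             # LDA DATA0/1, absolute addressing
STA_ORIGINAL = 5        # Guybrush's first estimate for the STA
STA_ABS = 4             # Ed Minchau's correction: absolute STA is 4 cycles
LOOP_OVERHEAD = 3       # rough per-pass loop allowance used in the thread
BYTES_PER_PASS = 4      # hypothetical: one 32-bit cache read plus one cache write


def cycles_per_pass(lda, sta, loop):
    """Total CPU cycles for one read/write pass including loop overhead."""
    return lda + sta + loop


def cycles_per_byte(lda, sta, loop, nbytes=BYTES_PER_PASS):
    """Average cycles spent per byte copied."""
    return cycles_per_pass(lda, sta, loop) / nbytes


original_pass = cycles_per_pass(LDA_ABS, STA_ORIGINAL, LOOP_OVERHEAD)
corrected_pass = cycles_per_pass(LDA_ABS, STA_ABS, LOOP_OVERHEAD)
original_rate = cycles_per_byte(LDA_ABS, STA_ORIGINAL, LOOP_OVERHEAD)
corrected_rate = cycles_per_byte(LDA_ABS, STA_ABS, LOOP_OVERHEAD)

print(original_pass, original_rate)    # first estimate
print(corrected_pass, corrected_rate)  # after the correction
```

With the corrected STA timing, the pass drops from 12 to 11 cycles, or from 3 to 2.75 cycles per byte.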
Re: VERA FX
Posted: Wed Aug 30, 2023 6:39 am
by Ed Minchau
I think I found a major bug in the multiplier. I made a test program:
Code:
    bra test
testresult:
    .word $0000
    .word $0000
testinput:
    .word $7fff
    .word $4000
test:
    ldx #$00        ;setting VERA channel 0 and 1 to 1df00
    ldy #$df
    lda #$00
    sta $9f25
    lda #$11        ;step size 1 for channel 0
    stx $9f20
    sty $9f21
    sta $9f22
    lda #$01
    sta $9f25
    lda #$31        ;step size 4 for channel 1
    stx $9f20
    sty $9f21
    sta $9f22
    lda #$0c        ;DCSEL = 6
    sta $9f25
    ldy #$00        ;copy test input into cache
:   lda testinput,y
    sta $9f29,y
    iny
    cpy #$04
    bne :-
    lda #$04        ;DCSEL = 2
    sta $9f25
    lda #$40        ;enable cache write, addr-1 mode normal
    sta $9f29
    lda #$10        ;multiply
    sta $9f2c
    sta $9f24       ;send result to VRAM
    stz $9f29       ;disable cache write
    stz $9f25       ;DCSEL = 0
    ldy #$00        ;copy result to RAM
:   lda $9f23
    sta testresult,y
    iny
    cpy #$04
    bne :-
    rts
It took a bit of code wrasslin' to figure out that I had to shut off cache write when I was done. Anyhow, my results were unexpected: it looks like bits 16-19 of the result are always 0.
For example,
Code:
inputs      expected output   actual output
7fff 4000   1fffc000          1ff0c000
7fff 2000   0fffe000          0ff0e000
5555 7fff   2aaa2aab          2aa02aab
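The table can be cross-checked with a short sketch: it recomputes each 16x16 product and tests whether the reported output equals the expected product with bits 16-19 cleared. The helper names are made up for illustration. The check only confirms the pattern in the posted numbers; it is consistent with one nibble of the 32-bit result being skipped, but the numbers alone do not say whether the multiplier or the write path is responsible.

```python
# (input A, input B, actual output reported in the post)
ROWS = [
    (0x7FFF, 0x4000, 0x1FF0C000),
    (0x7FFF, 0x2000, 0x0FF0E000),
    (0x5555, 0x7FFF, 0x2AA02AAB),
]

BITS_16_19 = 0x000F0000  # the nibble that reads back as zero


def expected_product(a, b):
    """32-bit product of two 16-bit inputs (all inputs here are positive)."""
    return (a * b) & 0xFFFFFFFF


def clear_bits_16_19(value):
    """Return value with bits 16-19 forced to zero."""
    return value & ~BITS_16_19 & 0xFFFFFFFF


results = []
for a, b, actual in ROWS:
    exp = expected_product(a, b)
    matches = clear_bits_16_19(exp) == actual
    results.append((exp, matches))
    print(f"{a:04x} {b:04x}  expected {exp:08x}  actual {actual:08x}  "
          f"matches pattern: {matches}")
```

All three rows match: the expected column is the true product, and each actual value differs from it only in bits 16-19.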
Re: VERA FX
Posted: Wed Aug 30, 2023 7:25 am
by Ed Minchau
I've also found a strange problem with bits 7:0. These are the results of sequential runs, without resetting testresult to 00000000 each time. Is it a problem with my test program, or with the multiplier?
Code:
starting with testresult set to 00000000 before the first run
inputs      expected   actual
5555 7fff   2aaa2aab   2aa02a00
5555 7fff   2aaa2aab   2aa02aab
5555 5555   1c718e39   1c708e39 on the second run, 1c708eab on the first
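The sequential runs follow a regular pattern that a short sketch can test: in every row, bits 16-19 are cleared as before, and the low byte equals the low byte of the previous run's product (00 for the first run). This is only a description of the posted numbers under that assumed model, not a diagnosis of where the stale byte comes from; the names are hypothetical.

```python
# (input A, input B, actual output reported in the post), in run order
RUNS = [
    (0x5555, 0x7FFF, 0x2AA02A00),
    (0x5555, 0x7FFF, 0x2AA02AAB),
    (0x5555, 0x5555, 0x1C708EAB),
    (0x5555, 0x5555, 0x1C708E39),
    (0x7FFF, 0x5555, 0x2AA02A39),
    (0x7FFF, 0x5555, 0x2AA02AAB),
]

BITS_16_19 = 0x000F0000


def predict_runs(runs, initial_low_byte=0x00):
    """Assumed model: bits 16-19 cleared, low byte taken from the previous run."""
    stale = initial_low_byte
    predicted = []
    for a, b, _actual in runs:
        product = (a * b) & 0xFFFFFFFF
        value = product & ~BITS_16_19 & 0xFFFFFFFF
        value = (value & 0xFFFFFF00) | stale   # low byte lags one run behind
        predicted.append(value)
        stale = product & 0xFF                 # becomes stale data for the next run
    return predicted


predicted = predict_runs(RUNS)
for (a, b, actual), guess in zip(RUNS, predicted):
    print(f"{a:04x} {b:04x}  actual {actual:08x}  model {guess:08x}  "
          f"match: {actual == guess}")
```

The model reproduces all six rows, which suggests the low byte being read back is one run out of date rather than miscomputed.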