Re: How to Use VERA FX "Line Helper"?

Posted: Wed Apr 10, 2024 6:26 pm
by hstubbs3
DragWx wrote: Wed Apr 10, 2024 6:00 pm
hstubbs3 wrote: Wed Apr 10, 2024 3:46 pm
The FX routines also pay no attention to whether the target is 4-bit or 8-bit, which could be fun: using 4-bit mode with byte increments against an 8-bit target could let you overwrite either nibble of a pixel. That might just switch the pixel's palette offset, and with a suitable palette you could lighten, darken, or color-shift by changing one nibble per byte.
Yep! It's also nice when you just want to do general-purpose pixel plotting in 16-color bitmap mode; you don't need to grab a copy of the current byte (2 pixels) from VRAM before modifying just one pixel within. In addition to FX 4-bit mode, there's also FX Transparent Writes mode ($9F29.7), where writing a "0" results in no change to VRAM, which can also simplify blitting code. :D
The way I tried it, I could not get 4-bit transparent writes together with byte reads.

The use case is blitting 4-bit sprite data from somewhere in VRAM to a 4-bit bitmap layer. I have more than 128 sprites to get on screen at a time, so I am blitting the remainder.

If I set FX 4-bit mode but the DATA0 increment is 1 byte, the 4-bit setting is ignored, so even though I have transparent writes enabled...

LDA DATA0
LDA DATA0
LDA DATA0
LDA DATA0
STZ DATA1

...this will only skip a write when both 4-bit pixels within a byte are zero, because it is operating in 8-bit mode, not 4-bit mode.

If I set the DATA0 increment to nibble, would I still get 4-bit transparent writes with the DATA1 increment set to 4 bytes?

Or would this just be a waste of time and end up the same as before? With the DATA0 increment set to nibble:

LDA DATA0 ; pixel 0
LDA DATA0 ; pixel 1
LDA DATA0 ; pixel 2
LDA DATA0 ; pixel 3
LDA DATA0 ; pixel 4
LDA DATA0 ; pixel 5
LDA DATA0 ; pixel 6
LDA DATA0 ; pixel 7
STZ DATA1 ; write cache out


.....

I can 100% accept a limitation like transparency only masking per byte in this case. That can be managed via the assets involved,
or, if really desperate, by altering the nibble mask when writing the cache out (which may make optimized code very weird and maybe not very general).

https://www.youtube.com/watch?v=1TKdrVahM8g
https://github.com/hstubbs3/CommanderX1 ... ex_sprites
hex.PRG

Pressing '9' should stop it from using sprites, and you can really see it blitting its little heart out.
(It is not optimized; there is a ton of overdraw, as the tile sprites are 16x64.)

Re: How to Use VERA FX "Line Helper"?

Posted: Wed Apr 10, 2024 7:28 pm
by DragWx
According to the Verilog and the FX docs:

When using the 32-bit cache...

If FX 4-Bit Mode is ON, Transparent Writes mode will function on each zero-nybble of the 32-bit cache when you attempt to write the cache to VRAM.

If FX 4-Bit Mode is OFF, Transparent Writes mode will function on each zero-byte of the cache instead.

If you'd like, you could disable FX 4-Bit Mode to load the cache with four reads, then enable FX 4-Bit Mode right before writing the cache to VRAM to still get the nybble-based transparency. Best case scenario, that's 7 memory accesses instead of the 9 you'd need if you just stayed in FX 4-Bit Mode, so it's still faster.
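
To make the two rules above concrete, here is a minimal host-side sketch in Python (not X16 code). The function name `transparent_write` and the sample values are made up for illustration, and the model treats the cache and the four target VRAM bytes as plain 32-bit numbers, so it says nothing about how VERA orders bytes or nybbles in memory. It only shows the masking rule as described: zero nybbles are skipped in 4-bit mode, zero bytes otherwise.

```python
def transparent_write(vram: int, cache: int, four_bit: bool) -> int:
    """Model a 32-bit cache write with Transparent Writes enabled.

    Hypothetical helper for illustration only. A zero unit in the cache
    (nybble if four_bit is True, byte otherwise) leaves VRAM unchanged.
    """
    step = 4 if four_bit else 8        # bits per transparency unit
    unit = (1 << step) - 1             # 0xF or 0xFF
    out = vram
    for shift in range(0, 32, step):
        mask = unit << shift
        if cache & mask:               # non-zero unit: overwrite VRAM
            out = (out & ~mask) | (cache & mask)
    return out & 0xFFFFFFFF


background = 0x11111111                # eight pixels of color 1
sprite = 0x0A00B0C0                    # zero nybbles are meant to be see-through

print(f"4-bit mode: {transparent_write(background, sprite, True):08X}")   # 1A11B1C1
print(f"8-bit mode: {transparent_write(background, sprite, False):08X}")  # 0A11B0C0
```

With this sample data, 4-bit mode keeps the background under every zero nybble, while 8-bit mode overwrites a neighboring pixel with color 0 whenever only one nybble of a byte is zero, which matches the behavior reported earlier in the thread.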

Re: How to Use VERA FX "Line Helper"?

Posted: Wed Apr 10, 2024 8:18 pm
by hstubbs3
DragWx wrote: Wed Apr 10, 2024 7:28 pm
According to the Verilog and the FX docs:

When using the 32-bit cache...

If FX 4-Bit Mode is ON, Transparent Writes mode will function on each zero-nybble of the 32-bit cache when you attempt to write the cache to VRAM.

If FX 4-Bit Mode is OFF, Transparent Writes mode will function on each zero-byte of the cache instead.

If you'd like, you could disable FX 4-Bit Mode to load the cache with four reads, then enable FX 4-Bit Mode right before writing the cache to VRAM to still get the nybble-based transparency. Best case scenario, that's 7 memory accesses instead of the 9 you'd need if you just stayed in FX 4-Bit Mode, so it's still faster.
LDA #magic_enable_8bit ; 2 cycles, running total 2
STA FX_CTRL ; 4 cycles, total 6
LDA DATA0
LDA DATA0
LDA DATA0
LDA DATA0 ; 4 cycles x 4 = 16, total 22
LDA #magic_enable_4bit ; 2 cycles, total 24
STA FX_CTRL ; 4 cycles, total 28
STZ DATA1 ; 4 cycles, total 32

vs not fiddling with it...

LDA DATA0
LDA DATA0
LDA DATA0
LDA DATA0 ; 4 cycles x 4 = 16, total 16
STZ DATA1 ; 4 cycles, total 20

That is 12 extra cycles (60% more) spent just switching modes. If my program wasn't already trying to do too much, I could see it being worth it, sure.

It's not so bad, just very retro :D

Re: How to Use VERA FX "Line Helper"?

Posted: Thu Apr 11, 2024 12:55 am
by DragWx

 LDA #magic_enable_8bit
 LDX #magic_enable_4bit

loop:
 STA FX_CTRL ;4, Disable FX 4-bit
 BIT DATA0   ;4, Read DATA0 but don't store it in A
 BIT DATA0   ;4
 BIT DATA0   ;4
 BIT DATA0   ;4
 STX FX_CTRL ;4, Enable FX 4-bit
 STZ DATA1   ;4, Write cache to VRAM with 4-bit transparency
;             = 28 cycles
This will save 4 cycles, just in case. :P

Edit: And then for comparison (for anyone else reading), the "normal" way for 4-bit mode (i.e., read DATA0 eight times and write to DATA1 once) is 36 cycles.
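
For anyone who wants to check the totals quoted in this thread, here is a small Python tally. It is only a sketch: the per-instruction costs are the standard 65C02 timings for immediate and absolute addressing, the variant names are my own labels, and loop overhead (counter update and branch) is deliberately left out, so real per-iteration costs will be somewhat higher.

```python
# 65C02 cycle costs for the addressing modes used in the snippets above.
CYCLES = {
    "LDA #imm": 2,
    "LDA abs": 4,
    "BIT abs": 4,
    "STA abs": 4,
    "STX abs": 4,
    "STZ abs": 4,
}

# Each variant moves eight 4-bit pixels (one 32-bit cache write).
# The names are hypothetical labels, not anything from the VERA docs.
variants = {
    "toggle modes, reload A each pass":
        ["LDA #imm", "STA abs"] + ["LDA abs"] * 4 + ["LDA #imm", "STA abs", "STZ abs"],
    "toggle modes, constants kept in A and X":
        ["STA abs"] + ["BIT abs"] * 4 + ["STX abs", "STZ abs"],
    "stay in 4-bit mode, eight nibble reads":
        ["BIT abs"] * 8 + ["STZ abs"],
}

totals = {name: sum(CYCLES[op] for op in ops) for name, ops in variants.items()}

for name, cycles in totals.items():
    print(f"{name}: {cycles} cycles, {cycles / 8:.2f} cycles per pixel")
```

This gives 32, 28, and 36 cycles respectively, so toggling FX 4-bit mode with both constants held in registers comes out ahead of staying in 4-bit mode by 8 cycles per group of eight pixels.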

Re: How to Use VERA FX "Line Helper"?

Posted: Thu Apr 11, 2024 5:49 pm
by hstubbs3
DragWx wrote: Thu Apr 11, 2024 12:55 am

 LDA #magic_enable_8bit
 LDX #magic_enable_4bit

loop:
 STA FX_CTRL ;4, Disable FX 4-bit
 BIT DATA0   ;4, Read DATA0 but don't store it in A
 BIT DATA0   ;4
 BIT DATA0   ;4
 BIT DATA0   ;4
 STX FX_CTRL ;4, Enable FX 4-bit
 STZ DATA1   ;4, Write cache to VRAM with 4-bit transparency
;             = 28 cycles
This will save 4 cycles, just in case. :P

Edit: And then for comparison (for anyone else reading), the "normal" way for 4-bit mode (i.e., read DATA0 eight times and write to DATA1 once) is 36 cycles.

:shock: < bows > Thanks for that. You even left me Y to use as a loop counter. :D BIT is an instruction I have not paid enough attention to.

Re: How to Use VERA FX "Line Helper"?

Posted: Fri Apr 19, 2024 1:59 am
by russell-s-harper
Here's my progress to date: https://github.com/Russell-S-Harper/EXPLORE

Directory cx16-v2 has an example of dual 16-color screens, with swapping during the VBI, and 16-color line drawing routines, all using VERA. Even though it's written in C and implements clipping, it's still about 25% faster than the 256-color line drawing routines in TGI.

Thanks to hstubbs3 and DragWx for their assistance!