New demo uploaded: Wolfenstein 3D - raycasting demo with textures

All aspects of programming on the Commander X16.
Ed Minchau
Posts: 503
Joined: Sat Jul 11, 2020 3:30 pm


Post by Ed Minchau »


@Jeffrey I'm not just looking at your source code, I'm also looking at the compiled code with the META/L editor.  There is a lot of room for improvement in that code; although macros make things easier to read and C makes it easier to write, the compiler produces much longer code than is necessary.  For instance, there are a lot of places where the code reads

LDX #00
LDA #22
STAA VERA_DAT_0
LDX #00
LDA #22
STAA VERA_DAT_0

over and over again.  The second and subsequent LDX and LDA instructions aren't necessary because the value being sent to VERA isn't changing, and each adds two cycles.  The sequences aren't all the same; some of them repeat unnecessary LDX #00, LDY #08 and LDA ($22),Y instructions, or something similar.  A few cycles here, a few cycles there, repeated hundreds of times per column of pixels, and it really adds up.  If this were all optimized assembly code, then a target of 15 FPS to match the original would definitely be achievable.
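
To put rough numbers on that, here is a small host-side C sketch of the cycle arithmetic. It assumes the standard 6502 timings of 2 cycles for an immediate load (LDX #, LDA #) and 4 cycles for an absolute store to VERA_DAT_0; the function names are made up for illustration and are not from the demo's source.

```c
#include <assert.h>

/* Standard 6502 timings: immediate loads take 2 cycles,
   an absolute-addressed STA takes 4 cycles. */
enum { CYCLES_LOAD_IMM = 2, CYCLES_STA_ABS = 4 };

/* Compiler-style output: every store is preceded by its own LDX # and LDA #. */
unsigned long cycles_reloading(unsigned long writes) {
    return writes * (2UL * CYCLES_LOAD_IMM + CYCLES_STA_ABS);
}

/* Hand-tuned version: load X and A once, then repeat only the store. */
unsigned long cycles_load_once(unsigned long writes) {
    if (writes == 0) return 0;
    return 2UL * CYCLES_LOAD_IMM + writes * CYCLES_STA_ABS;
}

/* Cycles saved by dropping the redundant loads. */
unsigned long cycles_saved(unsigned long writes) {
    assert(writes > 0);   /* nothing to save when nothing is written */
    return cycles_reloading(writes) - cycles_load_once(writes);
}
```

Under those assumptions, 200 identical writes cost 1600 cycles with the reloads and 804 without, so the store sequence takes roughly half the time.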

 

Jeffrey
Posts: 62
Joined: Fri Feb 19, 2021 9:46 am


Post by Jeffrey »




I think you are looking at the compiled C code. I have an assembly version in the .asm file. The C version is just for testing purposes.

Edit: in fact, I call the asm version from C in different ways (for easier debugging). And some of the C code I haven't converted to assembly yet because it's not very performance critical atm (like drawing the menu once or clearing the render part once).

Ed Minchau
Posts: 503
Joined: Sat Jul 11, 2020 3:30 pm


Post by Ed Minchau »




Yeah, that makes sense. I'll keep digging. BTW I generated the lookup tables for interpolation last night; that was the easy part. I should have the code done in a day or so.

Ed Minchau
Posts: 503
Joined: Sat Jul 11, 2020 3:30 pm


Post by Ed Minchau »



On 3/28/2021 at 12:06 AM, Ed Minchau said:




Anyhow, I'll generate the data tables and subroutines needed for the interpolation and will post them here soon.



OK, so I got the data tables generated; this will all go in ray.h 






// interpolation tables

//

// first the ray to try; the first 4 are always cast



extern i16 _tryray[] = {

0,256,288,304,128,64,192,32,96,160,224,16,48,80,112,144,

176,208,240,272,8,24,40,56,72,88,104,120,136,152,168,184,

200,216,232,248,264,280,296,4,12,20,28,36,44,52,60,68,

76,84,92,100,108,116,124,132,140,148,156,164,172,180,188,196,

204,212,220,228,236,244,252,260,268,276,284,292,300,2,6,10,

14,18,22,26,30,34,38,42,46,50,54,58,62,66,70,74,

78,82,86,90,94,98,102,106,110,114,118,122,126,130,134,138,

142,146,150,154,158,162,166,170,174,178,182,186,190,194,198,202,

206,210,214,218,222,226,230,234,238,242,246,250,254,258,262,266,

270,274,278,282,286,290,294,298,302,1,3,5,7,9,11,13,

15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,

47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,

79,81,83,85,87,89,91,93,95,97,99,101,103,105,107,109,

111,113,115,117,119,121,123,125,127,129,131,133,135,137,139,141,

143,145,147,149,151,153,155,157,159,161,163,165,167,169,171,173,

175,177,179,181,183,185,187,189,191,193,195,197,199,201,203,205,

207,209,211,213,215,217,219,221,223,225,227,229,231,233,235,237,

239,241,243,245,247,249,251,253,255,257,259,261,263,265,267,269,

271,273,275,277,279,281,283,285,287,289,291,293,295,297,299,301,303

};





// the ray previously calculated to the left of the ray being tried



extern i16 _leftray[] = {

32767,32767,32767,32767,0,0,128,0,64,128,192,0,32,64,96,128,

160,192,224,256,0,16,32,48,64,80,96,112,128,144,160,176,

192,208,224,240,256,272,288,0,8,16,24,32,40,48,56,64,

72,80,88,96,104,112,120,128,136,144,152,160,168,176,184,192,

200,208,216,224,232,240,248,256,264,272,280,288,296,0,4,8,

12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,

76,80,84,88,92,96,100,104,108,112,116,120,124,128,132,136,

140,144,148,152,156,160,164,168,172,176,180,184,188,192,196,200,

204,208,212,216,220,224,228,232,236,240,244,248,252,256,260,264,

268,272,276,280,284,288,292,296,300,0,2,4,6,8,10,12,

14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,

46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,

78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,

110,112,114,116,118,120,122,124,126,128,130,132,134,136,138,140,

142,144,146,148,150,152,154,156,158,160,162,164,166,168,170,172,

174,176,178,180,182,184,186,188,190,192,194,196,198,200,202,204,

206,208,210,212,214,216,218,220,222,224,226,228,230,232,234,236,

238,240,242,244,246,248,250,252,254,256,258,260,262,264,266,268,

270,272,274,276,278,280,282,284,286,288,290,292,294,296,298,300,302

};





// the ray previously calculated to the right of the ray being tried



extern i16 _rightray[] = {

32767,32767,32767,32767,256,128,256,64,128,192,256,32,64,96,128,160,

192,224,256,288,16,32,48,64,80,96,112,128,144,160,176,192,

208,224,240,256,272,288,304,8,16,24,32,40,48,56,64,72,

80,88,96,104,112,120,128,136,144,152,160,168,176,184,192,200,

208,216,224,232,240,248,256,264,272,280,288,296,304,4,8,12,

16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76,

80,84,88,92,96,100,104,108,112,116,120,124,128,132,136,140,

144,148,152,156,160,164,168,172,176,180,184,188,192,196,200,204,

208,212,216,220,224,228,232,236,240,244,248,252,256,260,264,268,

272,276,280,284,288,292,296,300,304,2,4,6,8,10,12,14,

16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,

48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,

80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,

112,114,116,118,120,122,124,126,128,130,132,134,136,138,140,142,

144,146,148,150,152,154,156,158,160,162,164,166,168,170,172,174,

176,178,180,182,184,186,188,190,192,194,196,198,200,202,204,206,

208,210,212,214,216,218,220,222,224,226,228,230,232,234,236,238,

240,242,244,246,248,250,252,254,256,258,260,262,264,266,268,270,

272,274,276,278,280,282,284,286,288,290,292,294,296,298,300,302,304

};





// if the above two rays are on the same map block and face, then this table
// is the number of rays to interpolate + 1; it is also rightray minus leftray.
// The special value 1 indicates 255 rays (rightray minus leftray is 256),
// and 0 means no interpolation.  This table is also the starting point for
// the interfrac table.  If you are interpolating 127 rays, you start at
// position 128 on the interfrac table; if you are interpolating 31 rays you
// start at position 32 on the interfrac table



extern i16 _interpolnum[] = {

0,0,0,0,1,128,128,64,64,64,64,32,32,32,32,32,

32,32,32,32,16,16,16,16,16,16,16,16,16,16,16,16,

16,16,16,16,16,16,16,8,8,8,8,8,8,8,8,8,

8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,

8,8,8,8,8,8,8,8,8,8,8,8,8,4,4,4,

4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,

4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,

4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,

4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,

4,4,4,4,4,4,4,4,4,2,2,2,2,2,2,2,

2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,

2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,

2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,

2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,

2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,

2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,

2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,

2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,

2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2

};



// the different interpolation routines use these fractions; there are

// actually seven tables of fractions in this one page of RAM

// Note that if you are only interpolating one ray (indicated by a 2 in

// the interpolnum table) then you don't need to use this fraction table,

// as the results will just be the average of the leftray and rightray

// parameters.  If you're interpolating 255 values you also don't need 

// this fraction table, as the column number itself would be the fraction





extern u8 _interfrac[]={

0,0,0,128,0,64,128,192,0,32,64,96,128,160,192,224,

0,16,32,48,64,80,96,112,128,144,160,176,192,208,224,240,

0,8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,

128,136,144,152,160,168,176,184,192,200,208,216,224,232,240,248,

0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,

64,68,72,76,80,84,88,92,96,100,104,108,112,116,120,124,

128,132,136,140,144,148,152,156,160,164,168,172,176,180,184,188,

192,196,200,204,208,212,216,220,224,228,232,236,240,244,248,252,

0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,

32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,

64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,

96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126,

128,130,132,134,136,138,140,142,144,146,148,150,152,154,156,158,

160,162,164,166,168,170,172,174,176,178,180,182,184,186,188,190,

192,194,196,198,200,202,204,206,208,210,212,214,216,218,220,222,

224,226,228,230,232,234,236,238,240,242,244,246,248,250,252,254

};
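
For anyone who would rather regenerate the four ray tables than paste them, the pattern in them looks like a straightforward bisection: cast rays 0, 256, 288 and 304 first, then keep halving the gap between already-cast neighbours. The sketch below is a host-side reconstruction of that pattern, not the program actually used to make the tables; it uses `int16_t` in place of the project's `i16`, and the names without the leading underscore are stand-ins.

```c
#include <assert.h>
#include <stdint.h>

#define LAST_RAY  304      /* rays are numbered 0..304 */
#define NUM_RAYS  305
#define NO_RAY    32767    /* "no neighbour" marker, as in the posted tables */

int16_t tryray[NUM_RAYS], leftray[NUM_RAYS], rightray[NUM_RAYS], interpolnum[NUM_RAYS];

/* Fills the four tables in casting order and returns the entry count. */
int build_ray_tables(void) {
    static const int16_t always_cast[4] = {0, 256, 288, 304};
    int n = 0;

    /* The first four rays are always cast and have no neighbours yet. */
    for (int i = 0; i < 4; i++) {
        tryray[n] = always_cast[i];
        leftray[n] = rightray[n] = NO_RAY;
        interpolnum[n] = 0;
        n++;
    }

    /* Bisect: the spacing between cast neighbours halves on every pass. */
    for (int span = 256; span >= 2; span /= 2) {
        int half = span / 2;
        for (int ray = half; ray + half <= LAST_RAY; ray += span) {
            assert(n < NUM_RAYS);
            tryray[n]   = (int16_t)ray;
            leftray[n]  = (int16_t)(ray - half);
            rightray[n] = (int16_t)(ray + half);
            /* rightray - leftray, except that 256 is stored as 1 */
            interpolnum[n] = (int16_t)(span == 256 ? 1 : span);
            n++;
        }
    }
    return n;
}
```

Calling `build_ray_tables()` once fills all 305 entries; the results can then be dumped as C initialisers or compared against the listing above.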


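The fraction page follows a simple rule as well: for each power-of-two span n from 2 to 128, entries n through 2n-1 hold k*256/n. The sketch below rebuilds the page and shows one way the fractions might be applied. The `interpolate` helper is hypothetical (the actual interpolation routines haven't been posted yet), and `interfrac` without the underscore is a stand-in name.

```c
#include <assert.h>
#include <stdint.h>

uint8_t interfrac[256];

/* Rebuilds the fraction page: seven sub-tables, one per span n = 2..128,
   each starting at index n and holding k*256/n for k = 0..n-1. */
void build_interfrac(void) {
    interfrac[0] = interfrac[1] = 0;          /* unused slots */
    for (int n = 2; n <= 128; n *= 2)
        for (int k = 0; k < n; k++)
            interfrac[n + k] = (uint8_t)(k * 256 / n);
}

/* Hypothetical helper: linear blend between the value at the left ray (a)
   and the value at the right ray (b), for the k-th ray inside a span of n. */
int16_t interpolate(int16_t a, int16_t b, int n, int k) {
    assert(n >= 2 && n <= 128 && k >= 0 && k < n);
    int frac = interfrac[n + k];              /* 0..255, meaning frac/256 */
    return (int16_t)(a + ((b - a) * frac) / 256);
}
```

With n = 2 and k = 1 the fraction is 128, so the result is just the midpoint of the two neighbours, which is why the single-ray case doesn't need the table at all.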

Jeffrey
Posts: 62
Joined: Fri Feb 19, 2021 9:46 am


Post by Jeffrey »


Just a little personal update: I have been very busy lately IRL. But I will be returning to this project.

Also, the last few days/weeks I have been working on a (completely) new demo. And I am very excited about it :).

Let's just say that the X16 is much more capable than I had thought...

Rob
Posts: 3
Joined: Tue Feb 22, 2022 6:42 pm


Post by Rob »


The Vera can scale its video output:



Using only 70% of the original resolution (226 x 140) still looks decent while cutting the number of pixels to update by half (if 320 x 200 was targeted).

How much would the frame rate increase?

Eliminates the need for drawing borders around the first person view.



It should be easy to implement and verify? I'd like to see it!



HSCALE (register $9F2A): $2C

VSCALE (register $9F2B): $2C

svenvandevelde
Posts: 488
Joined: Wed Dec 23, 2020 6:30 am
Location: Belgium, Antwerpen


Post by svenvandevelde »




Very smart remark.

Rob
Posts: 3
Joined: Tue Feb 22, 2022 6:42 pm


Post by Rob »




Thanks.

The real intelligence goes into optimizing the algorithm. This is just a cheat.

Ed Minchau
Posts: 503
Joined: Sat Jul 11, 2020 3:30 pm


Post by Ed Minchau »




Good idea, but there is a drawback.  When you use a VSCALE or HSCALE that is anything other than $80, VERA will still make the resultant image 640x480, so your "pixels" are actually more than one pixel wide or high. When V/HSCALE are $40 (i.e. 320x240) or $20 (i.e. 160x120) that isn't a problem; all the "pixels" are just 2x2 or 4x4, respectively.

But for a scaling factor that isn't a power of two, VERA has to make variable-size pixels. I'm using $33 for Asteroid Commander, giving me a resolution of 255x192. VERA handles this by making about half the "pixels" 3 pixels wide, alternating with 2 pixels wide, and the same for the height. So my pixels are either 2x2, or 2x3, or 3x2, or 3x3. With a value of $2C, you'd get a resolution of 220x165, and 200 of your columns would be 3 pixels wide, the other 20 only 2; similarly 150 rows would be 3 pixels tall, the other 15 only 2. Basically every 11th row and column is smaller.

A huge advantage of using $33 or below is that you only need one byte for a column index. That simplifies and speeds up a lot of calculations. 
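
The column widths are easy to check on a PC. The sketch below assumes VERA shows source column floor(x * HSCALE / 128) at output pixel x; that follows from $80 meaning 1:1 and $40 meaning 2:1, but the exact rounding is a guess about the hardware, and `scale_stats` and `ScaleStats` are names made up for this example.

```c
#include <assert.h>

#define OUTPUT_WIDTH 640   /* VERA's fixed output width in pixels */
#define MAX_WIDTH    16    /* widest source column this sketch keeps a count for */

typedef struct {
    int columns;               /* distinct source columns that reach the screen */
    int count[MAX_WIDTH + 1];  /* count[w] = source columns drawn w output pixels wide */
} ScaleStats;

/* Walks the 640 output pixels and measures how wide each source column ends up. */
ScaleStats scale_stats(int hscale) {
    ScaleStats s = {0};
    long prev = -1;
    int run = 0;

    assert(hscale > 0 && hscale <= 255);
    for (int x = 0; x < OUTPUT_WIDTH; x++) {
        long src = ((long)x * hscale) / 128;   /* assumed mapping, see above */
        if (src != prev) {
            if (run > 0 && run <= MAX_WIDTH) s.count[run]++;
            s.columns++;
            prev = src;
            run = 0;
        }
        run++;
    }
    if (run > 0 && run <= MAX_WIDTH) s.count[run]++;   /* close the last column */
    return s;
}
```

With that mapping, `scale_stats(0x2C)` reports 220 columns, 200 of them 3 pixels wide and 20 of them 2 wide, and `scale_stats(0x33)` reports 255 columns split 130 and 125. The same function works for VSCALE if you swap 640 for 480.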

 

Rob
Posts: 3
Joined: Tue Feb 22, 2022 6:42 pm


Post by Rob »




So, in short, unless you're using either the native resolution or a power-of-two scaling factor, you will end up with odd-sized pixels that won't look right unless those odd-sized pixels are more evenly dispersed.

I think I'd be okay with a border-less lower resolution option to experience an even smoother frame rate.



I got to this thread while thinking of a PETSCII Wolfenstein engine, so seeing this engine working on even 160 x 120 would still be glorious.
