Hey, folks. I haven't had time to play with X16 stuff the last few months for a variety of reasons. Better late than never, I am finally getting around to trying my hand at optimizing this neat little demo. I have Matt's original version running now to see exactly how long it takes for a full run on R38. To benchmark it I added two lines:
.... the very FIRST line of the listing....
[code]1 TI$="000000":REM [-......-] [/code]
...and the very last line to be executed after all the plotting is done...
[code]500 A$=TI$+"!":FOR I =1TO6:B$=MID$(A$,I,1):POKE$815+I,ASC(B$):NEXT[/code]
What's going on here is that the added code resets the TI$ system variable at the very start, leaving some specifically formatted space in a REM statement for later. Then, when the plotting is all done, it grabs the current value of TI$, parses it, and pokes the ASC values of the characters representing elapsed time right into the BASIC listing at the location reserved in that REM statement.
So, the idea is that you run the program, and when it's done you type "RESET" and enter (or do a "Ctrl-R" on the emulator) to reset the X16 and get the regular text screen back; then you issue an "OLD" command to restore the BASIC listing. After that, when you list the program you will see the elapsed time of the just-completed run coded right into that REM statement at the beginning.
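For anyone wondering why the POKE target is $815+I, here is a rough Python model of how line 1 might sit in memory. It is only a sketch under assumptions that are mine, not taken from the listing: BASIC text starting at $0801, the usual Commodore BASIC V2 token values (REM = $8F, "=" = $B2), and no extra spaces stored before the REM. The function names are made up for illustration.

```python
# Toy model of line 1 in memory. Assumptions (not from the original post):
# BASIC text starts at $0801, REM tokenizes to $8F and "=" to $B2.
BASIC_START = 0x0801
TOKEN_EQ = 0xB2
TOKEN_REM = 0x8F


def tokenize_line_1():
    """Tokenized body of: 1 TI$="000000":REM [-......-]"""
    body = bytearray()
    body += b"TI$"
    body.append(TOKEN_EQ)
    body += b'"000000":'
    body.append(TOKEN_REM)
    body += b" [-......-]"
    body.append(0x00)  # end-of-line marker
    return body


def build_memory():
    """Lay out link pointer, line number and body starting at BASIC_START."""
    body = tokenize_line_1()
    next_line = BASIC_START + 4 + len(body)
    header = [next_line & 0xFF, next_line >> 8, 1, 0]  # link, then line number 1
    mem = {}
    for offset, value in enumerate(header + list(body)):
        mem[BASIC_START + offset] = value
    return mem


def poke_time(mem, ti):
    """Mimic line 500: POKE $815+I with the I-th character of TI$, I = 1..6."""
    for i in range(1, 7):
        mem[0x0815 + i] = ord(ti[i - 1])


def rem_text(mem):
    """Read back the ten characters of the bracketed placeholder."""
    return "".join(chr(mem[a]) for a in range(0x0814, 0x081E))


mem = build_memory()
poke_time(mem, "150849")
print(rem_text(mem))
```

Under those assumptions the six dots land at $0816 through $081B, which is exactly $815+1 through $815+6. If the first line were typed any differently (extra spaces, a different line number length does not matter, but extra characters before the REM would), the offset would have to move to match.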
Anyway, Matt has already made a post listing the obvious stuff in terms of optimization. As he suggests, I'll be changing the output stage to use regular POKEs to the VERA data port to take advantage of its auto-increment ability. That avoids all the math for calculating offsets, as well as the branches that figure out where things are in the image to select the correct VPOKE bank, etc. That's a big time savings since the output stage runs once per pixel on a 320x240 image (76,800 pixels put on the screen). Also, if I remember correctly, some testing I did last year showed that the regular POKE routine in BASIC takes a bit less time to execute on average than VPOKE.
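To make the difference concrete, here is a small Python model of the two output styles. It is a sketch only: it assumes one byte per pixel with the bitmap starting at video address 0, and the `DataPort` class is a toy stand-in I made up, not the real VERA register interface.

```python
WIDTH, HEIGHT = 320, 240


def vpoke_target(x, y):
    """Per-pixel style: work out (bank, offset) from scratch for every pixel."""
    linear = y * WIDTH + x
    return linear >> 16, linear & 0xFFFF


class DataPort:
    """Hypothetical auto-incrementing data port (illustration only)."""

    def __init__(self, size):
        self.vram = bytearray(size)
        self.addr = 0

    def set_address(self, addr):
        self.addr = addr

    def write(self, value):
        # Each write stores one byte and then steps the address by one.
        self.vram[self.addr] = value & 0xFF
        self.addr += 1


def plot_all(port, color_of):
    """Sequential style: set the address once, then just keep writing."""
    port.set_address(0)
    for y in range(HEIGHT):
        for x in range(WIDTH):
            port.write(color_of(x, y))


# The image is bigger than 64K, so the per-pixel style has to switch banks
# partway through row 204 (204 * 320 + 256 = 65536).
print(vpoke_target(255, 204), vpoke_target(256, 204))
```

In the sequential version there is no multiply, no bank test and no offset math inside the loop at all, which is the whole appeal when the loop body runs 76,800 times in interpreted BASIC.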
The biggest time savings will come from really chewing on the innermost loop that iterates over and over to arrive at the color for each pixel, as well as the mid-level loop that plots each pixel in a row using the results of that innermost loop for the respective pixel. As Matt points out, the innermost loop iterates at least 100 times per pixel, and up to 355 times. Aside from avoiding duplication of the x*x and y*y operations, I see some other interesting possibilities.
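As an illustration of the x*x and y*y point, here is a generic escape-time loop in Python that computes each square once per pass and reuses it. This is not Matt's code and the variable names are mine; it is just the textbook form of the idea, assuming the usual escape test against 4.

```python
def escape_iterations(cx, cy, max_iter):
    """Generic escape-time loop with the squares cached between uses."""
    x = y = 0.0
    xx = yy = 0.0  # cached x*x and y*y
    for i in range(max_iter):
        # The cached squares serve the escape test...
        if xx + yy > 4.0:
            return i
        # ...and the update of x, so neither is multiplied out twice.
        y = 2.0 * x * y + cy
        x = xx - yy + cx
        xx = x * x
        yy = y * y
    return max_iter


print(escape_iterations(0.0, 0.0, 50), escape_iterations(2.0, 2.0, 50))
```

In BASIC the same trick means two extra variables, but it saves two floating-point multiplications on every pass through the loop, and that loop is where nearly all the time goes.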
The goal has to be to cut the number of BASIC operations as far as possible. There's probably not much that can be done with the math. It is what it is. Yeah, I'll optimize the order of variable initialization and knock everything down to single-character variable names where possible. But at least from my first glance, it seems the heavy lifting will be wonky C64 BASIC stuff related to the innermost loops, rather than any sort of math shortcuts.
For now, I think I'll test my changes by running two limited ranges of output, say perhaps the first 30 rows and 30 rows in the middle starting at row 100. Something like that.
I'll let everyone know how it goes.
EDITED: OK, the original code takes 15 hours, 8 minutes and 49 seconds to do a full run on R38. Wowzers.
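For a sense of scale, here is the arithmetic on that result as a few lines of Python. It assumes the HHMMSS layout of TI$ and uses the 320x240 pixel count from above; the function name is just for illustration.

```python
def ti_to_seconds(ti):
    """Convert a TI$-style "HHMMSS" string to a number of seconds."""
    hours, minutes, seconds = int(ti[0:2]), int(ti[2:4]), int(ti[4:6])
    return hours * 3600 + minutes * 60 + seconds


TOTAL_PIXELS = 320 * 240            # 76,800 pixels
elapsed = ti_to_seconds("150849")   # 15 h 8 min 49 s -> 54529 seconds
per_pixel = elapsed / TOTAL_PIXELS  # about 0.71 seconds per pixel

print(elapsed, round(per_pixel, 2))
```

So the original version averages roughly seven tenths of a second per pixel, which gives a baseline to compare any optimized run against.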
This is less time than you might come up with by just running a small section and extrapolating. The reason is that while the innermost loop, which accounts for most of the execution time, is a FOR/NEXT set up to run from 0 to 355, in many (most?) cases it never reaches the end of the loop. For example, any pixels that appear in the original C64 colors and grey-scale ranges at the very beginning of the VERA color palette are falling out of the loop very early in its possible range of iterations. We know from Matt's write-up that it always takes at least 100 iterations, but those colors occur when the plotting threshold is met within 32 additional iterations after the first 100 are done. So on average that loop is not going to make it all the way to 355 most of the time.
All that said, I'm going to do my optimization write-up at the end of my original BASIC optimizing thread in the "HOWTOs" section to avoid mucking up this thread with things that aren't necessarily questions for the original author. The work on this program will start a couple of posts down on page 2 of the thread, which can be found here:
https://www.commanderx16.com/forum/index.php?/topic/1488-basic-convertingoptimizing-a-simple-basic-program-from-another-commodore-platform-to-the-x16/page/2/#comments