During the development of the 2R demo remake, MooingLemur and I shared quite a few screenshots and videos of in-between results with each other. We had quite some discussions about them: they show partially working or buggy demo parts, (partially) reverse-engineered simulations, and so on.
I thought others might be interested if we shared some of that and gave a bit of explanation and context. In this forum thread I would like to start doing that. Note that I know most about the demo parts that I implemented, so I will probably limit myself to those. Maybe MooingLemur can comment a bit on the other parts, but I cannot speak for him on this (it's a bit time-consuming to explain this highly technical stuff).
General info
Here is some basic information about the original demo and the remake for the X16:
On September 12, 2023 MooingLemur sent me a cryptic hint (a music file) showing this:
He asked me if I were interested in creating a secret demo: a remake of Second Reality! I was intrigued, but also sceptical that it would be possible to recreate the entire demo on the X16. In fact this was my initial reaction:
He argued that the new VERA FX upgrade would be particularly well suited to this part:
The next day I had created a PoC Rotazoomer, and from that point on we basically started working on the demo parts.
Demo parts
Here is a (partial) list of the demo parts. For the ones that I know enough about I will create a post in this thread to show how it was recreated using screenshots/videos and explanation. Whenever I add a new post, I will add the link to it as well.
Sidenote: the original "Script" can be found here (in Finnish)
Music and integration
MooingLemur has done all the work of implementing the music and sound effects. This was an incredible amount of work. I cannot comment on it much beyond what has already been told. I do know that quite a few demo parts had to be synced with the music.
MooingLemur also did all the integration of all the demo parts, the packing, loading of resources etc.
---
( I will be editing this post when more parts are added )
Last edited by Jeffrey on Sun Nov 17, 2024 6:07 pm, edited 2 times in total.
The very first demo part I tried to implement was the Rotazoomer. The reason for this is quite simple: the FX affine helper natively supports translation, scaling and (with some generated tables) rotation as well. So this should be one of the easiest demo parts to replicate on the X16. Note that this would be very slow without the FX update.
Initial investigation
When looking in the original source folder that implements the Rotazoomer (as well as the LENS part) we can see images in .LBM format. After converting them we can see the original image used (minus the forehead symbol) as well as two alternative images which they didn't use.
These images are all 320x200px. This is the source image resolution used in the Rotazoomer-part. However the original Rotazoomer-part actually used an effective 160x100px screen resolution (no doubt due to performance reasons).
Fitting the image into a FX tilemap
Since we want to rotate this whole image, it has to fit into an FX tilemap. An FX tilemap consists of 8x8px tiles and has a maximum of 256 unique tiles (unlike the normal VERA tilemaps, which can have 16x16px tiles and 1024 unique ones, and whose tiles can be flipped vertically and horizontally).
The image is actually padded into a square and then repeated in the original demo. When trying to fit the image (as a square) into an FX tilemap (and trying to have the fewest unique tiles) it becomes clear we can't keep the original 320x200 source image resolution: that would amount to too many unique 8x8px tiles. However, the screen resolution in the original is 160x100, so this does not hurt much (it is only noticeable when zooming in).
A pretty good fit is a 16x16 tilemap. Since FX maps are restricted to certain sizes, we have to choose a 32x32 map, basically replicating the original image 4 times in the map, like so:
Note: tiles shown with a purple background are not completely black (and are unique). Tiles with a black background are completely black (and can be re-used). Also note that tilemap assembly is done by this code.
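To make the tile counting concrete, here is a minimal Python sketch of how an image can be split into 8x8px tiles, deduplicated, and repeated into a 32x32 map. This is not the actual tilemap assembly code linked above: the function names (build_tilemap, repeat_map) and the test image are made up for illustration, and the image is assumed to be a square 2D list of palette indices.

```python
def build_tilemap(pixels, tile_size=8):
    """Split a 2D list of palette indices into tiles and deduplicate them.

    Returns (tiles, tile_map): `tiles` holds each unique tile once,
    `tile_map` is a 2D list of indices into `tiles`.
    """
    height = len(pixels)
    width = len(pixels[0])
    tiles = []
    index_of = {}  # tile contents -> index in `tiles`
    tile_map = []
    for ty in range(0, height, tile_size):
        map_row = []
        for tx in range(0, width, tile_size):
            tile = tuple(
                tuple(pixels[y][tx:tx + tile_size])
                for y in range(ty, ty + tile_size)
            )
            if tile not in index_of:
                index_of[tile] = len(tiles)
                tiles.append(tile)
            map_row.append(index_of[tile])
        tile_map.append(map_row)
    return tiles, tile_map


def repeat_map(tile_map, times=2):
    """Repeat a tile map `times` times horizontally and vertically."""
    return [list(row) * times for _ in range(times) for row in tile_map]


# Hypothetical 128x128px image (16x16 tiles): black with one non-black pixel.
image = [[0] * 128 for _ in range(128)]
image[3][5] = 7

tiles, small_map = build_tilemap(image)
big_map = repeat_map(small_map)  # 32x32 map containing the image 4 times
```

With a real image, len(tiles) is the number that has to stay at or below 256 for the map to be usable as an FX tilemap.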
Possible improvement
Using a higher resolution source image would make the zoomed-in parts a bit better. An attempt was made to fit twice as many pixels into the map. Left is a 160x100px source image, right is a 160x200px source image:
The right image looks a bit better. Note that its FX tiles are also 8x8px, but for better comparison I have shrunk the image vertically, showing the better vertical resolution. The problem is that this leads to 259 unique tiles (which is too many). This can be reduced by moving a few tiles by 1 pixel so that a few can be reused, leading to fewer than 256 unique tiles. But since the improvement was not very significant (on a 160x100 screen) it was not implemented.
This was nice, but a few things weren't quite right:
The performance isn't quite good enough: there is shearing visible. It's also not vsynced.
It's only rotating. It's not scaling and also not translating (aka "moving" in x and y direction).
Much later I addressed these issues and solved them. Below is a description of the implementation of this solution.
Inner loop and performance
At the very core of the affine helper algorithms is the sampling of pixels inside an FX tilemap and the storing of those pixels inside a screen buffer. This is also the case for the rotazoomer. Below is a depiction of how 4 pixels are drawn to the screen buffer: each source pixel is first read ( 4 x lda DATA1 ) into the 32-bit FX cache and then those 4 pixels are stored ( sta DATA0 ) inside the screen buffer.
As can be seen there are 5 opcodes needed to draw 4 pixels to the screen:
lda DATA1 (4 cpu cycles)
lda DATA1 (4 cpu cycles)
lda DATA1 (4 cpu cycles)
lda DATA1 (4 cpu cycles)
sta DATA0 (4 cpu cycles)
This means that it takes 20 cpu cycles to draw 4 pixels on screen. Or on average 5 cpu cycles per pixel.
As discussed above we chose to limit the screen resolution to 160x100px (as done by the original). To draw one row of 160 pixels we need to repeat the above code 40 times. In order to speed up even more, we created an unrolled loop of those 40 iterations which is called "COPY_ROW_CODE" and we generate this unrolled code here.
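As an illustration of what such generated code looks like, here is a small Python sketch that builds the machine code bytes for one unrolled row. The real demo generates this code on the X16 itself (see the link above), so this is only a model of the idea. The opcodes are the standard 6502 absolute-addressing LDA/STA, and DATA0/DATA1 are the VERA data ports at $9F23/$9F24. Ending the routine with an RTS, and the name generate_copy_row_code, are my assumptions here.

```python
LDA_ABS = 0xAD  # 6502 opcode: LDA absolute (4 cycles)
STA_ABS = 0x8D  # 6502 opcode: STA absolute (4 cycles)
RTS = 0x60      # 6502 opcode: RTS (assumed terminator of the routine)

VERA_DATA0 = 0x9F23  # VERA data port 0: writes go to the screen buffer
VERA_DATA1 = 0x9F24  # VERA data port 1: reads sample the FX tilemap


def absolute(opcode, address):
    """Encode an absolute-addressed instruction (little-endian operand)."""
    return bytes([opcode, address & 0xFF, address >> 8])


def generate_copy_row_code(pixels_per_row=160):
    """Build the unrolled code that copies one row, 4 pixels at a time."""
    code = bytearray()
    for _ in range(pixels_per_row // 4):
        code += absolute(LDA_ABS, VERA_DATA1) * 4  # fill the 32-bit cache
        code += absolute(STA_ABS, VERA_DATA0)      # write 4 pixels at once
    code.append(RTS)
    return bytes(code)


copy_row_code = generate_copy_row_code()
```

For a 160 pixel row this comes to 40 groups of 5 instructions (15 bytes each) plus the final RTS, so the unrolled routine is small enough to keep in RAM.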
For each row only a few FX position and increment settings have to be reset, so there is very little overhead per row. This effectively means we spend about 160 * 100 * 5 = 80000 cpu cycles per frame. Since a frame has around 133333 cpu cycles available, our rotazoomer can easily run at 60 fps. We only needed to add a vsync. And since we are faster than the "beam" we don't even need a double buffer.
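The cycle budget can be double-checked with a few lines of Python. The 8 MHz CPU clock and the exact 60 Hz frame rate are assumptions on my part (they match the "around 133333" figure), and the small per-row overhead is ignored.

```python
CYCLES_PER_LDA = 4  # lda DATA1
CYCLES_PER_STA = 4  # sta DATA0

# 4 reads plus 1 write draw 4 pixels.
cycles_per_4_pixels = 4 * CYCLES_PER_LDA + CYCLES_PER_STA
cycles_per_pixel = cycles_per_4_pixels / 4

SCREEN_WIDTH = 160
SCREEN_HEIGHT = 100
cycles_per_frame = int(SCREEN_WIDTH * SCREEN_HEIGHT * cycles_per_pixel)

CPU_HZ = 8_000_000  # assumption: X16 running at 8 MHz
FPS = 60            # assumption: exactly 60 frames per second
cycle_budget = CPU_HZ // FPS

# Fraction of each frame spent in the inner loop.
frame_load = cycles_per_frame / cycle_budget
```

This puts the inner loop at roughly 60% of a frame, which leaves room for the per-row setup and the vsync wait.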
Rotation, scaling and translation
We still needed to replicate the results of the original demo, by rotating, scaling and translating the same way. This was tricky.
After lots of deciphering/reading code it turned out that inside MAIN.C there is a function called "part3" which deals with the rotation, scaling and translation of the image. This code was then rewritten as python code to replicate the results. This translation, rotation and scaling was then converted to settings for the affine helper and put into an "animation"-table.
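To give an idea of what those affine helper settings boil down to, here is a floating-point Python sketch of a software rotazoomer. It is not the Python code mentioned above and it does not use VERA's fixed-point register format. The function name, the parameters and the test texture are all hypothetical. The idea is that each frame needs one (dx, dy) increment per pixel and one start position per row, chosen so that the middle of the screen lands on a given origin point in the (wrapping) source texture.

```python
import math


def rotazoom_frame(texture, angle, scale, center_x, center_y,
                   width=160, height=100):
    """Sample a square, wrapping texture with rotation, scaling, translation.

    (center_x, center_y) is the texture position shown at the screen centre.
    """
    size = len(texture)
    # Step through the texture for each pixel moved to the right on screen.
    dx = math.cos(angle) * scale
    dy = math.sin(angle) * scale
    frame = []
    for y in range(height):
        v = y - height / 2
        # Start position of this row: go half a screen to the left of the
        # centre, then v rows down along the perpendicular direction.
        sx = center_x - (width / 2) * dx - v * dy
        sy = center_y - (width / 2) * dy + v * dx
        row = []
        for _ in range(width):
            row.append(texture[math.floor(sy) % size][math.floor(sx) % size])
            sx += dx
            sy += dy
        frame.append(row)
    return frame


# Hypothetical 256x256 texture in which every texel has a unique value.
texture = [[y * 256 + x for x in range(256)] for y in range(256)]

# No rotation, no scaling: the screen shows the top-left part of the texture.
plain = rotazoom_frame(texture, 0.0, 1.0, 80, 50)

# Rotated and zoomed out: the screen centre still shows the origin point.
turned = rotazoom_frame(texture, math.pi / 2, 2.0, 80.5, 50.5)
```

An "animation" table then amounts to a list of such per-frame values (increments and start position), converted to whatever number format the hardware expects.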
Here is my first attempt at replicating this:
When looking closely it was obvious it wasn't quite right: the background is "dancing around" to the point where you almost get dizzy. It turned out the origin point (in my Python math) wasn't set correctly. When fixed it looked like this:
This was the version I submitted to MooingLemur, who in turn integrated it with the rest of the demo parts. Some interpolation of frames was needed, fade-ins and fade-outs were added, and the demo part was synced with the music.