5 hours ago, Kilian Hekhuis said:
I can imagine having to almost duplicate the sprite rendering engine (if you could call it that) for the sprite collision detection, but if that allows processing in a seperate thread, I'm all for it :).
Hrm, I'm not sure if that would help. Something I'm not sure I conveyed well is that the sprite collision on the VERA is determined by whether or not it tried to draw two or more sprites over the same pixel. So the work units *really* matter, all the way through.
I think there's something, though, to be mined from the approach of precalculating the work units per line for each sprite, so that we have less to calculate on-the-fly later on and can at least zip through the 128 sprites a little more quickly with faster ops when we're confident that there's enough work units to get through the entire sprite.
Another thought is that it may be useful to cache precalculated data about sprites, so that we don't have to pay the cost of re-calculating sprite info for looping sprite animations. In my most recent branch for working on performance improvements, I'm doing a lot of backbuffering of layer data, and then caching the backbuffers and layer settings based on a signature for the layer, so this is a similar idea to that. My video_layer_properties caching is pretty naive, though... it's just a linked list of up to 16 elements, but it gets the job done and removes the performance hit of switching between layer settings after the cache has been warmed up. I think a similar approach would be useful for sprites, but I anticipate needing something more like a hash table for storing hundreds of video_sprite_properties structs. There's also an enormous caveat because it's very easy to have to invalidate most, if not all, of the sprite cache when performing a write to VRAM that isn't directly to sprite properties, because otherwise we have to loop through every entry in the sprite cache and individually poke assets. There's gotta be a better way, and I'm wondering if there's a "split the difference" approach that might be possible by doing some precalculation over the entirety of VRAM, but for very few unique combinations of sprite properties.
This is currently where my thought processes are at.