Page 1 of 1

KICK C and the 65C02 CPU - Optimize the size of your Array of Structures

Posted: Tue May 02, 2023 2:23 pm
by svenvandevelde
If you chose to use Arrays of Structures in your program then I would like to address with you a second topic (the first is here), which is why you may consider to optimize the size of your structure and align it with 2^n, in order to:
- Improve the overall performance of your code.
- Reduce the size of your code footprint.

Let's consider an example. Imagine a program with a structure that defines properties for a sprite, and then defines an array sprites of 64 elements of this structure in memory:

Code: Select all

struct sprite {
	unsigned int id;
	unsigned char type;
	unsigned int x;
	unsigned int y;  
};

struct sprite sprites[64];
The total size of the structure is 7 bytes, so the compiler will allocate 64*7 bytes in memory, which is a total of 448 bytes. The total size of the array is well above 256 bytes, so we need an unsigned integer to index the array.

Since the total size is larger than 256 bytes, the compiler cannot address your array using absolute indexed addressing mode. Instead, it needs to address your array using 2 zeropage memory addresses, applying indirect indexed addressing mode. For example lda ($04),y which clobbers the accumulator and the y register, and needs 2 zeropage memory addresses.

When you write code indexing your Array of Structures, the arithmetic the compiler needs to generate to position the zeropage offset to the right address, can become complex and long. It can result in code overhead.

Let us create some totally useless code, that uses the sprites array, indexes the array and retrieves values x and y from each sprite element, indexed by index, and moving the values into global variables x and y:

Code: Select all

unsigned int x;
unsigned int y;

for(unsigned int index=0; index<64; index++) {
	x = sprites[index].x;
	y = sprites[index].y;
}
Let us observe the assembler code the compiler generates as a result.

At first it prepares the zeropages used as the offset for the indirect indexed address mode.

Since the size of the structure sprite is 7 bytes, it needs to calculate the offset 7 * index, which requires code resulting in bitshifts and additions, doing the following: $zp = ((((index << 1) + index) << 1) + index). For example, if index == 1, then shifting index to the left makes index == 2, then adding 1 makes index == 3, shifting to the left makes index == 6, and adding the last 1 makes index == 7. Taking an other example with index == 5, the calculation would be 5 << 1 makes index == 10, adding 5 makes index == 15, shifting to the left makes index == 30, adding 5 makes index == 35 (which is 7*5). This generates the code below:

Retrieve index (an absolute address but we use a label here) and shift the accumulator to the left, and store in $02/$03, so we need to do this for both the low and the high byte of index:

Code: Select all

    lda.z index
    asl
    sta.z $02
    lda.z index+1
    rol
    sta.z $02+1
Add index to the result in $02/03:

Code: Select all

    clc
    lda.z $02
    adc.z index
    sta.z $02
    lda.z $02+1
    adc.z index+1
    sta.z $02+1
Shift the result in $02 to the left (the result is already in $02, so we can just shift $02):

Code: Select all

    asl.z $02
    rol.z $02+1
Add index to the result in $02/$03:

Code: Select all

    clc
    lda.z $02
    adc.z index
    sta.z $02
    lda.z $02+1
    adc.z i+1
    sta.z $02+1
Remember, we started from the index value, of which now the multiple of 7 is stored in $02/$03.
Now the compiler adds the offset of the memory location of the sprites array, stores this result in $02/$03.

Code: Select all

    lda.z $02
    clc
    adc #<sprites
    sta.z $02
    lda.z $02+1
    adc #>sprites
    sta.z $02+1
All the code above is just to calculate the index, it hasn't really moved anything yet!
So finally the compiler uses $02/$03 to load the unsigned integer of x from the sprite array element, and store it into the absolute memory location of the global variable x.

Code: Select all

    ldy #3 // Offset value x in the structure sprite
    lda ($02),y
    sta.z x
    iny
    lda ($02),y
    sta.z x+1
Same for global variable y, but this manouver is a little bit more complex, since 2 bytes need to be moved, so the stack is used to first load the low byte and store on the stack, then load the high byte and store the high byte of y, and then pop the stack and store the low byte of y.

Code: Select all

    ldy #5 // Offset value y in the structure sprite
    lda ($02),y
    pha
    iny
    lda ($02),y
    sta.z y+1
    pla
    sta.z y
Now come the thing … What if we align the size of structure sprite to 8 bytes instead of 7? Then only 3 bit shifts to the left would suffice to align the zeropage offset, right? And yes, this costs you 64 dummy bytes in memory, but the win is higher performance and less code (which can quickly add up and exceed your 64 bytes):

Code: Select all

struct sprite {
	unsigned int id;
	unsigned char type;
	unsigned int x;
	unsigned int y;
	unsigned char unused1;  
};
If we compile and observe the generated assembler for this structure, with exactly the same logic, we see that the code has become much more compact, there are less instructions required to calculate the offset:

Code: Select all

    lda.z index
    asl
    sta.z $02
    lda.z index+1
    rol
    sta.z $02+1
    asl.z $02
    rol.z $02+1
    asl.z $02
    rol.z $02+1
So the point I'm trying to make in this article is, that when coding with structures, be very careful about the size of your structures, and try to align the size to 2^n, so that the generated code to index your structure is optimal.

However, before you decide to use AoS, first consider if you can use SoA, a dynamic choice which was discussed here.

Hopefully this has been an interesting read and see you in the next article.

Sven

Re: Optimize the size of your Array of Structures

Posted: Tue May 02, 2023 4:20 pm
by desertfish
Thanks for the write up, it confirmed a lot of thinks I already thought were bad :)

Prog8 takes the easy way out here by limiting your options:
- arrays can never be larger than 256 bytes in memory
- arrays are limited to bytes, words and floating point numbers. (prog8 doesn't have structs. Yet.)

This allows for much much less code to calculate indexes, but it still happens, and for word and float arrays I'm contemplating on how to support something like what is often done in assembler: splitting the low and high bytes into separate arrays for each, so the generated code can use a single index value and use indexed addressing mode to access both bytes in one instruction each.

Re: Optimize the size of your Array of Structures

Posted: Tue May 02, 2023 4:29 pm
by svenvandevelde
desertfish wrote: Tue May 02, 2023 4:20 pm I'm contemplating on how to support something like what is often done in assembler: splitting the low and high bytes into separate arrays for each, so the generated code can use a single index value and use indexed addressing mode to access both bytes in one instruction each.
Thanks for reading my post and your reponse and insights. Your comment brings us back to the discussion on Structure of Arrays versus Arrays of Structures ...

Re: KICK C and the 65C02 CPU - Optimize the size of your Array of Structures

Posted: Tue May 02, 2023 6:29 pm
by Ender
Very cool! 8-) Although I'd suggest making the code blocks a different color, since that color is a bit hard to read with this forum's default theme.

Re: KICK C and the 65C02 CPU - Optimize the size of your Array of Structures

Posted: Tue May 02, 2023 9:18 pm
by svenvandevelde
Ender wrote: Tue May 02, 2023 6:29 pm Very cool! 8-) Although I'd suggest making the code blocks a different color, since that color is a bit hard to read with this forum's default theme.
Thank you for telling. Indeed I have configured in my profile the colour of the background to dark brown but didn't realise that applying light colors don't come down well using the default theme. Fixed the code blocks back to the default colors.