KICK C and the 65C02 CPU - Optimize the size of your Array of Structures
Posted: Tue May 02, 2023 2:23 pm
If you chose to use Arrays of Structures in your program then I would like to address with you a second topic (the first is here), which is why you may consider to optimize the size of your structure and align it with 2^n, in order to:
- Improve the overall performance of your code.
- Reduce the size of your code footprint.
Let's consider an example. Imagine a program with a structure that defines properties for a sprite, and then defines an array sprites of 64 elements of this structure in memory:
The total size of the structure is 7 bytes, so the compiler will allocate 64*7 bytes in memory, which is a total of 448 bytes. The total size of the array is well above 256 bytes, so we need an unsigned integer to index the array.
Since the total size is larger than 256 bytes, the compiler cannot address your array using absolute indexed addressing mode. Instead, it needs to address your array using 2 zeropage memory addresses, applying indirect indexed addressing mode. For example lda ($04),y which clobbers the accumulator and the y register, and needs 2 zeropage memory addresses.
When you write code indexing your Array of Structures, the arithmetic the compiler needs to generate to position the zeropage offset to the right address, can become complex and long. It can result in code overhead.
Let us create some totally useless code, that uses the sprites array, indexes the array and retrieves values x and y from each sprite element, indexed by index, and moving the values into global variables x and y:
Let us observe the assembler code the compiler generates as a result.
At first it prepares the zeropages used as the offset for the indirect indexed address mode.
Since the size of the structure sprite is 7 bytes, it needs to calculate the offset 7 * index, which requires code resulting in bitshifts and additions, doing the following: $zp = ((((index << 1) + index) << 1) + index). For example, if index == 1, then shifting index to the left makes index == 2, then adding 1 makes index == 3, shifting to the left makes index == 6, and adding the last 1 makes index == 7. Taking an other example with index == 5, the calculation would be 5 << 1 makes index == 10, adding 5 makes index == 15, shifting to the left makes index == 30, adding 5 makes index == 35 (which is 7*5). This generates the code below:
Retrieve index (an absolute address but we use a label here) and shift the accumulator to the left, and store in $02/$03, so we need to do this for both the low and the high byte of index:
Add index to the result in $02/03:
Shift the result in $02 to the left (the result is already in $02, so we can just shift $02):
Add index to the result in $02/$03:
Remember, we started from the index value, of which now the multiple of 7 is stored in $02/$03.
Now the compiler adds the offset of the memory location of the sprites array, stores this result in $02/$03.
All the code above is just to calculate the index, it hasn't really moved anything yet!
So finally the compiler uses $02/$03 to load the unsigned integer of x from the sprite array element, and store it into the absolute memory location of the global variable x.
Same for global variable y, but this manouver is a little bit more complex, since 2 bytes need to be moved, so the stack is used to first load the low byte and store on the stack, then load the high byte and store the high byte of y, and then pop the stack and store the low byte of y.
Now come the thing … What if we align the size of structure sprite to 8 bytes instead of 7? Then only 3 bit shifts to the left would suffice to align the zeropage offset, right? And yes, this costs you 64 dummy bytes in memory, but the win is higher performance and less code (which can quickly add up and exceed your 64 bytes):
If we compile and observe the generated assembler for this structure, with exactly the same logic, we see that the code has become much more compact, there are less instructions required to calculate the offset:
So the point I'm trying to make in this article is, that when coding with structures, be very careful about the size of your structures, and try to align the size to 2^n, so that the generated code to index your structure is optimal.
However, before you decide to use AoS, first consider if you can use SoA, a dynamic choice which was discussed here.
Hopefully this has been an interesting read and see you in the next article.
Sven
- Improve the overall performance of your code.
- Reduce the size of your code footprint.
Let's consider an example. Imagine a program with a structure that defines properties for a sprite, and then defines an array sprites of 64 elements of this structure in memory:
Code: Select all
struct sprite {
unsigned int id;
unsigned char type;
unsigned int x;
unsigned int y;
};
struct sprite sprites[64];
Since the total size is larger than 256 bytes, the compiler cannot address your array using absolute indexed addressing mode. Instead, it needs to address your array using 2 zeropage memory addresses, applying indirect indexed addressing mode. For example lda ($04),y which clobbers the accumulator and the y register, and needs 2 zeropage memory addresses.
When you write code indexing your Array of Structures, the arithmetic the compiler needs to generate to position the zeropage offset to the right address, can become complex and long. It can result in code overhead.
Let us create some totally useless code, that uses the sprites array, indexes the array and retrieves values x and y from each sprite element, indexed by index, and moving the values into global variables x and y:
Code: Select all
unsigned int x;
unsigned int y;
for(unsigned int index=0; index<64; index++) {
x = sprites[index].x;
y = sprites[index].y;
}
At first it prepares the zeropages used as the offset for the indirect indexed address mode.
Since the size of the structure sprite is 7 bytes, it needs to calculate the offset 7 * index, which requires code resulting in bitshifts and additions, doing the following: $zp = ((((index << 1) + index) << 1) + index). For example, if index == 1, then shifting index to the left makes index == 2, then adding 1 makes index == 3, shifting to the left makes index == 6, and adding the last 1 makes index == 7. Taking an other example with index == 5, the calculation would be 5 << 1 makes index == 10, adding 5 makes index == 15, shifting to the left makes index == 30, adding 5 makes index == 35 (which is 7*5). This generates the code below:
Retrieve index (an absolute address but we use a label here) and shift the accumulator to the left, and store in $02/$03, so we need to do this for both the low and the high byte of index:
Code: Select all
lda.z index
asl
sta.z $02
lda.z index+1
rol
sta.z $02+1
Code: Select all
clc
lda.z $02
adc.z index
sta.z $02
lda.z $02+1
adc.z index+1
sta.z $02+1
Code: Select all
asl.z $02
rol.z $02+1
Code: Select all
clc
lda.z $02
adc.z index
sta.z $02
lda.z $02+1
adc.z i+1
sta.z $02+1
Now the compiler adds the offset of the memory location of the sprites array, stores this result in $02/$03.
Code: Select all
lda.z $02
clc
adc #<sprites
sta.z $02
lda.z $02+1
adc #>sprites
sta.z $02+1
So finally the compiler uses $02/$03 to load the unsigned integer of x from the sprite array element, and store it into the absolute memory location of the global variable x.
Code: Select all
ldy #3 // Offset value x in the structure sprite
lda ($02),y
sta.z x
iny
lda ($02),y
sta.z x+1
Code: Select all
ldy #5 // Offset value y in the structure sprite
lda ($02),y
pha
iny
lda ($02),y
sta.z y+1
pla
sta.z y
Code: Select all
struct sprite {
unsigned int id;
unsigned char type;
unsigned int x;
unsigned int y;
unsigned char unused1;
};
Code: Select all
lda.z index
asl
sta.z $02
lda.z index+1
rol
sta.z $02+1
asl.z $02
rol.z $02+1
asl.z $02
rol.z $02+1
However, before you decide to use AoS, first consider if you can use SoA, a dynamic choice which was discussed here.
Hopefully this has been an interesting read and see you in the next article.
Sven