So in this article we will show a concrete example of C code that produces a double linked list capability for the CX16, following the above design principles. You can find the code and executable attached, but let's walk through the code and with a few screenshots of debug output we can have a look how this works.
The double linked list has a concrete use case in my equinoxe game. I needed a capability to administer various lists for the various types of objects flying around on the screen, this frame per frame. And this frame aspect requires me to squeeze every cycle out of the CPU where I can. The equinoxe game has various types of objects, like a player space ship and later player controlled objects, various enemy ships, scenery objects, ground objects, bullets, explosions, engine exhausts, etc ... Each of these objects have code that models the moving and AI behaviour and have data properties attached. The generated code to administer all this logic, is deployed in banked memory, so each type of object code is located at a different RAM bank between A000 and BFFF on the CX16. Using kernal routine JSRFAR I'm calling for each type of object an "update" routine that calculates for each frame the new positions and actions of the object type.
By using the RAM banking, I have lot's of code space to model the behaviour of these objects and i'm not bound to the $0800 till $9F00 address range in low memory. In fact, I started my game like this and quickly realized that the low memory address space is rather limited. Unlike the C64 and C128, the upper memory address spaces cannot be "deactivated", especially the ROMs between $C000 and $FFFF. There is no shadow RAM underneath ROM address space, so hence RAM banking was the solution! Together with Jesper we developed a nice RAM banking capability in kickc, that allows you to compile your code banked and deploy it banked, but that explanation will come later how to use it.
The consequence of the RAM banking is that FAR calls need to be made to administer the positions of each object type. Imagine I have 1 player ship, but 32 enemies and 25 bullets in the air. The equinoxe game logic calls a dedicated update routine first for the player ship, then it calls the update routine for the enemy ships and then it calls the update routine for the bullets using JSRFAR kernal routine. So, instead of doing for each object a JSRFAR routine, i'm doing for each object type a JSRFAR call that then in encapsulation mode performs its logic. This saves me a lot of CPU expensive JSRFAR calls, however, that requires some mechanism to quickly be able to loop through the entries that describe the object types, and ONLY those object types.
Simply said, the CX16 has 128 sprites, of which 127 are usable for equinoxe (the sprite 0 is the mouse). Imagine I would use a simple array (been there....), that would be used by each update routine for each object type, which would simply loop through the array, validating if the object underneath is of the same type, and then perform the update logic. That means, that for players i would need to loop 127 times, for enemy ships i would need to loop 127 times and for bullets i would need to loop 127 times. Add to that equation object types for ground installations, explostions, scenary and the like, and you see quickly that the looping would cause a significant overhead to the game logic. It really counts, loop overheads are killing the frame per frame performance of the game engine. So it is important to have a design that minimizes such loop overheads!
I reworked the engine to use a simple double linked lists (plural), mapped on a SoA, with multiple root positions for each object type, having it's own linked list within the SoA. That allows me for each object type to simply iterate through the list of each object type, avoiding the problematic loop overhead!
Remember, we declare a structure that contains multiple arrays of maximum one byte length! We use where possible for each field a data type that is one byte large! This will result in absolute indexed addressing and will produce minimal code size and fast and functional code!
So below you find an example how such a list can be designed. I'm not showing the actual equinoxe game engine solution because that would really bring us too far. Let's first talk about the setup of the data structures and types.
We declare ELEMENTS, which denote a maximum of 64 elements within the SoA. Note that I've decided to use #defines instead of const as the const keyword is not the same as #define in C.
Code: Select all
#define ELEMENTS 64
Code: Select all
#define ALPHA 0
#define GAMMA 1
Code: Select all
typedef unsigned char data_index_t;
typedef unsigned char data_type_t;
Code: Select all
typedef struct {
char used[ELEMENTS];
char symbol[ELEMENTS];
char next[ELEMENTS];
char prev[ELEMENTS];
char root[4];
char index;
} data_t;
Next, we allocate the memory space for the SoA, so we define a variable called data, which is globally defined in this example. Note that for absolute indexed addressing to be generated within the list manipulation functions, we have built the logic to use this one global variable.
Code: Select all
data_t data;
The function data_add will simply allocate a new element in the SoA of the given object type ALPHA or GAMMA, and will assign the char to the symbol field of the element allocated. It will adjust the next and prev fields accordingly and will adjust the root entries within the SoA if needed. Each new element will cause the root to be updated for the given object type!
The function data_remove will delete an element from the SoA for the given object type. This function will put symbol to a '-', adjust the next and prev indexes accordingly and adjusts the root position when needed for the given object type ALPHA or GAMMA.
Code: Select all
void main() {
clrscr();
cx16_k_screen_set_charset(3, (char*)0);
char a = data_add(ALPHA, 'A');
char b = data_add(ALPHA, 'B');
char x = data_add(GAMMA, 'X');
char y = data_add(GAMMA, 'Y');
char c = data_add(ALPHA, 'C');
char z = data_add(GAMMA, 'Z');
data_remove(ALPHA, b);
data_remove(ALPHA, c);
data_remove(ALPHA, a);
....
}
Each new element is added at a free index within the SoA. The field used outlines whether a field us used or not. There is however one caveat! The value NULL! So are not using element 0 in the SoA, as indexes pointing to 0 should depict the end of the list. I've kept the NULL value as an actual value, in order to have code performance. 0 allows assembler instructions like BNE and BEQ to be used and avoids CMP instructions to be generated. And yes, this results in a little unused data overhead ...
So the function data_add accepts a data_type_t value ALPHA or GAMMA, which contains the index to the root field to be used to administer the root of each list within the SoA. The symbol will be assigned to the newly allocated element. The function returns the index pointing to the newly allocated element within the SoA.
data_index_t data_add(data_type_t type, char symbol) {
We iterate through the SoA fields, ensuring that we don't trespass the 64 index boundary!
The logic uses data.index to search for a free element, skipping the first element 0.
Code: Select all
data.index %= ELEMENTS;
while (!data.index || data.used[data.index]) {
data.index = (data.index + 1) % ELEMENTS;
}
unsigned char i = data.index;
Code: Select all
data_index_t root = data.root[type];
data.next[index] = root;
data.prev[index] = NULL;
if (root)
data.prev[root] = index;
data.root[type] = index;
Once we have the relevant indexes updated within the SoA, we update the symbol and we indicate the element is used.
Code: Select all
data.used[index] = 1;
data.symbol[index] = symbol;
return index;
}
It will remove an element of the given object type ALPHA or GAMMA, by the given index.
Code: Select all
void data_remove(data_type_t type, data_index_t index) {
Code: Select all
if (data.used[index]) {
data.used[index] = 0;
So we set the root of the object type to NULL!
Code: Select all
data_index_t root = data.root[type];
if(!data.next[root]) {
data.root[type] = NULL;
If we remove the root node, we must adjust the root field index in the SoA to the next index of the element being removed! This next index can be NULL, but that does not matter.
Code: Select all
} else {
data_index_t next = data.next[index];
data_index_t prev = data.prev[index];
if (next) {
data.prev[next] = prev;
}
if (prev) {
data.next[prev] = next;
}
if (root == index) {
data.root[type] = next;
}
}
Code: Select all
data.next[index] = NULL;
data.prev[index] = NULL;
data.symbol[index] = '-';
}
}
The attached PRG contains a working logic, which would provide you with a view like this:
The left side shows the situation before the list action and the right side the situation after the action has been processed. Making such small example programs allows me to quickly build a working logic that can be cut/pasted into my equinoxe game, and test it with all the use cases required.
Please have a look at the scenario unfolding as we run the attached prg file on the cx16.
We add 'A' to the list, element of ALPHA. We can see in the right columns that the indexes next and prev of A are NULL. Root ALPHA points to index 1. Also note that index 0 is not used in the list.
We add 'B' to the list, element of ALPHA. Now you can see on the right columns that B has been added as index 2, and the next index of B points to 1, while the prev index of A points to 2. The prev index of B is NULL and the next index of A is NULL. The root ALPHA is set to 2, so the newly added element now becomes root!
We add 'X' to the list, element of GAMMA. X has been added as index 3, and the next and prev indexes of X point to NULL. The root GAMMA is set to 3, and is the first element of GAMMA in the list.
We add 'Y' to the list, element of GAMMA. Y has been added as index 4, and the next index of Y points to 3, while the prev index of X points to 4. The prev index of Y is NULL and the next index of X is NULL. The root GAMMA is set to 4, so the newly added element now becomes root! The ALPHA list is untouched!
We add 'C' to the list, element of ALPHA. C has been added as index 5, and the next index of C points to 2 (B), while the prev index of B points to 5. The A element is untouched. The prev index of C is NULL and the next index of A is NULL. So now we have a chain of 3 elements C, B, A, following the next indexes. The root ALPHA is set to 5, so the newly added element now becomes root!
We add 'Z' to the list, element of GAMMA. Z has been added as index 6, and the next index of Z points to 4 (Y), while the prev index of Y points to 6. The X element is untouched. The prev index of Z is NULL and the next index of X is NULL. The root GAMMA is set to 6, so the newly added element now becomes root! So now we have a chain of 3 elements Z, Y, X, following the next indexes of the GAMMA list.
We remove 'B' from the list, element of ALPHA. The index 2 becomes free, the next index of C is adjusted to 1 (A), while the prev index of A is adjusted to 5 (C). For display reasons the symbol of B becomes '-'. Note that the root of ALPHA still remains 5, but the ALPHA list now has only 2 elements.
Now we remove 'C' from the list, which is the root element of ALPHA. The index 5 (C) becomes free, the next and prev indexes of A are adjusted to NULL, as it is the only element left in the list. For display reasons the symbol of C becomes '-'. The root of ALPHA now becomes index 1.
Now we remove 'A' from the list, which is the root element of ALPHA. The index 1 (A) becomes free. The next and prev indexes of A are adjusted to NULL. For display reasons the symbol of A becomes '-'. The root of ALPHA now becomes NULL, as there are no more elements in the list.
The program and the C-code contains further use cases, but as you can see, this is how to build a fast double linked list using a simple structure, and no usage of pointers, which results in fast assembler code that uses absolute indexed addressing. This is the assembler for the data_add function:
Code: Select all
data_add: {
.label data_add__5 = $16
.label data_add__6 = $16
.label i = $3b
.label r = $28
.label kbhit1_return = $14
.label symbol = $2b
.label return = $3b
.label return_1 = $41
.label return_2 = $40
.label return_3 = $44
.label return_4 = $43
.label return_5 = $3f
.label return_6 = $42
.label type = $35
// data_debug(0)
// [187] call data_debug
// [286] phi from data_add to data_debug [phi:data_add->data_debug]
// [286] phi data_debug::c#11 = 0 [phi:data_add->data_debug#0] -- call_phi_near
lda #0
sta.z data_debug.c
jsr data_debug
// data_add::@7
// data.index %= ELEMENTS
// [188] *((char *)&data+$104) = *((char *)&data+$104) & $40-1 -- _deref_pbuc1=_deref_pbuc1_band_vbuc2
lda #$40-1
and data+$104
sta data+$104
// data_add::@2
__b2:
// while (!data.index || data.used[data.index])
// [189] if(0==*((char *)&data+$104)) goto data_add::@3 -- 0_eq__deref_pbuc1_then_la1
lda data+$104
beq __b3
// data_add::@9
// [190] if(0!=((char *)&data)[*((char *)&data+$104)]) goto data_add::@3 -- 0_neq_pbuc1_derefidx_(_deref_pbuc2)_then_la1
tay
lda data,y
cmp #0
bne __b3
// data_add::@4
// unsigned char i = data.index
// [191] data_add::i#0 = *((char *)&data+$104) -- vbuz1=_deref_pbuc1
tya
sta.z i
// data_index_t r = data.root[type]
// [192] data_add::r#0 = ((char *)&data+$100)[data_add::type#15] -- vbuz1=pbuc1_derefidx_vbuz2
// p.r = 3 => f[3].n = 2, f[2].n = 1, f[1].n = -
// => f[3].p = -, f[2].p = 3, f[1].p = 2
// Add 4
// p.r = 4 => f[4].n = 3, f[3].n = 2, f[2].n = 1, f[1].n = -
// p.r = 4 => f[4].p = -, f[3].p = 4, f[2].p = 3, f[1].p = 2
ldy.z type
lda data+$100,y
sta.z r
// data.next[i] = r
// [193] ((char *)&data+$80)[data_add::i#0] = data_add::r#0 -- pbuc1_derefidx_vbuz1=vbuz2
ldy.z i
sta data+$80,y
// data.prev[i] = NULL
// [194] ((char *)&data+$c0)[data_add::i#0] = 0 -- pbuc1_derefidx_vbuz1=vbuc2
lda #0
sta data+$c0,y
// if (r)
// [195] if(0==data_add::r#0) goto data_add::@1 -- 0_eq_vbuz1_then_la1
lda.z r
beq __b1
// data_add::@5
// data.prev[r] = i
// [196] ((char *)&data+$c0)[data_add::r#0] = data_add::i#0 -- pbuc1_derefidx_vbuz1=vbuz2
tya
ldy.z r
sta data+$c0,y
// data_add::@1
__b1:
// data.root[type] = i
// [197] ((char *)&data+$100)[data_add::type#15] = data_add::i#0 -- pbuc1_derefidx_vbuz1=vbuz2
lda.z i
ldy.z type
sta data+$100,y
// data.used[i] = 1
// [198] ((char *)&data)[data_add::i#0] = 1 -- pbuc1_derefidx_vbuz1=vbuc2
lda #1
ldy.z i
sta data,y
// data.symbol[i] = symbol
// [199] ((char *)&data+$40)[data_add::i#0] = data_add::symbol#15 -- pbuc1_derefidx_vbuz1=vbuz2
lda.z symbol
sta data+$40,y
// data_debug(40)
// [200] call data_debug
// [286] phi from data_add::@1 to data_debug [phi:data_add::@1->data_debug]
// [286] phi data_debug::c#11 = $28 [phi:data_add::@1->data_debug#0] -- call_phi_near
lda #$28
sta.z data_debug.c
jsr data_debug
// [201] phi from data_add::@1 data_add::@6 to data_add::kbhit1 [phi:data_add::@1/data_add::@6->data_add::kbhit1]
// data_add::kbhit1
kbhit1:
// data_add::kbhit1_cbm_k_clrchn1
// asm
// asm { jsrCBM_CLRCHN }
jsr CBM_CLRCHN
// [203] phi from data_add::kbhit1_cbm_k_clrchn1 to data_add::kbhit1_@2 [phi:data_add::kbhit1_cbm_k_clrchn1->data_add::kbhit1_@2]
// data_add::kbhit1_@2
// cbm_k_getin()
// [204] call cbm_k_getin -- call_phi_near
jsr cbm_k_getin
// [205] cbm_k_getin::return#2 = cbm_k_getin::return#1
// data_add::@8
// [206] data_add::kbhit1_return#0 = cbm_k_getin::return#2
// data_add::@6
// while(!kbhit())
// [207] if(0==data_add::kbhit1_return#0) goto data_add::kbhit1 -- 0_eq_vbuz1_then_la1
lda.z kbhit1_return
beq kbhit1
// data_add::@return
// }
// [208] return
rts
// data_add::@3
__b3:
// data.index + 1
// [209] data_add::$5 = *((char *)&data+$104) + 1 -- vbuz1=_deref_pbuc1_plus_1
lda data+$104
inc
sta.z data_add__5
// (data.index + 1) % ELEMENTS
// [210] data_add::$6 = data_add::$5 & $40-1 -- vbuz1=vbuz1_band_vbuc1
lda #$40-1
and.z data_add__6
sta.z data_add__6
// data.index = (data.index + 1) % ELEMENTS
// [211] *((char *)&data+$104) = data_add::$6 -- _deref_pbuc1=vbuz1
sta data+$104
jmp __b2
}