SweetCX16

All aspects of programming on the Commander X16.
BruceMcF
Posts: 1336
Joined: Fri Jul 03, 2020 4:27 am


Post by BruceMcF »



On 1/16/2022 at 9:01 AM, BruceMcF said:




Since he organized the instruction set for ease of hand-assembly, with only CPR having the opcode it has for functional reasons, I do think that saving odd/even in a zero page byte, and cutting the size of the two vector tables in half is the most useful decode.



However, after drafting several approaches, the game is not worth the candle. The smallest I can come up with, without copying and pasting from Woz's code, gets down to 416 bytes, from the 496 bytes of the smaller version of the "pure" JMP (optable,X) version that jumps directly to each op. Since simply adopting Woz's code gets down to 394 bytes (including the save/restore register code that Woz's version gets from the Apple II ROM), it's not worth it.

Not, that is, unless someone could find space savings IN Woz's version by doing some decoding; but as spaghetti-coded as the original Sweet16 is, that someone would not be me.

If either version of my Sweet16 or Woz's original is assembled to sit at the END of Golden RAM, each will have a different start point.

However, after translating a copy of Woz's code to the ACME assembler, with "SAVE" and "RESTORE" in front, I find there are six bytes free at the end before the end of the page. That means I could assemble versions of all three with a two-routine jump table at the TOP of Golden RAM ($07FA-$07FF and $CFFA-$CFFF for CX16 and C64 respectively): one entry for entering Sweet16, the other for entering either SAVE or RESTORE (based on carry set or carry clear). Then the starting point of the routine is flexible: with the Sweet16 entry as the second JMP, C64 code could enter Sweet16 with JSR $CFFD and CX16 code with JSR $07FD.

That would make it possible to assemble Sweet16 code independent of the choice of Sweet16 VM.

To fit into that, I'm going to shrink my "two page" version by using INC Register and DEC Register subroutines, saving whatever space that saves, and leave my "3 page" version as the full-fat, speed-optimized version.

Edit: What I get is that the "full fat" Sweet16c would occupy $0500-$07FF of Golden RAM, leaving one page (256 bytes) free at $0400. The "two page" Sweet16c2 would occupy $061C-$07FF, leaving 540 bytes (two pages plus 28 bytes) of Golden RAM available at $0400. And the adapted "Sweet 16 original", with the SAVE/RESTORE code and the jump table included, would occupy $066F-$07FF, leaving 623 bytes (two pages plus 111 bytes) of Golden RAM free.
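The free-space figures are plain address arithmetic. A small Python check, taking Golden RAM as $0400-$07FF and using the untested start addresses above (so the results are only as good as those estimates), gives 256, 540 and 623 free bytes; note that $021C works out to two pages plus 28 bytes, since $1C is 28 decimal.

```python
# Free Golden RAM ($0400-$07FF on the CX16) below each VM build, using the
# untested start addresses quoted in the post.
GOLDEN_RAM_START = 0x0400
GOLDEN_RAM_END = 0x07FF  # last byte of Golden RAM

BUILDS = {
    "Sweet16c (full fat)": 0x0500,
    "Sweet16c2 (two page)": 0x061C,
    "Woz original + SAVE/RESTORE + jump table": 0x066F,
}


def free_bytes(vm_start):
    """Bytes left below a VM that occupies vm_start..GOLDEN_RAM_END."""
    return vm_start - GOLDEN_RAM_START


def vm_size(vm_start):
    """Bytes occupied by a VM that ends at the top of Golden RAM."""
    return GOLDEN_RAM_END - vm_start + 1


for name, start in BUILDS.items():
    free = free_bytes(start)
    pages, extra = divmod(free, 256)
    print(f"{name}: VM {vm_size(start)} bytes, "
          f"{free} bytes free = {pages} pages + {extra} bytes")
```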

To be clear, none of this is tested code, so the final numbers may change after bug fixes, but they should be in the right ballpark.


Post by BruceMcF »


I've had a rethink on the three "unused ops" in Woz's Sweet16, and what I've decided is to use them as embedded calls for machine language routines. My first idea was to make it possible to call Kernal routines directly, but I've since realized that the ML code being called can be a bridge routine, so it is not necessary to build register loading and retrieval into the Sweet16 operation ... the called routine can handle that as appropriate.

The first thing to do is to wedge it into Woz's original code base. What I've already done is insert "SAVE" and "RESTORE" between his OPTBL/BRTBL data and the "SET" operation, which must occur at the 2nd address (or later) of the page holding the Sweet16 ops themselves, since he dispatches by pushing the page address onto the stack, then pushing the table entry, which is (opaddress-1), and then executing RTS to dispatch the operation.

I also have the code base END with "JMP Sweet16", so that Sweet16 VMs of different sizes can be placed at the END of Golden RAM and be called through a stable entry point.

This leaves 3 bytes of leeway, in which I put JMP SYSOP, which jumps to the common routine that executes one of the SYS operations.

I have three "SYS" calls. All SYS calls jump indirectly through register 13, the register CPR uses to store the result of a comparison. "SYSR n" uses the contents of the register pointed to by the status byte, which is most often Register 0, the Sweet16 accumulator, and the current CARRY status is in the carry flag when the call executes. "SYS13" uses the current contents of register 13 (it is the user's responsibility to make sure there hasn't been a CPR operation since it was loaded), and the carry flag is clear. For both SYSR and SYS13, the value of "n" is simply available for any use the called routine may wish to make of it.

"SYSZ n" loads R13 with the 16-bit value it finds at zero page address "n". This is DESIGNED to allow the Sweet16 register holding the target address to be specified with "SYSZ Reg0" through "SYSZ Reg14" (using the PC in R15 would not work, as it will point at "n" rather than at an ML routine) ... but it CAN be used to jump through a vector at ANY address in the zero page.

Together, these SYS operations allow writing bridge routines that call Kernal routines, as well as routines that extend Sweet16 with any desired operation. Indexed calls are available by simply using Sweet16 ADD operations and "SYSR" on the result.
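As a rough illustration of what "SYSZ n" does, here is a Python model that mirrors the SYSZ path of the wedge posted later in the thread: the zero page address is turned into an offset from R0L (the SEC : SBC #R0L step), and the word found there is copied into R13 before the indirect jump. The helper names `reg` and `sysz` are illustrative, and R0L = $02 assumes the CX16 register block.

```python
# Rough model of "SYSZ n": read the little-endian word at zero page address n
# into R13, then jump through it. R0L = $02 matches the CX16 register block;
# on the Apple II it would be $00.
R0L = 0x02


def reg(n, base=R0L):
    """Zero page address of Sweet16 register n (two bytes per register)."""
    return base + 2 * n


def sysz(zp, n, base=R0L):
    """Return the address SYSZ n would jump to, loading R13 along the way."""
    x = (n - base) & 0xFF                 # SEC : SBC #R0L, then TAX
    lo = zp[(base + x) & 0xFF]            # LDA R0L,X (zero page indexing wraps)
    hi = zp[(base + x + 1) & 0xFF]        # LDA R0H,X
    zp[reg(13, base)] = lo                # STA R13L
    zp[reg(13, base) + 1] = hi            # STA R13H
    return lo | (hi << 8)                 # JMP (R13L)


zp = bytearray(256)
zp[reg(10)], zp[reg(10) + 1] = 0x16, 0x04   # R10 = $0416
print(hex(sysz(zp, reg(10))))               # 0x416
```

Passing a register's zero page address (rather than its number) is what lets "SYSZ Reg10" work unchanged on any register base.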

Note that "SYSZ" takes a zero page address, not a register number like Sweet16 instruction codes, so a convenient way to include Sweet16 code in your assembly code is to define the opcodes and registers as named byte symbols and use your assembler's byte data pseudo-op to include the code. Bitwise OR ("|" in ACME) can be used for the 15 register opcodes that embed their target register in their bytecode, with the register number given rather than the register address. An advantage of this is that "extended" Sweet16 code using SYSZ can be ported between Apple systems, with 16 pseudo-registers at $00-$1F, and the C64/CX16, with 16 pseudo-registers at $02-$21, simply by re-assembling with the register symbols set correctly.

Placing $0416 in Register 10 would be done with

  !byte ..., SET|10, $16, $04,...

Then using that register to call the routine at the Golden RAM location $0416 would be done with:

  !byte ..., SYSZ, Reg10, ...
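The same idea can be sketched in Python by building the byte string for both register bases. SET = $1n follows Woz's encoding; SYSZ = $0F follows the draft wedge (where X = $1E = 2*SYSZ) and is an assumption, as are the helper names. Only the SYSZ operand changes between targets, because SET|10 uses the register number rather than its address.

```python
# Sketch of the "named byte symbols" approach: the same Sweet16 snippet
# assembled for two register bases. SET = $1n follows Woz's encoding;
# SYSZ = $0F follows the draft wedge (X = $1E = 2*SYSZ) and is an assumption.
SET = 0x10
SYSZ = 0x0F


def registers(base):
    """Reg0..Reg15 as zero page addresses for a given register base."""
    return [base + 2 * n for n in range(16)]


def load_and_call(base):
    reg = registers(base)
    return bytes([
        SET | 10, 0x16, 0x04,   # SET R10, $0416 (low byte first)
        SYSZ, reg[10],          # SYSZ Reg10: jump through the vector in R10
    ])


apple = load_and_call(0x00)   # Apple II: registers at $00-$1F
cx16 = load_and_call(0x02)    # C64/CX16: registers at $02-$21
print(apple.hex(" "))         # 1a 16 04 0f 14
print(cx16.hex(" "))          # 1a 16 04 0f 16
```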

 

rje
Posts: 1263
Joined: Mon Apr 27, 2020 10:00 pm
Location: Dallas Area


Post by rje »


That's a clever use of the assembler to write target-agnostic Sw*t16.


Post by BruceMcF »



On 4/24/2022 at 2:51 PM, BruceMcF said:

"SYSZ n" loads R13 with the 16 bit value it finds at the zero page address "n". This is DESIGNED to allow the Sweet16 register with the target address to be specified with "SYSZ Reg0" through "SYSZ Reg14" (using the PC at R15 would not work, as it will contain "n" rather than a ML routine) ... but it CAN be used to execute ANY address in the zero page. ...



The wedge into Woz's Sweet16 is something like this (with the registers at $00-$1F, as on the original Apple II, "SEC : SBC #R0L" can be omitted):


SYSOP:
   CPX #$1C       ; X = #$1C = 2*SYSR?
   BEQ SYS1       ; If so, test register index is in A
   BMI SYS2       ; X = #$1A = 2*SYS13, no loading needed
   LDY #0         ; Else X = #$1E = 2*SYSZ
   LDA (R15L),Y   ; ZP address is at (Reg15)
   SEC            ; Adjust to use R0L,X indexing
   SBC #R0L
   CLC
SYS1:
   JSR SYS3       ; Fetch vector into Reg13, then use
   RTS
SYS2:
   CLC            ; Vector already in Reg13, just use
   JSR SYS4
   RTS

SYS3:
   TAX            ; Load Reg13 if needed, ...
   LDA R0L,X
   STA R13L
   LDA R0H,X
   STA R13H
SYS4:
   JMP (R13L)     ; Vectored jump based on (Reg13)
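The three-way split at the top of that wedge relies on the flags CPX #$1C leaves behind. A small Python model of the 6502 compare rules (function names are illustrative) confirms the decode, and also shows that on the SYSR path the compare itself leaves carry set whatever the VM's carry was, so as drafted the called routine would see C = 1 there rather than the Sweet16 carry.

```python
# How "CPX #$1C" sorts the three SYS opcodes, using the 6502 compare rules:
# C = (X >= M), Z = (X == M), N = bit 7 of (X - M) & $FF.
def cpx(x, m):
    diff = (x - m) & 0xFF
    return {"C": x >= m, "Z": diff == 0, "N": bool(diff & 0x80)}


def dispatch(x):
    flags = cpx(x, 0x1C)
    if flags["Z"]:
        return "SYSR", flags      # BEQ SYS1
    if flags["N"]:
        return "SYS13", flags     # BMI SYS2
    return "SYSZ", flags          # fall through to the LDA (R15L),Y path


for opcode in (0x0D, 0x0E, 0x0F):
    name, flags = dispatch(2 * opcode)   # Woz's decode leaves X = 2 * opcode
    print(f"${opcode:02X} -> {name}, carry after CPX = {int(flags['C'])}")
```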



 



Swift16 will be similar, but will be able to jump directly to the SYSR, SYS13 and SYSZ operations, since Swift16 operations do not have to start executing in the same page.


Post by BruceMcF »



On 4/25/2022 at 10:30 AM, rje said:




That's a clever use of the assembler to write target-agnostic Sw*t16.



One thing to be careful of: the code using the Sweet16 VM cannot be in the same namespace as the code implementing it, because the implementation uses the "plaintext" names of the operations as the addresses of their routines, while the code using the VM defines those same names as symbols for the operations' opcodes.


Post by BruceMcF »



On 4/25/2022 at 6:02 PM, BruceMcF said:




The wedge into Woz's Sweet16 is something like (if the registers are not at $00-$1F ... in the original AppleII registers, "SEC : SBC #R0L" can be omitted):

"..."



Swift16 will be similar, but will be able to jump directly to the SYSR, SYS13 and SYSZ operations, since Swift16 operations do not have to start executing in the same page.



Waitaminute! I just realized that the Sweet16 "status" register is the HIGH byte of Register 14 ... if Register 13 is being "reused" as the temporary store for the SYS jump vector, then so can the low byte of Register 14, allowing a complete JMP () instruction to be built IN the Sweet16 register space. Instead of SYS13, I can have a SYSZ call with a one-byte zero page address, and a SYSM call with a two-byte absolute address, which can increment Reg15 and use it to grab the high byte of the address.

 


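SYSOP:
  LDY #0         ; not needed on the 65C02, which has LDA (zp)
  CPX #$1C       ; 2*$0E = 2*SYSZ = $1C
  BMI SYS2       ; 2*$0D = 2*SYSR = $1A (carry clear) -- contents of A is a zero page address
  CLC            ; CLC leaves Z alone, so the CPX result still decides BEQ
  BEQ SYS1       ; X = $1C: SYSZ, one-byte zero page operand, carry clear
  SEC            ; 2*$0F = 2*SYSM = $1E: SEC to fetch high byte
SYS1:
  LDA (R15L),Y   ; fetch zero page address, "LDA (R15L)" in 65C02
SYS2:
  STA R13H       ; low byte of JMP () operand
  LDA #$6C       ; JMP () opcode
  STA R13L
  TYA            ; A = 0: high byte for zero page addressing
  BCC SYS4
  INC R15L       ; SYSM: advance PC to the high byte of the operand
  BNE SYS3
  INC R15H
SYS3:
  LDA (R15L),Y
  CLC
SYS4:
  STA R14L       ; high byte of JMP () operand
  JMP R13L       ; execute the JMP () built in R13L, R13H, R14L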
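What makes this draft work is that R13L, R13H and R14L are consecutive zero page bytes, so together they can hold a complete three-byte JMP () instruction. A Python sketch of the bytes each SYS variant would leave there, assuming the CX16 register block; `build_jump` is a hypothetical name.

```python
# Model of the JMP () instruction assembled in R13L, R13H, R14L by SYSOP.
# Register addresses assume the CX16 block ($02-based): R13L = $1C.
R0L = 0x02
R13L, R13H, R14L = R0L + 26, R0L + 27, R0L + 28
JMP_IND = 0x6C
assert R14L == R13L + 2   # the three bytes are contiguous: one instruction


def build_jump(zp, kind, a=0, operand=(0, 0)):
    """Write JMP (target) into R13L..R14L and return the target."""
    if kind == "SYSR":        # A already holds a zero page address
        lo, hi = a, 0x00
    elif kind == "SYSZ":      # one-byte zero page operand
        lo, hi = operand[0], 0x00
    elif kind == "SYSM":      # two-byte absolute operand, low byte first
        lo, hi = operand[0], operand[1]
    else:
        raise ValueError(kind)
    zp[R13L], zp[R13H], zp[R14L] = JMP_IND, lo, hi
    return lo | (hi << 8)


zp = bytearray(256)
build_jump(zp, "SYSM", operand=(0x34, 0x12))
print(bytes(zp[R13L:R14L + 1]).hex(" "))   # 6c 34 12  ->  JMP ($1234)
```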



I'm thinking the Swift16 version would be basically the same, but with three entry points, since there is no need to "wedge" the call into the common Sweet16 VM opcode page:


SYSR:
  CLC
  BRA SYS2
SYSZ:
  CLC
  BRA SYS1
SYSM:
  SEC
SYS1:
  LDA (R15L)   ; fetch first byte of operand
SYS2:
  STZ R14L     ; High byte of operand for zero page addressing
  STA R13H     ; store first byte of operand
  LDA #$6C     ; JMP() opcode
  STA R13L     ; JMP() instruction is now built
  BCC SYS4
  INC R15L
  BNE SYS3
  INC R15H
SYS3:
  CLC
  LDA (R15L)
  STA R14L
SYS4:
  JMP R13L     ; execute the built JMP (); the called routine's RTS returns to the VM loop



 


Post by BruceMcF »


I've been thinking on this, and think that while I was getting closer, I was fighting Sweet16 too much, rather than going along with it.

Given that the calls are going to be machine language routines providing operations IN the Sweet16 source code -- whether all new operations or bridge calls to Kernel calls -- they can be packaged into a jump table or vector table format for access, so what is really needed is an INDEXED machine language call. That fits well with the single byte operand of the Branch operations.

Also, while the contents of Reg13 are purely transitory, since they are overwritten by each CPR operation, if a "JMP addr" or "JMP (addr)" instruction is written into R13L, R13H and R14L, these operations can take advantage of the fact that R14L is a "free" single-byte register. (The high byte, R14H, is constantly overwritten after load, arithmetic and comparison operations to point to the register that the Zero/NonZero and Minus1/NotMinus1 branches test.) So unlike the JMP opcode and the low byte of the operand, the high byte of the operand in R14L can stay resident.

Which leaves me with TWO TYPES of operation making up the "Tabled System Calls" opcodes: "TBL page", which sets R14L to the desired page (high address byte), and the "SYS n" and "SYSI n" operations, which perform either a jump TO the nth byte of the table page or a jump through the VECTOR at the nth byte of the table page.
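In effect, TBL selects a page that stays in R14L, and SYS/SYSI index into it with the branch-style one-byte operand. A Python sketch of the resulting jump targets; the class, method names and memory contents are hypothetical, and only the target address is modeled, not the 6502 code. With a table of 3-byte JMP entries, n would step by 3; with a vector table, by 2.

```python
# Rough model of the "Tabled System Calls": TBL sets the table page kept in
# R14L, SYS n jumps to page:n (JMP abs), SYSI n jumps through the vector at
# page:n (JMP (abs)). The memory contents below are made up for illustration.
class SysTable:
    def __init__(self, memory):
        self.mem = memory
        self.r14l = 0x00            # table page; stays resident between calls

    def tbl(self, page):            # TBL page
        self.r14l = page

    def sys(self, n):               # R13L=$4C, R13H=n, R14L=page -> JMP page:n
        return (self.r14l << 8) | n

    def sysi(self, n):              # R13L=$6C, R13H=n, R14L=page -> JMP (page:n)
        addr = (self.r14l << 8) | n
        return self.mem[addr] | (self.mem[addr + 1] << 8)


mem = bytearray(0x10000)
mem[0x0A02], mem[0x0A03] = 0x00, 0xC0   # hypothetical vector at $0A02 -> $C000
calls = SysTable(mem)
calls.tbl(0x0A)
print(hex(calls.sys(0x03)), hex(calls.sysi(0x02)))   # 0xa03 0xc000
```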

In the "Sweet16 wedge", included with the block of SAVE and RESTORE code between the opcode tables and the "opcode page", this would be something like:

 


; $0D -- TBL n -- set binary page (high address byte) used for SYS calls
; $0E -- SYS n -- Jump to indexed address of table page
; $0F -- SYSI n -- Jump using indexed vector of table page

SYSOP:  LDA #$4C      ; JMP abs opcode
        CPX #$1C
        BMI SYS2      ; X = $1A: TBL
        BEQ SYS1      ; X = $1C: SYS
        LDA #$6C      ; X = $1E: SYSI, JMP () opcode
SYS1:   STA R13L
        LDY #0
        LDA (R15L),Y
        STA R13H      ; low byte of target; R14L already holds the page
        JMP R13L

SYS2:   LDY #0
        LDA (R15L),Y
        STA R14L      ; table page stays resident in R14L
        RTS



where the "Swift16" version would be something like:


; $0D -- TBL n -- set high page of SYS calls
; $0E -- SYS n -- Jump to indexed address of table page
; $0F -- SYSI n -- Jump using indexed vector of table page

SYS:    LDA #$4C
        BRA +
SYSI:   LDA #$6C
+       STA R13L
        LDA (R15L)
        STA R13H
        JMP R13L

TBL:    LDY #0
        LDA (R15L),Y
        STA R14L
        RTS



 



 


Post by rje »


I appreciate you geeking out on this... My brain is too tired to follow it, but I like to see this.  And hope to try things out with it.


Post by BruceMcF »


After the effort of trying to "crunch" the JMP (abs,X) approach to a Sweet16 VM couldn't beat Woz's code for compactness, I've evolved toward a slightly extended version of Woz's Sweet16 as the "compact" VM, Sweet16c as the "faster, though larger" 65C02 version, and a 65816 version of the VM that can execute mixed 6502/Sweet16 code, with the 6502 code running in emulation mode and the Sweet16 VM implemented in native 65816 mode.

Now, aside from porting the VM independently of Woz's code, I have two "new" things: the three new System Jump Table opcodes, and the jump table at the end that lets the same code on a system be used with a variety of Sweet16 VM implementations.

However, assembling the Woz code with my SYSOP "wedge", the page with the opcodes didn't have space for the jump table --  it came up three bytes short.

The first opcode has to be at address $01 or higher in the page, because first the high byte "#>SET" is loaded and pushed onto the stack, and then the low byte of the subroutine return address is pushed from the table, where it is defined as, e.g., "<SET-1". But if SET is at (e.g.) $0700, then ">SET" is $07 and "<SET-1" is $FF, because SET-1 is $06FF. The return address on the stack is then effectively $07FF, and RTS returns to $0800 ... oops!
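The page-boundary trap can be checked with a few lines of Python modeling the push/push/RTS dispatch; `rts_target` is an illustrative name. Because the high byte is shared by every op while the low byte comes from <(op-1), an op sitting at the first byte of the page is reached one page too late.

```python
# Woz's dispatch: push the high byte of SET (shared by every op), push the
# table's low byte <(op - 1), then RTS, which pulls both and adds one.
def rts_target(op_addr, set_addr):
    hi = set_addr >> 8                  # LDA #>SET : PHA
    lo = (op_addr - 1) & 0xFF           # table entry <(op-1) : PHA
    return ((hi << 8) | lo) + 1         # RTS continues at pulled address + 1


print(hex(rts_target(0x0701, 0x0701)))  # 0x701: op one byte into the page is reached
print(hex(rts_target(0x0700, 0x0700)))  # 0x800: op on the page boundary is missed
```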

To be clear, the idea is to tuck the VM up "high" in a memory space ... the top of "Golden RAM", or the top of a HighRAM segment, or etc. The "high entry point" when added to Woz's original VM really has to fit into the end of the same page that has the opcodes.

But with the first opcode routine placed one byte past the page boundary, my precious two-operation jump table spills three bytes out of Golden RAM!

The first trick is following Woz's lead with "BPL SETZ" being an effective "BRA SETZ": branch ops are called after loading A with the offset from Register 0 of the register that the status is based on, so the sign flag should always be clear when a branch op starts executing.
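A brute-force check of the flag behavior behind that trick, assuming Woz's branch dispatch ends with "LDA R14H : LSR" before the RTS: whatever value is shifted, LSR puts a 0 into bit 7, so N is always clear and a 2-byte BPL can stand in for a 3-byte JMP. The `lsr` helper is illustrative only.

```python
# After "LDA R14H : LSR", bit 7 of A is always 0, so N is clear and a
# BPL is always taken: a 2-byte BPL can stand in for a 3-byte JMP.
def lsr(a):
    """6502 LSR A: returns (result, carry, negative)."""
    result = (a >> 1) & 0xFF
    return result, a & 1, bool(result & 0x80)


assert not any(lsr(v)[2] for v in range(256))   # N is never set after LSR
print(lsr(0x1B))   # (13, 1, False)
```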

I had already done that with "BPL SYSOP" ... but Woz placed "RTN: JMP RTNZ" at the end of his code. Replacing that with a "RTN: BPL RTNZ" in front of "SET: BPL SETZ" saves one byte.

And then the second trick was a design simplification: winnowing the jump table down to the single "JMP SWEET16". The idea of the second routine in the table was to export the Save/Restore register routines, but it is possible to set things up so that their addresses can be inferred, so I've settled for that.

Now it all JUST fits. And  ... with a single byte to spare!

NOTE: The idea I have been toying with, which makes direct access to register restore an issue for interspersed Sweet16 and 6502 code, is to make the state of carry significant when entering Sweet16: with carry clear, state is saved on entry and restored on exit; with carry set, the saved state is left alone. So if Sweet16 is first called with carry clear, then returns to 6502 code for some task before being re-entered with carry set, the ORIGINAL state saved on first entry is still there. At the end of the WHOLE process, Sweet16 can return to 65C02 code that ends with a JMP to the restore routine, which then returns to the original caller. And of course, fetching the call address at the end of the Sweet16 VM, subtracting two from it and fetching the word at that address into a Sweet16 register (one that the process won't be using) is a very short routine in Sweet16 code. If that were Reg11, the terminating 6502 code could end with JMP (Reg11) to restore the register state from when the whole combined routine was first called.
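A minimal Python sketch of the carry-gated entry, under the reading that entering with carry set skips the save; `Sweet16Entry`, `enter` and `restore` are hypothetical names, and only the save/restore bookkeeping is modeled.

```python
# Hypothetical model of carry-gated entry: carry clear snapshots A/X/Y/P,
# carry set leaves the earlier snapshot alone, so a final restore returns the
# state from the FIRST entry. Names are illustrative, not the VM's labels.
class Sweet16Entry:
    def __init__(self):
        self.saved = None

    def enter(self, regs, carry):
        if not carry:                  # carry clear: save caller state
            self.saved = dict(regs)
        # carry set: re-entry after a 6502 interlude, keep the old snapshot

    def restore(self):
        if self.saved is None:
            raise RuntimeError("nothing saved yet")
        return dict(self.saved)


vm = Sweet16Entry()
vm.enter({"A": 1, "X": 2, "Y": 3, "P": 0x30}, carry=False)   # first entry
vm.enter({"A": 9, "X": 9, "Y": 9, "P": 0x31}, carry=True)    # later re-entry
print(vm.restore())   # {'A': 1, 'X': 2, 'Y': 3, 'P': 48}
```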


Post by BruceMcF »


OK, cracked it. Since I have exactly one byte leeway in my "augmented version", what I am doing is this:

SWEET16:          ; start point: doesn't have to be first byte of VM, but often is
   JSR PUTSTATE
   ...
GETSTATE:
   LDA REGP
   PHA
   LDA REGA
   LDX REGX
   LDY REGY
   PLP
   RTS

PUTSTATE:
   PHP
   STA REGA
   STX REGX
   STY REGY
   PLA
   STA REGP
   RTS
...
GS_OFFSET: !byte (PUTSTATE - GETSTATE)
; ENTRY POINT
   JMP SWEET16

... In other words, the final word of the VM is implicitly a handle for SAVE: it points one byte before the stored pointer to the SAVE routine (the operand of JSR PUTSTATE). So if I know how far before SAVE the RESTORE routine (aka GETSTATE) is located (within 255 bytes), I can build my own jump table or vector table. That offset is contained in the byte before the entry point.
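The inference can be sketched in Python against a made-up memory image: read the JMP operand at the entry point to find the start point, read the JSR operand there to find SAVE (PUTSTATE), and subtract the GS_OFFSET byte just below the entry point to find RESTORE (GETSTATE). It assumes the VM ends at $07FF, so the final JMP sits at $07FD; every other address is hypothetical, and the 11-byte offset simply mirrors the Apple II ROM's $FF3F/$FF4A spacing.

```python
# Hypothetical memory image showing how SAVE and RESTORE can be found from
# the entry point alone. All addresses except the $07FD entry are made up.
mem = bytearray(0x10000)


def get_word(addr):
    return mem[addr] | (mem[addr + 1] << 8)


def put_word(addr, value):
    mem[addr], mem[addr + 1] = value & 0xFF, value >> 8


ENTRY = 0x07FD            # last three bytes of Golden RAM: JMP SWEET16
SWEET16 = 0x066F          # hypothetical start point: JSR PUTSTATE
GETSTATE = 0x0790         # hypothetical RESTORE routine
PUTSTATE = GETSTATE + 11  # 11 bytes later, like $FF3F -> $FF4A on the Apple II

mem[SWEET16] = 0x20                   # JSR abs
put_word(SWEET16 + 1, PUTSTATE)
mem[ENTRY - 1] = PUTSTATE - GETSTATE  # GS_OFFSET
mem[ENTRY] = 0x4C                     # JMP abs
put_word(ENTRY + 1, SWEET16)


def find_state_routines(entry):
    """Return (SAVE, RESTORE) addresses using only the bytes around entry."""
    start = get_word(entry + 1)       # operand of JMP SWEET16 (the VM's final word)
    assert mem[start] == 0x20         # start point must begin with JSR PUTSTATE
    save = get_word(start + 1)        # operand of JSR PUTSTATE
    restore = save - mem[entry - 1]   # GS_OFFSET, at most 255
    return save, restore


print([hex(a) for a in find_state_routines(ENTRY)])   # ['0x79b', '0x790']
```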

The limitations on ANY Sweet16 VM using this system would be that the SAVE routine must FOLLOW the RESTORE routine, and be within 255 bytes of it.

It is arbitrary which one must be first, so this follows the Apple2 ROM addresses of register "SAVE" at $FF4A and register "RESTORE" at $FF3F, so a RAM based "augmented Sweet16" for an Apple II could re-use the Apple II ROM SAVE and RESTORE.

As for direct additions to Woz's original Sweet16 source code: I don't have an open-source-licensed copy (even if Woz clearly won't mind!), so what I can distribute is additions to the source available at 6502.org, and those must follow the naming in the original. For my own implementation, though, I avoid calling them "SAVE" and "RESTORE" to avoid confusion with the C64 KERNAL / CX16 Kernal routines.

 
