BASIC 2? Why not get BASIC 7?
Posted: Thu Jul 29, 2021 5:06 am
by Snickers11001001
Quote
I would think the first / main reason is the amount of bank switching required.
Bingo!
'CHRGET' (which I think I mentioned earlier) is the machine code that fetches a byte of the BASIC listing from program memory for the interpreter. There are equiv. routines for BASIC to go get variable values etc., from var. space or the string heap. But the 'CHRGET' routine gets called a LOT by the BASIC stuff.
In the C64 (and our X16) the actual routine is copied to zeropage at startup and runs from there. It's 24 bytes long, with the main pointer actually included in the routine as a self-modified pointer. The pointer contains the address of the next byte to be fetched. On the C64 the two-byte pointer is actually the operand of an LDA right in the routine.
On the Plus/4 (which is where my experience was) that routine is 33 bytes and doesn't even live in zero page, although the two-byte pointer takes two more bytes and IS in ZP. Now part of the reason it's not in ZP can be chalked up to questionable decisions on the Plus4 side from Commodore's team. They used space in zero page to hold variables for the RENUMBER and AUTO line number stuff -- things that are not performance critical on the Plussy and did not need to use ZP space. They also reserved 20 bytes or so in ZP for a speech chip enhancement for a top end Plus4 model that was never released. But mostly, the Plus4 and 128 CHRGET functions don't live in zero page because, all things considered, they wouldn't get much benefit from doing so. All those extra bytes on the Plus4 go to disabling interrupts and banking out the ROMs with a write to a register on the TED chip (up in the FFxx area); then they load the byte through the text pointer (using a more cycle-costly instruction); and finally they make another write to the TED registers to bank the ROMs back in and reenable the interrupts. AND the Plus4 does that for EVERY single BASIC character fetch. (A similar routine with the added interrupts/banking cost is used when it grabs a variable or a string.) It does the banking even where the pointer contains an address in an area of memory that would never be under a ROM. Simply put, the logic to 'look and decide' whether the pointer was to an address under ROM or not would itself be more costly than just doing the bank operations every time.
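To put rough numbers on that overhead, here is a minimal Python sketch that adds up documented NMOS 6502 cycle counts for just the steps named above. The instruction order assumed for the Plus/4 is inferred from the description, not copied from a disassembly, and the sketch leaves out the pointer increment and any index register setup, so treat the totals as a ballpark only.

```python
# Documented NMOS 6502 cycle counts for the instructions involved
# (no page-crossing penalty assumed).
CYCLES = {
    "SEI": 2,         # disable interrupts
    "CLI": 2,         # re-enable interrupts
    "STA abs": 4,     # write to a banking register
    "LDA abs": 4,     # C64 style: operand is the self-modified pointer
    "LDA (zp),Y": 5,  # Plus/4 style: indirect through a zero page pointer
}

# Assumed fetch cores, following the description in the post.
C64_FETCH = ["LDA abs"]
PLUS4_FETCH = ["SEI", "STA abs", "LDA (zp),Y", "STA abs", "CLI"]


def fetch_cost(sequence):
    """Total cycles for a list of instruction names."""
    return sum(CYCLES[name] for name in sequence)


if __name__ == "__main__":
    c64 = fetch_cost(C64_FETCH)
    plus4 = fetch_cost(PLUS4_FETCH)
    print(f"C64 fetch core:    {c64} cycles")
    print(f"Plus/4 fetch core: {plus4} cycles")
    print(f"Extra per byte:    {plus4 - c64} cycles")
```

Under these assumptions the banked fetch costs about four times as many cycles as the plain one, and that is paid on every byte of program text the interpreter reads.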
If memory serves, the 128 banked RAM/ROM in the first 64K for BASIC program stuff, and ROM/RAM in the second 64K for variables and string stuff. Same sort of problem/cost. The 128 also introduced so many new tokens that they had to use a token as an escape character, so that some keywords now had two-byte tokens, but I don't think that would have much of a performance hit except maybe in scanning the dispatch table when those tokens were used -- and that would not be the case in an apples to apples comparison of a Prime Numbers test or something like that.
I suppose we have to also acknowledge Commodore was doing all these changes 'in house' instead of having Microsoft (who we might expect to have been better at updating their own BASIC) do so. But Jack Tramiel had gotten away with murder on his BASIC license deal with Microsoft back with the PET machine. Commodore's license was a flat one-time payment, NOT tied to units sold, and was perpetual for Commodore to use and extend for ANY Commodore computer model based on a 6502! You can bet Tramiel was NOT going to go back to MS asking for them to work on any update, as that would have opened the door to amending that license!
But yeah, mostly it was the banking. Someone made a utility for the Plus4 last year and all it does is copy the entire ROM set (BASIC and KERNAL) down into RAM; rewrite all the routines that used bank switching to skip doing that; adjust the top of memory to account for the permanent residency of the ROM code there; and just leave the ROMs banked out. Effectively it puts it in the same mode as the C64, but still with a slightly more costly CHRGET routine since it's still not in ZP and not self-modifying. And because the Plus4 ROMs are so much larger, it leaves the user with only 28K of memory for BASIC, but it's often a double digit improvement in performance.
BASIC 2? Why not get BASIC 7?
Posted: Thu Jul 29, 2021 5:26 am
by Scott Robison
The +4 was a faster clocked CPU, which means the fact that it is slower than a C= 64 is even more "impressive". Lots of code must run to get each byte. I have to believe there would be a way to write that routine to not take the speed hit of bank switching when accessing always RAM / never ROM space while not significantly slowing down RAM under ROM, but that's just intuition. Even tripling the amount of ram for those routines (which I don't think would be necessary) would be well worth the double digit speed improvement, but that ship sailed long ago.
Edit: forgot to say, the fact that the C= 128 was slower in otherwise identical BASIC code isn't surprising, since it was clocked at the same speed as the 64 (and we can't switch to FAST without the program becoming different).
BASIC 2? Why not get BASIC 7?
Posted: Thu Jul 29, 2021 6:04 am
by Snickers11001001
15 minutes ago, Scott Robison said:
The +4 was a faster clocked CPU, which means the fact that it is slower than a C= 64 is even more "impressive". Lots of code must run to get each byte. I have to believe there would be a way to write that routine to not take the speed hit of bank switching when accessing always RAM / never ROM space while not significantly slowing down RAM under ROM, but that's just intuition. Even tripling the amount of ram for those routines (which I don't think would be necessary) would be well worth the double digit speed improvement, but that ship sailed long ago.
Edit: forgot to say, the fact that the C= 128 was slower in otherwise identical BASIC code isn't surprising, since it was clocked at the same speed as the 64 (and we can't switch to FAST without the program becoming different).
The cost is mostly in cycles. My idea back in the day was to put together a custom interrupt handler that added a peek at the high byte of the pointer and if it were less than $79 it would write a zero/nonzero flag somewhere that might let one branch between fast/slow versions of the fetch routines like CHRGET. But... by that point I was in college chasing women, had moved on to my first 386 DOS machine, and didn't have the interest (or skills if I'm honest) to chase the issue down and figure out if it would really be feasible or worthwhile.
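For what it's worth, the flag idea can be sketched in a few lines of Python. Everything here is hypothetical: the function names are made up for illustration, the $79 cutoff is simply the value quoted above and has not been checked against the memory map, and the "slow" branch only stands in for the SEI / bank out / load / bank in / CLI sequence.

```python
THRESHOLD_HI = 0x79  # high-byte cutoff quoted in the post; not verified


def update_flag(text_pointer, threshold_hi=THRESHOLD_HI):
    """What the hypothetical interrupt handler would do: peek at the
    pointer's high byte and record whether the fast path looks safe."""
    return (text_pointer >> 8) < threshold_hi


def fetch(ram, text_pointer, fast_ok):
    """Branch between the two fetch flavours based on the flag.

    Both paths read RAM; the slow one represents the extra banking
    work needed when the address could be hidden under ROM.
    """
    if fast_ok:
        return ram[text_pointer], "fast"
    return ram[text_pointer], "slow"


if __name__ == "__main__":
    ram = bytearray(0x10000)
    ram[0x1001] = 0x99
    flag = update_flag(0x1001)
    print(fetch(ram, 0x1001, flag))
```

One open question with this design is that the flag is only refreshed on each interrupt, so it could be stale if the pointer crosses the cutoff between two interrupts; a real version would need some safety margin or another way to handle that.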
BASIC 2? Why not get BASIC 7?
Posted: Thu Jul 29, 2021 6:14 am
by Scott Robison
I just wrote the following test code to see how much difference it makes. The program:
As I don't have any actual hardware, I'm using WinVice 3.5, which should be adequate and illustrative if not 100% exact. It's close enough. All times are reported in jiffies, as that is a measure independent of the host environment. All machines are NTSC. Percentages are relative to the GO64 result.
x128 GO64: 521 jiffies (8,883,050 cycles)
x128 SLOW VIC: 706 jiffies (35.5% slower, 12,037,300 cycles)
x128 FAST VDC: 334 jiffies (35.9% faster, 11,389,400 cycles [roughly the same plus benefit of VIC being inactive])
xplus4: 659 jiffies (26.5% slower, 19,330,666 cycles)
Just to be complete, a special version of the program that enables 2 MHz clock in 64 mode:
x128 GO64 POKE 53296: 252 jiffies (51.6% faster, 8,593,200 cycles [roughly the same as regular 64 mode plus benefit of VIC being inactive])
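For anyone who wants to check the arithmetic, here is a small Python sketch that turns jiffies into cycle counts and percentages. The clock rates are an assumption: they are back-derived from the cycle figures listed above (1.023 MHz, 2.046 MHz and 1.76 MHz), not taken from a datasheet.

```python
JIFFIES_PER_SECOND = 60

# Clock rates in Hz, inferred from the cycle counts in the post.
CLOCK_HZ = {"1MHz": 1_023_000, "2MHz": 2_046_000, "plus4": 1_760_000}

BASELINE = 521  # x128 GO64 result in jiffies

RESULTS = [
    ("x128 GO64", 521, "1MHz"),
    ("x128 SLOW VIC", 706, "1MHz"),
    ("x128 FAST VDC", 334, "2MHz"),
    ("xplus4", 659, "plus4"),
    ("x128 GO64 POKE 53296", 252, "2MHz"),
]


def cycles(jiffies, clock_hz):
    """CPU cycles elapsed in the given number of jiffies (truncated)."""
    return jiffies * clock_hz // JIFFIES_PER_SECOND


def percent_vs_baseline(jiffies, baseline=BASELINE):
    """Positive means slower than the baseline, negative means faster."""
    return round((jiffies - baseline) / baseline * 100, 1)


if __name__ == "__main__":
    for name, jiffies, clock in RESULTS:
        pct = percent_vs_baseline(jiffies)
        print(f"{name}: {jiffies} jiffies, {pct:+.1f}%, "
              f"{cycles(jiffies, CLOCK_HZ[clock]):,} cycles")
```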
BASIC 2? Why not get BASIC 7?
Posted: Thu Jul 29, 2021 6:20 am
by Scott Robison
Without taking time to look, I wonder how much of the enhanced speed of the 128 relative to the +4 is due to better coding on the 128 and how much is due to the 128 MMU being more efficient at bank switching.
BASIC 2? Why not get BASIC 7?
Posted: Thu Jul 29, 2021 7:06 am
by Snickers11001001
To round out your little benchmark results I got some additional measurements on the Plus/4 platform. Also via an emulator (but PLUS4EMU instead of VICE -- which I despise since the last major overhaul turned it into bloat city).
I typed your listing, spaces and all, exactly as shown in your screen shot above. A sanity check running on 'bone stock' NTSC Plus/4 settings for the emulator duplicated your 659 jiffies result published above.
Then:
I. Using the "FAST BASIC" utility I described above (actually called 'FBI+4' on the Plus4world.com site) --- i.e., the one where they copy all the ROMs to their corresponding RAM addresses, and just don't use ROM anymore (with appropriate modifications to the routines that were banking for every fetch):
Result: 550 jiffies.
II. Using the Plus/4 pokes to take the TED video stuff out of the loop (equiv. to fast on the 128), location 65286 poked to '0' at beginning of program, and restored to '27' at the end to get the screen display back:
Result: 396 jiffies.
III. Applying BOTH of the above 'enhancements':
Result: 331 jiffies.
Observation: Result II is the big one, and it caused me to do some googling. Looks like the Plus/4 actually runs at 50% CPU clock speed (i.e., SLOWER than a C64) during all operations where the TED and CPU are juggling access to memory for screen display. BUT, it appears that whenever the beam's painting the borders or overscan area, the processor jumps up to full speed. That's why disabling the screen gets such a good increase in performance. Still, whether the screen is on or off, taking the drastic step of giving up over half of the available RAM for that utility to do its thing cuts the execution time by roughly 16% (659 to 550 jiffies with the screen on, 396 to 331 with it off).
Pretty interesting.
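The relative gains are easy to work out from the four jiffy counts. A quick Python sketch (the variable names are just labels for the results above):

```python
def percent_saved(before, after):
    """Percentage of run time saved going from `before` to `after` jiffies."""
    return round((before - after) / before * 100, 1)


STOCK = 659       # stock NTSC Plus/4
FBI = 550         # result I: ROMs copied to RAM, no banking
SCREEN_OFF = 396  # result II: TED display disabled
BOTH = 331        # result III: both applied

if __name__ == "__main__":
    print("FBI+4 alone:          ", percent_saved(STOCK, FBI), "%")
    print("Screen off alone:     ", percent_saved(STOCK, SCREEN_OFF), "%")
    print("Both together:        ", percent_saved(STOCK, BOTH), "%")
    print("FBI+4 with screen off:", percent_saved(SCREEN_OFF, BOTH), "%")
```

So skipping the banking is worth about the same fraction of run time whether or not the display is on, and the two tricks together roughly halve the run time.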
BASIC 2? Why not get BASIC 7?
Posted: Thu Jul 29, 2021 7:30 am
by Snickers11001001
1 hour ago, Scott Robison said:
Without taking time to look, I wonder how much of the enhanced speed of the 128 relative to the +4 is due to better coding on the 128 and how much is due to the 128 MMU being more efficient at bank switching.
Speaking to this point, I'd be curious if you can post a screen shot of the disassembly of the 128's 'CHRGET' routine. The 'Mapping the 128' book says it lives at $0380-$039E. From BASIC type 'MONITOR' and then (assuming there's not something you have to do in monitor to deal with 128 banking that is) the command:
D 0380 039E
should get the disassembly onscreen.
BASIC 2? Why not get BASIC 7?
Posted: Thu Jul 29, 2021 2:23 pm
by Scott Robison
6 hours ago, Snickers11001001 said:
Speaking to this point, I'd be curious if you can post a screen shot of the disassembly of the 128's 'CHRGET' routine. The 'Mapping the 128' book says it lives at $0380-$039E. From BASIC type 'MONITOR' and then (assuming there's not something you have to do in monitor to deal with 128 banking that is) the command:
D 0380 039E
should get the disassembly onscreen.
Here you go.
BASIC 2? Why not get BASIC 7?
Posted: Thu Jul 29, 2021 3:07 pm
by Snickers11001001
33 minutes ago, Scott Robison said:
Here you go.
Thanks. Looks like you were right... the 128's gotta be smarter/better at managing things, probably via the MMU. In the screen shot of the Plus/4 version of 'CHRGET' I'll drop below, you'll see set/clear interrupt instructions that are not present in the 128 version. I have to assume that's because of something they did architecturally different between the two machines. Other than that, the routines are pretty much the same.
BASIC 2? Why not get BASIC 7?
Posted: Thu Jul 29, 2021 3:24 pm
by Scott Robison
It seems that the 128 copies the ROM from FF05 - FFFF to both RAM banks. The IRQ vector points to a trampoline in that page that can save state, restore ROM, run, then restore the original setting. Not my analysis, I read this at
https://retrocomputing.stackexchange.com/questions/17132/is-there-a-possibility-for-a-user-defined-irq-hardware-vector-on-a-commodore-128 ... anyway, it saves the need to SEI/CLI.
After searching online for information about +4 bank switching, I think the two systems are really very comparable in how they deal with it. On the +4, a write to one address switches ROM out (FF3F) and a write to another switches ROM back in (FF3E). The 128 is more complicated, but they simplify it by having four preconfigured memory configurations, of which they use two: FF01 to get RAM, FF03 to restore ROM (if I'm reading everything right before waking up completely).
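To make the "write to a trigger address" idea concrete, here is a toy Python model of the 128-style scheme as described above. It is a sketch under assumptions, not an MMU emulation: the class and method names are made up, the ROM start address is purely illustrative, the ROM is filled with a dummy byte, and I/O space and the second RAM bank are ignored.

```python
class ToyBankedMemory:
    """Toy model: writing to a trigger address flips ROM visibility."""

    RAM_TRIGGER = 0xFF01  # per the post: write here to see RAM
    ROM_TRIGGER = 0xFF03  # per the post: write here to restore ROM

    def __init__(self, rom_start=0x4000):  # rom_start is illustrative
        self.ram = bytearray(0x10000)
        self.rom = bytearray([0xEA]) * 0x10000  # dummy ROM contents
        self.rom_start = rom_start
        self.rom_visible = True

    def write(self, addr, value):
        if addr == self.RAM_TRIGGER:
            self.rom_visible = False
        elif addr == self.ROM_TRIGGER:
            self.rom_visible = True
        else:
            # Assumption in this toy: writes always land in RAM.
            self.ram[addr] = value & 0xFF

    def read(self, addr):
        if self.rom_visible and addr >= self.rom_start:
            return self.rom[addr]
        return self.ram[addr]

    def fetch_under_rom(self, addr):
        """Bank ROM out, read one byte, bank ROM back in."""
        self.write(self.RAM_TRIGGER, 0)
        value = self.read(addr)
        self.write(self.ROM_TRIGGER, 0)
        return value


if __name__ == "__main__":
    mem = ToyBankedMemory()
    mem.write(0x5000, 0x42)
    print(hex(mem.read(0x5000)))             # ROM is in the way
    print(hex(mem.fetch_under_rom(0x5000)))  # RAM underneath
```

The two extra writes in `fetch_under_rom` are the per-fetch cost the thread has been talking about; the value written does not matter in this model, only the address.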