Quote
I would think the first / main reason is the amount of bank switching required.
Bingo!
'CHRGET' (which I think I mentioned earlier) is the machine code routine that fetches a byte of the BASIC listing from program memory for the interpreter. There are equivalent routines for BASIC to fetch variable values etc. from variable space or the string heap. But the 'CHRGET' routine gets called a LOT by the BASIC interpreter.
In the C64 (and our X16) the actual routine is copied to zero page at startup and runs from there. It's 24 bytes long, with the main pointer actually included in the routine as a self-modified pointer. The pointer contains the address of the next byte to be fetched. On the C64 the two-byte pointer is actually the operand of an LDA right in the routine.
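For anyone who hasn't looked at it, here is a rough Python model of what the C64 version of CHRGET does, going from the description above and the usual C64 ROM disassembly. The function name, the tuple it returns and the `memory` bytearray are made up for illustration; the real thing is 6502 code with the pointer sitting inside the LDA operand.

```python
def chrget(memory, txtptr):
    """Model one CHRGET call: advance the text pointer and fetch the next byte.

    Returns (txtptr, byte, carry, zero), mimicking the CPU flags on exit.
    """
    while True:
        # Increment the 16-bit pointer embedded in the LDA operand.
        txtptr = (txtptr + 1) & 0xFFFF
        a = memory[txtptr]          # the self-modified LDA absolute
        if a >= 0x3A:               # ':' or above: return at once, carry set
            return txtptr, a, True, a == 0x3A
        if a == 0x20:               # skip spaces and fetch again
            continue
        # Digits '0'..'9' come back with carry clear, anything else carry set.
        carry = not (0x30 <= a <= 0x39)
        return txtptr, a, carry, a == 0x00


# Usage: a made-up snippet of program text at $0801.
memory = bytearray(0x10000)
memory[0x0801:0x0806] = b"  42:"
ptr, byte, carry, zero = chrget(memory, 0x0800)
print(hex(ptr), chr(byte), carry)
```

The point of the model is that every single character the interpreter looks at goes through this loop, so anything added to the fetch step gets multiplied by the length of the program text being executed.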
On the Plus/4 (which is where my experience was) that routine is 33 bytes and doesn't even live in zero page, although the two-byte pointer takes two more bytes and IS in ZP. Part of the reason it's not in ZP can be chalked up to questionable decisions by Commodore's Plus/4 team. They used zero page space to hold variables for the RENUMBER and AUTO line-number stuff, things that are not performance critical on the Plussy and did not need to use ZP space. They also reserved 20 bytes or so in ZP for a speech chip enhancement for a top-end Plus/4 model that was never released.

But mostly, the Plus/4 and 128 CHRGET functions don't live in zero page because, all things considered, they wouldn't get much benefit from doing so. All those extra bytes on the Plus/4 are disabling interrupts, banking out the ROMs with a write to a register on the TED chip (up in the FFxx area), loading the byte through the text pointer (using a more cycle-costly instruction), making another write to the TED registers to bank the ROMs back in and, finally, re-enabling the interrupts. AND the Plus/4 does that for EVERY single BASIC character fetch. (A similar routine with the added interrupt/banking cost is used when it grabs a variable or a string.) It does the banking even when the pointer contains an address in an area of memory that would never be under a ROM. Simply put, the logic to 'look and decide' whether the pointer was to an address under ROM would itself be more costly than just doing the bank operations every time.
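To put rough numbers on that, here is a back-of-envelope tally in Python. The per-instruction cycle counts are the standard documented 6502 timings, but the exact instruction sequences are my assumption based on the description above (I'm not quoting the ROM). It also ignores the pointer increment, the compare logic that both versions share, and any register setup the indirect load needs.

```python
# Standard 6502 cycle counts for the instructions involved.
CYCLES = {
    "SEI": 2,
    "CLI": 2,
    "STA abs": 4,      # write to a TED register to bank ROM out or in
    "LDA abs": 4,      # C64 style: pointer is the operand of the LDA itself
    "LDA (zp),Y": 5,   # fetch through a zero page pointer (no page crossing)
}

# Assumed fetch sequences (hypothetical, reconstructed from the description).
C64_FETCH = ["LDA abs"]
PLUS4_FETCH = ["SEI", "STA abs", "LDA (zp),Y", "STA abs", "CLI"]


def fetch_cost(sequence):
    """Total CPU cycles for one byte fetch using the given instructions."""
    return sum(CYCLES[op] for op in sequence)


c64 = fetch_cost(C64_FETCH)
plus4 = fetch_cost(PLUS4_FETCH)
print(f"C64 fetch: {c64} cycles, Plus/4 fetch: {plus4} cycles, "
      f"extra per character: {plus4 - c64}")
```

Under those assumptions the banked fetch costs 17 cycles against 4, so 13 extra cycles on every character fetched, before counting anything else.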
If memory serves, the 128 banked RAM/ROM in the first 64K for BASIC program stuff, and ROM/RAM in the second 64K for variables and string stuff. Same sort of problem/cost. The 128 also introduced so many new tokens that they had to use a token as an escape character, so some keywords now took two token bytes. I don't think that would have much of a performance hit, except maybe in scanning the dispatch table when those tokens were used, and that would not be the case in an apples-to-apples comparison of a prime numbers test or something like that.
I suppose we also have to acknowledge that Commodore was doing all these changes 'in house' instead of having Microsoft (who we might expect to have been better at updating their own BASIC) do so. But Jack Tramiel had gotten away with murder on his BASIC license deal with Microsoft back with the PET machine. Commodore's license was a flat one-time payment, NOT tied to units sold, and was perpetual for Commodore to use and extend for ANY Commodore computer model based on a 6502! You can bet Tramiel was NOT going to go back to MS asking them to work on any update, as that would have opened the door to amending that license!
But yeah, mostly it was the banking. Someone made a utility for the Plus/4 last year, and all it does is copy the entire ROM set (BASIC and KERNAL) down into RAM, rewrite all the routines that used bank switching to skip doing that, adjust the top of memory to account for the permanent residency of the ROM code there, and just leave the ROMs banked out. Effectively it puts the machine in the same mode as the C64, but still with a slightly more costly CHRGET routine since it's still not in ZP and not self-modifying. And because the Plus/4 ROMs are so much larger, it leaves the user with only 28K of memory for BASIC, but it's often a double-digit improvement in performance.