2 hours ago, Snickers11001001 said:
Love, love love these projects.
If I can posit any advice from the cheap seats (i.e., from someone like me who absolutely has no ability to make such a thing, but who is a user of BASIC and enjoys it), it would be this: There are things that BASIC needs, but not everything from 'gen 3' BASICs like Visual Basic needs to be included or is feasible. Remember at bottom this platform is still an 8 bit machine with no prefetch, no branch prediction, no modern pipelining, no cache memory, almost no maths, and a 16 bit address space. It's running at 8 MHz, maybe, unless they (a) can't get it to work at that speed, in which case it will be a 4 MHz machine, or (b) release the faster X8 FPGA version (that won't have the banked memory, ZP space, or space in the pages below $0800 that the X16 has), which probably won't be able to fit all your features anyway.
Just take a look at the discussion near the end of the "BASIC 2 vs BASIC 7" thread and the impact of just a few more instructions in the byte fetching/parsing core between the C64 and the later Plus/4 and C128 in terms of the negative impact on performance for the later machines with better BASICs. If you're having to bank-switch, for example, surely it takes a hit.
Tokenizing a lot of stuff inline (e.g., constants, jumps, variable memory locations) is a great idea, but I suggest a simple escape code structure using byte codes. Parser finds, say, the PETSCII code for '@' not inside quotes and it knows the next two bytes are a small int in 16 bit signed format; it finds the PETSCII code for [english pound] (which looks a bit like a mutant F), it knows the next 5 bytes are the exponent and mantissa for a float; it finds the token for 'goto' or 'gosub', it knows the next two bytes are the actual 16 bit address for the destination of the jump in the code, instead of the PETSCII numeric representation of a line number; it finds the PETSCII code for '%', it knows the next two bytes are the 16 bit address of the value, which is followed in memory by the name of an int style variable. (At execution it just needs to fetch the value; during a LIST operation it grabs the variable name at that address+2 until it hits the terminator.) Yeah, OK, the modern way to do many of these things would be with a hash table, but I caution you to consider the performance impact on an 8 bit machine.
If you use the idea of inlining 16 bit addresses for jump locations to speed up execution, of course, then there are other issues. With line numbers, your LIST routine needs only follow the 16 bit address, grab the line number at that address, and put it on the screen; but with LABELS, you will need to set up a data structure (probably a linked list) that the interpreter can consult during LIST operations to regurgitate the labels or line numbers when the user lists the code, and that metadata has to get saved with the program. That's actually a better place to use banked memory... the performance cost of swapping banks is not as important when listing the code. I don't think it's feasible to tokenize at runtime; it needs to happen as you enter things.
This is not far off from what I suggested a while back. Tokenize not just the keywords (PRINT, GOTO, etc) but also variable names and numeric literals. Assuming 01-12 are available as token codes, we could use:
01 - 8 bit byte (an integer literal between 0 and 255)
02 - 16 bit integer (any integer literal between -32768 and 65535)
03 - 40 bit float (any numeric literal with a decimal point, eg: 3.14 or 1.0)
04 - byte variable (# sigil or DIM x AS BYTE)
05 - integer variable (% sigil or DIM x AS INT)
06 - float variable (! sigil or DIM x AS FLOAT)
07 - string variable ($ sigil or DIM x AS STRING)
08 - label
09 - start of subroutine
10 - start of function
11 - end of function or subroutine
PRINT $1234 (a hex literal, stored low byte first) gets changed to
94 02 34 12
PRINT A$ might get converted to
94 07 01
and A$="HELLO" becomes
07 01 B2 "HELLO"
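To make the literal encoding concrete, here is a rough sketch in Python (not 6502 code, just a model of the byte layout). The 01 and 02 token codes come from the list above and 94 is the PRINT value used in the examples; the function names are made up for illustration, and the low-byte-first order is an assumption based on the usual 6502 convention.

```python
import struct

# Token codes from the list above; 0x94 is the PRINT value used in the examples.
TOK_BYTE, TOK_INT = 0x01, 0x02
TOK_PRINT = 0x94

def encode_int_literal(value):
    """Return the tagged bytes for an integer literal (hypothetical helper)."""
    if 0 <= value <= 255:
        return bytes([TOK_BYTE, value])
    if -32768 <= value <= 65535:
        # Assumption: low byte first; the mask wraps negatives into 16 bits.
        return bytes([TOK_INT]) + struct.pack("<H", value & 0xFFFF)
    raise ValueError("literal does not fit in 16 bits")

def decode_literal(stream, pos):
    """Read one literal starting at pos; return (value, next_pos)."""
    tag = stream[pos]
    if tag == TOK_BYTE:
        return stream[pos + 1], pos + 2
    if tag == TOK_INT:
        # Decoded as unsigned here; the tag alone cannot tell -1 from 65535.
        return struct.unpack_from("<H", stream, pos + 1)[0], pos + 3
    raise ValueError("not a literal token")

line = bytes([TOK_PRINT]) + encode_int_literal(0x1234)
print(line.hex(" "))  # 94 02 34 12
```

The point is that the PRINT routine never has to parse digits: it reads the tag, then picks up one or two bytes and moves on.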
You could also change types on the fly by referencing a variable with a different sigil. So
X = $1234
could be referenced with the byte sigil and would act like a 2 byte array:
PRINT X#
34
PRINT X#(1)
12
(Remember that arrays are zero-based)
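A small Python model of the same idea, in case it helps: the sigil only decides how many bytes get read at the variable's address. The heap size, the address chosen for X, and the helper names are all invented for this sketch.

```python
import struct

heap = bytearray(16)  # stand-in for variable storage
X_ADDR = 4            # hypothetical address assigned to X

def store_int(addr, value):
    """Store a 16-bit value low byte first (assumed 6502-style order)."""
    struct.pack_into("<H", heap, addr, value & 0xFFFF)

def read_byte(addr, index=0):
    """What a '#' reference would do: fetch one byte at addr + index."""
    return heap[addr + index]

def read_int(addr, index=0):
    """What a '%' reference would do: fetch two bytes at addr + 2 * index."""
    return struct.unpack_from("<H", heap, addr + 2 * index)[0]

store_int(X_ADDR, 0x1234)             # X = $1234
print(f"{read_byte(X_ADDR):02X}")     # PRINT X#    -> 34
print(f"{read_byte(X_ADDR, 1):02X}")  # PRINT X#(1) -> 12
print(f"{read_int(X_ADDR):04X}")      # PRINT X%    -> 1234
```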
This implies that arrays are nothing special: array variables would simply reserve more than 1 space in the heap, so:
DIM NAMES$(25) AS STRING would reserve 50 bytes on the heap (25 two-byte string pointers), and if you recalled NAMES%(x), you would get back a 2-byte value, which is actually the pointer to the string.
Where this comes in useful is creating large, arbitrary data arrays. For example, rooms in an adventure game.
DIM ROOM#(1024) creates a 1K chunk of memory that can be used for any purpose. You could then load rooms in on-demand from disk, every time the player moves from one room to another.
Labels and subroutine names would simply be more entries on the variable table.
The variable table itself is super simple:
01-02: data/code address
03: length of variable name
04-?? text of variable name
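As a sketch of how little code that layout needs, here is one way to pack and walk such a table in Python. The function names are hypothetical, and the low-byte-first address and ASCII names are assumptions; the layout itself (2-byte address, 1-byte length, name text) is the one listed above.

```python
import struct

def pack_entry(address, name):
    """Build one entry: 2-byte address, 1-byte name length, name text."""
    raw = name.encode("ascii")
    if len(raw) > 255:
        raise ValueError("name too long for a 1-byte length field")
    return struct.pack("<HB", address, len(raw)) + raw

def unpack_entry(table, pos):
    """Read the entry at pos; return (address, name, next_pos)."""
    address, length = struct.unpack_from("<HB", table, pos)
    start = pos + 3
    name = table[start:start + length].decode("ascii")
    return address, name, start + length

def find_address(table, wanted):
    """Linear search by name; returns the address or None."""
    pos = 0
    while pos < len(table):
        address, name, pos = unpack_entry(table, pos)
        if name == wanted:
            return address
    return None

table = pack_entry(0x0900, "A") + pack_entry(0x0902, "ROOM")
print(hex(find_address(table, "ROOM")))  # 0x902
```

The name lookup only happens when a line is entered or listed. At run time the program stream would already carry the reference, so the search cost is not paid during execution.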
There are no types on the variable table, because the type is determined at runtime based on the token code. The token code is determined at compile time based on the sigil or a DEF <BYTE | INT | FLOAT> statement.
There are a ton of advantages to this system. Right now, the BASIC routines all have to parse their own data. Doing it this way means the data is pre-parsed. The routines simply read the parameters directly out of the program stream.
The actual program text can be more compact, too. You don't store spaces. You don't store commas in parameter sequences. Those are just discarded and re-created when the program is detokenized (listed).
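Here is a rough Python sketch of how a lister could put the spaces and commas back. Because every parameter starts with its own type token, the lister always knows where one parameter ends and the next begins. The 97 code for POKE is a made-up value for this sketch, as are the function name and the choice of ", " as the separator; only the 01 and 02 literal tokens come from the list earlier in this post.

```python
import struct

TOK_BYTE, TOK_INT = 0x01, 0x02
KEYWORDS = {0x97: "POKE"}  # hypothetical token value, for illustration only

def list_statement(stream):
    """Turn one tokenized statement back into text, re-creating separators."""
    text = KEYWORDS[stream[0]]
    args = []
    pos = 1
    while pos < len(stream):
        tag = stream[pos]
        if tag == TOK_BYTE:
            args.append(str(stream[pos + 1]))
            pos += 2
        elif tag == TOK_INT:
            # Assumption: 16-bit literals are stored low byte first.
            args.append(str(struct.unpack_from("<H", stream, pos + 1)[0]))
            pos += 3
        else:
            raise ValueError("unknown token in this sketch")
    return text + (" " + ", ".join(args) if args else "")

# POKE 53280,0 stored without the space or the comma: six bytes in total.
stored = bytes([0x97, TOK_INT, 0x20, 0xD0, TOK_BYTE, 0x00])
print(list_statement(stored))  # POKE 53280, 0
```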