INC BASIC (for lack of a better name)
Posted: Fri Apr 02, 2021 8:05 am
Preface: I'm not trying to teach anyone anything; I'm just committing some thoughts to writing and soliciting feedback.
On the one hand we have Commodore / Microsoft BASIC. It tokenizes keywords into one- or two-byte tokens, but otherwise the line is stored in memory as typed. Execution is slow because the interpreter reparses the line at run time on every execution, but the stored form is easy to list back for the programmer to read.
At the other extreme we have machine language, which is very hard to read. Assembly makes it a little easier, but it is still difficult.
In between we have various languages that can be compiled to some form of code, either machine code or p-code, but at the cost of a "painful" compile / run / test cycle.
I think a more elaborate BASIC interpreter could improve on the execution speed of traditional BASIC by doing more work while crunching, trading a little efficiency while the programmer is entering the code for improved efficiency after typing run. This BASIC would still offer a highly interactive experience for the programmer, without a time-consuming edit / compile / link / run cycle.
As an example, consider a line of BASIC:
10 PRINT 200+300*500+700
BASIC crunches it down to the following bytes (in hex):
17 08 (pointer to next line)
0A 00 (line number 10)
99 20 (PRINT token followed by a space)
32 30 30 (digits 2 0 0)
AA (add token)
33 30 30 (digits 3 0 0)
AC (multiply token)
35 30 30 (digits 5 0 0)
AA (add token)
37 30 30 (digits 7 0 0)
00 (end of line)
To execute that after typing run, BASIC has to read 22 bytes. First it notes that the keyword is PRINT. Skip the whitespace. Convert the digits 200 to a floating point number. Push it on the stack. Read the add operator, which means there has to be another expression to the right. Convert the digits 300 to a floating point number. Push it on the stack. Read the multiply operator, which means there is another expression to the right, and multiply has higher precedence than add, so the pending add has to wait. Convert the digits 500 to a floating point number. Push it on the stack. Read the second add operator, which has lower precedence than multiply, so finish the multiply by popping 300 and 500 from the stack, multiplying them, and pushing the result (150000) back on the stack. The new add also has the same precedence as the pending add, so pop 200 and 150000, add them, and push the result (150200) back on the stack. Convert the digits 700 to a floating point number. Push it on the stack. Read the end of line marker. Pop 150200 and 700, add them, and push the result (150900) back on the stack. Now we have a single value, so print it.
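To make that cost concrete, here is a rough sketch in C of the kind of work the interpreter redoes on every single execution: scanning the text, converting digit strings to numbers, and juggling precedence. This is not the actual 6502 code (Microsoft BASIC uses an operand/operator stack rather than recursion, and the helper names here are made up), just an illustration of the repeated effort.

/* Minimal sketch of re-evaluating a crunched BASIC expression from
   text on every run. Hypothetical helpers, not MS BASIC's routines. */
#include <stdio.h>
#include <ctype.h>

static const char *p;              /* cursor into the crunched line   */

static double parse_expr(int min_prec);

static double parse_number(void)   /* text-to-number, repeated per run */
{
    double v = 0.0;
    while (isdigit((unsigned char)*p))
        v = v * 10.0 + (*p++ - '0');
    return v;
}

static int precedence(char op)     /* '+' binds looser than '*'        */
{
    return (op == '*') ? 2 : (op == '+') ? 1 : 0;
}

static double parse_expr(int min_prec)
{
    double left = parse_number();
    while (precedence(*p) >= min_prec && precedence(*p) > 0) {
        char op = *p++;
        /* evaluate the right side with tighter binding first */
        double right = parse_expr(precedence(op) + 1);
        left = (op == '*') ? left * right : left + right;
    }
    return left;
}

int main(void)
{
    p = "200+300*500+700";          /* the expression text after PRINT */
    printf("%g\n", parse_expr(1));  /* prints 150900                   */
    return 0;
}

Every run repeats those parse_number calls; that text-to-number conversion is exactly what the alternative format below avoids.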
Imagine an alternative implementation that crunches the line to bytes as follows:
xx yy (pointer to next line; the exact value doesn't matter at the moment)
0A 00 (line number 10)
10 (length of "listable" crunched line in bytes [16])
99 20 (PRINT token followed by a space)
01 C8 (literal byte value 200)
AA (add token)
02 2C 01 (literal word value 300)
AC (multiply token)
02 F4 01 (literal word value 500)
AA (add token)
02 BC 02 (literal word value 700)
08 (length of "executable" crunched line in bytes)
02 (offset of literal byte value 200)
05 (offset of literal word value 300)
09 (offset of literal word value 500)
AC (multiply)
AA (add)
0D (offset of literal word value 700)
AA (add)
99 (PRINT)
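To show how small the run-time loop could become, here is a rough C sketch of an interpreter for that imagined executable stream. It leans on an assumption the example bytes happen to satisfy: operand offsets stay below $80, while tokens such as AA, AC, and 99 have the high bit set; type byte 01 marks a byte literal and 02 a little-endian word. None of these encodings is a spec, just an illustration.

/* Sketch of executing the postfix stream for line 10 above. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* the "listable" area of line 10, from the example above */
    const uint8_t listable[] = {
        0x99, 0x20,              /* PRINT token, space */
        0x01, 0xC8,              /* byte literal 200   */
        0xAA,                    /* add token          */
        0x02, 0x2C, 0x01,        /* word literal 300   */
        0xAC,                    /* multiply token     */
        0x02, 0xF4, 0x01,        /* word literal 500   */
        0xAA,                    /* add token          */
        0x02, 0xBC, 0x02         /* word literal 700   */
    };
    /* the "executable" stream: operand offsets plus postfix operators */
    const uint8_t exec[] = { 0x02, 0x05, 0x09, 0xAC, 0xAA, 0x0D, 0xAA, 0x99 };

    double stack[16];
    int sp = 0;

    for (size_t i = 0; i < sizeof exec; i++) {
        uint8_t b = exec[i];
        if (b < 0x80) {                      /* operand: fetch literal */
            const uint8_t *lit = &listable[b];
            if (lit[0] == 0x01)              /* byte literal           */
                stack[sp++] = lit[1];
            else                             /* word literal, lo/hi    */
                stack[sp++] = lit[1] | (lit[2] << 8);
        } else if (b == 0xAA) {              /* add                    */
            sp--; stack[sp - 1] += stack[sp];
        } else if (b == 0xAC) {              /* multiply               */
            sp--; stack[sp - 1] *= stack[sp];
        } else if (b == 0x99) {              /* PRINT top of stack     */
            printf("%g\n", stack[--sp]);     /* prints 150900          */
        }
    }
    return 0;
}

Note that the loop never touches a digit character; swapping the byte/word literals for some larger preprocessed format would only change the fetch step.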
Listing the code becomes more complex (slower) because there is more "uncrunching" to do. Entering a line becomes more complex (slower) because there is more "crunching" to do. Running that line of code has to read 25 bytes instead of 22, but it no longer has to convert strings to numbers, so much less machine code gets executed. In my imaginary encoding above I'm using bytes and words to store the literal numbers, but we could store them in another larger but still preprocessed format (such as floating point) that is much faster for the interpreter to consume at run time than continually converting text to numbers.
Of course, much of this benefit can already be had by storing numeric constants in variables, which only have to be converted once (for example, assigning C=500 before a loop and using C inside it), and people already do that in BASIC when they are trying to optimize their code.
I'm not suggesting this is the exact alternative format that should be used for INC BASIC. Some of my thoughts include:
1. A full screen editor to edit blocks of code by name, rather than requiring line numbers.
2. Crunching a line would identify all the "tokens" in the line of text and store them in a table that includes variables, constants, and labels. In this way variable creation would be part of editing the code, rather than a step that takes place at run time as variables are encountered for the first time (see the sketch after this list).
3. Constant expressions could be evaluated at crunch time.
4. Labels are basically just constant expressions, so there would be no need for a slow linked-list search to find the target of a GOTO or GOSUB.
5. Inline assembly for small speed critical parts would be nice to have.
6. Support more than just floating point expressions, since byte math is faster than word math, which in turn is faster than floating point math.
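As a sketch of what points 2 and 4 could mean in practice (all the names and types below are hypothetical, not a spec), the editor would intern every identifier into one table while crunching, so a GOTO can carry a table index instead of searching a linked list, and typing run only has to clear the variable slots:

/* Sketch of crunch-time symbol handling: variables, constants, and
   labels all exist before run. No overflow checks in this sketch. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

enum sym_kind { SYM_VARIABLE, SYM_CONSTANT, SYM_LABEL };

struct symbol {
    char          name[8];   /* interned identifier                */
    enum sym_kind kind;
    double        value;     /* variable value or folded constant  */
    uint16_t      target;    /* crunched address, for labels       */
};

static struct symbol symtab[256];
static int sym_count;

/* Called while editing: find-or-create, so variables are created at
   edit time rather than at run time. */
int intern(const char *name, enum sym_kind kind)
{
    for (int i = 0; i < sym_count; i++)
        if (symtab[i].kind == kind && strcmp(symtab[i].name, name) == 0)
            return i;
    strncpy(symtab[sym_count].name, name, sizeof symtab[0].name - 1);
    symtab[sym_count].kind = kind;
    return sym_count++;
}

/* Called once when run is typed: no variable creation, no parsing,
   just reset every variable slot to its default value. */
void reset_for_run(void)
{
    for (int i = 0; i < sym_count; i++)
        if (symtab[i].kind == SYM_VARIABLE)
            symtab[i].value = 0.0;
}

int main(void)
{
    int count = intern("COUNT", SYM_VARIABLE);  /* created while editing  */
    int top   = intern("LOOPTOP", SYM_LABEL);   /* GOTO stores this index */
    symtab[top].target = 0x0817;                /* resolved at crunch time */
    symtab[count].value = 42.0;
    reset_for_run();
    printf("%s = %g\n", symtab[count].name, symtab[count].value); /* 0 */
    return 0;
}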
In essence, the full screen editor would "compile" the text into a tokenized form, updating data structures as it went, so that when it came time to run the program, all it had to do was reset variables to zero / default values.
I welcome feedback. If you think it is the worst idea in the history of ideas, that's fine; I'm just thinking it could be a nice middle ground between existing BASIC and the machine language monitor, especially if it could be located in a ROM bank. The way to make code faster is to execute less of it, and I think something like this is at least an interesting thought experiment.