New community dev tool uploaded: 8sh

BruceMcF · Post by **BruceMcF** » Sat Apr 10, 2021 4:51 am

On 4/9/2021 at 1:31 AM, rje said:

The bit *I* find challenging is compiling to bytecode. I think it's a recursive descent parser, with an expression engine, and a bytecode emitter with forward referencing for handling blocks. Or maybe I can use a stack?

One way to simplify the compilation is to simplify the grammer.

For example, suppose that ALL infix operations are evaluated left to right, and without parentheses all right hand side operands are evaluated first. So: 3+5*8-4/3 means:

3+(5*(8-(4/3))). And suppose () are supported. Then you simply parse between operations and operands, push operands on an operand stack, push operations on an operations stack, and when you get to the end of the line you execute the operations stack. ")" goes ahead and executes the current operations stack, and "(" on the operations stack stops executing the operations stack and goes back to executing the bytecode.

So in practice, people get used to using () a lot.

Now function or procedure calls are just prefix operations, a "proc(" pushes the ( onto the operation stack then the proc, and in proc(x,y,z) the comma is just a separator: (x,y,z) gets executed as ( ... x y z ) on the operand stack and "proc" uses the top two entries on the operand stack. ")" doesn't care whether it is closing an expression or a procedure call, it just executes the current operation stack.

rje · Post by **rje** » Sat Apr 10, 2021 8:44 pm

Thanks, Bruce! I think I can keep the expression engine, which I think was in pretty good shape.

BruceMcF · Post by **BruceMcF** » Sun Apr 11, 2021 4:43 am

7 hours ago, rje said:

Thanks, Bruce! I think I can keep the expression engine, which I think was in pretty good shape.

Yeah, just spitballing. One thing about operand and operator stack approaches on a 65c02 is the hardware stack can be used for the operator stack, so you push (address-1) subroutine return vectors onto the operator stack and normal operator execution routine end with RTS to execute the next operation.

rje · Post by **rje** » Sun Apr 11, 2021 5:48 pm

THAT is something for me to keep in mind. Thank you again!

rje · Post by **rje** » Sun Apr 11, 2021 6:09 pm

The precedence parser function looks like this. It uses a parse table that encodes precedence in atoms.


static void parsePrecedence(uint8_t precedence)
{
   advance();
   //
   //  The prefix rule MUST exist.
   //
   if ( parser_previous->type >= TOKEN_START_OF_ERRORS )  // take care of all the error tokens
   {                                                  // (which DO NOT have Pratt entries!)
      return;                                         //
   }                                                  //
   if ( prefixRule[parser_previous->type] == NULL )
   {
      error(TOKEN_ERROR_EXPRESSION_EXPECTED);    // "expression expected."
      return;
   }

   (prefixRule[parser_previous->type])();

   //
   //  If we're at a lower precedence, then advance and call
   //  the infix rule on the previous token, if present.
   //
   while(precedence <= precedenceRule[parser_current->type])
   {
      advance();
      if ( infixRule[parser_previous->type] ) // might not exist
      {
         (infixRule[parser_previous->type])();
      }
   }
}

rje · Post by **rje** » Mon Apr 12, 2021 2:39 pm

On 4/9/2021 at 11:51 PM, BruceMcF said:

One way to simplify the compilation is to simplify the grammer.

For example, suppose that ALL infix operations are evaluated left to right, and without parentheses all right hand side operands are evaluated first. ...

So in practice, people get used to using () a lot.

Bruce, I've been thinking about it, and you've triggered something in my mind that might work.

Wikipedia talks about this structural approach to precedence in math expressions. It's in the same section as Pratt parsers: https://en.wikipedia.org/wiki/Operator-precedence_parser#Pratt_parsing

Basically, you in-line replace "+" with the equivalent of ")) + ((" and "*" with ") * (" --- or maybe it's the other 'way 'round. Any rate, it's a mechanical substitution method that works very well for being relatively unintelligent.

Quote

Another approach is to first fully parenthesize the expression, inserting a number of parentheses around each operator, such that they lead to the correct precedence even when parsed with a linear, left-to-right parser. This algorithm was used in the early FORTRAN I compiler:^[7]

The Fortran I compiler would expand each operator with a sequence of parentheses. In a simplified form of the algorithm, it would

replace + and – with ))+(( and ))-((, respectively;

replace * and / with )*( and )/(, respectively;

add (( at the beginning of each expression and after each left parenthesis in the original expression; and

add )) at the end of the expression and before each right parenthesis in the original expression.

Although not obvious, the algorithm was correct, and, in the words of Knuth, “The resulting formula is properly parenthesized, believe it or not.”^[8]

Example code of a simple C application that handles parenthesisation of basic math operators (+, -, *, /, ^, ( and )?

I was thinking, I could do that in the TOKENIZER, and then the compiler could be simpler. That's the right place to "rewrite" the input stream. I should be able to isolate that bit into its own little chunk of code. It could be a nice tradeoff.

rje · Post by **rje** » Mon Apr 19, 2021 3:09 pm

Of course the interpreter has hashtables, even though they will be small.

I will have to change the hashtable used by the reference code. Even Perl's hashtable gets me more elements (>320 versus 200 or so). (Yeah these are small tables -- low resources!)

So I will probably just wrap calls around a trivial manager that uses RAM banking -- one bank for a system table, one for constants, and one for dynamic use. Or not.

Keys are probably just a char* with a length byte overhead. Call it a "string".

Values are probably a type and a union, perhaps a minimum of 5 bytes in size.

Assume 8 bytes min per entry and one bank could hold a maximum of a thousand entries, probably as a reductio ad absurdum.

rje · Post by **rje** » Mon Apr 19, 2021 4:11 pm

Purpose purpose purpose.

I'm asking it myself. Besides the purpose of DOING it, that is, which I am finding valuable.

I don't have a good answer.

kelli217 · Post by **kelli217** » Mon Apr 19, 2021 4:59 pm

The process of building a shell is educational. If you ever go completely wild and decide to replicate something like LUnix or GeckOS then you'll have this experience to call upon.

rje · Post by **rje** » Mon Apr 19, 2021 6:11 pm

Yep, totally getting that.