Author Topic: A question on Parsing  (Read 6693 times)

Offline drogon

  • BASIC Developer
  • Posts: 11
    • Drogon Projects
A question on Parsing
« on: September 02, 2013, 05:25:47 AM »
Here is a question to you all...

Coding the parser in RTB wasn't my best accomplishment, however it works. I started with the idea that I'd strip spaces out of the input line, then pick out the keywords and so on. (I had a vague idea at one point that I could have variable names with spaces in them.) This works relatively well, but the downside right now is that I can't have variable names that include keywords. This is frustrating at times - e.g. on the Raspberry Pi when using the GPIO, I want a variable called pin, but I can't because pin has pi in it, which is a built-in constant. Similarly for something like fortune = 42, which contains the keyword for.

So what's the solution? Improve the parsing code to try to work out the difference between a variable and a keyword or impose a code entry restriction that requires keywords to be space (or other keyword/symbol) bounded?

So fortune = 42 would be fine to set the variable fortune to 42, but for tune = 1 to 42 would require the space after for, and fortune = 1 to 42 would be a syntax error because it wasn't expecting to in a variable assignment...

Similarly something like: a = sin(45) returns the result of sine (45) into a, but a = sine(45) fetches element 45 from an array called sine... (which it won't right now, as sin will be recognised first...)
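
One common way to get the space-bounded behaviour described above is to read the longest possible run of letters and digits as a single word first, and only then look that whole word up in the keyword table. The following is a minimal Python sketch of that idea, not RTB's actual parser; the KEYWORDS and CONSTANTS sets and the tokenize function are hypothetical names made up for illustration.

```python
import re

# Hypothetical tables for illustration only; these are NOT RTB's real keyword lists.
KEYWORDS = {"for", "to", "cycle", "repeat"}
CONSTANTS = {"pi"}

# One token: a whole word, a number, or a single symbol, with leading spaces skipped.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<word>[A-Za-z_][A-Za-z0-9_]*)"
    r"|(?P<num>\d+(?:\.\d+)?)"
    r"|(?P<sym>[=()+\-*/,<>]))"
)

def tokenize(line):
    """Split a line into (kind, text) pairs using longest-match words."""
    line = line.rstrip()
    tokens = []
    pos = 0
    while pos < len(line):
        m = TOKEN_RE.match(line, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at column {pos}")
        pos = m.end()
        if m.group("word"):
            word = m.group("word")
            lower = word.lower()
            # The complete word is classified, so "pin" never matches "pi".
            if lower in KEYWORDS:
                kind = "KEYWORD"
            elif lower in CONSTANTS:
                kind = "CONSTANT"
            else:
                kind = "NAME"
            tokens.append((kind, word))
        elif m.group("num"):
            tokens.append(("NUMBER", m.group("num")))
        else:
            tokens.append(("SYMBOL", m.group("sym")))
    return tokens
```

With this approach tokenize("fortune = 42") yields a single NAME token for fortune, while tokenize("for tune = 1 to 42") yields the keyword for followed by the name tune. The cost is the one mentioned in the post: old code written without a space after a keyword would no longer be recognised.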

I suspect that if I did change over then there may be some old code that might not work - e.g. places where someone has not put a space after for, for example... Not sure if it would be worth the bother!

Anyone have any views/suggestions? What do others do here? (or am I just a naive parser coder ;-)

-Gordon




Offline John

  • Forum Support / SB Dev
  • Posts: 3512
    • ScriptBasic Open Source Project
Re: A question on Parsing
« Reply #1 on: September 02, 2013, 09:56:30 PM »
FWIW - ScriptBasic requires clear/readable BASIC syntax to work. I don't think stripping spaces does anything for readability. I never liked multiple statements on a line either. SB is an embeddable scripting API and any replication of functionality has to be minimized to keep its runtime at ~500KB. I haven't missed CASE and find IF/ELSEIF works just fine. I was taken aback at first by your use of the CYCLE/REPEAT common loop structure but I'm getting more comfortable with it.

« Last Edit: September 02, 2013, 10:00:43 PM by JRS »

Offline drogon

  • BASIC Developer
  • Posts: 11
    • Drogon Projects
Re: A question on Parsing
« Reply #2 on: September 03, 2013, 02:46:23 AM »
It was the interpreter that was stripping spaces as part of the parser - they were re-generated in the LIST output (and preserved when you use the built-in editor, or an external editor). And yes, I never liked the multiple statements per line thing either (and RTB doesn't support it). It was a necessary evil in the bad old days to save a bit of memory and execution time...

The cycle...repeat thing comes from an Algol-like language called Imp77 which was used in Edinburgh (& Manchester I think) universities in the 1970s and '80s to write their operating system (EMAS) and utilities in. It was my first exposure to a structured programming language. (After BASIC & FORTRAN) It supported the while/until at the top or bottom of the loops too and had a few other interesting constructs like:

Code: [Select]
  a = 5 unless b = 7
unless was the opposite test to if, just as while and until are opposite tests (and you could put the test after the statement).

While Imp77 supported more or less everything that C does (including the ability to write complete multi-user operating systems in it), it more or less died as a general purpose language due to the widespread adoption of C & Unix. It used 'stropping' for the keywords - I guess to make the parser easier to implement in those days of limited memory/resources - so each keyword was prefixed by %, so:

Code: [Select]
a = 5 %unless b = 7
and the text formatter you passed listings through would use the % character as a flag to underline stuff. Doing that and ignoring spaces allowed variables, function names, etc. to have spaces in them... (which is where I got that idea from!)
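
A rough Python sketch of how stropping makes this possible: once every keyword is marked with %, any other run of letters must be a name, so embedded spaces can simply be dropped. This is a simplified illustration and not Imp77's real lexer; the function name tokenize_stropped and the rule that a keyword ends at the first non-letter are assumptions made for the example.

```python
def tokenize_stropped(line):
    """Tokenize a line where keywords are stropped with a leading '%'."""
    tokens = []
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if ch.isspace():
            i += 1
        elif ch == "%":
            # Assumption: the keyword is the run of letters right after '%'.
            j = i + 1
            while j < n and line[j].isalpha():
                j += 1
            if j == i + 1:
                raise SyntaxError("'%' must be followed by a keyword")
            tokens.append(("KEYWORD", line[i + 1:j].lower()))
            i = j
        elif ch.isalpha():
            # A name may contain spaces; they are skipped, so
            # "loop count" and "loopcount" are the same variable.
            j = i
            chars = []
            while j < n and (line[j].isalnum() or line[j].isspace()):
                if not line[j].isspace():
                    chars.append(line[j])
                j += 1
            tokens.append(("NAME", "".join(chars)))
            i = j
        elif ch.isdigit():
            j = i
            while j < n and line[j].isdigit():
                j += 1
            tokens.append(("NUMBER", line[i:j]))
            i = j
        else:
            tokens.append(("SYMBOL", ch))
            i += 1
    return tokens
```

Because unmarked words can never be keywords, a name such as fortune or pin needs no special handling here; the price is having to type the % in front of every keyword.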

So my task this week is to improve the parser :-)

And back to cycle repeat - I'm toying with the idea of a simple counted repeat now - after using RTB to teach introduction to programming using turtle graphics...

Code: [Select]
  clock // Angles
  cycle 4 // alternative to: for count = 1 to 4 cycle ?
    move (100)
    turn (15)
  repeat

-Gordon

Offline Charles Pegge

  • BASIC Developer
  • Posts: 69
Re: A question on Parsing
« Reply #3 on: September 05, 2013, 01:28:06 AM »
Hello Gordon,

Spaces within variable names: I tried this once, but it proved to be confusing for programming, and demanding for the parser. However, there is one possible use. If your language is going to support compound types or objects then the member name could be separated from the main name by a space, instead of the more usual dot.

For example:


type mammal
  head
  body
  tail
end type

dim as mammal fox

You can then refer to the tail member:

fox tail


instead of

fox.tail

Offline JESSEW

  • WarSOFT Apps
  • BASIC Developer
  • Posts: 6
Re: A question on Parsing
« Reply #4 on: December 05, 2013, 07:56:57 PM »
Interesting questions... The methods of parsing I'm familiar with use a form of BNF notation for syntax checking, and they keep 'commands' separate from other keywords, so that command names can be used as variables. The parser started its syntax checking and tokenization by getting the first word in the new line of text, which would be the statement command. It then looked up this 'word' in its command syntax table, and if it found it, would then parse the remainder of the input line according to that statement's unique BNF description. If either the initial statement match check failed or the syntax check of a particular statement failed, the line was then tested against the BNF description of the implied assign statement (i.e. a=5). So then it's perfectly ok to code
Code: [Select]
For for = 1 To 5
  Print for
Next for
and it's also ok to code
Code: [Select]
For = Sin(Pi)
But... it's NOT ok to use function names this way, as they are checked in the same syntax check as variables are.
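
The lookup-then-fallback scheme described above can be sketched in Python roughly as follows. This is only an illustration of the idea under simplifying assumptions: the per-statement "BNF descriptions" are replaced by small hand-written checks, and all names here (COMMANDS, parse_line, parse_for and so on) are hypothetical.

```python
import re

def split_words(line):
    """Split a line into words, numbers and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S", line)

def is_name(word):
    return re.fullmatch(r"[A-Za-z_]\w*", word) is not None

def parse_for(words):
    # FOR <name> = <start...> TO <end...>
    if len(words) < 6 or not is_name(words[1]) or words[2] != "=":
        return None
    rest = [w.lower() for w in words[3:]]
    if "to" not in rest:
        return None
    k = 3 + rest.index("to")
    if k == 3 or k == len(words) - 1:
        return None  # nothing before or after TO
    return ("FOR", words[1], words[3:k], words[k + 1:])

def parse_next(words):
    # NEXT <name>
    if len(words) == 2 and is_name(words[1]):
        return ("NEXT", words[1])
    return None

def parse_print(words):
    # PRINT <items...>; "print = 5" is left to the assignment rule.
    if len(words) < 2 or words[1] == "=":
        return None
    return ("PRINT", words[1:])

def parse_assign(words):
    # Implied assignment: <name> = <expression...>
    if len(words) >= 3 and is_name(words[0]) and words[1] == "=":
        return ("ASSIGN", words[0], words[2:])
    return None

# Command table: the first word of the line selects the rule to try.
COMMANDS = {"for": parse_for, "next": parse_next, "print": parse_print}

def parse_line(line):
    words = split_words(line)
    if not words:
        return None
    rule = COMMANDS.get(words[0].lower())
    if rule is not None:
        result = rule(words)
        if result is not None:
            return result
    # Statement lookup or its syntax check failed: try implied assignment.
    result = parse_assign(words)
    if result is None:
        raise SyntaxError(f"cannot parse: {line!r}")
    return result
```

Here parse_line("For for = 1 To 5") is accepted by the FOR rule with for as the loop variable, while parse_line("For = Sin(Pi)") fails the FOR rule and is picked up as an assignment to a variable named For.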

This is how I would do it if I were creating a basic interpreter. Here is the BNF description for TinyBasic

Code: [Select]
<line> = <number> <statement> <CR>
         <statement> <CR>
<statement> = PRINT <printlist>
              PR <printlist>
              INPUT <varlist>
              LET <var> = <expression>
              <var> = <expression>
              GOTO <expression>
              GOSUB <expression>
              RETURN
              IF <expression> <relop> <expression> THEN <statement>
              IF <expression> <relop> <expression> <statement>
              REM <commentstring>
              CLEAR
              RUN
              RUN <exprlist>
              LIST
              LIST <exprlist>
<printlist> =
              <printitem>
              <printitem> :
              <printitem> <separator> <printlist>
<printitem> = <expression>
              "<characterstring>"
<varlist> = <var>
            <var> , <varlist>
<exprlist> = <expression>
             <expression> , <exprlist>
<expression> = <unsignedexpr>
               + <unsignedexpr>
               - <unsignedexpr>
<unsignedexpr> = <term>
                 <term> + <unsignedexpr>
                 <term> - <unsignedexpr>
<term> = <factor>
         <factor> * <term>
         <factor> / <term>
<factor> = <var>
           <number>
           ( <expression> )
           <function>
<function> = RND ( <expression> )
             USR ( <exprlist> )
<number> = <digit>
           <digit> <number>
<separator> = , | ;
<var> = A | B | ... | Y | Z
<digit> = 0 | 1 | 2 | ... | 9
<relop> = < | > | = | <= | >= | <> | ><
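
To show how a grammar like this maps onto code, here is a small recursive-descent evaluator in Python for the <expression>, <term> and <factor> rules only. It is a sketch under stated assumptions rather than a faithful Tiny BASIC implementation: it uses loops so that operators associate to the left, applies a leading sign to the first term only, truncates division to an integer, treats unset variables as 0, and leaves out <function>. The names Parser and evaluate are made up for the example.

```python
import re

def tokenize(text):
    # Numbers, single-letter variables, operators and parentheses.
    # Anything else is silently skipped in this sketch.
    return re.findall(r"\d+|[A-Za-z]|[-+*/()]", text)

class Parser:
    def __init__(self, text, variables=None):
        self.tokens = tokenize(text)
        self.pos = 0
        self.vars = variables or {}

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expression(self):
        # <expression>: optional sign, then terms joined by + or -
        sign = 1
        if self.peek() in ("+", "-"):
            if self.take() == "-":
                sign = -1
        value = sign * self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # <term>: factors joined by * or /
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            rhs = self.factor()
            # Assumption: division truncates toward zero.
            value = value * rhs if op == "*" else int(value / rhs)
        return value

    def factor(self):
        # <factor>: <var> | <number> | ( <expression> )
        tok = self.take()
        if tok is None:
            raise SyntaxError("unexpected end of expression")
        if tok.isdigit():
            return int(tok)
        if tok.isalpha():
            return self.vars.get(tok.upper(), 0)
        if tok == "(":
            value = self.expression()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {tok!r}")

def evaluate(text, variables=None):
    parser = Parser(text, variables)
    value = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

Each grammar rule becomes one method, and each method only calls the rule below it (or <expression> again inside parentheses), which is what gives * and / higher precedence than + and -.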
« Last Edit: December 05, 2013, 07:59:53 PM by JESSEW »