parsing a block of code
Jelle Feringa
2013-11-14 11:22:22 UTC

I'm writing a parser for the RAPID robot language.
I'd like to know what is a good way to approach the following
parsing problem. Something specific to RAPID is that
for loops, conditionals, function and module definitions
all use a familiar MODULE < module block > ENDMODULE
or PROC < procedure block > ENDPROC structure.

My question is what is the right way to go about this?
Here we have an example of a procedure defined in RAPID.

Intuitively I would write a regex that matches the
name of the procedure, its argument and the procedure block.

Another way would be to drop into a state when such a
START / END block is found, but that feels unnecessarily

PROC top_front( string strNoStepIn )
! procedure block
MoveL ...;

Since this pattern is so common in the language, I'd like to get it
right and in a p(l)ythonic manner. Thing is that I'm too new to
parsing to really see that.


You received this message because you are subscribed to the Google Groups "ply-hack" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ply-hack+***@googlegroups.com.
To post to this group, send email to ply-***@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ply-hack/loom.20131114T120932-607%40post.gmane.org.
For more options, visit https://groups.google.com/groups/opt_out.
2013-11-14 11:55:52 UTC
Post by Jelle Feringa
My question is what is the right way to go about this?
Here we have an example of a procedure defined in RAPID.
The example seems to be missing, but in general, you don't start with the parser, you start with the
scanner, identifying the individual words that you should recognize.
Post by Jelle Feringa
PROC top_front( string strNoStepIn )
! procedure block
MoveL ...;
becomes a sequence of tokens (one per line); empty lines and // text are added to clarify what you
read (token names are written in all uppercase).

STRING // if "string" is not a built-in, it would become an IDENTIFIER

// Assuming ! means 'comment', skipped it.


// skipped some


You break down your input text into these small elementary words with the scanner. I didn't do it
here, but it's often useful to add a suffix or prefix to keywords (I use ...KW, e.g. PROCKW) and to
other tokens (I use ...TK). It makes the parser rules below more readable, and avoids name conflicts
between different tokens that are closely related, like the keyword string denoting a type and a
literal string like "abcd".
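
To make the scanner step concrete, here is a minimal sketch that uses only Python's standard re module as a stand-in for a PLY lexer. The token names (PROCKW, STRINGKW, STRINGTK, IDENTIFIERTK, PAROPENTK and so on) are hypothetical and only follow the KW/TK suffix idea described above. It also assumes that '!' starts a comment running to the end of the line and that keywords are case-insensitive; the ENDPROC line in the usage example is added for illustration.

```python
import re

# Hypothetical keyword table: the KW suffix marks keywords.
KEYWORDS = {"PROC": "PROCKW", "ENDPROC": "ENDPROCKW", "STRING": "STRINGKW"}

# Hypothetical token patterns: the TK suffix marks all other tokens.
TOKEN_SPEC = [
    ("COMMENT", r"![^\n]*"),                      # assumption: '!' comments out the rest of the line
    ("STRINGTK", r'"[^"\n]*"'),                   # literal string such as "abcd" (no escapes handled)
    ("IDENTIFIERTK", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("PAROPENTK", r"\("),
    ("PARCLOSETK", r"\)"),
    ("COMMATK", r","),
    ("SEMITK", r";"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token_type, value) pairs, dropping whitespace and comments."""
    tokens = []
    for match in MASTER.finditer(text):
        kind, value = match.lastgroup, match.group()
        if kind in ("SKIP", "COMMENT"):
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {value!r}")
        if kind == "IDENTIFIERTK":
            # A word is a keyword only if it appears in the keyword table.
            kind = KEYWORDS.get(value.upper(), kind)
        tokens.append((kind, value))
    return tokens


if __name__ == "__main__":
    source = 'PROC top_front( string strNoStepIn )\n! procedure block\nENDPROC'
    for kind, value in tokenize(source):
        print(kind, value)
```

Note how the keyword string (STRINGKW) and a literal such as "abcd" (STRINGTK) end up with different token types, which is the name conflict the suffixes are meant to avoid.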

The parser takes this stream of tokens, and reconstructs the parts you want to keep together, with
grammar rules, like


A "Procedure" is a thing that starts with the keyword PROC and ends with the keyword ENDPROC. There
are 2 variants, one with and one without FormalParameters.

FormalParameters : FormalParameter
| FormalParameters COMMA FormalParameter

FormalParameter : Type IDENTIFIER ;

| ...

FormalParameters is one or more FormalParameter, separated by COMMA. The latter is a sequence of
a Type and an IDENTIFIER.
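
As a rough illustration of how such rules consume a token stream, here is a small hand-written recursive-descent sketch in plain Python. It is not PLY code, and PLY would generate a table-driven parser from the rules instead. The token names PROC, ENDPROC, STRING, IDENTIFIER and COMMA follow the unsuffixed names used in the rules; PAROPEN and PARCLOSE are hypothetical. The shape of the Procedure rule is an assumption based on the description above, only STRING is accepted as a Type, and the procedure body is simply collected unparsed.

```python
class Parser:
    """Recursive-descent sketch over a list of (token_type, value) pairs."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        # Type of the next token, or "EOF" when the input is exhausted.
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def expect(self, kind):
        # Consume one token of the given type and return its value.
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        value = self.tokens[self.pos][1]
        self.pos += 1
        return value

    def procedure(self):
        # Assumed rule: PROC IDENTIFIER PAROPEN [FormalParameters] PARCLOSE body ENDPROC
        self.expect("PROC")
        name = self.expect("IDENTIFIER")
        self.expect("PAROPEN")
        params = []
        if self.peek() != "PARCLOSE":      # variant with FormalParameters
            params = self.formal_parameters()
        self.expect("PARCLOSE")
        body = []
        while self.peek() not in ("ENDPROC", "EOF"):
            body.append(self.tokens[self.pos])   # body tokens are kept unparsed here
            self.pos += 1
        self.expect("ENDPROC")
        return {"name": name, "params": params, "body": body}

    def formal_parameters(self):
        # FormalParameters : FormalParameter | FormalParameters COMMA FormalParameter
        # The left-recursive rule becomes a loop in a hand-written parser.
        params = [self.formal_parameter()]
        while self.peek() == "COMMA":
            self.expect("COMMA")
            params.append(self.formal_parameter())
        return params

    def formal_parameter(self):
        # FormalParameter : Type IDENTIFIER   (only the STRING type in this sketch)
        type_name = self.expect("STRING")
        return (type_name, self.expect("IDENTIFIER"))


if __name__ == "__main__":
    tokens = [
        ("PROC", "PROC"), ("IDENTIFIER", "top_front"), ("PAROPEN", "("),
        ("STRING", "string"), ("IDENTIFIER", "strNoStepIn"), ("PARCLOSE", ")"),
        ("ENDPROC", "ENDPROC"),
    ]
    print(Parser(tokens).procedure())
```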
Post by Jelle Feringa
Intuitively I would write a regex that matches the
name of the procedure, its argument and the procedure block.
In general, regex is not powerful enough to handle programming languages. Consider the case

string x = "endproc";

in the middle of a proc. Good luck detecting the right 'endproc' word. Similar cases exist when a
user comments away a part of a proc.

You may get it working for a set of cases, but handling every case that is valid for the RAPID
compiler is probably impossible.
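
The "endproc" case can be reproduced in a few lines. This is only a sketch: the procedure text, the NAIVE pattern and the helper names are made up for illustration, and the case-insensitive match assumes RAPID keywords may be written in any case. The naive regex stops at the "endproc" inside the string literal, while a scanner that recognises string literals first sees only one ENDPROC keyword.

```python
import re

# Hypothetical procedure text with "endproc" inside a string literal.
SOURCE = '''PROC demo( string s )
string x = "endproc";
MoveL ...;
ENDPROC'''

# Naive whole-procedure regex: name, arguments, then everything up to the first ENDPROC.
NAIVE = re.compile(r"PROC\s+(\w+)\s*\((.*?)\)(.*?)ENDPROC", re.DOTALL | re.IGNORECASE)

# Tiny scanner: string literals are tried before words, so their contents are never
# mistaken for keywords.
SCANNER = re.compile(r'(?P<STRINGTK>"[^"\n]*")|(?P<WORD>[A-Za-z_]\w*)|(?P<OTHER>\S)')


def naive_body(text):
    """Return the procedure block as the naive regex sees it, or None."""
    match = NAIVE.search(text)
    return match.group(3) if match else None


def count_endproc_keywords(text):
    """Count ENDPROC words that occur outside string literals."""
    return sum(
        1
        for match in SCANNER.finditer(text)
        if match.lastgroup == "WORD" and match.group().upper() == "ENDPROC"
    )


if __name__ == "__main__":
    print(repr(naive_body(SOURCE)))        # block is cut off inside the string literal
    print(count_endproc_keywords(SOURCE))  # the scanner finds a single ENDPROC keyword
```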
Post by Jelle Feringa
Since this pattern is so common in the language, I'd like to get it
right and in a p(l)ythonic manner. Thing is that I'm too new to
parsing to really see that.
The pattern is not really special, { .. } or BEGIN .. END are mostly the same thing, although they
group different things.

Good luck with your parsing adventure,
Jelle Feringa
2013-11-18 15:57:03 UTC
Dear Albert,

Thanks so much for your constructive comments.
I first completed the tokenization of the RAPID grammar, and
when I print the tok.type and tok.value of parts of RAPID code, it
becomes really obvious how to parse the code, since to some extent
the parsing code is hinted at by the "type" attribute.

So parsing remains a challenging field, but I managed to
move a lot further, also thanks to your comments!

So thanks again,
