Access to rule name from nodes? #7
Comments
Which language are you implementing your parser in? Also I'm not quite sure I understand this statement:
Can you clarify what you mean by that? |
Just getting to know Canopy in JavaScript. Sorry for not being clear enough. From http://canopy.jcoglan.com/langs/javascript.html the example (url.peg), run:
Now take a look at url.js. The exported parse() function has second argument
By first noticing this "options" structure and the "types" key I was excited to think that this is precisely what I'm looking for. But if you now take a look at the
But by grepping through the url.js file, I can see that the |
Ahh sorry, ok so now I understand the It seems that currently my best bet for getting the matched rule name into the parse tree is to augment the rules with actions, implemented so that each action decorates its node with the desired rule name. I'll post sample code if I come up with something that better demonstrates what I'm talking about.
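To make the idea in the comment above concrete, here is a minimal sketch of actions that tag each node with a label. It is not taken from Canopy's source. It assumes actions are invoked as `(input, start, end, elements)`, which is my reading of Canopy's JavaScript documentation and should be checked against the version in use. The helper `tagWith`, the action names `make_property` and `make_method`, and the `ruleName` field are all made up for illustration.

```javascript
// Hypothetical helper: builds an action that tags the node it creates.
// Assumption: actions are called as (input, start, end, elements).
function tagWith(ruleName) {
  return function (input, start, end, elements) {
    return {
      ruleName: ruleName,                  // the label we want on the node
      text: input.substring(start, end),   // the matched source text
      offset: start,
      elements: elements
    };
  };
}

// One action per rule of interest. The names are invented for this example;
// the grammar would have to reference them on the matching expressions.
const actions = {
  make_property: tagWith("propertyDefinition"),
  make_method: tagWith("methodDefinition")
};

// Call an action by hand, standing in for what a generated parser would do.
const node = actions.make_property("var x;", 0, 6, []);
console.log(node.ruleName, JSON.stringify(node.text));
```

With a generated parser, an object like `actions` would presumably be passed through the options argument of `parse()`. The drawback is that every rule of interest needs its own annotation in the grammar.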
I came in here to post a link but you already found what I was going to link you to :)

I would encourage you not to think of things in terms of rule names. There is a reason that rules, types, and actions are distinct things. Rules direct parsing control flow, but don't necessarily express types. One rule may include multiple nodes you want to attach types to, for example.

Think of it this way: each parsing rule compiles to a function. When you have an expression in a programming language, its meaning does not depend on the name of the function it appears in. The functions just exist to shape the set of procedures you can reuse. They direct control flow, not necessarily data structures.

It might be that you don't need the names of anything at all. If each type implements methods that do the right thing for the expression they're attached to, you should never need to know the name of the rule they appear in.
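As a rough illustration of the "types implement methods" point, the sketch below uses plain objects rather than a generated Canopy parser. The mixin names `PropertyDefinition` and `MethodDefinition`, the `describe` method, and the `withType` helper are hypothetical. The `property_name` and `method_name` keys come from the original question. `withType` only stands in for whatever the parser does when an expression carries a type annotation.

```javascript
// Hypothetical type mixins for the class-definition example in the question.
const types = {
  PropertyDefinition: {
    describe() { return "property " + this.property_name.text; }
  },
  MethodDefinition: {
    describe() { return "method " + this.method_name.text + "()"; }
  }
};

// Stand-in for the parser attaching a type to a node:
// copy the mixin's methods onto the node object.
function withType(node, mixin) {
  return Object.assign(node, mixin);
}

// Hand-built nodes playing the role of a class body's sub-elements.
const members = [
  withType({ property_name: { text: "size" } }, types.PropertyDefinition),
  withType({ method_name: { text: "resize" } }, types.MethodDefinition)
];

// The consumer never asks which rule matched; it just calls the method.
const summary = members.map((m) => m.describe());
console.log(summary);
```

The code walking the class body does not inspect keys or rule names; each node already knows how to describe itself.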
I was actually looking for the same thing, but to use it to make sure the right rules are matching when parsing various types of input. If you were going to print the generated parse tree, you get this in other libraries. I think it's not a big deal, but it might make testing the parser easier if you could compare the rule name to the node returned by the parse. I can imagine that you wouldn't want this in production, so potentially adding a node accessor for "rule_name" as a --debug or some kind of option would be useful. Maybe I'm missing something, but if you have complicated grammars, how do you know the right rules matched at the right time without this kind of annotation? Any thoughts appreciated. |
I've been looking at this again, and I really need it for building better error messages. My thought is something like augmenting the builders so that they set a local variable with the rule name (which you already have when you create the function), something like this:
Then, when you were actually in each function, the rule name would be pushed onto a stack of rules in the Grammar instance so that the current rule failure would have a rule name. Now, we probably don't want to give a raw rule name to a user, but it'd be better than this:
However, once you had the rule names, then your error handler could have a strings object of messages with "human" names for each of the rule name keys. As far as I can see, this is a slight modification to the builder to get it to generate the correct code to set the variables and manage the stack and then modifications to the generated Parser instance so that it includes the top of the rule stack when it creates the SyntaxError instance thrown to the caller. A few things that I'm not sure about are:
I'm trying to follow the directions in the CONTRIBUTING file, but when I try to run the tests I get a lot of errors, and, since I'm still getting the hang of the npm environment, I'm not quite sure how to get it to generate a local package that I can play with interactively.

As far as your comment to @joonas-fi about not worrying about the rule names, I think that's fair in the normal processing sequence now that I understand how you're supposed to construct your own object graph as the rules fire. However, I don't think it's accurate when you're talking about error messages, especially when you're trying to figure out a grammar where a) you're still new to Canopy, b) you're a little new to PEG and c) you're not 100% sure exactly how to express the rules you want. :)

I'm trying to replace a hand-written parser for two existing languages with Canopy, and, after taking a few months getting sidetracked, I'm now making good progress as we're coming to an understanding. However, somehow, I need better error messages than the above, because as they stand they're only slightly better than IBM BASIC (c. 1980) "Syntax error." messages. Having this facility would not only make development of grammars with Canopy much faster, but it's also important to the end product you're building the parser for in the first place.

This is what I get when I try to build, and, for now, if I can just get it working the way I described with JavaScript, I'll be happy. I don't need/want the other languages right now.
Happy to help implement this, but I think I need a little bit of guidance on how things work to be efficient. Cheers |
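The rule-stack proposal in the comment above can be sketched with a tiny hand-written parser. This is not Canopy's generated code or its builder API. `TinyParser`, `RULE_LABELS`, the rule names, and the `ruleStack` property on the error are all hypothetical, chosen only to show how a stack of rule names plus a table of "human" labels could feed a `SyntaxError`.

```javascript
// Human-readable labels keyed by rule name (all names hypothetical).
const RULE_LABELS = {
  assignment: "an assignment",
  identifier: "a variable name",
  number: "a number"
};

class TinyParser {
  constructor(input) {
    this.input = input;
    this.offset = 0;
    this.ruleStack = [];
  }

  // Run a rule body with its name on the stack. The name is popped only on
  // success, so after a failure the stack shows where parsing stopped.
  rule(name, body) {
    this.ruleStack.push(name);
    const result = body();
    this.ruleStack.pop();
    return result;
  }

  // Match a pattern exactly at the current offset using a sticky regex.
  match(pattern) {
    const re = new RegExp(pattern.source, "y");
    re.lastIndex = this.offset;
    const m = re.exec(this.input);
    if (m === null) this.fail();
    this.offset += m[0].length;
    return m[0];
  }

  // Build the error from the innermost rule name and its label.
  fail() {
    const name = this.ruleStack[this.ruleStack.length - 1];
    const label = RULE_LABELS[name] || name;
    const err = new SyntaxError("Expected " + label + " at offset " + this.offset);
    err.ruleStack = this.ruleStack.slice(); // full stack, for debugging
    throw err;
  }

  assignment() {
    return this.rule("assignment", () => {
      const name = this.rule("identifier", () => this.match(/[a-z]+/));
      this.match(/\s*=\s*/);
      const value = this.rule("number", () => this.match(/[0-9]+/));
      return { name: name, value: Number(value) };
    });
  }
}

// Returns the parsed value on success or the error message on failure.
function tryParse(text) {
  try {
    return new TinyParser(text).assignment();
  } catch (e) {
    return e.message;
  }
}

console.log(tryParse("x = 42"));
console.log(tryParse("x = y")); // prints "Expected a number at offset 4"
```

Keeping the raw stack on the error object while showing only the label for the innermost rule keeps user-facing messages readable and still leaves the rule names available to a grammar author.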
I have just implemented a limited version of this focussed on improving the quality of error messages. For example, the error you now get on a parse failure looks something like this:
I think showing users the rule name where a particular expected token comes from is helpful. I am less convinced that adding the rule names to successful parse results is beneficial; as I've elaborated in #47, it creates a strong coupling between the factoring of the grammar and the results it emits. As a comparison, in some parsing systems you get a distinct node for each delegation a grammar contains; if a grammar contains rules like:
Then parsing with rule
In Canopy, parsing with rule I'm on the fence about including the full stack of rule names in error messages, because it presupposes too much about the implementation and might be confusing on deeply nested rules, but I'll leave that under review. For now, I think this error message improvement is worth having. |
One further thing I'll note here is that I don't want to commit to a way of identifying rule names that user programs might come to depend on. Something I'm still considering is if/how Canopy should support composing multiple grammars together, and that is likely to affect how rules are identified since it creates a namespacing problem. For now I'm happy to include rule names in error messages but less keen to put them in data whose format might break in later releases. |
E.g. how do I know what type of element was matched if the parse tree does not contain the matched rule name?
Let's take a hypothetical programming language, which I'm trying to parse:
And when parsing the class definition, class can contain 0..N (propertyDefinition OR methodDefinition).
When I get the parse tree and am handling the class definition node, its sub-elements are (ideally):
Sure, I could detect the type of node by seeing if the object has key "property_name" (=> it's a property definition) or if it has key "method_name" (=> it's a method definition), but that's just ugly.
I noticed there is a _types field in the parser and in parse(..., options), but it seems to be unimplemented.
Any advice on this?