-
-
Notifications
You must be signed in to change notification settings - Fork 425
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Interactive parsing with Earley #1493
Comments
The difficulty with Earley is that it explores many paths at the same time, many of which will never reach the final tree. So, if you try to follow it step-by-step and report on what it's doing, you will face the issue that paths will be blended in together. So Unfortunately, I can't think of a simple solution to this. Although in theory it should be possible to keep track of each "line of thinking", and provide a more accurate context, similar to what the LALR interactive parser does. I hope that helps! |
Thank you for your reply! I did not understand why paths getting blended in together would cause an issue in future. Could you please explain with some example? Also, how can we categorize a path as valid or invalid without knowing the full stream? I am of the opinion that any interactive parse should output all possible paths for a partial stream. What do you say? Thank you for your patience. |
You can't. You can only know at the end of the parse. That's why a step-by-step parsing has to assume all paths are valid until they are disqualified by the parser.
It's an issue if you want to be able to predict the next valid tokens, for example. |
Suggestion
I am trying to implement interactive parsing with lark using the earley parser because llar cannot handle all the cases represented through my grammar. Could you please help me in efficiently incorporating it in the lark repository?
My Methodology
As a first implementation (or proof of concept) the interactive parsing option is exposed through
parse
with an extra argument. The idea is to return a list of regex patterns for the next expected token given an input stream that partially matches the grammar. To accomplish this functionality, I focused on keeping track of theto_scan
list coming from thepredict_and_complete
function ofxearley
class. For the case whento_scan
does not result in valid solutions, it returns the list ofexpected_terminals
instead of throwing an UnexpectedEOF error (when the interactive parser argument is true). While this case works for many cases, it does not take care of corner cases such asdelayed_matches
which may have resulted in a correct match after some tokens. Therefore, I also add thedelayed_matches
toto_scan
in case of partial input stream.Feedback
Could you please provide feedback on the given approach? Am I missing any possible corner cases with this approach? I understand it is an ad-hoc strategy and probably could be made efficient with dynamic programming. I would be more than happy to share the code and expand on it based on your suggestions/feedback.
Thank you. Looking forward to your response!
The text was updated successfully, but these errors were encountered: