Interactive parsing with Earley #1493

tanvisharma · 2024-11-29T19:15:40Z

Suggestion
I am trying to implement interactive parsing with lark using the earley parser because llar cannot handle all the cases represented through my grammar. Could you please help me in efficiently incorporating it in the lark repository?

My Methodology
As a first implementation (or proof of concept) the interactive parsing option is exposed through parse with an extra argument. The idea is to return a list of regex patterns for the next expected token given an input stream that partially matches the grammar. To accomplish this functionality, I focused on keeping track of the to_scan list coming from the predict_and_complete function of xearley class. For the case when to_scan does not result in valid solutions, it returns the list of expected_terminals instead of throwing an UnexpectedEOF error (when the interactive parser argument is true). While this case works for many cases, it does not take care of corner cases such as delayed_matches which may have resulted in a correct match after some tokens. Therefore, I also add the delayed_matches to to_scan in case of partial input stream.

Feedback
Could you please provide feedback on the given approach? Am I missing any possible corner cases with this approach? I understand it is an ad-hoc strategy and probably could be made efficient with dynamic programming. I would be more than happy to share the code and expand on it based on your suggestions/feedback.

Thank you. Looking forward to your response!

The text was updated successfully, but these errors were encountered:

erezsh · 2025-01-01T10:49:54Z

The difficulty with Earley is that it explores many paths at the same time, many of which will never reach the final tree.

So, if you try to follow it step-by-step and report on what it's doing, you will face the issue that paths will be blended in together. So to_scan will contain tokens not only for the correct path, but also for invalid paths, and also for valid paths that will end up being de-prioritized.

Unfortunately, I can't think of a simple solution to this. Although in theory it should be possible to keep track of each "line of thinking", and provide a more accurate context, similar to what the LALR interactive parser does.

I hope that helps!

tanvisharma · 2025-01-02T21:00:35Z

Thank you for your reply!

I did not understand why paths getting blended in together would cause an issue in future. Could you please explain with some example?

Also, how can we categorize a path as valid or invalid without knowing the full stream? I am of the opinion that any interactive parse should output all possible paths for a partial stream. What do you say?

Thank you for your patience.

erezsh · 2025-01-03T09:57:55Z

how can we categorize a path as valid or invalid without knowing the full stream?

You can't. You can only know at the end of the parse.

That's why a step-by-step parsing has to assume all paths are valid until they are disqualified by the parser.

why paths getting blended in together would cause an issue in future

It's an issue if you want to be able to predict the next valid tokens, for example.

tanvisharma added the enhancement label Nov 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Interactive parsing with Earley #1493

Interactive parsing with Earley #1493

tanvisharma commented Nov 29, 2024

erezsh commented Jan 1, 2025 •

edited

Loading

tanvisharma commented Jan 2, 2025

erezsh commented Jan 3, 2025

Interactive parsing with Earley #1493

Interactive parsing with Earley #1493

Comments

tanvisharma commented Nov 29, 2024

erezsh commented Jan 1, 2025 • edited Loading

tanvisharma commented Jan 2, 2025

erezsh commented Jan 3, 2025

erezsh commented Jan 1, 2025 •

edited

Loading