Include token index on nodes when parsing AST #2795
-
Thanks, it would indeed be nice to have the parsed nodes contain more information about the original parsed expression. So basically we want to create a source map. Before implementing a solution, I think we should think through how exactly we want to solve this. Some initial thoughts:
@mattvague do you have any opinion in this regard, given your experience in parsing, transforming, and highlighting of expressions?
-
@matthew-canestraro Just FYI, I already have a partially complete implementation of this here, if you need inspiration or would like to build on it instead of doing it all yourself.
-
Hi! Sorry for the long absence on this one. I had to shift priorities for a while and finally found time to get back to this.

I just pushed changes to add { index: 6, source: "foo" } There was no need for a

The mapping for most nodes points to their identifying symbol (e.g.

For nodes with multiple identifying symbols, I included everything between the first and last symbol. E.g. for conditional
{ index: 6, source: '? "hello" :' }
This seems awkward at first glance, but this methodology lets the user quickly get the index of

There are a few quirky edge cases, like BlockNode, which always receives

I'm about to begin writing tests (which will make it easier to see how each node maps), but let me know if you have any major concerns with this direction!
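To make the "everything between the first and last symbol" rule concrete, here is a minimal sketch. The helper `spanBetween` and the example expression are assumptions made for illustration, not code from the pull request; the expression was simply chosen so that the result matches the { index: 6, source: '? "hello" :' } mapping quoted above.

```javascript
// Hypothetical helper, not part of mathjs: build a { index, source } mapping
// that covers everything from the first symbol through the last symbol.
function spanBetween(expr, firstIndex, lastIndex, lastLength = 1) {
  return {
    index: firstIndex,
    source: expr.slice(firstIndex, lastIndex + lastLength)
  };
}

// Assumed example expression; any conditional would work the same way.
const expr = 'x > 1 ? "hello" : "bye"';
const mapping = spanBetween(expr, expr.indexOf('?'), expr.indexOf(':'));
// mapping is { index: 6, source: '? "hello" :' }

// The position of the last symbol can be derived from the span alone.
const colonIndex = mapping.index + mapping.source.length - 1;
```

One consequence of storing the whole span is that both ends of the construct stay recoverable from a single mapping, at the cost of the source text also containing whatever sits between the two symbols.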
-
Hi @josdejong, I've moved on to fixing tests and realized that the new

What should the

Viable options I could see:
{ index: 4, text: "x" } // source maps back to the variable which was resolved
{ index: 4, text: "2" } // source acts as if the value was in the source string
{ index: 4, text: "" } // no text, because this node is implicit and not actually in the source
[] // no source at all, the easy way out
-
Currently, when parsing a string into an AST, there is no way to trace an individual node back to its token's original location in the string. This makes it difficult to lint the string or perform other semantics-aware manipulations of it. At the same time, turning an AST back into a string using toString() causes all whitespace formatting to be lost, so it cannot be used as an alternative.

The parser already tracks the current string index of each token as it parses, so adding that value to nodes as a tokenIndex seems like a relatively small change that would resolve this problem, but I am open to any alternatives that would help map an AST back to the source string.

I am happy to create a pull request for this, but want to make sure I'm tackling it in a way others will agree to.
(POC pull request opened here: #2796)
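As a rough illustration of the idea, the sketch below uses a toy tokenizer rather than the real mathjs parser. The functions `tokenize` and `toLeafNodes`, the node shape, and the `OperatorToken` label are all assumptions made for this example; only the `tokenIndex` name comes from the proposal above.

```javascript
// Toy tokenizer, not mathjs code: every token remembers where it started.
function tokenize(expr) {
  const tokens = [];
  // Numbers, identifiers, or any other single non-space character.
  const pattern = /\d+(?:\.\d+)?|[A-Za-z_]\w*|\S/g;
  let match;
  while ((match = pattern.exec(expr)) !== null) {
    tokens.push({ text: match[0], tokenIndex: match.index });
  }
  return tokens;
}

// Hypothetical flat "nodes": copy the token's start index onto each one.
function toLeafNodes(tokens) {
  return tokens.map(t => ({
    type: /^\d/.test(t.text)
      ? 'ConstantNode'
      : /^[A-Za-z_]/.test(t.text) ? 'SymbolNode' : 'OperatorToken',
    value: t.text,
    tokenIndex: t.tokenIndex
  }));
}

const source = 'foo  +   2*bar';
const nodes = toLeafNodes(tokenize(source));

// Rename the last symbol in place, leaving the original whitespace untouched.
const target = nodes[nodes.length - 1];
const renamed =
  source.slice(0, target.tokenIndex) +
  'baz' +
  source.slice(target.tokenIndex + target.value.length);
// renamed is 'foo  +   2*baz'
```

Because the edit is applied to the original string at the recorded index, the irregular spacing survives, which is the part a toString() round trip would lose.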