-
-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix multiline linenos for Python3.7 #132
Conversation
As a general feedback, I think the complexity the current implementation brings outweighs the benefits. I don't like the idea of adding another |
However, we quite often use placeholders with no Symbols in value stack when instructions are executed which throws away lineno info. Take the following example:
If we push an empty placeholder for the LOAD_CONST instruction, we wouldn't have any information on the lineno for the STORE_FAST instruction. We could wrap every piece in value stack as a Symbol class but that's the same as my implementation |
(outdated) You're right. How about this, we can create a class called like To be honest I also find having a list on the stack not ideal (for a long time), so perhaps we can even make This is just a thought and we don't have to go this way. Let me know what you think. |
cyberbrain/value_stack.py
Outdated
return "{" + repr(self.start_lineno) + " " + repr(self.sources) + "}" | ||
|
||
def get_top_source(self) -> Symbol: | ||
# This happens when variables (e.g. globals) that we don't care about are mutated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite get it
cyberbrain/value_stack.py
Outdated
def __repr__(self) -> str: | ||
return "{" + repr(self.start_lineno) + " " + repr(self.sources) + "}" | ||
|
||
def get_top_source(self) -> Symbol: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be better to write?
@Property
def top_source
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'll change it
cyberbrain/value_stack.py
Outdated
class BaseValueStack: | ||
"""Class that simulates the a frame's value stack. | ||
|
||
This class contains instr handlers that are the same across different versions. | ||
""" | ||
|
||
def __init__(self): | ||
self.stack = [] | ||
self.stack: List[SymbolStackItem | CustomValueStackItem] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't use this because we need to support older versions
https://www.python.org/dev/peps/pep-0604/ is added in 3.10
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto for other usages of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it
cyberbrain/value_stack.py
Outdated
@@ -187,6 +268,30 @@ def tos1(self): | |||
def tos2(self): | |||
return self._tos(2) | |||
|
|||
@staticmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method probably should live outside of the class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, merge_stack_item
feels a more appropriate name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had this before but you mentioned in #132 (comment) to change it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, apologize for the confusion, let me explain. Back then we were talking about having a single StackItem class, so a factory method makes more sense. Now we have two separate StackItem classes, and this method can return either of them, so we can't really put it into those classes. Meanwhile, even in the first comment I didn't mean to put it into the BaseValueStack
class, it feels a bit weird
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
cyberbrain/value_stack.py
Outdated
|
||
# Placeholder for a None on the stack | ||
# Could be a None that is stored in a variable or a None used by exception handling. | ||
_none_placeholder = SymbolWithCustomValueStackItem(-1, [], None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forget about this, what are the cases when None
needs to be stored as a custom value? I think it would be helpful to list them in the comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, when using with
with no errors, a None
is passed up to the finally
block to signify nothing went wrong. There might be other cases, this is the only one I've found so far, will add it to the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for the explanation. You can add to the comment that None
is pushed in ?? opcode and checked in WITH_CLEANUP_START
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, moved to LOAD_CONST as it's only being used there
cyberbrain/value_stack.py
Outdated
@@ -137,6 +217,7 @@ def emit_event_and_update_stack( | |||
""" | |||
self.snapshot = snapshot | |||
opname = instr.opname | |||
_placeholder.start_lineno = self.last_starts_line = lineno |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you think of a way that does not require modifying _placeholder
? It's a global value so we shouldn't modify it. Plus, I don't see where _placeholder.start_lineno
is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you LOAD_CONST and then STORE_VALUE, you push a _placeholder
onto the stack and then create an event off of that placeholder, taking its start_lineno
. Therefore, the lineno needs to either exist in the stack or every event creation needs to check if lineno is -1 and act accordingly (which may also cause issues if LOAD and STORE are on different lines). We could change _placeholder
to be a function that takes in a lineno and returns the corresponding SymbolStackItem instead or add a check in _push
(if _placeholder: set lineno).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I'm concerned with the current implementation because in the long term cb will support tracing multiple frames, and if they share a mutable _placeholder, things will not work. Now it looks like every frame should have its own placeholder. And since every frame has a value stack, an option is to move it to be an instance variable. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the most up to date version, _placeholder
is only used 8 times, so maybe it'd be better to just use the full form (SymbolStackItem(self.last_starts_line, [])
)? Otherwise yeah it makes sense to create a ValueStack attribute since it uses the ValueStack's last_starts_line
attribute anyway and would support multi-frame tracing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For simplicity, moving it inside the class feels like a better approach to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -201,35 +306,39 @@ def _tos(self, n): | |||
def _push(self, *values): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can get rid of this if..else
, it should now be safe to assume everything on the stack is a stack item instance, and that's the whole point of this migration: to formalize what is stored on the stack.
We should explictly convert things to SymbolStackItem
or CustomValueStackItem
before pushing them onto the stack. Since _placeholder
is already a CustomValueStackItem
, what's left is probably only those custom values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will make the handler code messier, especially for exception handling with all the CustomValueStackItems but is doable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm leaning towards doing it. It adds more clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Maybe add type annotation to values
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would need a union operator since we’re pushing both SymbolStackItems and CustomValueStackItems
LGTM'ed, only a nit left This is a long journey, but I'm glad we finally completed it. Thanks for the dedication and patience! |
This PR tries to address the problem found in #110. The problem with multiline linenos was fixed in Python3.8 as described in python/cpython#8774.
Instead of keeping random types of items on Cyberbrain's simulated value stack, the PR uses new StackItem classes.
The SymbolStackItem class contains the relevant sources and the starting line number: the smallest line number that corresponds to a bytecode operation relevant to that stack item. This will capture the starting line number for multi-line code segments better than relying on CPython's implementation; however, it is still not perfect as outlined at the end of this description.
The CustomValueStackItem class captures custom values such as Exceptions, which are saved under the
custom_value
attribute of the class.Since some values can be pushed onto the stack either as normal or custom values and we don't know which one until it's used, the SymbolWithCustomValueStackItem class provides the flexibility for both classes. For now, it is only used when LOAD_CONST pushes a None onto the stack but will be used more in the future.
In addition, added
return_list
option to_pop
method to enforce returning a list of StackItems even when we are only popping one item.Previous comment before solution was finalized:
For Python3.7, we can add the linenos to the value stack as we trace the bytecode. Then for any event, we take the smallest lineno from all of the bytecode operations for the event to estimate where the real line starts. Since some lines do not have any bytecode operation, this solution is not perfect. For the following example, currently, the corresponding event will have
lineno=4
. The proposed solution will change it to havelineno=2
. The perfect solution, aka what has been implemented in Python3.8 haslineno=1
.