Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix multiline linenos for Python3.7 #132

Merged
merged 36 commits into from
Jan 3, 2022

Conversation

victorjzsun
Copy link
Collaborator

@victorjzsun victorjzsun commented Aug 25, 2021

This PR tries to address the problem found in #110. The problem with multiline linenos was fixed in Python3.8 as described in python/cpython#8774.

Instead of keeping random types of items on Cyberbrain's simulated value stack, the PR uses new StackItem classes.

  • The SymbolStackItem class contains the relevant sources and the starting line number: the smallest line number that corresponds to a bytecode operation relevant to that stack item. This will capture the starting line number for multi-line code segments better than relying on CPython's implementation; however, it is still not perfect as outlined at the end of this description.

  • The CustomValueStackItem class captures custom values such as Exceptions, which are saved under the custom_value attribute of the class.

  • Since some values can be pushed onto the stack either as normal or custom values and we don't know which one until it's used, the SymbolWithCustomValueStackItem class provides the flexibility for both classes. For now, it is only used when LOAD_CONST pushes a None onto the stack but will be used more in the future.

In addition, added return_list option to _pop method to enforce returning a list of StackItems even when we are only popping one item.


Previous comment before solution was finalized:

For Python3.7, we can add the linenos to the value stack as we trace the bytecode. Then for any event, we take the smallest lineno from all of the bytecode operations for the event to estimate where the real line starts. Since some lines do not have any bytecode operation, this solution is not perfect. For the following example, currently, the corresponding event will have lineno=4. The proposed solution will change it to have lineno=2. The perfect solution, aka what has been implemented in Python3.8 has lineno=1.

s = [ # line 1
  a,  # line 2
  b,  # line 3
  c   # line 4
]     # line 5

@laike9m
Copy link
Owner

laike9m commented Aug 27, 2021

As a general feedback, I think the complexity the current implementation brings outweighs the benefits. I don't like the idea of adding another Value class, and I would hope to see metadata (like lineno) added to the Symbol class. Could you try moving towards that direction and see if it works?

@victorjzsun
Copy link
Collaborator Author

victorjzsun commented Aug 27, 2021

However, we quite often use placeholders with no Symbols in value stack when instructions are executed which throws away lineno info. Take the following example:

LOAD_CONST lineno=2
STORE_FAST lineno=None

If we push an empty placeholder for the LOAD_CONST instruction, we wouldn't have any information on the lineno for the STORE_FAST instruction. We could wrap every piece in value stack as a Symbol class but that's the same as my implementation

@laike9m
Copy link
Owner

laike9m commented Aug 28, 2021

(outdated) You're right. How about this, we can create a class called like StackItem, and put lineno there. Let Symbol inherit from StackItem. Then, for placeholders and errors, we wrap it with StackItem to make sure that everything on the stack is a StackItem object or a List[StackItem].

To be honest I also find having a list on the stack not ideal (for a long time), so perhaps we can even make StackItem be able to act as a container and hold multiple symbols, so we don't need to have a list sometimes, and it's always a StackItem object that's on the stack.

This is just a thought and we don't have to go this way. Let me know what you think.

@victorjzsun victorjzsun marked this pull request as ready for review November 23, 2021 22:55
cyberbrain/value_stack.py Outdated Show resolved Hide resolved
cyberbrain/value_stack.py Outdated Show resolved Hide resolved
cyberbrain/value_stack.py Outdated Show resolved Hide resolved
cyberbrain/value_stack.py Outdated Show resolved Hide resolved
cyberbrain/value_stack.py Outdated Show resolved Hide resolved
return "{" + repr(self.start_lineno) + " " + repr(self.sources) + "}"

def get_top_source(self) -> Symbol:
# This happens when variables (e.g. globals) that we don't care about are mutated.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite get it

def __repr__(self) -> str:
return "{" + repr(self.start_lineno) + " " + repr(self.sources) + "}"

def get_top_source(self) -> Symbol:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be better to write?

@Property
def top_source

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll change it

cyberbrain/value_stack.py Outdated Show resolved Hide resolved
class BaseValueStack:
"""Class that simulates the a frame's value stack.

This class contains instr handlers that are the same across different versions.
"""

def __init__(self):
self.stack = []
self.stack: List[SymbolStackItem | CustomValueStackItem] = []
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't use this because we need to support older versions
https://www.python.org/dev/peps/pep-0604/ is added in 3.10

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto for other usages of |

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it

cyberbrain/value_stack.py Outdated Show resolved Hide resolved
@@ -187,6 +268,30 @@ def tos1(self):
def tos2(self):
return self._tos(2)

@staticmethod
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method probably should live outside of the class

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, merge_stack_item feels a more appropriate name

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had this before but you mentioned in #132 (comment) to change it

Copy link
Owner

@laike9m laike9m Dec 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, apologize for the confusion, let me explain. Back then we were talking about having a single StackItem class, so a factory method makes more sense. Now we have two separate StackItem classes, and this method can return either of them, so we can't really put it into those classes. Meanwhile, even in the first comment I didn't mean to put it into the BaseValueStack class, it feels a bit weird

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed


# Placeholder for a None on the stack
# Could be a None that is stored in a variable or a None used by exception handling.
_none_placeholder = SymbolWithCustomValueStackItem(-1, [], None)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forget about this, what are the cases when None needs to be stored as a custom value? I think it would be helpful to list them in the comments

Copy link
Collaborator Author

@victorjzsun victorjzsun Dec 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, when using with with no errors, a None is passed up to the finally block to signify nothing went wrong. There might be other cases, this is the only one I've found so far, will add it to the comment

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for the explanation. You can add to the comment that None is pushed in ?? opcode and checked in WITH_CLEANUP_START

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, moved to LOAD_CONST as it's only being used there

@@ -137,6 +217,7 @@ def emit_event_and_update_stack(
"""
self.snapshot = snapshot
opname = instr.opname
_placeholder.start_lineno = self.last_starts_line = lineno
Copy link
Owner

@laike9m laike9m Dec 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you think of a way that does not require modifying _placeholder? It's a global value so we shouldn't modify it. Plus, I don't see where _placeholder.start_lineno is used.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you LOAD_CONST and then STORE_VALUE, you push a _placeholder onto the stack and then create an event off of that placeholder, taking its start_lineno. Therefore, the lineno needs to either exist in the stack or every event creation needs to check if lineno is -1 and act accordingly (which may also cause issues if LOAD and STORE are on different lines). We could change _placeholder to be a function that takes in a lineno and returns the corresponding SymbolStackItem instead or add a check in _push (if _placeholder: set lineno).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I'm concerned with the current implementation because in the long term cb will support tracing multiple frames, and if they share a mutable _placeholder, things will not work. Now it looks like every frame should have its own placeholder. And since every frame has a value stack, an option is to move it to be an instance variable. wdyt?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the most up to date version, _placeholder is only used 8 times, so maybe it'd be better to just use the full form (SymbolStackItem(self.last_starts_line, []))? Otherwise yeah it makes sense to create a ValueStack attribute since it uses the ValueStack's last_starts_line attribute anyway and would support multi-frame tracing.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For simplicity, moving it inside the class feels like a better approach to me

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -201,35 +306,39 @@ def _tos(self, n):
def _push(self, *values):
Copy link
Owner

@laike9m laike9m Dec 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can get rid of this if..else, it should now be safe to assume everything on the stack is a stack item instance, and that's the whole point of this migration: to formalize what is stored on the stack.

We should explictly convert things to SymbolStackItem or CustomValueStackItem before pushing them onto the stack. Since _placeholder is already a CustomValueStackItem, what's left is probably only those custom values.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will make the handler code messier, especially for exception handling with all the CustomValueStackItems but is doable

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaning towards doing it. It adds more clarity.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Maybe add type annotation to values?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would need a union operator since we’re pushing both SymbolStackItems and CustomValueStackItems

cyberbrain/value_stack.py Outdated Show resolved Hide resolved
@laike9m
Copy link
Owner

laike9m commented Jan 3, 2022

LGTM'ed, only a nit left

This is a long journey, but I'm glad we finally completed it. Thanks for the dedication and patience!

@victorjzsun victorjzsun merged commit 4f2303f into laike9m:master Jan 3, 2022
@victorjzsun victorjzsun deleted the fix-multiline branch September 9, 2022 02:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants