Add Pytorch Profiler #473

Draft: wants to merge 2 commits into base: develop

Conversation

frobnitzem

Description

This adds a training callback that saves PyTorch profiler outputs, using the simple configuration from https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html.
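The callback's code is not shown in this thread. As a rough, hedged sketch of the tutorial configuration it follows, the plain `torch.profiler` loop below skips one step, warms up for one, records three, and writes TensorBoard-readable trace files. The `profile_training` helper, the toy linear model, and the random data are hypothetical stand-ins for a real training loop, and only CPU activity is profiled.

```python
import glob
import os
import tempfile

import torch
from torch.profiler import ProfilerActivity, profile, schedule, tensorboard_trace_handler


def profile_training(model, batches, log_dir):
    """Hypothetical helper: run a short training loop under the profiler, writing traces to log_dir."""
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    loss_fn = torch.nn.MSELoss()
    with profile(
        activities=[ProfilerActivity.CPU],  # add ProfilerActivity.CUDA for GPU runs
        # Skip 1 step, warm up for 1 step, record the next 3 steps, and do this once.
        schedule=schedule(wait=1, warmup=1, active=3, repeat=1),
        # Writes <worker>.<timestamp>.pt.trace.json files into log_dir.
        on_trace_ready=tensorboard_trace_handler(log_dir),
        record_shapes=True,
        profile_memory=True,
        with_stack=True,
    ) as prof:
        for x, y in batches:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
            prof.step()  # advance the profiler schedule once per training step


# Tiny usage example: a linear model trained on random data for 6 steps.
log_dir = tempfile.mkdtemp()
batches = [(torch.randn(16, 8), torch.randn(16, 1)) for _ in range(6)]
profile_training(torch.nn.Linear(8, 1), batches, log_dir)
print(glob.glob(os.path.join(log_dir, "*.pt.trace.json")))
```

In a real callback, the `profile` object would presumably be entered at the start of training and `prof.step()` called from a per-batch hook. The exact hook names depend on the trainer and are not given here.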

Motivation and Context

How Has This Been Tested?

This has been used to produce and view profile outputs.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds or improves functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation improvement (updates to user guides, docstrings, or developer docs)

Checklist:

  • My code follows the code style of this project and has been formatted using black.
  • All new and existing tests passed, including on GPU (if relevant).
  • I have added tests that cover my changes (if relevant).
  • The option documentation (docs/options) has been updated with new or changed options.
  • I have updated CHANGELOG.md.
  • I have updated the documentation (if relevant).

@frobnitzem changed the base branch from main to develop December 11, 2024 17:37
@Linux-cpp-lisp
Collaborator

Thanks @frobnitzem! Will be very happy to merge and have this feature.

One small thing: can you run the code through black: https://nequip.readthedocs.io/en/develop/dev/contributing.html#style-enforcement?

And one bigger thing:

The TensorBoard integration with the PyTorch profiler is now deprecated. Instead, use Perfetto or the Chrome trace to view trace.json files.
https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html

Would it make sense to add support in this callback for the newer export_chrome_trace API (https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html#using-tracing-functionality), with an option to switch between the older TensorBoard output and the newer Perfetto/Chrome tracing in the on_trace_ready callback?
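One possible shape for that switch is sketched below, under stated assumptions. `make_trace_handler` and its `mode` argument are hypothetical names, not part of PyTorch or of this PR. The function returns either the legacy `tensorboard_trace_handler` or a small handler that calls `prof.export_chrome_trace(...)`, whose output can be opened in Perfetto (https://ui.perfetto.dev) or `chrome://tracing`.

```python
import os
import tempfile

import torch
from torch.profiler import ProfilerActivity, profile, schedule, tensorboard_trace_handler


def make_trace_handler(out_dir, mode="chrome"):
    """Hypothetical factory returning an on_trace_ready callable for the chosen output format.

    mode="tensorboard": legacy TensorBoard plugin layout (*.pt.trace.json files).
    mode="chrome": plain Chrome trace JSON, viewable in Perfetto or chrome://tracing.
    """
    if mode == "tensorboard":
        return tensorboard_trace_handler(out_dir)
    if mode == "chrome":
        os.makedirs(out_dir, exist_ok=True)

        def handler(prof):
            # One file per completed profiling cycle, named after the step counter at save time.
            prof.export_chrome_trace(os.path.join(out_dir, f"trace_{prof.step_num}.json"))

        return handler
    raise ValueError(f"unknown trace mode: {mode!r}")


# Usage: profile a few dummy matmul "steps" and export a Chrome trace.
out_dir = tempfile.mkdtemp()
with profile(
    activities=[ProfilerActivity.CPU],
    schedule=schedule(wait=1, warmup=1, active=2, repeat=1),
    on_trace_ready=make_trace_handler(out_dir, mode="chrome"),
) as prof:
    for _ in range(5):
        torch.randn(64, 64) @ torch.randn(64, 64)
        prof.step()
print(os.listdir(out_dir))
```

Keeping the choice in a single factory means the rest of the callback does not need to know which viewer the user prefers.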

Finally, should any of the scheduler or profiler options be exposed optionally in the arguments to the callback?
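As a hedged illustration of exposing those options, the sketch below collects the scheduler and profiler settings in a dataclass and builds the `torch.profiler.profile` object from it. `ProfilerOptions`, its field names and defaults, and `build_profiler` are all hypothetical. Only the keyword arguments passed to `schedule` and `profile` are real PyTorch parameters.

```python
import glob
import os
import tempfile
from dataclasses import dataclass

import torch
from torch.profiler import ProfilerActivity, profile, schedule, tensorboard_trace_handler


@dataclass
class ProfilerOptions:
    """Hypothetical user-facing knobs for a profiling callback (names are illustrative)."""

    out_dir: str = "profiler_logs"
    wait: int = 1
    warmup: int = 1
    active: int = 3
    repeat: int = 1
    record_shapes: bool = True
    profile_memory: bool = True
    with_stack: bool = True
    use_cuda: bool = False


def build_profiler(opts: ProfilerOptions) -> profile:
    """Construct a torch.profiler.profile object from the options above."""
    activities = [ProfilerActivity.CPU]
    if opts.use_cuda and torch.cuda.is_available():
        activities.append(ProfilerActivity.CUDA)  # only request CUDA events when a GPU exists
    return profile(
        activities=activities,
        schedule=schedule(
            wait=opts.wait, warmup=opts.warmup, active=opts.active, repeat=opts.repeat
        ),
        on_trace_ready=tensorboard_trace_handler(opts.out_dir),
        record_shapes=opts.record_shapes,
        profile_memory=opts.profile_memory,
        with_stack=opts.with_stack,
    )


# Usage: override only what the user cares about and keep defaults for the rest.
opts = ProfilerOptions(out_dir=tempfile.mkdtemp(), active=2, with_stack=False)
with build_profiler(opts) as prof:
    for _ in range(6):
        torch.randn(32, 32).relu().sum()
        prof.step()
print(glob.glob(os.path.join(opts.out_dir, "*.pt.trace.json")))
```

Defaults that mirror the tutorial keep the zero-argument case simple. Users who profile long runs can raise `wait` to skip early, unrepresentative steps.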

Thanks again for the PR!!

@frobnitzem
Author

Still a work in progress. When I tried this in October, I got much better profiling information, so I will probably scrap the TensorBoard analysis in favor of the workflow you linked here.

@Linux-cpp-lisp
Collaborator

Sounds great, thanks!
