Add MI300 details to docs #446
base: develop
2763296 to de0b4ca
d490ba3 to f027f4d
start adding MI300 content Signed-off-by: Peter Park <[email protected]>
Signed-off-by: Peter Park <[email protected]>
add anchor
db79d4e to 809ebb9
add anchor
d72d1aa to 6a44753
Very good start!
docs/conceptual/l2-cache.rst
Outdated
system supports a maximum of two instances. In contrast, the CDNA3-based
:ref:`MI300 <mixxx-note>` accelerator features 16 channels per XCD, each with a
capacity of 256KB and also utilizing 256B address interleaving, allowing for a
total of up to *eight* instances. Incoming requests are mapped to specific L2
total of up to eight instances (one per XCD)
Updated to this:
...
The L2 cache consists of several distinct channels. The CDNA3-based :ref:`MI300 <mixxx-note>`
accelerator consists of 16 channels, each with a capacity of 256KB and utilizing
256B address interleaving. These channels can operate largely independently and
the system supports up to 8 instances (one per XCD). In contrast, the
:ref:`MI200 <mixxx-note>` and earlier CDNA accelerators have 32 L2 cache
channels, each using 256B address interleaving, but support a maximum of only 2
instances. ...
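For anyone checking the numbers in this paragraph, here is a minimal sketch of the arithmetic. The function names are hypothetical, and the plain modulo mapping from address to channel is an assumption made only to illustrate what 256B interleaving means; the actual hardware address hash is not described in this thread.

```python
# Illustrative sketch only. The channel counts and sizes come from the
# paragraph above; the modulo mapping is an ASSUMPTION, not the documented
# hardware behavior.
KIB = 1024
MIB = 1024 * KIB


def l2_capacity_bytes(channels: int, channel_kib: int) -> int:
    """Capacity of one L2 instance: channels x per-channel capacity."""
    return channels * channel_kib * KIB


def channel_for_address(addr: int, channels: int, interleave_bytes: int = 256) -> int:
    """Hypothetical mapping: consecutive 256B blocks rotate across channels."""
    return (addr // interleave_bytes) % channels


# MI300 figures quoted in the paragraph: 16 channels of 256KB per XCD,
# and up to 8 L2 instances (one per XCD).
per_xcd = l2_capacity_bytes(channels=16, channel_kib=256)
total = per_xcd * 8

print(f"L2 per XCD: {per_xcd // MIB} MiB")
print(f"L2 across 8 XCDs: {total // MIB} MiB")

# Addresses in the same 256B block share a channel; the next block moves on.
for addr in (0, 255, 256, 16 * 256):
    print(hex(addr), "->", channel_for_address(addr, channels=16))
```

Under these assumptions, addresses 0 and 255 land in the same channel, address 256 lands in the next one, and the pattern wraps after 16 blocks.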
I... think for MI200, at best it would be "up to two instances, one per GCD". And MI100 has 16 channels :P
I forget how we've generally discussed MI200's GCD stuff in these docs, but I think typically we just talk about them like they're entirely separate GPUs.
I would probably do:
The L2 cache consists of several distinct channels. The CDNA3-based :ref:MI300
accelerator consists of 16 channels, each with a capacity of 256KB and utilizing
256B address interleaving. These channels can operate largely independently and
the system supports up to 8 total L2 cache instances (one per XCD). In contrast, the
:ref:MI200 CDNA accelerators have 32 L2 cache
channels, each using 256B address interleaving, and MI100 CDNA accelerators / GCN GPUs have only 16 L2 cache channels. ...
cc: @feizheng10 any thoughts?
Updated c7bc7bd
.. list-table::
   :header-rows: 1

   * - Feature
Table seems weird with just one entry right now, but I'm sure we had ideas on how to fill it :P
Oh, right, we probably want to (eventually) take a second pass and go through to find places where we distinguish values based on the architecture, like the waveslots discussion below (or AGPRs), and add them here.
That can probably wait till this is ~ finalized though
d1528cc to 95b600e
This PR updates the documentation with info about the MI300 series
Performance model
L1
UTCL1
L2
Update L2 cache line size to 128B for MI300 (https://advanced-micro-devices-demo--446.com.readthedocs.build/projects/rocprofiler-compute/en/446/conceptual/l2-cache.html#l2-cache-line-size) (128B for MI300 and MI200).
Update channel count in text for MI300 (https://advanced-micro-devices-demo--446.com.readthedocs.build/projects/rocprofiler-compute/en/446/conceptual/l2-cache.html#l2-cache-tcc)
Atomic requests (https://advanced-micro-devices-demo--446.com.readthedocs.build/projects/rocprofiler-compute/en/446/conceptual/l2-cache.html#request-flow)
Add 128B read request metric to table (https://advanced-micro-devices-demo--446.com.readthedocs.build/projects/rocprofiler-compute/en/446/conceptual/l2-cache.html#detailed-transaction-metrics)
VALU
Add MI300 to list of products with MFMA units (https://advanced-micro-devices-demo--446.com.readthedocs.build/projects/rocprofiler-compute/en/446/conceptual/pipeline-descriptions.html#vector-arithmetic-logic-unit-valu)
AGPRs
Scalar / Instruction cache