Add MI300 details to docs #446

peterjunpark · 2024-10-09T17:34:21Z

This PR updates the documentation with information about the MI300 series.

Performance model

start adding MI300 content Signed-off-by: Peter Park <[email protected]>

add anchor

skyreflectedinmirrors

Very good start!

skyreflectedinmirrors · 2025-01-28T21:31:08Z

docs/conceptual/l2-cache.rst

+system supports a maximum of two instances. In contrast, the CDNA3-based
+:ref:`MI300 <mixxx-note>` accelerator features 16 channels per XCD, each with a
+capacity of 256KB and also utilizing 256B address interleaving, allowing for a
+total of up to *eight* instances. Incoming requests are mapped to specific L2


total of up to eight instances (one per XCD)

Updated to this:
...
The L2 cache consists of several distinct channels. The CDNA3-based :ref:`MI300 <mixxx-note>`
accelerator consists of 16 channels each with a capacity of 256KB and utilizing
256B address interleaving. These channels can operate largely independently and
the system supports up to 8 instances (one per XCD). In contrast, the
:ref:`MI200 <mixxx-note>` and earlier CDNA accelerators have 32 L2 cache
channels each using 256B address interleaving, but only support a maximum of 2
instances. ...
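
For anyone following the thread, here is a rough sketch of what "16 channels with 256B address interleaving" implies for where consecutive addresses land. It assumes a plain modulo mapping, which this PR does not confirm; the real hardware may hash address bits differently. `l2_channel` is a hypothetical helper, not part of any tool.

```python
CHANNEL_INTERLEAVE_BYTES = 256  # interleaving granularity quoted in the text above
NUM_CHANNELS = 16               # L2 channels per XCD quoted for MI300


def l2_channel(address: int, num_channels: int = NUM_CHANNELS) -> int:
    """Map a byte address to a channel index.

    Assumption: simple modulo interleaving. Actual hardware may use a
    different (hashed) mapping.
    """
    return (address // CHANNEL_INTERLEAVE_BYTES) % num_channels


if __name__ == "__main__":
    # Addresses within the same 256B block share a channel; the next
    # block moves to the next channel and wraps after 16 blocks.
    for addr in (0x0000, 0x00FF, 0x0100, 0x0F00, 0x1000):
        print(f"{addr:#06x} -> channel {l2_channel(addr)}")
```

Under this assumed mapping, a streaming access pattern touches all 16 channels once every 4096 bytes.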

I... think for MI200, at best it would be "up to two instances, one per GCD". And MI100 has 16 channels :P

I forget how we've generally discussed MI200's GCD stuff in these docs, but I think typically we just talk about them like they're entirely separate GPUs.

I would probably do:

The L2 cache consists of several distinct channels. The CDNA3-based :ref:MI300
accelerator consists of 16 channels each with a capacity of 256KB and utilizing
256B address interleaving. These channels can operate largely independently and
the system supports up to 8 total L2 cache instances (one per XCD). In contrast, the
:ref:MI200 CDNA accelerators have 32 L2 cache
channels each using 256B address interleaving, and MI100 CDNA accelerators / GCN GPUs have only 16 L2 cache channels. ...

cc: @feizheng10 any thoughts?
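
As a quick sanity check on the MI300 numbers in the proposed wording, the sketch below multiplies out the channel count, per-channel capacity, and instance count quoted above. It assumes "256KB" means 256 KiB; `l2_capacity_bytes` is a hypothetical helper used only for this arithmetic.

```python
KIB = 1024
MIB = 1024 * KIB


def l2_capacity_bytes(channels: int, channel_kib: int, instances: int = 1) -> int:
    """Total L2 capacity in bytes, assuming every channel has the same size."""
    return channels * channel_kib * KIB * instances


# Figures quoted in the thread for MI300: 16 channels x 256KB, up to 8 instances.
per_xcd = l2_capacity_bytes(channels=16, channel_kib=256)
all_xcds = l2_capacity_bytes(channels=16, channel_kib=256, instances=8)

if __name__ == "__main__":
    print(f"per XCD: {per_xcd // MIB} MiB")
    print(f"8 XCDs:  {all_xcds // MIB} MiB")
```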

Updated c7bc7bd

docs/conceptual/l2-cache.rst

skyreflectedinmirrors · 2025-01-28T21:38:51Z

docs/conceptual/performance-model.rst

+.. list-table::
+   :header-rows: 1
+
+   * - Feature


Table seems weird with just one entry right now, but I'm sure we had ideas on how to fill it :P

Oh, right, we probably want to (eventually) take a second pass and go through to find places where we distinguish values based on the architecture, like the waveslots discussion below (or AGPRs), and add them here.

That can probably wait till this is ~ finalized though

docs/conceptual/vector-l1-cache.rst

docs/tutorial/includes/infinity-fabric-transactions.rst

add ref

peterjunpark added the documentation Improvements or additions to documentation label Oct 9, 2024

peterjunpark force-pushed the docs/mi300 branch from 2763296 to de0b4ca Compare October 10, 2024 17:38

peterjunpark force-pushed the docs/mi300 branch 2 times, most recently from d490ba3 to f027f4d Compare January 22, 2025 19:30

peterjunpark added 7 commits January 22, 2025 15:23

Change MI2XX to MI200. Add MI300 note

b3943e2

start adding MI300 content Signed-off-by: Peter Park <[email protected]>

Fix request flow image sizes

b43df09

Signed-off-by: Peter Park <[email protected]>

update UTCL1 hit-on-miss note to specify MI200

29c451a

update sL1D size and how many CUs it's shared between

bcf1464

WIP: update L2 channel count for MI300

26615c0

update wording

28f7ad2

add anchor

WIP: supported features table

809ebb9

peterjunpark force-pushed the docs/mi300 branch from db79d4e to 809ebb9 Compare January 22, 2025 20:23

update l2 cache line size note for mi300

6a44753

add anchor

peterjunpark force-pushed the docs/mi300 branch from d72d1aa to 6a44753 Compare January 22, 2025 20:42

peterjunpark added 9 commits January 23, 2025 09:33

update agprs section

6db105c

update l2 cache section

42fc830

add 128 read request to L2-Fabric transaction table

ccf2c62

update intro

acc1ba1

update atomic requests for mi300

73e1994

fix links

3129cab

fix link

b440e35

fix formatting issue

d72c261

wording

d5b3368

peterjunpark changed the base branch from amd-staging to develop January 23, 2025 19:03

skyreflectedinmirrors requested changes Jan 28, 2025

vedithal-amd force-pushed the develop branch from 918082e to da1bd04 Compare January 29, 2025 19:35

peterjunpark added 4 commits January 29, 2025 14:38

update "one per XCD" to l2 cache description

2bf9580

clarify l2 transaction sizes text

b1670c8

wording

201d0d0

add ref

update l2 cache text

c7bc7bd

vedithal-amd force-pushed the develop branch 11 times, most recently from d1528cc to 95b600e Compare January 31, 2025 20:12

add back image width

98c356e


skyreflectedinmirrors left a comment

skyreflectedinmirrors Jan 28, 2025

peterjunpark Jan 29, 2025

skyreflectedinmirrors Jan 29, 2025

peterjunpark Feb 5, 2025

skyreflectedinmirrors Jan 28, 2025

skyreflectedinmirrors Jan 28, 2025


Performance model

L1

UTCL1

L2

VALU

AGPRs

Scalar / Instruction cache
