Refactor CUB transform #3825

Open
wants to merge 7 commits into
base: main

@bernhardmgruber (Contributor) commented Feb 15, 2025

  • Check SASS diff for cub.test.device_transform.lid_0 for SM 80;90;100 - small address changes
  • Benchmark on B200

@bernhardmgruber requested a review from a team as a code owner on February 15, 2025 13:36
@bernhardmgruber force-pushed the ref_transform branch 2 times, most recently from 786cae9 to 962dd6e, on February 17, 2025 17:45

🟨 CI finished in 2h 50m: Pass: 78%/93 | Total: 2d 07h | Avg: 35m 49s | Max: 1h 46m | Hits: 77%/104330
  • 🟨 cub: Pass: 77%/45 | Total: 1d 12h | Avg: 48m 10s | Max: 1h 46m | Hits: 73%/41683

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  76%/43  | Total:  1d 10h | Avg: 47m 40s | Max:  1h 46m | Hits:  73%/39273 
      🟩 arm64              Pass: 100%/2   | Total:  1h 58m | Avg: 59m 03s | Max:  1h 00m | Hits:  68%/2410  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 54m | Avg: 57m 29s | Max: 57m 48s | Hits:  74%/2082  
      🔍 nvcc               Pass:  76%/43  | Total:  1d 10h | Avg: 47m 44s | Max:  1h 46m | Hits:  73%/39601 
    🟨 ctk
      🟨 12.0               Pass:  60%/5   | Total:  3h 38m | Avg: 43m 47s | Max: 59m 57s | Hits:  68%/3621  
      🟩 12.5               Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 05m | Hits:  68%/2228  
      🟨 12.8               Pass:  78%/38  | Total:  1d 06h | Avg: 47m 53s | Max:  1h 46m | Hits:  74%/35834 
    🟨 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 54m | Avg: 57m 29s | Max: 57m 48s | Hits:  74%/2082  
      🟨 nvcc12.0           Pass:  60%/5   | Total:  3h 38m | Avg: 43m 47s | Max: 59m 57s | Hits:  68%/3621  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 05m | Hits:  68%/2228  
      🟨 nvcc12.8           Pass:  77%/36  | Total:  1d 04h | Avg: 47m 21s | Max:  1h 46m | Hits:  74%/33752 
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 58m | Avg: 59m 34s | Max:  1h 01m | Hits:  68%/4828  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 57m | Avg: 58m 49s | Max:  1h 00m | Hits:  68%/2410  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 55m | Avg: 57m 44s | Max: 58m 12s | Hits:  68%/2410  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 54m | Avg: 57m 22s | Max: 57m 46s | Hits:  68%/2410  
      🟩 Clang18            Pass: 100%/7   | Total:  5h 35m | Avg: 47m 58s | Max:  1h 02m | Hits:  79%/8107  
      🟥 GCC7               Pass:   0%/2   | Total:  7m 57s | Avg:  3m 58s | Max:  5m 46s
      🟩 GCC8               Pass: 100%/1   | Total: 59m 46s | Avg: 59m 46s | Max: 59m 46s | Hits:  68%/1207  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 50m | Avg: 55m 21s | Max: 55m 50s | Hits:  68%/2414  
      🟩 GCC10              Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 03m | Hits:  68%/2414  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 51m | Avg: 55m 55s | Max: 57m 54s | Hits:  68%/2410  
      🟩 GCC12              Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 04m | Hits:  68%/2410  
      🟨 GCC13              Pass:  63%/11  | Total:  6h 42m | Avg: 36m 37s | Max:  1h 46m | Hits:  81%/8435  
      🟥 MSVC14.29          Pass:   0%/2   | Total:  1h 26m | Avg: 43m 21s | Max: 45m 40s
      🟥 MSVC14.42          Pass:   0%/2   | Total:  1h 28m | Avg: 44m 06s | Max: 45m 37s
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 05m | Hits:  68%/2228  
    🟨 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 15h 22m | Avg: 54m 14s | Max:  1h 02m | Hits:  73%/20165 
      🟨 GCC                Pass:  72%/22  | Total: 15h 41m | Avg: 42m 48s | Max:  1h 46m | Hits:  74%/19290 
      🟥 MSVC               Pass:   0%/4   | Total:  2h 54m | Avg: 43m 43s | Max: 45m 40s
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 05m | Hits:  68%/2228  
    🟨 gpu
      🟥 h100               Pass:   0%/3   | Total: 12m 44s | Avg:  4m 14s | Max: 12m 44s
      🟨 rtx2080            Pass:  79%/34  | Total:  1d 06h | Avg: 53m 25s | Max:  1h 05m | Hits:  68%/32043 
      🟩 rtxa6000           Pass: 100%/8   | Total:  5h 38m | Avg: 42m 19s | Max:  1h 46m | Hits:  88%/9640  
    🟨 jobs
      🟨 Build              Pass:  78%/37  | Total:  1d 08h | Avg: 52m 47s | Max:  1h 05m | Hits:  68%/34453 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 22m 23s | Avg: 22m 23s | Max: 22m 23s | Hits:  99%/1205  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 27s | Avg: 16m 27s | Max: 16m 27s | Hits:  99%/1205  
      🟨 HostLaunch         Pass:  66%/3   | Total: 48m 01s | Avg: 16m 00s | Max: 24m 54s | Hits:  99%/2410  
      🟨 TestGPU            Pass:  66%/3   | Total:  2h 07m | Avg: 42m 29s | Max:  1h 46m | Hits:  84%/2410  
    🟥 sm
      🟥 90                 Pass:   0%/3   | Total: 12m 44s | Avg:  4m 14s | Max: 12m 44s
      🟥 90;90a;100         Pass:   0%/1   | Total: 40m 11s | Avg: 40m 11s | Max: 40m 11s
    🟨 std
      🟨 17                 Pass:  75%/20  | Total: 16h 57m | Avg: 50m 52s | Max:  1h 05m | Hits:  68%/17832 
      🟨 20                 Pass:  80%/25  | Total: 19h 10m | Avg: 46m 00s | Max:  1h 46m | Hits:  76%/23851 
    
  • 🟨 thrust: Pass: 77%/45 | Total: 18h 38m | Avg: 24m 51s | Max: 52m 06s | Hits: 79%/62351

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  76%/43  | Total: 17h 46m | Avg: 24m 48s | Max: 52m 06s | Hits:  79%/58788 
      🟩 arm64              Pass: 100%/2   | Total: 52m 22s | Avg: 26m 11s | Max: 27m 23s | Hits:  77%/3563  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 49m 43s | Avg: 24m 51s | Max: 25m 28s | Hits:  77%/3562  
      🔍 nvcc               Pass:  76%/43  | Total: 17h 49m | Avg: 24m 51s | Max: 52m 06s | Hits:  79%/58789 
    🟨 ctk
      🟨 12.0               Pass:  60%/5   | Total:  2h 04m | Avg: 24m 58s | Max: 35m 38s | Hits:  77%/5344  
      🟩 12.5               Pass: 100%/2   | Total:  1h 39m | Avg: 49m 53s | Max: 52m 06s | Hits:  65%/3562  
      🟨 12.8               Pass:  78%/38  | Total: 14h 54m | Avg: 23m 31s | Max: 38m 09s | Hits:  81%/53445 
    🟨 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 49m 43s | Avg: 24m 51s | Max: 25m 28s | Hits:  77%/3562  
      🟨 nvcc12.0           Pass:  60%/5   | Total:  2h 04m | Avg: 24m 58s | Max: 35m 38s | Hits:  77%/5344  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 39m | Avg: 49m 53s | Max: 52m 06s | Hits:  65%/3562  
      🟨 nvcc12.8           Pass:  77%/36  | Total: 14h 04m | Avg: 23m 27s | Max: 38m 09s | Hits:  81%/49883 
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  1h 54m | Avg: 28m 44s | Max: 29m 54s | Hits:  77%/7124  
      🟩 Clang15            Pass: 100%/2   | Total: 56m 24s | Avg: 28m 12s | Max: 28m 57s | Hits:  77%/3562  
      🟩 Clang16            Pass: 100%/2   | Total: 55m 14s | Avg: 27m 37s | Max: 28m 21s | Hits:  77%/3562  
      🟩 Clang17            Pass: 100%/2   | Total: 58m 14s | Avg: 29m 07s | Max: 29m 26s | Hits:  77%/3562  
      🟩 Clang18            Pass: 100%/7   | Total:  2h 30m | Avg: 21m 29s | Max: 29m 34s | Hits:  83%/12467 
      🟥 GCC7               Pass:   0%/2   | Total:  6m 50s | Avg:  3m 25s | Max:  4m 43s
      🟩 GCC8               Pass: 100%/1   | Total: 28m 42s | Avg: 28m 42s | Max: 28m 42s | Hits:  77%/1782  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 01m | Avg: 30m 34s | Max: 30m 52s | Hits:  77%/3564  
      🟩 GCC10              Pass: 100%/2   | Total: 58m 06s | Avg: 29m 03s | Max: 29m 12s | Hits:  77%/3564  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 01m | Avg: 30m 56s | Max: 31m 23s | Hits:  77%/3564  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 04m | Avg: 32m 08s | Max: 32m 23s | Hits:  77%/3564  
      🟨 GCC13              Pass:  70%/10  | Total:  2h 36m | Avg: 15m 39s | Max: 32m 05s | Hits:  86%/12474 
      🟥 MSVC14.29          Pass:   0%/2   | Total:  1h 10m | Avg: 35m 25s | Max: 35m 38s
      🟥 MSVC14.42          Pass:   0%/3   | Total:  1h 15m | Avg: 25m 10s | Max: 38m 09s
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 39m | Avg: 49m 53s | Max: 52m 06s | Hits:  65%/3562  
    🟨 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  7h 15m | Avg: 25m 36s | Max: 29m 54s | Hits:  79%/30277 
      🟨 GCC                Pass:  76%/21  | Total:  7h 17m | Avg: 20m 50s | Max: 32m 23s | Hits:  81%/28512 
      🟥 MSVC               Pass:   0%/5   | Total:  2h 26m | Avg: 29m 16s | Max: 38m 09s
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 39m | Avg: 49m 53s | Max: 52m 06s | Hits:  65%/3562  
    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 38m 16s | Avg: 19m 08s | Max: 27m 00s | Hits:  88%/3564  
    🟨 gpu
      🟥 h100               Pass:   0%/2   | Total:  4m 14s | Avg:  2m 07s | Max:  4m 14s
      🟨 rtx2080            Pass:  81%/33  | Total: 15h 42m | Avg: 28m 33s | Max: 52m 06s | Hits:  76%/48098 
      🟨 rtx4090            Pass:  80%/10  | Total:  2h 52m | Avg: 17m 12s | Max: 38m 09s | Hits:  91%/14253 
    🟨 jobs
      🟨 Build              Pass:  78%/38  | Total: 17h 50m | Avg: 28m 10s | Max: 52m 06s | Hits:  76%/53443 
      🟨 TestCPU            Pass:  66%/3   | Total: 15m 38s | Avg:  5m 12s | Max:  8m 03s | Hits:  99%/3563  
      🟨 TestGPU            Pass:  75%/4   | Total: 32m 25s | Avg:  8m 06s | Max: 11m 16s | Hits:  99%/5345  
    🟥 sm
      🟥 90                 Pass:   0%/2   | Total:  4m 14s | Avg:  2m 07s | Max:  4m 14s
      🟥 90;90a;100         Pass:   0%/1   | Total:  6m 18s | Avg:  6m 18s | Max:  6m 18s
    🟨 std
      🟨 17                 Pass:  75%/20  | Total:  9h 31m | Avg: 28m 33s | Max: 47m 40s | Hits:  76%/26722 
      🟨 20                 Pass:  78%/23  | Total:  8h 29m | Avg: 22m 09s | Max: 52m 06s | Hits:  81%/32065 
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 15m 15s | Avg: 7m 37s | Max: 12m 55s | Hits: 98%/296

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 15m 15s | Avg:  7m 37s | Max: 12m 55s | Hits:  98%/296   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 15m 15s | Avg:  7m 37s | Max: 12m 55s | Hits:  98%/296   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 15m 15s | Avg:  7m 37s | Max: 12m 55s | Hits:  98%/296   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 15m 15s | Avg:  7m 37s | Max: 12m 55s | Hits:  98%/296   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 15m 15s | Avg:  7m 37s | Max: 12m 55s | Hits:  98%/296   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 15m 15s | Avg:  7m 37s | Max: 12m 55s | Hits:  98%/296   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 15m 15s | Avg:  7m 37s | Max: 12m 55s | Hits:  98%/296   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 20s | Avg:  2m 20s | Max:  2m 20s | Hits:  98%/148   
      🟩 Test               Pass: 100%/1   | Total: 12m 55s | Avg: 12m 55s | Max: 12m 55s | Hits:  98%/148   
    
  • 🟩 python: Pass: 100%/1 | Total: 29m 52s | Avg: 29m 52s | Max: 29m 52s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 29m 52s | Avg: 29m 52s | Max: 29m 52s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 29m 52s | Avg: 29m 52s | Max: 29m 52s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 29m 52s | Avg: 29m 52s | Max: 29m 52s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 29m 52s | Avg: 29m 52s | Max: 29m 52s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 29m 52s | Avg: 29m 52s | Max: 29m 52s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 29m 52s | Avg: 29m 52s | Max: 29m 52s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 29m 52s | Avg: 29m 52s | Max: 29m 52s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 29m 52s | Avg: 29m 52s | Max: 29m 52s
    

👃 Inspect Changes

Modifications in project?

     Project
     CCCL Infrastructure
     libcu++
+/-  CUB
     Thrust
     CUDA Experimental
     python
     CCCL C Parallel Library
     Catch2Helper

Modifications in project or dependencies?

     Project
     CCCL Infrastructure
     libcu++
+/-  CUB
+/-  Thrust
     CUDA Experimental
+/-  python
+/-  CCCL C Parallel Library
+/-  Catch2Helper

🏃‍ Runner counts (total jobs: 93)

# Runner
66 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1

@bernhardmgruber force-pushed the ref_transform branch 2 times, most recently from f5acdc1 to 0ffb845, on February 18, 2025 08:22

🟨 CI finished in 1h 36m: Pass: 88%/93 | Total: 2d 09h | Avg: 37m 19s | Max: 1h 11m | Hits: 78%/118216
  • 🟨 cub: Pass: 86%/45 | Total: 1d 13h | Avg: 49m 42s | Max: 1h 11m | Hits: 75%/46659

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  86%/43  | Total:  1d 11h | Avg: 49m 19s | Max:  1h 11m | Hits:  75%/44241 
      🟩 arm64              Pass: 100%/2   | Total:  1h 55m | Avg: 57m 54s | Max: 59m 17s | Hits:  68%/2418  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 55m | Avg: 57m 39s | Max: 58m 08s | Hits:  74%/2090  
      🔍 nvcc               Pass:  86%/43  | Total:  1d 11h | Avg: 49m 19s | Max:  1h 11m | Hits:  75%/44569 
    🔍 gpu: rtx2080 🔍
      🟩 h100               Pass: 100%/3   | Total:  1h 15m | Avg: 25m 00s | Max: 26m 21s | Hits:  89%/3627  
      🔍 rtx2080            Pass:  82%/34  | Total:  1d 07h | Avg: 56m 10s | Max:  1h 11m | Hits:  69%/33360 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 11m | Avg: 31m 26s | Max:  1h 00m | Hits:  91%/9672  
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  83%/37  | Total:  1d 10h | Avg: 55m 32s | Max:  1h 11m | Hits:  69%/36987 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 58s | Avg: 21m 58s | Max: 21m 58s | Hits:  99%/1209  
      🟩 GraphCapture       Pass: 100%/1   | Total: 17m 11s | Avg: 17m 11s | Max: 17m 11s | Hits:  99%/1209  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 14m | Avg: 24m 44s | Max: 25m 41s | Hits:  99%/3627  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 07m | Avg: 22m 39s | Max: 23m 13s | Hits:  99%/3627  
    🟨 ctk
      🟨 12.0               Pass:  60%/5   | Total:  4h 08m | Avg: 49m 44s | Max:  1h 00m | Hits:  68%/3633  
      🟩 12.5               Pass: 100%/2   | Total:  2h 19m | Avg:  1h 09m | Max:  1h 10m | Hits:  68%/2236  
      🟨 12.8               Pass:  89%/38  | Total:  1d 06h | Avg: 48m 38s | Max:  1h 11m | Hits:  76%/40790 
    🟨 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 55m | Avg: 57m 39s | Max: 58m 08s | Hits:  74%/2090  
      🟨 nvcc12.0           Pass:  60%/5   | Total:  4h 08m | Avg: 49m 44s | Max:  1h 00m | Hits:  68%/3633  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 19m | Avg:  1h 09m | Max:  1h 10m | Hits:  68%/2236  
      🟨 nvcc12.8           Pass:  88%/36  | Total:  1d 04h | Avg: 48m 08s | Max:  1h 11m | Hits:  76%/38700 
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 50m | Avg: 57m 38s | Max:  1h 00m | Hits:  68%/4844  
      🟩 Clang15            Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 01m | Hits:  68%/2418  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 54m | Avg: 57m 03s | Max: 59m 33s | Hits:  68%/2418  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 58m | Avg: 59m 20s | Max:  1h 00m | Hits:  68%/2418  
      🟩 Clang18            Pass: 100%/7   | Total:  5h 35m | Avg: 47m 59s | Max:  1h 00m | Hits:  79%/8135  
      🟥 GCC7               Pass:   0%/2   | Total:  1h 08m | Avg: 34m 28s | Max: 35m 10s
      🟩 GCC8               Pass: 100%/1   | Total: 59m 32s | Avg: 59m 32s | Max: 59m 32s | Hits:  68%/1211  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 56m | Avg: 58m 16s | Max:  1h 01m | Hits:  68%/2422  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 55m | Avg: 57m 32s | Max: 58m 10s | Hits:  68%/2422  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 54m | Avg: 57m 10s | Max: 59m 39s | Hits:  68%/2418  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 55m | Avg: 57m 40s | Max: 58m 18s | Hits:  68%/2418  
      🟩 GCC13              Pass: 100%/11  | Total:  6h 49m | Avg: 37m 14s | Max:  1h 11m | Hits:  85%/13299 
      🟥 MSVC14.29          Pass:   0%/2   | Total:  1h 29m | Avg: 44m 35s | Max: 45m 35s
      🟥 MSVC14.42          Pass:   0%/2   | Total:  1h 25m | Avg: 42m 45s | Max: 43m 15s
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 19m | Avg:  1h 09m | Max:  1h 10m | Hits:  68%/2236  
    🟨 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 15h 22m | Avg: 54m 17s | Max:  1h 01m | Hits:  73%/20233 
      🟨 GCC                Pass:  90%/22  | Total: 16h 39m | Avg: 45m 25s | Max:  1h 11m | Hits:  77%/24190 
      🟥 MSVC               Pass:   0%/4   | Total:  2h 54m | Avg: 43m 40s | Max: 45m 35s
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 19m | Avg:  1h 09m | Max:  1h 10m | Hits:  68%/2236  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 15m | Avg: 25m 00s | Max: 26m 21s | Hits:  89%/3627  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 11m | Avg:  1h 11m | Max:  1h 11m | Hits:  68%/1209  
    🟨 std
      🟨 17                 Pass:  75%/20  | Total: 18h 04m | Avg: 54m 13s | Max:  1h 09m | Hits:  69%/17892 
      🟨 20                 Pass:  96%/25  | Total: 19h 11m | Avg: 46m 04s | Max:  1h 11m | Hits:  79%/28767 
    
  • 🟨 thrust: Pass: 88%/45 | Total: 19h 50m | Avg: 26m 27s | Max: 52m 57s | Hits: 79%/71261

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  88%/43  | Total: 18h 58m | Avg: 26m 28s | Max: 52m 57s | Hits:  79%/67698 
      🟩 arm64              Pass: 100%/2   | Total: 52m 27s | Avg: 26m 13s | Max: 28m 16s | Hits:  77%/3563  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 47m 35s | Avg: 23m 47s | Max: 24m 48s | Hits:  77%/3562  
      🔍 nvcc               Pass:  88%/43  | Total: 19h 03m | Avg: 26m 35s | Max: 52m 57s | Hits:  79%/67699 
    🚨 cxx_family: MSVC 🚨
      🟩 Clang              Pass: 100%/17  | Total:  6h 58m | Avg: 24m 37s | Max: 29m 50s | Hits:  79%/30277 
      🟩 GCC                Pass: 100%/21  | Total:  8h 44m | Avg: 24m 57s | Max: 33m 06s | Hits:  80%/37422 
      🔥 MSVC               Pass:   0%/5   | Total:  2h 26m | Avg: 29m 16s | Max: 38m 14s
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 41m | Avg: 50m 54s | Max: 52m 57s | Hits:  65%/3562  
    🟨 ctk
      🟨 12.0               Pass:  80%/5   | Total:  2h 28m | Avg: 29m 41s | Max: 36m 24s | Hits:  77%/7126  
      🟩 12.5               Pass: 100%/2   | Total:  1h 41m | Avg: 50m 54s | Max: 52m 57s | Hits:  65%/3562  
      🟨 12.8               Pass:  89%/38  | Total: 15h 40m | Avg: 24m 45s | Max: 38m 14s | Hits:  80%/60573 
    🟨 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 47m 35s | Avg: 23m 47s | Max: 24m 48s | Hits:  77%/3562  
      🟨 nvcc12.0           Pass:  80%/5   | Total:  2h 28m | Avg: 29m 41s | Max: 36m 24s | Hits:  77%/7126  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 41m | Avg: 50m 54s | Max: 52m 57s | Hits:  65%/3562  
      🟨 nvcc12.8           Pass:  88%/36  | Total: 14h 53m | Avg: 24m 48s | Max: 38m 14s | Hits:  81%/57011 
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  1h 48m | Avg: 27m 13s | Max: 28m 56s | Hits:  77%/7124  
      🟩 Clang15            Pass: 100%/2   | Total: 56m 43s | Avg: 28m 21s | Max: 29m 50s | Hits:  77%/3562  
      🟩 Clang16            Pass: 100%/2   | Total: 53m 27s | Avg: 26m 43s | Max: 27m 11s | Hits:  77%/3562  
      🟩 Clang17            Pass: 100%/2   | Total: 54m 30s | Avg: 27m 15s | Max: 27m 47s | Hits:  77%/3562  
      🟩 Clang18            Pass: 100%/7   | Total:  2h 24m | Avg: 20m 42s | Max: 29m 16s | Hits:  83%/12467 
      🟩 GCC7               Pass: 100%/2   | Total: 59m 47s | Avg: 29m 53s | Max: 33m 06s | Hits:  69%/3564  
      🟩 GCC8               Pass: 100%/1   | Total: 28m 48s | Avg: 28m 48s | Max: 28m 48s | Hits:  77%/1782  
      🟩 GCC9               Pass: 100%/2   | Total: 59m 44s | Avg: 29m 52s | Max: 30m 01s | Hits:  77%/3564  
      🟩 GCC10              Pass: 100%/2   | Total: 55m 51s | Avg: 27m 55s | Max: 28m 48s | Hits:  77%/3564  
      🟩 GCC11              Pass: 100%/2   | Total: 56m 59s | Avg: 28m 29s | Max: 29m 54s | Hits:  77%/3564  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 00m | Avg: 30m 25s | Max: 31m 40s | Hits:  77%/3564  
      🟩 GCC13              Pass: 100%/10  | Total:  3h 22m | Avg: 20m 12s | Max: 32m 03s | Hits:  86%/17820 
      🟥 MSVC14.29          Pass:   0%/2   | Total:  1h 14m | Avg: 37m 16s | Max: 38m 09s
      🟥 MSVC14.42          Pass:   0%/3   | Total:  1h 11m | Avg: 23m 55s | Max: 38m 14s
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 41m | Avg: 50m 54s | Max: 52m 57s | Hits:  65%/3562  
    🟨 gpu
      🟩 h100               Pass: 100%/2   | Total: 28m 39s | Avg: 14m 19s | Max: 17m 14s | Hits:  88%/3564  
      🟨 rtx2080            Pass:  90%/33  | Total: 16h 32m | Avg: 30m 03s | Max: 52m 57s | Hits:  75%/53444 
      🟨 rtx4090            Pass:  80%/10  | Total:  2h 50m | Avg: 17m 00s | Max: 38m 14s | Hits:  91%/14253 
    🟨 jobs
      🟨 Build              Pass:  89%/38  | Total: 18h 51m | Avg: 29m 46s | Max: 52m 57s | Hits:  76%/60571 
      🟨 TestCPU            Pass:  66%/3   | Total: 15m 13s | Avg:  5m 04s | Max:  7m 46s | Hits:  99%/3563  
      🟩 TestGPU            Pass: 100%/4   | Total: 44m 07s | Avg: 11m 01s | Max: 11m 32s | Hits:  99%/7127  
    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 34m 50s | Avg: 17m 25s | Max: 23m 47s | Hits:  88%/3564  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 28m 39s | Avg: 14m 19s | Max: 17m 14s | Hits:  88%/3564  
      🟩 90;90a;100         Pass: 100%/1   | Total: 32m 03s | Avg: 32m 03s | Max: 32m 03s | Hits:  77%/1782  
    🟨 std
      🟨 17                 Pass:  85%/20  | Total: 10h 04m | Avg: 30m 12s | Max: 48m 51s | Hits:  75%/30286 
      🟨 20                 Pass:  91%/23  | Total:  9h 11m | Avg: 23m 59s | Max: 52m 57s | Hits:  82%/37411 
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 12m 57s | Avg: 6m 28s | Max: 10m 40s | Hits: 98%/296

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 12m 57s | Avg:  6m 28s | Max: 10m 40s | Hits:  98%/296   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 12m 57s | Avg:  6m 28s | Max: 10m 40s | Hits:  98%/296   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 12m 57s | Avg:  6m 28s | Max: 10m 40s | Hits:  98%/296   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 12m 57s | Avg:  6m 28s | Max: 10m 40s | Hits:  98%/296   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 12m 57s | Avg:  6m 28s | Max: 10m 40s | Hits:  98%/296   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 12m 57s | Avg:  6m 28s | Max: 10m 40s | Hits:  98%/296   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 12m 57s | Avg:  6m 28s | Max: 10m 40s | Hits:  98%/296   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 17s | Avg:  2m 17s | Max:  2m 17s | Hits:  98%/148   
      🟩 Test               Pass: 100%/1   | Total: 10m 40s | Avg: 10m 40s | Max: 10m 40s | Hits:  98%/148   
    
  • 🟩 python: Pass: 100%/1 | Total: 30m 17s | Avg: 30m 17s | Max: 30m 17s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 30m 17s | Avg: 30m 17s | Max: 30m 17s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 30m 17s | Avg: 30m 17s | Max: 30m 17s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 30m 17s | Avg: 30m 17s | Max: 30m 17s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 30m 17s | Avg: 30m 17s | Max: 30m 17s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 30m 17s | Avg: 30m 17s | Max: 30m 17s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 30m 17s | Avg: 30m 17s | Max: 30m 17s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 30m 17s | Avg: 30m 17s | Max: 30m 17s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 30m 17s | Avg: 30m 17s | Max: 30m 17s
    

👃 Inspect Changes

Modifications in project?

     Project
     CCCL Infrastructure
     libcu++
+/-  CUB
     Thrust
     CUDA Experimental
     python
     CCCL C Parallel Library
     Catch2Helper

Modifications in project or dependencies?

     Project
     CCCL Infrastructure
     libcu++
+/-  CUB
+/-  Thrust
     CUDA Experimental
+/-  python
+/-  CCCL C Parallel Library
+/-  Catch2Helper

🏃‍ Runner counts (total jobs: 93)

# Runner
66 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1


🟩 CI finished in 1h 38m: Pass: 100%/93 | Total: 2d 13h | Avg: 39m 37s | Max: 1h 19m | Hits: 74%/133653
  • 🟩 cub: Pass: 100%/45 | Total: 1d 15h | Avg: 52m 24s | Max: 1h 19m | Hits: 70%/53221

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 13h | Avg: 52m 07s | Max:  1h 19m | Hits:  70%/50803 
      🟩 arm64              Pass: 100%/2   | Total:  1h 56m | Avg: 58m 19s | Max: 59m 19s | Hits:  68%/2418  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  4h 49m | Avg: 57m 50s | Max:  1h 03m | Hits:  58%/5879  
      🟩 12.5               Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 06m | Hits:  68%/2236  
      🟩 12.8               Pass: 100%/38  | Total:  1d 08h | Avg: 51m 00s | Max:  1h 19m | Hits:  71%/45106 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 55m | Avg: 57m 41s | Max: 57m 46s | Hits:  74%/2090  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 49m | Avg: 57m 50s | Max:  1h 03m | Hits:  58%/5879  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 06m | Hits:  68%/2236  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  1d 06h | Avg: 50m 38s | Max:  1h 19m | Hits:  71%/43016 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 55m | Avg: 57m 41s | Max: 57m 46s | Hits:  74%/2090  
      🟩 nvcc               Pass: 100%/43  | Total:  1d 13h | Avg: 52m 09s | Max:  1h 19m | Hits:  69%/51131 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 44m | Avg: 56m 13s | Max:  1h 00m | Hits:  68%/4844  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 49m | Avg: 54m 56s | Max: 55m 10s | Hits:  68%/2418  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 49m | Avg: 54m 40s | Max: 55m 12s | Hits:  68%/2418  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 52m | Avg: 56m 08s | Max: 58m 01s | Hits:  68%/2418  
      🟩 Clang18            Pass: 100%/7   | Total:  5h 35m | Avg: 47m 54s | Max:  1h 00m | Hits:  79%/8135  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 56m | Avg: 58m 27s | Max: 58m 32s | Hits:  68%/2422  
      🟩 GCC8               Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m | Hits:  68%/1211  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 58m | Avg: 59m 17s | Max:  1h 02m | Hits:  68%/2422  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 51m | Avg: 55m 49s | Max: 56m 05s | Hits:  68%/2422  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 54m | Avg: 57m 00s | Max: 57m 05s | Hits:  68%/2418  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 52m | Avg: 56m 17s | Max: 56m 45s | Hits:  68%/2418  
      🟩 GCC13              Pass: 100%/11  | Total:  6h 46m | Avg: 36m 56s | Max:  1h 10m | Hits:  85%/13299 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 23m | Avg:  1h 11m | Max:  1h 19m | Hits:  14%/2070  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 31m | Avg:  1h 15m | Max:  1h 16m | Hits:  14%/2070  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 06m | Hits:  68%/2236  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 14h 51m | Avg: 52m 27s | Max:  1h 00m | Hits:  73%/20233 
      🟩 GCC                Pass: 100%/22  | Total: 17h 20m | Avg: 47m 18s | Max:  1h 10m | Hits:  76%/26612 
      🟩 MSVC               Pass: 100%/4   | Total:  4h 54m | Avg:  1h 13m | Max:  1h 19m | Hits:  14%/4140  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 06m | Hits:  68%/2236  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 14m | Avg: 24m 49s | Max: 27m 18s | Hits:  89%/3627  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 09h | Avg: 59m 51s | Max:  1h 19m | Hits:  63%/39922 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 08m | Avg: 31m 05s | Max:  1h 03m | Hits:  91%/9672  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 12h | Avg: 58m 56s | Max:  1h 19m | Hits:  63%/43549 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 30s | Avg: 21m 30s | Max: 21m 30s | Hits:  99%/1209  
      🟩 GraphCapture       Pass: 100%/1   | Total: 17m 04s | Avg: 17m 04s | Max: 17m 04s | Hits:  99%/1209  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 15m | Avg: 25m 00s | Max: 27m 18s | Hits:  99%/3627  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 03m | Avg: 21m 12s | Max: 22m 42s | Hits:  99%/3627  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 14m | Avg: 24m 49s | Max: 27m 18s | Hits:  89%/3627  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 10m | Avg:  1h 10m | Max:  1h 10m | Hits:  68%/1209  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 20h 07m | Avg:  1h 00m | Max:  1h 19m | Hits:  61%/23419 
      🟩 20                 Pass: 100%/25  | Total: 19h 11m | Avg: 46m 02s | Max:  1h 16m | Hits:  76%/29802 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 21h 23m | Avg: 28m 30s | Max: 59m 02s | Hits: 77%/80136

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 36m 28s | Avg: 18m 14s | Max: 25m 03s | Hits:  88%/3564  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total: 20h 30m | Avg: 28m 37s | Max: 59m 02s | Hits:  77%/76573 
      🟩 arm64              Pass: 100%/2   | Total: 52m 03s | Avg: 26m 01s | Max: 27m 17s | Hits:  77%/3563  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  2h 41m | Avg: 32m 23s | Max: 49m 15s | Hits:  72%/8901  
      🟩 12.5               Pass: 100%/2   | Total:  1h 33m | Avg: 46m 35s | Max: 47m 33s | Hits:  65%/3562  
      🟩 12.8               Pass: 100%/38  | Total: 17h 07m | Avg: 27m 02s | Max: 59m 02s | Hits:  78%/67673 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 44m 43s | Avg: 22m 21s | Max: 22m 48s | Hits:  77%/3562  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  2h 41m | Avg: 32m 23s | Max: 49m 15s | Hits:  72%/8901  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 33m | Avg: 46m 35s | Max: 47m 33s | Hits:  65%/3562  
      🟩 nvcc12.8           Pass: 100%/36  | Total: 16h 23m | Avg: 27m 18s | Max: 59m 02s | Hits:  78%/64111 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 44m 43s | Avg: 22m 21s | Max: 22m 48s | Hits:  77%/3562  
      🟩 nvcc               Pass: 100%/43  | Total: 20h 38m | Avg: 28m 47s | Max: 59m 02s | Hits:  77%/76574 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  1h 54m | Avg: 28m 33s | Max: 29m 58s | Hits:  77%/7124  
      🟩 Clang15            Pass: 100%/2   | Total: 53m 18s | Avg: 26m 39s | Max: 26m 51s | Hits:  77%/3562  
      🟩 Clang16            Pass: 100%/2   | Total: 56m 58s | Avg: 28m 29s | Max: 28m 50s | Hits:  77%/3562  
      🟩 Clang17            Pass: 100%/2   | Total: 57m 31s | Avg: 28m 45s | Max: 29m 13s | Hits:  77%/3562  
      🟩 Clang18            Pass: 100%/7   | Total:  2h 19m | Avg: 19m 55s | Max: 26m 31s | Hits:  83%/12467 
      🟩 GCC7               Pass: 100%/2   | Total: 55m 08s | Avg: 27m 34s | Max: 28m 00s | Hits:  77%/3564  
      🟩 GCC8               Pass: 100%/1   | Total: 26m 41s | Avg: 26m 41s | Max: 26m 41s | Hits:  77%/1782  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 00m | Avg: 30m 28s | Max: 30m 32s | Hits:  77%/3564  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 00m | Avg: 30m 15s | Max: 30m 59s | Hits:  77%/3564  
      🟩 GCC11              Pass: 100%/2   | Total: 59m 50s | Avg: 29m 55s | Max: 30m 56s | Hits:  77%/3564  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 05m | Avg: 32m 38s | Max: 33m 00s | Hits:  77%/3564  
      🟩 GCC13              Pass: 100%/10  | Total:  3h 21m | Avg: 20m 10s | Max: 33m 21s | Hits:  86%/17820 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 40m | Avg: 50m 06s | Max: 50m 58s | Hits:  54%/3550  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  2h 18m | Avg: 46m 01s | Max: 59m 02s | Hits:  60%/5325  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 33m | Avg: 46m 35s | Max: 47m 33s | Hits:  65%/3562  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  7h 01m | Avg: 24m 47s | Max: 29m 58s | Hits:  79%/30277 
      🟩 GCC                Pass: 100%/21  | Total:  8h 50m | Avg: 25m 14s | Max: 33m 21s | Hits:  81%/37422 
      🟩 MSVC               Pass: 100%/5   | Total:  3h 58m | Avg: 47m 39s | Max: 59m 02s | Hits:  58%/8875  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 33m | Avg: 46m 35s | Max: 47m 33s | Hits:  65%/3562  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 26m 58s | Avg: 13m 29s | Max: 15m 38s | Hits:  88%/3564  
      🟩 rtx2080            Pass: 100%/33  | Total: 17h 12m | Avg: 31m 17s | Max: 50m 58s | Hits:  74%/58769 
      🟩 rtx4090            Pass: 100%/10  | Total:  3h 43m | Avg: 22m 20s | Max: 59m 02s | Hits:  85%/17803 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total: 19h 52m | Avg: 31m 22s | Max: 59m 02s | Hits:  74%/67671 
      🟩 TestCPU            Pass: 100%/3   | Total: 46m 53s | Avg: 15m 37s | Max: 30m 55s | Hits:  90%/5338  
      🟩 TestGPU            Pass: 100%/4   | Total: 43m 55s | Avg: 10m 58s | Max: 11m 25s | Hits:  99%/7127  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 26m 58s | Avg: 13m 29s | Max: 15m 38s | Hits:  88%/3564  
      🟩 90;90a;100         Pass: 100%/1   | Total: 29m 40s | Avg: 29m 40s | Max: 29m 40s | Hits:  77%/1782  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 10h 47m | Avg: 32m 21s | Max: 50m 58s | Hits:  73%/35611 
      🟩 20                 Pass: 100%/23  | Total:  9h 59m | Avg: 26m 03s | Max: 59m 02s | Hits:  80%/40961 
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 13m 02s | Avg: 6m 31s | Max: 10m 44s | Hits: 98%/296

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 13m 02s | Avg:  6m 31s | Max: 10m 44s | Hits:  98%/296   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 13m 02s | Avg:  6m 31s | Max: 10m 44s | Hits:  98%/296   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 13m 02s | Avg:  6m 31s | Max: 10m 44s | Hits:  98%/296   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 13m 02s | Avg:  6m 31s | Max: 10m 44s | Hits:  98%/296   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 13m 02s | Avg:  6m 31s | Max: 10m 44s | Hits:  98%/296   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 13m 02s | Avg:  6m 31s | Max: 10m 44s | Hits:  98%/296   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 13m 02s | Avg:  6m 31s | Max: 10m 44s | Hits:  98%/296   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 18s | Avg:  2m 18s | Max:  2m 18s | Hits:  98%/148   
      🟩 Test               Pass: 100%/1   | Total: 10m 44s | Avg: 10m 44s | Max: 10m 44s | Hits:  98%/148   
    
  • 🟩 python: Pass: 100%/1 | Total: 30m 11s | Avg: 30m 11s | Max: 30m 11s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 30m 11s | Avg: 30m 11s | Max: 30m 11s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 30m 11s | Avg: 30m 11s | Max: 30m 11s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 30m 11s | Avg: 30m 11s | Max: 30m 11s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 30m 11s | Avg: 30m 11s | Max: 30m 11s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 30m 11s | Avg: 30m 11s | Max: 30m 11s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 30m 11s | Avg: 30m 11s | Max: 30m 11s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 30m 11s | Avg: 30m 11s | Max: 30m 11s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 30m 11s | Avg: 30m 11s | Max: 30m 11s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 93)

# Runner
66 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1

@bernhardmgruber
Contributor Author

bernhardmgruber commented Feb 19, 2025

So, I have a diff in SASS, but it looks like only some addresses changed. I think it's because I changed the order of evaluation of some packs. I am checking whether I can revert this.

The diff looks mostly like this:
image

@elstehle
Collaborator

> So, I have a diff in SASS, but it looks like only some addresses changed. I think it's because I changed the order of evaluation of some packs. I am checking whether I can revert this.
>
> The diff looks mostly like this: image

I'd be curious to see if this has any noticeable performance implications?

Projects
Status: In Review
2 participants