Use libcu++ limits/trait in tests/benchmarks #3822

Merged
merged 5 commits into from
Feb 17, 2025

Conversation

bernhardmgruber
Contributor

@bernhardmgruber bernhardmgruber commented Feb 15, 2025

This pulls out the non-breaking part of #3384, which:

  • Adds specialization of libcu++ traits for several types used for testing
  • Replaces each use of CUB traits or std traits in unit tests or benchmarks with cuda::std traits. This applies to limits as well.

This PR does NOT replace any use of CUB traits in the CUB/Thrust headers, because that would break any user opting in with a custom type by specializing CUB traits.

Contributor

🟨 CI finished in 1h 27m: Pass: 97%/93 | Total: 1d 20h | Avg: 28m 27s | Max: 1h 11m | Hits: 80%/132111
  • 🟨 cub: Pass: 95%/45 | Total: 1d 12h | Avg: 49m 14s | Max: 1h 11m | Hits: 53%/51319

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  95%/43  | Total:  1d 10h | Avg: 48m 47s | Max:  1h 11m | Hits:  54%/48877 
      🟩 arm64              Pass: 100%/2   | Total:  1h 58m | Avg: 59m 04s | Max: 59m 47s | Hits:  47%/2442  
    🔍 ctk: 12.8 🔍
      🟩 12.0               Pass: 100%/5   | Total:  4h 11m | Avg: 50m 16s | Max:  1h 00m | Hits:  42%/5939  
      🟩 12.5               Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 11m | Hits:  46%/2260  
      🔍 12.8               Pass:  94%/38  | Total:  1d 06h | Avg: 48m 07s | Max:  1h 07m | Hits:  55%/43120 
    🔍 cudacxx: nvcc12.8 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 51m | Avg: 55m 36s | Max: 57m 14s | Hits:  83%/2114  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 11m | Avg: 50m 16s | Max:  1h 00m | Hits:  42%/5939  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 11m | Hits:  46%/2260  
      🔍 nvcc12.8           Pass:  94%/36  | Total:  1d 04h | Avg: 47m 42s | Max:  1h 07m | Hits:  54%/41006 
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 51m | Avg: 55m 36s | Max: 57m 14s | Hits:  83%/2114  
      🔍 nvcc               Pass:  95%/43  | Total:  1d 11h | Avg: 48m 57s | Max:  1h 11m | Hits:  52%/49205 
    🔍 gpu: rtxa6000 🔍
      🟩 h100               Pass: 100%/3   | Total:  1h 14m | Avg: 24m 47s | Max: 25m 49s | Hits:  82%/3663  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 07h | Avg: 55m 36s | Max:  1h 11m | Hits:  46%/40330 
      🔍 rtxa6000           Pass:  75%/8   | Total:  4h 10m | Avg: 31m 22s | Max:  1h 02m | Hits:  82%/7326  
    🔍 jobs: TestGPU 🔍
      🟩 Build              Pass: 100%/37  | Total:  1d 09h | Avg: 55m 07s | Max:  1h 11m | Hits:  46%/43993 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 32s | Avg: 21m 32s | Max: 21m 32s | Hits:  99%/1221  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 40s | Avg: 16m 40s | Max: 16m 40s | Hits:  99%/1221  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 11m | Avg: 23m 56s | Max: 25m 18s | Hits:  99%/3663  
      🔍 TestGPU            Pass:  33%/3   | Total:  1h 06m | Avg: 22m 08s | Max: 23m 15s | Hits:  99%/1221  
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/20  | Total: 17h 55m | Avg: 53m 47s | Max:  1h 05m | Hits:  44%/23659 
      🔍 20                 Pass:  92%/25  | Total: 19h 00m | Avg: 45m 36s | Max:  1h 11m | Hits:  61%/27660 
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 39m | Avg: 54m 57s | Max: 57m 55s | Hits:  47%/4892  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 54m | Avg: 57m 01s | Max: 57m 14s | Hits:  47%/2442  
      🟩 Clang16            Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 02m | Hits:  47%/2442  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 59m | Avg: 59m 39s | Max:  1h 02m | Hits:  47%/2442  
      🟨 Clang18            Pass:  85%/7   | Total:  5h 32m | Avg: 47m 25s | Max:  1h 02m | Hits:  67%/6998  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 50m | Avg: 55m 01s | Max: 56m 30s | Hits:  47%/2446  
      🟩 GCC8               Pass: 100%/1   | Total: 52m 43s | Avg: 52m 43s | Max: 52m 43s | Hits:  47%/1223  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 58m | Avg: 59m 09s | Max:  1h 00m | Hits:  47%/2446  
      🟩 GCC10              Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 01m | Hits:  47%/2446  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 48m | Avg: 54m 24s | Max: 55m 04s | Hits:  47%/2442  
      🟩 GCC12              Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 02m | Hits:  47%/2442  
      🟨 GCC13              Pass:  90%/11  | Total:  6h 45m | Avg: 36m 50s | Max:  1h 07m | Hits:  73%/12210 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 02m | Avg: 31m 06s | Max: 33m 03s | Hits:  15%/2094  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  1h 09m | Avg: 34m 40s | Max: 36m 13s | Hits:  15%/2094  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 11m | Hits:  46%/2260  
    🟨 cxx_family
      🟨 Clang              Pass:  94%/17  | Total: 15h 08m | Avg: 53m 25s | Max:  1h 02m | Hits:  55%/19216 
      🟨 GCC                Pass:  95%/22  | Total: 17h 20m | Avg: 47m 17s | Max:  1h 07m | Hits:  59%/25655 
      🟩 MSVC               Pass: 100%/4   | Total:  2h 11m | Avg: 32m 53s | Max: 36m 13s | Hits:  15%/4188  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 11m | Hits:  46%/2260  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 14m | Avg: 24m 47s | Max: 25m 49s | Hits:  82%/3663  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 07m | Avg:  1h 07m | Max:  1h 07m | Hits:  47%/1221  
    
  • 🟩 thrust: Pass: 100%/45 | Total: 6h 26m | Avg: 8m 35s | Max: 31m 22s | Hits: 96%/80496

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 16m 32s | Avg:  8m 16s | Max: 11m 03s | Hits:  99%/3580  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  6h 17m | Avg:  8m 46s | Max: 31m 22s | Hits:  96%/76917 
      🟩 arm64              Pass: 100%/2   | Total:  9m 22s | Avg:  4m 41s | Max:  5m 00s | Hits:  99%/3579  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 41m 24s | Avg:  8m 16s | Max: 21m 47s | Hits:  94%/8941  
      🟩 12.5               Pass: 100%/2   | Total: 26m 54s | Avg: 13m 27s | Max: 13m 31s | Hits:  99%/3578  
      🟩 12.8               Pass: 100%/38  | Total:  5h 18m | Avg:  8m 23s | Max: 31m 22s | Hits:  96%/67977 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 08s | Avg:  5m 04s | Max:  5m 17s | Hits: 100%/3578  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 41m 24s | Avg:  8m 16s | Max: 21m 47s | Hits:  94%/8941  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 26m 54s | Avg: 13m 27s | Max: 13m 31s | Hits:  99%/3578  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  5h 08m | Avg:  8m 34s | Max: 31m 22s | Hits:  96%/64399 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 08s | Avg:  5m 04s | Max:  5m 17s | Hits: 100%/3578  
      🟩 nvcc               Pass: 100%/43  | Total:  6h 16m | Avg:  8m 45s | Max: 31m 22s | Hits:  96%/76918 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 20m 17s | Avg:  5m 04s | Max:  5m 26s | Hits: 100%/7156  
      🟩 Clang15            Pass: 100%/2   | Total: 11m 36s | Avg:  5m 48s | Max:  5m 52s | Hits: 100%/3578  
      🟩 Clang16            Pass: 100%/2   | Total: 10m 32s | Avg:  5m 16s | Max:  5m 18s | Hits: 100%/3578  
      🟩 Clang17            Pass: 100%/2   | Total: 10m 55s | Avg:  5m 27s | Max:  5m 29s | Hits: 100%/3578  
      🟩 Clang18            Pass: 100%/7   | Total: 42m 43s | Avg:  6m 06s | Max: 10m 11s | Hits: 100%/12523 
      🟩 GCC7               Pass: 100%/2   | Total: 10m 17s | Avg:  5m 08s | Max:  5m 12s | Hits:  99%/3580  
      🟩 GCC8               Pass: 100%/1   | Total:  5m 29s | Avg:  5m 29s | Max:  5m 29s | Hits:  99%/1790  
      🟩 GCC9               Pass: 100%/2   | Total: 10m 18s | Avg:  5m 09s | Max:  5m 28s | Hits:  99%/3580  
      🟩 GCC10              Pass: 100%/2   | Total: 10m 35s | Avg:  5m 17s | Max:  5m 18s | Hits:  99%/3580  
      🟩 GCC11              Pass: 100%/2   | Total: 11m 29s | Avg:  5m 44s | Max:  5m 52s | Hits:  99%/3580  
      🟩 GCC12              Pass: 100%/2   | Total: 11m 26s | Avg:  5m 43s | Max:  5m 44s | Hits:  99%/3580  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 15m | Avg:  7m 33s | Max: 11m 45s | Hits:  99%/17900 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 46m 10s | Avg: 23m 05s | Max: 24m 23s | Hits:  70%/3566  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  1h 22m | Avg: 27m 32s | Max: 31m 22s | Hits:  70%/5349  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 26m 54s | Avg: 13m 27s | Max: 13m 31s | Hits:  99%/3578  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 36m | Avg:  5m 39s | Max: 10m 11s | Hits: 100%/30413 
      🟩 GCC                Pass: 100%/21  | Total:  2h 15m | Avg:  6m 26s | Max: 11m 45s | Hits:  99%/37590 
      🟩 MSVC               Pass: 100%/5   | Total:  2h 08m | Avg: 25m 45s | Max: 31m 22s | Hits:  70%/8915  
      🟩 NVHPC              Pass: 100%/2   | Total: 26m 54s | Avg: 13m 27s | Max: 13m 31s | Hits:  99%/3578  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 16m 16s | Avg:  8m 08s | Max: 11m 45s | Hits:  99%/3580  
      🟩 rtx2080            Pass: 100%/33  | Total:  4h 07m | Avg:  7m 29s | Max: 24m 23s | Hits:  97%/59033 
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 03m | Avg: 12m 20s | Max: 31m 22s | Hits:  94%/17883 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total:  4h 56m | Avg:  7m 47s | Max: 27m 12s | Hits:  96%/67975 
      🟩 TestCPU            Pass: 100%/3   | Total: 46m 30s | Avg: 15m 30s | Max: 31m 22s | Hits:  90%/5362  
      🟩 TestGPU            Pass: 100%/4   | Total: 44m 17s | Avg: 11m 04s | Max: 11m 45s | Hits:  99%/7159  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 16m 16s | Avg:  8m 08s | Max: 11m 45s | Hits:  99%/3580  
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 29s | Avg:  6m 29s | Max:  6m 29s | Hits:  99%/1790  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  2h 49m | Avg:  8m 28s | Max: 24m 23s | Hits:  95%/35771 
      🟩 20                 Pass: 100%/23  | Total:  3h 20m | Avg:  8m 44s | Max: 31m 22s | Hits:  97%/41145 
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 13m 09s | Avg: 6m 34s | Max: 10m 51s | Hits: 98%/296

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 13m 09s | Avg:  6m 34s | Max: 10m 51s | Hits:  98%/296   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 13m 09s | Avg:  6m 34s | Max: 10m 51s | Hits:  98%/296   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 13m 09s | Avg:  6m 34s | Max: 10m 51s | Hits:  98%/296   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 13m 09s | Avg:  6m 34s | Max: 10m 51s | Hits:  98%/296   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 13m 09s | Avg:  6m 34s | Max: 10m 51s | Hits:  98%/296   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 13m 09s | Avg:  6m 34s | Max: 10m 51s | Hits:  98%/296   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 13m 09s | Avg:  6m 34s | Max: 10m 51s | Hits:  98%/296   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 18s | Avg:  2m 18s | Max:  2m 18s | Hits:  98%/148   
      🟩 Test               Pass: 100%/1   | Total: 10m 51s | Avg: 10m 51s | Max: 10m 51s | Hits:  98%/148   
    
  • 🟩 python: Pass: 100%/1 | Total: 29m 56s | Avg: 29m 56s | Max: 29m 56s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 29m 56s | Avg: 29m 56s | Max: 29m 56s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 29m 56s | Avg: 29m 56s | Max: 29m 56s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 29m 56s | Avg: 29m 56s | Max: 29m 56s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 29m 56s | Avg: 29m 56s | Max: 29m 56s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 29m 56s | Avg: 29m 56s | Max: 29m 56s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 29m 56s | Avg: 29m 56s | Max: 29m 56s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 29m 56s | Avg: 29m 56s | Max: 29m 56s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 29m 56s | Avg: 29m 56s | Max: 29m 56s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
+/- Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 93)

# Runner
66 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1

cub/test/test_util.h
template <template <typename> class... Policies>
class numeric_limits<c2h::custom_type_t<Policies...>>
class __numeric_limits_impl<c2h::custom_type_t<Policies...>, __numeric_limits_type::__other>
Collaborator

Shouldn't this also work if we just specialized numeric_limits?

Contributor Author

It doesn't and I need your guidance here. The problem is that CUB sometimes queries numeric_limits<const T> and that does not find a specialization if I only specialize:

class numeric_limits<c2h::custom_type_t<Policies...>> { ... };

Specializing __numeric_limits_impl works, because numeric_limits strips CV qualifiers and passes the type to __numeric_limits_impl. What's the best practice? Should we specialize numeric_limits multiple times for T, const T, and maybe also for references of those? Or should we try harder inside CUB to call something like numeric_limits<decay_t<T>>?

Collaborator

@miscco miscco Feb 17, 2025

That is a bug in our implementation then: https://eel.is/c++draft/numeric.limits#general-5

The value of each member of a specialization of numeric_limits on a cv-qualified type cv T shall be equal to the value of the corresponding member of the specialization on the unqualified type T.
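The difference between the two approaches discussed above can be sketched in plain host C++. This is only an illustration under stated assumptions: `my_limits`, `my_limits_impl`, `custom_t` and `other_t` are hypothetical stand-ins, not the libcu++ classes, and the sketch only mirrors the cv-stripping forwarding described in the thread.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins: my_limits plays the role of numeric_limits,
// my_limits_impl the role of an internal implementation class.
struct custom_t { int v; };
struct other_t { int v; };

template <class T>
struct my_limits_impl {
  static constexpr bool is_specialized = false;
};

// The public template strips cv-qualifiers before picking the implementation.
template <class T>
struct my_limits : my_limits_impl<std::remove_cv_t<T>> {};

// Option A: specialize the implementation once.
// Queries on const/volatile custom_t are forwarded here automatically.
template <>
struct my_limits_impl<custom_t> {
  static constexpr bool is_specialized = true;
};

// Option B: specialize the public template for the unqualified type only.
// A query on const other_t falls back to the unspecialized primary template.
template <>
struct my_limits<other_t> {
  static constexpr bool is_specialized = true;
};

static_assert(my_limits<custom_t>::is_specialized, "");
static_assert(my_limits<const custom_t>::is_specialized, "");
static_assert(my_limits<other_t>::is_specialized, "");
static_assert(!my_limits<const other_t>::is_specialized, "");

// A call site can sidestep the problem by stripping cv-qualifiers itself.
template <class T>
constexpr bool is_specialized_v = my_limits<std::remove_cv_t<T>>::is_specialized;

static_assert(is_specialized_v<const other_t>, "");
```

With option B, covering the cv-qualified forms would need one additional specialization per qualifier combination, which is the duplication the question above is about.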

// replace with
// std::numeric_limits<T>::max() when
// C++11 support is more prevalent
// TODO(bgruber): replace with ::cuda::std::numeric_limits<T>::max() (breaking change)
Collaborator

I thought we were breaking things?

Contributor Author

I wanted to separate non-breaking from breaking changes, so we can more easily roll back if needed, and have a smaller diff to inspect with more attention.

Collaborator

@miscco miscco left a comment

I went through it, no change requested.

That said, I found a few places where we are inconsistent about fully qualifying cuda::std.

Feel free to ignore

cub/test/catch2_test_device_histogram.cu
cub/test/catch2_test_device_reduce_by_key.cu
cub/test/catch2_test_device_scan.cu
cub/test/catch2_test_device_scan_iterators.cu
cub/test/catch2_test_device_segmented_reduce.cu
@bernhardmgruber bernhardmgruber force-pushed the ref_limits branch 2 times, most recently from ab531ba to bab6e89 Compare February 17, 2025 09:45
Contributor

🟨 CI finished in 1h 25m: Pass: 97%/93 | Total: 1d 22h | Avg: 30m 11s | Max: 1h 13m | Hits: 86%/131751
  • 🟨 cub: Pass: 95%/45 | Total: 1d 15h | Avg: 52m 49s | Max: 1h 13m | Hits: 71%/51319

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  95%/43  | Total:  1d 13h | Avg: 52m 20s | Max:  1h 13m | Hits:  71%/48877 
      🟩 arm64              Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 05m | Hits:  72%/2442  
    🔍 ctk: 12.8 🔍
      🟩 12.0               Pass: 100%/5   | Total:  4h 48m | Avg: 57m 47s | Max:  1h 02m | Hits:  61%/5939  
      🟩 12.5               Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 08m | Hits:  68%/2260  
      🔍 12.8               Pass:  94%/38  | Total:  1d 08h | Avg: 51m 24s | Max:  1h 13m | Hits:  72%/43120 
    🔍 cudacxx: nvcc12.8 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 58m | Avg: 59m 09s | Max:  1h 01m | Hits:  75%/2114  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 48m | Avg: 57m 47s | Max:  1h 02m | Hits:  61%/5939  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 08m | Hits:  68%/2260  
      🔍 nvcc12.8           Pass:  94%/36  | Total:  1d 06h | Avg: 50m 58s | Max:  1h 13m | Hits:  72%/41006 
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 58m | Avg: 59m 09s | Max:  1h 01m | Hits:  75%/2114  
      🔍 nvcc               Pass:  95%/43  | Total:  1d 13h | Avg: 52m 31s | Max:  1h 13m | Hits:  71%/49205 
    🔍 gpu: rtxa6000 🔍
      🟩 h100               Pass: 100%/3   | Total:  1h 12m | Avg: 24m 08s | Max: 24m 51s | Hits:  90%/3663  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 10h | Avg:  1h 00m | Max:  1h 13m | Hits:  65%/40330 
      🔍 rtxa6000           Pass:  75%/8   | Total:  4h 00m | Avg: 30m 05s | Max: 59m 31s | Hits:  90%/7326  
    🔍 jobs: TestGPU 🔍
      🟩 Build              Pass: 100%/37  | Total:  1d 12h | Avg: 59m 30s | Max:  1h 13m | Hits:  66%/43993 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 19s | Avg: 21m 19s | Max: 21m 19s | Hits:  99%/1221  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 12s | Avg: 16m 12s | Max: 16m 12s | Hits:  99%/1221  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 12m | Avg: 24m 03s | Max: 24m 24s | Hits:  99%/3663  
      🔍 TestGPU            Pass:  33%/3   | Total:  1h 05m | Avg: 21m 40s | Max: 23m 35s | Hits:  99%/1221  
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/20  | Total: 20h 10m | Avg:  1h 00m | Max:  1h 12m | Hits:  64%/23659 
      🔍 20                 Pass:  92%/25  | Total: 19h 26m | Avg: 46m 38s | Max:  1h 13m | Hits:  77%/27660 
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 42m | Avg: 55m 36s | Max: 59m 45s | Hits:  72%/4892  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 55m | Avg: 57m 37s | Max: 59m 40s | Hits:  72%/2442  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 54m | Avg: 57m 14s | Max: 58m 15s | Hits:  72%/2442  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 57m | Avg: 58m 39s | Max:  1h 00m | Hits:  72%/2442  
      🟨 Clang18            Pass:  85%/7   | Total:  5h 41m | Avg: 48m 44s | Max:  1h 05m | Hits:  77%/6998  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 55m | Avg: 57m 36s | Max:  1h 01m | Hits:  71%/2446  
      🟩 GCC8               Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m | Hits:  72%/1223  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 56m | Avg: 58m 17s | Max: 58m 57s | Hits:  72%/2446  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 58m | Avg: 59m 07s | Max:  1h 00m | Hits:  72%/2446  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 54m | Avg: 57m 03s | Max: 59m 36s | Hits:  71%/2442  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 58m | Avg: 59m 16s | Max:  1h 00m | Hits:  71%/2442  
      🟨 GCC13              Pass:  90%/11  | Total:  6h 48m | Avg: 37m 08s | Max:  1h 09m | Hits:  85%/12210 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 12m | Hits:  12%/2094  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 25m | Avg:  1h 12m | Max:  1h 13m | Hits:  12%/2094  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 08m | Hits:  68%/2260  
    🟨 cxx_family
      🟨 Clang              Pass:  94%/17  | Total: 15h 10m | Avg: 53m 34s | Max:  1h 05m | Hits:  74%/19216 
      🟨 GCC                Pass:  95%/22  | Total: 17h 31m | Avg: 47m 47s | Max:  1h 09m | Hits:  78%/25655 
      🟩 MSVC               Pass: 100%/4   | Total:  4h 40m | Avg:  1h 10m | Max:  1h 13m | Hits:  12%/4188  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 08m | Hits:  68%/2260  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 12m | Avg: 24m 08s | Max: 24m 51s | Hits:  90%/3663  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 09m | Avg:  1h 09m | Max:  1h 09m | Hits:  71%/1221  
    
  • 🟩 thrust: Pass: 100%/45 | Total: 6h 28m | Avg: 8m 37s | Max: 32m 06s | Hits: 96%/80136

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 16m 59s | Avg:  8m 29s | Max: 11m 04s | Hits:  99%/3564  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  6h 18m | Avg:  8m 48s | Max: 32m 06s | Hits:  96%/76573 
      🟩 arm64              Pass: 100%/2   | Total:  9m 39s | Avg:  4m 49s | Max:  5m 05s | Hits:  99%/3563  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 42m 01s | Avg:  8m 24s | Max: 22m 21s | Hits:  94%/8901  
      🟩 12.5               Pass: 100%/2   | Total: 25m 40s | Avg: 12m 50s | Max: 13m 05s | Hits:  99%/3562  
      🟩 12.8               Pass: 100%/38  | Total:  5h 20m | Avg:  8m 25s | Max: 32m 06s | Hits:  96%/67673 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 13s | Avg:  5m 06s | Max:  5m 20s | Hits: 100%/3562  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 42m 01s | Avg:  8m 24s | Max: 22m 21s | Hits:  94%/8901  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 25m 40s | Avg: 12m 50s | Max: 13m 05s | Hits:  99%/3562  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  5h 10m | Avg:  8m 37s | Max: 32m 06s | Hits:  96%/64111 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 13s | Avg:  5m 06s | Max:  5m 20s | Hits: 100%/3562  
      🟩 nvcc               Pass: 100%/43  | Total:  6h 17m | Avg:  8m 47s | Max: 32m 06s | Hits:  96%/76574 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 20m 51s | Avg:  5m 12s | Max:  5m 43s | Hits: 100%/7124  
      🟩 Clang15            Pass: 100%/2   | Total: 11m 08s | Avg:  5m 34s | Max:  5m 53s | Hits: 100%/3562  
      🟩 Clang16            Pass: 100%/2   | Total: 10m 56s | Avg:  5m 28s | Max:  5m 41s | Hits: 100%/3562  
      🟩 Clang17            Pass: 100%/2   | Total: 11m 26s | Avg:  5m 43s | Max:  5m 44s | Hits: 100%/3562  
      🟩 Clang18            Pass: 100%/7   | Total: 44m 03s | Avg:  6m 17s | Max: 10m 10s | Hits: 100%/12467 
      🟩 GCC7               Pass: 100%/2   | Total: 10m 21s | Avg:  5m 10s | Max:  5m 20s | Hits:  99%/3564  
      🟩 GCC8               Pass: 100%/1   | Total:  5m 19s | Avg:  5m 19s | Max:  5m 19s | Hits:  99%/1782  
      🟩 GCC9               Pass: 100%/2   | Total: 10m 29s | Avg:  5m 14s | Max:  5m 39s | Hits:  99%/3564  
      🟩 GCC10              Pass: 100%/2   | Total: 10m 18s | Avg:  5m 09s | Max:  5m 11s | Hits:  99%/3564  
      🟩 GCC11              Pass: 100%/2   | Total: 10m 57s | Avg:  5m 28s | Max:  5m 30s | Hits:  99%/3564  
      🟩 GCC12              Pass: 100%/2   | Total: 11m 59s | Avg:  5m 59s | Max:  6m 15s | Hits:  99%/3564  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 15m | Avg:  7m 32s | Max: 11m 21s | Hits:  99%/17820 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 46m 00s | Avg: 23m 00s | Max: 23m 39s | Hits:  70%/3550  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  1h 23m | Avg: 27m 44s | Max: 32m 06s | Hits:  70%/5325  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 25m 40s | Avg: 12m 50s | Max: 13m 05s | Hits:  99%/3562  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 38m | Avg:  5m 47s | Max: 10m 10s | Hits: 100%/30277 
      🟩 GCC                Pass: 100%/21  | Total:  2h 14m | Avg:  6m 25s | Max: 11m 21s | Hits:  99%/37422 
      🟩 MSVC               Pass: 100%/5   | Total:  2h 09m | Avg: 25m 50s | Max: 32m 06s | Hits:  70%/8875  
      🟩 NVHPC              Pass: 100%/2   | Total: 25m 40s | Avg: 12m 50s | Max: 13m 05s | Hits:  99%/3562  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 16m 43s | Avg:  8m 21s | Max: 11m 21s | Hits:  99%/3564  
      🟩 rtx2080            Pass: 100%/33  | Total:  4h 08m | Avg:  7m 32s | Max: 26m 16s | Hits:  97%/58769 
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 02m | Avg: 12m 15s | Max: 32m 06s | Hits:  94%/17803 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total:  4h 56m | Avg:  7m 48s | Max: 26m 16s | Hits:  96%/67671 
      🟩 TestCPU            Pass: 100%/3   | Total: 47m 24s | Avg: 15m 48s | Max: 32m 06s | Hits:  90%/5338  
      🟩 TestGPU            Pass: 100%/4   | Total: 43m 55s | Avg: 10m 58s | Max: 11m 21s | Hits:  99%/7127  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 16m 43s | Avg:  8m 21s | Max: 11m 21s | Hits:  99%/3564  
      🟩 90;90a;100         Pass: 100%/1   | Total:  5m 58s | Avg:  5m 58s | Max:  5m 58s | Hits:  99%/1782  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  2h 51m | Avg:  8m 35s | Max: 26m 16s | Hits:  95%/35611 
      🟩 20                 Pass: 100%/23  | Total:  3h 19m | Avg:  8m 40s | Max: 32m 06s | Hits:  97%/40961 
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 12m 56s | Avg: 6m 28s | Max: 10m 38s | Hits: 98%/296

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 12m 56s | Avg:  6m 28s | Max: 10m 38s | Hits:  98%/296   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 12m 56s | Avg:  6m 28s | Max: 10m 38s | Hits:  98%/296   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 12m 56s | Avg:  6m 28s | Max: 10m 38s | Hits:  98%/296   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 12m 56s | Avg:  6m 28s | Max: 10m 38s | Hits:  98%/296   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 12m 56s | Avg:  6m 28s | Max: 10m 38s | Hits:  98%/296   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 12m 56s | Avg:  6m 28s | Max: 10m 38s | Hits:  98%/296   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 12m 56s | Avg:  6m 28s | Max: 10m 38s | Hits:  98%/296   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 18s | Avg:  2m 18s | Max:  2m 18s | Hits:  98%/148   
      🟩 Test               Pass: 100%/1   | Total: 10m 38s | Avg: 10m 38s | Max: 10m 38s | Hits:  98%/148   
    
  • 🟩 python: Pass: 100%/1 | Total: 29m 59s | Avg: 29m 59s | Max: 29m 59s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 29m 59s | Avg: 29m 59s | Max: 29m 59s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 29m 59s | Avg: 29m 59s | Max: 29m 59s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 29m 59s | Avg: 29m 59s | Max: 29m 59s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 29m 59s | Avg: 29m 59s | Max: 29m 59s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 29m 59s | Avg: 29m 59s | Max: 29m 59s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 29m 59s | Avg: 29m 59s | Max: 29m 59s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 29m 59s | Avg: 29m 59s | Max: 29m 59s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 29m 59s | Avg: 29m 59s | Max: 29m 59s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
+/- Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 93)

# Runner
66 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1

@bernhardmgruber
Contributor Author

I specialized cuda::std::numeric_limits directly, and now I have failures in e.g. cub.cpp20.test.block_scan.alg_1.mode_0. Somehow the specialization is not picked up.

@bernhardmgruber
Contributor Author

> I specialized cuda::std::numeric_limits directly, and now I have failures in e.g. cub.cpp20.test.block_scan.alg_1.mode_0. Somehow the specialization is not picked up.

OK, this is a fun one. The test is broken on main, because c2h uses std::numeric_limits to generate test data, but limits for vector types (e.g. uchar3) are defined only through cub::Traits. So the generated test data is just zeros and the test passes. This PR fixes this problem by accident and uncovers the broken test.
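The all-zeros effect is what the unspecialized primary template of std::numeric_limits does: its member functions return a value-initialized T. A minimal host-only sketch of that mechanism follows. It assumes a hypothetical `vec3` struct in place of uchar3 and a hypothetical `lib_traits` class in place of a library-specific traits class; it is not the actual c2h generator code.

```cpp
#include <cassert>
#include <limits>

// Hypothetical stand-in for a CUDA vector type such as uchar3.
struct vec3 {
  unsigned char x, y, z;
};

// Hypothetical stand-in for a library-specific traits class that knows the type.
template <class T>
struct lib_traits;

template <>
struct lib_traits<vec3> {
  static constexpr vec3 max() { return {255, 255, 255}; }
};

// A generator that takes its upper bound from std::numeric_limits.
// For a type without a specialization this is T(), i.e. all zeros.
template <class T>
T upper_bound_via_std() {
  return std::numeric_limits<T>::max();
}

// The same query routed through the traits class that was actually specialized.
template <class T>
T upper_bound_via_lib() {
  return lib_traits<T>::max();
}
```

Here `upper_bound_via_std<vec3>()` compiles without any diagnostic and yields a zeroed value, so a generator bounded this way can only produce zeros, while `upper_bound_via_lib<vec3>()` yields the intended maximum.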

@bernhardmgruber
Contributor Author

I reported the bug in #3835 and will disable the test in this PR for now.

Comment on lines +329 to +330
// FIXME(bgruber): uchar3 fails the test, see #3835
using vec_types = c2h::type_list<ulonglong4, /*uchar3,*/ short2>;
Contributor Author

I am disabling the failing test here. We should update #3835 once this PR is merged.

Contributor

🟩 CI finished in 1h 27m: Pass: 100%/93 | Total: 1d 23h | Avg: 30m 41s | Max: 1h 14m | Hits: 86%/134193
  • 🟩 cub: Pass: 100%/45 | Total: 1d 15h | Avg: 52m 41s | Max: 1h 14m | Hits: 72%/53761

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 13h | Avg: 52m 11s | Max:  1h 14m | Hits:  72%/51319 
      🟩 arm64              Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 08m | Hits:  72%/2442  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  4h 57m | Avg: 59m 32s | Max:  1h 04m | Hits:  61%/5939  
      🟩 12.5               Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 12m | Hits:  69%/2260  
      🟩 12.8               Pass: 100%/38  | Total:  1d 08h | Avg: 50m 59s | Max:  1h 14m | Hits:  74%/45562 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 04m | Hits:  75%/2114  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 57m | Avg: 59m 32s | Max:  1h 04m | Hits:  61%/5939  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 12m | Hits:  69%/2260  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  1d 06h | Avg: 50m 19s | Max:  1h 14m | Hits:  74%/43448 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 04m | Hits:  75%/2114  
      🟩 nvcc               Pass: 100%/43  | Total:  1d 13h | Avg: 52m 13s | Max:  1h 14m | Hits:  72%/51647 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 48m | Avg: 57m 12s | Max: 59m 47s | Hits:  72%/4892  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 53m | Avg: 56m 43s | Max: 58m 58s | Hits:  72%/2442  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 53m | Avg: 56m 58s | Max: 57m 17s | Hits:  72%/2442  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 55m | Avg: 57m 37s | Max:  1h 01m | Hits:  72%/2442  
      🟩 Clang18            Pass: 100%/7   | Total:  5h 41m | Avg: 48m 49s | Max:  1h 04m | Hits:  81%/8219  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 55m | Avg: 57m 50s | Max: 58m 07s | Hits:  72%/2446  
      🟩 GCC8               Pass: 100%/1   | Total: 57m 09s | Avg: 57m 09s | Max: 57m 09s | Hits:  72%/1223  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 57m | Avg: 58m 41s | Max: 58m 51s | Hits:  72%/2446  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 55m | Avg: 57m 51s | Max: 59m 01s | Hits:  72%/2446  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 50m | Avg: 55m 09s | Max: 55m 20s | Hits:  72%/2442  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 51m | Avg: 55m 56s | Max: 56m 34s | Hits:  72%/2442  
      🟩 GCC13              Pass: 100%/11  | Total:  6h 53m | Avg: 37m 32s | Max:  1h 09m | Hits:  87%/13431 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  1h 13m | Hits:  12%/2094  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 23m | Avg:  1h 11m | Max:  1h 14m | Hits:  12%/2094  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 12m | Hits:  69%/2260  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 15h 13m | Avg: 53m 43s | Max:  1h 04m | Hits:  75%/20437 
      🟩 GCC                Pass: 100%/22  | Total: 17h 21m | Avg: 47m 19s | Max:  1h 09m | Hits:  79%/26876 
      🟩 MSVC               Pass: 100%/4   | Total:  4h 40m | Avg:  1h 10m | Max:  1h 14m | Hits:  12%/4188  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 12m | Hits:  69%/2260  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 15m | Avg: 25m 05s | Max: 26m 43s | Hits:  90%/3663  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 10h | Avg:  1h 00m | Max:  1h 14m | Hits:  66%/40330 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 04m | Avg: 30m 34s | Max:  1h 01m | Hits:  92%/9768  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 12h | Avg: 59m 17s | Max:  1h 14m | Hits:  66%/43993 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 24s | Avg: 21m 24s | Max: 21m 24s | Hits:  99%/1221  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 06s | Avg: 16m 06s | Max: 16m 06s | Hits:  99%/1221  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 14m | Avg: 24m 41s | Max: 25m 30s | Hits:  99%/3663  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 06m | Avg: 22m 06s | Max: 23m 03s | Hits:  99%/3663  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 15m | Avg: 25m 05s | Max: 26m 43s | Hits:  90%/3663  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 09m | Avg:  1h 09m | Max:  1h 09m | Hits:  72%/1221  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 19h 53m | Avg: 59m 40s | Max:  1h 13m | Hits:  64%/23659 
      🟩 20                 Pass: 100%/25  | Total: 19h 37m | Avg: 47m 06s | Max:  1h 14m | Hits:  79%/30102 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 7h 18m | Avg: 9m 44s | Max: 38m 56s | Hits: 95%/80136

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 17m 05s | Avg:  8m 32s | Max: 11m 03s | Hits:  99%/3564  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  7h 08m | Avg:  9m 58s | Max: 38m 56s | Hits:  94%/76573 
      🟩 arm64              Pass: 100%/2   | Total:  9m 32s | Avg:  4m 46s | Max:  5m 00s | Hits:  99%/3563  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 07m | Avg: 13m 29s | Max: 30m 29s | Hits:  83%/8901  
      🟩 12.5               Pass: 100%/2   | Total: 26m 58s | Avg: 13m 29s | Max: 13m 38s | Hits:  99%/3562  
      🟩 12.8               Pass: 100%/38  | Total:  5h 43m | Avg:  9m 03s | Max: 38m 56s | Hits:  96%/67673 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 32s | Avg:  5m 16s | Max:  5m 23s | Hits: 100%/3562  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 07m | Avg: 13m 29s | Max: 30m 29s | Hits:  83%/8901  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 26m 58s | Avg: 13m 29s | Max: 13m 38s | Hits:  99%/3562  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  5h 33m | Avg:  9m 15s | Max: 38m 56s | Hits:  96%/64111 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 32s | Avg:  5m 16s | Max:  5m 23s | Hits: 100%/3562  
      🟩 nvcc               Pass: 100%/43  | Total:  7h 07m | Avg:  9m 56s | Max: 38m 56s | Hits:  94%/76574 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 20m 38s | Avg:  5m 09s | Max:  5m 44s | Hits: 100%/7124  
      🟩 Clang15            Pass: 100%/2   | Total: 11m 28s | Avg:  5m 44s | Max:  5m 52s | Hits: 100%/3562  
      🟩 Clang16            Pass: 100%/2   | Total: 10m 48s | Avg:  5m 24s | Max:  5m 26s | Hits: 100%/3562  
      🟩 Clang17            Pass: 100%/2   | Total: 11m 36s | Avg:  5m 48s | Max:  5m 49s | Hits: 100%/3562  
      🟩 Clang18            Pass: 100%/7   | Total: 42m 38s | Avg:  6m 05s | Max: 10m 06s | Hits: 100%/12467 
      🟩 GCC7               Pass: 100%/2   | Total: 35m 54s | Avg: 17m 57s | Max: 30m 29s | Hits:  74%/3564  
      🟩 GCC8               Pass: 100%/1   | Total:  5m 12s | Avg:  5m 12s | Max:  5m 12s | Hits:  99%/1782  
      🟩 GCC9               Pass: 100%/2   | Total: 10m 32s | Avg:  5m 16s | Max:  5m 32s | Hits:  99%/3564  
      🟩 GCC10              Pass: 100%/2   | Total: 10m 48s | Avg:  5m 24s | Max:  5m 27s | Hits:  99%/3564  
      🟩 GCC11              Pass: 100%/2   | Total: 11m 28s | Avg:  5m 44s | Max:  5m 50s | Hits:  99%/3564  
      🟩 GCC12              Pass: 100%/2   | Total: 11m 24s | Avg:  5m 42s | Max:  5m 49s | Hits:  99%/3564  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 43m | Avg: 10m 18s | Max: 38m 56s | Hits:  97%/17820 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 46m 01s | Avg: 23m 00s | Max: 23m 48s | Hits:  70%/3550  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  1h 19m | Avg: 26m 36s | Max: 30m 26s | Hits:  70%/5325  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 26m 58s | Avg: 13m 29s | Max: 13m 38s | Hits:  99%/3562  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 37m | Avg:  5m 42s | Max: 10m 06s | Hits: 100%/30277 
      🟩 GCC                Pass: 100%/21  | Total:  3h 08m | Avg:  8m 58s | Max: 38m 56s | Hits:  96%/37422 
      🟩 MSVC               Pass: 100%/5   | Total:  2h 05m | Avg: 25m 09s | Max: 30m 26s | Hits:  70%/8875  
      🟩 NVHPC              Pass: 100%/2   | Total: 26m 58s | Avg: 13m 29s | Max: 13m 38s | Hits:  99%/3562  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 16m 02s | Avg:  8m 01s | Max: 11m 32s | Hits:  99%/3564  
      🟩 rtx2080            Pass: 100%/33  | Total:  4h 32m | Avg:  8m 15s | Max: 30m 29s | Hits:  95%/58769 
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 29m | Avg: 14m 58s | Max: 38m 56s | Hits:  91%/17803 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total:  5h 21m | Avg:  8m 27s | Max: 30m 29s | Hits:  95%/67671 
      🟩 TestCPU            Pass: 100%/3   | Total: 45m 17s | Avg: 15m 05s | Max: 30m 26s | Hits:  90%/5338  
      🟩 TestGPU            Pass: 100%/4   | Total:  1h 11m | Avg: 17m 54s | Max: 38m 56s | Hits:  94%/7127  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 16m 02s | Avg:  8m 01s | Max: 11m 32s | Hits:  99%/3564  
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 28s | Avg:  6m 28s | Max:  6m 28s | Hits:  99%/1782  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 13m | Avg:  9m 39s | Max: 30m 29s | Hits:  92%/35611 
      🟩 20                 Pass: 100%/23  | Total:  3h 48m | Avg:  9m 55s | Max: 38m 56s | Hits:  96%/40961 
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 14m 33s | Avg: 7m 16s | Max: 12m 15s | Hits: 98%/296

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 14m 33s | Avg:  7m 16s | Max: 12m 15s | Hits:  98%/296   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 14m 33s | Avg:  7m 16s | Max: 12m 15s | Hits:  98%/296   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 14m 33s | Avg:  7m 16s | Max: 12m 15s | Hits:  98%/296   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 14m 33s | Avg:  7m 16s | Max: 12m 15s | Hits:  98%/296   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 14m 33s | Avg:  7m 16s | Max: 12m 15s | Hits:  98%/296   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 14m 33s | Avg:  7m 16s | Max: 12m 15s | Hits:  98%/296   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 14m 33s | Avg:  7m 16s | Max: 12m 15s | Hits:  98%/296   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 18s | Avg:  2m 18s | Max:  2m 18s | Hits:  98%/148   
      🟩 Test               Pass: 100%/1   | Total: 12m 15s | Avg: 12m 15s | Max: 12m 15s | Hits:  98%/148   
    
  • 🟩 python: Pass: 100%/1 | Total: 30m 16s | Avg: 30m 16s | Max: 30m 16s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 30m 16s | Avg: 30m 16s | Max: 30m 16s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 30m 16s | Avg: 30m 16s | Max: 30m 16s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 30m 16s | Avg: 30m 16s | Max: 30m 16s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 30m 16s | Avg: 30m 16s | Max: 30m 16s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 30m 16s | Avg: 30m 16s | Max: 30m 16s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 30m 16s | Avg: 30m 16s | Max: 30m 16s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 30m 16s | Avg: 30m 16s | Max: 30m 16s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 30m 16s | Avg: 30m 16s | Max: 30m 16s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
+/- Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 93)

# Runner
66 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1

@bernhardmgruber bernhardmgruber merged commit 8595091 into NVIDIA:main Feb 17, 2025
108 of 110 checks passed
@bernhardmgruber bernhardmgruber deleted the ref_limits branch February 17, 2025 16:33
Labels
None yet
Projects
Status: Done
3 participants