Add support for nvidia modelopt fp8 kv cache #5128
Set up job
12s
12s
Error:
This step has been truncated due to its large size. Download the full logs from the menu
once the workflow run has completed.
Checkout code
9s
9s
Error:
This step has been truncated due to its large size. Download the full logs from the menu
once the workflow run has completed.
Install dependencies
1m 7s
1m 7s
Error:
This step has been truncated due to its large size. Download the full logs from the menu
once the workflow run has completed.
Benchmark single latency
27s
27s
Error:
This step has been truncated due to its large size. Download the full logs from the menu
once the workflow run has completed.
Benchmark online latency
2m 54s
2m 54s
Error:
This step has been truncated due to its large size. Download the full logs from the menu
once the workflow run has completed.
Benchmark offline throughput
3m 25s
3m 25s
Error:
This step has been truncated due to its large size. Download the full logs from the menu
once the workflow run has completed.
Benchmark offline throughput (Non-streaming, small batch size)
1m 24s
1m 24s
Error:
This step has been truncated due to its large size. Download the full logs from the menu
once the workflow run has completed.
Benchmark online latency (EAGLE)
1m 50s
1m 50s
Error:
This step has been truncated due to its large size. Download the full logs from the menu
once the workflow run has completed.
Post Checkout code
1s
1s
Error:
This step has been truncated due to its large size. Download the full logs from the menu
once the workflow run has completed.
Complete job
0s
0s
Error:
This step has been truncated due to its large size. Download the full logs from the menu
once the workflow run has completed.
Loading