[pull] main from vllm-project:main #32

pull · 2024-05-21T21:06:50Z

See Commits and Changes for more details.

Can you help keep this open source service alive? 💖 Please sponsor : )

openshift-ci · 2024-05-21T21:07:03Z

Hi @pull[bot]. Thanks for your PR.

I'm waiting for an opendatahub-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci · 2024-05-31T09:01:24Z

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: pull[bot]

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Adds support for multi-lora adapters. Passing tests added over in this PR: https://github.ibm.com/ai-foundation/tgis-deploy-tests/pull/25/files

Signed-off-by: Joe Runde <[email protected]>

mgoin and others added 3 commits May 21, 2024 09:06

[CI/Build] Codespell ignore build/ directory (#4945)

757b62c

[Bugfix] Fix flag name for max_seq_len_to_capture (#4935)

14772ee

Signed-off-by: kerthcet <[email protected]>

[Bugfix][Kernel] Add head size check for attention backend selection (#4944)

99eff67

openshift-ci bot requested review from dtrifiro and rpancham May 21, 2024 21:06

openshift-ci bot added the needs-ok-to-test label May 21, 2024

pull bot added ⤵️ pull and removed needs-ok-to-test labels May 21, 2024

sasha0552 and others added 2 commits May 22, 2024 01:32

[Frontend] Dynamic RoPE scaling (#4638)

9b9a10d

[CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722)

5f6d10c

dtrifiro added the ok-to-test label May 22, 2024

rkooo567 and others added 18 commits May 22, 2024 09:02

[misc] remove comments that were supposed to be removed (#4977)

c74c913

[Kernel] Fixup for CUTLASS kernels in CUDA graphs (#4954)

8674f98

Pass the CUDA stream into the CUTLASS GEMMs, to avoid future issues with CUDA graphs

[Misc] Load FP8 kv-cache scaling factors from checkpoints (#4893)

a3a73ab

The 2nd PR for #4532. This PR supports loading FP8 kv-cache scaling factors from an FP8 checkpoint (with the .kv_scale parameter).

[Model] LoRA gptbigcode implementation (#3949)

97b0300

[Core] Eliminate parallel worker per-step task scheduling overhead (#4894)

eb6d3c2

[Minor] Fix small typo in llama.py: QKVParallelLinear -> QuantizationConfig (#4991)

a36de68

[Misc] Take user preference in attention selector (#4960)

ee3eea0

Marlin 24 prefill performance improvement (about 25% better on average) (#4983)

6066253

[Bugfix] Update Dockerfile.cpu to fix NameError: name 'vllm_ops' is not defined (#5009)

2ba80be

[Core][1/N] Support send/recv in PyNCCL Groups (#4988)

5eda2ea

Signed-off-by: Muralidhar Andoorveedu <[email protected]>

[Kernel] Initial Activation Quantization Support (#4525)

a124232

Co-authored-by: Varun Sundar Rabindranath <[email protected]>
Co-authored-by: Varun Sundar Rabindranath <[email protected]>

[Core]: Option To Use Prompt Token Ids Inside Logits Processor (#4985)

e3470f8

Co-authored-by: Elisei Smirnov <[email protected]>

[Doc] add ccache guide in doc (#5012)

6a50f4c

Co-authored-by: Michael Goin <[email protected]>

[Bugfix] Fix Mistral v0.3 Weight Loading (#5005)

9197709

Co-authored-by: Cody Yu <[email protected]>

[Core][Bugfix]: fix prefix caching for blockv2 (#4764)

e64fde4

Co-authored-by: Lei Wen <[email protected]>

[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)

8e192ff

Co-authored-by: beagleski <[email protected]>
Co-authored-by: bapatra <[email protected]>
Co-authored-by: Barun Patra <[email protected]>
Co-authored-by: Michael Goin <[email protected]>

[Misc] add logging level env var (#5045)

325c119

[Dynamic Spec Decoding] Minor fix for disabling speculative decoding (#5000)

d5a1697

Etelis and others added 18 commits May 29, 2024 16:13

[Bugfix] logprobs is not compatible with the OpenAI spec #4795 (#5031)

7c3604f

[Doc][Build] update after removing vllm-nccl (#5103)

4fbcb0f

Co-authored-by: Roger Wang <[email protected]>

[Bugfix] gptq_marlin: Ensure g_idx_sort_indices is not a Parameter (#5108)

5bf185a

[CI/Build] Docker cleanup functionality for amd servers (#5112)

e07aff9

Co-authored-by: Alexey Kondratiev <[email protected]>
Co-authored-by: Alexei-V-Ivanov-AMD <[email protected]>
Co-authored-by: Alexei V. Ivanov <[email protected]>
Co-authored-by: omkarkakarparthi <okakarpa>

[BUGFIX] [FRONTEND] Correct chat logprobs (#5029)

87d41c8

Co-authored-by: Breno Faria <[email protected]>

[Bugfix] Automatically Detect SparseML models (#5119)

d910816

[CI/Build] increase wheel size limit to 200 MB (#5130)

f758505

[Misc] remove duplicate definition of seq_lens_tensor in model_runner.py (#5129)

d79d9ea

[Doc] Use intersphinx and update entrypoints docs (#5125)

a9bcc7a

add doc about serving option on dstack (#3074)

429d897

Co-authored-by: Roger Wang <[email protected]>

Bump version to v0.4.3 (#5046)

87a658c

[Build] Disable sm_90a in cu11 (#5141)

45a1a69

[Bugfix] Avoid Warnings in SparseML Activation Quantization (#5120)

b35be54

[Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5) (#5136)

6d21fa1

Fix cutlass sm_90a version in CMakeList

533c217

[Model] Support MAP-NEO model (#5081)

a22dea5

Co-authored-by: Zhuohan Li <[email protected]>

Revert "[Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5)" (#5149)

e9d3aa0

[Misc]: optimize eager mode host time (#4196)

a377f0b

Co-authored-by: xuhao <[email protected]>

dtrifiro marked this pull request as ready for review May 31, 2024 09:00

openshift-ci bot removed the do-not-merge/work-in-progress label May 31, 2024

dtrifiro added lgtm approved labels May 31, 2024

openshift-ci bot requested review from heyselbi and vaibhavjainwiz May 31, 2024 09:01

dtrifiro enabled auto-merge (rebase) May 31, 2024 09:49

dtrifiro merged commit 527c996 into opendatahub-io:main May 31, 2024
15 of 16 checks passed

Xaenalt pushed a commit that referenced this pull request Sep 18, 2024

Add release docs for Gaudi (#32)

b6f5584

* add gaudi installation readme
* readme writeup
* Create README_GAUDI.md
* Update README.md
* Update README_GAUDI.md
* Update README.md
* Update readmes

prarit pushed a commit to prarit/vllm that referenced this pull request Oct 18, 2024

Merge pull request opendatahub-io#32 from ROCm/gshtras-patch-1

69ce080

Update linear.py
