
Question about the difference in inference results between NPU and GPU #31

AIR-hl opened this issue Feb 11, 2025 · 1 comment

AIR-hl commented Feb 11, 2025

The inference results on the GPU are significantly different from those on the NPU. We used the same code and set temperature=0 to ensure reproducibility. Additionally, the inference speed on the NPU is significantly lower than on the A800, and even the 4090. Is this normal? (A minimal sketch of the setup follows the environment list below.)

vllm: 0.7.2
vllm-ascend: latest
GPU: A800, 4090
NPU: 910b3
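
For reference, here is a minimal sketch of the kind of setup described above, using vLLM's offline `LLM`/`SamplingParams` API with greedy decoding. The model name and prompt are placeholders, not necessarily what was actually run:

```python
from vllm import LLM, SamplingParams

# Placeholder model; the issue does not state which model was used.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")

# temperature=0 selects greedy decoding, so repeated runs on the same
# device should be deterministic. Different backends (CUDA vs. Ascend NPU)
# can still diverge due to kernel and floating-point accumulation differences.
sampling_params = SamplingParams(temperature=0, max_tokens=256)

outputs = llm.generate(["Briefly explain the attention mechanism."], sampling_params)
print(outputs[0].outputs[0].text)
```

Even with greedy decoding, bitwise-identical outputs across different hardware backends are not guaranteed, which may account for part of the divergence.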

[Screenshot: inference output on the A800]

[Screenshot: inference output on the 910b3]

One of the inference results on the 910b3 exhibited repetition; this never happened on the other devices.
[Screenshot: repeated output on the 910b3]

wangxiyuan (Collaborator) commented Feb 11, 2025

Hi, vllm-ascend is still a work in progress. There are some PRs that still need to be merged into vllm and vllm-ascend. If you hit the error in a multi-card environment, it's a known issue; see #16.

If you hit a different error, please provide more details.

As for the performance problem, we're working on it; please bear with us. Thanks.

We'll make vllm-ascend available ASAP.
