Commit c02e313: Fix block wise fp8 torch compile (#3232)

Author: ispobock
Date: Jan 31, 2025
Parent: 734daed

Showing 1 changed file with 7 additions and 0 deletions.
python/sglang/srt/layers/quantization/fp8.py (7 additions, 0 deletions)

```diff
@@ -290,6 +290,13 @@ def process_weights_after_loading(self, layer: Module) -> None:
                     weight_scale, requires_grad=False
                 )
                 layer.input_scale = None
+            else:
+                layer.weight = torch.nn.Parameter(
+                    layer.weight.data, requires_grad=False
+                )
+                layer.weight_scale_inv = torch.nn.Parameter(
+                    layer.weight_scale_inv.data, requires_grad=False
+                )
             return
         layer.weight = torch.nn.Parameter(layer.weight.data, requires_grad=False)
         # If checkpoint not serialized fp8, quantize the weights.
```
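The added branch re-wraps the block-quantized weight and its inverse scale as frozen `torch.nn.Parameter` objects instead of leaving them as plain tensor attributes, which is the pattern `torch.compile` expects for module weights. Below is a minimal, hypothetical sketch of that pattern on a simplified module; the class `BlockFp8Linear` and its shapes are illustrative stand-ins, not code from sglang.

```python
import torch
from torch import nn


class BlockFp8Linear(nn.Module):
    # Hypothetical stand-in for a block-wise fp8 linear layer: it holds a
    # quantized weight plus a per-block inverse scale, initially as plain
    # tensor attributes (as they would be right after checkpoint loading).
    def __init__(self, out_features: int, in_features: int):
        super().__init__()
        self.weight = torch.randn(out_features, in_features)
        self.weight_scale_inv = torch.ones(out_features, 1)

    def process_weights_after_loading(self) -> None:
        # Mirror the commit's added else-branch: re-wrap the raw tensors as
        # non-trainable Parameters so torch.compile traces them as stable
        # module parameters rather than arbitrary tensor attributes.
        self.weight = nn.Parameter(self.weight.data, requires_grad=False)
        self.weight_scale_inv = nn.Parameter(
            self.weight_scale_inv.data, requires_grad=False
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Illustrative dequantize-and-matmul; the real kernel differs.
        return x @ (self.weight * self.weight_scale_inv).t()


layer = BlockFp8Linear(4, 8)
layer.process_weights_after_loading()
print(isinstance(layer.weight, nn.Parameter))          # now a Parameter
print(layer.weight.requires_grad)                      # frozen for inference
print(layer(torch.randn(2, 8)).shape)
```

Note that assigning an `nn.Parameter` to an attribute name that previously held a plain tensor is safe: `nn.Module.__setattr__` removes the old entry and registers the new value as a parameter.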
