sync wait before L1 and L2 flush (#3709)

Summary:
Pull Request resolved: #3709

X-link: facebookresearch/FBGEMM#791

During flush, make sure we do a blocking wait on all pending kernels before we do the synchronous flush of L1 and L2.

Reviewed By: q10, sryap

Differential Revision: D69557437

fbshipit-source-id: 04d4a7850709f94055f8b2d5beab0fe622903378
pytorch · Feb 19, 2025 · 56d6e4a
1 parent eb7e7e0
Showing 1 changed file with 1 addition and 1 deletion.
diff --git a/fbgemm_gpu/fbgemm_gpu/tbe/ssd/training.py b/fbgemm_gpu/fbgemm_gpu/tbe/ssd/training.py
@@ -1831,12 +1831,12 @@ def flush(self) -> None:
 
         torch.cuda.current_stream().wait_stream(self.ssd_eviction_stream)
 
+        torch.cuda.synchronize()
         self.ssd_db.set(
             active_ids_cpu,
             active_weights_cpu,
             torch.tensor([active_ids_cpu.numel()]),
         )
-
         self.ssd_db.flush()
 
     def prepare_inputs(
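Why `wait_stream` alone is not enough: `torch.cuda.current_stream().wait_stream(self.ssd_eviction_stream)` only makes future work on the current stream wait for work already queued on the eviction stream. It orders GPU work but returns control to the host immediately. `torch.cuda.synchronize()` instead blocks the host until all kernels in all streams on the current device have finished. The pure-Python sketch below models that difference. It assumes, without confirmation from this commit, that `active_ids_cpu` may still be filled by in-flight asynchronous device work when `flush()` reaches `self.ssd_db.set(...)`. `AsyncStream`, `flush_sketch`, and the simulated copy are hypothetical stand-ins, not FBGEMM or PyTorch code.

```python
import queue
import threading
import time


class AsyncStream:
    """Hypothetical stand-in for a CUDA stream: launch() enqueues work and
    returns immediately, and a background thread runs it in FIFO order."""

    def __init__(self) -> None:
        self._work = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self) -> None:
        while True:
            fn = self._work.get()
            try:
                fn()
            finally:
                self._work.task_done()

    def launch(self, fn) -> None:
        # Like a kernel launch or a non_blocking copy: the host does not wait.
        self._work.put(fn)

    def synchronize(self) -> None:
        # Like torch.cuda.synchronize(): block the host until every task
        # queued so far has finished running.
        self._work.join()


def flush_sketch(stream: AsyncStream, host_wait: bool) -> list:
    """Hypothetical model of flush(): device work fills a CPU staging
    buffer, then the host writes that buffer to the backing store."""
    active_ids_cpu: list = []  # staging buffer filled asynchronously
    backing_store: list = []   # stands in for self.ssd_db

    def copy_to_host() -> None:
        time.sleep(0.05)  # simulate a slow device-to-host copy
        active_ids_cpu.extend([7, 8, 9])

    stream.launch(copy_to_host)
    if host_wait:
        stream.synchronize()  # plays the role of the added torch.cuda.synchronize()
    # Host-side write, analogous to self.ssd_db.set(active_ids_cpu, ...).
    backing_store.extend(active_ids_cpu)
    return backing_store
```

With `host_wait=True`, `flush_sketch(AsyncStream(), host_wait=True)` returns `[7, 8, 9]`. With `host_wait=False`, the host-side write can run before the simulated copy lands and may store an empty list, which is the kind of race a host-blocking wait rules out. The trade-off is that a device-wide `torch.cuda.synchronize()` also waits for unrelated streams to drain, so it is a heavier barrier than `wait_stream`.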