I understand that LISA's core code lives in src\lmflow\pipeline\finetuner.py, mainly in the class DynamicLayerActivationCallback. I read it side by side with Algorithm 1 (Layerwise Importance Sampling AdamW, LISA) in the paper.
So where is step 2, "Freeze all layers except the embedding and language modeling head layer"? I can only find def freeze_all_layers(self) in class DynamicLayerActivationCallback, and it does not exclude the embedding and head layers. A sketch of what I expected is below.
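For reference, this is a minimal sketch of what I expected step 2 to look like. It assumes a typical Hugging Face causal LM whose embedding and LM-head parameter names contain "embed" or "lm_head"; that naming is my assumption for illustration, not the actual LMFlow implementation.

```python
import torch.nn as nn


def freeze_all_but_embedding_and_head(model: nn.Module) -> None:
    """Sketch of LISA step 2: freeze everything except the embedding
    and the language modeling head.

    The name matching ("embed", "lm_head") is an assumption for a
    typical Hugging Face causal LM, not the actual LMFlow code.
    """
    for name, param in model.named_parameters():
        # Keep embedding and LM-head parameters trainable; freeze the rest.
        keep_trainable = ("embed" in name) or ("lm_head" in name)
        param.requires_grad = keep_trainable
```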
I'm also curious about the notation k in Algorithm 1 of the paper.
Step 4 says: "Run AdamW for K iterations with $\{\eta_t\}_{t=ik}^{ik+k-1}$". Is the lowercase k here the same as K?
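My current reading, which may be wrong: if the lowercase k denotes the same block length as K, then for the i-th outer step the schedule covers

$$\{\eta_t\}_{t=ik}^{ik+k-1} = \{\eta_{ik}, \eta_{ik+1}, \dots, \eta_{ik+k-1}\},$$

which is exactly K learning rates, one per inner AdamW iteration.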
My English is not great, so please tell me if anything is unclear. Thanks for answering!