I understand that LISA's core code lives in src\lmflow\pipeline\finetuner.py, mainly in the class DynamicLayerActivationCallback. I read it side by side with Algorithm 1 (Layerwise Importance Sampling AdamW, LISA) in the paper.
So where is step 2, "Freeze all layers except the embedding and language modeling head layer"? I can only find def freeze_all_layers(self) in class DynamicLayerActivationCallback, and it does not exclude the embedding and head layers. A sketch of what I expected is below.
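For reference, this is a minimal sketch of what I expected step 2 to look like. It assumes a typical Hugging Face causal LM whose embedding and LM-head parameter names contain "embed" or "lm_head"; that naming is my assumption for illustration, not the actual LMFlow implementation.

```python
import torch.nn as nn


def freeze_all_but_embedding_and_head(model: nn.Module) -> None:
    """Sketch of LISA step 2: freeze everything except the embedding
    and the language modeling head.

    The name matching ("embed", "lm_head") is an assumption for a
    typical Hugging Face causal LM, not the actual LMFlow code.
    """
    for name, param in model.named_parameters():
        # Keep embedding and LM-head parameters trainable; freeze the rest.
        keep_trainable = ("embed" in name) or ("lm_head" in name)
        param.requires_grad = keep_trainable
```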
I'm also curious about the notation k in Algorithm 1 of the paper.
Step 4 says: "Run AdamW for K iterations with $\{\eta_t\}_{t=ik}^{ik+k-1}$". Is the lowercase k here the same as K?
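My current reading, which may be wrong: if the lowercase k denotes the same block length as K, then for the i-th outer step the schedule covers

$$\{\eta_t\}_{t=ik}^{ik+k-1} = \{\eta_{ik}, \eta_{ik+1}, \dots, \eta_{ik+k-1}\},$$

which is exactly K learning rates, one per inner AdamW iteration.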
My English is not great, so please tell me if anything is unclear. Thanks for answering!