
Add evolla rebase main #36232

Open
zhoubay wants to merge 82 commits into main from add-evolla-rebase-main

Conversation

@zhoubay

zhoubay commented Feb 17, 2025

What does this PR do?

Fix issue #36231

Evolla is an advanced 80-billion-parameter protein-language generative model designed to decode the molecular language of proteins. It integrates information from protein sequences, structures, and user queries to generate precise and contextually nuanced insights into protein function. Trained on an unprecedented AI-generated dataset of 546 million protein question-answer pairs and 150 billion word tokens, Evolla significantly advances research in proteomics and functional genomics, providing expert-level insights and shedding light on the molecular logic encoded in proteins.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@amyeroberts @qubvel @ArthurZucker

@zucchini-nlp
Member

@Cyrilvallez hey! Yeah, we have InstructBlip with 2 tokenizers, one for the inputs and another for the Q-Former module. We added custom from_pretrained/save_pretrained methods to handle the tokenizers. On the Hub, the other tokenizer is saved separately under qformer_tokenizer.
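
For reference, a minimal sketch of what this looks like from the user side (the checkpoint name is just an example, and the exact loading details are from memory rather than this thread, so treat it as indicative):

from transformers import AutoTokenizer, InstructBlipProcessor

# The processor's custom from_pretrained gathers both tokenizers; the Q-Former
# tokenizer is stored separately under "qformer_tokenizer" in the repo.
processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-flan-t5-xl")

# Loading the second tokenizer directly from that location, for illustration only:
qformer_tokenizer = AutoTokenizer.from_pretrained(
    "Salesforce/instructblip-flan-t5-xl", subfolder="qformer_tokenizer"
)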

@zhoubay
Author

zhoubay commented Feb 19, 2025

Thank you @zucchini-nlp and @Cyrilvallez !

I've solved the problem by referring to the code from InstructBlip! However, even though I pass test_processor, I still fail tests_generate because of tests/models/video_llava/test_modeling_video_llava.py::VideoLlavaForConditionalGenerationModelTest::test_assisted_decoding_with_logits_to_keep, which passes in my environment.

What should I do to move this forward?

As for the modular issue @Cyrilvallez mentioned, is it compulsory to convert to that format? I'm reading the docs, and I'd be glad to apply it to reduce my duplicated code without hurting performance!

@zucchini-nlp
Member

Feel free to ignore tests from other models. The test failing for you was fixed yesterday, so rebasing might help. If there are other flaky tests with vision LLMs, let me know, I am working on reducing flakiness to a minimum :)

@zhoubay
Author

zhoubay commented Feb 19, 2025

@zucchini-nlp Thanks! As for InstructBlip, I've noticed a small fragility in its implementation.

In the referenced lines, they remove qformer_tokenizer and then append it back as the last item of self.attributes. However, this only works because qformer_tokenizer is originally the last entry of self.attributes, see here.

# We modify the attributes so that only the tokenizer and image processor are saved in the main folder
qformer_present = "qformer_tokenizer" in self.attributes
if qformer_present:
    self.attributes.remove("qformer_tokenizer")

outputs = super().save_pretrained(save_directory, **kwargs)

if qformer_present:
    self.attributes += ["qformer_tokenizer"]

This can cause an unexpected error if you change the order of self.attributes to, say, attributes = ["image_processor", "qformer_tokenizer", "tokenizer"] and at the same time change the signature of __init__ to def __init__(self, image_processor, qformer_tokenizer, tokenizer, num_query_tokens=None, **kwargs): (since the attributes should follow the same order as the __init__ arguments).
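
To make the failure mode concrete, here is a tiny sketch of what happens with that reordering (names follow the example above):

# With the reordered attributes, remove + append-at-end changes the relative order
attributes = ["image_processor", "qformer_tokenizer", "tokenizer"]
attributes.remove("qformer_tokenizer")
# ... super().save_pretrained(...) runs here ...
attributes += ["qformer_tokenizer"]
print(attributes)  # ['image_processor', 'tokenizer', 'qformer_tokenizer']
# no longer matches __init__(self, image_processor, qformer_tokenizer, tokenizer, ...)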

My proposal, as in my recent commit, is to add a variable that tracks where the removed attribute was, as shown below:

# we modify the attributes so that only the text tokenizer is saved in the main folder
protein_tokenizer_present = "protein_tokenizer" in self.attributes
# find the correct position of it in the attributes list
protein_tokenizer_index = self.attributes.index("protein_tokenizer") if protein_tokenizer_present else None
if protein_tokenizer_present and protein_tokenizer_index is not None:
    self.attributes.remove("protein_tokenizer")

outputs = super().save_pretrained(save_directory, **kwargs)

if protein_tokenizer_present and protein_tokenizer_index is not None:
    self.attributes.insert(protein_tokenizer_index, "protein_tokenizer")
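
A quick check that the original order is preserved with this approach (the attribute order here is just for illustration):

attributes = ["protein_tokenizer", "tokenizer"]
protein_tokenizer_index = attributes.index("protein_tokenizer")
attributes.remove("protein_tokenizer")
# ... super().save_pretrained(...) runs here ...
attributes.insert(protein_tokenizer_index, "protein_tokenizer")
print(attributes)  # ['protein_tokenizer', 'tokenizer'] -- original order preserved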

Hope this helps make vision LLMs more stable!

And I'll rebase onto the main branch and see how it goes. Thank you again!

@zucchini-nlp
Member

@zhoubay yeah, but we don't expect users to change the order of attributes unless they are doing some advanced experiments and adding new attributes. In that case we'll expect them to also change the custom saving and loading. Usually we just have a tokenizer + image processor, one for each attribute, so not much opportunity to experiment 😄

For Evolla feel free to do what you think works, I am not familiar with the model. You will get a review from Cyril on that later

@zhoubay
Author

zhoubay commented Feb 19, 2025

@zucchini-nlp Thank you for your information and your collaboration! I'm looking forward to it!

@Cyrilvallez
Member

Hey @zhoubay! Yes, we do enforce the use of modular now, especially as it is very much applicable in your case (a lot of code seems directly copy-pasted from other models)! In the end, it is a huge win-win scenario, as it greatly facilitates our review and maintenance process, and in turn ensures your model will get all potential future features/improvements we may roll out (which is otherwise not the case for all models, as models are supposed to stay more or less static over time) 🤗
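
To give a rough idea, a modular file is more or less of this shape (a minimal sketch with illustrative Evolla class names; only what actually differs from Llama needs to be written out):

# modular_evolla.py (illustrative)
from transformers.models.llama.configuration_llama import LlamaConfig
from transformers.models.llama.modeling_llama import LlamaDecoderLayer, LlamaModel


class EvollaConfig(LlamaConfig):
    model_type = "evolla"


class EvollaDecoderLayer(LlamaDecoderLayer):
    # inherited as-is: the converter copies the Llama implementation into the
    # auto-generated modeling file
    pass


class EvollaModel(LlamaModel):
    # override only the methods that actually differ; everything else is inherited
    pass

The auto-generated modeling/configuration files are then produced from this file by utils/modular_model_converter.py.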

Also, not sure what you did, but you now have more than 150 changed files! 🙃 You should revert to the previous state!

@zhoubay
Author

zhoubay commented Feb 19, 2025

Hi @Cyrilvallez! Thank you for the information! I'll convert my code to the modular format!

As for the file changes, I think most of them showed up because I just rebased my branch onto the current main branch. Could you give me some hints on how to undo it?

zhoubay force-pushed the add-evolla-rebase-main branch from 06ff185 to 9bf9d35 on February 19, 2025 at 10:28
@zhoubay
Author

zhoubay commented Feb 19, 2025

Hi @Cyrilvallez, I hope this helps! Sorry I messed up the commit history! I'll never, never, never rebase blindly again 😭

@Cyrilvallez
Member

No worries, we've all been there 😆 Let me know when you've integrated the modular files and are ready for a review!

@zhoubay
Author

zhoubay commented Feb 19, 2025

Sure! Thank you! I'll let you know when I finish integrating the modular files!

@Cyrilvallez
Member

Just a heads-up: if you create 2 modular files, it will likely not work right now, but #36287 will fix it! I will merge it ASAP.

@zhoubay
Author

zhoubay commented Feb 20, 2025

Hey @Cyrilvallez! I've got a small question about modular! Since a modular file relies on code from other models, what happens when that code is later changed in a way that alters the behavior?

For example, in my case I inherited the Llama code a long time ago, around transformers v4.30.2. At that time, the signature of LlamaDecoderLayer.forward had no position_embeddings, see here.

def forward(
    self,
    hidden_states: torch.Tensor,
    attention_mask: Optional[torch.Tensor] = None,
    position_ids: Optional[torch.LongTensor] = None,
    past_key_value: Optional[Tuple[torch.Tensor]] = None,
    output_attentions: Optional[bool] = False,
    use_cache: Optional[bool] = False,
) -> Tuple[torch.FloatTensor, Optional[Tuple[torch.FloatTensor, torch.FloatTensor]]]:

And at that time, decoder_layer did not accept position_embeddings as input (see here), because the position embeddings were computed inside LlamaAttention (see here).

  layer_outputs = decoder_layer(
      hidden_states,
      attention_mask=attention_mask,
      position_ids=position_ids,
      past_key_value=past_key_value,
      output_attentions=output_attentions,
      use_cache=use_cache,
  )

Nowadays, however, decoder_layer must accept position_embeddings as input (see here).

  layer_outputs = decoder_layer(
      hidden_states,
      attention_mask=causal_mask,
      position_ids=position_ids,
      past_key_value=past_key_values,
      output_attentions=output_attentions,
      use_cache=use_cache,
      cache_position=cache_position,
      position_embeddings=position_embeddings,
      **flash_attn_kwargs,
  )

My point is: in my case, I only need to implement EvollaModel.forward in a fine-grained way, so I copied all of the code of LlamaModel.forward as it was at that point in time, and I don't change anything in LlamaDecoderLayer, so I can inherit it.

So my code calling decoder_layer looks like the earlier version. Since these lines are implemented by me, they don't change when the Llama team changes implementation details.

  layer_outputs = decoder_layer(
      hidden_states,
      attention_mask=attention_mask,
      position_ids=position_ids,
      past_key_value=past_key_value,
      output_attentions=output_attentions,
      use_cache=use_cache,
  )

But over time, the Llama team changed the signature of decoder_layer so that position_embeddings became required (see here); because my code inherits their LlamaDecoderLayer, it would break and no longer work.

def forward(
    self,
    hidden_states: torch.Tensor,
    attention_mask: Optional[torch.Tensor] = None,
    position_ids: Optional[torch.LongTensor] = None,
    past_key_value: Optional[Cache] = None,
    output_attentions: Optional[bool] = False,
    use_cache: Optional[bool] = False,
    cache_position: Optional[torch.LongTensor] = None,
    position_embeddings: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,  # necessary, but kept here for BC
    **kwargs: Unpack[FlashAttentionKwargs],
) -> Tuple[torch.FloatTensor, Optional[Tuple[torch.FloatTensor, torch.FloatTensor]]]:
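
If I follow the current convention, my call site would presumably need to become something like the following (a sketch based on the current LlamaModel code, where the (cos, sin) position embeddings are built once by the model's rotary embedding module and passed down to each layer):

  position_embeddings = self.rotary_emb(hidden_states, position_ids)

  layer_outputs = decoder_layer(
      hidden_states,
      attention_mask=causal_mask,
      position_ids=position_ids,
      past_key_value=past_key_values,
      output_attentions=output_attentions,
      use_cache=use_cache,
      cache_position=cache_position,
      position_embeddings=position_embeddings,
  )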

I don't know if I made myself clear; if you have any questions about what I said, please let me know!

Looking forward to your ideas on how to solve this issue! Thank you in advance!

@zhoubay
Author

zhoubay commented Feb 24, 2025

Hi @Cyrilvallez, I converted the Llama parts to modular and found a typo in the docs, as issue #36362 points out.

I fixed the typo in the docs; I hope this doesn't bother you!

Let me know if I should do more things!
