Add THL-150 model architecture implementation #36407

Open · wants to merge 5 commits into main
Conversation


@ErebusTN commented Feb 25, 2025

Description

  • Implement core THL-150 architecture with sliding window attention
  • Add configuration with RoPE embedding support
  • Include fast/slow tokenizers using BPE
  • Implement all model heads (CausalLM, Classification, QA, TokenClass)
  • Add dynamic RoPE scaling and GQA support
  • Validate attention mask generation for long sequences
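As context for the attention-mask bullet above, here is a minimal sketch of sliding-window causal masking, assuming a Mistral-style window in which each position attends to at most the previous `window` positions. The PR's actual mask code is not reproduced on this page, so the function name and shapes are illustrative:

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """True where query position i may attend to key position j,
    i.e. causal (j <= i) and within the window (i - j < window)."""
    idx = torch.arange(seq_len)
    causal = idx[None, :] <= idx[:, None]               # j <= i
    in_window = (idx[:, None] - idx[None, :]) < window  # i - j < window
    return causal & in_window

# With window=4, position 5 attends to positions 2..5 only.
mask = sliding_window_causal_mask(seq_len=8, window=4)
```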

Motivation

This implementation enables:

  • Efficient 32k context window processing
  • Flexible attention mechanisms (sliding window + full attention hybrid)
  • Compatibility with HF Transformers pipelines
  • Modern architecture features like Grouped Query Attention

Context

  • Built for long-context NLP tasks
  • Implements an architecture similar to LLaMA, with sliding-window extensions
  • Designed for easy integration with existing HF ecosystems

Dependencies

  • Requires PyTorch >= 2.0
  • Recommends flash-attn >= 2.3 for optimal performance
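Because flash-attn is optional, loading code can select the attention backend at runtime. A minimal sketch using the transformers `attn_implementation` argument; the repo id is a placeholder, since no THL-150 checkpoint has been published:

```python
import importlib.util

from transformers import AutoModelForCausalLM

# flash-attn is an optional dependency; fall back to PyTorch SDPA if absent.
attn_implementation = (
    "flash_attention_2"
    if importlib.util.find_spec("flash_attn") is not None
    else "sdpa"
)

# "some-org/thl-150" is a placeholder repo id -- no checkpoint exists yet.
model = AutoModelForCausalLM.from_pretrained(
    "some-org/thl-150",
    attn_implementation=attn_implementation,
)
```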

Checklist

  • Updated documentation in docstrings
  • Verified config validation
  • Tests will be added in a subsequent PR (the current focus is the core implementation)

Notes

  • Implements EGen License v0.1
  • Initial release focused on base architecture
  • Special thanks to the reviewer for consultation

Authored-by: @ErebusTN [[email protected]]

ErebusTN and others added 5 commits February 25, 2025 23:46
Core Components:
- Added THL150Config with sliding window attention parameters
- Implemented THL150Model with RoPE embeddings and GQA support
- Created slow/fast tokenizers with BPE preprocessing
- Included all model heads:
  * THL150ForCausalLM
  * THL150ForSequenceClassification
  * THL150ForTokenClassification
  * THL150ForQuestionAnswering
- Added configuration validation and attention mask handling

Key Features:
- 32k context window support
- Sliding window attention implementation
- Dynamic RoPE scaling
- Multi-query attention compatibility
- HF Transformers integration ready

Code Structure:
src/transformers/models/thl_150/
├── __init__.py
├── configuration_thl_150.py
├── modeling_thl_150.py
├── tokenization_thl_150.py
└── tokenization_thl_150_fast.py

Fixes Included:
- Implemented missing loss functions in model heads
- Fixed RoPE initialization fallback
- Added proper BOS token handling
- Resolved attention mask generation issues
- Validated configuration imports

License: EGen License v0.1
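Since the diff itself is not reproduced on this page, here is a hypothetical sketch of what a configuration like THL150Config might expose, with field names borrowed from Llama/Mistral-style configs; all names and defaults are assumptions, not the PR's actual values:

```python
from transformers import PretrainedConfig

class THL150Config(PretrainedConfig):
    """Hypothetical sketch; the real THL150Config lives in the PR diff."""

    model_type = "thl_150"

    def __init__(
        self,
        vocab_size=32000,
        hidden_size=4096,
        num_attention_heads=32,
        num_key_value_heads=8,          # fewer KV heads than query heads -> GQA
        sliding_window=4096,            # local attention window size
        max_position_embeddings=32768,  # the 32k context window
        rope_theta=10000.0,
        rope_scaling=None,              # e.g. {"type": "dynamic", "factor": 2.0}
        **kwargs,
    ):
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.num_attention_heads = num_attention_heads
        self.num_key_value_heads = num_key_value_heads
        self.sliding_window = sliding_window
        self.max_position_embeddings = max_position_embeddings
        self.rope_theta = rope_theta
        self.rope_scaling = rope_scaling
        super().__init__(**kwargs)
```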
@Rocketknight1 (Member)

Hi @ErebusTN, we usually don't add architectures to the library until there's an existing pretrained model. Is there a relevant model repo or paper anywhere?

@ErebusTN (Author) commented Feb 26, 2025

Hi @Rocketknight1, I'm working on a model, EGen V1; it's my PhD project and I'm going to update it frequently. It's just in its first steps as v0.1 and I still have a lot of work to do to improve it, so I haven't uploaded it yet. Stay tuned: the next update will be the first release, and I will upload the model as open source.

@Rocketknight1 (Member)

Cool! One thing we'd advise is that models can be uploaded as custom code using the steps here. This will let you share the model immediately, and it'll work exactly the same as a library model (except that users will need to set trust_remote_code=True).

This can be a lot faster than actually getting a PR into transformers, and it's a good way to validate the model and get users, which will help speed up the PR later!
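For anyone following along, loading a custom-code model from the Hub looks like this; the repo id is a placeholder until EGen V1 is actually uploaded:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True opts in to executing the modeling code shipped
# inside the Hub repo. "ErebusTN/EGen-V1" is a placeholder id -- the
# checkpoint has not been published yet.
repo_id = "ErebusTN/EGen-V1"
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
```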
