Releases: ModelCloud/Tokenicer
Toke(n)icer v0.0.4
What's Changed
⚡ The tokenicer instance now dynamically inherits the native `tokenizer.__class__` of the tokenizer passed in or loaded via our `Tokenicer.load()` API.
⚡ CI now tests tokenizers from 64 models.
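The dynamic-inheritance idea above can be sketched with plain Python: build a new class at runtime that subclasses both a mixin and the native tokenizer class, so the returned object still passes `isinstance` checks against the native class. This is a minimal illustration of the technique, not Tokenicer's actual implementation; `NativeTokenizer`, `TokenicerMixin`, and `load` are hypothetical names.

```python
class NativeTokenizer:
    """Stand-in for a native HF tokenizer class (hypothetical)."""
    def tokenize(self, text):
        return text.split()

class TokenicerMixin:
    """Extra behavior layered on top of the native tokenizer (hypothetical)."""
    def pad_info(self):
        return "pad fixes applied"

def load(native_cls):
    # Create a subclass at runtime that inherits BOTH the mixin and the
    # native tokenizer class, so isinstance(obj, native_cls) stays True
    # and all native methods remain available.
    dynamic_cls = type(
        f"Tokenicer_{native_cls.__name__}",
        (TokenicerMixin, native_cls),
        {},
    )
    return dynamic_cls()

tok = load(NativeTokenizer)
print(isinstance(tok, NativeTokenizer))  # True
print(tok.tokenize("a b"))               # ['a', 'b']
```

Because the wrapper *is* a subclass of the native tokenizer class, downstream code that type-checks or calls native methods keeps working unchanged.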
- fix mpt pad token bug by @CL-ModelCloud in #24
- fix model_config bugs by @CL-ModelCloud in #25
- test code clean up by @CL-ModelCloud in #26
- Inherits PretrainedTokenizer by @Qubitium in #28
- loop & test all models by @CSY-ModelCloud in #30
Full Changelog: v0.0.2...v0.0.4
Toke(n)icer v0.0.3
What's Changed
The tokenicer instance now dynamically inherits the native `tokenizer.__class__` of the tokenizer passed in or loaded via our `Tokenicer.load()` API.
- fix mpt pad token bug by @CL-ModelCloud in #24
- fix model_config bugs by @CL-ModelCloud in #25
- test code clean up by @CL-ModelCloud in #26
- Inherits PretrainedTokenizer by @Qubitium in #28
Full Changelog: v0.0.2...v0.0.3
Toke(n)icer v0.0.2
What's Changed
⚡ Auto-fix models that do not set a padding_token
⚡ Auto-fix models released with the wrong padding_token: many models incorrectly reuse eos_token as pad_token, which leads to subtle, hidden errors in post-training and inference whenever batching is used (which is almost always).
⚡ Compatible with all HF Transformers recognized tokenizers
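The pad-token fix described above can be sketched as follows. This is a simplified illustration of the idea, assuming a mock tokenizer object; the real library operates on HF Transformers tokenizers, and `MockTokenizer`, `fix_pad_token`, and the `<|pad|>` fallback are hypothetical names chosen for this example.

```python
class MockTokenizer:
    """Hypothetical stand-in for an HF tokenizer with token attributes."""
    def __init__(self, eos_token, pad_token=None):
        self.eos_token = eos_token
        self.pad_token = pad_token

def fix_pad_token(tokenizer, fallback="<|pad|>"):
    # If pad_token is missing, or aliases eos_token, assign a distinct
    # token so batch padding is never confused with end-of-sequence.
    if tokenizer.pad_token is None or tokenizer.pad_token == tokenizer.eos_token:
        tokenizer.pad_token = fallback
    return tokenizer

# A model shipped with pad_token == eos_token gets a distinct pad token:
tok = fix_pad_token(MockTokenizer(eos_token="</s>", pad_token="</s>"))
print(tok.pad_token)  # <|pad|>
```

Keeping pad_token distinct from eos_token matters because loss masking and generation stopping both key off eos_token; when padding reuses it, padded positions can silently truncate or corrupt batched sequences.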
- Auto fix pad token by @CL-ModelCloud in #5
- Forward to Tokenizer by @CL-ModelCloud in #6
- read requirements.txt in setup.py by @CSY-ModelCloud in #7
- [CI] add tokenicer forward test by @CL-ModelCloud in #10
- add unit tests by @CSY-ModelCloud in #11
- refactor by @Qubitium in #8
- add deepseek_v3 map by @CL-ModelCloud in #15
New Contributors
- @CSY-ModelCloud made their first contribution in #1
- @Qubitium made their first contribution in #3
- @CL-ModelCloud made their first contribution in #5
Full Changelog: https://github.com/ModelCloud/Tokenicer/commits/v0.0.2