Releases: ModelCloud/Tokenicer

Toke(n)icer v0.0.4

21 Feb 09:36
dd95bdf

What's Changed

⚡ A Tokenicer instance now dynamically inherits the native tokenizer.__class__ of the tokenizer that is passed in or loaded via the Tokenicer.load() API.
⚡ CI now tests tokenizers from 64 models
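
As a rough illustration of what "dynamically inherits the native class" can mean in Python, the sketch below builds a subclass at runtime with type() and re-classes an existing object. It is not taken from the Tokenicer source: FakeTokenizer, WrapperMixin, and wrap() are hypothetical names, and the real library may do this differently.

```python
class FakeTokenizer:
    """Hypothetical stand-in for a native tokenizer class."""

    def __init__(self, vocab):
        self.vocab = vocab

    def encode(self, text):
        # Map each whitespace-separated word to its id, 0 if unknown.
        return [self.vocab.get(word, 0) for word in text.split()]


class WrapperMixin:
    """Hypothetical extra behaviour layered on top of any tokenizer."""

    def token_count(self, text):
        return len(self.encode(text))


def wrap(tokenizer):
    """Give `tokenizer` a runtime-built class that inherits its native class."""
    native_cls = type(tokenizer)
    # Create a new class whose bases are the mixin and the native class.
    wrapped_cls = type(f"Wrapped{native_cls.__name__}", (WrapperMixin, native_cls), {})
    # Re-class the existing instance in place; its attributes are preserved.
    tokenizer.__class__ = wrapped_cls
    return tokenizer


tok = wrap(FakeTokenizer({"hello": 1, "world": 2}))
print(type(tok).__name__)
print(isinstance(tok, FakeTokenizer))
print(tok.token_count("hello world"))
```

Because the wrapped object is still an instance of its native class, isinstance checks and native methods keep working while the extra methods become available.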

Full Changelog: v0.0.2...v0.0.4

Toke(n)icer v0.0.3

21 Feb 07:18
b0b2591

What's Changed

A Tokenicer instance now dynamically inherits the native tokenizer.__class__ of the tokenizer that is passed in or loaded via the Tokenicer.load() API.

Full Changelog: v0.0.2...v0.0.3

Toke(n)icer v0.0.2

10 Feb 13:41
efc81a2

What's Changed

⚡ Auto-fix models that do not set a padding_token
⚡ Auto-fix models released with the wrong padding_token: many models incorrectly use eos_token as pad_token, which leads to subtle, hidden errors in post-training and inference whenever batching is used, which is almost always.
⚡ Compatible with all tokenizers recognized by HF Transformers
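
To see why reusing eos_token as pad_token is risky, consider label masking during training. The sketch below is a simplified illustration, not Tokenicer's actual logic: pick_pad_token(), mask_padding(), and the candidate token list are assumptions made for the example. The -100 value follows the common "ignore index" convention for loss masking.

```python
def mask_padding(ids, pad_id, ignore=-100):
    """Replace every pad id with the ignore index so it is skipped by the loss."""
    return [ignore if i == pad_id else i for i in ids]


def pick_pad_token(vocab, pad_token, eos_token,
                   candidates=("<pad>", "<|pad|>", "[PAD]", "<unk>")):
    """Return a pad token distinct from eos_token, or None if none is found.

    `candidates` is a hypothetical fallback list, not the list Tokenicer uses.
    """
    if pad_token is not None and pad_token != eos_token and pad_token in vocab:
        return pad_token  # already valid, keep it
    for token in candidates:
        if token in vocab and token != eos_token:
            return token
    return None


# Sequence "5 6 <eos>" padded to length 5, with eos id 2.
# pad == eos: the real end-of-sequence label is masked along with the padding.
print(mask_padding([5, 6, 2, 2, 2], pad_id=2))
# Distinct pad id 0: the end-of-sequence label survives.
print(mask_padding([5, 6, 2, 0, 0], pad_id=0))

vocab = {"<pad>": 0, "</s>": 2}
print(pick_pad_token(vocab, pad_token="</s>", eos_token="</s>"))
```

In the first case the model never receives a training signal for the end-of-sequence token, which is one of the subtle batching errors a distinct pad token avoids.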

Full Changelog: https://github.com/ModelCloud/Tokenicer/commits/v0.0.2