From b0b25911de49770bf6321d8b77205db0ed0edfa4 Mon Sep 17 00:00:00 2001
From: Qubitium-ModelCloud
Date: Fri, 21 Feb 2025 15:16:20 +0800
Subject: [PATCH] prepare for 0.0.3 release (#29)

* prepare for 0.0.3 release

* Update README.md
---
 README.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/README.md b/README.md
index 925ecbf..9eac0f2 100644
--- a/README.md
+++ b/README.md
@@ -9,6 +9,8 @@

 ## News
+* 02/21/2025 [0.0.3](https://github.com/ModelCloud/Tokenicer/releases/tag/v0.0.3): The `tokenicer` instance now dynamically inherits the native `tokenizer.__class__` of the tokenizer passed in or loaded via our `tokenicer.load()` API.
+
 * 02/10/2025 [0.0.2](https://github.com/ModelCloud/Tokenicer/releases/tag/v0.0.2): 🤗 Initial release!
 
 ## Features:
@@ -52,10 +54,13 @@ pip install -v .
 # With `Tokenicer.load()`
 from tokenicer import Tokenicer
+
+# Returns a `Tokenicer` instance that also inherits the original `Qwen2TokenizerFast` type.
 tokenizer = Tokenicer.load('Qwen/Qwen2.5-0.5B-Instruct')
 
 # That's it! Toke(n)icer has auto-fixed Qwen2.5-0.5B-Instruct's incorrect `pad_token`.
 # Now this model can be `trained` and `inferenced` correctly with `batch` and `masks`.
+# Use the new tokenizer like any normal HF PretrainedTokenizer(Fast).
 print(f"pad_token: `{tokenizer.pad_token}`")
 ```
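
Not part of the patch above, but a minimal sketch of the 0.0.3 behavior it documents: since `Tokenicer.load()` is described as returning an instance that inherits the wrapped tokenizer's native class (assumed here to be `Qwen2TokenizerFast` for Qwen2.5-0.5B-Instruct), the result should pass a plain `isinstance` check and accept standard Hugging Face tokenizer calls, including batched padding now that `pad_token` is auto-fixed.

```python
from transformers import Qwen2TokenizerFast  # assumed native class for Qwen2.5-0.5B-Instruct
from tokenicer import Tokenicer

tokenizer = Tokenicer.load('Qwen/Qwen2.5-0.5B-Instruct')

# Per the 0.0.3 release note, the returned `Tokenicer` should also report the
# native class of the tokenizer it wraps.
assert isinstance(tokenizer, Qwen2TokenizerFast)

# Standard HF tokenizer usage works unchanged; padding relies on the auto-fixed `pad_token`.
batch = tokenizer(["Hello world", "Tokenicer"], padding=True)
print(batch["input_ids"])
print(batch["attention_mask"])
```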