Anthropic models #27
Comments
Hi @enricoros! Looking at the […] Here is some example code using […]:

const util = require("util");
const { Tokenizer } = require("tokenizers");

// Load the tokenizer definition from a local copy of the JSON file.
const tokenizer = Tokenizer.fromFile("claude-v1-tokenization.json");

// encode/decode take callbacks here, so wrap them in promises.
const encode = util.promisify(tokenizer.encode.bind(tokenizer));
const decode = util.promisify(tokenizer.decode.bind(tokenizer));

async function main() {
  const encoded = await encode("Hello from Anthropic!");
  console.log({ encoded: encoded.getIds() });

  const decoded = await decode(
    encoded.getIds(),
    true // skipSpecialTokens: true
  );
  console.log({ decoded });
}

main();

However, it does seem that the […]
It's cool that, afaict, the core is also in Rust?
@enricoros Some progress (with experimental JSON configs for @dqbd/tiktoken) can be seen here: dqbd/tiktokenizer#5
Demo of the Tiktokenizer playground: https://tiktokenizer-git-custom-bpe-models-dqbd.vercel.app/
Very interesting approach, and I love the playground too. Thanks for the update!
Anthropic has released the models for research, and has opened their code on GitHub:
https://github.com/anthropics/anthropic-sdk-python/blob/main/anthropic/tokenizer.py
In this repo, there's a link to a file:
CLAUDE_TOKENIZER_REMOTE_FILE = "https://public-json-tokenization-0d8763e8-0d7e-441b-a1e2-1c73b8e79dc3.storage.googleapis.com/claude-v1-tokenization.json"
Can this help in extending Tiktoken to support 'claude-v1' models?
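As a first step toward answering that, it may help to look at what the JSON file actually contains. The sketch below is only a rough starting point: it assumes the file follows the Hugging Face `tokenizers` JSON layout (`model.type`, `model.vocab`, `model.merges`, and a top-level `added_tokens` array), which has not been checked against the real file here. The function name `summarizeTokenizerJson`, the sample object, and the `<EOT>` token are hypothetical stand-ins for illustration.

```javascript
// Summarize a parsed tokenizer JSON object.
// ASSUMPTION: Hugging Face `tokenizers` layout (model.type, model.vocab,
// model.merges, top-level added_tokens). Missing fields fall back to empty.
function summarizeTokenizerJson(config) {
  const model = config.model || {};
  const vocab = model.vocab || {};
  const merges = model.merges || [];
  const specialTokens = (config.added_tokens || [])
    .filter((token) => token.special)
    .map((token) => token.content);

  return {
    modelType: model.type || "unknown",
    vocabSize: Object.keys(vocab).length,
    mergeCount: merges.length,
    specialTokens,
  };
}

// Tiny hand-made stand-in, NOT the real Claude tokenizer file.
const sample = {
  model: { type: "BPE", vocab: { a: 0, b: 1, ab: 2 }, merges: ["a b"] },
  added_tokens: [{ id: 3, content: "<EOT>", special: true }],
};

const summary = summarizeTokenizerJson(sample);
console.log(summary);
```

To try it on a downloaded copy, something like `summarizeTokenizerJson(JSON.parse(require("fs").readFileSync("claude-v1-tokenization.json", "utf8")))` should work, provided the layout assumption holds. The vocabulary, merge list, and special tokens are the pieces a BPE port would most likely need to carry over.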