2022-12-19

Is there a JavaScript implementation of the cl100k_base tokenizer?

OpenAI's new embeddings API uses the cl100k_base tokenizer. I'm calling it from the Node.js client, but I see no easy way to slice my strings so they don't exceed OpenAI's limit of 8192 tokens per request.

This would be trivial if I could first encode the string to tokens, slice the token array down to the limit, then decode it back to text before sending it to the API.


