Is there a JavaScript implementation of the cl100k_base tokenizer?

OpenAI's new embeddings API uses the cl100k_base tokenizer. I'm calling it from the Node.js client, but I see no easy way to slice my strings so they don't exceed OpenAI's limit of 8192 tokens.

This would be trivial if I could first encode the string, slice it to the limit, then decode it and send it to the API.
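For illustration, this is the kind of encode → slice → decode workflow I have in mind, sketched with the js-tiktoken package (a community JavaScript port of OpenAI's tiktoken; the package name and API here are my assumption, not something confirmed above):

```js
// Minimal sketch of encode -> slice -> decode, assuming the js-tiktoken
// package (npm: js-tiktoken), a pure-JS port of OpenAI's tiktoken.
import { getEncoding } from "js-tiktoken";

const MAX_TOKENS = 8192; // OpenAI embeddings input limit

// Truncate `text` so it encodes to at most `maxTokens` cl100k_base tokens.
function truncateToTokenLimit(text, maxTokens = MAX_TOKENS) {
  const enc = getEncoding("cl100k_base");
  const tokens = enc.encode(text);
  if (tokens.length <= maxTokens) return text;
  // Keep the first maxTokens tokens, then decode back to a string.
  return enc.decode(tokens.slice(0, maxTokens));
}

const longDocument = "some very long input string...";
const safeInput = truncateToTokenLimit(longDocument);
// safeInput should now fit under the 8192-token limit when sent
// to the embeddings endpoint.
```

Note that slicing at a token boundary can cut a word (or a multi-byte character) mid-way, so trimming a few extra tokens or splitting on whitespace near the cut point may be worthwhile in practice.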


