Categorygithub.com/milansuk/token_go
repositorypackage
0.0.0-20240531140427-09d0b8fc202a
Repository: https://github.com/milansuk/token_go.git
Documentation: pkg.go.dev

# README

Token_go

Simple & fast Encoder/Decoder for tiktoken vocabulary. Implemented from scratch(no regex library). Tokenizer is in vocab.go which has ~120 lines of code.

Performance

p50k_base.tiktoken:

  • Encoder: 4.625M toks/sec, 19.143 MB/sec, 1 thread
  • Decoder: 37.817M toks/sec, 156.516 MB/sec, 1 thread

cl100k_base.tiktoken:

  • Encoded 3.949M toks/sec, 16.748 MB/sec, 1 thread
  • Decoded 35.825M toks/sec, 151.952 MB/sec, 1 thread

Server(p50k_base)

  • 8x clients calls 100K times Encode("Hi there!" + index).
  • 800K total requests in 26.7sec => 30K req/sec.

Examples

Encode/Decode:

vb, err := NewVocab("p50k_base.tiktoken", true)

toks := vb.Encode("Hi there!")
fmt.Println(toks)

str := vb.Decode(toks)
fmt.Println(str)

Client/Server:

go NewServer("8090", true)   //run server in extra thread

client := NewClient("localhost:8090", "p50k_base")

toks, err := client.Encode([]byte("Hi there!"))
fmt.Println(toks)

text, err := client.Decode([]int{17250, 612, 0})
fmt.Println(text)

Build

Written in Go language(https://go.dev/doc/install). No dependencies.

git clone https://github.com/milansuk/token_go
cd token_go
go build
./token_go

Author

Milan Suk

Email: [email protected]

Twitter: https://twitter.com/milansuk/

Sponsor: https://github.com/sponsors/MilanSuk

Feel free to follow or contact me with any idea, question or problem.

Contributing

Your feedback and code are welcome!

For bug report or question, please use GitHub's Issues

SkyAlt is licensed under Apache v2.0 license. This repository includes 100% of the code.