Version: 0.0.0-20240531140427-09d0b8fc202a
Repository: https://github.com/milansuk/token_go.git
# README
Token_go
Simple, fast encoder/decoder for tiktoken vocabularies, implemented from scratch (no regex library). The tokenizer lives in vocab.go, which is about 120 lines of code.
Performance
p50k_base.tiktoken:
- Encoder: 4.625M toks/sec, 19.143 MB/sec, 1 thread
- Decoder: 37.817M toks/sec, 156.516 MB/sec, 1 thread
cl100k_base.tiktoken:
- Encoder: 3.949M toks/sec, 16.748 MB/sec, 1 thread
- Decoder: 35.825M toks/sec, 151.952 MB/sec, 1 thread
Server (p50k_base):
- 8 clients, each calling Encode("Hi there!" + index) 100K times.
- 800K requests in total, completed in 26.7 sec, i.e. ~30K req/sec.
Examples
Encode/Decode:
// inside the token_go package, with "fmt" and "log" imported
vb, err := NewVocab("p50k_base.tiktoken", true)
if err != nil {
	log.Fatal(err)
}
toks := vb.Encode("Hi there!")
fmt.Println(toks)
str := vb.Decode(toks)
fmt.Println(str)
Client/Server:
go NewServer("8090", true) // run the server in a separate goroutine
client := NewClient("localhost:8090", "p50k_base")
toks, err := client.Encode([]byte("Hi there!"))
if err != nil {
	log.Fatal(err)
}
fmt.Println(toks)
text, err := client.Decode([]int{17250, 612, 0})
if err != nil {
	log.Fatal(err)
}
fmt.Println(text)
Build
Written in Go (install instructions: https://go.dev/doc/install). No external dependencies.
git clone https://github.com/milansuk/token_go
cd token_go
go build
./token_go
Author
Milan Suk
Twitter: https://twitter.com/milansuk/
Sponsor: https://github.com/sponsors/MilanSuk
Feel free to follow me or get in touch with any ideas, questions, or problems.
Contributing
Your feedback and code are welcome!
For bug reports or questions, please use GitHub Issues.
Token_go is licensed under the Apache 2.0 license. This repository contains all of the code.