# README
geektoken
A BPE tokenizer for use with OpenAI's models,
ported and referenced from tiktoken and SharpToken.
requirements
Go standard library doesn't support PCRE, so it depends on go-pcre.
It requires libpcre3-dev
or libpcre++-dev
to be installed on the system.
usage
package main
import (
"log"
"github.com/meinside/geektoken"
)
func main() {
//text := "Hellow, world!"
text := "나는 우리나라가 세계에서 가장 아름다운 나라가 되기를 원한다. 가장 부강한 나라가 되기를 원하지 않는다."
tokenizer, _ := geektoken.GetTokenizerWithModel(geektoken.ModelGPT35Turbo)
if encoded, err := tokenizer.Encode(text, nil, nil); err == nil {
log.Printf("encoded token: %+v, token count = %d", encoded, len(encoded))
}
}
known issues / todos
- Some encoded bytes differ from the ones from other BPE libraries
- Add more tests
- Optimize codes
license
MIT
# Functions
GetTokenizerWithEncoding returns a Tokenizer with given encoding name.
No description provided by the author
# Constants
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
# Type aliases
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author