github.com/meinside/geektoken
Category: module, package
Version: 0.0.2
Repository: https://github.com/meinside/geektoken.git
Documentation: pkg.go.dev

# README

geektoken

A BPE tokenizer for use with OpenAI's models, ported and referenced from tiktoken and SharpToken.
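
For readers unfamiliar with byte pair encoding (BPE), the core merge loop can be sketched as follows. This is a toy illustration, not geektoken's actual implementation: the `bpeMerge` helper and the three-entry `ranks` table are hypothetical, and real encodings ship merge tables with many thousands of entries.

```go
package main

import "fmt"

// bpeMerge splits word into single bytes, then repeatedly merges the
// adjacent pair with the lowest rank until no adjacent pair appears in
// the ranks table. ranks is a hypothetical toy merge table in which a
// lower rank means the pair is merged earlier.
func bpeMerge(word string, ranks map[string]int) []string {
	// Start with one part per byte.
	parts := make([]string, 0, len(word))
	for i := 0; i < len(word); i++ {
		parts = append(parts, word[i:i+1])
	}

	for len(parts) > 1 {
		// Find the adjacent pair with the lowest rank.
		best, bestRank := -1, 0
		for i := 0; i+1 < len(parts); i++ {
			if r, ok := ranks[parts[i]+parts[i+1]]; ok && (best == -1 || r < bestRank) {
				best, bestRank = i, r
			}
		}
		if best == -1 {
			break // no mergeable pair left
		}
		// Replace parts[best] and parts[best+1] with their concatenation.
		merged := parts[best] + parts[best+1]
		parts = append(parts[:best+1], parts[best+2:]...)
		parts[best] = merged
	}
	return parts
}

func main() {
	ranks := map[string]int{"lo": 0, "low": 1, "er": 2}
	fmt.Println(bpeMerge("lower", ranks)) // prints [low er]
}
```

In a full tokenizer each resulting part would then be mapped to an integer token ID; that lookup is omitted here.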

requirements

Go's standard regexp package does not support PCRE syntax, so geektoken depends on go-pcre.

It requires libpcre3-dev or libpcre++-dev to be installed on the system.
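
The limitation can be seen with the standard library alone. The sketch below uses a negative lookahead fragment of the kind found in tiktoken-style split patterns; the `compiles` helper is hypothetical and exists only for this illustration. Go's regexp package implements RE2 syntax, which rejects lookaheads at compile time.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's standard regexp package accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// Negative lookahead: whitespace not followed by a non-space character.
	fmt.Println(compiles(`\s+(?!\S)`)) // prints false: RE2 syntax has no lookahead

	// The same pattern without the lookahead compiles.
	fmt.Println(compiles(`\s+`)) // prints true
}
```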

usage

package main

import (
    "log"

    "github.com/meinside/geektoken"
)

func main() {
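    // text := "Hello, world!"
    // Sample input (Korean).
    text := "나는 우리나라가 세계에서 가장 아름다운 나라가 되기를 원한다. 가장 부강한 나라가 되기를 원하지 않는다."

    // Look up the tokenizer for the target model.
    tokenizer, err := geektoken.GetTokenizerWithModel(geektoken.ModelGPT35Turbo)
    if err != nil {
        log.Fatalf("failed to get tokenizer: %v", err)
    }

    // Encode the text and log the tokens and their count.
    encoded, err := tokenizer.Encode(text, nil, nil)
    if err != nil {
        log.Fatalf("failed to encode text: %v", err)
    }
    log.Printf("encoded tokens: %+v, token count = %d", encoded, len(encoded))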
}

known issues / todos

  • Some encoded bytes differ from those produced by other BPE libraries
  • Add more tests
  • Optimize the code
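
When investigating the first item, it can help to locate where two token sequences start to diverge. The helper below is a minimal sketch under stated assumptions: it assumes token sequences are available as `[]int` (the return type of `Encode` is not documented here), `firstDiff` is a hypothetical name, and the token IDs in `main` are made up for illustration.

```go
package main

import "fmt"

// firstDiff returns the first index at which two token sequences differ,
// or -1 if they are identical. A length mismatch counts as a difference
// at the index where the shorter sequence ends.
func firstDiff(got, want []int) int {
	n := len(got)
	if len(want) < n {
		n = len(want)
	}
	for i := 0; i < n; i++ {
		if got[i] != want[i] {
			return i
		}
	}
	if len(got) != len(want) {
		return n
	}
	return -1
}

func main() {
	// Hypothetical token IDs, for illustration only.
	got := []int{101, 202, 303}
	want := []int{101, 202, 404}
	if i := firstDiff(got, want); i >= 0 {
		fmt.Printf("sequences diverge at index %d\n", i)
	}
}
```

A helper like this could sit inside a table-driven test that compares geektoken's output with reference output from another BPE library.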

license

MIT

# Functions

GetTokenizerWithEncoding returns a Tokenizer for the given encoding name.
No description provided by the author

# Constants

16 constants; no descriptions provided by the author.

# Structs

BPECore struct.
No description provided by the author
Tokenizer struct.

# Type aliases

4 type aliases; no descriptions provided by the author.