Version: 0.2.2
Repository: https://github.com/sugarme/tokenizer.git
Documentation: pkg.go.dev

# README

## BPE model

This example demonstrates how to train a tokenizer from scratch using a BPE (byte-pair encoding) model.

It trains an Esperanto tokenizer on the data in the `input` folder and saves the learned vocabulary and merges to the `model` folder.
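To make "vocabulary and merges" concrete, here is a minimal, self-contained sketch of the classic BPE training loop in plain Go. It does not use the sugarme/tokenizer API: `trainBPE`, `pair`, and the toy corpus are hypothetical names and data for illustration only. It works on characters and omits the end-of-word markers, byte-level pre-tokenization, and special tokens that a production trainer would typically add.

```go
package main

import (
	"fmt"
	"strings"
)

// pair is an adjacent pair of symbols inside a word (hypothetical helper type).
type pair struct{ a, b string }

// less orders pairs so ties are broken deterministically,
// independent of Go's random map iteration order.
func less(p, q pair) bool {
	return p.a < q.a || (p.a == q.a && p.b < q.b)
}

// trainBPE learns up to numMerges merge rules from a word-frequency map.
// Each word starts as a sequence of single characters; at every step the
// most frequent adjacent pair is merged into one new symbol.
// It returns the ordered merge list and the final symbol vocabulary.
func trainBPE(wordFreq map[string]int, numMerges int) ([]pair, map[string]bool) {
	words := make(map[string][]string)
	vocab := make(map[string]bool)
	for w := range wordFreq {
		syms := strings.Split(w, "") // one symbol per UTF-8 character
		words[w] = syms
		for _, s := range syms {
			vocab[s] = true
		}
	}

	var merges []pair
	for i := 0; i < numMerges; i++ {
		// Count adjacent pairs, weighted by how often each word occurs.
		counts := make(map[pair]int)
		for w, syms := range words {
			for j := 0; j+1 < len(syms); j++ {
				counts[pair{syms[j], syms[j+1]}] += wordFreq[w]
			}
		}
		// Pick the most frequent pair (smallest pair wins ties).
		best, bestCount := pair{}, 0
		for p, c := range counts {
			if c > bestCount || (c == bestCount && less(p, best)) {
				best, bestCount = p, c
			}
		}
		if bestCount == 0 {
			break // nothing left to merge
		}
		merges = append(merges, best)
		vocab[best.a+best.b] = true

		// Replace every occurrence of the chosen pair with the merged symbol.
		for w, syms := range words {
			var out []string
			for j := 0; j < len(syms); j++ {
				if j+1 < len(syms) && syms[j] == best.a && syms[j+1] == best.b {
					out = append(out, best.a+best.b)
					j++ // skip the second half of the merged pair
				} else {
					out = append(out, syms[j])
				}
			}
			words[w] = out
		}
	}
	return merges, vocab
}

func main() {
	// Toy word frequencies (hypothetical; not the example's input files).
	corpus := map[string]int{"low": 5, "lower": 2, "newest": 6, "widest": 3}
	merges, vocab := trainBPE(corpus, 3)
	for _, m := range merges {
		fmt.Println(m.a, m.b) // prints "e s", then "es t", then "l o"
	}
	fmt.Println(len(vocab)) // prints 13: 10 characters + 3 merged symbols
}
```

Each learned merge adds one symbol to the vocabulary, and the ordered merge list is what lets a trained tokenizer reproduce the same segmentation when it later encodes new text.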

To run:

```bash
# run training
go run . -mode=train

# run test
go run . -mode=test
```
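The `-mode` flag selects between the two steps. The following is a hedged sketch of how such a flag could be parsed with Go's standard `flag` package; `parseMode` and the printed placeholders are hypothetical and are not taken from the example's actual `main.go`.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseMode reads a -mode flag from args (hypothetical helper).
// Only the flag name comes from the README; the accepted values
// and error handling are assumptions for illustration.
func parseMode(args []string) (string, error) {
	fs := flag.NewFlagSet("bpe", flag.ContinueOnError)
	mode := fs.String("mode", "", `either "train" or "test"`)
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	switch *mode {
	case "train", "test":
		return *mode, nil
	default:
		return "", fmt.Errorf(`-mode must be "train" or "test", got %q`, *mode)
	}
}

func main() {
	mode, err := parseMode(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	switch mode {
	case "train":
		fmt.Println("training") // placeholder: train and save vocab/merges here
	case "test":
		fmt.Println("testing") // placeholder: load the model and encode text here
	}
}
```

Using a `FlagSet` with `ContinueOnError` keeps the parsing logic testable, since it returns an error instead of exiting the process.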