github.com/go-nlp/bpe
module, package
Version: 0.0.0-20210201114114-67576d341eb8
Repository: https://github.com/go-nlp/bpe.git
Documentation: pkg.go.dev

# README

bpe_prep

bpe

# Functions

Learn learns an Encoder from the corpus data given as input.
MarkEOW is a modifier to inform the Learn function whether the end of the word should be marked.
P constructs a new Pair.
Pairs returns the Pairs of runes found in a word (as string).
PairsRunes returns the Pairs of runes found in a word (as []rune).
PairsRunesWithReuse is the PairsRunes function, but with a caller-supplied buffer.
PairStats returns the occurrence frequencies of pairs of runes.
PairsWithReuse is the Pairs function, but with a caller-supplied buffer.
PreBPE is a function that provides mapping for runes.
SimpleTokenizer is a simple tokenizer of text.
WithReuse uses the given (usually pre-allocated) buffer of Pairs.
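
The pair-extraction and pair-counting functions listed above can be illustrated with a minimal, self-contained sketch. It does not import this package and makes no claim about its actual signatures: the names `pair`, `pairsWithReuse`, and `pairStats` are hypothetical stand-ins chosen for illustration only.

```go
package main

import "fmt"

// pair is a hypothetical stand-in for a pair of adjacent runes.
// It is NOT the package's Pair type.
type pair struct {
	a, b rune
}

// pairsWithReuse collects every adjacent rune pair in word into buf,
// starting from buf[:0], so one buffer can be reused across many words.
func pairsWithReuse(word string, buf []pair) []pair {
	buf = buf[:0]
	runes := []rune(word) // work on runes, not bytes, so multi-byte characters stay whole
	for i := 0; i+1 < len(runes); i++ {
		buf = append(buf, pair{runes[i], runes[i+1]})
	}
	return buf
}

// pairStats counts how often each adjacent rune pair occurs across words.
func pairStats(words []string) map[pair]int {
	stats := make(map[pair]int)
	var buf []pair // reused for every word to avoid repeated allocation
	for _, w := range words {
		buf = pairsWithReuse(w, buf)
		for _, p := range buf {
			stats[p]++
		}
	}
	return stats
}

func main() {
	stats := pairStats([]string{"low", "lower", "slow"})
	fmt.Println(stats[pair{'l', 'o'}], stats[pair{'o', 'w'}], stats[pair{'w', 'e'}])
}
```

Passing the buffer in lets a caller that processes a large corpus allocate once and reuse the same backing array, which is presumably the motivation for the `WithReuse` variants.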

# Structs

Encoder represents a state that may be used to encode a word.
Pair is a pair of runes - it is an immutable tuple.
Statistics is the statistics of a corpus, used to figure out which pairs to replace.
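
To show how pair statistics can be used to decide which pair to replace, here is a rough sketch of the generic byte-pair-encoding merge step: count adjacent symbol pairs, pick the most frequent one, and merge it everywhere. It is an illustration under stated assumptions, not this package's implementation. The names `symPair`, `countPairs`, `mostFrequent`, and `merge` are hypothetical, and the lexicographic tie-break is an assumption made only to keep the result deterministic.

```go
package main

import "fmt"

// symPair is a hypothetical pair of adjacent symbols. Strings are used
// instead of runes so that merged symbols can be longer than one character.
type symPair struct {
	left, right string
}

// countPairs tallies adjacent symbol pairs across all words.
func countPairs(words [][]string) map[symPair]int {
	counts := make(map[symPair]int)
	for _, w := range words {
		for i := 0; i+1 < len(w); i++ {
			counts[symPair{w[i], w[i+1]}]++
		}
	}
	return counts
}

// mostFrequent returns the pair with the highest count. Ties are broken
// lexicographically (an assumption) because map iteration order is random.
func mostFrequent(counts map[symPair]int) (symPair, bool) {
	var best symPair
	bestN := 0
	for p, n := range counts {
		smaller := p.left < best.left || (p.left == best.left && p.right < best.right)
		if n > bestN || (n == bestN && smaller) {
			best, bestN = p, n
		}
	}
	return best, bestN > 0
}

// merge replaces each occurrence of p in word with one concatenated symbol.
func merge(word []string, p symPair) []string {
	out := make([]string, 0, len(word))
	for i := 0; i < len(word); i++ {
		if i+1 < len(word) && word[i] == p.left && word[i+1] == p.right {
			out = append(out, p.left+p.right)
			i++ // skip the right half of the merged pair
			continue
		}
		out = append(out, word[i])
	}
	return out
}

func main() {
	words := [][]string{
		{"l", "o", "w"},
		{"l", "o", "w", "e", "r"},
		{"s", "l", "o", "w"},
	}
	// Run two merge steps; a real learner would repeat until a target
	// vocabulary size is reached.
	for step := 0; step < 2; step++ {
		p, ok := mostFrequent(countPairs(words))
		if !ok {
			break
		}
		for i, w := range words {
			words[i] = merge(w, p)
		}
	}
	fmt.Println(words)
}
```

Recounting all pairs after every merge is the simplest approach. Keeping the counts in a dedicated statistics structure and updating them incrementally is a common optimization.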

# Type aliases

FuncOpt is an option to modify the behaviours of a function.
Tokenizer is a function that tokenizes a string.
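
The two types above follow familiar Go idioms: a function-typed tokenizer and functional options that modify how a function behaves. The sketch below shows the general pattern only. Every name in it (`tokenizer`, `config`, `funcOpt`, `withMarkEOW`, `withTokenizer`, `prepare`) is hypothetical, and the `</w>` end-of-word marker is an assumption borrowed from common BPE write-ups, not taken from this package.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenizer is a hypothetical function type that splits text into words.
type tokenizer func(string) []string

// config holds the settings that options are allowed to change.
type config struct {
	markEOW bool
	tok     tokenizer
}

// funcOpt is a hypothetical option type: a function that mutates config.
type funcOpt func(*config)

// withMarkEOW asks prepare to append an end-of-word marker to each word.
func withMarkEOW() funcOpt {
	return func(c *config) { c.markEOW = true }
}

// withTokenizer swaps in a caller-provided tokenizer.
func withTokenizer(t tokenizer) funcOpt {
	return func(c *config) { c.tok = t }
}

// prepare tokenizes text and applies whatever the options request.
func prepare(text string, opts ...funcOpt) []string {
	c := config{tok: strings.Fields} // default: split on whitespace
	for _, opt := range opts {
		opt(&c)
	}
	words := c.tok(text)
	if c.markEOW {
		for i := range words {
			words[i] += "</w>" // assumed marker, for illustration only
		}
	}
	return words
}

func main() {
	fmt.Println(prepare("the lower slope"))
	fmt.Println(prepare("the lower slope", withMarkEOW()))
}
```

Options of this kind keep the main function's signature stable while letting callers opt in to extra behaviour, such as end-of-word marking or a custom tokenizer.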