Version: v0.0.0-20210201114114-67576d341eb8
Repository: https://github.com/go-nlp/bpe.git
Documentation: pkg.go.dev
# README
bpe_prep
bpe
# Functions
Learn learns an Encoder from the corpus data given as input.
MarkEOW is a modifier that tells Learn whether the end of each word should be marked.
P constructs a new Pair.
Pairs returns the Pairs of runes found in a word given as a string.
PairsRunes returns the Pairs of runes found in a word given as a []rune.
PairsRunesWithReuse is like PairsRunes, but takes an explicitly passed-in buffer.
PairStats returns the occurrence frequencies of pairs of runes.
PairsWithReuse is like Pairs, but takes an explicitly passed-in buffer.
PreBPE is a function that provides a mapping for runes.
SimpleTokenizer is a simple tokenizer of text.
WithReuse uses the given (usually pre-allocated) buffer of Pairs.
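The pair-extraction and frequency functions listed above revolve around one idea: split a word into adjacent rune pairs, optionally reusing a buffer, and count how often each pair occurs. The following is a minimal standalone sketch of that idea. It does not call the bpe package; the `pair` type and the `pairsWithReuse` and `pairStats` helpers are hypothetical stand-ins, and splitting the corpus on whitespace is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// pair is a hypothetical adjacent-rune tuple; it is NOT the bpe package's Pair.
type pair struct {
	left, right rune
}

// pairsWithReuse writes the adjacent rune pairs of word into buf[:0],
// so one allocation can be reused across many words (hypothetical helper).
func pairsWithReuse(word string, buf []pair) []pair {
	buf = buf[:0]
	rs := []rune(word)
	for i := 0; i+1 < len(rs); i++ {
		buf = append(buf, pair{rs[i], rs[i+1]})
	}
	return buf
}

// pairStats counts how often each adjacent rune pair occurs across the
// whitespace-separated words of corpus (hypothetical helper).
func pairStats(corpus string) map[pair]int {
	counts := make(map[pair]int)
	var buf []pair // reused for every word
	for _, w := range strings.Fields(corpus) {
		buf = pairsWithReuse(w, buf)
		for _, p := range buf {
			counts[p]++
		}
	}
	return counts
}

func main() {
	stats := pairStats("low lower lowest")
	fmt.Println(stats[pair{'l', 'o'}]) // 3
	fmt.Println(stats[pair{'w', 'e'}]) // 2
}
```

Reusing the buffer avoids allocating a fresh slice per word, which is the apparent motivation for the `...WithReuse` variants.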
# Structs
Encoder represents a state that may be used to encode a word.
Pair is a pair of runes; it is an immutable tuple.
Statistics holds the statistics of a corpus, used to decide which pairs to replace.