package
0.2.2
Repository: https://github.com/sugarme/tokenizer.git
Documentation: pkg.go.dev

# Functions

BytesToChar converts a given range from bytes to `char`.
CharToBytes converts a given range from `char` to bytes.
IsBertPunctuation checks whether an input rune is BERT punctuation.
IsBertWhitespace checks whether an input rune is BERT whitespace.
isChinese reports whether rune c is in the CJK range according to the BERT spec.
IsPunctuation reports whether the input rune is punctuation.
IsWhitespace checks whether an input rune is whitespace.
Lowercase creates a lowercase normalizer.
NewNormalizedFrom creates a Normalized instance from string input.
RangeOf returns a range of the normalized string. It returns an empty string if the input range is out of bounds.
WithBertNormalizer creates a normalizer with BERT normalization features.
WithUnicodeNormalizer creates a normalizer that applies one of the Unicode normalization forms NFD, NFC, NFKD, or NFKC.
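
The byte/`char` range conversions described by BytesToChar and CharToBytes can be illustrated with the standard library alone. The sketch below is not the package's implementation: `bytesToRunes` and `runesToBytes` are hypothetical helpers, and it assumes that a `char` corresponds to a Go rune and that ranges are half-open (inclusive start, exclusive end).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// bytesToRunes converts the half-open byte range [start, end) of s into the
// matching half-open rune range. Hypothetical helper, not the package API.
// ok is false if the range is out of bounds or cuts through a rune.
func bytesToRunes(s string, start, end int) (rStart, rEnd int, ok bool) {
	if start < 0 || end > len(s) || start > end {
		return 0, 0, false
	}
	if !onBoundary(s, start) || !onBoundary(s, end) {
		return 0, 0, false
	}
	rStart = utf8.RuneCountInString(s[:start])
	rEnd = rStart + utf8.RuneCountInString(s[start:end])
	return rStart, rEnd, true
}

// onBoundary reports whether byte offset i is the start of a rune or the end of s.
func onBoundary(s string, i int) bool {
	return i == len(s) || utf8.RuneStart(s[i])
}

// runesToBytes converts the half-open rune range [start, end) of s into the
// matching half-open byte range. Hypothetical helper, not the package API.
func runesToBytes(s string, start, end int) (bStart, bEnd int, ok bool) {
	if start < 0 || start > end {
		return 0, 0, false
	}
	bStart, bEnd = -1, -1
	n := 0 // number of runes seen so far
	for i := range s { // i is the byte offset of each rune
		if n == start {
			bStart = i
		}
		if n == end {
			bEnd = i
		}
		n++
	}
	// An index equal to the total rune count maps to the end of the string.
	if start == n {
		bStart = len(s)
	}
	if end == n {
		bEnd = len(s)
	}
	if bStart < 0 || bEnd < 0 {
		return 0, 0, false
	}
	return bStart, bEnd, true
}

func main() {
	s := "héllo" // 'é' occupies two bytes in UTF-8
	rs, re, ok := bytesToRunes(s, 1, 3)
	fmt.Println(rs, re, ok)
	bs, be, ok := runesToBytes(s, 1, 2)
	fmt.Println(bs, be, ok)
}
```

Both helpers signal an invalid range through `ok` rather than panicking, which loosely mirrors the "out of bounds" behaviour that RangeOf describes; whether the real functions behave this way is an assumption.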

# Constants

No description provided by the author

# Structs

Invert the `is_match` flags for the wrapped Pattern.
A `NormalizedString` takes care of processing an "original" string to modify it and obtain a "normalized" string.
OfsetsMatch combines an Offsets position with a boolean indicating whether that position is a match.
Prepend creates a normalizer that strips the normalized string in place.
Range is a slice of indexes on either the normalized string or the original string. The start is INCLUSIVE and the end is EXCLUSIVE.
RunePattern wraps the primitive rune type so that it can implement the `Pattern` interface.
Sequence wraps a slice of normalizers to normalize a string in sequence.
String wraps the primitive string type so that it can implement the `Pattern` interface.

# Interfaces

Pattern is used to split a NormalizedString.
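
To make the relationship between a pattern, its rune wrapper, and the inverting wrapper concrete, here is a minimal sketch. It is not the package's API: the `pattern` interface, its `findMatches` method, and the `offsetsMatch`, `runePattern`, and `invert` types are hypothetical stand-ins, and the sketch assumes byte offsets with an inclusive start and an exclusive end.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// offsetsMatch pairs a half-open byte span with a match flag.
// Hypothetical stand-in for the listed offsets/match struct.
type offsetsMatch struct {
	Start, End int
	Match      bool
}

// pattern is a hypothetical interface: it cuts s into consecutive spans,
// each flagged as matching or not.
type pattern interface {
	findMatches(s string) []offsetsMatch
}

// runePattern lets a plain rune act as a pattern.
type runePattern rune

func (p runePattern) findMatches(s string) []offsetsMatch {
	var out []offsetsMatch
	last := 0
	for i, r := range s {
		if r != rune(p) {
			continue
		}
		_, size := utf8.DecodeRuneInString(s[i:])
		if i > last {
			out = append(out, offsetsMatch{last, i, false}) // text before the match
		}
		out = append(out, offsetsMatch{i, i + size, true}) // the matching rune
		last = i + size
	}
	if last < len(s) {
		out = append(out, offsetsMatch{last, len(s), false}) // trailing text
	}
	return out
}

// invert flips the match flag of every span produced by the wrapped pattern.
type invert struct{ inner pattern }

func (p invert) findMatches(s string) []offsetsMatch {
	ms := p.inner.findMatches(s)
	for i := range ms {
		ms[i].Match = !ms[i].Match
	}
	return ms
}

func main() {
	dash := runePattern('-')
	fmt.Println(dash.findMatches("a-b"))
	fmt.Println(invert{dash}.findMatches("a-b"))
}
```

Keeping the non-matching spans in the result means the spans always cover the whole input, so a splitter can decide afterwards what to do with the delimiters.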

# Type aliases

RangeType is an enum-like type representing which string (original or normalized) the range indexes.
NormFn is a convenience function type that is applied to each `char` of the normalized string.
PatternFn is a func type to apply pattern.
Enum of different patterns that Replace can use.
SplitDelimiterBehavior is an enum-like type.