0.0.0-20210119014218-4c7443a428a0
Repository: https://github.com/go-nlp/corpus.git
Documentation: pkg.go.dev
# README
corpus
Corpus provides vocabulary management data structures and utilities.
# Functions
Construct creates a Corpus given the construction options.
FromDict is a construction option that takes a map[string]int in which each int is the word's ID.
FromDictWithFreq is like FromDict, but each entry also carries the word's frequency.
FromTextCorpus is a utility function that reads a text file and returns a Corpus.
New creates a new *Corpus.
ToDict returns a marshalable dict.
ToDictWithFreq returns a simple marshalable type.
ViterbiSplit uses the Viterbi algorithm to split words given a corpus.
WithOrderedWords creates a Corpus with the given word order.
WithSize preallocates the internal storage of the Corpus.
WithWords creates a corpus from a word list.
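
To make the idea behind ViterbiSplit concrete, here is a minimal, self-contained sketch of Viterbi-style word segmentation. It is not this package's implementation: the function `viterbiSplit`, the plain `map[string]int` frequency table standing in for a corpus, and the fixed penalty for unknown runes are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// viterbiSplit is a hypothetical helper (not the package's ViterbiSplit).
// It segments input into the word sequence with the highest total
// log-probability, where freq maps each known word to its count.
func viterbiSplit(input string, freq map[string]int) []string {
	total, maxLen := 0, 0
	for w, f := range freq {
		total += f
		if l := len([]rune(w)); l > maxLen {
			maxLen = l
		}
	}
	// Assumed penalty: an unknown single rune scores below any known word.
	unknown := math.Log(1/float64(total+1)) - 1

	r := []rune(input)
	n := len(r)
	best := make([]float64, n+1) // best[i]: best score for r[:i]
	prev := make([]int, n+1)     // prev[i]: start of the last word in r[:i]
	for i := 1; i <= n; i++ {
		// Fallback: treat the rune just before i as an unknown word.
		best[i], prev[i] = best[i-1]+unknown, i-1
		for j := i - 1; j >= 0 && i-j <= maxLen; j-- {
			f, ok := freq[string(r[j:i])]
			if !ok {
				continue
			}
			score := best[j] + math.Log(float64(f)/float64(total))
			if score > best[i] {
				best[i], prev[i] = score, j
			}
		}
	}

	// Walk the back-pointers from the end, then reverse into reading order.
	var out []string
	for i := n; i > 0; i = prev[i] {
		out = append(out, string(r[prev[i]:i]))
	}
	for l, h := 0, len(out)-1; l < h; l, h = l+1, h-1 {
		out[l], out[h] = out[h], out[l]
	}
	return out
}

func main() {
	freq := map[string]int{"the": 50, "cat": 10, "sat": 8, "a": 30, "at": 12, "he": 5}
	fmt.Println(viterbiSplit("thecatsat", freq)) // prints [the cat sat]
}
```

The dynamic programme considers only candidate words no longer than the longest known word, so the running time is proportional to the input length times that maximum word length.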
# Variables
NumberWords was generated with this Python code:

    numberWords = {}
    simple = '''zero one two three four five six seven eight nine ten eleven twelve
    thirteen fourteen fifteen sixteen seventeen eighteen nineteen
    twenty'''.split()
    for i, word in zip(xrange(0, 20+1), simple):
        numberWords[word] = i
    tense = '''thirty forty fifty sixty seventy eighty ninety hundred'''.split()
    for i, word in zip(xrange(30, 100+1, 10), tense):
        numberWords[word] = i
    larges = '''thousand million billion trillion quadrillion quintillion sextillion septillion'''.split()
    for i, word in zip(xrange(3, 24+1, 3), larges):
        numberWords[word] = 10**i
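
For readers who would rather regenerate the table in Go, the following sketch mirrors the Python generator above. The function name `buildNumberWords` and the choice of `*big.Int` values are assumptions, not part of this package; `math/big` is used here only because 10^21 and 10^24 do not fit in a 64-bit integer.

```go
package main

import (
	"fmt"
	"math/big"
	"strings"
)

// buildNumberWords is a hypothetical helper that rebuilds the word -> value
// table produced by the Python generator.
func buildNumberWords() map[string]*big.Int {
	m := make(map[string]*big.Int)

	// zero .. twenty map to 0 .. 20.
	simple := strings.Fields(`zero one two three four five six seven eight nine ten eleven twelve
		thirteen fourteen fifteen sixteen seventeen eighteen nineteen twenty`)
	for i, w := range simple {
		m[w] = big.NewInt(int64(i))
	}

	// thirty .. hundred map to 30, 40, .., 100.
	tens := strings.Fields("thirty forty fifty sixty seventy eighty ninety hundred")
	for i, w := range tens {
		m[w] = big.NewInt(int64(30 + 10*i))
	}

	// thousand .. septillion map to 10^3, 10^6, .., 10^24.
	larges := strings.Fields("thousand million billion trillion quadrillion quintillion sextillion septillion")
	ten := big.NewInt(10)
	for i, w := range larges {
		m[w] = new(big.Int).Exp(ten, big.NewInt(int64(3*(i+1))), nil)
	}
	return m
}

func main() {
	m := buildNumberWords()
	fmt.Println(len(m), m["twenty"], m["hundred"], m["million"])
}
```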
# Structs
Corpus is a data structure holding the relevant metadata and information for a corpus of text.
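
As an illustration of what vocabulary management typically involves, the sketch below keeps a two-way mapping between words and integer IDs together with per-word counts. The type `vocab` and its methods `add`, `lookup`, and `word` are hypothetical names chosen for this example; they are not the fields or methods of Corpus.

```go
package main

import "fmt"

// vocab is a hypothetical, minimal vocabulary: it is not the Corpus type.
type vocab struct {
	words []string       // ID -> word
	freqs []int          // ID -> number of times the word was added
	ids   map[string]int // word -> ID
}

func newVocab() *vocab {
	return &vocab{ids: make(map[string]int)}
}

// add records one occurrence of word and returns its ID,
// assigning the next free ID if the word is new.
func (v *vocab) add(word string) int {
	if id, ok := v.ids[word]; ok {
		v.freqs[id]++
		return id
	}
	id := len(v.words)
	v.words = append(v.words, word)
	v.freqs = append(v.freqs, 1)
	v.ids[word] = id
	return id
}

// lookup returns the ID of word and whether the word is known.
func (v *vocab) lookup(word string) (int, bool) {
	id, ok := v.ids[word]
	return id, ok
}

// word returns the word stored under id and whether id is valid.
func (v *vocab) word(id int) (string, bool) {
	if id < 0 || id >= len(v.words) {
		return "", false
	}
	return v.words[id], true
}

func main() {
	v := newVocab()
	for _, w := range []string{"the", "cat", "the"} {
		v.add(w)
	}
	id, ok := v.lookup("the")
	fmt.Println(id, ok, v.freqs[id])
}
```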
# Type aliases
ConsOpt is a construction option for manual creation of a Corpus.
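
Construction options of this kind usually follow Go's functional-options pattern: each option is a function that mutates the value under construction, and the constructor applies them in order. The sketch below shows the pattern on a toy type. The names `vocab`, `option`, `construct`, `withWords`, and `fromDict` are hypothetical, and the rule that dictionary IDs must be dense and unique is an assumption of this example rather than documented behaviour of the package.

```go
package main

import "fmt"

// vocab is a hypothetical stand-in for the structure being constructed.
type vocab struct {
	words []string       // ID -> word
	ids   map[string]int // word -> ID
}

// option is a hypothetical construction option: it configures a vocab
// and may report an error.
type option func(*vocab) error

// withWords adds each new word in ws, assigning IDs in order of first appearance.
func withWords(ws []string) option {
	return func(v *vocab) error {
		for _, w := range ws {
			if _, ok := v.ids[w]; ok {
				continue
			}
			v.ids[w] = len(v.words)
			v.words = append(v.words, w)
		}
		return nil
	}
}

// fromDict replaces the contents of the vocab with d, where each value is a
// word ID. This sketch requires the IDs to be exactly 0..len(d)-1.
func fromDict(d map[string]int) option {
	return func(v *vocab) error {
		words := make([]string, len(d))
		used := make([]bool, len(d))
		for w, id := range d {
			if id < 0 || id >= len(d) || used[id] {
				return fmt.Errorf("fromDict: ID %d for %q is out of range or duplicated", id, w)
			}
			words[id], used[id] = w, true
		}
		v.words = words
		v.ids = make(map[string]int, len(d))
		for w, id := range d {
			v.ids[w] = id
		}
		return nil
	}
}

// construct applies the options in order and stops at the first error.
func construct(opts ...option) (*vocab, error) {
	v := &vocab{ids: make(map[string]int)}
	for _, opt := range opts {
		if err := opt(v); err != nil {
			return nil, err
		}
	}
	return v, nil
}

func main() {
	v, err := construct(withWords([]string{"the", "cat", "the"}))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(v.words)
}
```

Returning an error from each option lets the constructor reject inconsistent input, such as a dictionary with duplicate IDs, without panicking.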