github.com/go-nlp/corpus
Version: 0.0.0-20210119014218-4c7443a428a0
Repository: https://github.com/go-nlp/corpus.git
Documentation: pkg.go.dev

# README

corpus

Corpus provides data structures and utilities for vocabulary management.

# Functions

Construct creates a Corpus given the construction options.
FromDict is a construction option to take a map[string]int where the int represents the word ID.
FromDictWithFreq is like FromDict, but also has a frequency.
FromTextCorpus is a utility function to take in a text file, and return a Corpus.
New creates a new *Corpus.
ToDict returns a marshalable dict.
ToDictWithFreq returns a simple marshalable type.
ViterbiSplit uses the Viterbi algorithm to split words, given a corpus.
WithOrderedWords creates a Corpus with the given word order.
WithSize preallocates the internal storage of a Corpus.
WithWords creates a corpus from a word list.
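
The idea behind Viterbi-style word splitting can be sketched with the standard library alone. The following is a minimal illustration, not the package's ViterbiSplit: the helper name `viterbiSplit`, the frequency map, and the cost function (negative log of relative frequency) are all assumptions made for the example. It also indexes the input by byte, so it is only suitable for ASCII text as written.

```go
package main

import (
	"fmt"
	"math"
)

// viterbiSplit is a hypothetical helper, NOT the package's ViterbiSplit.
// It segments input into the sequence of known words with the lowest total
// cost, where a word's cost is -log(freq/total). Indexing is by byte, so the
// sketch assumes ASCII input.
func viterbiSplit(input string, freq map[string]int) []string {
	total, maxLen := 0, 0
	for w, f := range freq {
		total += f
		if len(w) > maxLen {
			maxLen = len(w)
		}
	}

	n := len(input)
	best := make([]float64, n+1) // best[i]: lowest cost of segmenting input[:i]
	prev := make([]int, n+1)     // prev[i]: start index of the last word in that segmentation
	for i := 1; i <= n; i++ {
		best[i] = math.Inf(1)
		prev[i] = -1
		for j := i - 1; j >= 0 && i-j <= maxLen; j-- {
			f, ok := freq[input[j:i]]
			if !ok || math.IsInf(best[j], 1) {
				continue // unknown word, or no valid segmentation ends at j
			}
			cost := best[j] - math.Log(float64(f)/float64(total))
			if cost < best[i] {
				best[i], prev[i] = cost, j
			}
		}
	}

	// No segmentation covers the whole input: return it unsplit.
	if n > 0 && prev[n] < 0 {
		return []string{input}
	}

	// Walk the back-pointers from the end to recover the words in order.
	var words []string
	for i := n; i > 0; i = prev[i] {
		words = append([]string{input[prev[i]:i]}, words...)
	}
	return words
}

func main() {
	freq := map[string]int{"the": 50, "cat": 10, "sat": 10, "cats": 2, "at": 20}
	fmt.Println(viterbiSplit("thecatsat", freq)) // prints [the cat sat]
}
```

Both "the cat sat" and "the cats at" are valid segmentations here; the first wins because its words are jointly more frequent under the assumed cost.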

# Variables

NumberWords was generated with this Python code:

    numberWords = {}

    simple = '''zero one two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen twenty'''.split()
    for i, word in zip(xrange(0, 20+1), simple):
        numberWords[word] = i

    tense = '''thirty forty fifty sixty seventy eighty ninety hundred'''.split()
    for i, word in zip(xrange(30, 100+1, 10), tense):
        numberWords[word] = i

    larges = '''thousand million billion trillion quadrillion quintillion sextillion septillion'''.split()
    for i, word in zip(xrange(3, 24+1, 3), larges):
        numberWords[word] = 10**i

# Structs

Corpus is a data structure holding the relevant metadata and information for a corpus of text.
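
To make the notion of vocabulary metadata concrete, here is a minimal sketch of a word-to-ID table with per-word frequencies. It is an illustration only: the `vocab` type and its fields and methods are hypothetical and do not describe how Corpus is actually laid out.

```go
package main

import (
	"fmt"
	"strings"
)

// vocab is a hypothetical, minimal vocabulary; it is NOT the package's Corpus.
type vocab struct {
	words []string       // ID -> word
	ids   map[string]int // word -> ID
	freqs []int          // ID -> number of times the word was added
}

func newVocab() *vocab {
	return &vocab{ids: make(map[string]int)}
}

// add records one occurrence of word and returns its ID.
// Unseen words receive the next free ID.
func (v *vocab) add(word string) int {
	if id, ok := v.ids[word]; ok {
		v.freqs[id]++
		return id
	}
	id := len(v.words)
	v.words = append(v.words, word)
	v.freqs = append(v.freqs, 1)
	v.ids[word] = id
	return id
}

// lookup returns the ID of word and whether the word is known.
func (v *vocab) lookup(word string) (int, bool) {
	id, ok := v.ids[word]
	return id, ok
}

func main() {
	v := newVocab()
	for _, w := range strings.Fields("the cat saw the dog") {
		v.add(w)
	}
	fmt.Println(v.words, v.freqs) // prints [the cat saw dog] [2 1 1 1]
}
```

Keeping a slice for ID-to-word and a map for word-to-ID makes lookups cheap in both directions, at the cost of storing each word twice.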

# Type aliases

ConsOpt is a construction option for manual creation of a Corpus.
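
Construction options of this kind are commonly written with Go's functional options pattern. The sketch below shows that general pattern under stated assumptions: the names `option`, `withCapacity`, `withWords`, `construct`, and `wordList` are invented for the example, and the actual definition of ConsOpt may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// wordList is a hypothetical stand-in for the type being constructed.
type wordList struct {
	words []string
	ids   map[string]int
}

// option is a hypothetical construction option: a function that mutates the
// value under construction and may report an error.
type option func(*wordList) error

// withCapacity makes room for at least n words without discarding existing ones.
func withCapacity(n int) option {
	return func(w *wordList) error {
		if n < 0 {
			return errors.New("capacity must be non-negative")
		}
		if cap(w.words) < n {
			grown := make([]string, len(w.words), n)
			copy(grown, w.words)
			w.words = grown
		}
		return nil
	}
}

// withWords adds each distinct word, assigning IDs in order of first appearance.
func withWords(words []string) option {
	return func(w *wordList) error {
		for _, s := range words {
			if _, seen := w.ids[s]; seen {
				continue
			}
			w.ids[s] = len(w.words)
			w.words = append(w.words, s)
		}
		return nil
	}
}

// construct applies the options in order and stops at the first error.
func construct(opts ...option) (*wordList, error) {
	w := &wordList{ids: make(map[string]int)}
	for _, opt := range opts {
		if err := opt(w); err != nil {
			return nil, err
		}
	}
	return w, nil
}

func main() {
	w, err := construct(withCapacity(8), withWords([]string{"hello", "world", "hello"}))
	if err != nil {
		fmt.Println("construct failed:", err)
		return
	}
	fmt.Println(w.words) // prints [hello world]
}
```

The pattern keeps the constructor signature stable as new options are added, since each option is just another function value passed to the variadic parameter.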