package
0.0.0-20220910131548-a5a31e8a8009
Repository: https://github.com/donomii/racketprogs.git
Documentation: pkg.go.dev

# Packages

No description provided by the author

# README

trigrammr

Explore trigram databases

trigrammr is a highly experimental tool for exploring data. Using trigrammr, you can navigate your data, explore links between data and reveal hidden connections.

Get it

go get -u github.com/donomii/trigrammr

Build it

go build cmd/trigrammr-import-csv/trigrammr-import-csv.go
go build cmd/trigrammr-client/trigrammr-client.go
go build cmd/trigrammr-import-book/trigrammr-import-book.go 

Install it

go install github.com/donomii/trigrammr/...

Import data

cat data.csv | trigrammr-import-csv mydb.sqlite

or

cat example.txt | ./trigrammr-import-book mydb.sqlite

trigrammr-import-csv reads the data from STDIN and saves it in mydb.sqlite.

Large example database

You can download a large example database by typing

bash download_wikipedia.bash

You will need to install BASH, PERL and CURL for this to work. They come pre-installed on Linux and MacOSX.

Explore your database

Load a database

Before you can search for anything, you must load the data from a database.

> ./trigrammr-client mydb.sqlite

or

> ./trigrammr-client
» db mydb
Opening: mydb.sqlite

Get a summary of your database

» summary
map[the:1 cat:1 sat:1 on:1]

Print a summary of the top ten words in the database.

Explore links

Note that all words are imported as lower-case.

Search for the word "the"

» the

[[cat]]

Here, the [[cat]] is the only known word that follows [[the]]. Trigrammr displays every bigram (2-gram) that starts with [[the]].

Now typing "cat" displays every trigram (3-gram) that matches [[the]] [[cat]].

» cat

Searching for  [the cat]

[[sat]]

Scoring

Trigrammr can also assist with text analysis. The "score" command will take a sentence and print out the number of trigrams that match each word.

» score The cat sat on the mat
Score:
the(1) cat(2) sat(3) on(3) the(2) mat(1)

Scoring prints out the number of matches for a sentence. The sentence is broken up into trigrams, and each trigram is looked up in the database. If the trigram exists, we increase the score of each word from the trigram.

Note that the end words can only score 1, while "sat" scores 3, because it is part of three trigrams.

» score The cat stood on the mat

Score:
the(0) cat(0) stood(0) on(1) the(1) mat(1)

This scores much worse, because the word "stood" is not in the database, and any trigram that contains it will fail.

Commands

Trigrammr has several commands to help you navigate the database.

db NAME

Loads database NAME.sqlite from disk

summary

Prints out the top ten words in the database, to help you get started on a search

short

Trims the output if it gets too long

long

Prints the entire search output, no matter how many pages it might take up

dump

Trigrammer stores the original record when it is imported. "dump" will print out all records that match the current trigram.

reset

Resets the search (to nothing)

..

(Two dots) moves back to the previous search

.

(One dot) drops the first word in the current search trigram.

TODO

  • Fix autocomplete (add builtin commands, finish on dump etc)
  • Count trigrams for markov models
  • Convert download script to golang