github.com/algao1/basically
Version: 0.3.0
Repository: https://github.com/algao1/basically.git
Documentation: pkg.go.dev

# Packages

No description provided by the author

# README

basically

basically is a Go implementation of the TextRank and Biased TextRank algorithms, built upon prose. It provides fully unsupervised methods for keyword extraction and focused text summarization, along with additional quality-of-life features over the original implementations.

Methods

First, the document is parsed into its constituent sentences and words using a sentence segmenter and tokenizer. Sentiment values are assigned to individual sentences, and tokens are annotated with part-of-speech tags.

For keyword extraction, all words that pass the syntactic filter are added to an undirected, weighted graph, and an edge is added between words that co-occur within a window of $N$ words. The edge weight is set to be inversely proportional to the distance between the words. Each vertex is assigned an initial score of 1, and the weighted TextRank ranking algorithm is then run on the graph.
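As a rough sketch of this step, the weighted TextRank update from the original paper is $WS(V_i) = (1 - d) + d \sum_{V_j \in \mathrm{In}(V_i)} \frac{w_{ji}}{\sum_{V_k \in \mathrm{Out}(V_j)} w_{jk}} WS(V_j)$, where $d$ is a damping factor (commonly 0.85). The Go code below builds the co-occurrence graph and iterates that update. It is illustrative only and is not taken from this library: `buildGraph`, `rank`, the convergence tolerance, and the reading of "a window of $N$ words" as "fewer than $N$ positions apart" are all assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// graph is an undirected, weighted graph keyed by word (hypothetical type).
type graph map[string]map[string]float64

// addEdge accumulates weight w on the undirected edge a-b.
func (g graph) addEdge(a, b string, w float64) {
	if g[a] == nil {
		g[a] = map[string]float64{}
	}
	if g[b] == nil {
		g[b] = map[string]float64{}
	}
	g[a][b] += w
	g[b][a] += w
}

// buildGraph links words that are fewer than `window` positions apart
// (an assumed reading of the window), weighting each edge by 1/distance.
func buildGraph(words []string, window int) graph {
	g := graph{}
	for i := range words {
		for j := i + 1; j < len(words) && j-i < window; j++ {
			if words[i] != words[j] {
				g.addEdge(words[i], words[j], 1/float64(j-i))
			}
		}
	}
	return g
}

// rank starts every vertex at 1 and applies the weighted TextRank update
// until no score changes by more than tol, or maxIter is reached.
func rank(g graph, d, tol float64, maxIter int) map[string]float64 {
	scores := make(map[string]float64, len(g))
	out := make(map[string]float64, len(g)) // total edge weight per vertex
	for v, nbrs := range g {
		scores[v] = 1
		for _, w := range nbrs {
			out[v] += w
		}
	}
	for it := 0; it < maxIter; it++ {
		next := make(map[string]float64, len(g))
		delta := 0.0
		for v, nbrs := range g {
			sum := 0.0
			for u, w := range nbrs {
				sum += w / out[u] * scores[u]
			}
			next[v] = (1 - d) + d*sum
			delta = math.Max(delta, math.Abs(next[v]-scores[v]))
		}
		scores = next
		if delta < tol {
			break
		}
	}
	return scores
}

func main() {
	words := []string{"graph", "ranking", "algorithm", "graph", "keyword"}
	for w, s := range rank(buildGraph(words, 2), 0.85, 1e-6, 100) {
		fmt.Printf("%-10s %.3f\n", w, s)
	}
}
```

In this sketch the input is assumed to contain only words that already passed the syntactic filter; words that connect to more neighbors, or to closer ones, end up with higher scores.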

During post-processing, adjacent keywords are collapsed into a multi-word keyword, and the top keywords are then extracted.
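The post-processing step can be sketched as follows. This is a minimal illustration, not the library's code: `collapse` and `topK` are hypothetical helpers, and scoring a multi-word keyword by the sum of its word scores is an assumption made for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// collapse merges runs of adjacent keywords in the token stream into
// multi-word keywords. Scoring a phrase by the sum of its word scores is an
// assumption made for this sketch.
func collapse(tokens []string, scores map[string]float64) map[string]float64 {
	phrases := map[string]float64{}
	var run []string
	total := 0.0
	flush := func() {
		if len(run) > 0 {
			phrases[strings.Join(run, " ")] = total
		}
		run, total = nil, 0
	}
	for _, tok := range tokens {
		if s, ok := scores[tok]; ok {
			run = append(run, tok)
			total += s
		} else {
			flush() // a non-keyword ends the current run
		}
	}
	flush()
	return phrases
}

// topK returns the k highest-scoring phrases, breaking ties alphabetically.
func topK(phrases map[string]float64, k int) []string {
	keys := make([]string, 0, len(phrases))
	for p := range phrases {
		keys = append(keys, p)
	}
	sort.Slice(keys, func(i, j int) bool {
		if phrases[keys[i]] != phrases[keys[j]] {
			return phrases[keys[i]] > phrases[keys[j]]
		}
		return keys[i] < keys[j]
	})
	if k >= 0 && k < len(keys) {
		keys = keys[:k]
	}
	return keys
}

func main() {
	tokens := strings.Fields("biased textrank is a ranking algorithm for textrank")
	scores := map[string]float64{"biased": 1, "textrank": 2, "ranking": 0.5, "algorithm": 0.5}
	fmt.Println(topK(collapse(tokens, scores), 2))
}
```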

For sentence extraction, every sentence is added to an undirected, weighted graph, with an edge between sentences that share common content. The edge weight is simply the number of common tokens between the lexical representations of the two sentences. Each vertex is also assigned an initial score of 1, and a bias score based on the focus text, before the Biased TextRank ranking algorithm is run on the graph.
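For illustration, the Biased TextRank update from the original paper is $R(V_i) = \mathrm{Bias}_i (1 - d) + d \sum_{V_j \in \mathrm{In}(V_i)} \frac{w_{ji}}{\sum_{V_k \in \mathrm{Out}(V_j)} w_{jk}} R(V_j)$. The sketch below applies it to a small sentence graph. It is not the library's implementation: `overlap`, `sentenceGraph`, and `biasedRank` are hypothetical names, the fixed iteration count is arbitrary, and using token overlap with the focus text as the bias is an assumption, since the way this library derives its bias scores is not described here.

```go
package main

import "fmt"

// overlap counts the distinct tokens shared by a and b.
func overlap(a, b []string) float64 {
	seen := map[string]bool{}
	for _, t := range a {
		seen[t] = true
	}
	n := 0.0
	for _, t := range b {
		if seen[t] {
			n++
			seen[t] = false // count each shared token once
		}
	}
	return n
}

// sentenceGraph returns a symmetric weight matrix of shared-token counts.
func sentenceGraph(sents [][]string) [][]float64 {
	w := make([][]float64, len(sents))
	for i := range w {
		w[i] = make([]float64, len(sents))
	}
	for i := range sents {
		for j := i + 1; j < len(sents); j++ {
			c := overlap(sents[i], sents[j])
			w[i][j], w[j][i] = c, c
		}
	}
	return w
}

// biasedRank starts every sentence at 1 and applies the Biased TextRank
// update a fixed number of times.
func biasedRank(w [][]float64, bias []float64, d float64, iters int) []float64 {
	n := len(w)
	out := make([]float64, n) // total edge weight per sentence
	scores := make([]float64, n)
	for i := range w {
		scores[i] = 1
		for _, x := range w[i] {
			out[i] += x
		}
	}
	for it := 0; it < iters; it++ {
		next := make([]float64, n)
		for i := range w {
			sum := 0.0
			for j := range w {
				if w[j][i] > 0 {
					sum += w[j][i] / out[j] * scores[j]
				}
			}
			next[i] = bias[i]*(1-d) + d*sum
		}
		scores = next
	}
	return scores
}

func main() {
	sents := [][]string{
		{"go", "graph", "rank"},
		{"graph", "rank", "text"},
		{"text", "summary"},
	}
	focus := []string{"summary"}
	bias := make([]float64, len(sents))
	for i, s := range sents {
		bias[i] = overlap(focus, s) // assumed bias: token overlap with the focus
	}
	fmt.Println(biasedRank(sentenceGraph(sents), bias, 0.85, 100))
}
```

With a uniform bias the first sentence outranks the third because it is more strongly connected; biasing toward the focus token reverses that order, which is the behavior focused summarization relies on.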

The top weighted sentences are then selected and sorted in chronological order to form a summary.
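The selection step can be sketched in a few lines. The `sentence` struct and its `Order` field are hypothetical stand-ins for whatever the library stores per sentence, and `summarize` is an illustrative helper rather than part of the package's API.

```go
package main

import (
	"fmt"
	"sort"
)

// sentence is a hypothetical record: its position in the document,
// its rank score, and its raw text.
type sentence struct {
	Order int
	Score float64
	Raw   string
}

// summarize keeps the k highest-scoring sentences and returns them in their
// original document order. The input slice is left unmodified.
func summarize(sents []sentence, k int) []sentence {
	ranked := append([]sentence(nil), sents...)
	sort.SliceStable(ranked, func(i, j int) bool {
		return ranked[i].Score > ranked[j].Score
	})
	if k >= 0 && k < len(ranked) {
		ranked = ranked[:k]
	}
	sort.Slice(ranked, func(i, j int) bool {
		return ranked[i].Order < ranked[j].Order
	})
	return ranked
}

func main() {
	sents := []sentence{
		{0, 0.2, "Intro."},
		{1, 0.9, "Main claim."},
		{2, 0.5, "Aside."},
		{3, 0.7, "Conclusion."},
	}
	for _, s := range summarize(sents, 2) {
		fmt.Printf("[%.2f] %s\n", s.Score, s.Raw)
	}
}
```

Sorting twice, first by score and then by position, keeps the summary readable: the selected sentences appear in the order the author wrote them rather than in rank order.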

Further information on the two algorithms can be found in the original TextRank and Biased TextRank papers.

Installation

go get github.com/algao1/basically

Usage

// Instantiate a document for every text.
doc, err := document.Create(text, &btrank.BiasedTextRank{}, &trank.KWTextRank{}, &parser.Parser{})
if err != nil {
	log.Fatal(err)
}

// Summarize the document into 7 sentences, with no threshold value, and with respect to a focus sentence.
sents, err := doc.Summarize(7, 0, focus)
if err != nil {
	log.Fatal(err)
}

for _, sent := range sents {
	fmt.Printf("[%.2f, %.2f] %s\n", sent.Score, sent.Sentiment, sent.Raw)
}

// Highlight the top 7 keywords in the document, with multi-word keywords enabled.
words, err := doc.Highlight(7, true)
if err != nil {
	log.Fatal(err)
}

for _, word := range words {
	fmt.Println(word.Weight, word.Word)
}

Optionally, we can also specify configurations, such as retaining conjunctions at the beginning of sentences in our summary:

doc, err := document.Create(text, &btrank.BiasedTextRank{}, &trank.KWTextRank{}, &parser.Parser{}, document.WithConjunctions())
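Passing `document.WithConjunctions()` as a trailing argument looks like Go's functional options pattern. The sketch below shows how such an option can work in general. Everything in it is hypothetical and only borrows the option's name for readability: the `config` struct, the `Option` type, and `newConfig` are assumptions, not this library's internals.

```go
package main

import "fmt"

// config holds settings for the illustration (hypothetical).
type config struct {
	keepConjunctions bool
}

// Option mutates a config; constructors accept any number of them.
type Option func(*config)

// WithConjunctions returns an option that turns on conjunction retention.
func WithConjunctions() Option {
	return func(c *config) { c.keepConjunctions = true }
}

// newConfig starts from defaults and applies each option in order.
func newConfig(opts ...Option) config {
	c := config{}
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	fmt.Println(newConfig().keepConjunctions)                   // false
	fmt.Println(newConfig(WithConjunctions()).keepConjunctions) // true
}
```

The appeal of this design is that new settings can be added later without changing the constructor's signature or breaking existing callers.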

Things I Learned

I started this project to better familiarize myself with Go and some of its best practices.

Next Steps

Currently, the project is more or less complete, with no major updates planned. However, I'll be periodically updating the library as things come to mind.