package cluster
0.0.0-20220715001353-00e0c845ae1c
Repository: https://github.com/cdipaolo/goml.git

# README

Clustering Algorithms (Supervised and Unsupervised)

import "github.com/cdipaolo/goml/cluster"

This part of the goml package implements clustering algorithms, both unsupervised and supervised, so the user can choose the model that best fits their problem.

implemented models

  • k-means clustering
    • Uses k-means++ instantiation for more reliable clustering (this paper outlines the method)
    • Both online and batch versions of the algorithm
    • Online version implements the algorithm discussed in this paper
  • triangle inequality accelerated k-means clustering
    • Implements the algorithm described in this paper by Charles Elkan of the University of California, San Diego to use upper and lower bounds on distances to clusters across iterations to dramatically reduce the number of (potentially really expensive) distance calculations made by the algorithm.
    • Uses k-means++ instantiation for more reliable clustering (this paper outlines the method)
  • n-nearest-neighbors clustering
    • Can use any distance metric, with L-p Norm, Euclidean Distance, and Manhattan Distance pre-defined within the goml/base package

example k-means model usage

This code produces four clusters (as expected), which result in the following plot (made with ggplot2).

(plot: "Clustered By K", training points colored by assigned cluster)

gaussian := [][]float64{}
for i := 0; i < 40; i++ {
	x := rand.NormFloat64() + 4
	y := rand.NormFloat64()*0.25 + 5
	gaussian = append(gaussian, []float64{x, y})
}
for i := 0; i < 66; i++ {
	x := rand.NormFloat64()
	y := rand.NormFloat64() + 10
	gaussian = append(gaussian, []float64{x, y})
}
for i := 0; i < 100; i++ {
	x := rand.NormFloat64()*3 - 10
	y := rand.NormFloat64()*0.25 - 7
	gaussian = append(gaussian, []float64{x, y})
}
for i := 0; i < 23; i++ {
	x := rand.NormFloat64() * 2
	y := rand.NormFloat64() - 1.25
	gaussian = append(gaussian, []float64{x, y})
}

model := NewKMeans(4, 15, gaussian)

if model.Learn() != nil {
	panic("Oh NO!!! There was an error learning!!")
}

// now you can predict like normal!
guess, err := model.Predict([]float64{-3, 6})
if err != nil {
	panic("prediction error")
}
fmt.Println("prediction:", guess)

// or if you want to get the clustering
// results from the data
results := model.Guesses()
fmt.Println("assigned clusters for", len(results), "points")

// you can also concat that with the
// training set and save it to a file
// (if you wanted to plot it or something)
err = model.SaveClusteredData("/tmp/.goml/KMeansResults.csv")
if err != nil {
	panic("file save error")
}

// you can also persist the model to a
// file
err = model.PersistToFile("/tmp/.goml/KMeans.json")
if err != nil {
	panic("file save error")
}

// and also restore from file (at a
// later time if you want)
err = model.RestoreFromFile("/tmp/.goml/KMeans.json")
if err != nil {
	panic("file restore error")
}
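
The online version listed in the model list learns from points one at a time instead of from a fixed batch. goml's own online API differs (it streams data in), but the core step it performs per point can be sketched independently as "move the nearest centroid a fraction of the way toward the new point" (an illustrative sketch, not goml's implementation):

```go
package main

import "fmt"

// updateNearest performs one stochastic k-means step: find the
// centroid closest to x (by squared Euclidean distance) and move it
// a fraction alpha of the way toward x.
func updateNearest(centroids [][]float64, x []float64, alpha float64) {
	best, bestD := 0, -1.0
	for i, c := range centroids {
		var d float64
		for j := range c {
			diff := c[j] - x[j]
			d += diff * diff
		}
		if bestD < 0 || d < bestD {
			best, bestD = i, d
		}
	}
	for j := range centroids[best] {
		centroids[best][j] += alpha * (x[j] - centroids[best][j])
	}
}

func main() {
	centroids := [][]float64{{0, 0}, {10, 10}}
	// the point (2, 0) is nearest the first centroid, which moves halfway
	updateNearest(centroids, []float64{2, 0}, 0.5)
	fmt.Println(centroids[0]) // [1 0]
}
```

Decaying alpha over time makes the centroids settle, which is the usual trade-off between adapting to new data and converging.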

# Functions

NewKMeans returns a pointer to the k-means model, which clusters given inputs in an unsupervised manner.
NewKNN returns a pointer to the KNN model, which classifies given inputs by a vote among the k nearest training points.
NewTriangleKMeans returns a pointer to the triangle-inequality-accelerated k-means model, which clusters given inputs in an unsupervised manner.

# Structs

KMeans implements the k-means unsupervised clustering algorithm.
KNN implements the KNN algorithm for classification, where an input is classified by finding the K nearest (by some distance metric) data points, and taking a vote based on those.
OnlineParams is used to pass optional parameters in to creating a new K-Means model if you want to learn using the online version of the model.
TriangleKMeans implements the k-means unsupervised clustering algorithm, sped up by using the triangle inequality to reduce the number of redundant distance calculations between data points and clusters.
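
The pruning behind the accelerated model rests on a triangle-inequality bound: if d(c1, c2) >= 2*d(x, c1), then d(x, c2) >= d(x, c1), so center c2 can be ruled out without ever computing d(x, c2). A standalone sketch of that test (illustrative only, not goml's internal implementation):

```go
package main

import (
	"fmt"
	"math"
)

// dist returns the Euclidean distance between two points.
func dist(a, b []float64) float64 {
	var s float64
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return math.Sqrt(s)
}

// canSkip reports whether the triangle inequality guarantees that
// center c2 cannot be closer to x than c1, given only d(x, c1) and
// the precomputed center-to-center distance d(c1, c2):
// if d(c1, c2) >= 2*d(x, c1), then d(x, c2) >= d(x, c1).
func canSkip(dXC1, dC1C2 float64) bool {
	return dC1C2 >= 2*dXC1
}

func main() {
	x := []float64{0, 0}
	c1 := []float64{1, 0}
	c2 := []float64{100, 0}
	// d(x, c1) = 1 and d(c1, c2) = 99 >= 2, so d(x, c2) is never computed
	if canSkip(dist(x, c1), dist(c1, c2)) {
		fmt.Println("skipped computing d(x, c2)")
	}
}
```

Since center-to-center distances can be computed once per iteration (k^2 of them) while point-to-center distances number n*k, this check eliminates most of the expensive work when clusters are well separated.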