package
1.0.26
Repository: https://github.com/askasoft/pango.git
Documentation: pkg.go.dev

# README

ldt

ldt is a Go library that automatically detects the language of a text.

This package was originally created by abadojack and was forked by the pango project to incorporate bug fixes and new features.

Natural language detection for Go.

Features

  • Supports 84 languages
  • 100% written in Go
  • No external dependencies
  • Fast
  • Recognizes not only the language but also the script (Latin, Cyrillic, etc.)

Getting started

Installation:

    go get -u github.com/askasoft/pango

Simple usage example:

package main

import (
	"fmt"

	"github.com/askasoft/pango/ldt"
)

func main() {
	info := ldt.Detect("Foje funkcias kaj foje ne funkcias")
	fmt.Println("Language:", info.Lang.String(), " Confidence: ", info.Confidence)
}

With Options:

package main

import (
	"fmt"

	"github.com/askasoft/pango/ldt"
)

func main() {
	// Excludes: languages to skip during detection
	// (here Yiddish, which is also written in the Hebrew script).
	options := ldt.Options{
		Excludes: []ldt.Lang{ldt.Ydd},
	}

	info := ldt.DetectWithOptions("האקדמיה ללשון העברית", options)

	fmt.Println("Language:", info.Lang.String())

	// Includes: only these languages are considered as candidates.
	options1 := ldt.Options{
		Includes: []ldt.Lang{ldt.Epo, ldt.Ukr},
	}

	info = ldt.DetectWithOptions("Mi ne scias", options1)
	fmt.Println("Language:", info.Lang.String())
}

Requirements

Go 1.8 or higher

How does it work?

How does the language recognition work?

The algorithm is based on trigram language models, which are a particular case of n-grams. To understand the idea, see the original paper by Cavnar and Trenkle ('94), "N-Gram-Based Text Categorization".
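The idea can be sketched in a few lines of Go. This is only an illustration of the rank-based ("out-of-place") comparison described by Cavnar and Trenkle, not the code used by ldt: the helper names `trigramRanks` and `outOfPlace` are hypothetical, and the toy profiles are built from single sentences, whereas real profiles are precomputed from large corpora.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// trigramRanks counts every 3-rune window in text and returns a map from
// trigram to its rank (0 = most frequent). Ties are broken alphabetically
// so the result is deterministic.
func trigramRanks(text string) map[string]int {
	runes := []rune(strings.ToLower(text))
	counts := make(map[string]int)
	for i := 0; i+3 <= len(runes); i++ {
		counts[string(runes[i:i+3])]++
	}

	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(a, b int) bool {
		if counts[keys[a]] != counts[keys[b]] {
			return counts[keys[a]] > counts[keys[b]]
		}
		return keys[a] < keys[b]
	})

	ranks := make(map[string]int, len(keys))
	for i, k := range keys {
		ranks[k] = i
	}
	return ranks
}

// outOfPlace sums, for each trigram of the sample, how far its rank is from
// its rank in the language profile. Trigrams missing from the profile get
// the maximum penalty (the profile size). A lower total means a closer match.
func outOfPlace(sample, profile map[string]int) int {
	maxPenalty := len(profile)
	total := 0
	for tri, rank := range sample {
		profileRank, ok := profile[tri]
		if !ok {
			total += maxPenalty
			continue
		}
		d := rank - profileRank
		if d < 0 {
			d = -d
		}
		total += d
	}
	return total
}

func main() {
	// Toy profiles (assumption: one sentence per language, for illustration only).
	epo := trigramRanks("mi ne scias kial tio foje funkcias kaj foje ne funkcias")
	eng := trigramRanks("i do not know why this sometimes works and sometimes does not")

	sample := trigramRanks("foje funkcias")
	fmt.Println("distance to epo:", outOfPlace(sample, epo))
	fmt.Println("distance to eng:", outOfPlace(sample, eng))
}
```

The language whose profile yields the smallest distance is the best candidate; here the Esperanto profile should score lower than the English one, because none of the sample's trigrams occur in the English sentence.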

How is IsReliable calculated?

It is based on the following factors:

  • How many unique trigrams the given text contains
  • How big the difference is between the scores of the first and the second (not returned) detected languages. This metric is called rate in the code base.

Therefore, the result can be represented as a 2D space that a threshold function splits into "Reliable" and "Not reliable" areas. This function is a hyperbola (original figure: "Language recognition whatlang rust").

For more details, see the blog article "Introduction to Rust Whatlang Library and Natural Language Identification Algorithms".
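The two factors above can be combined roughly as follows. This is a minimal sketch under stated assumptions, not the package's actual code: the function name `confidence`, the definition of rate as a relative score gap, the hyperbola constants 3.0 and 0.015, and the 0.8 cut-off are all chosen here for illustration.

```go
package main

import "fmt"

// Illustrative constants (assumptions, not taken from the ldt source).
const (
	hyperbolaScale  = 3.0   // numerator of the hyperbola
	hyperbolaOffset = 0.015 // value the threshold approaches for long texts
	reliableAbove   = 0.8   // hypothetical cut-off for "reliable"
)

// confidence maps the score gap and the number of unique trigrams to a value
// in [0, 1]. The threshold scale/count + offset is a hyperbola in count:
// short texts need a large gap between the two best candidates, long texts
// only a small one.
func confidence(bestScore, secondScore, uniqueTrigrams int) float64 {
	if secondScore <= 0 || uniqueTrigrams <= 0 {
		return 0 // this sketch treats degenerate input as unreliable
	}
	// rate: relative gap between the best and the second-best candidate.
	rate := float64(bestScore-secondScore) / float64(secondScore)
	if rate <= 0 {
		return 0
	}
	threshold := hyperbolaScale/float64(uniqueTrigrams) + hyperbolaOffset
	if rate >= threshold {
		return 1 // on or above the hyperbola: fully confident
	}
	return rate / threshold // below it: scale proportionally
}

func main() {
	// Same score gap (rate = 0.1), different text lengths.
	short := confidence(110, 100, 10) // few trigrams: high threshold
	long := confidence(110, 100, 200) // many trigrams: low threshold
	fmt.Printf("short text: %.2f reliable=%v\n", short, short > reliableAbove)
	fmt.Printf("long text:  %.2f reliable=%v\n", long, long > reliableAbove)
}
```

With the same gap between the two best candidates, the short text falls below the hyperbola and is reported as not reliable, while the long text lies above it.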

# Functions

No description provided by the author
CodeToLang returns the Lang enum value for an ISO 639-3 code given as a string.
Detect detects the language info of the given text.
DetectLang detects only the language of the given text.
DetectLangWithOptions detects only the language of the given text with the provided options.
DetectWithOptions detects the language info of the given text with the provided options.
LangToString converts a Lang enum value into its ISO 639-3 code as a string.
LangToStringShort converts a Lang enum value into its ISO 639-1 code as a string.

# Constants

Aka ...
ReliableConfidenceThreshold is the confidence rating that has to be exceeded for the language detection to be considered reliable.

# Variables

Langs represents a map of Lang to language name.

# Structs

No description provided by the author
Info represents a full outcome of language detection.
Options represents options that can be set when detecting a language and/or script, such as blacklisting languages to skip.

# Interfaces

No description provided by the author

# Type aliases

No description provided by the author
Lang represents a language following the ISO 639-3 standard.