Module: github.com/go-skynet/go-llama.cpp
Version: 0.0.0-20240314183750-6a8041ef6b46
Repository: https://github.com/go-skynet/go-llama.cpp.git
Documentation: pkg.go.dev

# README

go-llama.cpp

Golang bindings for llama.cpp.

The go-llama.cpp bindings are high level: most of the work stays in the C/C++ code to avoid extra computational cost, remain performant, and ease maintenance, while keeping usage as simple as possible.

Check out these write-ups, which summarize the impact of a low-level interface that calls C functions from Go.

If you are looking for a high-level, OpenAI-compatible API, check out here.

Attention!

Since https://github.com/go-skynet/go-llama.cpp/pull/180 was merged, go-llama.cpp is no longer compatible with the ggml format; it works ONLY with the new gguf file format. See also the upstream PR: https://github.com/ggerganov/llama.cpp/pull/2398.

If you need to use the ggml format, use the https://github.com/go-skynet/go-llama.cpp/releases/tag/pre-gguf tag.

Usage

Note: This repository uses git submodules to keep track of LLama.cpp.

Clone the repository locally:

git clone --recurse-submodules https://github.com/go-skynet/go-llama.cpp

To build the bindings locally, run:

cd go-llama.cpp
make libbinding.a

Now you can run the example with:

LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "/model/path/here" -t 14
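Inside your own program, typical use of the bindings looks roughly like the sketch below. This is illustrative, not a drop-in implementation: it assumes the option setters listed in the Functions index of this package, and it only builds with `libbinding.a` available via LIBRARY_PATH and C_INCLUDE_PATH as shown above.

```go
package main

import (
	"fmt"

	llama "github.com/go-skynet/go-llama.cpp"
)

func main() {
	// Load a gguf model; model options mirror the CLI flags above.
	l, err := llama.New("/model/path/here", llama.SetContext(512))
	if err != nil {
		panic(err)
	}
	defer l.Free()

	// Prediction options correspond to the Set* functions listed below.
	out, err := l.Predict("Hello, llama!",
		llama.SetThreads(14),
		llama.SetTokens(64),
		llama.SetTopK(40),
		llama.SetTopP(0.9),
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```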

Acceleration

OpenBLAS

To build and run with OpenBLAS, for example:

BUILD_TYPE=openblas make libbinding.a
CGO_LDFLAGS="-lopenblas" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run -tags openblas ./examples -m "/model/path/here" -t 14

CuBLAS

To build with CuBLAS:

BUILD_TYPE=cublas make libbinding.a
CGO_LDFLAGS="-lcublas -lcudart -L/usr/local/cuda/lib64/" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "/model/path/here" -t 14

ROCm

To build with ROCm (hipBLAS):

BUILD_TYPE=hipblas make libbinding.a
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ CGO_LDFLAGS="-O3 --hip-link --rtlib=compiler-rt -unwindlib=libgcc -lrocblas -lhipblas" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "/model/path/here" -ngl 64 -t 32

OpenCL

BUILD_TYPE=clblas CLBLAS_DIR=... make libbinding.a
CGO_LDFLAGS="-lOpenCL -lclblast -L/usr/local/lib64/" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "/model/path/here" -t 14

You should see something like this in the output when using the GPU:

ggml_opencl: selecting platform: 'Intel(R) OpenCL HD Graphics'
ggml_opencl: selecting device: 'Intel(R) Graphics [0x46a6]'
ggml_opencl: device FP16 support: true

GPU offloading

Metal (Apple Silicon)

BUILD_TYPE=metal make libbinding.a
CGO_LDFLAGS="-framework Foundation -framework Metal -framework MetalKit -framework MetalPerformanceShaders" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go build ./examples/main.go
cp build/bin/ggml-metal.metal .
./main -m "/model/path/here" -t 1 -ngl 1

Enjoy!

The documentation is available here and the full example code is here.

License

MIT

# Packages

No description provided by the author

# Functions

Create a new ModelOptions object with the given options.
Create a new PredictOptions object with the given options.
SetBatch sets the batch size.
SetContext sets the context size.
SetFrequencyPenalty sets the frequency penalty parameter, freq_penalty.
SetGPULayers sets the number of GPU layers to use to offload computation.
SetLogitBias sets the logit bias parameter.
No description provided by the author
No description provided by the author
SetMainGPU sets the main_gpu.
SetMemoryMap sets memory mapping.
SetMirostat sets the mirostat parameter.
SetMirostatETA sets the mirostat ETA parameter.
SetMirostatTAU sets the mirostat TAU parameter.
SetMlock sets the memory lock.
SetContext sets the context size.
No description provided by the author
No description provided by the author
SetNBatch sets n_batch, the batch size.
No description provided by the author
No description provided by the author
No description provided by the author
SetKeep sets the number of tokens from initial prompt to keep.
SetPathPromptCache sets the session file to store the prompt cache.
SetPenalizeNL sets whether to penalize newlines or not.
SetPenalty sets the repetition penalty for text generation.
No description provided by the author
SetPredictionMainGPU sets the main_gpu.
SetPredictionTensorSplit sets the tensor split for the GPU.
SetPresencePenalty sets the presence penalty parameter, presence_penalty.
SetRepeat sets the number of times to repeat text generation.
Rope and negative prompt parameters.
No description provided by the author
SetSeed sets the random seed for sampling text generation.
SetStopWords sets the prompts that will stop predictions.
SetTailFreeSamplingZ sets the tail free sampling, parameter z.
SetTemperature sets the temperature value for text generation.
SetTensorSplit sets the tensor split for the GPU.
SetThreads sets the number of threads to use for text generation.
SetTokenCallback sets the callback that is invoked for each generated token.
SetTokens sets the number of tokens to generate.
SetTopK sets the value for top-K sampling.
SetTopP sets the value for nucleus sampling.
SetTypicalP sets the typicality parameter, p_typical.
WithGrammar sets the grammar to constrain the output of the LLM response.
No description provided by the author
No description provided by the author
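The Set* functions above follow Go's functional-options pattern: each returns a closure that mutates an options struct, and the constructor applies them over defaults. A self-contained sketch of the mechanism (the types, fields, and default values here are illustrative, not the library's actual definitions):

```go
package main

import "fmt"

// PredictOptions stands in for the library's options struct
// (field names and defaults are assumptions for illustration).
type PredictOptions struct {
	Threads     int
	Tokens      int
	TopK        int
	Temperature float64
}

// PredictOption is a functional option that mutates PredictOptions.
type PredictOption func(*PredictOptions)

func SetThreads(n int) PredictOption { return func(o *PredictOptions) { o.Threads = n } }
func SetTokens(n int) PredictOption  { return func(o *PredictOptions) { o.Tokens = n } }
func SetTopK(k int) PredictOption    { return func(o *PredictOptions) { o.TopK = k } }
func SetTemperature(t float64) PredictOption {
	return func(o *PredictOptions) { o.Temperature = t }
}

// NewPredictOptions applies each option over sensible defaults.
func NewPredictOptions(opts ...PredictOption) PredictOptions {
	p := PredictOptions{Threads: 4, Tokens: 128, TopK: 40, Temperature: 0.8}
	for _, opt := range opts {
		opt(&p)
	}
	return p
}

func main() {
	// Unset options keep their defaults; set options override them.
	p := NewPredictOptions(SetThreads(14), SetTopK(90))
	fmt.Println(p.Threads, p.Tokens, p.TopK, p.Temperature) // prints: 14 128 90 0.8
}
```

This is why calls like `l.Predict(text, llama.SetThreads(14), llama.SetTopK(90))` read like keyword arguments: each setter is just a function value applied to the options struct.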

# Variables

No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author

# Structs

No description provided by the author
No description provided by the author

# Type aliases

No description provided by the author
No description provided by the author