Module: github.com/go-skynet/go-llama.cpp
Version: 0.0.0-20240314183750-6a8041ef6b46
Repository: https://github.com/go-skynet/go-llama.cpp.git
Documentation: pkg.go.dev

# README

go-llama.cpp

Golang bindings for llama.cpp.

The go-llama.cpp bindings are high level: most of the work stays in the C/C++ code to avoid extra computational cost, remain performant, and ease maintenance, while keeping usage as simple as possible.

Check out these write-ups, which summarize the impact of a low-level interface that calls C functions from Go.

If you are looking for a high-level, OpenAI-compatible API, check out here.

Attention!

Since https://github.com/go-skynet/go-llama.cpp/pull/180 was merged, go-llama.cpp is no longer compatible with the ggml format; it works ONLY with the new gguf file format. See also the upstream PR: https://github.com/ggerganov/llama.cpp/pull/2398.

If you need to use the ggml format, use the https://github.com/go-skynet/go-llama.cpp/releases/tag/pre-gguf tag.

Usage

Note: This repository uses git submodules to keep track of LLama.cpp.

Clone the repository locally:

git clone --recurse-submodules https://github.com/go-skynet/go-llama.cpp

To build the bindings locally, run:

cd go-llama.cpp
make libbinding.a

Now you can run the example with:

LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "/model/path/here" -t 14
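Inside your own program, typical use of the bindings looks roughly like the sketch below. This is illustrative, not a drop-in implementation: it assumes the option setters listed in the Functions index of this package, and it only builds with `libbinding.a` available via LIBRARY_PATH and C_INCLUDE_PATH as shown above.

```go
package main

import (
	"fmt"

	llama "github.com/go-skynet/go-llama.cpp"
)

func main() {
	// Load a gguf model; model options mirror the CLI flags above.
	l, err := llama.New("/model/path/here", llama.SetContext(512))
	if err != nil {
		panic(err)
	}
	defer l.Free()

	// Prediction options correspond to the Set* functions listed below.
	out, err := l.Predict("Hello, llama!",
		llama.SetThreads(14),
		llama.SetTokens(64),
		llama.SetTopK(40),
		llama.SetTopP(0.9),
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```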

Acceleration

OpenBLAS

To build and run with OpenBLAS, for example:

BUILD_TYPE=openblas make libbinding.a
CGO_LDFLAGS="-lopenblas" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run -tags openblas ./examples -m "/model/path/here" -t 14

CuBLAS

To build with CuBLAS:

BUILD_TYPE=cublas make libbinding.a
CGO_LDFLAGS="-lcublas -lcudart -L/usr/local/cuda/lib64/" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "/model/path/here" -t 14

ROCm

To build with ROCm (hipBLAS):

BUILD_TYPE=hipblas make libbinding.a
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ CGO_LDFLAGS="-O3 --hip-link --rtlib=compiler-rt -unwindlib=libgcc -lrocblas -lhipblas" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "/model/path/here" -ngl 64 -t 32

OpenCL

BUILD_TYPE=clblas CLBLAS_DIR=... make libbinding.a
CGO_LDFLAGS="-lOpenCL -lclblast -L/usr/local/lib64/" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "/model/path/here" -t 14

You should see something like this in the output when using the GPU:

ggml_opencl: selecting platform: 'Intel(R) OpenCL HD Graphics'
ggml_opencl: selecting device: 'Intel(R) Graphics [0x46a6]'
ggml_opencl: device FP16 support: true

GPU offloading

Metal (Apple Silicon)

BUILD_TYPE=metal make libbinding.a
CGO_LDFLAGS="-framework Foundation -framework Metal -framework MetalKit -framework MetalPerformanceShaders" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go build ./examples/main.go
cp build/bin/ggml-metal.metal .
./main -m "/model/path/here" -t 1 -ngl 1

Enjoy!

The documentation is available here and the full example code is here.

License

MIT

# Packages

No description provided by the author

# Functions

Create a new ModelOptions object with the given options.
Create a new PredictOptions object with the given options.
SetBatch sets the batch size.
SetContext sets the context size.
SetFrequencyPenalty sets the frequency penalty parameter, freq_penalty.
SetGPULayers sets the number of GPU layers to use to offload computation.
SetLogitBias sets the logit bias parameter.
No description provided by the author
No description provided by the author
SetMainGPU sets the main_gpu.
SetMemoryMap sets memory mapping.
SetMirostat sets the mirostat parameter.
SetMirostatETA sets the mirostat ETA parameter.
SetMirostatTAU sets the mirostat TAU parameter.
SetMlock sets the memory lock.
SetContext sets the context size.
No description provided by the author
No description provided by the author
SetNBatch sets n_batch, the batch size.
No description provided by the author
No description provided by the author
No description provided by the author
SetKeep sets the number of tokens from initial prompt to keep.
SetPathPromptCache sets the session file to store the prompt cache.
SetPenalizeNL sets whether to penalize newlines or not.
SetPenalty sets the repetition penalty for text generation.
No description provided by the author
SetPredictionMainGPU sets the main_gpu.
SetPredictionTensorSplit sets the tensor split for the GPU.
SetPresencePenalty sets the presence penalty parameter, presence_penalty.
SetRepeat sets the number of times to repeat text generation.
Rope and negative prompt parameters.
No description provided by the author
SetSeed sets the random seed for sampling text generation.
SetStopWords sets the prompts that will stop predictions.
SetTailFreeSamplingZ sets the tail free sampling, parameter z.
SetTemperature sets the temperature value for text generation.
SetTensorSplit sets the tensor split for the GPU.
SetThreads sets the number of threads to use for text generation.
SetTokenCallback sets the callback that is invoked for each generated token.
SetTokens sets the number of tokens to generate.
SetTopK sets the value for top-K sampling.
SetTopP sets the value for nucleus sampling.
SetTypicalP sets the typicality parameter, p_typical.
WithGrammar sets the grammar to constrain the output of the LLM response.
No description provided by the author
No description provided by the author
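The Set* functions above follow Go's functional-options pattern: each returns a closure that mutates an options struct, and the constructor applies them over defaults. A self-contained sketch of the mechanism (the types, fields, and default values here are illustrative, not the library's actual definitions):

```go
package main

import "fmt"

// PredictOptions stands in for the library's options struct
// (field names and defaults are assumptions for illustration).
type PredictOptions struct {
	Threads     int
	Tokens      int
	TopK        int
	Temperature float64
}

// PredictOption is a functional option that mutates PredictOptions.
type PredictOption func(*PredictOptions)

func SetThreads(n int) PredictOption { return func(o *PredictOptions) { o.Threads = n } }
func SetTokens(n int) PredictOption  { return func(o *PredictOptions) { o.Tokens = n } }
func SetTopK(k int) PredictOption    { return func(o *PredictOptions) { o.TopK = k } }
func SetTemperature(t float64) PredictOption {
	return func(o *PredictOptions) { o.Temperature = t }
}

// NewPredictOptions applies each option over sensible defaults.
func NewPredictOptions(opts ...PredictOption) PredictOptions {
	p := PredictOptions{Threads: 4, Tokens: 128, TopK: 40, Temperature: 0.8}
	for _, opt := range opts {
		opt(&p)
	}
	return p
}

func main() {
	// Unset options keep their defaults; set options override them.
	p := NewPredictOptions(SetThreads(14), SetTopK(90))
	fmt.Println(p.Threads, p.Tokens, p.TopK, p.Temperature) // prints: 14 128 90 0.8
}
```

This is why calls like `l.Predict(text, llama.SetThreads(14), llama.SetTopK(90))` read like keyword arguments: each setter is just a function value applied to the options struct.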

# Variables

No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author

# Structs

No description provided by the author
No description provided by the author

# Type aliases

No description provided by the author
No description provided by the author