github.com/appleboy/go-whisper

Version: 1.3.0
Repository: https://github.com/appleboy/go-whisper.git
Documentation: pkg.go.dev


# README

go-whisper

Docker Image for Speech-to-Text using ggerganov/whisper.cpp.

This Docker image provides a ready-to-use environment for converting speech to text with the ggerganov/whisper.cpp library, an open-source project for efficient and accurate speech recognition. Using the image, you can run speech-to-text conversion without installing dependencies or configuring the system yourself.

The image bundles all necessary components and dependencies: simply pull it, run the container, and start converting audio files into text with minimal effort.

OpenAI's Whisper models converted to ggml format

See the Available models.

| Model | Disk | Mem | SHA |
| --- | --- | --- | --- |
| tiny | 75 MB | ~390 MB | bd577a113a864445d4c299885e0cb97d4ba92b5f |
| tiny.en | 75 MB | ~390 MB | c78c86eb1a8faa21b369bcd33207cc90d64ae9df |
| base | 142 MB | ~500 MB | 465707469ff3a37a2b9b8d8f89f2f99de7299dac |
| base.en | 142 MB | ~500 MB | 137c40403d78fd54d454da0f9bd998f78703390c |
| small | 466 MB | ~1.0 GB | 55356645c2b361a969dfd0ef2c5a50d530afd8d5 |
| small.en | 466 MB | ~1.0 GB | db8a495a91d927739e50b3fc1cc4c6b8f6c2d022 |
| medium | 1.5 GB | ~2.6 GB | fd9727b6e1217c2f614f9b698455c4ffd82463b4 |
| medium.en | 1.5 GB | ~2.6 GB | 8c30f0e44ce9560643ebd10bbe50cd20eafd3723 |
| large-v1 | 2.9 GB | ~4.7 GB | b1caaf735c4cc1429223d5a74f0f4d0b9b59a299 |
| large | 2.9 GB | ~4.7 GB | 0f4c8e34f21cf1a914c59d8b3ce882345ad349d6 |

For more information see ggerganov/whisper.cpp.

## Prepare

Download the model you want to use and put it in the models directory.

curl -LJ https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin \
  --output models/ggml-small.bin
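The models table above lists a SHA for each file, so you can check the download's integrity. A minimal sketch, assuming the SHA column is the SHA-1 digest of the ggml file (the 40-hex-character length suggests SHA-1):

```shell
# Compare the downloaded model's digest against the table entry for "small".
EXPECTED=55356645c2b361a969dfd0ef2c5a50d530afd8d5
ACTUAL=$(sha1sum models/ggml-small.bin | awk '{print $1}')
if [ "$ACTUAL" = "$EXPECTED" ]; then
  echo "checksum OK"
else
  echo "checksum mismatch: $ACTUAL"
fi
```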

## Usage

Please follow these simplified instructions to transcribe the audio file using a Docker container:

  1. Ensure that you have a testdata directory containing the jfk.wav file.
  2. Mount both the models and testdata directories to the Docker container.
  3. Specify the model using the --model flag and the audio file path using the --audio-path flag.
  4. The transcript result file will be saved in the same directory as the audio file.

To transcribe the audio file, execute the command provided below.

docker run \
  -v $PWD/models:/app/models \
  -v $PWD/testdata:/app/testdata \
  ghcr.io/appleboy/go-whisper:latest \
  --model /app/models/ggml-small.bin \
  --audio-path /app/testdata/jfk.wav

See the following output:

whisper_init_from_file_no_state: loading model from '/app/models/ggml-small.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head  = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 768
whisper_model_load: n_text_head   = 12
whisper_model_load: n_text_layer  = 12
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 3
whisper_model_load: mem required  =  743.00 MB (+   16.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx     =  464.68 MB
whisper_model_load: model size    =  464.44 MB
whisper_init_state: kv self size  =   15.75 MB
whisper_init_state: kv cross size =   52.73 MB
1:46AM INF system_info: n_threads = 8 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 | COREML = 0 | module=transcript
whisper_full_with_state: auto-detected language: en (p = 0.967331)
1:46AM INF [    0s ->    11s] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country. module=transcript
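As noted in step 4 above, the transcript is written next to the audio file. With the default txt output format, the result of the example run can be inspected directly; the `jfk.txt` name is an assumption here, derived from the audio file's base name:

```shell
cat testdata/jfk.txt
```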

command line arguments:

| Option | Description | Default Value |
| --- | --- | --- |
| `--model` | model is the interface to a whisper model `[$PLUGIN_MODEL, $INPUT_MODEL]` | |
| `--audio-path` | audio path `[$PLUGIN_AUDIO_PATH, $INPUT_AUDIO_PATH]` | |
| `--output-folder` | output folder `[$PLUGIN_OUTPUT_FOLDER, $INPUT_OUTPUT_FOLDER]` | |
| `--output-format` | output format; supports txt, srt, csv `[$PLUGIN_OUTPUT_FORMAT, $INPUT_OUTPUT_FORMAT]` | `"txt"` |
| `--output-filename` | output filename `[$PLUGIN_OUTPUT_FILENAME, $INPUT_OUTPUT_FILENAME]` | |
| `--language` | set the language to use for speech recognition `[$PLUGIN_LANGUAGE, $INPUT_LANGUAGE]` | `"auto"` |
| `--threads` | set the number of threads to use `[$PLUGIN_THREADS, $INPUT_THREADS]` | `8` |
| `--debug` | enable debug mode `[$PLUGIN_DEBUG, $INPUT_DEBUG]` | `false` |
| `--speedup` | speed up audio by x2 (reduced accuracy) `[$PLUGIN_SPEEDUP, $INPUT_SPEEDUP]` | `false` |
| `--translate` | translate from source language to English `[$PLUGIN_TRANSLATE, $INPUT_TRANSLATE]` | `false` |
| `--print-progress` | print progress `[$PLUGIN_PRINT_PROGRESS, $INPUT_PRINT_PROGRESS]` | `true` |
| `--print-segment` | print segments `[$PLUGIN_PRINT_SEGMENT, $INPUT_PRINT_SEGMENT]` | `false` |
| `--webhook-url` | webhook URL `[$PLUGIN_WEBHOOK_URL, $INPUT_WEBHOOK_URL]` | |
| `--webhook-insecure` | allow insecure webhook connections `[$PLUGIN_WEBHOOK_INSECURE, $INPUT_WEBHOOK_INSECURE]` | `false` |
| `--webhook-headers` | webhook headers `[$PLUGIN_WEBHOOK_HEADERS, $INPUT_WEBHOOK_HEADERS]` | |
| `--youtube-url` | YouTube URL `[$PLUGIN_YOUTUBE_URL, $INPUT_YOUTUBE_URL]` | |
| `--youtube-insecure` | allow insecure YouTube connections `[$PLUGIN_YOUTUBE_INSECURE, $INPUT_YOUTUBE_INSECURE]` | `false` |
| `--youtube-retry-count` | YouTube retry count `[$PLUGIN_YOUTUBE_RETRY_COUNT, $INPUT_YOUTUBE_RETRY_COUNT]` | `20` |
| `--prompt` | initial prompt `[$PLUGIN_PROMPT, $INPUT_PROMPT]` | |
| `--help`, `-h` | show help | |
| `--version`, `-v` | print the version | |
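The bracketed names in the table suggest each flag can also be supplied through a `$PLUGIN_*` or `$INPUT_*` environment variable. A sketch of the earlier transcription run driven by environment variables instead of flags, assuming the container reads `$PLUGIN_*` as drop-in replacements for the corresponding options:

```shell
docker run \
  -e PLUGIN_MODEL=/app/models/ggml-small.bin \
  -e PLUGIN_AUDIO_PATH=/app/testdata/jfk.wav \
  -e PLUGIN_OUTPUT_FORMAT=srt \
  -e PLUGIN_LANGUAGE=en \
  -v $PWD/models:/app/models \
  -v $PWD/testdata:/app/testdata \
  ghcr.io/appleboy/go-whisper:latest
```

This style is convenient in CI systems that pass settings as environment variables rather than command-line arguments.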