github.com/semvis123/gosseract-wasm/v2

Version: 2.0.0-20230519073246-fd76d9b42f74

Repository: https://github.com/semvis123/gosseract-wasm.git

Documentation: pkg.go.dev

# README

gosseract OCR WASM port

A Go OCR package that uses the Tesseract C++ library.

OCR Server

Do you just want an OCR server, or to see a working example of this package? There is a ready-made server application that is easy to deploy:

👉 https://github.com/otiai10/ocrserver

Example

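package main

import (
	"fmt"

	gosseract "github.com/semvis123/gosseract-wasm/v2"
)

func main() {
	client := gosseract.NewClient()
	defer client.Close()

	// Point the client at the image to recognize.
	client.SetImage("path/to/image.png")

	// Run OCR and report a failure instead of discarding the error.
	text, err := client.Text()
	if err != nil {
		fmt.Println("OCR failed:", err)
		return
	}
	fmt.Println(text)
	// Hello, World!
}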

Installation

~~1. tesseract-ocr, including library and headers~~.
2. go get -t github.com/semvis123/gosseract-wasm/v2

~~Please check this Dockerfile to get started step-by-step. Or if you want the env instantly, you can just try by docker run -it --rm otiai10/gosseract.~~

Test

~~In case you have tesseract-ocr on your local,~~

You can just run:

% go test .

~~Otherwise, if you DON'T want to install tesseract-ocr on your local, kick ./test/runtime which is using Docker and Vagrant to test the source code on some runtimes.~~

% ./test/runtime --driver docker
% ./test/runtime --driver vagrant

~~Check ./test/runtimes for more information about runtime tests.~~

Issues

https://github.com/semvis123/gosseract-wasm/issues

# Functions

GetAvailableLanguages

GetAvailableLanguages returns a list of available languages in the default tesspath.
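As a rough illustration of what such a listing involves: Tesseract stores each language model as a `<lang>.traineddata` file, so language codes can be derived from file names. The sketch below is not this package's implementation. `languagesIn` and `sampleTessdata` are hypothetical helpers, and the in-memory file system is an assumed stand-in for a real tessdata directory.

```go
package main

import (
	"fmt"
	"io/fs"
	"sort"
	"strings"
	"testing/fstest"
)

// languagesIn is a hypothetical helper (not part of gosseract-wasm).
// It lists language codes by matching "*.traineddata" files at the root of fsys.
func languagesIn(fsys fs.FS) ([]string, error) {
	matches, err := fs.Glob(fsys, "*.traineddata")
	if err != nil {
		return nil, err
	}
	langs := make([]string, 0, len(matches))
	for _, m := range matches {
		langs = append(langs, strings.TrimSuffix(m, ".traineddata"))
	}
	sort.Strings(langs)
	return langs, nil
}

// sampleTessdata returns an in-memory stand-in for a tessdata directory.
// The file contents are placeholders, not real models.
func sampleTessdata() fs.FS {
	return fstest.MapFS{
		"eng.traineddata": {Data: []byte("placeholder")},
		"deu.traineddata": {Data: []byte("placeholder")},
		"README.md":       {Data: []byte("not a model")},
	}
}

func main() {
	langs, err := languagesIn(sampleTessdata())
	if err != nil {
		fmt.Println("listing failed:", err)
		return
	}
	fmt.Println(langs)
}
```

Files that do not end in `.traineddata` are ignored, which is why the placeholder `README.md` does not show up in the result.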

NewClient

NewClient constructs a new Client.

NewClientWithFS

NewClientWithFS constructs a new Client with an FS that will be mounted at '/custom/'.

Version

Version returns the version of Tesseract-OCR.

# Constants

DEBUG_FILE

DEBUG_FILE - File to send output to.

PSM_AUTO

PSM_AUTO - (DEFAULT) Fully automatic page segmentation, but no OSD.

PSM_AUTO_ONLY

PSM_AUTO_ONLY - Automatic page segmentation, but no OSD, or OCR.

PSM_AUTO_OSD

PSM_AUTO_OSD - Automatic page segmentation with OSD.

PSM_CIRCLE_WORD

PSM_CIRCLE_WORD - Treat the image as a single word in a circle.

PSM_COUNT

PSM_COUNT - The number of enum entries, not a segmentation mode itself.

PSM_OSD_ONLY

PSM_OSD_ONLY - Orientation and script detection (OSD) only.

PSM_RAW_LINE

PSM_RAW_LINE - Treat the image as a single text line, bypassing hacks that are Tesseract-specific.

PSM_SINGLE_BLOCK

PSM_SINGLE_BLOCK - Assume a single uniform block of text.

PSM_SINGLE_BLOCK_VERT_TEXT

PSM_SINGLE_BLOCK_VERT_TEXT - Assume a single uniform block of vertically aligned text.

PSM_SINGLE_CHAR

PSM_SINGLE_CHAR - Treat the image as a single character.

PSM_SINGLE_COLUMN

PSM_SINGLE_COLUMN - Assume a single column of text of variable sizes.

PSM_SINGLE_LINE

PSM_SINGLE_LINE - Treat the image as a single text line.

PSM_SINGLE_WORD

PSM_SINGLE_WORD - Treat the image as a single word.

PSM_SPARSE_TEXT

PSM_SPARSE_TEXT - Find as much text as possible in no particular order.

PSM_SPARSE_TEXT_OSD

PSM_SPARSE_TEXT_OSD - Sparse text with orientation and script detection.
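The PSM constants above correspond to Tesseract's page segmentation mode enum, whose numeric values are the ones printed by `tesseract --help-psm` (0 to 13). The sketch below mirrors them as local constants to show the ordering and how PSM_COUNT can act as an upper bound. The `PageSegMode` type and `isValidMode` helper here are local stand-ins for illustration. Real code should use the package's own constants.

```go
package main

import "fmt"

// PageSegMode is a local stand-in for Tesseract's page segmentation mode enum.
type PageSegMode int

const (
	PSM_OSD_ONLY               PageSegMode = iota // 0
	PSM_AUTO_OSD                                  // 1
	PSM_AUTO_ONLY                                 // 2
	PSM_AUTO                                      // 3 (default)
	PSM_SINGLE_COLUMN                             // 4
	PSM_SINGLE_BLOCK_VERT_TEXT                    // 5
	PSM_SINGLE_BLOCK                              // 6
	PSM_SINGLE_LINE                               // 7
	PSM_SINGLE_WORD                               // 8
	PSM_CIRCLE_WORD                               // 9
	PSM_SINGLE_CHAR                               // 10
	PSM_SPARSE_TEXT                               // 11
	PSM_SPARSE_TEXT_OSD                           // 12
	PSM_RAW_LINE                                  // 13
	PSM_COUNT                                     // 14, number of modes
)

// isValidMode is a hypothetical helper. It reports whether m is a real
// segmentation mode, using PSM_COUNT as the exclusive upper bound.
func isValidMode(m PageSegMode) bool {
	return m >= PSM_OSD_ONLY && m < PSM_COUNT
}

func main() {
	fmt.Println(int(PSM_AUTO), int(PSM_COUNT))
	fmt.Println(isValidMode(PSM_SINGLE_LINE), isValidMode(PSM_COUNT))
}
```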

RIL_BLOCK

RIL_BLOCK - Block of text/image/separator line.

RIL_PARA

RIL_PARA - Paragraph within a block.

RIL_SYMBOL

RIL_SYMBOL - Symbol/character within a word.

RIL_TEXTLINE

RIL_TEXTLINE - Line within a paragraph.

RIL_WORD

RIL_WORD - Word within a textline.
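The RIL constants describe a nesting hierarchy: a block contains paragraphs, a paragraph contains text lines, a line contains words, and a word contains symbols. The sketch below mirrors Tesseract's iterator level ordering with local constants. `PageIteratorLevel` and `contains` are stand-ins defined here for illustration, not this package's API.

```go
package main

import "fmt"

// PageIteratorLevel is a local stand-in for Tesseract's iterator level enum.
type PageIteratorLevel int

const (
	RIL_BLOCK    PageIteratorLevel = iota // 0: block of text/image/separator line
	RIL_PARA                              // 1: paragraph within a block
	RIL_TEXTLINE                          // 2: line within a paragraph
	RIL_WORD                              // 3: word within a textline
	RIL_SYMBOL                            // 4: symbol/character within a word
)

// contains is a hypothetical helper. It reports whether elements at level
// outer enclose elements at level inner, that is, whether outer is coarser.
func contains(outer, inner PageIteratorLevel) bool {
	return outer < inner
}

func main() {
	fmt.Println(contains(RIL_BLOCK, RIL_WORD)) // a block encloses words
	fmt.Println(contains(RIL_WORD, RIL_PARA))  // a word does not enclose paragraphs
}
```

Coarser levels have smaller values, so choosing a level is a trade-off between fewer, larger regions and more, finer ones.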

TESSEDIT_CHAR_BLACKLIST

TESSEDIT_CHAR_BLACKLIST - Blacklist of chars not to recognize. There is a known issue in 4.00 with LSTM: https://github.com/tesseract-ocr/tesseract/issues/751.

TESSEDIT_CHAR_WHITELIST

TESSEDIT_CHAR_WHITELIST - Whitelist of chars to recognize. There is a known issue in 4.00 with LSTM: https://github.com/tesseract-ocr/tesseract/issues/751.
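The value for either variable is a plain string that lists the characters in question. The sketch below only builds such a string. `charRange` is a hypothetical helper, and how the string is handed to the client is not shown on this page, so that step is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// charRange is a hypothetical helper. It returns every character from lo to
// hi inclusive as a single string, or an empty string if lo is greater than hi.
func charRange(lo, hi rune) string {
	var b strings.Builder
	for r := lo; r <= hi; r++ {
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	// A whitelist for numeric fields: digits plus a few separators.
	whitelist := charRange('0', '9') + ".,-"
	fmt.Println(whitelist)
}
```

Given the known LSTM issue linked above, it may be worth checking the recognized text against the same character set afterwards instead of relying on the variable alone.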