package
2.4.1
Repository: https://github.com/unionj-cloud/go-doudou.git
Documentation: pkg.go.dev

# README

Go-string

Useful string utility functions for Go projects, included either because they are faster than the common Go version or because they do not exist in the standard library.

Full details are available at https://pkg.go.dev/github.com/boyter/go-string

Probably the most useful methods are IndexAll and IndexAllIgnoreCase. For string literal searches they should be drop-in replacements for regexp.FindAllIndex, and because they avoid the regular expression engine entirely they are much faster.

Some quick benchmarks using a simple program which opens a 550MB file and searches over it in memory. Each search is run three times, first using regexp.FindAllIndex and then using IndexAllIgnoreCase.

For this specific example the wall-clock time is more than 10x lower, with identical match results.

$ ./csperf ſecret 550MB
File length 576683100

FindAllIndex (regex ignore case)
Scan took 25.403231773s 16680
Scan took 25.39742299s 16680
Scan took 25.227218738s 16680

IndexAllIgnoreCase (custom)
Scan took 2.04013314s 16680
Scan took 2.019360935s 16680
Scan took 1.996732171s 16680

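The above example in code for you to copy:

// Simple test comparison between various search methods.
// Usage: ./csperf <search term> <file>
package main

import (
	"fmt"
	"os"
	"regexp"
	"time"

	// Import path as given in this README; the package name is str.
	str "github.com/boyter/go-string"
)

func main() {
	if len(os.Args) < 3 {
		fmt.Println("usage: csperf <search term> <file>")
		return
	}
	arg1 := os.Args[1] // search term
	arg2 := os.Args[2] // file to search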

	b, err := os.ReadFile(arg2)
	if err != nil {
		fmt.Println(err)
		return
	}

	fmt.Println("File length", len(b))

	haystack := string(b)

	var start time.Time
	var elapsed time.Duration

	fmt.Println("\nFindAllIndex (regex)")
	r := regexp.MustCompile(regexp.QuoteMeta(arg1))
	for i := 0; i < 3; i++ {
		start = time.Now()
		all := r.FindAllIndex(b, -1)
		elapsed = time.Since(start)
		fmt.Println("Scan took", elapsed, len(all))
	}

	fmt.Println("\nIndexAll (custom)")
	for i := 0; i < 3; i++ {
		start = time.Now()
		all := str.IndexAll(haystack, arg1, -1)
		elapsed = time.Since(start)
		fmt.Println("Scan took", elapsed, len(all))
	}

	r = regexp.MustCompile(`(?i)` + regexp.QuoteMeta(arg1))
	fmt.Println("\nFindAllIndex (regex ignore case)")
	for i := 0; i < 3; i++ {
		start = time.Now()
		all := r.FindAllIndex(b, -1)
		elapsed = time.Since(start)
		fmt.Println("Scan took", elapsed, len(all))
	}

	fmt.Println("\nIndexAllIgnoreCase (custom)")
	for i := 0; i < 3; i++ {
		start = time.Now()
		all := str.IndexAllIgnoreCase(haystack, arg1, -1)
		elapsed = time.Since(start)
		fmt.Println("Scan took", elapsed, len(all))
	}
}

Note that it performs best with real documents and worst when searching over random data. Depending on what you are searching, you may see a similar speed-up or only a marginal one.

IndexAll shows a similar speed-up over FindAllIndex:

// BenchmarkFindAllIndex-8                         2458844	       480.0 ns/op
// BenchmarkIndexAll-8                            14819680	        79.6 ns/op

See the benchmarks, which test various edge cases, for full proof.

The other most useful method is HighlightString. HighlightString takes in some content and locations and then inserts in/out strings which can be used for highlighting around matching terms. For example, you could pass in "test" with a location covering its first two bytes and have it return "<strong>te</strong>st". The locations argument accepts output from regexp.FindAllIndex or from the included IndexAllIgnoreCase or IndexAll.
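
The highlighting idea described above can be sketched with the standard library alone. The function highlight below is a hypothetical, simplified stand-in and not HighlightString itself: it assumes the locations are sorted, non-overlapping [start, end) byte offsets, and it simply skips anything that does not fit that assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// highlight is a hypothetical, simplified helper, NOT the library's
// HighlightString. It assumes locs holds sorted, non-overlapping
// [start, end) byte offsets into content and wraps each one with the
// in and out strings.
func highlight(content string, locs [][]int, in, out string) string {
	var sb strings.Builder
	prev := 0
	for _, loc := range locs {
		// Skip malformed, out-of-range, or overlapping locations.
		if len(loc) != 2 || loc[0] < prev || loc[1] < loc[0] || loc[1] > len(content) {
			continue
		}
		sb.WriteString(content[prev:loc[0]]) // text before the match
		sb.WriteString(in)
		sb.WriteString(content[loc[0]:loc[1]]) // the match itself
		sb.WriteString(out)
		prev = loc[1]
	}
	sb.WriteString(content[prev:]) // remaining text after the last match
	return sb.String()
}

func main() {
	// The location {0, 2} covers the first two bytes of "test".
	fmt.Println(highlight("test", [][]int{{0, 2}}, "<strong>", "</strong>"))
	// prints: <strong>te</strong>st
}
```

Because the locations are plain [][]int byte offsets, the output of regexp.FindAllIndex can be passed to a function of this shape directly.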

All code is dual-licenced as either MIT or Unlicence. Your choice when you use it.

Note that as an Australian I cannot put this into the public domain, hence the choice of the most liberal licences I could find.

# Functions

AllSimpleFold given an input rune returns a rune slice containing all of its possible simple folds.
Contains checks the supplied slice of string for the existence of a string and returns true if found, and false otherwise.
ContainsI reports whether s contains substr, ignoring case.
HasPrefixI reports whether s has the given prefix, ignoring case.
HighlightString takes in some content and locations and then inserts in/out strings which can be used for highlighting around matching terms.
IndexAll extracts all of the locations of a string inside another string up to the defined limit. It does so without regular expressions, which makes it faster than FindAllIndex in most situations and no slower in the rest.
IndexAllIgnoreCase extracts all of the locations of a string inside another string, ignoring case, up to the defined limit.
IsEmpty reports whether s is empty.
IsNotEmpty reports whether s is not empty.
IsSpace checks bytes, which MUST be UTF-8 encoded, for a space. List of spaces detected (same as unicode.IsSpace): '\t', '\n', '\v', '\f', '\r', ' ', U+0085 (NEL), U+00A0 (NBSP).
PermuteCase given a str returns a slice containing all possible case permutations of that str, such that an input of foo will return foo Foo fOo FOo foO FoO fOO FOO. Note that very long inputs can produce an enormous number of results in the returned slice OR result in an overflow and return nothing.
PermuteCaseFolding given a str returns a slice containing all possible case permutations with characters being folded such that S will return S s ſ.
RemoveStringDuplicates is a simple helper method that removes duplicates from any given str slice and then returns a nice duplicate free str slice.
StartOfRune takes a byte and returns true if it is the start of a multibyte character or a single-byte character, otherwise false.

# Variables

CacheSize is public so it can be modified depending on project needs. You can increase this value to cache more of the case permutations, which can improve performance if you run the same searches over and over.