Package: github.com/go-andiamo/splitter
Version: 1.2.5
Repository: https://github.com/go-andiamo/splitter.git
Documentation: pkg.go.dev

# README

Splitter

Overview

Go package for splitting strings (aware of enclosing braces and quotes)

The problem with the standard Go strings.Split is that it does not take into account that the string being split may contain enclosing braces and/or quotes, inside which the separator should not be treated as a separator.

Take for example a string representing a slice of comma separated strings...

    str := `"aaa","bbb","this, for sanity, should not be split"`

running strings.Split on that...

package main

import "strings"

func main() {
    str := `"aaa","bbb","this, for sanity, should not be split"`
    parts := strings.Split(str, `,`)
    println(len(parts))
}

would yield 5 (the string contains four commas) instead of the desired 3.

However, with splitter, the result would be different...

package main

import "github.com/go-andiamo/splitter"

func main() {
    commaSplitter, _ := splitter.NewSplitter(',', splitter.DoubleQuotes)

    str := `"aaa","bbb","this, for sanity, should not be split"`
    parts, _ := commaSplitter.Split(str)
    println(len(parts))
}

which yields the desired 3!

Note: The variadic arguments after the separator argument are the 'enclosures' (e.g. quotes, brackets, etc.) to be taken into consideration.

While splitting, any enclosures specified are checked for balancing!
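
To make the idea concrete, here is a minimal standard-library sketch of enclosure-aware splitting with a balance check. It is not the package's implementation: splitOutsideQuotes is a hypothetical helper that handles a single quote rune only, without escaping or brackets.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// splitOutsideQuotes is a hypothetical helper (not part of the splitter package).
// It splits s on sep, ignoring any sep found between a pair of quote runes,
// and returns an error if a quote is opened but never closed.
func splitOutsideQuotes(s string, sep, quote rune) ([]string, error) {
	parts := []string{}
	inQuotes := false
	start := 0
	for i, r := range s {
		switch {
		case r == quote:
			inQuotes = !inQuotes
		case r == sep && !inQuotes:
			parts = append(parts, s[start:i])
			start = i + utf8.RuneLen(sep)
		}
	}
	if inQuotes {
		return nil, errors.New("unclosed quote")
	}
	return append(parts, s[start:]), nil
}

func main() {
	parts, err := splitOutsideQuotes(`"aaa","bbb","this, for sanity, should not be split"`, ',', '"')
	fmt.Println(len(parts), err) // 3 <nil>

	_, err = splitOutsideQuotes(`"aaa","bbb`, ',', '"')
	fmt.Println(err) // unclosed quote
}
```

The second call shows the balancing idea: an enclosure that is opened but never closed is reported as an error rather than silently split.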

Installation

To install Splitter, use go get:

go get github.com/go-andiamo/splitter

To update Splitter to the latest version, run:

go get -u github.com/go-andiamo/splitter

Enclosures

Enclosures tell the splitter which start/end sequences the separator is to be ignored within. An enclosure can be one of two types: quotes or brackets.

Quote type enclosures differ from bracket type enclosures only in the way their optional escaping works:

  • Quote enclosures can be:
    • escaped by escape prefix - e.g. a quote enclosure starting with " and ending with " but \" is not seen as ending
    • escaped by doubles - e.g. a quote enclosure starting with ' and ending with ' but any doubles '' are not seen as ending
  • Bracket enclosures can only be:
    • escaped by escape prefix - e.g. a bracket enclosure starting with ( and ending with ) and escape set to \
      • \( is not seen as a start
      • \) is not seen as an end

Note that brackets are ignored inside quotes - but quotes can exist within brackets. And when splitting, separators found within any specified quote or bracket enclosure are not considered.
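
The two quote escaping styles described above can be sketched with the standard library alone. quoteEnd below is a hypothetical helper, not the package's code: given the text that follows an opening quote, it finds the closing quote while honouring either an escape prefix or doubled quotes.

```go
package main

import "fmt"

// quoteEnd is a hypothetical helper (not the splitter package's code).
// rs holds the runes that follow an opening quote; the result is the index
// of the closing quote, or -1 if there is none.
//   - escape != 0: a quote preceded by the escape rune does not close
//   - doubled:     two consecutive quotes do not close
func quoteEnd(rs []rune, quote, escape rune, doubled bool) int {
	for i := 0; i < len(rs); i++ {
		switch {
		case escape != 0 && rs[i] == escape && i+1 < len(rs) && rs[i+1] == quote:
			i++ // skip the escaped quote
		case rs[i] == quote:
			if doubled && i+1 < len(rs) && rs[i+1] == quote {
				i++ // skip the second quote of the doubled pair
				continue
			}
			return i
		}
	}
	return -1
}

func main() {
	// Backslash-prefixed escaping: \" does not end the quoted run.
	fmt.Println(quoteEnd([]rune(`say \"hi\"" tail`), '"', '\\', false)) // 10

	// Doubled escaping: '' does not end the quoted run.
	fmt.Println(quoteEnd([]rune(`don''t' tail`), '\'', 0, true)) // 6
}
```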

The Splitter provides many pre-defined enclosures:

Var Name | Type | Start - End | Escaped end
DoubleQuotes | Quote | " " | none
DoubleQuotesBackSlashEscaped | Quote | " " | \"
DoubleQuotesDoubleEscaped | Quote | " " | ""
SingleQuotes | Quote | ' ' | none
SingleQuotesBackSlashEscaped | Quote | ' ' | \'
SingleQuotesDoubleEscaped | Quote | ' ' | ''
SingleInvertedQuotes | Quote | ` ` | none
SingleInvertedQuotesBackSlashEscaped | Quote | ` ` | \`
SingleInvertedQuotesDoubleEscaped | Quote | ` ` | ``
SinglePointingAngleQuotes | Quote | ‹ › | none
SinglePointingAngleQuotesBackSlashEscaped | Quote | ‹ › | \›
DoublePointingAngleQuotes | Quote | « » | none
LeftRightDoubleDoubleQuotes | Quote | | none
LeftRightDoubleSingleQuotes | Quote | | none
LeftRightDoublePrimeQuotes | Quote | | none
SingleLowHigh9Quotes | Quote | | none
DoubleLowHigh9Quotes | Quote | | none
Parenthesis | Brackets | ( ) | none
CurlyBrackets | Brackets | { } | none
SquareBrackets | Brackets | [ ] | none
LtGtAngleBrackets | Brackets | < > | none
LeftRightPointingAngleBrackets | Brackets | | none
SubscriptParenthesis | Brackets | | none
SuperscriptParenthesis | Brackets | | none
SmallParenthesis | Brackets | | none
SmallCurlyBrackets | Brackets | | none
DoubleParenthesis | Brackets | | none
MathWhiteSquareBrackets | Brackets | | none
MathAngleBrackets | Brackets | | none
MathDoubleAngleBrackets | Brackets | | none
MathWhiteTortoiseShellBrackets | Brackets | | none
MathFlattenedParenthesis | Brackets | | none
OrnateParenthesis | Brackets | ﴿ | none
AngleBrackets | Brackets | | none
DoubleAngleBrackets | Brackets | | none
FullWidthParenthesis | Brackets | | none
FullWidthSquareBrackets | Brackets | | none
FullWidthCurlyBrackets | Brackets | | none
SubstitutionBrackets | Brackets | | none
SubstitutionQuotes | Quote | | none
DottedSubstitutionBrackets | Brackets | | none
DottedSubstitutionQuotes | Quote | | none
TranspositionBrackets | Brackets | | none
TranspositionQuotes | Quote | | none
RaisedOmissionBrackets | Brackets | | none
RaisedOmissionQuotes | Quote | | none
LowParaphraseBrackets | Brackets | | none
LowParaphraseQuotes | Quote | | none
SquareWithQuillBrackets | Brackets | | none
WhiteParenthesis | Brackets | | none
WhiteCurlyBrackets | Brackets | | none
WhiteSquareBrackets | Brackets | | none
WhiteLenticularBrackets | Brackets | | none
WhiteTortoiseShellBrackets | Brackets | | none
FullWidthWhiteParenthesis | Brackets | | none
BlackTortoiseShellBrackets | Brackets | | none
BlackLenticularBrackets | Brackets | | none
PointingCurvedAngleBrackets | Brackets | | none
TortoiseShellBrackets | Brackets | | none
SmallTortoiseShellBrackets | Brackets | | none
ZNotationImageBrackets | Brackets | | none
ZNotationBindingBrackets | Brackets | | none
MediumOrnamentalParenthesis | Brackets | | none
LightOrnamentalTortoiseShellBrackets | Brackets | | none
MediumOrnamentalFlattenedParenthesis | Brackets | | none
MediumOrnamentalPointingAngleBrackets | Brackets | | none
MediumOrnamentalCurlyBrackets | Brackets | | none
HeavyOrnamentalPointingAngleQuotes | Quote | | none
HeavyOrnamentalPointingAngleBrackets | Brackets | | none

Note: To make any of the above enclosures escapable, use the MakeEscapable() or MustMakeEscapable() functions.

Quote enclosures with escaping

Quotes within quotes can be handled by using an enclosure that specifies how the escaping works, for example the following uses \ (backslash) prefixed escaping...

package main

import "github.com/go-andiamo/splitter"

func main() {
    commaSplitter, _ := splitter.NewSplitter(',', splitter.DoubleQuotesBackSlashEscaped)

    str := `"aaa","bbb","this, for sanity, \"should\" not be split"`
    parts, _ := commaSplitter.Split(str)
    println(len(parts))
}

Or with double escaping...

package main

import "github.com/go-andiamo/splitter"

func main() {
    commaSplitter, _ := splitter.NewSplitter(',', splitter.DoubleQuotesDoubleEscaped)

    str := `"aaa","bbb","this, for sanity, """"should,,,,"" not be split"`
    parts, _ := commaSplitter.Split(str)
    println(len(parts))
}

Not separating when separator encountered in quotes or brackets...

package main

import (
    "fmt"
    "github.com/go-andiamo/splitter"
)

func main() {
    encs := []*splitter.Enclosure{
        splitter.Parenthesis, splitter.SquareBrackets, splitter.CurlyBrackets,
        splitter.DoubleQuotesDoubleEscaped, splitter.SingleQuotesDoubleEscaped,
    }
    commaSplitter, _ := splitter.NewSplitter(',', encs...)

    str := `do(not,)split,'don''t,split,this',[,{,(a,"this has "" quotes")}]`
    parts, _ := commaSplitter.Split(str)
    println(len(parts))
    for i, pt := range parts {
        fmt.Printf("\t[%d]%s\n", i, pt)
    }
}

Options

Options define behaviours that are to be carried out on each found part during splitting.

An option, by virtue of its return args from .Apply(), can do one of three things:

  1. return a modified string of what is to be added to the split parts
  2. return a false to indicate that the split part is not to be added to the split result
  3. return an error to indicate that the split part is unacceptable (and cease further splitting - the error is returned from the Split method)

Options can be added directly to the Splitter using the .AddDefaultOptions() method. These options are checked for every call to the splitter's .Split() method.

Options can also be specified when calling the splitter .Split() method - these options are only carried out for this call (and after any options already specified on the splitter)
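
The three possible outcomes, and the order in which options run, can be sketched without the package. Everything below is a simplified illustration under stated assumptions: partOption, trimSpaces, ignoreEmpty, rejectEmpty and applyOptions are hypothetical names, and the real .Apply() method may take more arguments than the single part shown here.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// partOption is a hypothetical, simplified stand-in for an option: it takes a
// found part and returns the (possibly modified) part, whether to keep it,
// and an error if the part is unacceptable.
type partOption func(part string) (string, bool, error)

// trimSpaces modifies the part (outcome 1).
func trimSpaces(part string) (string, bool, error) { return strings.TrimSpace(part), true, nil }

// ignoreEmpty drops empty parts by returning false (outcome 2).
func ignoreEmpty(part string) (string, bool, error) { return part, part != "", nil }

// rejectEmpty fails on empty parts by returning an error (outcome 3).
func rejectEmpty(part string) (string, bool, error) {
	if part == "" {
		return "", false, errors.New("empty part")
	}
	return part, true, nil
}

// applyOptions runs the splitter-level options first, then the per-call
// options, stopping as soon as a part is dropped or rejected.
func applyOptions(part string, defaults, perCall []partOption) (string, bool, error) {
	for _, opts := range [][]partOption{defaults, perCall} {
		for _, opt := range opts {
			var keep bool
			var err error
			if part, keep, err = opt(part); err != nil || !keep {
				return "", false, err
			}
		}
	}
	return part, true, nil
}

func main() {
	defaults := []partOption{trimSpaces}
	for _, raw := range []string{" a ", "   "} {
		part, keep, err := applyOptions(raw, defaults, []partOption{ignoreEmpty})
		fmt.Printf("%q %v %v\n", part, keep, err)
	}
	_, _, err := applyOptions("   ", defaults, []partOption{rejectEmpty})
	fmt.Println(err)
}
```

Because the default options run first, the whitespace-only part has already been trimmed to an empty string by the time the per-call option sees it.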

Option Examples

1. Stripping empty parts

package main

import (
    "fmt"
    "github.com/go-andiamo/splitter"
)

func main() {
    s := splitter.MustCreateSplitter('/').
        AddDefaultOptions(splitter.IgnoreEmpties)

    parts, _ := s.Split(`/a//c/`)
    println(len(parts))
    fmt.Printf("%+v", parts)
}

2. Stripping empty first/last parts

package main

import (
    "fmt"
    "github.com/go-andiamo/splitter"
)

func main() {
    s := splitter.MustCreateSplitter('/').
        AddDefaultOptions(splitter.IgnoreEmptyFirst, splitter.IgnoreEmptyLast)

    parts, _ := s.Split(`/a//c/`)
    println(len(parts))
    fmt.Printf("%+v\n", parts)

    parts, _ = s.Split(`a//c/`)
    println(len(parts))
    fmt.Printf("%+v\n", parts)

    parts, _ = s.Split(`/a//c`)
    println(len(parts))
    fmt.Printf("%+v\n", parts)
}

3. Trimming parts

package main

import (
    "fmt"
    "github.com/go-andiamo/splitter"
)

func main() {
    s := splitter.MustCreateSplitter('/').
        AddDefaultOptions(splitter.TrimSpaces)

    parts, _ := s.Split(`/a/b/c/`)
    println(len(parts))
    fmt.Printf("%+v\n", parts)

    parts, _ = s.Split(`  / a /b / c/    `)
    println(len(parts))
    fmt.Printf("%+v\n", parts)

    parts, _ = s.Split(`/   a   /   b   /   c   /`)
    println(len(parts))
    fmt.Printf("%+v\n", parts)
}

4. Trimming spaces (and removing empties)

package main

import (
    "fmt"
    "github.com/go-andiamo/splitter"
)

func main() {
    s := splitter.MustCreateSplitter('/').
        AddDefaultOptions(splitter.TrimSpaces, splitter.IgnoreEmpties)

    parts, _ := s.Split(`/a/  /c/`)
    println(len(parts))
    fmt.Printf("%+v\n", parts)

    parts, _ = s.Split(`  / a // c/    `)
    println(len(parts))
    fmt.Printf("%+v\n", parts)

    parts, _ = s.Split(`/   a   /      /   c   /`)
    println(len(parts))
    fmt.Printf("%+v\n", parts)
}

5. Error for empties found

package main

import (
    "fmt"
    "github.com/go-andiamo/splitter"
)

func main() {
    s := splitter.MustCreateSplitter('/').
        AddDefaultOptions(splitter.TrimSpaces, splitter.NoEmpties)

    if parts, err := s.Split(`/a/  /c/`); err != nil {
        println(err.Error())
    } else {
        println(len(parts))
        fmt.Printf("%+v\n", parts)
    }

    if parts, err := s.Split(`  / a // c/    `); err != nil {
        println(err.Error())
    } else {
        println(len(parts))
        fmt.Printf("%+v\n", parts)
    }

    if parts, err := s.Split(`/   a   /      /   c   /`); err != nil {
        println(err.Error())
    } else {
        println(len(parts))
        fmt.Printf("%+v\n", parts)
    }

    if parts, err := s.Split(` a / b/c `); err != nil {
        println(err.Error())
    } else {
        println(len(parts))
        fmt.Printf("%+v\n", parts)
    }
}
