Module: github.com/hachi8833/regexp2
Version: 1.0.1
Repository: https://github.com/hachi8833/regexp2.git
Documentation: pkg.go.dev

# README

regexp2 - full featured regular expressions for Go

Regexp2 is a feature-rich RegExp engine for Go. Unlike the built-in regexp package, it does not guarantee run time linear in the size of the input, but it allows backtracking and is compatible with Perl5 and .NET. You'll likely be better off with the RE2 engine from the regexp package and should only use this if you need to write very complex patterns or require compatibility with .NET.

Basis of the engine

The engine is ported from the .NET framework's System.Text.RegularExpressions.Regex engine. That engine was open sourced in 2015 under the MIT license. There are some fundamental differences between .NET strings and Go strings that required a bit of borrowing from the Go standard library's regexp engine as well. I cleaned up a couple of the dirtier bits during the port (regexcharclass.cs was terrible), but the parse tree, the code emitted, and therefore the patterns matched should be identical.

Installing

This is a go-gettable library, so install is easy:

go get github.com/dlclark/regexp2/...

Usage

Usage is similar to the Go regexp package. Just like in regexp, you start by converting a regex into a state machine via the Compile or MustCompile functions. They ultimately do the same thing, but Compile returns an error if the regex is invalid, while MustCompile panics. You can then use the returned Regexp struct to find matches repeatedly. A Regexp struct is safe to use across goroutines.

re := regexp2.MustCompile(`Your pattern`, 0)
if isMatch, _ := re.MatchString(`Something to match`); isMatch {
    //do something
}

The only error that the *Match* methods should return is a Timeout if you set the re.MatchTimeout field. Any other error is a bug in the regexp2 package. If you need more details about capture groups in a match then use the FindStringMatch method, like so:

if m, _ := re.FindStringMatch(`Something to match`); m != nil {
    // the whole match is always group 0
    fmt.Printf("Group 0: %v\n", m.String())

    // you can get all the groups too
    gps := m.Groups()

    // a group can be captured multiple times, so each cap is separately addressable
    // (the format string needs a verb for the capture text to be printed)
    fmt.Printf("Group 1, first capture: %v\n", gps[1].Captures[0].String())
    fmt.Printf("Group 1, second capture: %v\n", gps[1].Captures[1].String())
}

Group 0 is an automatically assigned group that encompasses the whole pattern, and it is embedded in the Match. This means that m.String() is the same as m.Group.String() and m.Groups()[0].String().

The last capture is embedded in each group, so g.String() will return the same thing as g.Capture.String() and g.Captures[len(g.Captures)-1].String().
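
The embedding described above can be modelled with ordinary Go struct embedding. The sketch below is only an illustration: the Capture, Group, and Match types here are simplified stand-ins (with a hypothetical `others` field and `newGroup`/`exampleMatch` helpers), not the definitions used by regexp2. It shows why the promoted String method gives the same result through each access path.

```go
package main

import "fmt"

// Capture, Group and Match are simplified stand-ins used only for this
// illustration; they are not the regexp2 definitions.
type Capture struct{ Text string }

func (c Capture) String() string { return c.Text }

// Group embeds its last Capture, so g.String() is g.Capture.String().
type Group struct {
	Capture
	Captures []Capture
}

// Match embeds group 0, so m.String() is m.Group.String().
type Match struct {
	Group
	others []Group // groups 1..n (hypothetical field)
}

// Groups returns group 0 followed by the remaining groups.
func (m Match) Groups() []Group {
	return append([]Group{m.Group}, m.others...)
}

// newGroup builds a Group from one or more captured strings and embeds
// the last of them.
func newGroup(first string, rest ...string) Group {
	g := Group{Captures: []Capture{{Text: first}}}
	for _, s := range rest {
		g.Captures = append(g.Captures, Capture{Text: s})
	}
	g.Capture = g.Captures[len(g.Captures)-1]
	return g
}

// exampleMatch imitates what a pattern such as (a|b)+ could produce for
// the input "ab": group 0 is "ab", group 1 was captured twice ("a", "b").
func exampleMatch() Match {
	return Match{Group: newGroup("ab"), others: []Group{newGroup("a", "b")}}
}

func main() {
	m := exampleMatch()
	// All three expressions reach the same embedded group 0.
	fmt.Println(m.String(), m.Group.String(), m.Groups()[0].String())

	g := m.Groups()[1]
	// All three expressions reach the last capture of group 1.
	fmt.Println(g.String(), g.Capture.String(), g.Captures[len(g.Captures)-1].String())
}
```

Because the last capture is the embedded one, g.String() on a group that was captured several times returns only the final capture; the earlier ones are reachable through g.Captures.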

Compare regexp and regexp2

| Category | regexp | regexp2 |
| --- | --- | --- |
| Catastrophic backtracking possible | no, linear-time execution guarantees | yes, if your pattern is at risk you can use the re.MatchTimeout field |
| Python-style capture groups (?P<name>re) | yes | no |
| .NET-style capture groups (?<name>re) or (?'name're) | no | yes |
| comments (?#comment) | no | yes |
| branch numbering reset (?\|a\|b) | no | no |
| possessive match (?>re) | no | yes |
| positive lookahead (?=re) | no | yes |
| negative lookahead (?!re) | no | yes |
| positive lookbehind (?<=re) | no | yes |
| negative lookbehind (?<!re) | no | yes |
| back reference \1 | no | yes |
| named back reference \k'name' | no | yes |
| named ascii character class [[:foo:]] | yes | no |
| conditionals (?(expr)yes\|no) | no | yes |
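
The regexp column can be checked with nothing but the standard library, since regexp.Compile returns an error for syntax it does not support. The following is a minimal sketch; the helper name stdlibAccepts and the sample patterns are chosen here for illustration and are not part of either package.

```go
package main

import (
	"fmt"
	"regexp"
)

// stdlibAccepts reports whether the standard library regexp package can
// compile the pattern. (Hypothetical helper for this example.)
func stdlibAccepts(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	samples := []struct {
		feature string
		pattern string
	}{
		{"Python-style capture group", `(?P<year>\d{4})`},
		{"named ascii character class", `[[:alpha:]]+`},
		{"comment", `(?#note)abc`},
		{"possessive match", `(?>a+)b`},
		{"positive lookahead", `foo(?=bar)`},
		{"negative lookbehind", `(?<!foo)bar`},
		{"back reference", `(a)\1`},
	}
	for _, s := range samples {
		fmt.Printf("%-30s %-18s accepted by regexp: %v\n",
			s.feature, s.pattern, stdlibAccepts(s.pattern))
	}
}
```

Only the first two samples compile with the standard library; the rest are the kind of pattern that calls for a backtracking engine such as regexp2.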

Library features that I'm still working on

  • Regex split

Potential bugs

I've run a battery of tests against regexp2 from various sources and found the debug output matches the .NET engine, but .NET and Go handle strings very differently. I've attempted to handle these differences, but most of my testing deals with basic ASCII with a little bit of multi-byte Unicode. There's a chance that there are bugs in the string handling related to character sets with supplementary Unicode chars. Right-to-Left support is coded, but not well tested either.
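
To make the string difference concrete: .NET strings are sequences of UTF-16 code units, while Go strings are byte sequences that conventionally hold UTF-8. A supplementary character therefore has a different length and index in each representation. The sketch below uses only the standard library; the lengths helper is a name invented for this example.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// lengths measures s three ways: UTF-8 bytes (what Go's len reports),
// Unicode code points (runes), and UTF-16 code units (what a .NET
// string length reports). (Hypothetical helper for this example.)
func lengths(s string) (nBytes, nRunes, nUTF16 int) {
	return len(s), utf8.RuneCountInString(s), len(utf16.Encode([]rune(s)))
}

func main() {
	// "a" followed by U+1F600, a character outside the Basic Multilingual Plane.
	s := "a\U0001F600"
	b, r, u := lengths(s)
	fmt.Printf("bytes=%d runes=%d utf16=%d\n", b, r, u) // bytes=5 runes=2 utf16=3
}
```

Any offset or length carried over from .NET logic has to be interpreted against one of these three units, which is where supplementary characters can expose bugs.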

Find a bug?

I'm open to new issues and pull requests with tests if you find something odd!

# Functions

Compile parses a regular expression and returns, if successful, a Regexp object that can be used to match against text.
Escape adds backslashes to any special characters in the input string.
MustCompile is like Compile but panics if the expression cannot be parsed.
Unescape removes any backslashes from previously-escaped special characters in the input string.

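# Constants

Compiled: "c".
Debug: "d".
ECMAScript: "e".
ExplicitCapture: "n".
IgnoreCase: "i".
IgnorePatternWhitespace: "x".
Multiline: "m".
None: no description provided by the author.
RightToLeft: "r".
Singleline: "s".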

# Variables

DefaultMatchTimeout is the default timeout used when running regexp matches -- "forever".

# Structs

Capture is a single capture of text within the larger original string.
Group is an explicit or implicit (group 0) matched group within the pattern.
Match is a single regex result match that contains groups and repeated captures: a Match holds Groups, and each Group holds Captures.
Regexp is the representation of a compiled regular expression.

# Type aliases

RegexOptions impact the runtime and parsing behavior for each specific regex.