package
0.2.1
Repository: https://github.com/theory/sqljson.git
Documentation: pkg.go.dev

# README

Go SQL/JSON Path

The path package ports the SQL/JSON Path data type from PostgreSQL to Go. It supports both SQL-standard path expressions and PostgreSQL-specific predicate check expressions.

πŸ’‘ Use the πŸ› Playground links below to run the examples in this document, and to experiment with jsonpath execution. The Go SQL/JSON Path Playground is a single-page stateless JavaScript and Go WebAssembly app that offers permalink generation to share examples, like this one.

The SQL/JSON Path Language

This section was ported from the PostgreSQL docs.

SQL/JSON Path is a query language for JSON values. A path expression applied to a JSON value produces a JSON result.

SQL/JSON path expressions specify item(s) to be retrieved from a JSON value, similarly to XPath expressions used for access to XML content. In Go, path expressions are implemented in the path package and can use any elements described below.

Syntax

The path package implements support for the SQL/JSON path language in Go to efficiently query JSON data. It provides an abstract syntax tree of the parsed SQL/JSON path expression that specifies the items to be retrieved by the path engine from the JSON data for further processing with the SQL/JSON query functions.

The semantics of SQL/JSON path predicates and operators generally follow SQL. At the same time, to provide a natural way of working with JSON data, SQL/JSON path syntax uses some JavaScript conventions:

  • Dot (.) is used for member access.

  • Square brackets ([]) are used for array access.

  • SQL/JSON arrays are 0-relative, like Go slices, but unlike regular SQL arrays, which start from 1.

Numeric literals in SQL/JSON path expressions follow JavaScript rules, which are different from Go, SQL, and JSON in some minor details. For example, SQL/JSON path allows .1 and 1., which are invalid in JSON. Non-decimal integer literals and underscore separators are supported, for example, 1_000_000, 0x1EEE_FFFF, 0o273, 0b100101. In SQL/JSON path (and in JavaScript, but not in SQL or Go), there must not be an underscore separator directly after the radix prefix.

An SQL/JSON path expression is typically written as a Go string literal, so it must be enclosed in back quotes or double quotes --- and with the latter any double quotes within the value must be escaped (see string literals).

Some forms of path expressions require string literals within them. These embedded string literals follow JavaScript/ECMAScript conventions: they must be surrounded by double quotes, and backslash escapes may be used within them to represent otherwise-hard-to-type characters. In particular, the way to write a double quote within a double-quoted string literal is \", and to write a backslash itself, you must write \\. Other special backslash sequences include those recognized in JSON strings: \b, \f, \n, \r, \t, \v for various ASCII control characters, and \uNNNN for a Unicode character identified by its 4-hex-digit code point. The backslash syntax also includes two cases not allowed by JSON: \xNN for a character code written with only two hex digits, and \u{N...} for a character code written with 1 to 6 hex digits.

A path expression consists of a sequence of path elements, which can be any of the following:

For details on using JSON path expressions with SQL/JSON query functions, see Operation.

Path Variables

VariableDescription
$A variable representing the JSON value being queried (the context item).
$varnameA named variable. Its value can be set by the exec.WithVars option of Path processing functions
@A variable representing the result of path evaluation in filter expressions.

Path Accessors

Accessor OperatorDescription
.key, ."$varname"Member accessor that returns an object member with the specified key. If the key name matches some named variable starting with $ or does not meet the JavaScript rules for an identifier, it must be enclosed in double quotes to make it a string literal.
.*Wildcard member accessor that returns the values of all members located at the top level of the current object.
.**Recursive wildcard member accessor that processes all levels of the JSON hierarchy of the current object and returns all the member values, regardless of their nesting level. This is a PostgreSQL extension of the SQL/JSON standard.
.**{level}, .**{start_level to end_level}Like .**, but selects only the specified levels of the JSON hierarchy. Nesting levels are specified as integers. Level zero corresponds to the current object. To access the lowest nesting level, you can use the last keyword. This is a PostgreSQL extension of the SQL/JSON standard.
[subscript, ...]Array element accessor. subscript can be given in two forms: index or start_index to end_index. The first form returns a single array element by its index. The second form returns an array slice by the range of indexes, including the elements that correspond to the provided start_index and end_index.

The specified index can be an integer, as well as an expression returning a single numeric value, which is automatically cast to integer. Index zero corresponds to the first array element. You can also use the last keyword to denote the last array element, which is useful for handling arrays of unknown length.
[*]Wildcard array element accessor that returns all array elements.

Operation

Path query functions pass the provided path expression to the path engine for evaluation. If the expression matches the queried JSON data, the corresponding set of JSON items, is returned as an []any slice. If there is no match, the result will be an empty slice, NULL, false, or an error, depending on the function. Path expressions are written in the SQL/JSON path language and can include arithmetic expressions and functions.

A path expression consists of a sequence of elements allowed by the SQL/JSON path language. The path expression is normally evaluated from left to right, but you can use parentheses to change the order of operations. If the evaluation is successful, a sequence of JSON items is produced, and the evaluation result is returned to the Path query function that completes the specified computation.

To refer to the JSON value being queried (the context item), use the $ variable in the path expression. The first element of a path must always be $. It can be followed by one or more accessor operators, which go down the JSON structure level by level to retrieve sub-items of the context item. Each accessor operator acts on the result(s) of the previous evaluation step, producing zero, one, or more output items from each input item.

For example, suppose you have some JSON data from a GPS tracker that you would like to parse, such as:

var src = []byte(`{
  "track": {
    "segments": [
      {
        "location":   [ 47.763, 13.4034 ],
        "start time": "2018-10-14 10:05:14",
        "HR": 73
      },
      {
        "location":   [ 47.706, 13.2635 ],
        "start time": "2018-10-14 10:39:21",
        "HR": 135
      }
    ]
  }
}`)

The path package expects JSON to be decoded into a Go value, one of string, float64, json.Number, map[string]any, or []any β€” which are the values produced by unmarshaling data into an any value. For the above JSON, unmarshal it like so:

var value any
if err := json.Unmarshal(src, &value); err != nil {
    log.Fatal(err)
}
fmt.Printf("%T\n", value)

The output shows the parsed data type:

map[string]interface {}

Note that examples below encode results as JSON for legibility using a function like this:

func pp(val any) {
    js, err := json.Marshal(val)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(string(js))
}

To retrieve the available track segments, you need to use the .key accessor operator to descend through surrounding JSON objects, for example:

pp(path.MustQuery("$.track.segments", value))

And the output (indented for legibility; playground):

[
  [
    {
      "HR": 73,
      "location": [
        47.763,
        13.4034
      ],
      "start time": "2018-10-14 10:05:14"
    },
    {
      "HR": 135,
      "location": [
        47.706,
        13.2635
      ],
      "start time": "2018-10-14 10:39:21"
    }
  ]
]

To retrieve the contents of an array, you typically use the [*] operator. The following example will return the location coordinates for all the available track segments (playground):

pp(path.MustQuery("$.track.segments[*].location", value))
[[47.763,13.4034],[47.706,13.2635]]

Here we started with the whole JSON input value ($), then the .track accessor selected the JSON object associated with the "track" object key, then the .segments accessor selected the JSON array associated with the "segments" key within that object, then the [*] accessor selected each element of that array (producing a series of items), then the .location accessor selected the JSON array associated with the "location" key within each of those objects. In this example, each of those objects had a "location" key; but if any of them did not, the .location accessor would have simply produced no output for that input item.

To return the coordinates of the first segment only, you can specify the corresponding subscript in the [] accessor operator. Recall that JSON array indexes are 0-relative (playground):

pp(path.MustQuery("$.track.segments[0].location", value))
[[47.763,13.4034]]

The result of each path evaluation step can be processed by one or more of the json path operators and methods listed below. Each method name must be preceded by a dot. For example, you can get the size of an array (playground):

pp(path.MustQuery("$.track.segments.size()", value))
[2]

More examples of using jsonpath operators and methods within path expressions appear below.

A path can also contain filter expressions that work similarly to the WHERE clause in SQL. A filter expression begins with a question mark and provides a condition in parentheses:

? (condition)

Filter expressions must be written just after the path evaluation step to which they should apply. The result of that step is filtered to include only those items that satisfy the provided condition. SQL/JSON defines three-valued logic, so the condition can produce true, false, or unknown. The unknown value plays the same role as SQL NULL and Go nil and can be tested for with the is unknown predicate. Further path evaluation steps use only those items for which the filter expression returned true.

The functions and operators that can be used in filter expressions are listed below. Within a filter expression, the @ variable denotes the value being considered (i.e., one result of the preceding path step). You can write accessor operators after @ to retrieve component items.

For example, suppose you would like to retrieve all heart rate values higher than 130. You can achieve this as follows (playground):

pp(path.MustQuery("$.track.segments[*].HR ? (@ > 130)", value))
[135]

To get the start times of segments with such values, you have to filter out irrelevant segments before selecting the start times, so the filter expression is applied to the previous step, and the path used in the condition is different (playground):

pp(path.MustQuery(
    `$.track.segments[*] ? (@.HR > 130)."start time"`,
    value,
))
["2018-10-14 10:39:21"]

You can use several filter expressions in sequence, if required. The following example selects start times of all segments that contain locations with relevant coordinates and high heart rate values (playground):

pp(path.MustQuery(
    `$.track.segments[*] ? (@.location[1] < 13.4) ? (@.HR > 130)."start time"`,
    value,
))
["2018-10-14 10:39:21"]

Using filter expressions at different nesting levels is also allowed. The following example first filters all segments by location, and then returns high heart rate values for these segments, if available (playground):

pp(path.MustQuery(
    `$.track.segments[*] ? (@.location[1] < 13.4).HR ? (@ > 130)`,
    value,
))
[135]

You can also nest filter expressions within each other. This example returns the size of the track if it contains any segments with high heart rate values, or an empty sequence otherwise (playground):

pp(path.MustQuery(
    `$.track ? (exists(@.segments[*] ? (@.HR > 130))).segments.size()`,
    value,
))
[2]

Deviations From The SQL Standard

PostgreSQL's implementation of the SQL/JSON path language, and therefore also this Go implementation, has the following deviations from the SQL/JSON standard.

Boolean Predicate Check Expressions

As an extension to the SQL standard, a PostgreSQL path expression can be a Boolean predicate, whereas the SQL standard allows predicates only within filters. While SQL-standard path expressions return the relevant element(s) of the queried JSON value, predicate check expressions return the single three-valued result of the predicate: true, false, or unknown. For example, we could write this SQL-standard filter expression (playground):

pp(path.MustQuery("$.track.segments ?(@[*].HR > 130)", value))

The result:

[{"HR":135,"location":[47.706,13.2635],"start time":"2018-10-14 10:39:21"}]

The similar predicate check expression simply returns true, indicating that a match exists (playground):

pp(path.MustQuery("$.track.segments[*].HR > 130", value))
[true]

Note: PostgreSQL predicate check expressions require the @@ operator, while SQL-standard path expressions require the @? operator. Use the PgIndexOperator method to pass the appropriate operator to PostgreSQL.

Regular Expression Interpretation

There are minor differences in the interpretation of regular expression patterns used in like_regex filters, as described below.

Strict And Lax Modes

When you query JSON data, the path expression may not match the actual JSON data structure. An attempt to access a non-existent member of an object or element of an array is defined as a structural error. SQL/JSON path expressions have two modes of handling structural errors:

  • lax (default) β€” the path engine implicitly adapts the queried data to the specified path. Any structural errors that cannot be fixed as described below are suppressed, producing no match.

  • strict β€” if a structural error occurs, an error is raised.

Lax mode facilitates matching of a JSON document and path expression when the JSON data does not conform to the expected schema. If an operand does not match the requirements of a particular operation, it can be automatically wrapped as an SQL/JSON array, or unwrapped by converting its elements into an SQL/JSON sequence before performing the operation. Also, comparison operators and most methods automatically unwrap their operands in lax mode, so you can compare SQL/JSON arrays out-of-the-box. An array of size 1 is considered equal to its sole element. Automatic unwrapping is not performed when:

  • The path expression contains type() or size() methods that return the type and the number of elements in the array, respectively.

  • The queried JSON data contain nested arrays. In this case, only the outermost array is unwrapped, while all the inner arrays remain unchanged. Thus, implicit unwrapping can only go one level down within each path evaluation step.

For example, when querying the GPS data listed above, you can abstract from the fact that it stores an array of segments when using lax mode (playground):

pp(path.MustQuery("lax $.track.segments.location", value))
[[47.763,13.4034],[47.706,13.2635]]

In strict mode, the specified path must exactly match the structure of the queried JSON document, so using this path expression will cause an error (playground):

pp(path.MustQuery("strict $.track.segments.location", value))
panic: exec: jsonpath member accessor can only be applied to an object

To get the same result as in lax mode, you have to explicitly unwrap the segments array (playground):

pp(path.MustQuery("strict $.track.segments[*].location", value))
[[47.763,13.4034],[47.706,13.2635]]

The unwrapping behavior of lax mode can lead to surprising results. For instance, the following query using the .** accessor selects every HR value twice (playground):

pp(path.MustQuery("lax $.**.HR", value))
[73,135,73,135]

This happens because the .** accessor selects both the segments array and each of its elements, while the .HR accessor automatically unwraps arrays when using lax mode. To avoid surprising results, we recommend using the .** accessor only in strict mode. The following query selects each HR value just once (playground):

pp(path.MustQuery("strict $.**.HR", value))
[73,135]

The unwrapping of arrays can also lead to unexpected results. Consider this example, which selects all the location arrays (playground):

pp(path.MustQuery("lax $.track.segments[*].location", value))
[[47.763,13.4034],[47.706,13.2635]]

As expected it returns the full arrays. But applying a filter expression causes the arrays to be unwrapped to evaluate each item, returning only the items that match the expression (playground):

pp(path.MustQuery(
    "lax $.track.segments[*].location ?(@[*] > 15)",
    value,
))
[47.763,47.706]

This despite the fact that the full arrays are selected by the path expression. Use strict mode to restore selecting the arrays (playground):

pp(path.MustQuery(
    "strict $.track.segments[*].location ?(@[*] > 15)",
    value,
))
[[47.763,13.4034],[47.706,13.2635]]

SQL/JSON Path Operators And Methods

The list of operators and methods available in JSON path expressions. Note that while the unary operators and methods can be applied to multiple values resulting from a preceding path step, the binary operators (addition etc.) can only be applied to single values. In lax mode, methods applied to an array will be executed for each value in the array. The exceptions are .type() and .size(), which apply to the array itself.

Note: The examples below use this utility function to marshall JSON arguments:

func val(src string) any {
    var value any
    if err := json.Unmarshal([]byte(src), &value); err != nil {
        log.Fatal(err)
    }
    return value
}

number + number β†’ number

Addition (playground):

pp(path.MustQuery("$[0] + 3", val("2"))) // β†’ [5]

+ number β†’ number

Unary plus (no operation); unlike addition, this can iterate over multiple values (playground):

pp(path.MustQuery("+ $.x", val(`{"x": [2,3,4]}`))) // β†’ [2, 3, 4]

number - number β†’ number

Subtraction (playground):

pp(path.MustQuery("7 - $[0]", val("[2]"))) // β†’ [5]

- number β†’ number

Negation; unlike subtraction, this can iterate over multiple values (playground):

pp(path.MustQuery("- $.x", val(`{"x": [2,3,4]}`))) // β†’ [-2,-3,-4]

number * number β†’ number

Multiplication (playground):

pp(path.MustQuery("2 * $[0]", val("4"))) // β†’ [8]

number / number β†’ number

Division (playground):

pp(path.MustQuery("$[0] / 2", val("[8.5]"))) // β†’ [4.25]

number % number β†’ number

Modulo (remainder) (playground):

pp(path.MustQuery("$[0] % 10", val("[32]"))) // β†’ [2]

value . type() β†’ string

Type of the JSON item (playground):

pp(path.MustQuery("$[*].type()", val(`[1, "2", {}]`))) // β†’ ["number","string","object"]

value . size() β†’ number

Size of the JSON item (number of array elements, or 1 if not an array; playground):

pp(path.MustQuery("$.m.size()", val(`{"m": [11, 15]}`))) // β†’ [2]

value . boolean() β†’ boolean

Boolean value converted from a JSON boolean, number, or string (playground):

pp(path.MustQuery("$[*].boolean()", val(`[1, "yes", false]`))) // β†’ [true,true,false]

value . string() β†’ string

String value converted from a JSON boolean, number, string, or datetime (playground, playground):

pp(path.MustQuery("$[*].string()", val(`[1.23, "xyz", false]`)))    // β†’ ["1.23","xyz","false"]
pp(path.MustQuery("$.timestamp().string()", "2023-08-15 12:34:56")) // β†’ ["2023-08-15T12:34:56"]

value . double() β†’ number

Approximate floating-point number converted from a JSON number or string (playground):

pp(path.MustQuery(" ", val(`{"len": "1.9"}`))) // β†’ [3.8]

number . ceiling() β†’ number

Nearest integer greater than or equal to the given number (playground):

pp(path.MustQuery("$.h.ceiling()", val(`{"h": 1.3}`))) // β†’ [2]

number . floor() β†’ number

Nearest integer less than or equal to the given number (playground):

pp(path.MustQuery("$.h.floor()", val(`{"h": 1.7}`))) // β†’ [1]

number . abs() β†’ number

Absolute value of the given number (playground):

pp(path.MustQuery("$.z.abs()", val(`{"z": -0.3}`))) // β†’ [0.3]

value . bigint() β†’ bigint

Big integer value converted from a JSON number or string (playground):

pp(path.MustQuery("$.len.bigint()", val(`{"len": "9876543219"}`))) // β†’ [9876543219]

value . decimal( [ precision [ , scale ] ] ) β†’ decimal

Rounded decimal value converted from a JSON number or string. Precision and scale must be integer values (playground):

pp(path.MustQuery("$.decimal(6, 2)", val("1234.5678"))) // β†’ [1234.57]

value . integer() β†’ integer

Integer value converted from a JSON number or string (playground):

pp(path.MustQuery("$.len.integer()", val(`{"len": "12345"}`))) // β†’ [12345]

value . number() β†’ numeric

Numeric value converted from a JSON number or string (playground):

pp(path.MustQuery("$.len.number()", val(`{"len": "123.45"}`))) // β†’ [123.45]

string . datetime() β†’ types.DateTime

Date/time value converted from a string (playground):

pp(path.MustQuery(
    `$[*] ? (@.datetime() < "2015-08-02".datetime())`,
    val(`["2015-08-01", "2015-08-12"]`),
)) // β†’ ["2015-8-01"]

string . datetime(template) β†’ types.DateTime

Date/time value converted from a string using the specified to_timestamp template.

NOTE: Currently unimplemented, raises an error (playground):

pp(path.MustQuery(
    `$[*].datetime("HH24:MI")`, val(`["12:30", "18:40"]`),
)) // β†’ panic: exec: .datetime(template) is not yet supported

string . date() β†’ types.Date

Date value converted from a string (playground):

pp(path.MustQuery("$.date()", "2023-08-15")) // β†’ ["2023-08-15"]

string . time() β†’ types.Time

Time without time zone value converted from a string (playground):

pp(path.MustQuery("$.time()", "12:34:56")) // β†’ ["12:34:56"]

string . time(precision) β†’ types.Time

Time without time zone value converted from a string, with fractional seconds adjusted to the given precision (playground):

pp(path.MustQuery("$.time(2)", "12:34:56.789")) // β†’ ["12:34:56.79"]

string . time_tz() β†’ types.TimeTZ

Time with time zone value converted from a string (playground):

pp(path.MustQuery("$.time_tz()", "12:34:56+05:30")) // β†’ ["12:34:56+05:30"]

string . time_tz(precision) β†’ types.TimeTZ

Time with time zone value converted from a string, with fractional seconds adjusted to the given precision (playground):

pp(path.MustQuery("$.time_tz(2)", "12:34:56.789+05:30")) // β†’ ["12:34:56.79+05:30"]

string . timestamp() β†’ types.Timestamp

Timestamp without time zone value converted from a string (playground):

pp(path.MustQuery("$.timestamp()", "2023-08-15 12:34:56")) // β†’ ["2023-08-15T12:34:56"]

string . timestamp(precision) β†’ types.Timestamp

Timestamp without time zone value converted from a string, with fractional seconds adjusted to the given precision (playground):

arg := "2023-08-15 12:34:56.789"
pp(path.MustQuery("$.timestamp(2)", arg)) // β†’ ["2023-08-15T12:34:56.79"]

string . timestamp_tz() β†’ types.TimestampTZ

Timestamp with time zone value converted from a string (playground):

arg := "2023-08-15 12:34:56+05:30"
pp(path.MustQuery("$.timestamp_tz()", arg)) // β†’ ["2023-08-15T12:34:56+05:30"]

string . timestamp_tz(precision) β†’ types.TimestampTZ

Timestamp with time zone value converted from a string, with fractional seconds adjusted to the given precision (playground):

arg := "2023-08-15 12:34:56.789+05:30"
pp(path.MustQuery("$.timestamp_tz(2)", arg)) // β†’ ["2023-08-15T12:34:56.79+05:30"]

object . keyvalue() β†’ []map[string]any

The object's key-value pairs, represented as an array of objects containing three fields: "key", "value", and "id"; "id" is a unique identifier of the object the key-value pair belongs to (playground):

pp(path.MustQuery("$.keyvalue()", val(`{"x": "20", "y": 32}`)))
// β†’ [{"id":0,"key":"x","value":"20"},{"id":0,"key":"y","value":32}]

Filter Expression Elements

The filter expression elements available in JSON path.

value == value β†’ boolean

Equality comparison (this, and the other comparison operators, work on all JSON scalar values; playground, playground):

pp(path.MustQuery("$[*] ? (@ == 1)", val(`[1, "a", 1, 3]`)))   // β†’ [1,1]
pp(path.MustQuery(`$[*] ? (@ == "a")`, val(`[1, "a", 1, 3]`))) // β†’ ["a"]

value != value β†’ boolean

value <> value β†’ boolean

Non-equality comparison (playground, playground):

pp(path.MustQuery("$[*] ? (@ != 1)", val(`[1, 2, 1, 3]`)))      // β†’ [2,3]
pp(path.MustQuery(`$[*] ? (@ <> "b")`, val(`["a", "b", "c"]`))) // β†’ ["a","c"]

value < value β†’ boolean

Less-than comparison (playground):

pp(path.MustQuery("$[*] ? (@ < 2)", val(`[1, 2, 3]`))) // β†’ [1]

value <= value β†’ boolean

Less-than-or-equal-to comparison (playground):

pp(path.MustQuery(`$[*] ? (@ <= "b")`, val(`["a", "b", "c"]`))) // β†’ ["a","b"]

value > value β†’ boolean

Greater-than comparison (playground):

pp(path.MustQuery("$[*] ? (@ > 2)", val(`[1, 2, 3]`))) // β†’ [3]

value >= value β†’ boolean

Greater-than-or-equal-to comparison (playground):

pp(path.MustQuery("$[*] ? (@ >= 2)", val(`[1, 2, 3]`))) // β†’ [2,3]

true β†’ boolean

JSON constant true (playground):

arg := val(`[
  {"name": "John", "parent": false},
  {"name": "Chris", "parent": true}
]`)
pp(path.MustQuery("$[*] ? (@.parent == true)", arg)) // β†’ [{"name":"Chris","parent":true}]

false β†’ boolean

JSON constant false (playground):

arg := val(`[
  {"name": "John", "parent": false},
  {"name": "Chris", "parent": true}
]`)
pp(path.MustQuery("$[*] ? (@.parent == false)", arg)) // β†’ [{"name":"John","parent":false}]

null β†’ value

JSON constant null (note that, unlike in SQL, comparison to null works normally; playground):

arg := val(`[
  {"name": "Mary", "job": null},
  {"name": "Michael", "job": "driver"}
]`)
pp(path.MustQuery("$[*] ? (@.job == null) .name", arg)) // β†’ ["Mary"]

boolean && boolean β†’ boolean

Boolean AND (playground):

pp(path.MustQuery("$[*] ? (@ > 1 && @ < 5)", val(`[1, 3, 7]`))) // β†’ [3]

boolean || boolean β†’ boolean

Boolean OR (playground):

pp(path.MustQuery("$[*] ? (@ < 1 || @ > 5)", val(`[1, 3, 7]`))) // β†’ [7]

! boolean β†’ boolean

Boolean NOT (playground):

pp(path.MustQuery("$[*] ? (!(@ < 5))", val(`[1, 3, 7]`))) // β†’ [7]

boolean is unknown β†’ boolean

Tests whether a Boolean condition is unknown (playground):

pp(path.MustQuery("$[*] ? ((@ > 0) is unknown)", val(`[-1, 2, 7, "foo"]`))) // β†’ ["foo"]

string like_regex string [ flag string ] β†’ boolean

Tests whether the first operand matches the regular expression given by the second operand, optionally with modifications described by a string of flag characters (see SQL/JSON Regular Expressions; playground, playground):

arg := val(`["abc", "abd", "aBdC", "abdacb", "babc"]`)
pp(path.MustQuery(`$[*] ? (@ like_regex "^ab.*c")`, arg))          // β†’ ["abc","abdacb"]
pp(path.MustQuery(`$[*] ? (@ like_regex "^ab.*c" flag "i")`, arg)) // β†’ ["abc","aBdC","abdacb"]

string starts with string β†’ boolean

Tests whether the second operand is an initial substring of the first operand (playground):

arg := val(`["John Smith", "Mary Stone", "Bob Johnson"]`)
pp(path.MustQuery(`$[*] ? (@ starts with "John")`, arg)) // β†’ ["John Smith"]

exists ( path_expression ) β†’ boolean

Tests whether a path expression matches at least one SQL/JSON item. Returns unknown if the path expression would result in an error; the second example uses this to avoid a no-such-key error in strict mode (playground, playground):

arg := val(`{"x": [1, 2], "y": [2, 4]}`)
pp(path.MustQuery("strict $.* ? (exists (@ ? (@[*] > 2)))", arg))              // β†’ [[2,4]]
pp(path.MustQuery("strict $ ? (exists (@.name)) .name", val(`{"value": 42}`))) // β†’ []

SQL/JSON Regular Expressions

SQL/JSON path expressions allow matching text to a regular expression with the like_regex filter. For example, the following SQL/JSON path query would case-insensitively match all strings in an array that start with an English vowel:

$[*] ? (@ like_regex "^[aeiou]" flag "i")

The optional flag string may include one or more of the characters i for case-insensitive match, m to allow ^ and $ to match at newlines, s to allow . to match a newline, and q to quote the whole pattern (reducing the behavior to a simple substring match).

The SQL/JSON standard borrows its definition for regular expressions from the LIKE_REGEX operator, which in turn uses the XQuery standard. The path package follows the example of PostgreSQL, using the regexp package to implement like_regex. This leads to various minor discrepancies from standard SQL/JSON behavior, which are cataloged in Differences From SQL Standard And XQuery. Note, however, that the flag-letter incompatibilities described there do not apply to SQL/JSON, as it translates the XQuery flag letters to match what the regexp package expects.

There are also variations between PostgreSQL regular expression syntax and go regular expression syntax, cataloged below.

Keep in mind that the pattern argument of like_regex is a JSON path string literal, written according to the rules given above. This means in particular that any backslashes in the regular expression must be doubled in double-quoted strings. For example, to match string values of the root document that contain only digits:

p := path.MustParse("$.* ?(@ like_regex \"^\\\\d+$\")")
pp(p.MustQuery(context.Background(), val(`{"x": "42", "y": "no"}`))) // β†’ ["42"]

This doubling upon doubling is required to escape backslashes once for go parsing and a second time for JSON path string parsing.

We therefore recommend using raw string literals (backtick strings) to compose path expressions with double quotes or backslashes, both of which are common in like_regex expressions. Raw strings require double backslashes in regular expressions only once, for the path string parsing (playground):

p := path.MustParse(`$.* ?(@ like_regex "^\\d+$")`)
pp(p.MustQuery(context.Background(), val(`{"x": "42", "y": "no"}`))) // β†’ ["42"]

Compatibility

As a direct port from the Postgres source, the path package strives to maintain the highest level of compatibility. Still, there remain some unavoidable differences and to-dos. These include:

  • Numbers. The Postgres JSONB type implements numbers as arbitrary precision numbers. This contrasts with Go JSON parsing, which by default parses numbers into float64 values. Decimal numbers outside the range of float64 are not supported and will trigger an error. For numbers within float64 range, warnings about the precision of floating point math apply.

    For json.Numbers, however, the path package first attempts to treat them as int64 values and falls back on float64 only if all the values in an expression cannot be parsed as integers. This increases precision for integer-only expressions. We therefore recommend parsing JSON with json.Decoder.UseNumber.

    This incompatibility may be addressed in the future, perhaps by using decimal for all numeric operations.

  • datetime(template). The datetime() method has been implemented, but datetime(template) has not. Use of the template parameter will raise an error. This issue will likely be addressed in a future release.

  • Date and time parsing. The path package relies uses the time packages's layouts to parse values in the datetime methods (datetime(), timestamp(), timestamp_tz(), etc.). These layouts are stricter about the formats they'll parse than Postgres date/time formatting.

    As a result, some values parsed by the Postgres datetime methods will not be parsed by this package. Examples include values with extra spaces between the time and time zone, and missing leading zeros on the day and month.

    This issue will likely be addressed when the datetime(template) method is implemented, as it will require adopting the full Postgres date/time formatting language.

  • Time zones. Postgres operates on time and time values in the context of the time zone defined by the TimeZone GUC or the server's system time zone. The path package does not rely on such global configuration. It instead uses the time zone configured in the context passed by the path queries (playground), and defaults to UTC if it's not set or included in the value (playground):

    p := path.MustParse("$.timestamp_tz()")
    arg := "2023-08-15 12:34:56"
    pp(p.MustQuery(context.Background(), arg, exec.WithTZ())) // β†’ ["2023-08-15T12:34:56+00:00"]
    
    // Add a time zone to the context.
    tz, err := time.LoadLocation("America/New_York")
    if err != nil {
    	log.Fatal(err)
    }
    ctx := types.ContextWithTZ(context.Background(), tz)
    
    // The output will now be in the custom time zone.
    pp(p.MustQuery(ctx, arg, exec.WithTZ())) // β†’ ["2023-08-15T12:34:56-04:00"]
    
  • Regular expressions. Whereas the Postgres implementation of the like_regex expression relies on its POSIX regular expression engine, the Go version relies on the regexp package. We have attempted to configure things for full compatibility with the Postgres implementation (including the same diversions from XQuery regular expressions), but some variation is likely.

    Notably, a number of escapes and character classes vary:

    EscapePostgresSQLGo
    \aalert (bell) characteralert (bell) character
    \Aat beginning of textat beginning of text
    \bbackspaceat ASCII word boundary
    \Bsynonym for backslash (\)not at ASCII word boundary
    \cXlow-order 5 bits comparisonN/A
    \ddigitdigit
    \Dnon-digitnon-digit
    \eESC or octal 033N/A
    \fform feedform feed
    \mbeginning of a wordN/A
    \Mend of a wordN/A
    \nnewlinenewline
    \Q...\EN/Aliteral ...
    \rcarriage returncarriage return
    \swhitespace characterwhitespace character
    \Snon-whitespace characternon-whitespace character
    \thorizontal tabhorizontal tab
    \uwxyzcharacter with hex value 0xwxyzN/A (see \x{})
    \Ustuvwxyzcharacter with hex value 0xstuvwxyzN/A (see \x{})
    \vvertical tabvertical tab
    \wword characterword character
    \Wnon-word characternon-word character
    \xhhhcharacter with hex value 0xhhhcharacter with hex value 0xhhh
    \xycharacter with octal value 0xyN/A
    \x{10FFFF}N/A (see \U)hex character code
    \ybeginning or end of a wordN/A (see \b)
    \Ynot the beginning or end of a wordN/A (see \B)
    \zN/A (see \Z)end of text
    \Zend of textN/A (see \z)
    \0the null byteN/A
    \*literal punctuation character *literal * punctuation character *
  • Identifiers. Postgres jsonpath parsing is quite liberal in what it allows in unquoted identifiers. The allowed characters are defined by the ECMAScript standard are stricter, and this package hews closer to the standard.

    The upshot is that expressions allowed by Postgres, such as x.πŸŽ‰, are better written as x."πŸŽ‰" for compatibility with the standard and to work with both this package and Postgres.

  • keyvalue() IDs. Postgres creates IDs for the output of the keyvalue() method by comparing memory addresses between JSONB values. This works well for JSONB because it has a highly-structured, well-ordered layout. The path package follows this pattern.

    However, The addresses of nested map[string]any and []any values in Go are less stable. Ids will therefore sometimes vary between executions β€” especially for slices. However, the IDs determined for a single object or array should be stable through repeated query executions and calls to keyvalue().

Copyright

Copyright Β© 1996-2024 The PostgreSQL Global Development Group

Copyright Β© 2024 David E. Wheeler

# Packages

Package ast provides an abstract syntax tree for SQL/JSON paths.
Package exec provides the routines for SQL/JSON path execution.
Package parser parses SQL/JSON paths.
No description provided by the author
Package types provides PostgresSQL-compatible data types for SQL/JSON Path execution.

# Functions

MustParse is like Parse but panics on parse failure.
MustQuery is syntax sugar for MustParse(path).MustQuery(context.Background(), json).
New creates and returns a new Path query defined by ast.
Parse parses path and returns the resulting Path.

# Variables

ErrPath wraps parsing and execution errors.
ErrScan wraps scanning errors.

# Structs

Path provides SQL/JSON Path operations.