package
1.2.3
Repository: https://github.com/matrixorigin/matrixone.git
Documentation: pkg.go.dev

# README

UNNEST

函数说明

UNNEST是一个表函数,出现在 SQL 的 from 子句中,用于将 json[object|array]类型的数据展开为多行,每行包含json中的一个元素.

语法结构

> UNNEST(src[, path[, outer]])

相关参数

参数说明类型
src必要参数,待展开的数据源类型可以是 json 列或 json 字符串
path可选参数,指明待展开数据源的具体 json 路径。默认为"$",展开整个 json 数据path 字符串
outer可选参数,如果数据源展开后结果行为 0,是否加上一个空行作为标记。默认为 falsebool 类型

返回表结构

字段名类型说明
colvarchar数据源的名称。如果数据源是列,则是列名;如果数据源是 json 字符串,则是"UNNEST_DEFAULT"
seqint32数据源中元素的索引,从 0 开始
keyvarchar展开元素的键名,如果父级元素是数组,则为 null
pathvarchar展开元素的在数据源中的路径
indexint32展开元素在父级元素中的索引,如果数据源是对象,则为 null
valuevarchar展开元素的值
thisvarchar展开元素的父级元素值

示例

> select *
> from unnest('{"a":1,"b":2,"c":3}') as u;
+----------------+------+------+------+-------+-------+--------------------------+
| col            | seq  | key  | path | index | value | this                     |
+----------------+------+------+------+-------+-------+--------------------------+
| UNNEST_DEFAULT |    0 | a    | $.a  |  NULL | 1     | {"a": 1, "b": 2, "c": 3} |
| UNNEST_DEFAULT |    0 | b    | $.b  |  NULL | 2     | {"a": 1, "b": 2, "c": 3} |
| UNNEST_DEFAULT |    0 | c    | $.c  |  NULL | 3     | {"a": 1, "b": 2, "c": 3} |
+----------------+------+------+------+-------+-------+--------------------------+

> select *
> from unnest('{"a":1,"b":2,"c":3}') as u
> where u.`key` = 'b';
+----------------+------+------+------+-------+-------+--------------------------+
| col            | seq  | key  | path | index | value | this                     |
+----------------+------+------+------+-------+-------+--------------------------+
| UNNEST_DEFAULT |    0 | b    | $.b  |  NULL | 2     | {"a": 1, "b": 2, "c": 3} |
+----------------+------+------+------+-------+-------+--------------------------+

> select *
> from unnest('{"a":1,"b":2,"c":3}',"$.b") as u;
Empty set (0.01 sec)

> select *
> from unnest('{"a":1,"b":2,"c":3}',"$.b",true) as u;
+----------------+------+------+------+-------+-------+--------------------------+
| col            | seq  | key  | path | index | value | this                     |
+----------------+------+------+------+-------+-------+--------------------------+
| UNNEST_DEFAULT |    0 | NULL | $.b  |  NULL | NULL  | 2                        |

> drop table if exists t1;
> create table t1 (a json,b int);
> insert into t1
> values ('{"a":1,"b":[{"c":2,"d":3},false,4],"e":{"f":true,"g":[null,true,1.1]}}',1);
> insert into t1
> values ('[1,true,false,null,"aaa",1.1,{"t":false}]',2);
> select * from unnest(t1.a, "$.b") as u;
+------+------+------+--------+-------+------------------+------------------------------+
| col  | seq  | key  | path   | index | value            | this                         |
+------+------+------+--------+-------+------------------+------------------------------+
| a    |    0 | NULL | $.b[0] |     0 | {"c": 2, "d": 3} | [{"c": 2, "d": 3}, false, 4] |
| a    |    0 | NULL | $.b[1] |     1 | false            | [{"c": 2, "d": 3}, false, 4] |
| a    |    0 | NULL | $.b[2] |     2 | 4                | [{"c": 2, "d": 3}, false, 4] |
+------+------+------+--------+-------+------------------+------------------------------+

> select * from unnest(t1.a, "$.b[0]") as u;
+------+------+------+----------+-------+-------+------------------+
| col  | seq  | key  | path     | index | value | this             |
+------+------+------+----------+-------+-------+------------------+
| a    |    0 | c    | $.b[0].c |  NULL | 2     | {"c": 2, "d": 3} |
| a    |    0 | d    | $.b[0].d |  NULL | 3     | {"c": 2, "d": 3} |
+------+------+------+----------+-------+-------+------------------+

> select distinct(f.seq) from unnest(t1.a, "$") as f;
+-------+
| f.seq |
+-------+
|     0 |
|     1 |
+-------+

注意事项

  • key,index 和 value 全为 null 则代表当前行是 outer 为 true 时默认添加的空行

执行流程

  1. 数据源是 json 列 ...unnest -> project(jsonCol) -> tableScan
  2. 数据源是 json 字符串 ...unnest -> project(default) -> valueScan(parse jsonStr)

实现细节

数据源为 json 字符串

  1. 构建 plan 时将存储在tree.Unnest中的参数序列化后存储到unnestNode.TableDef.TableFunctionParam
  2. unnestNode中添加valueScan节点
  3. valueScan.TableDef.TableFunctionParam赋值为tree.Unnest中存储的 json 字符串转化的字节切片
  4. 在编译阶段首先将valueScanTableDef.TableFunctionParam存入scope.Datasource.Bat
  5. 在 scope 中添加vm.Unnest指令,并通过unnestNode.TableDef.TableFunctionParam构建运行参数
  6. 执行阶段通过bytejson包解析 json 字节切片,解析 path 字符串,通过 json,path, outer,filter参数调用bytejson.Unnest函数,返回UnnestResult结果集
  7. 通过makeBatch组装UnnestResult结果集为batch

数据源为 json 列

  1. 构建 plan 时将存储在tree.Unnest中的参数序列化后存储到unnestNode.TableDef.TableFunctionParam
  2. unnestNode中添加tableScan节点,并根据tree.Unnest中的参数初始化tableScanTableDef
  3. 编译阶段在 scope 中添加vm.Unnest指令,并通过unnestNode.TableDef.TableFunctionParam构建运行参数
  4. 执行阶段通过bytejson包解析由tableScan传递 bytejson 字节切片,解析 path 字符串,通过 json,path, outer,filter参数调用bytejson.Unnest 函数,返回UnnestResult结果集
  5. 通过makeBatch组装UnnestResult结果集为batch

filter参数是根据tree.Unnest中的Attrs字段构建的 string 切片,其目的是为了在bytejson.Unnest函数中过滤不需要的结果集

# Functions

No description provided by the author

# Structs

No description provided by the author