UNNEST

函数说明

UNNEST是一个表函数，出现在 SQL 的 from 子句中，用于将 json[object|array]类型的数据展开为多行,每行包含json中的一个元素.

语法结构

> UNNEST(src[, path[, outer]])

参数	说明	类型
src	必要参数，待展开的数据源	类型可以是 json 列或 json 字符串
path	可选参数，指明待展开数据源的具体 json 路径。默认为"$",展开整个 json 数据	path 字符串
outer	可选参数，如果数据源展开后结果行为 0，是否加上一个空行作为标记。默认为 false	bool 类型

返回表结构

字段名	类型	说明
col	varchar	数据源的名称。如果数据源是列，则是列名；如果数据源是 json 字符串，则是"UNNEST_DEFAULT"
seq	int32	数据源中元素的索引，从 0 开始
key	varchar	展开元素的键名，如果父级元素是数组，则为 null
path	varchar	展开元素的在数据源中的路径
index	int32	展开元素在父级元素中的索引，如果数据源是对象，则为 null
value	varchar	展开元素的值
this	varchar	展开元素的父级元素值

示例

> select *
> from unnest('{"a":1,"b":2,"c":3}') as u;
+----------------+------+------+------+-------+-------+--------------------------+
| col            | seq  | key  | path | index | value | this                     |
+----------------+------+------+------+-------+-------+--------------------------+
| UNNEST_DEFAULT |    0 | a    | $.a  |  NULL | 1     | {"a": 1, "b": 2, "c": 3} |
| UNNEST_DEFAULT |    0 | b    | $.b  |  NULL | 2     | {"a": 1, "b": 2, "c": 3} |
| UNNEST_DEFAULT |    0 | c    | $.c  |  NULL | 3     | {"a": 1, "b": 2, "c": 3} |
+----------------+------+------+------+-------+-------+--------------------------+

> select *
> from unnest('{"a":1,"b":2,"c":3}') as u
> where u.`key` = 'b';
+----------------+------+------+------+-------+-------+--------------------------+
| col            | seq  | key  | path | index | value | this                     |
+----------------+------+------+------+-------+-------+--------------------------+
| UNNEST_DEFAULT |    0 | b    | $.b  |  NULL | 2     | {"a": 1, "b": 2, "c": 3} |
+----------------+------+------+------+-------+-------+--------------------------+

> select *
> from unnest('{"a":1,"b":2,"c":3}',"$.b") as u;
Empty set (0.01 sec)

> select *
> from unnest('{"a":1,"b":2,"c":3}',"$.b",true) as u;
+----------------+------+------+------+-------+-------+--------------------------+
| col            | seq  | key  | path | index | value | this                     |
+----------------+------+------+------+-------+-------+--------------------------+
| UNNEST_DEFAULT |    0 | NULL | $.b  |  NULL | NULL  | 2                        |

> drop table if exists t1;
> create table t1 (a json,b int);
> insert into t1
> values ('{"a":1,"b":[{"c":2,"d":3},false,4],"e":{"f":true,"g":[null,true,1.1]}}',1);
> insert into t1
> values ('[1,true,false,null,"aaa",1.1,{"t":false}]',2);
> select * from unnest(t1.a, "$.b") as u;
+------+------+------+--------+-------+------------------+------------------------------+
| col  | seq  | key  | path   | index | value            | this                         |
+------+------+------+--------+-------+------------------+------------------------------+
| a    |    0 | NULL | $.b[0] |     0 | {"c": 2, "d": 3} | [{"c": 2, "d": 3}, false, 4] |
| a    |    0 | NULL | $.b[1] |     1 | false            | [{"c": 2, "d": 3}, false, 4] |
| a    |    0 | NULL | $.b[2] |     2 | 4                | [{"c": 2, "d": 3}, false, 4] |
+------+------+------+--------+-------+------------------+------------------------------+

> select * from unnest(t1.a, "$.b[0]") as u;
+------+------+------+----------+-------+-------+------------------+
| col  | seq  | key  | path     | index | value | this             |
+------+------+------+----------+-------+-------+------------------+
| a    |    0 | c    | $.b[0].c |  NULL | 2     | {"c": 2, "d": 3} |
| a    |    0 | d    | $.b[0].d |  NULL | 3     | {"c": 2, "d": 3} |
+------+------+------+----------+-------+-------+------------------+

> select distinct(f.seq) from unnest(t1.a, "$") as f;
+-------+
| f.seq |
+-------+
|     0 |
|     1 |
+-------+

注意事项

key,index 和 value 全为 null 则代表当前行是 outer 为 true 时默认添加的空行

执行流程

数据源是 json 列 ...unnest -> project(jsonCol) -> tableScan
数据源是 json 字符串 ...unnest -> project(default) -> valueScan(parse jsonStr)

实现细节

数据源为 json 字符串

构建 plan 时将存储在tree.Unnest中的参数序列化后存储到unnestNode.TableDef.TableFunctionParam中
在unnestNode中添加valueScan节点
给valueScan.TableDef.TableFunctionParam赋值为tree.Unnest中存储的 json 字符串转化的字节切片
在编译阶段首先将valueScan的TableDef.TableFunctionParam存入scope.Datasource.Bat
在 scope 中添加vm.Unnest指令，并通过unnestNode.TableDef.TableFunctionParam构建运行参数
执行阶段通过bytejson包解析 json 字节切片，解析 path 字符串，通过 json,path, outer,filter参数调用bytejson.Unnest函数，返回UnnestResult结果集
通过makeBatch组装UnnestResult结果集为batch

数据源为 json 列

构建 plan 时将存储在tree.Unnest中的参数序列化后存储到unnestNode.TableDef.TableFunctionParam中
在unnestNode中添加tableScan节点，并根据tree.Unnest中的参数初始化tableScan的TableDef
编译阶段在 scope 中添加vm.Unnest指令，并通过unnestNode.TableDef.TableFunctionParam构建运行参数
执行阶段通过bytejson包解析由tableScan传递 bytejson 字节切片，解析 path 字符串，通过 json,path, outer,filter参数调用bytejson.Unnest 函数，返回UnnestResult结果集
通过makeBatch组装UnnestResult结果集为batch

filter参数是根据tree.Unnest中的Attrs字段构建的 string 切片，其目的是为了在bytejson.Unnest函数中过滤不需要的结果集

# README

UNNEST

函数说明

语法结构

相关参数

返回表结构

示例

注意事项

执行流程

实现细节

数据源为 json 字符串

数据源为 json 列

# Functions

# Structs