# README
XPath Interpreter for YANG
## Introduction
The 'xpath' directory and its subdirectories contain an XPath parser / interpreter, intended primarily for use with YANG.
It is inspired by the HOC (high-order calculator) program in The Unix Programming Environment by Kernighan and Pike, which implements a parser that creates an intermediate machine, albeit in C.
Reference should also be made to the 'expr' YACC-based calculator in the Go language source distribution, which shows how to implement YACC in Go and provided a useful reference point for getting started.
## Overview of functionality
Processing of YANG statements containing XPATH syntax can be split into the following parts:
- grammars (EBNF description of valid syntax)
- lexing (breaking an XPATH statement up into tokens)
- parsing (verifying that an XPATH statement complies with the expected grammar)
- creation of a runnable machine (a series of instructions that implements the XPATH statement)
- running the machine
Lexing and parsing are interlinked as they both stem from the grammar that defines what tokens are expected and what syntax is valid. Separately, parsing and creation of the runnable machine overlap, as the machine is constructed while the XPATH statement is parsed.
Each of these parts is described in the following sections.
## Grammars
Multiple grammars are used for YANG XPATH statements, as each statement type uses a different subset of XPATH. Each one is a representation of the EBNF description from either the XPATH spec or the YANG RFC, converted to a format compatible with YACC. In particular, this requires rewriting repetitive constructs such as '1*' and '+' as recursive productions.
### Note on grammar file format
- Some tokens have been given their own productions (eg '/', '//') so that, when they appear in a group of items matching a production, they can be encoded in the sequence they are presented, as entities separate from the path components before and after them.
- Actions are obviously not specified in the XPATH spec. They are described in detail in the section on creating machines.
- While the action code, being Go, doesn't need trailing ';', adding them makes auto-formatting with Emacs Bison-mode work properly. Similarly, unlike many YACC examples, NO other code is present in any of the '.y' files as you can't (easily) run gofmt on it. Instead, all Go code other than the immediate actions to be taken for each production is in separate .go files.
### when and must - grammars/expr/xpath.y
This is a rendering of the EBNF grammar in the XPATH 1.0 spec, converted to YACC format. It does not implement variables, and adds an extra current() function to the base functions in XPATH. It is used for 'when' and 'must' statements.
The YACC grammar is as close to the document spec as possible, though the order has been changed so all the Path-related productions come after Expr, which is the sensible starting point.
### leafref / path - grammars/leafref/leafref.y
This is the implementation of the 'path-arg' EBNF in section 12 of RFC 6020 that is used by the 'path' statement (a substatement of 'leafref').
It's essentially a very cut-down version of full XPATH that just deals with paths (with or without predicates indicating a key/value pair).
### instance-identifier - grammars/instid/instid.y
(Not implemented at time of writing ...)
This is the implementation of the 'instance-identifier' EBNF in section 12 of RFC 6020 that is used by the 'instance-identifier' statement in YANG.
It is a similar subset of XPATH to leafref/path, but predicates may specify either position or a key/value pair.
## Lexer
Lexing functionality is split into the CommonLex(er) object that performs the default lexing of all tokens required by each of the various supported grammars, and specific Lexers for each individual grammar where functionality diverges from the common base.
### CommonLex
Files:
- lexer.go
- xutils/tokens.go
- grammars/lexertest/lexer_test.go
(The first two of these could probably sensibly be moved to 'grammars', and this directory could possibly be renamed 'lexer'.)
This is a pretty straightforward lexer that implements lexing of all the tokens specified in XPATH 1.0 (and associated docs, eg for QNames). Specifically, it takes care of the disambiguation rules in XPATH 1.0 Section 3.7. Note that token values (where not the direct ASCII value) are defined in tokens.go, and each lexer has mappings to/from its own value for each token (these values are auto-generated by YACC, so it is not possible to have a common set).
It is designed with specialisation in mind, and the CommonLex object and XpathLexer interface allow for this.
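The specialisation hook is ordinary Go interface-plus-embedding. The sketch below is purely illustrative (the names and signatures here are invented and do not match the real CommonLex / XpathLexer code), but it shows the shape of the pattern: a grammar-specific lexer embeds the common lexer and overrides only the methods where its grammar diverges, much as the leafref lexer described later does.

```go
package main

import "fmt"

// Illustrative sketch only: the real XpathLexer interface and CommonLex
// methods have different names and signatures. The point is the pattern:
// a grammar-specific lexer embeds the common one and overrides only the
// methods where its grammar diverges.

type xpathLexer interface {
	LexDot(next byte) (token int, err error)
}

// Token values would really come from the YACC-generated per-grammar mapping.
const (
	tokDOT = iota + 1
	tokDOTDOT
)

// commonLex stands in for CommonLex: default lexing shared by all grammars.
type commonLex struct{}

func (commonLex) LexDot(next byte) (int, error) {
	if next == '.' {
		return tokDOTDOT, nil
	}
	return tokDOT, nil
}

// leafrefLex stands in for the leafref lexer: '..' is allowed, '.' is not.
type leafrefLex struct {
	commonLex // everything else is inherited unchanged
}

func (leafrefLex) LexDot(next byte) (int, error) {
	if next == '.' {
		return tokDOTDOT, nil
	}
	return 0, fmt.Errorf("'.' is not valid in a leafref path")
}

func main() {
	for _, l := range []xpathLexer{commonLex{}, leafrefLex{}} {
		tok, err := l.LexDot('/')
		fmt.Printf("%T: token %d, err %v\n", l, tok, err)
	}
}
```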
lexer_test.go provides generic test support functions. Actual testing is farmed out to the specific grammars for 2 reasons:
- Some token handling is grammar-specific.
- The initial implementation went hand-in-hand with the 'xpath.y' grammar, which covers almost all the functionality anyway, so rather than break up that test, expr_lexer_test.go in the grammars/expr directory covers essentially all the generic testing needed.
### grammars/expr
Files:
- expr_lexer.go
- expr_lexer_test.go
As 'expr' is the original and most complete implementation of an XPATH subset, the tests here cover the generic lexer code as well. The only functionality specific to this lexer is really just the token mapping and the wrapper around machine creation.
### grammars/leafref
Files:
- leafref_lexer.go
- leafref_lexer_test.go
Leafref uses a small subset of the full XPATH grammar, and we need to override a few of the CommonLex functions:
- LexName: only 'current' is allowed as a function name (no others are supported), and names may not be prefixed with 'XML' (in any case combination)
- LexNum: no standalone numbers allowed (only as part of names)
- LexDot: only DOTDOT, not '.'
- LexPunctuation: restrict punctuation symbols allowed
## Parser and Machine Creation
Due to the magic of YACC, there's really very little explicit parsing code. What we add are the 'action' functions called when a production in the grammar is matched.
Files:
- machine.go
- symbol[_test].go
- various parser_tests.go files
- program[_test].go
- inst.go
- path_elem.go
- datum[_test].go
- xutils/path_type.go
Objects:
- inst
- stackable interface [datum / path_elem implement this]
- Machine
- symbol
- ProgramBuilder
### Parser Actions
All the parser actions create functions that can be run later to implement the XPATH statement. All these functions may only take a Context object as their parameter when run, so they are generated as closures that contain the relevant extra data needed. The functions are all stored in a ProgramBuilder object that contains an Inst object per function.
When run, each function may push data to the stack, pop one or more data elements from the stack, or both. Each element on the stack satisfies the 'stackable' interface, being either a datum or path_elem concrete type. The parser actions take one of two forms in the YACC files.
Either they are a direct function call that typically takes the token value, generates a closure encapsulating that value, and then calls CodeFn with the generated function:
```
getProgBldr(leafreflex).CodePathOper(xutils.DOTDOT);
```
... or they cause a function that operates off stack data alone to be added to the machine, in which case we can pass the function directly to CodeFn:
```
getProgBldr(leafreflex).CodeFn(
    getProgBldr(leafreflex).LRefEquals, "lrefEquals");
```
Note the use of the getProgBldr() function. This is used to access the ProgramBuilder object in the lexer in a way that allows the parser to be re-entrant.
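Both forms amount to the same thing: append a function to the program under construction. The following self-contained toy (all names invented; the real ProgramBuilder, Inst and Context types are richer) shows that shape, with closures capturing a token value versus ready-made stack-only functions, both stored as instructions and executed later.

```go
package main

import "fmt"

// Toy illustration only: the real ProgramBuilder / Inst / context types in
// this package are richer and have different signatures.

type ctx struct{ stack []string }

type inst struct {
	name string
	fn   func(*ctx)
}

type progBuilder struct{ prog []inst }

// codeFn mirrors the idea behind CodeFn: store an already-complete function
// that operates purely on the stack.
func (pb *progBuilder) codeFn(fn func(*ctx), name string) {
	pb.prog = append(pb.prog, inst{name: name, fn: fn})
}

// codePathOper mirrors the idea behind CodePathOper: capture the token value
// in a closure, then hand the closure to codeFn.
func (pb *progBuilder) codePathOper(sym string) {
	pb.codeFn(func(c *ctx) { c.stack = append(c.stack, sym) }, "pathOper("+sym+")")
}

func main() {
	pb := &progBuilder{}
	pb.codePathOper("..") // cf. CodePathOper(xutils.DOTDOT)
	pb.codeFn(func(c *ctx) { // cf. CodeFn(..., "lrefEquals")
		fmt.Println("stack so far:", c.stack)
	}, "lrefEquals")

	// 'Running the machine' is then just executing each stored function
	// in turn against a context.
	c := &ctx{}
	for _, i := range pb.prog {
		fmt.Println("running:", i.name)
		i.fn(c)
	}
}
```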
The following sections describe each group of functions:
#### Push data to stack
These are all called directly as they generate closures to push the data element in question onto the stack at the appropriate point in the generated machine's execution:
- CodeLiteral (strings contained in quotes)
- CodeNum (numbers)
- CodePathOper ('/', '.', '..' etc)
- CodeNameTest (names that are part of paths)
#### Basic Operations
These are all simple operations that operate on data already on the stack (a minimal sketch of the pattern follows this list):
- Or / And / Eq / Ne / LT / LE / GE / GT
- add / sub / mul / div / mod
- negate
- union
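A minimal sketch of that pop/pop/push pattern follows. It is illustrative only: the real operations pop and push datum objects and perform XPATH type coercion, which is omitted here.

```go
package main

import "fmt"

// Minimal sketch only: the real operations work on datum objects and handle
// XPATH type conversion; here the stack just holds float64 numbers.

type ctx struct{ stack []float64 }

func (c *ctx) push(v float64) { c.stack = append(c.stack, v) }

func (c *ctx) pop() float64 {
	v := c.stack[len(c.stack)-1]
	c.stack = c.stack[:len(c.stack)-1]
	return v
}

// add pops two operands and pushes their sum; sub, mul, div, mod etc. all
// follow the same pop/pop/push shape.
func add(c *ctx) {
	rhs := c.pop()
	lhs := c.pop()
	c.push(lhs + rhs)
}

func main() {
	c := &ctx{}
	c.push(2) // operands pushed by earlier instructions (eg CodeNum)
	c.push(3)
	add(c)
	fmt.Println(c.stack) // [5]
}
```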
#### Built-in Functions
All the core functions in XPATH 1.0 are defined in symbol.go. This file effectively has 2 parts: creation of the symbol table so the parser can identify valid function names, and the function definitions used when executing the machine. Additionally there is the 'current()' function, which is unique to YANG. A sketch of the symbol-table idea follows the list below.
- built-ins (XPATH core function set)
- current()
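The sketch below is illustrative only: the invented symbol type records just a name, an argument count and an implementation, whereas the real symbol.go entries also carry argument and return types (used for the validation described under type-checking later) and operate on datum objects rather than plain Go values.

```go
package main

import (
	"fmt"
	"strings"
)

// Illustrative only: real symbol.go entries also register argument and
// return types, and their implementations work on datum objects.

type datum interface{}

// All functions share one prototype so the grammar needs only a single
// function-call production: a slice of arguments in, one result out.
type xpathFn func(args []datum) datum

type symbol struct {
	name  string
	nArgs int
	fn    xpathFn
}

var symbolTable = map[string]symbol{
	"contains": {"contains", 2, func(args []datum) datum {
		return strings.Contains(args[0].(string), args[1].(string))
	}},
	// current() is the YANG-specific addition; a real implementation would
	// return a nodeset for the current context node.
	"current": {"current", 0, func(args []datum) datum { return "<current node>" }},
}

func main() {
	sym, ok := symbolTable["contains"]
	if !ok || sym.nArgs != 2 {
		panic("parser would reject this function call")
	}
	fmt.Println(sym.fn([]datum{"configured", "figure"})) // true
}
```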
#### Custom functions
These functions perform more complex data manipulation operations which are required to calculate intermediate or final results. Several make use of various fields in the Context structure to control their behaviour:
- pathOperPushes: each NameTest and PathElem object pushed to the stack increments this counter
- stackedNodesets: see description of EvalLocPath() below
Function | Description |
---|---|
Store | Used to transfer the final result off the stack into the Context structure for later retrieval. Note that it is an error if the stack is not empty following this operation. |
EvalLocPath | This takes a set of stacked path operations and generates a nodeset that is pushed onto the stack. In the absence of a 'stackedNodeset', the starting point is the context node. |
FilterExprEnd | Checks that the top of the stack contains a nodeset, and increments stackedNodesets so that subsequent path operations know to use that nodeset as the starting point for createNodeset() to work on. Note that this function is only used for compound filter expressions, ie those that form part of a longer sub-expression. Simple filter expressions followed by whitespace or operators should not call this function. |
CodePredStart | Creates a new 'sub' program in which to construct the machine to implement the predicate |
CodePredEnd | Takes the sub-program and bundles it up into a closure that is embedded into the parent program (which now becomes the current program under construction again.) |
LRefPredStart | Maps directly to EvalLocPath, but to aid debugging, it is useful to see it called out as a Leafref predicate start operation. |
LRefPredEnd | This function evaluates a leafref predicate. When run, the stack will contain (a) Nodeset: set of nodes indicated by path up to '[' at start of predicate, (b) Key: key name to filter nodeset on, (c) Path: set of path operations that should resolve to a single leaf instance to give the value for the key. The function first converts (c) to the value by calling EvalLocPath(), performing various sanity checks, and extracting the value. Next, it calls FilterNodeset() to apply the key/value filter to it. Finally, the result is pushed to the stack and 'stackedNodesets' is incremented. This means that the next EvalLocPath() call will pick up this nodeset as its starting point, so if the predicate is followed by further path operations, those will start with this nodeset, not the current context node. Note that we will always consume the stackedNodeset before we next process a key/value pair (LRefPredStart will do this) and so we are guaranteed that when we evaluate the value, we will be starting from the current context node. This means we don't have to explicitly stack '.' / 'current()'. |
LRefEquals | Used to reset the pathOperPushes count when we encounter the '=' sign in a LeafRef/Path predicate. This allows the preceding key name to be stacked as a NameTest rather than have special handling to avoid pathOperPushes being incremented. Alternatively you could have special handling for that and avoid this operation, but when debugging it's quite nice to be able to explicitly see the '=' operation called out. |
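To make LRefPredEnd's key/value filtering step more concrete, here is a self-contained toy. The types and values are invented; the real code filters a nodeset of XpathNode values, and obtains the value by evaluating the predicate's path expression rather than being handed it directly.

```go
package main

import "fmt"

// Toy only: real nodesets hold XpathNode values, and the key value comes
// from evaluating the predicate's path expression via EvalLocPath().

type node struct {
	name string
	keys map[string]string // key leaf name -> value
}

// filterNodeset keeps the list entries whose key leaf matches the value that
// the predicate's path expression resolved to.
func filterNodeset(ns []node, key, value string) []node {
	var out []node
	for _, n := range ns {
		if n.keys[key] == value {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	// Nodeset built by the path operations before '[' (cf. LRefPredStart).
	interfaces := []node{
		{name: "interface", keys: map[string]string{"name": "eth0"}},
		{name: "interface", keys: map[string]string{"name": "eth1"}},
	}
	// Key name stacked as a NameTest; value resolved from the path after '='.
	fmt.Println(filterNodeset(interfaces, "name", "eth1"))
}
```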
## Running the Machine
Files:
- datum[_test].go
- machine[_test].go
- parser_test.go
- symbol[_test].go
- xpath_node.go
- xpath[_node]_test.go
Objects:
- Inst
- Context
- Stackable (datum / pathElem)
- Machine
- Result
To run a machine, you need to provide the current 'context' aka location in the config tree. This allows the machine to know at what point in the config tree it is being run, so that relative path operations make sense.
The context provided to the machine.Execute() function must comply with the XPathNode interface which has navigation and value functions. It is stored in an internal context structure which contains a stack (consisting of Stackable objects) for the current invocation of the machine. As each instruction is run, the stack grows and shrinks. It is possible to dump the machine to screen when testing, and also dump the sequence of instructions and current stack at each stage to the configd log when COMMIT_DEBUG is set.
If the machine fails to run, a panic is raised by the context object which is caught by machine.Execute() and converted into an error. This avoids the need for endless cleaning up on error - we can pop straight up to report the error.
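The panic-to-error conversion is the usual Go recover-in-a-deferred-function shape. A simplified sketch follows (names invented; the real Execute and context carry far more state):

```go
package main

import "fmt"

// Simplified sketch of the error-handling shape described above: instructions
// panic via the context on any failure, and Execute recovers and turns the
// panic into an ordinary error for the caller.

type ctx struct{}

func (c *ctx) execError(format string, args ...interface{}) {
	panic(fmt.Errorf(format, args...))
}

func execute(prog []func(*ctx)) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("failed to run machine: %v", r)
		}
	}()
	c := &ctx{}
	for _, inst := range prog {
		inst(c)
	}
	return nil
}

func main() {
	prog := []func(*ctx){
		func(c *ctx) { c.execError("stack underflow") },
	}
	fmt.Println(execute(prog)) // failed to run machine: stack underflow
}
```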
If the machine runs to completion, a Result object is returned. This holds the result as a datum object of the type of the result. There are methods on Result to convert this to any of the 4 basic XPATH types, as required.
### An aside on XPATH Types and type-checking
XPATH has 4 native types: nodesets, boolean, number and literal (aka string). Each core function may take one or more of these types, or may accept any of the 4. To avoid a production for every single function, all the core functions share the same prototype, which takes a slice of datum objects (datum represents stack items and, more generally, data items).
A datum object can represent any of the 4 types (and various other data types).
Type-checking is done via various mechanisms, separately for arguments and return types. When symbols are registered, they register their argument and return types. When running a machine, there is an option to validate it, which we exercise in unit tests only. This means that every time a function is called, we compare the registered argument types with those specified in the function definitions, and we also verify that the return type is as expected. This isn't foolproof, as both sets of specifications could be wrong, but at least the argument types are defined within the function body.
### Another aside, this time on Nodesets
XPATH expects a specific relationship between YANG nodes when performing operations such as Parent / Child that doesn't match our internal representation. Specifically there are 2 main differences:
- Lists: XPATH expects an array of list entries (each containing all key values), with children for each element including key / tag nodes. We model this as an array of list entries with the single key we allow, with children only for non-tagnode elements.
- Leaves/LeafLists: XPATH expects an array of node(s) with values. We model a top level LeafEntry / LeafListEntry, with child(ren) LeafValue nodes.
The XPATH implementation assumes data is presented according to the XPATH model. It is up to the specific implementation of XpathNode to hide any divergence from us!
To test our implementation, xpath_node_test.go implements a representation of the config tree without needing to reference schema etc. Separately, Diff.Node implements the version we need to actually use XPath, and has test code that verifies it behaves as expected.
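A hedged sketch of what that adapter responsibility can look like (method names invented; the real XpathNode interface is different): a list entry has to present its key leaf to XPATH as an ordinary child node, even if the internal tree stores the key specially.

```go
package main

import "fmt"

// Sketch only: the real XpathNode interface has its own navigation and value
// methods. The point is that the adapter, not the XPATH engine, bridges the
// two models, eg by presenting a list entry's key as a normal child node.

type xNode interface {
	Name() string
	Children() []xNode
}

// listEntry models an internal-style list entry that stores its single key
// separately from its other children.
type listEntry struct {
	key, keyValue string
	children      []xNode
}

func (e listEntry) Name() string { return "interface" }

// Children presents the key as an ordinary child node, which is what the
// XPATH model expects to see.
func (e listEntry) Children() []xNode {
	return append([]xNode{leaf{e.key, e.keyValue}}, e.children...)
}

type leaf struct{ name, value string }

func (l leaf) Name() string      { return l.name + "=" + l.value }
func (l leaf) Children() []xNode { return nil }

func main() {
	entry := listEntry{
		key: "name", keyValue: "eth0",
		children: []xNode{leaf{"mtu", "1500"}},
	}
	for _, c := range entry.Children() {
		fmt.Println(c.Name()) // name=eth0, then mtu=1500
	}
}
```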
## Building XPATH code
You need to tell go to explicitly run 'go yacc'. You do this by first embedding the instructions into a file (I chose lexer.go):

```
//go:generate -command yacc go tool yacc
//go:generate yacc -o xpath.go -p "expr" xpath.y
```
... and then calling 'go generate'. Hence the go generate call in 'rules'. I also added the new unit tests to the mix.
## Debugging XPATH code
### Production Code
At a high level, you will get error messages if an XPATH statement cannot be compiled, explaining the problem (whether it is due to as-yet-unsupported functionality or a syntax error), with an '[X]' marking roughly where in the expression the problem lies.
Assuming you have got past that stage, debugging relates either to a machine failing to run, or to an unexpected result.
When a machine fails to run to completion, a panic is generated and the error will be reported. This should be close to impossible, as a statement that compiles would be expected to run to completion. However, if it does fail, exactly how the failure is reported still needs to be clarified (TBC).
If you get an unexpected result, then you need to enable COMMIT_DEBUG and look at the log. You will get a full dump of each instruction and current stack state. This is noisy as right now there is no filtering, so use with caution.
### Code under development (internal debug)
If you need to debug code under development, then a different approach is needed. First, you should only run the single test that is failing, using '-run TestName'. This reduces the noise on the console. Secondly, if your test is passing when it should fail, add a t.Fatalf() call as you will only get fmt.Printf() output if a test fails - otherwise it gets suppressed.
You now need to enable debug on the machine when it is running. For now, this takes a bit of hackery as you need to find the test function (eg CheckNodeSetResult) that is executing the machine and change the 'false' param marked as '/* debug */' to 'true'. You will then get a dump of the machine that you can work through to see where the machine operation diverges from what is expected.