github.com/prataprc/goparsec (module, package)
Version: 0.0.0-20211219142520-daac0e635e7e
Repository: https://github.com/prataprc/goparsec.git
Documentation: pkg.go.dev

# README

Parser combinator library in Golang

A library to construct top-down, recursive, backtracking parsers using parser combinators. Before proceeding, you might want to take a peek at the theory of parser combinators. This package provides:

  • A standard set of combinators.
  • A simple scanner based on regular expressions.
  • A standard set of tokenizers built on the simple scanner.

To construct syntax trees from a detailed grammar, try the AST struct:

  • The standard set of combinators is exported as methods on AST.
  • Generate a dot-graph, e.g. a dotfile for HTML.
  • Pretty-print on the console.
  • Make debugging easier.

NOTE: the AST object is a recent development; expect to adapt your code to newer versions.

Combinators

Every combinator should conform to the following signature:

    // ParsecNode type defines a node in the AST
    type ParsecNode interface{}

    // Parser function parses input text, higher order parsers are
    // constructed using combinators.
    type Parser func(Scanner) (ParsecNode, Scanner)

    // Nodify callback function to construct custom ParsecNode.
    type Nodify func([]ParsecNode) ParsecNode

Combinators take a variable number of parser functions and return a new parser function.
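
To make the idea concrete, here is a minimal, self-contained sketch of how such combinators can be built. It does not use goparsec itself: the names Node, Input, Lit, And and OrdChoice below are simplified, hypothetical stand-ins that only mirror the shape of the signatures above, and a nil node is assumed to mean "no match".

```go
package main

import (
	"fmt"
	"strings"
)

// Node stands in for ParsecNode: any value can be a node.
type Node interface{}

// Input is a hypothetical, simplified stand-in for a Scanner:
// an immutable cursor into the text.
type Input struct {
	text string
	pos  int
}

// Parser consumes input and returns a node plus the remaining input.
// In this sketch a nil node means the parser did not match.
type Parser func(Input) (Node, Input)

// Nodify folds the matched child nodes into a single node.
type Nodify func([]Node) Node

// Lit returns a parser that matches the literal s at the cursor.
func Lit(s string) Parser {
	return func(in Input) (Node, Input) {
		if strings.HasPrefix(in.text[in.pos:], s) {
			return s, Input{in.text, in.pos + len(s)}
		}
		return nil, in
	}
}

// And matches every parser in sequence. If any parser fails, it
// backtracks by returning the original, unconsumed input.
func And(cb Nodify, parsers ...Parser) Parser {
	return func(in Input) (Node, Input) {
		nodes := make([]Node, 0, len(parsers))
		cur := in
		for _, p := range parsers {
			n, next := p(cur)
			if n == nil {
				return nil, in
			}
			nodes = append(nodes, n)
			cur = next
		}
		return cb(nodes), cur
	}
}

// OrdChoice tries each parser in order and returns the first match.
func OrdChoice(parsers ...Parser) Parser {
	return func(in Input) (Node, Input) {
		for _, p := range parsers {
			if n, next := p(in); n != nil {
				return n, next
			}
		}
		return nil, in
	}
}

// join is a Nodify callback that concatenates string nodes.
func join(ns []Node) Node {
	var sb strings.Builder
	for _, n := range ns {
		sb.WriteString(n.(string))
	}
	return sb.String()
}

func main() {
	greeting := And(join, OrdChoice(Lit("hello"), Lit("hi")), Lit(" "), Lit("world"))
	node, rest := greeting(Input{text: "hi world!"})
	fmt.Println(node, rest.pos) // prints: hi world 8
}
```

Because each combinator returns a plain Parser, the result of And or OrdChoice can itself be passed to another combinator, which is how higher-order parsers are assembled.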

Using the builtin scanner

The builtin scanner manages the input buffer and implements a cursor into it. Create a new scanner instance:

    s := parsec.NewScanner(text)

The scanner supplies methods such as Match(pattern), SkipAny(pattern) and Endof(); refer to the package documentation for more information on each of these methods.
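
As an illustration of the buffer-plus-cursor idea, the following sketch implements a tiny scanner on top of Go's regexp package. TextScanner and its method signatures are hypothetical and chosen for this example; they are not the signatures of goparsec's own scanner, which should be looked up in the package documentation.

```go
package main

import (
	"fmt"
	"regexp"
)

// TextScanner is a hypothetical, simplified scanner: an input buffer
// plus a cursor. It is not goparsec's SimpleScanner.
type TextScanner struct {
	buf    []byte
	cursor int
}

// NewTextScanner returns a scanner positioned at the start of text.
func NewTextScanner(text []byte) *TextScanner {
	return &TextScanner{buf: text}
}

// Match anchors pattern at the cursor. On success it returns the
// matched bytes and a new scanner advanced past them; on failure it
// returns nil and the unchanged scanner.
// (A real scanner would cache compiled patterns instead of
// recompiling on every call.)
func (s *TextScanner) Match(pattern string) ([]byte, *TextScanner) {
	re := regexp.MustCompile("^(?:" + pattern + ")")
	loc := re.FindIndex(s.buf[s.cursor:])
	if loc == nil {
		return nil, s
	}
	end := s.cursor + loc[1]
	return s.buf[s.cursor:end], &TextScanner{buf: s.buf, cursor: end}
}

// SkipAny advances past text matching pattern and discards it.
// Returning only the scanner is an assumption made for this sketch.
func (s *TextScanner) SkipAny(pattern string) *TextScanner {
	_, next := s.Match(pattern)
	return next
}

// Endof reports whether the cursor has reached the end of the buffer.
func (s *TextScanner) Endof() bool {
	return s.cursor >= len(s.buf)
}

func main() {
	s := NewTextScanner([]byte("  42 apples"))
	s = s.SkipAny(`\s+`)
	num, s := s.Match(`[0-9]+`)
	s = s.SkipAny(`\s+`)
	word, s := s.Match(`[a-z]+`)
	fmt.Printf("%s %s %v\n", num, word, s.Endof()) // prints: 42 apples true
}
```

Returning a new scanner value from each call, rather than mutating the receiver, keeps earlier cursor positions valid, which is what makes backtracking cheap.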

Panics and Recovery

Panics are to be expected when APIs are misused. Programmers might choose to ignore errors, but not panics. For example:

  • The Kleene and Many combinators take one or two parsers as arguments. Fewer than one or more than two will panic.
  • The ManyUntil combinator takes two or three parsers as arguments. Fewer than two or more than three will panic.
  • Combinators accept a Parser function or a pointer to a Parser function. Anything else will panic.
  • Using an invalid regular expression to match a token will panic.
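
Callers that build grammars from untrusted or generated patterns may prefer to turn such panics into ordinary errors. The sketch below shows one way to do that with recover. The helpers compileToken and safely are hypothetical and not part of goparsec; the panic is triggered with regexp.MustCompile from the standard library, which panics on an invalid pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileToken is a hypothetical helper that compiles a token pattern
// and panics if the pattern is invalid, similar in spirit to the
// misuse cases listed above.
func compileToken(pattern string) *regexp.Regexp {
	return regexp.MustCompile("^(?:" + pattern + ")")
}

// safely runs build and converts any panic into an error, so a
// caller can report the problem instead of crashing.
func safely(build func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("grammar construction failed: %v", r)
		}
	}()
	build()
	return nil
}

func main() {
	// Unterminated character class: MustCompile panics, safely recovers.
	err := safely(func() { compileToken(`[0-9`) })
	fmt.Println(err != nil) // prints: true

	// Valid pattern: no panic, no error.
	err = safely(func() { compileToken(`[0-9]+`) })
	fmt.Println(err == nil) // prints: true
}
```

Wrapping only the grammar-construction step keeps the recover narrow, so genuine programming errors elsewhere still surface as panics.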

Examples

  • expr/expr.go implements a parsec grammar to parse arithmetic expressions.
  • json/json.go implements a parsec grammar to parse JSON documents.

Clone the repository and run the benchmark suite:

    $ cd expr/
    $ go test -test.bench=. -test.benchmem=true
    $ cd json/
    $ go test -test.bench=. -test.benchmem=true

To run the example program,

    # to parse expression
    $ go run tools/parsec/parsec.go -expr "10 + 29"

    # to parse JSON string
    $ go run tools/parsec/parsec.go -json '{ "key1" : [10, "hello", true, null, false] }'

Projects using goparsec

  • Monster, production system in golang.
  • GoLedger, ledger re-write in golang.

If your project uses goparsec, you can raise an issue to have it listed in this section.

How to contribute

  • Pick an issue, or create a new issue. Provide adequate documentation for the issue.
  • Assign the issue or get it assigned.
  • Work on the code; once finished, raise a pull request.
  • Goparsec is written in Go and is expected to follow the global guidelines for writing Go programs.
  • If the changeset is more than a few lines, please generate a report card.
  • As of now, branch master is the development branch.

# Packages

Package json provides a parser to parse JSON strings.

# Functions

And combinator accepts a list of `Parser`, or references to parsers, that must match the input string, at least until the last Parser argument.
Atom is similar to Token; it takes a string to match with the input byte-by-byte.
AtomExact is similar to Atom(), but the string is matched without skipping leading whitespace.
Char returns a parser function to match a single character in the input stream.
End is a parser function to detect the end of scanner output; it returns a boolean as ParsecNode, hence it is incompatible with AST{}.
Float returns a parser function to match a float literal in the input stream.
Hex returns a parser function to match a hexadecimal literal in the input stream.
Ident returns a parser function to match an identifier token in the input stream; an identifier is matched with the following pattern: `^[A-Za-z][0-9a-zA-Z_]*`.
Int returns a parser function to match an integer literal in the input stream.
Kleene combinator accepts two parsers, or references to parsers, namely opScan and sepScan, where the opScan parser is used to match the input string and construct a ParsecNode, and the sepScan parser is used to match the input string and ignore the matched string.
Many combinator accepts two parsers, or references to parsers, namely opScan and sepScan, where the opScan parser is used to match the input string and construct a ParsecNode, and the sepScan parser is used to match the input string and ignore the matched string.
ManyUntil combinator accepts three parsers, or references to parsers, namely opScan, sepScan and untilScan, where the opScan parser is used to match the input string and construct a ParsecNode, and the sepScan parser is used to match the input string and ignore the matched string.
Maybe combinator accepts a single parser, or a reference to a parser, and tries to match the input stream with it.
NewAST returns a new instance of AST; maxnodes is the size of the internal buffer pool of nodes, and it is directly proportional to the number of nodes that you expect in the syntax tree.
NewNonTerminal creates and returns a new NonTerminal instance.
NewScanner creates and returns a new instance of the SimpleScanner object.
NewTerminal creates a new Terminal instance.
NoEnd is a parser function to detect not-an-end of scanner output; it returns a boolean as ParsecNode, hence it is incompatible with AST{}.
Oct returns a parser function to match an octal number literal in the input stream.
OrdChoice combinator accepts a list of `Parser`, or references to parsers, where at least one of the parsers must match the input string.
OrdTokens parses a single token based on one of the specified `patterns`.
String parses a double-quoted string in the input text; this parser returns a string type as ParsecNode, hence it is incompatible with AST combinators.
Token takes a regular-expression pattern and returns a parser that will match the input stream with the supplied pattern.
TokenExact is the same as Token(), but the pattern is matched without skipping leading whitespace.
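
The Kleene and Many summaries above are worded identically, so the difference between them is easy to miss. The sketch below assumes the conventional meaning of the two names: Kleene accepts zero or more matches of opScan, while Many requires at least one. It is a self-contained illustration with hypothetical stand-in types (Node, Input, Pat), not goparsec's implementation, and details such as how a trailing separator is handled are choices made for this example only.

```go
package main

import (
	"fmt"
	"regexp"
)

// Node, Input and Parser are simplified stand-ins for this sketch.
type Node interface{}

type Input struct {
	text string
	pos  int
}

// Parser returns a nil node when it does not match.
type Parser func(Input) (Node, Input)

// Pat returns a parser for a regular expression anchored at the cursor.
// An empty match is treated as no match.
func Pat(pattern string) Parser {
	re := regexp.MustCompile("^(?:" + pattern + ")")
	return func(in Input) (Node, Input) {
		m := re.FindString(in.text[in.pos:])
		if m == "" {
			return nil, in
		}
		return m, Input{in.text, in.pos + len(m)}
	}
}

// repeat collects op matches separated by sep. Separator matches are
// dropped, and a trailing separator is left unconsumed.
func repeat(op, sep Parser, in Input) ([]Node, Input) {
	var nodes []Node
	first, cur := op(in)
	if first == nil {
		return nodes, in
	}
	nodes = append(nodes, first)
	for {
		s, afterSep := sep(cur)
		if s == nil {
			break
		}
		n, next := op(afterSep)
		if n == nil {
			break
		}
		nodes = append(nodes, n)
		cur = next
	}
	return nodes, cur
}

// Kleene matches zero or more occurrences, so it always succeeds.
func Kleene(op, sep Parser) Parser {
	return func(in Input) (Node, Input) {
		nodes, rest := repeat(op, sep, in)
		if nodes == nil {
			nodes = []Node{}
		}
		return nodes, rest
	}
}

// Many matches one or more occurrences and fails on zero.
func Many(op, sep Parser) Parser {
	return func(in Input) (Node, Input) {
		nodes, rest := repeat(op, sep, in)
		if len(nodes) == 0 {
			return nil, in
		}
		return nodes, rest
	}
}

func main() {
	num, comma := Pat(`[0-9]+`), Pat(`,`)

	list, _ := Kleene(num, comma)(Input{text: "1,2,3"})
	fmt.Println(len(list.([]Node))) // prints: 3

	empty, _ := Kleene(num, comma)(Input{text: "abc"})
	fmt.Println(len(empty.([]Node))) // prints: 0

	none, _ := Many(num, comma)(Input{text: "abc"})
	fmt.Println(none == nil) // prints: true
}
```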

# Structs

AST parses and constructs an Abstract Syntax Tree whose nodes conform to the `Queryable` interface, facilitating tree processing algorithms.
NonTerminal is used by AST methods to construct intermediate nodes.
SimpleScanner implements the Scanner interface based on Go's regexp package.
Terminal type can be used to construct a terminal ParsecNode.

# Interfaces

ParsecNode is the type in which parsers return input text as parsed nodes.
Queryable interface to be implemented by all nodes, both terminal and non-terminal nodes constructed using AST object.
Scanner interface defines necessary methods to match the input stream.

# Type aliases

ASTNodify callback function to construct custom Queryable.
MaybeNone is a placeholder type, similar to Terminal type, used by Maybe combinator if parser does not match the input text.
Nodify callback function to construct custom ParsecNode.
Parser function parses input text encapsulated by Scanner, higher order parsers are constructed using combinators.