SLParser

An LALR parser generator (work in progress). Done just for fun; not meant to be official or anything.

Table of Contents
Introduction

Introduction

To use SLParser, the desired grammar must be specified in two separate files, each with its own grammar and format: one for the lexer and the other for the parser.

EbnfParser

A parser that parses a modified version of the EBNF grammar.

Table of Contents
Introduction

Introduction

Language Specification

Lexical Elements

Here's a description of the various lexical elements that appear in the grammar.

Identifiers

An identifier is defined as a sequence of one or more letters of the English alphabet (i.e., from 'a' to 'z') that, optionally, can end with a sequence of one or more decimal digits (i.e., from '0' to '9').

This grammar distinguishes between two kinds of identifiers: uppercase and lowercase. A lowercase identifier is an identifier whose letters can only be lowercase, while an uppercase identifier is an identifier whose letters can be either uppercase or lowercase. Finally, lowercase identifiers can use a single underscore character (_) to separate words.

For example, foo and f are valid lowercase identifiers, while Foo, FooBar, and F are valid uppercase identifiers. By contrast, fo o, 1bar, foo__bar, fooBar, and so on are not valid identifiers.
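The identifier rules above can be sketched with two regular expressions. This is only an illustration, not the project's own code: the names `LOWERCASE_ID`, `UPPERCASE_ID`, and `classify_identifier` are hypothetical, and the uppercase pattern assumes that every word in an uppercase identifier starts with an uppercase letter, which is what the examples suggest.

```python
import re

# Assumption: lowercase words joined by single underscores, then optional digits.
LOWERCASE_ID = re.compile(r"[a-z]+(?:_[a-z]+)*[0-9]*")

# Assumption: one or more capitalized words (an uppercase letter followed by
# lowercase letters), then optional digits.
UPPERCASE_ID = re.compile(r"(?:[A-Z][a-z]*)+[0-9]*")


def classify_identifier(text):
    """Return "lowercase", "uppercase", or "invalid" for the given text."""
    if LOWERCASE_ID.fullmatch(text):
        return "lowercase"
    if UPPERCASE_ID.fullmatch(text):
        return "uppercase"
    return "invalid"
```

With these patterns, `classify_identifier` accepts the valid examples above and rejects the invalid ones, including `fooBar`, which matches neither pattern.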

Symbols

A symbol is a special character that appears in the grammar.

Punctuation	Name	Description
`.`	dot	specifies the end of a rule.

Brackets	Name	Description
`(`, `)`	parentheses	mark the start and end of an OR group.

Operators	Name	Description
`=`	equal	separates the lhs from the rhs.
`|`	pipe	separates alternatives (exclusive or).

Whitespace	Name	Description
`\r\n`, `\n`	newline	separates multiple rules and/or lines.
`\t`	tab	indentation.
` `	ws	separates elements from each other.

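Spaces and tabs are ignored in the grammar, so the amount of whitespace between elements does not matter: a and b separated by a single space are equivalent to a and b separated by several spaces or tabs.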
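To illustrate how these lexical elements fit together, here is a minimal tokenizer sketch. It is not the project's lexer: the token names are taken from the tables above, but the regular expressions, `TOKEN_SPEC`, and `tokenize` are assumptions made for this example. Spaces and tabs are matched and then discarded.

```python
import re

# Token names follow the symbol tables; the patterns themselves are assumptions.
TOKEN_SPEC = [
    ("newline", r"\r?\n"),
    ("ws", r"[ \t]+"),  # matched, but never emitted
    ("equal", r"="),
    ("dot", r"\."),
    ("pipe", r"\|"),
    ("op_paren", r"\("),
    ("cl_paren", r"\)"),
    ("uppercase_id", r"(?:[A-Z][a-z]*)+[0-9]*"),
    ("lowercase_id", r"[a-z]+(?:_[a-z]+)*[0-9]*"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token_name, lexeme) pairs, skipping spaces and tabs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "ws":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Because whitespace tokens are dropped, `tokenize("a b")` and `tokenize("a \t  b")` return the same list.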

Source

Overview

In this context, the term "source" refers to the file containing the EBNF grammar.

Syntax

Here's the syntax of the source file:

Source = Rule { "\n" Rule } EOF .

Where:

Rule refers to a rule of the grammar.
EOF is a special symbol that indicates the end of the file. Thus, outside of the rules, nothing else is allowed.

In essence, a source file is a sequence of one or more rules, separated from each other by one or more newline characters (\n), that are read until the end of the file.

Rule

Overview

A rule is the core of any grammar and it is used to describe how the grammar should be parsed.

Syntax

Here's the syntax of a rule:

Rule     = SlRule | MlRule .
SlRule   = uppercase_id "=" RhsCls "." .
MlRule   = uppercase_id "\n" LineRule "\n." .
LineRule = "=" RhsCls { "\n| " RhsCls } .
RhsCls   = Rhs { Rhs } .

Where:

uppercase_id refers to an uppercase identifier.
Rhs refers to the right-hand side of the rule.

In essence, a rule is either a single-line rule or a multi-line rule. In a single-line rule, the uppercase identifier is followed by an equal sign (=), the right-hand side clause, and a dot (.). In a multi-line rule, the uppercase identifier stands alone on the first line and is followed by one or more right-hand side clauses, one per line. Each of these lines is indented one level; the first one starts with an equal sign (=) and every later one starts with a pipe (|). Finally, the dot (.) is written on its own line, indented one level as well.

Examples

Here are some examples of valid rules:

Color
   = red
   | green
   | blue
   .

This rule states that a color can either be "red", "green", or "blue".

Person = name age .

This rule states that a person has a name followed by an age.
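As a rough sketch of how the two rule forms differ, the hypothetical helper below splits the text of one rule into its name and its list of alternatives. It works on plain strings rather than tokens, does not interpret OR groups, and is not how SLParser itself processes rules.

```python
def parse_rule(text):
    """Split one rule into (name, alternatives).

    Each alternative is the list of whitespace-separated elements of one
    right-hand side clause. OR groups are not interpreted here.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if len(lines) == 1:
        # Single-line form: Name = clause .
        name, sep, rest = lines[0].partition("=")
        body = rest.strip()
        if not sep or not body.endswith("."):
            raise ValueError("single-line rule must look like 'Name = ... .'")
        return name.strip(), [body[:-1].split()]
    # Multi-line form: name, "= clause", zero or more "| clause", then ".".
    name, first, *rest = lines
    if not first.startswith("=") or not rest or rest[-1] != ".":
        raise ValueError("multi-line rule must start with '=' and end with '.'")
    alternatives = [first[1:].split()]
    for line in rest[:-1]:
        if not line.startswith("|"):
            raise ValueError(f"expected '|' at start of line: {line!r}")
        alternatives.append(line[1:].split())
    return name, alternatives
```

Applied to the two examples above, the Color rule yields three alternatives with one element each, and the Person rule yields a single alternative containing name and age.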

Right-hand Side

Overview

A right-hand side is the basic unit of the grammar: it specifies the individual atoms that make up a rule.

Syntax

Here's the syntax of a right-hand side:

Rhs        = Identifier | OrGroup .
Identifier = uppercase_id | lowercase_id .
OrGroup    = "(" OrExpr ")" .
OrExpr     = Identifier "|" Identifier { "|" Identifier } .

In essence, a right-hand side is either an identifier or an OR group. An identifier is any lowercase or uppercase word, while an OR group is an OR expression surrounded by parentheses (( and )). Finally, an OR expression is a sequence of two or more identifiers separated by pipes (|).
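The structure of a right-hand side clause can be sketched as follows. The functions `parse_rhs_clause` and `parse_or_expr` are hypothetical names chosen for this example; identifier validation is left out on purpose, so any word that is not a parenthesis or a pipe is treated as an identifier.

```python
def parse_or_expr(tokens):
    """Parse Identifier "|" Identifier { "|" Identifier } into a list of names."""
    names = tokens[0::2]
    pipes = tokens[1::2]
    if len(names) < 2 or len(names) != len(pipes) + 1:
        raise ValueError("an OR expression needs two or more identifiers")
    if any(p != "|" for p in pipes) or any(n in ("(", ")", "|") for n in names):
        raise ValueError("identifiers in an OR expression must be separated by '|'")
    return names


def parse_rhs_clause(text):
    """Parse Rhs { Rhs } into a list of ("id", name) and ("or", [names]) items."""
    # Pad the symbols with spaces so a plain split() separates every element.
    for symbol in "()|":
        text = text.replace(symbol, f" {symbol} ")
    tokens = text.split()
    items = []
    pos = 0
    while pos < len(tokens):
        if tokens[pos] == "(":
            close = tokens.index(")", pos)  # raises ValueError if ")" is missing
            items.append(("or", parse_or_expr(tokens[pos + 1:close])))
            pos = close + 1
        elif tokens[pos] in (")", "|"):
            raise ValueError(f"unexpected {tokens[pos]!r} outside an OR group")
        else:
            items.append(("id", tokens[pos]))
            pos += 1
    if not items:
        raise ValueError("a clause needs at least one right-hand side")
    return items
```

For instance, the clause `name ( a | b | C ) Age` becomes an identifier, an OR group with three members, and another identifier, while `( a )` is rejected because an OR expression needs at least two identifiers.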

Parsing

Full Grammar

equal = "=" .
dot = "." .
pipe = "|" .
newline = [ "\r" ] "\n" { [ "\r" ] "\n" } .
ws = " " | "\t" . -> skip
op_paren = "(" .
cl_paren = ")" .

uppercase_id = uppercase_word { uppercase_word } { digit } .
lowercase_id = lowercase_word { digit } .

fragment lowercase_word = "a".."z" { "a".."z" } . 
fragment uppercase_word = "A".."Z" { "a".."z" } .
fragment digit = "0".."9" .

Source = Source1 EOF .
Source1 = Rule .
Source1 = Rule newline Source1 .

Rule = uppercase_id equal RhsCls dot .
Rule = uppercase_id newline equal RhsCls RuleLine .
RuleLine = newline pipe RhsCls RuleLine .
RuleLine = newline dot  .

RhsCls = Rhs .
RhsCls = Rhs RhsCls .

Rhs = Identifier .
Rhs = op_paren OrExpr cl_paren .

OrExpr = Identifier pipe Identifier .
OrExpr = Identifier pipe OrExpr .

Identifier = uppercase_id .
Identifier = lowercase_id .

Lexer Grammar

equal = "=" .
dot = "." .
newline = [ "\r" ] "\n" .
quote = "\"" .
backslash = "\\" .
pipe = "|" .
tab = "\t" .
ws = " " .
right_arrow = "->" .
skip = "skip" .
range = ".." .

id = "a".."z" { "a".."z" } .

// \u0000..\u0021, "\"", \u0023..\u005B, "\\", \u005D..\uFFFF
char
   = \u0000..\u0021 | \u0023..\u005B | \u005D..\uFFFF
   | backslash ( quote | backslash )
   .

Source = Rule { newline { newline } Rule }  EOF .

Rule = id ws equal ws Rhs ws dot [ SkipCls ] .

Rule = id newline tab equal ws Rhs { newline tab pipe ws Rhs } newline tab dot [ SkipCls ] .

Rhs
   = quote char quote
   | Range
   .

Range
   = quote char quote range quote char quote
   .

SkipCls
   = right_arrow skip
   .
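The Range and char rules can be illustrated with a small sketch that reads a range such as "a".."z" and tests whether a character falls inside it. The names `CHAR`, `RANGE`, `parse_range`, and `in_range` are hypothetical, and comparing characters by code point is an assumption about how ranges are meant to be interpreted.

```python
import re

# One char: an escaped quote or backslash, or any character other than those two.
CHAR = r'(\\["\\]|[^"\\])'
# Range = quote char quote range quote char quote
RANGE = re.compile(rf'"{CHAR}"\.\."{CHAR}"')


def parse_range(text):
    """Parse a range such as '"a".."z"' into a (low, high) pair of characters."""
    match = RANGE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a range: {text!r}")
    # For an escaped char (backslash plus one character) keep only the last character.
    low, high = (part[-1] for part in match.groups())
    return low, high


def in_range(bounds, ch):
    """Return True if ch lies between the two bounds (assumption: by code point)."""
    low, high = bounds
    return low <= ch <= high
```

A lexer rule like `id = "a".."z" { "a".."z" } .` would then accept a character `ch` whenever `in_range(parse_range('"a".."z"'), ch)` is true.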

equal = "=" .
dot = "." .
pipe = "|" .
newline = [ "\r" ] "\n" .
tab = "\t" .
ws = " " . -> skip
op_paren = "(" .
cl_paren = ")" .

uppercase_id = "A".."Z" { "a".."z" } .
lowercase_id = "a".."z" { "a".."z" } .
