package
0.1.4
Repository: https://github.com/tucats/gopackages.git
Documentation: pkg.go.dev

# README

tokenizer

The tokenizer package builds on Go's standard string scanning support to tokenize a string buffer according to the APP language rules. Its extensions to the default scanning behavior handle special multi-character tokens, such as ">=", which would normally scan as two tokens but is grouped into a single token.

The package also contains utility functions for navigating a cursor in the token stream to peek, read, or advance the token stream during compilation. Finally, it contains routines for testing the nature of given tokens, such as determining if a token can be used as a symbol.
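As an illustration, the operator-grouping idea can be sketched with the standard text/scanner package, which by default returns ">" and "=" as separate tokens. This is only a minimal sketch of the technique, not the package's actual implementation:

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// scanGrouped tokenizes src with text/scanner and merges a ">" token
// immediately followed by "=" into a single ">=" token, a simplified
// version of the multi-character token grouping described above.
func scanGrouped(src string) []string {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))

	var tokens []string
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		text := s.TokenText()
		// Merge ">" + "=" into ">=" (adjacency checks omitted for brevity).
		if n := len(tokens); n > 0 && tokens[n-1] == ">" && text == "=" {
			tokens[n-1] = ">="
			continue
		}
		tokens = append(tokens, text)
	}
	return tokens
}

func main() {
	fmt.Println(scanGrouped("a >= b")) // [a >= b]
}
```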

Creating A Token Stream

Use the New() function to pass in a string containing the text to tokenize, and receive a pointer to a Tokenizer object. This object contains all the information about the tokenization of the string, and a cursor that can be moved through the stream.

src := "print 3+5*2"
tokens := tokenizer.New(src)

The resulting tokens object can be used to scan through the tokens, read the token strings, and so on.

Reading Tokens

You can read a token explicitly, which also advances the cursor one position. You can also peek ahead of or behind the cursor in the token stream to see tokens other than the current one. Finally, you can test the next token against an expected value, advancing the cursor only if it matches.

t := tokens.Next()

This reads the next token in the stream and advances the cursor. The string representation of the token can be accessed from the token result using its Spelling() method.

t := tokens.Peek(1)

This peeks ahead one token in advance of the cursor, and reads that token. The current token position is not changed by this operation. Values greater than zero read ahead of the current position. A value of 0 re-reads the current position (the same value returned by the last Next() call, for example). A negative value reads previously-read tokens behind the current token position.
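The offset semantics can be sketched on a plain token slice. This is a simplified stand-in for the Tokenizer (which implements the same idea over Token values), where pos is the index of the next unread token:

```go
package main

import "fmt"

// peek models the Peek() offsets described above: an offset of 1 reads
// the token ahead of the cursor, 0 re-reads the last token returned by
// Next(), and negative offsets read previously-read tokens.
func peek(tokens []string, pos, offset int) string {
	i := pos - 1 + offset
	if i < 0 || i >= len(tokens) {
		return "<end>"
	}
	return tokens[i]
}

func main() {
	tokens := []string{"print", "3", "+", "5"}
	pos := 2 // Next() has already returned "print" and then "3"
	fmt.Println(peek(tokens, pos, 1))  // "+": ahead of the cursor
	fmt.Println(peek(tokens, pos, 0))  // "3": the last token read
	fmt.Println(peek(tokens, pos, -1)) // "print": previously read
}
```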

if tokens.IsNext("print") {
    // Handle print operations
}

The IsNext() function tests the next token to see if it matches the given token value. If it does match, then the cursor advances one position and the function returns true. If the next token does not match the string, then the function returns false and the cursor is not changed. In the above example, when the conditional block runs, the "print" token will be behind the cursor, which is positioned at whatever token followed "print".

thisToken := tokenizer.NewIdentifierToken("this")
thatToken := tokenizer.NewIdentifierToken("that")

if tokens.AnyNext(thisToken, thatToken) {
    // Handle this or that stuff
}

This is very similar to IsNext(), but it compares the next token to the list of token values given; if it matches any of those tokens, the function returns true and the token cursor is advanced. Inside the body of the conditional in this example, the caller can use tokens.Peek(0) to see the value of the token that matched the list.

Token Position

The token cursor is normally moved only when a Next(), IsNext(), or AnyNext() call is made. However, the caller can manually manipulate the cursor position in a number of ways.

tokens.Advance(1)

The Advance() method moves the cursor by the amount given. If the value is positive, the cursor is moved ahead. If the value is zero, the cursor is unchanged. If the value is negative, the cursor is moved back that many positions. Note that you cannot move the cursor to before the first token or past the last token.
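The bounds behavior can be sketched as a clamped cursor move. This is a stand-in; the exact bounds handling in Advance() is an assumption based on the description above:

```go
package main

import "fmt"

// advance moves a cursor by delta over a stream of count tokens,
// clamping the result so it never goes before the first token or
// past the end of the stream.
func advance(pos, delta, count int) int {
	pos += delta
	if pos < 0 {
		pos = 0
	}
	if pos > count {
		pos = count
	}
	return pos
}

func main() {
	fmt.Println(advance(2, 1, 5))   // 3
	fmt.Println(advance(2, -10, 5)) // 0: clamped at the first token
	fmt.Println(advance(2, 10, 5))  // 5: clamped at the end
}
```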

if tokens.AtEnd() {
    return
}

The AtEnd() function returns true if the cursor is at the end of the token stream. The cursor is not moved by this operation.

The caller can explicitly change the token position to an absolute position, or record the current position. This is useful when the tokenizer must parse ahead through a complex set of productions before determining that the compilation is invalid and the tokenizer should be reset to try another path.

tokens.Reset()
t := tokens.Mark()
tokens.Set(t)

The first call resets the token position to the start of the stream; this is the same position the cursor has just after the tokenizer is created with the New() function. The Mark() method returns the current token position, so the caller can record the current location and return to it later. The Set() method sets a previously-recorded cursor position as the new current cursor position.
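A typical backtracking pattern built on Mark() and Set() looks like the following sketch, which uses a minimal stand-in cursor rather than the real Tokenizer:

```go
package main

import "fmt"

// cursor is a minimal stand-in for the Tokenizer cursor, used here only
// to illustrate the Mark/Set backtracking pattern described above.
type cursor struct {
	tokens []string
	pos    int
}

func (c *cursor) next() string { t := c.tokens[c.pos]; c.pos++; return t }
func (c *cursor) mark() int    { return c.pos }
func (c *cursor) set(p int)    { c.pos = p }

// parseAssignment tries to parse "<name> = <value>". On failure it
// restores the saved cursor position so the caller can try another
// production, which is the intended use of Mark() and Set().
func parseAssignment(c *cursor) bool {
	saved := c.mark()
	name := c.next()
	if c.next() != "=" {
		c.set(saved) // backtrack: this was not an assignment
		return false
	}
	fmt.Println("assignment of", c.next(), "to", name)
	return true
}

func main() {
	c := &cursor{tokens: []string{"x", "+", "1"}}
	if !parseAssignment(c) {
		fmt.Println("not an assignment; cursor restored to", c.pos) // 0
	}
}
```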

# Functions

InList is a support function that checks to see if a string matches any of a list of other strings.
IsSymbol is a utility function to determine if a string is a valid symbol name.
New creates a tokenizer instance and breaks the string up into an array of tokens.

# Constants

ToTheEnd means to advance the token stream to the end.

# Variables

"+=" token.
"&" token.
"+" token.
"&" token.
"assert" token.
"=" token.
"{" token.
"}" token.
"&&" token.
"||" token.
"bool" token.
"break" token.
"byte" token.
"call" token.
"case" token.
"catch" token.
"<-" token.
"chan" token.
"clear" token.
":" token.
"," token.
"const" token.
"continue" token.
"{" token.
"}" token.
"--" token.
"default" token.
"defer" token.
":=" token.
"@" token.
"/=" token.
"/" token.
"." token.
"else" token.
"{}" token.
"{}" token.
"interface{}" token.
Empty token.
"]" token.
")" token.
EndOfTokens is a reserved token that means end of the buffer was reached.
"==" token.
"error" token.
"exit" token.
"^" token.
ExtendedReservedWords are additional reserved words when running with language extensions enabled.
"fallthrough" token.
"false" token.
"float32" token.
"float64" token.
"for" token.
"func" token.
"go" token.
">=" token.
">" token.
"if" token.
"import" token.
"++" token.
"int32" token.
"int64" token.
"interface" token.
"int" token.
"<=" token.
"<" token.
"make" token.
"map" token.
"%" token.
"*=" token.
"*" token.
"-" token.
"nil" token.
"!=" token.
"!" token.
"?" token.
"|" token.
"package" token.
"panic" token.
"*" token.
"print" token.
"range" token.
ReservedWords is the list of reserved words in the _APP_ language.
"return" token.
";" token.
"<<" token.
">>" token.
SpecialTokens is a list of tokens that are considered special semantic characters.
"[" token.
"(" token.
"string" token.
"struct" token.
"-=" token.
"-" token.
"switch" token.
"test" token.
"true" token.
"try" token.
"type" token.
TypeTokens is a list of tokens that represent type names.
"..." token.
"var" token.
"when" token.

# Structs

Token defines a single token from the lexical scanning operation.
Tokenizer is an instance of a tokenized string.

# Type aliases
