Module: github.com/iwdgo/htmlutils
Version: 1.0.7
Repository: https://github.com/iwdgo/htmlutils.git
Documentation: pkg.go.dev

# README

Exploring HTML structure

HTML is parsed using golang.org/x/net/html which produces a tree.

The module provides basic functionality to compare HTML tags or nodes and their trees. Searching for an HTML tag using a *html.Node value ignores pointers and always returns the first match. Because some properties can be ignored, tags like <button> are easy to count. The text value of a tag (title, error message, ...) can also be checked.
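
To make "first match" and tag counting concrete, here is a minimal sketch. It does not use the module's API: `tagNode`, `findFirst` and `countTags` are hypothetical names, and `tagNode` is a simplified stand-in for a parsed node with only the fields needed here.

```go
package main

import "fmt"

// tagNode is a hypothetical, simplified stand-in for a parsed HTML node.
// It is an assumption for illustration, not the type used by the module.
type tagNode struct {
	Tag         string
	Text        string
	FirstChild  *tagNode
	NextSibling *tagNode
}

// findFirst walks the tree depth-first and returns the first node whose
// tag name matches, or nil when there is none.
func findFirst(n *tagNode, tag string) *tagNode {
	if n == nil {
		return nil
	}
	if n.Tag == tag {
		return n
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		if m := findFirst(c, tag); m != nil {
			return m
		}
	}
	return nil
}

// countTags counts every node with the given tag name, ignoring all other
// properties of the node.
func countTags(n *tagNode, tag string) int {
	if n == nil {
		return 0
	}
	total := 0
	if n.Tag == tag {
		total++
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		total += countTags(c, tag)
	}
	return total
}

// sampleTree builds <body><h1>Demo</h1><button>OK</button><button>Cancel</button></body>.
func sampleTree() *tagNode {
	second := &tagNode{Tag: "button", Text: "Cancel"}
	first := &tagNode{Tag: "button", Text: "OK", NextSibling: second}
	heading := &tagNode{Tag: "h1", Text: "Demo", NextSibling: first}
	return &tagNode{Tag: "body", FirstChild: heading}
}

func main() {
	root := sampleTree()
	fmt.Println(findFirst(root, "button").Text) // prints "OK"
	fmt.Println(countTags(root, "button"))      // prints 2
}
```

Because the search stops at the first match, a page with several identical tags needs a count or a more specific reference node to tell them apart.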

Good to know

Parsing does not apply a complete HTML syntax check. For instance, tags like <p>, whose closing tag may be omitted, can cause a comparison to fail.

Siblings must always have the same order or comparison fails. Order of attributes is treated as irrelevant.
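
The two rules above can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: `attr`, `attrsIncluded` and `sameSiblingOrder` are hypothetical names, and siblings are reduced to a list of tag names.

```go
package main

import "fmt"

// attr is a hypothetical stand-in for an HTML attribute (key/value pair).
type attr struct {
	Key, Val string
}

// attrsIncluded reports whether every attribute of sub also appears in ref,
// whatever the position of each attribute.
func attrsIncluded(sub, ref []attr) bool {
	seen := make(map[attr]int, len(ref))
	for _, a := range ref {
		seen[a]++
	}
	for _, a := range sub {
		if seen[a] == 0 {
			return false
		}
		seen[a]--
	}
	return true
}

// sameSiblingOrder reports whether two sibling lists hold the same tag
// names at the same positions.
func sameSiblingOrder(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	ref := []attr{{"class", "btn"}, {"type", "submit"}}
	sub := []attr{{"type", "submit"}, {"class", "btn"}}

	// Attribute order is irrelevant.
	fmt.Println(attrsIncluded(sub, ref)) // prints true

	// Sibling order is significant.
	fmt.Println(sameSiblingOrder([]string{"h1", "p"}, []string{"p", "h1"})) // prints false
}
```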

How to start

Detailed documentation includes examples.

Versions

v1.0.6 updates the golang.org/x/net package to remove CVE-2022-27664, which does not affect x/net/html.
v1.0.5 requires Go 1.16+ as use of the ioutil package is removed.
v1.0.4 requires Go 1.17+ which implements lazy loading of modules to avoid go.mod updates.
v1.0.0 was created on Go 1.12 which supports modules.

# Functions

AttrIncluded returns true if list of attributes of n is included in reference node m whatever their order.
Equal returns true if all fields of nodes m and n are equal except pointers. reflect.DeepEqual(tag1, tag2) is unusable as pointers are checked too.
ExploreNode prints node tags with name s and type t. Without a name, all tags are printed. When the type is ErrorNode (iota == 0), tags of all types are printed.
FindNode finds the first occurrence of a node.
FindTag finds the first occurrence of a tag name (i.e.
FindTags finds all occurrences of a tag name whatever their attributes.
GetText prints the text content of a tree structure, like PrintNodes without any formatting. TODO: check usage of the (*Tokenizer).Text equivalent in the net/html package.
IdenticalNodes fails if trees have different size.
IncludedNode checks if n is included in m.
IncludedNodeTyped is like IncludedNode, but only tags of type t are compared.
IsTextNode checks the presence of a node and its text value in a buffer.
IsTextTag checks the presence of a tag and its text value in a buffer.
ParseFile returns a *Node containing the parsed file or an error (file or parsing).
PrintData returns a string with Node information (not its relationships). A nil node will panic.
PrintNodes prints the tree structure of node m until a node equal to n is found.
PrintTags prints the node structure until a tag name is found (whatever its attributes). Without a name, all tags are printed. tagOnly selects ElementNode; otherwise tags are printed whatever their type.