# README
A little utility/library (written in Go) that enables REST-like access to HTML pages by scraping and parsing them into JSON.

    usage: restify [<flags>] <url>

    Flags:
      --help                   Show context-sensitive help (also try --help-long and --help-man).
      --class=CLASS            If specified, first-level elements encountered with this class will be extracted.
      --id=ID                  If specified, the element with this id will be extracted.
      --attribute=ATTRIBUTE    If specified, as key=value, the element with the given attribute name set to the given value is extracted.
      --version                Print version and exit
      --debug                  Enable debugging output
      --user-agent="restify/1.4.0"
                               user-agent header to provide with request

    Args:
      <url>  A URL to RESTify into JSON

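The `--attribute` flag accepts either a bare attribute name or `key=value`. The first command under Examples uses the bare form and matches every element carrying `data-platform`; the second also pins the value. The library's separate `FindSubsetByAttributeName` and `FindSubsetByAttributeNameValue` functions suggest these two cases map to two different lookups. The Go sketch below shows one way to split and apply such a flag value; `attributeMatch` and `parseAttributeFlag` are hypothetical names, and restify's own parsing may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// attributeMatch describes what an --attribute flag value asks for.
// Hypothetical type for illustration; not part of restify.
type attributeMatch struct {
	Name     string // attribute name, e.g. "data-platform"
	Value    string // required value, only meaningful when HasValue is true
	HasValue bool   // true when the flag was given as key=value
}

// parseAttributeFlag splits "key" or "key=value" at the first '='.
func parseAttributeFlag(flag string) attributeMatch {
	name, value, found := strings.Cut(flag, "=")
	return attributeMatch{Name: name, Value: value, HasValue: found}
}

// matches reports whether an element's attributes satisfy m: the attribute
// must be present, and if a value was given it must be equal.
func (m attributeMatch) matches(attrs map[string]string) bool {
	v, ok := attrs[m.Name]
	if !ok {
		return false
	}
	return !m.HasValue || v == m.Value
}

func main() {
	linux := map[string]string{"data-platform": "serverBedrockLinux", "role": "button"}
	windows := map[string]string{"data-platform": "serverBedrockWindows", "role": "button"}

	anyValue := parseAttributeFlag("data-platform")
	linuxOnly := parseAttributeFlag("data-platform=serverBedrockLinux")

	fmt.Println(anyValue.matches(linux), anyValue.matches(windows))   // true true
	fmt.Println(linuxOnly.matches(linux), linuxOnly.matches(windows)) // true false
}
```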
## Output Structure

When the URL is retrieved successfully and a match is found, the utility writes the JSON conversion to stdout.
The top-level structure is an array of `jsonNode`, where each `jsonNode` is structured as:

    {
      "name": "...element name...",
      "attributes": { ...other attributes, if present... },
      "class": "...class attribute, if present...",
      "id": "...id attribute, if present...",
      "href": "...href attribute, if present...",
      "text": "...element text content, if present...",
      "elements": [
        ...jsonNodes, if present...
      ]
    }

## Examples

Locate the latest Minecraft Bedrock server version by picking off the `<a>` elements that have `data-platform` set:

    restify --attribute=data-platform https://www.minecraft.net/en-us/download/server/bedrock/

which produces:

    [
    {"name":"a","attributes":{"data-platform":"serverBedrockWindows","role":"button"},"class":"btn btn-disabled-outline mt-4 downloadlink","href":"https://minecraft.azureedge.net/bin-win/bedrock-server-1.12.0.28.zip","text":"Download"},
    {"name":"a","attributes":{"data-platform":"serverBedrockLinux","role":"button"},"class":"btn btn-disabled-outline mt-4 downloadlink","href":"https://minecraft.azureedge.net/bin-linux/bedrock-server-1.12.0.28.zip","text":"Download"}
    ]

Or, to grab just the Linux download, give the attribute as `key=value`:

    restify --attribute=data-platform=serverBedrockLinux https://www.minecraft.net/en-us/download/server/bedrock/

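To finish the version lookup, the version number can be pulled out of the download link's file name. A minimal Go sketch, assuming restify's output has the shape of the sample above and that file names keep the `bedrock-server-<version>.zip` pattern seen there; `linuxVersion` is a hypothetical helper. It filters on `data-platform`, so it works on the output of either command.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"regexp"
)

// download holds only the fields this sketch needs from restify's output.
type download struct {
	Attributes map[string]string `json:"attributes"`
	Href       string            `json:"href"`
}

// versionPattern captures the dotted version in names such as
// "bedrock-server-1.12.0.28.zip". The naming scheme is taken from the
// sample output and may change on the real site.
var versionPattern = regexp.MustCompile(`bedrock-server-([0-9]+(?:\.[0-9]+)*)\.zip$`)

// linuxVersion returns the version from the serverBedrockLinux entry.
func linuxVersion(output []byte) (string, error) {
	var nodes []download
	if err := json.Unmarshal(output, &nodes); err != nil {
		return "", err
	}
	for _, n := range nodes {
		if n.Attributes["data-platform"] != "serverBedrockLinux" {
			continue
		}
		if m := versionPattern.FindStringSubmatch(n.Href); m != nil {
			return m[1], nil
		}
	}
	return "", errors.New("no Linux download link found")
}

func main() {
	// Trimmed copy of the Linux entry from the sample output.
	sample := []byte(`[{"name":"a","attributes":{"data-platform":"serverBedrockLinux","role":"button"},
		"href":"https://minecraft.azureedge.net/bin-linux/bedrock-server-1.12.0.28.zip","text":"Download"}]`)
	v, err := linuxVersion(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(v) // 1.12.0.28
}
```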
## Using as a library
The package `github.com/itzg/restify` provides the library functions used by the command-line utility.
# Functions
- `ConvertHtmlToJson` converts the given HTML nodes into JSON content, where each HTML node is represented by the `JsonNode` structure.
- `FindSubsetByAttributeName` retrieves the HTML nodes that have the requested attribute, regardless of their values.
- `FindSubsetByAttributeNameValue` retrieves the HTML nodes that have the requested attribute with a specific value.
- `FindSubsetByClass` locates the HTML nodes within the given root that have the given `className`.
- `FindSubsetById` locates the HTML node within the given root that has an `id` attribute of the given value.
- `LoadContent` retrieves the HTML content from the given URL.
- `WithHeaders` configures additional headers in the request used by `LoadContent`.