github.com/astappiev/microdata
Category: module, package
Version: 1.0.2
Repository: https://github.com/astappiev/microdata.git
Documentation: pkg.go.dev

# README

Microdata

Microdata is a package to extract Microdata and JSON-LD from HTML documents.

HTML Microdata is a markup specification often used together with the Schema.org vocabulary to make it easier for search engines to identify and understand content on web pages. One of the most familiar schemas is the rating shown next to search results. Other common schemas describe people, places, events, and products.

JSON-LD is a lightweight Linked Data format. It is easy for humans to read and write. It is based on the already successful JSON format and provides a way to help JSON data interoperate at Web-scale.

Go package use

Install the package:

go get -u github.com/astappiev/microdata

Use cases:

// Pass a URL to the `ParseURL` function.
data, err := microdata.ParseURL("https://example.com/page")

// Pass an `io.Reader`, content-type and a base URL to the `ParseHTML` function.
data, err := microdata.ParseHTML(reader, contentType, baseURL)

// Pass an `html.Node`, content-type and a base URL to the `ParseNode` function.
data, err := microdata.ParseNode(node, contentType, baseURL)

An example program:

package main

import (
	"encoding/json"
	"fmt"
	"log"

	"github.com/astappiev/microdata"
)

func main() {
	data, err := microdata.ParseURL("https://www.allrecipes.com/recipe/84450/ukrainian-red-borscht-soup/")
	if err != nil {
		log.Fatal(err)
	}

	// Iterate over the extracted items and print their types and properties.
	for _, item := range data.Items {
		fmt.Println(item.Types)
		for key, prop := range item.Properties {
			fmt.Printf("%s: %v\n", key, prop)
		}
	}

	// Print everything that was extracted as indented JSON.
	out, err := json.MarshalIndent(data, "", "  ")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out))
}

Command line use

Install the command line tool:

go install github.com/astappiev/microdata/cmd/microdata@latest

Parse a URL:

$ microdata https://www.gog.com/game/...
{
  "items": [
    {
      "type": [
        "http://schema.org/Product"
      ],
      "properties": {
        "additionalProperty": [
          {
            "type": [
              "http://schema.org/PropertyValue"
            ],
{
...

Parse HTML from stdin:

$ cat saved.html | microdata

Format the output with a Go template to return the "price" property:

$ microdata -format '{{with index .Items 0}}{{with index .Properties "offers" 0}}{{with index .Properties "price" 0 }}{{ . }}{{end}}{{end}}{{end}}' https://www.gog.com/game/...
8.99

# Functions

NewItem returns a new Item.
ParseHTML parses the HTML document available in the given reader and returns the microdata.
ParseNode parses the root Node and returns the microdata.
ParseURL parses the HTML document available at the given URL and returns the microdata.
