Package: github.com/asg017/sqlite-html
Type: module, package
Version: 0.1.3
Repository: https://github.com/asg017/sqlite-html.git
Documentation: pkg.go.dev

# README

sqlite-html

A SQLite extension for querying, manipulating, and creating HTML elements.

  • Extract HTML or text from HTML with CSS selectors, like .querySelector(), .innerHTML, and .innerText
  • Generate a table of matching elements from a CSS selector, like .querySelectorAll()
  • Safely create HTML elements in a query, like .createElement() and .appendChild()

sqlite-html's API is modeled after the official JSON1 SQLite extension.

This extension is written in Go, thanks to riyaz-ali/sqlite. While the library aims to be fast and efficient, it is slower overall than a pure C SQLite extension would be, though in practice you may not notice much of a difference.

Usage

.load ./html0
select html_extract('<p> Anakin <b>Skywalker</b> </p>', 'b');
-- "<b>Skywalker</b>"

sqlite-html is similar to other HTML scraping tools such as BeautifulSoup (Python), cheerio (Node.js), and nokogiri (Ruby). You can use CSS selectors to extract individual elements or groups of elements and query data from HTML sources.

For example, here we find all href links in an index.html file.

select
  text as name,
  html_attribute_get(anchors, 'a', 'href') as href
from html_each(readfile('index.html'), 'a') as anchors
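
The same query can be run from Python. The sketch below is one possible wrapper, not part of sqlite-html: the names LINKS_SQL and find_links are hypothetical, and it assumes the html0 extension has already been loaded on the connection. Because readfile() is supplied by the sqlite3 command-line shell rather than the core library, the document is read in Python and bound as a parameter instead.

```python
# Same query as the example above, with the document bound as a parameter
# in place of readfile('index.html').
LINKS_SQL = """
select
  text as name,
  html_attribute_get(anchors, 'a', 'href') as href
from html_each(?, 'a') as anchors
"""


def find_links(con, document):
    """Return (name, href) rows for every <a> element in `document`.

    `con` is assumed to be a sqlite3.Connection with html0 already loaded.
    """
    return con.execute(LINKS_SQL, (document,)).fetchall()


# Usage (needs the extension, so it is left as a comment):
#   with open("index.html", encoding="utf-8") as f:
#       for name, href in find_links(con, f.read()):
#           print(name, href)
```

Binding the document as a parameter also avoids having to quote raw HTML inside the SQL string.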

We can also safely generate HTML with html_element, modeled after React's React.createElement.

select html_element('p', null,
  'Luke, I am your',
  html_element('b', null, 'father'),
  '!',

  html_element('img', json_object(
    'src', 'https://images.dog.ceo/breeds/groenendael/n02105056_4600.jpg',
    'width', 200
  ))
);

-- "<p>Luke, I am your<b>father</b>!<img src="https://images.dog.ceo/breeds/groenendael/n02105056_4600.jpg" width="200.000000"/></p>"

Documentation

See docs.md for a full API reference.

Installing

Language        Install                                   Badge
Python          pip install sqlite-html                   PyPI
Datasette       datasette install datasette-sqlite-html   Datasette
Node.js         npm install sqlite-html                   npm
Deno            deno.land/x/sqlite_html                   deno.land/x release
Ruby            gem install sqlite-html                   Gem
GitHub Release  -                                         GitHub tag (latest SemVer pre-release)

The Releases page contains pre-built binaries for Linux amd64, macOS amd64 (no ARM build), and Windows.

As a loadable extension

If you want to use sqlite-html as a runtime-loadable extension, download the html0.dylib (macOS), html0.so (Linux), or html0.dll (Windows) file from a release and load it into your SQLite environment.

Note: The 0 in the filename (html0.dylib/html0.so/html0.dll) denotes the major version of sqlite-html. sqlite-html is currently pre-v1, so expect breaking changes in future versions.
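
As a rough illustration of the per-platform file names above, here is a small Python sketch that picks the matching release file for the current machine. The helper name extension_filename is hypothetical and not part of sqlite-html; the mapping only restates the three file names listed in this section.

```python
import platform

# Release file names as listed above, keyed by platform.system() values.
_EXTENSION_FILES = {
    "Darwin": "html0.dylib",
    "Linux": "html0.so",
    "Windows": "html0.dll",
}


def extension_filename(system=None):
    """Return the release file name for `system` (defaults to this machine)."""
    system = system or platform.system()
    try:
        return _EXTENSION_FILES[system]
    except KeyError:
        raise ValueError(f"no sqlite-html binary listed for {system!r}") from None
```

This is mainly useful for deciding which release asset to download; the loading examples in this README pass ./html0 without a suffix.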

For example, if you are using the SQLite CLI, you can load the library like so:

.load ./html0
select html_version();
-- v0.0.1

Or in Python, using the builtin sqlite3 module:

import sqlite3

con = sqlite3.connect(":memory:")

con.enable_load_extension(True)
con.load_extension("./html0")

print(con.execute("select html_version()").fetchone())
# ('v0.0.1',)
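
A slightly more defensive variant of the Python snippet above is sketched here. It turns extension loading back off once html0 is loaded, so that later SQL on the same connection cannot load further libraries. The function name load_html_extension is hypothetical; only enable_load_extension() and load_extension() from the standard sqlite3 module are used.

```python
import sqlite3


def load_html_extension(con: sqlite3.Connection, path: str = "./html0") -> sqlite3.Connection:
    """Load sqlite-html into `con`, then disable extension loading again."""
    con.enable_load_extension(True)
    try:
        con.load_extension(path)
    finally:
        # Runs whether or not loading succeeded.
        con.enable_load_extension(False)
    return con


# Usage (needs the html0 file next to the script, so it is left as a comment):
#   con = load_html_extension(sqlite3.connect(":memory:"))
#   print(con.execute("select html_version()").fetchone())
```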

Or in Node.js using better-sqlite3:

const Database = require("better-sqlite3");
const db = new Database(":memory:");

db.loadExtension("./html0");

console.log(db.prepare("select html_version()").get());
// { 'html_version()': 'v0.0.1' }

Or with Datasette:

datasette data.db --load-extension ./html0

See also

  • sqlite-http, for making HTTP requests in SQLite (pairs great with this tool)
  • htmlq, for a similar but CLI-based HTML query tool using CSS selectors
  • riyaz-ali/sqlite, the brilliant Go library that this library depends on
  • nalgeon/sqlean, several pre-compiled handy SQLite functions, in C

# Functions

Six functions are listed, none with a description provided by the author.

# Constants

https://github.com/sqlite/sqlite/blob/8b554e2a1ea4de0cb30a49357684836710f44905/ext/misc/json1.c#L159.

# Variables

A random "magic" number to use for sqlite subtypes.
html_each(document, selector): A table-valued function that returns a row for every element in document that matches selector.

# Structs

html_attribute_get(document, selector, name), also available as html_attr_get(document, selector, name): Returns the value of the "name" attribute of the element found in document using selector.
html_attribute_has(document, selector, name), also available as html_attr_has(document, selector, name): Returns 1 if the "name" attribute exists on the element found in document using selector, 0 otherwise.
html_count(document, selector): Counts the elements in the given document that match selector.
html_debug(): Returns more information about the current html module, including build date and commit hash.
html_element(tag, attributes, child1, ...): Creates an HTML element with the given tag, attributes, and children.
html_escape(content): Returns an HTML-escaped version of the given content.
html_extract(document, selector): Returns the entire HTML representation of the element selected from document using selector.
html(document): Verifies and "cleans" (quotes attributes) the given document as HTML.
html_table(content): Wraps the given content in an HTML table.
html_text(document [, selector]): Returns the combined text contents of the selected element.
html_trim(content): Trims whitespace around the given text content.
html_unescape(content): Returns an HTML-unescaped version of the given content.
html_valid(document): Returns 1 if the given document is valid HTML, 0 otherwise.
html_version(): Returns the semver version of the current sqlite-html module.

Three further structs are listed with no description provided by the author.
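
To show how a few of the scalar functions above might be combined, here is a hedged Python sketch. The function name summarize is hypothetical and the SQL uses only the signatures listed in this section (html_valid, html_count, html_text); it assumes the html0 extension is already loaded on the connection.

```python
import sqlite3


def summarize(con: sqlite3.Connection, document: str, selector: str) -> dict:
    """Return validity, match count, and text for `selector` in `document`."""
    row = con.execute(
        "select html_valid(?), html_count(?, ?), html_text(?, ?)",
        # One bound value per placeholder, in order.
        (document, document, selector, document, selector),
    ).fetchone()
    return {"valid": bool(row[0]), "count": row[1], "text": row[2]}


# Usage (needs the extension, so it is left as a comment):
#   summarize(con, "<p> Anakin <b>Skywalker</b> </p>", "b")
```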