module · package
0.0.0-20171027140936-8843561fbe36
Repository: https://github.com/datatogether/core.git
Documentation: pkg.go.dev
# README
archive
Core Data Model definitions for archival work
archive is a Go implementation of standard data models for Data Together.
Services that import archive so far:
Noticeably absent from this package is the definition of a user; see the identity service for that.
# Functions
BasePrimers lists primers that have no parent.
CalcHash calculates the multihash key for a given slice of bytes. TODO: find a proper home for this.
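As an illustrative sketch of what a multihash key is (not this package's actual code), a sha2-256 multihash is the raw digest prefixed with a hash-function code byte (0x12) and a digest-length byte (0x20). The hex encoding below is a simplification; real multihash keys are typically base58-encoded:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// calcHashSketch is a simplified stand-in for CalcHash: it builds a
// sha2-256 multihash (code 0x12, length 0x20, then the digest) and
// hex-encodes it.
func calcHashSketch(data []byte) string {
	digest := sha256.Sum256(data)
	mh := append([]byte{0x12, 0x20}, digest[:]...)
	return hex.EncodeToString(mh)
}

func main() {
	fmt.Println(calcHashSketch([]byte("hello")))
	// 12202cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
}
```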
CountPrimers returns the total number of primers.
CountSources grabs the total number of sources.
CrawlingSources lists sources with crawling = true, paginated.
LatestMetadata gives the most recent metadata timestamp for a given keyId & subject combination if one exists.
ListPrimers.
ListSources lists all sources from most to least recent, paginated.
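The paginated list functions (CrawlingSources, ListSources, and friends) follow the usual limit/offset pattern. A hypothetical helper, not part of this package's API, shows the arithmetic:

```go
package main

import "fmt"

// pageToLimitOffset converts a zero-indexed page number and page size
// into the LIMIT/OFFSET pair a paginated query would use. Illustrative
// only; the package's list functions take their own parameters.
func pageToLimitOffset(page, pageSize int) (limit, offset int) {
	if page < 0 {
		page = 0
	}
	return pageSize, page * pageSize
}

func main() {
	limit, offset := pageToLimitOffset(2, 25)
	fmt.Printf("LIMIT %d OFFSET %d\n", limit, offset) // LIMIT 25 OFFSET 50
}
```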
MetadatasBySubject returns all metadata for a given subject hash.
NewFileFromRes generates a new file by consuming & closing a given response body.
NextMetadata returns the next metadata block for a given subject.
NormalizeURL removes inconsistencies from a given url.
NormalizeURLString removes inconsistencies from a given url string.
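A minimal sketch of the kind of cleanup such normalization typically involves, using only the standard library; the package's actual rules may differ:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeURLString sketches typical url normalization: lowercase the
// scheme and host, drop the fragment, and strip a trailing slash from
// the path. Illustrative only, not the package's implementation.
func normalizeURLString(rawurl string) (string, error) {
	u, err := url.Parse(rawurl)
	if err != nil {
		return "", err
	}
	u.Scheme = strings.ToLower(u.Scheme)
	u.Host = strings.ToLower(u.Host)
	u.Fragment = ""
	u.Path = strings.TrimSuffix(u.Path, "/")
	return u.String(), nil
}

func main() {
	s, _ := normalizeURLString("HTTP://Example.COM/path/#section")
	fmt.Println(s) // http://example.com/path
}
```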
ReadDstContentLinks returns a list of links that specify a given url as src and that are content urls.
ReadDstLinks returns all links that specify a given url as src.
ReadSrcLinks returns all links that specify a given url as dst.
SnapshotsForUrl returns all snapshots for a given url string.
SumConsensus tallies the consensus around a given subject hash from a provided Metadata slice.
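The idea behind a consensus tally can be sketched with plain maps: for each metadata key, count how many metadata blocks assert each value. The types below are assumptions for illustration; the package's Metadata and Consensus types are richer:

```go
package main

import "fmt"

// Consensus maps each metadata key to a tally of asserted values.
// This mirrors the idea behind SumConsensus, not its actual types.
type Consensus map[string]map[string]int

// sumConsensus tallies value counts per key across metadata blocks.
func sumConsensus(blocks []map[string]string) Consensus {
	c := Consensus{}
	for _, meta := range blocks {
		for key, val := range meta {
			if c[key] == nil {
				c[key] = map[string]int{}
			}
			c[key][val]++
		}
	}
	return c
}

func main() {
	c := sumConsensus([]map[string]string{
		{"title": "Climate Data"},
		{"title": "Climate Data"},
		{"title": "climate-data"},
	})
	fmt.Println(c["title"]["Climate Data"]) // 2
}
```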
UnmarshalBoundedPrimers turns sql.Rows into primers, expecting len(rows) <= limit.
UnmarshalBoundedSources turns a standard sql.Rows of Source results into a *Source slice.
UnmarshalUrls takes an sql cursor and returns a slice of url pointers; expects columns to match urlCols().
WriteSnapshot creates a snapshot record in the DB from a given Url struct.
# Variables
All of these need to be set for file saving to work.
How long before a url is considered stale.
# Structs
Collections are generic groupings of content. A collection can be thought of as a csv file listing content hashes in the first column, and whatever other information is necessary in subsequent columns.
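That csv mental model can be sketched with encoding/csv; the rows and hash values here are made up for illustration, and this is not the package's serialization code:

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// writeCollection renders a collection as csv: each row starts with a
// content hash, followed by arbitrary descriptive columns.
func writeCollection(items [][]string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.WriteAll(items); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, _ := writeCollection([][]string{
		{"QmHash1", "report.pdf", "2017-10-27"},
		{"QmHash2", "data.csv", "2017-10-27"},
	})
	fmt.Print(out)
}
```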
CollectionItem is an item in a collection.
CustomCrawls are urls that contain content that cannot be extracted with traditional web crawling / scraping methods.
DataRepo is a place that holds data in a structured format.
File is a buffered byte slice often made from a GET response body.
A link represents an <a> tag in the html document at src whose href attribute points to the url that resolves to dst.
Meta is a struct for sharing our knowledge of a url with other services.
A snapshot is a record of a GET request to a url. There can be many metadata records for a given url.
Primer is tracking information about an abstract group of content.
TODO - finish.
A snapshot is a record of a GET request to a url. There can be many snapshots of a given url.
Source is a concrete handle for archiving.
Uncrawlables are urls that contain content that cannot be extracted with traditional web crawling / scraping methods.
URL represents..
# Type aliases
Consensus is an enumeration of Meta graph values arranged by key.