github.com/mccutchen/urlresolver
Version: 0.2.2
Repository: https://github.com/mccutchen/urlresolver.git
Documentation: pkg.go.dev

# README

urlresolver


A Go package that "resolves" a given URL by issuing a GET request, following any redirects, canonicalizing the final URL, and attempting to extract the title from the final response body.

Methodology

Resolving

A URL is resolved by issuing a GET request and following any redirects until a non-30x response is received.

Canonicalizing

The final URL is aggressively canonicalized using a combination of PuerkitoBio/purell and some manual heuristics, such as removing unnecessary query params (e.g. utm_* tracking params) and normalizing case (e.g. twitter.com/Thresholderbot and twitter.com/thresholderbot are the same).

Canonicalization is optimized for URLs that are shared on social media.
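
To make the heuristics concrete, here is a rough standard-library sketch of the two examples mentioned above. It is not the package's Canonicalize function and does not use purell: `canonicalizeSketch` is a hypothetical name, and the rule that lowercases paths only for twitter.com is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// canonicalizeSketch (hypothetical) drops utm_* tracking params, sorts the
// remaining params, lowercases the host, and lowercases the path for
// hosts assumed to be case-insensitive.
func canonicalizeSketch(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	u.Host = strings.ToLower(u.Host)

	q := u.Query()
	for key := range q {
		if strings.HasPrefix(strings.ToLower(key), "utm_") {
			q.Del(key) // deleting during range is safe for Go maps
		}
	}
	u.RawQuery = q.Encode() // Encode emits params sorted by key

	// Assumption: only twitter.com is treated as having case-insensitive paths.
	if u.Host == "twitter.com" {
		u.Path = strings.ToLower(u.Path)
		u.RawPath = ""
	}
	return u.String(), nil
}

func main() {
	for _, raw := range []string{
		"HTTPS://Example.COM/path?utm_source=x&b=2&a=1",
		"https://twitter.com/Thresholderbot?utm_medium=social",
	} {
		out, err := canonicalizeSketch(raw)
		fmt.Println(out, err)
	}
}
```

A real implementation would need a broader list of tracking params and case-insensitive hosts than this sketch covers.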

Security

TL;DR: Use safedialer.Control in the transport's dialer to block attempts to resolve URLs pointing at internal, private IP addresses.

Exposing functionality like this on the internet can be dangerous, because it could allow a malicious client to discover information about your internal network by asking your service to resolve URLs whose hostnames point at private IP addresses.

The dangers, along with a golang-specific mitigation, are outlined in Andrew Ayer's excellent "Preventing Server Side Request Forgery in Golang" blog post.

To mitigate that danger, users are strongly encouraged to use safedialer.Control as the Control function in the dialer used by the transport given to urlresolver.New.
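
The shape of this mitigation can be sketched with the standard library. In the example below, `blockInternal` is a hypothetical stand-in for safedialer.Control, written only to show where a Control function plugs in; the set of blocked ranges is an assumption and is likely narrower than what the real function checks. In real use, one would set safedialer.Control on the dialer instead and pass the resulting transport to urlresolver.New.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"syscall"
)

// blockInternal (hypothetical stand-in for safedialer.Control) runs after
// DNS resolution and before the socket connects, so it sees the actual IP
// being dialed rather than the hostname the client supplied.
func blockInternal(network, address string, _ syscall.RawConn) error {
	host, _, err := net.SplitHostPort(address)
	if err != nil {
		return err
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return fmt.Errorf("refusing to dial non-IP address %q", host)
	}
	// Assumed block list: loopback, RFC 1918 / ULA, link-local, unspecified.
	if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() || ip.IsUnspecified() {
		return fmt.Errorf("refusing to dial internal address %s", ip)
	}
	return nil
}

// newSafeTransport wires the Control function into the transport's dialer.
func newSafeTransport() *http.Transport {
	dialer := &net.Dialer{Control: blockInternal}
	return &http.Transport{DialContext: dialer.DialContext}
}

func main() {
	transport := newSafeTransport()
	client := &http.Client{Transport: transport}

	// Dialing a loopback address is rejected before any connection is made.
	_, err := client.Get("http://127.0.0.1:8080/")
	fmt.Println(err)
}
```

Checking in the dialer's Control hook rather than validating the hostname up front matters because the check applies to every connection the transport opens, including those made while following redirects.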

See github.com/mccutchen/urlresolverapi for a productionized example, deployed at https://urlresolver.com.

# Packages

No description provided by the author
No description provided by the author

# Functions

Canonicalize filters unnecessary query params and then normalizes a URL, ensuring consistent case, encoding, sorting of params, etc.
New creates a new Resolver that uses the given transport to make HTTP requests and applies the given timeout to the overall process (including any redirects that must be followed).

# Variables

NormalizationFlags defines the normalization flags the purell package will use during canonicalization.

# Structs

Resolver resolves URLs.
Result is the result of resolving a URL.

# Interfaces

Interface defines the interface for a URL resolver.