github.com/CarlosRojas316/mirroring-web-crawler

Version: 0.0.0-20230422014812-80d17b5112b8
Repository: https://github.com/carlosrojas316/mirroring-web-crawler.git
Documentation: pkg.go.dev

# README

Implement a recursive, mirroring web crawler

The crawler should be a command-line tool that accepts a starting URL and a destination directory. The crawler then downloads the page at the URL, saves it in the destination directory, and recursively proceeds to any valid links on that page. A valid link is the value of an href attribute in an <a> tag that resolves to a URL under the starting URL. For example, if the starting URL is https://start.url/abc, links that resolve to https://start.url/abc/foo and https://start.url/abc/foo/bar are valid, but ones that resolve to https://another.domain/ or to https://start.url/baz are not valid URLs, and should be skipped.
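
As an illustration, here is a minimal sketch of that validity rule using the standard net/url package. The function name isValidLink is hypothetical and not taken from the repository, and the simple path-prefix check is an assumption about how "under the starting URL" is interpreted:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// isValidLink resolves href against the starting URL and accepts it only if
// the result stays on the same host and under the starting URL's path.
func isValidLink(start *url.URL, href string) bool {
	ref, err := url.Parse(href)
	if err != nil {
		return false
	}
	resolved := start.ResolveReference(ref)
	// Skip links that leave the starting host, e.g. https://another.domain/.
	if resolved.Host != start.Host {
		return false
	}
	// Skip links outside the starting path, e.g. https://start.url/baz.
	return strings.HasPrefix(resolved.Path, start.Path)
}

func main() {
	start, _ := url.Parse("https://start.url/abc")
	for _, href := range []string{"/abc/foo", "/abc/foo/bar", "/baz", "https://another.domain/"} {
		fmt.Println(href, isValidLink(start, href))
	}
}
```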

Additionally, the crawler should:

  • Correctly handle being interrupted by Ctrl-C
  • Perform work in parallel where reasonable
  • Support resume functionality by checking the destination directory for already-downloaded pages and skipping downloading and processing where not necessary (see the sketch after this list)
  • Provide “happy-path” test coverage
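
The sketch below illustrates how the Ctrl-C and resume requirements could be handled, using signal.NotifyContext to cancel work on SIGINT and an os.Stat check to skip pages already on disk. It is not the repository's actual code; the alreadyDownloaded helper and the destination layout are assumptions:

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"path/filepath"
)

// alreadyDownloaded reports whether a page was saved on a previous run,
// enabling resume by skipping it. The destination layout is an assumption.
func alreadyDownloaded(destDir, relPath string) bool {
	_, err := os.Stat(filepath.Join(destDir, relPath))
	return err == nil
}

func main() {
	// Cancel the context when Ctrl-C (SIGINT) is received so in-flight
	// work can stop cleanly.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
	defer stop()

	pages := []string{"index.html", "foo/index.html"}
	for _, p := range pages {
		select {
		case <-ctx.Done():
			fmt.Println("interrupted, exiting")
			return
		default:
		}
		if alreadyDownloaded("websites", p) {
			continue // resume: skip pages that already exist on disk
		}
		fmt.Println("would download", p)
	}
}
```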

Usage:

go run main.go <--url url> <--path destination>

> go run main.go --url http://www.w3school.com/cpp --path websites
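
A minimal sketch of the flag handling implied by this usage, using the standard flag package (which accepts both -url and --url forms). It is illustrative and may differ from the repository's main.go:

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

func main() {
	urlFlag := flag.String("url", "", "starting URL to crawl")
	pathFlag := flag.String("path", "", "destination directory for saved pages")
	flag.Parse()

	// Both flags are required, matching the usage synopsis above.
	if *urlFlag == "" || *pathFlag == "" {
		fmt.Fprintln(os.Stderr, "usage: go run main.go --url <url> --path <destination>")
		os.Exit(1)
	}
	fmt.Println("crawling", *urlFlag, "into", *pathFlag)
}
```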

Developed using Go version 1.20

Main features:

  • File processing to ensure that links are relative to the base directory
  • Handle Ctrl-C

    Not a top-priority feature while concurrency is not yet implemented

  • Concurrency

    A mutex guards the set of visited URLs (see the sketch below)
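
The following is a minimal sketch of a mutex-protected visited-URL set shared by concurrent workers; the visitedSet type and Visit method are illustrative, not the repository's actual types:

```go
package main

import (
	"fmt"
	"sync"
)

// visitedSet records URLs that have already been crawled; the mutex makes it
// safe to share between concurrent goroutines.
type visitedSet struct {
	mu   sync.Mutex
	seen map[string]bool
}

// Visit marks url as seen and reports whether it was new.
func (v *visitedSet) Visit(url string) bool {
	v.mu.Lock()
	defer v.mu.Unlock()
	if v.seen[url] {
		return false
	}
	v.seen[url] = true
	return true
}

func main() {
	visited := &visitedSet{seen: make(map[string]bool)}
	urls := []string{"https://start.url/abc", "https://start.url/abc/foo", "https://start.url/abc"}

	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			if visited.Visit(u) {
				fmt.Println("crawling", u) // first time this URL is seen
			}
		}(u)
	}
	wg.Wait()
}
```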