🌐 Get wiki pages from the command line. Increase brain volume 🧠.
> [!WARNING]
> This project is unfinished! Not all of the features listed in the README
> are available.
# Wikiscrape
Get wiki pages. Export to your desired format. Wikiscrape is a command-line tool which aims to provide a wiki-agnostic method for retrieving and exporting data from wiki pages.

The whole motivation for this project is to provide a consistent and convenient interface for interacting with the sometimes frustrating world of wiki APIs. Despite the vast majority of wikis being built upon a small number of frameworks, I often found even those which shared a backend framework to have vastly different access patterns.
For example, despite both being built on top of MediaWiki, Wikipedia and the Old School RuneScape Wiki differ in the following:

- API endpoint: `en.wikipedia.org/w/api.php` vs. `oldschool.runescape.wiki/api.php`
- Page prefix: `en.wikipedia.org/wiki/pageName` vs. `oldschool.runescape.wiki/w/pageName`
## Features
- ~~Blazingly~~ Moderately fast 🚀🔥
- Effortless retrieval of full wiki pages or specific sections
- Support for multiple wiki backends
- Manifest file support: wikiscrape can iteratively scrape from a list of pages given a JSON file.
## Wiki Support
Because of the differences in API access patterns mentioned above, wikis must be explicitly supported by Wikiscrape in order to retrieve content from them. "Support" involves the following:
- A `wikiInfo` entry in `internal/util/wikisupport.go`, which maps known wiki names or URL host segments to information about their respective backends, API endpoints, and page prefixes for parsing page names from URLs.
- A `scraper` and `response` in `internal/scraper` designed specifically for the wiki's backend to handle parsing API responses and their content.
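
For a concrete picture, a `wikiInfo` entry might look roughly like the sketch below. The struct layout and field names here are hypothetical; see `internal/util/wikisupport.go` for the real definitions.

```go
// Hypothetical sketch of a wikiInfo entry; the actual types and fields
// in internal/util/wikisupport.go may differ.
type wikiInfo struct {
	Backend    string // backend framework, e.g. "mediawiki"
	APIPath    string // API endpoint for the wiki
	PagePrefix string // prefix used to parse page names out of URLs
}

var wikiHostInfo = map[string]wikiInfo{
	"en.wikipedia.org": {
		Backend:    "mediawiki",
		APIPath:    "https://en.wikipedia.org/w/api.php",
		PagePrefix: "/wiki/",
	},
	"oldschool.runescape.wiki": {
		Backend:    "mediawiki",
		APIPath:    "https://oldschool.runescape.wiki/api.php",
		PagePrefix: "/w/",
	},
}
```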
For a list of the wikis and backends supported by Wikiscrape, please see `wikiscrape list -h`. Currently, the supported backends are:
- MediaWiki
If there is a wiki that you would like supported, and its backend already has support in `internal/scraper`, please feel free to submit an issue. If you have the skill or the time, please also feel free to contribute directly to the project by adding the wiki to the `wikiHostInfo` and `wikiNameInfo` maps in `internal/util/wikisupport.go`! Please see the contribution guide below.
## Installation
Right now, the best way to get wikiscrape on your machine is to just use `go`:

```sh
go install github.com/mal0ner/wikiscrape@latest
```
## Usage
Wikiscrape gives you a simple and intuitive command-line interface.
Scrape a single page:

```sh
# by url
wikiscrape "https://en.wikipedia.org/wiki/Bear"

# by name
wikiscrape page "Bear" --wiki wikipedia
```

Scrape the list of section headings from a page:

```sh
# by url
wikiscrape "https://en.wikipedia.org/wiki/Bear" --section-titles

# by name
wikiscrape page "Bear" --wiki wikipedia -t
```

Scrape a specific section:

```sh
wikiscrape page "Bear" --wiki wikipedia --section "Taxonomy"

# short
wikiscrape page Bear -w wikipedia -s Taxonomy
```

Scrape multiple pages from a manifest file:

```sh
wikiscrape pages --wiki wikipedia --from-manifest "path/to/manifest.json"

# short
wikiscrape pages -w wikipedia -f path/to/manifest.json
```

Scrape just the references from a list of pages:

```sh
wikiscrape pages --wiki wikipedia --section "References" --from-manifest "path/to/manifest.json"

# short
wikiscrape pages -w wikipedia -s References -f path/to/manifest.json
```
## Manifest
The format of the manifest file is just a simple JSON array. This was probably a strange design decision, but I don't really want to change it! Page titles can be included raw without the need for URL encoding, as this step is taken care of by the program.
["Hammer", "Zulrah/Strategies"]
This could potentially be expanded in the future to allow the user to specify a section to scrape on a per-page basis, e.g. `{"page": "Hammer", "section": "Uses"}`, but I have no plans for that now.
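
As a rough illustration, handling a manifest could look something like the sketch below. This is an assumption about the general approach, not the project's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"os"
)

func main() {
	// The manifest is just a JSON array of page titles.
	data, err := os.ReadFile("manifest.json")
	if err != nil {
		panic(err)
	}

	var pages []string
	if err := json.Unmarshal(data, &pages); err != nil {
		panic(err)
	}

	// Titles like "Zulrah/Strategies" are stored raw in the manifest;
	// escaping happens in the program before querying the wiki API.
	for _, p := range pages {
		fmt.Println(url.QueryEscape(p)) // e.g. "Zulrah%2FStrategies"
	}
}
```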
## FAQ
**Will you ever fix the logo alignment?**
No 👍
## Contribution
We welcome contributions! If you'd like to help out, please follow these steps:
- Fork the repository
- Create a new branch for your feature or bug fix
- Make your changes and commit them with descriptive messages
- Push your changes to your forked repository
- Submit a pull request to the main repository
## Roadmap
- Multi-language support
- Fuzzy-find pages (low priority)
- Fuzzy-find sections (low priority)
- Add more export formats
- Link preservation
- Table parsing
- List parsing
- Reference parsing and potentially BibTeX export? Could have a `--references` flag
- Tests!
- Adding more wikis (and the confluence backend)
- Proper SemVer
- Add a configuration file for setting default behaviour (for less verbosity)