github.com/hypergopher/downcache

Type: module, package

Version: 0.0.2

Repository: https://github.com/hypergopher/downcache.git

Documentation: pkg.go.dev

# README

DownCache

Status: Experimental

DownCache is a Go package that helps you organize, index, and search collections of Markdown files. It’s useful for projects with many Markdown posts, such as static site generators, documentation systems, or content management systems. It uses the yuin/goldmark package to convert Markdown to HTML.

What it does

Indexes Markdown files, including their frontmatter metadata
Searches across all your Markdown content
Categorizes posts (e.g., pages, posts, custom types) based on where they're stored and what's in their frontmatter
Supports both full and incremental reindexing to keep your index up to date
Uses BBolt to cache the Markdown content and frontmatter for fast access
Uses Bleve to index and search through the Markdown content
Can be extended with custom post types and frontmatter parsing rules
Takes inspiration from IndieWeb microformats for frontmatter naming and structure
Converts Markdown content to HTML using the yuin/goldmark package
Supports creating, updating, and deleting Markdown files and updating the indexes accordingly

It's designed to augment an existing Go application, so you can use it to add search and organization features to your Markdown-based projects. It doesn't handle routing, serving, or rendering HTML but can help you build those features into your application. It does not provide a user interface for managing content but can be used to build one.

Why?

I had a series of sites to build and wanted to use markdown files to store the content. I also wanted to search through the content and organize it. In addition, I wanted to add custom metadata to the Markdown files and make that metadata searchable. This package enables me to keep the posts in plaintext markdown files and still search through them from a web interface without needing a separate database or external search engine. The Markdown indexing and searching are ephemeral and can be rebuilt at any time.

Issues I wanted to address:

Keep all content in plaintext markdown files, as the source of truth
Use a fast, embedded database for caching converted content and metadata
Find specific content across many files quickly (e.g., searching full-text, tags, or other metadata)
Organize posts based on their type or other metadata
Handle different types of posts (such as articles, pages, notes, or bookmarks) in one system
Use microformats for frontmatter to make it easier to work with the data
Convert markdown content to HTML for display
Have the ability to create, update, and delete Markdown posts programmatically, and have the indexes updated accordingly

Getting started

Here's a quick example of how to use it:

package main

import (
	"log"

	"github.com/hypergopher/downcache"
)

func main() {
	if err := run(); err != nil {
		log.Fatal(err)
	}
}

func run() error {
	// A directory with markdown files
	markPath := "/path/to/markdown"

	// A directory to store the bbolt and bleve indexes
	dataPath := "/path/to/data"

	// A set of authors to associate with the markdown files
	authors := map[string]downcache.Author{
		"author1": {
			Name:      "Author 1",
			AvatarURL: "/images/author1.jpg",
			Links: []downcache.AuthorLink{
				{
					Name: "Mastodon",
					Icon: "mastodon",
					URL:  "https://example.social/@author1",
				},
			},
		},
	}

	// A set of taxonomies to associate with the markdown files
	taxonomies := map[string]string{
		"tags":       "tag",
		"categories": "category",
	}

	hd, err := downcache.NewDownCache(downcache.Options{
		MarkDir:      markPath,
		DataDir:      dataPath,
		Authors:      authors,
		Taxonomies:   taxonomies,
		ClearIndexes: true,
		Reindex:      true,
		Logger:       nil,
	})
	if err != nil {
		return err
	}
	// Close only after a successful open, so hd is never nil here
	defer hd.Close()

	// Index everything
	hd.Reindex()

	// Get a single post by its slug path
	post, err := hd.GetPost("path/to/post-slug")
	if err != nil {
		return err
	}
	log.Printf("post: %+v", post)

	// Get all articles (paginated)
	paginator, err := hd.GetPosts(downcache.FilterOptions{
		PageNum:          1,
		PageSize:         10,
		FilterByPostType: downcache.PostTypePost,
	})
	if err != nil {
		return err
	}
	log.Printf("articles: %+v", paginator)

	// Search for posts
	paginator, err = hd.GetPosts(downcache.FilterOptions{
		PageNum:          1,
		PageSize:         10,
		FilterByPostType: downcache.PostTypePost,
		FilterBySearch:   "your search query",
	})
	if err != nil {
		return err
	}
	log.Printf("search results: %+v", paginator)

	// Get posts by tag
	paginator, err = hd.GetPosts(downcache.FilterOptions{
		PageNum:    1,
		PageSize:   10,
		FilterType: downcache.FilterTypeTaxonomy,
		FilterKey:  "tags",
		FilterTerm: "tag3",
	})
	if err != nil {
		return err
	}
	log.Printf("posts by tag: %+v", paginator)

	// Get posts by author
	paginator, err = hd.GetPosts(downcache.FilterOptions{
		PageNum:    1,
		PageSize:   10,
		FilterType: downcache.FilterTypeAuthor,
		FilterTerm: "author1",
	})
	if err != nil {
		return err
	}
	log.Printf("posts by author: %+v", paginator)

	return nil
}

Where you might use this

In a static site generator to add search and help organize content
For a documentation system to manage and search through lots of docs
As part of a content management system for handling blog posts or articles
To create searchable collections of markdown-based knowledge articles

License

This project is licensed under the Apache 2.0 License; see the LICENSE file for details.

Frontmatter

The frontmatter for each markdown file can be in YAML or TOML format. Here's an example in each format: YAML (delimited by ---) followed by TOML (delimited by +++).

---
name: "Page 1"
summary: "Page 1 summary"
status: "published"
published: "2021-01-01T00:00:00Z"
authors:
  - author1
taxonomies:
  categories:
    - cat1
    - cat2
  tags:
    - tag1
    - tag2
---

+++
name = "Page 1"
summary = "Page 1 summary"
status = "published"
published = "2021-01-01T00:00:00Z"
authors = ["author1"]

[taxonomies]
categories = ["cat1", "cat2"]
tags = ["tag1", "tag2"]

[properties]
key1 = "value1"
key2 = "value2"
+++
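
The two delimiter styles above can be told apart by the first line of the file. The following is a minimal sketch of that idea using only the standard library; splitFrontmatter is a hypothetical helper written for illustration, not DownCache's actual parser, and it leaves decoding the YAML or TOML text to a dedicated library.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFrontmatter is a hypothetical helper (not part of DownCache).
// It returns the frontmatter format ("yaml", "toml", or "" when none is
// found), the raw frontmatter text, and the remaining Markdown body.
func splitFrontmatter(src string) (format, meta, body string) {
	delims := map[string]string{"---": "yaml", "+++": "toml"}
	lines := strings.Split(src, "\n")

	// The opening delimiter must be the very first line.
	open := strings.TrimSpace(lines[0])
	kind, ok := delims[open]
	if !ok {
		return "", "", src
	}

	// The block ends at the next line that repeats the same delimiter.
	for i := 1; i < len(lines); i++ {
		if strings.TrimSpace(lines[i]) == open {
			meta = strings.Join(lines[1:i], "\n")
			body = strings.Join(lines[i+1:], "\n")
			return kind, meta, body
		}
	}

	// No closing delimiter: treat the whole input as body.
	return "", "", src
}

func main() {
	doc := "+++\nname = \"Page 1\"\n+++\n# Hello\n"
	format, meta, body := splitFrontmatter(doc)
	fmt.Printf("format=%s meta=%q body=%q\n", format, meta, body)
}
```

A file without a recognized opening delimiter, or without a closing one, is returned unchanged as body in this sketch.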

Available frontmatter fields

Frontmatter field names are modeled on the h-entry microformat. The following fields are available:

authors (array of strings): The authors of the post. Each string represents a key in the Authors map passed into DownCache.
featured (bool): Whether the post is featured
photo (string): The URL of a featured image
name (string): The name/title of the post
properties (map[string]any): Arbitrary key-value pairs for additional metadata, such as extra microformat properties.
published (time.Time): The time the post was published. Use an RFC 3339 timestamp such as "2006-01-02T00:00:00Z" or a date-only value such as "2006-01-02".
status (string): The status of the post (draft or published). If empty, the post is considered published.
subtitle (string): A subtitle for the post
summary (string): A summary of the post
taxonomies (map[string][]string): The taxonomies associated with the post
visibility (string): The visibility of the post (public, private, or unlisted). If empty, the post is considered public.
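
For the published field in the list above, two layouts are accepted. A minimal sketch of how a caller might parse both with the standard time package follows; parsePublished is a hypothetical name, and this is not necessarily how DownCache parses the value internally.

```go
package main

import (
	"fmt"
	"time"
)

// parsePublished is a hypothetical helper (not part of DownCache).
// It tries a full RFC 3339 timestamp first, then a date-only value.
func parsePublished(s string) (time.Time, error) {
	for _, layout := range []string{time.RFC3339, "2006-01-02"} {
		if parsed, err := time.Parse(layout, s); err == nil {
			return parsed, nil
		}
	}
	return time.Time{}, fmt.Errorf("unrecognized published value %q", s)
}

func main() {
	for _, s := range []string{"2021-01-01T00:00:00Z", "2021-01-01", "yesterday"} {
		parsed, err := parsePublished(s)
		fmt.Println(s, "->", parsed, err)
	}
}
```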

When working with status (published, draft) or visibility (public, private, unlisted), it is up to the caller to interpret these values as needed and to show/hide posts accordingly.
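
Because interpretation is left to the caller, an application typically needs a small predicate of its own. The sketch below shows one possible policy, assuming a post should appear in public listings only when it is published and public; the meta type and isListable function are hypothetical stand-ins, not DownCache types.

```go
package main

import "fmt"

// meta mirrors only the two frontmatter fields discussed above. It is a
// hypothetical stand-in, not DownCache's PostMeta type.
type meta struct {
	Status     string
	Visibility string
}

// isListable reports whether a post should appear in public listings.
// Empty values fall back to the documented defaults: published and public.
func isListable(m meta) bool {
	status := m.Status
	if status == "" {
		status = "published"
	}
	visibility := m.Visibility
	if visibility == "" {
		visibility = "public"
	}
	return status == "published" && visibility == "public"
}

func main() {
	posts := []meta{
		{},
		{Status: "draft"},
		{Status: "published", Visibility: "unlisted"},
	}
	for _, p := range posts {
		fmt.Printf("%+v listable=%v\n", p, isListable(p))
	}
}
```

An unlisted post is hidden from listings here but could still be served by direct URL; that choice belongs to the application.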

(Optional) Dates in filenames

To include a date in a filename, use the following format:

YYYY-MM-DD-post-slug.md

This will allow DownCache to extract the date from the filename and use it as the published date for the post.

If a published field is present in the frontmatter, it will take precedence over the date in the filename.

The published date will not be set if no date is found in the filename or frontmatter.
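
The precedence rules above can be sketched as follows. This is an illustration under assumptions, not DownCache's implementation: publishedDate and datedName are hypothetical names, the filename is assumed to be a base name without directories, and a nil pointer stands for a missing published field.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// datedName matches base file names such as 2024-08-21-post-slug.md.
var datedName = regexp.MustCompile(`^(\d{4}-\d{2}-\d{2})-.+\.md$`)

// publishedDate is a hypothetical helper (not part of DownCache).
// The frontmatter value wins when present; otherwise the date is taken
// from the filename; otherwise no date is set and ok is false.
func publishedDate(filename string, frontmatter *time.Time) (time.Time, bool) {
	if frontmatter != nil && !frontmatter.IsZero() {
		return *frontmatter, true
	}
	if m := datedName.FindStringSubmatch(filename); m != nil {
		if d, err := time.Parse("2006-01-02", m[1]); err == nil {
			return d, true
		}
	}
	return time.Time{}, false
}

func main() {
	for _, name := range []string{"2024-08-21-post-slug.md", "post-slug.md"} {
		d, ok := publishedDate(name, nil)
		fmt.Println(name, d.Format("2006-01-02"), ok)
	}
}
```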

The slug always includes the date from the filename. The following methods return alternative slugs that either drop the date or move it into leading path segments. Depending on your needs, dropping the date may cause conflicts if multiple posts share the same slug but have different dates.

SlugWithoutDate() on a Post struct. For example, foobar/2024-08-21-post-slug would become foobar/post-slug.
SlugWithYear() on a Post struct. For example, foobar/2024-08-21-post-slug would become 2024/foobar/post-slug.
SlugWithYearMonth() on a Post struct. For example, foobar/2024-08-21-post-slug would become 2024/08/foobar/post-slug.
SlugWithYearMonthDay() on a Post struct. For example, foobar/2024-08-21-post-slug would become 2024/08/21/foobar/post-slug.
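
The four transformations above follow one pattern: strip the date prefix from the last path segment, then optionally prepend the year, month, and day as leading segments. Here is a minimal standalone sketch of that pattern; slugWithDateParts is a hypothetical function operating on plain strings, not the methods on the Post struct.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
)

// datePrefix matches a final path segment such as 2024-08-21-post-slug.
var datePrefix = regexp.MustCompile(`^(\d{4})-(\d{2})-(\d{2})-(.+)$`)

// slugWithDateParts is a hypothetical helper (not part of DownCache).
// parts selects how many date components (0 to 3: none, year, year and
// month, or year, month, and day) are moved to the front of the slug.
// Slugs without a date prefix are returned unchanged.
func slugWithDateParts(slug string, parts int) string {
	dir, base := path.Split(slug)
	m := datePrefix.FindStringSubmatch(base)
	if m == nil {
		return slug
	}
	if parts < 0 {
		parts = 0
	}
	if parts > 3 {
		parts = 3
	}
	elems := append([]string{}, m[1:1+parts]...)
	elems = append(elems, dir, m[4])
	return path.Join(elems...)
}

func main() {
	slug := "foobar/2024-08-21-post-slug"
	for parts := 0; parts <= 3; parts++ {
		fmt.Println(slugWithDateParts(slug, parts))
	}
}
```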

TODO

Improve documentation
Implement incremental reindexing
Align better with microformat properties

Possible future features

# Functions

DefaultPostTypes

Deserialize

Deserialize deserializes the byte slice to a post.

EstimateReadingTime

EstimateReadingTime estimates the reading time of the content.
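
The listing does not say how the estimate is computed. A common approach, shown here only as a sketch, divides the word count by an assumed reading speed; readingMinutes and the 200 words-per-minute figure are assumptions for illustration and may differ from what EstimateReadingTime does.

```go
package main

import (
	"fmt"
	"strings"
)

// wordsPerMinute is an assumed average reading speed.
const wordsPerMinute = 200

// readingMinutes is a hypothetical helper (not part of DownCache).
// It counts whitespace-separated words and rounds up to whole minutes.
func readingMinutes(content string) int {
	words := len(strings.Fields(content))
	return (words + wordsPerMinute - 1) / wordsPerMinute
}

func main() {
	text := strings.Repeat("word ", 450)
	fmt.Printf("%d minute(s)\n", readingMinutes(text))
}
```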

GenerateETag

GenerateETag generates an ETag for the content.
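
The listing does not specify the algorithm. One common way to derive an ETag, sketched below, is to hash the content and quote a prefix of the digest; etagFor, the choice of SHA-256, and the 8-byte truncation are assumptions for illustration, not a description of GenerateETag.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// etagFor is a hypothetical helper (not part of DownCache).
// It returns a quoted, content-derived tag: identical content yields an
// identical tag, so it can be compared against an If-None-Match header.
func etagFor(content []byte) string {
	sum := sha256.Sum256(content)
	return `"` + hex.EncodeToString(sum[:8]) + `"`
}

func main() {
	fmt.Println(etagFor([]byte("<h1>Hello</h1>")))
	fmt.Println(etagFor([]byte("<h1>Hello!</h1>")))
}
```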

IsValidPostPath

MarkdownToPost

MarkdownToPost converts markdown content to a Post.

NewDownCache

NewLocalMarkdownFS

NewMemoryCacheStore

NewMemoryCacheStore creates a new MemoryCacheStore.

NewPaginator

NewPaginator returns a Paginator struct with the given parameters.

PostPathID

PostPathID returns the unique identifier for a page of the specified type and slug.

SlugifyPath

SlugifyPath transforms a full OS path into a slugified path.

# Constants

FilterTypeAny

FilterTypeAuthor

FilterTypeTaxonomy

FrontmatterTOML

FrontmatterYAML

PostTypeKeyAny

PostTypeKeyArticle

PostTypeKeyBookmark

PostTypeKeyLink

PostTypeKeyNote

PostTypeKeyPage

# Variables

ErrInvalidPostMeta

# Structs

Author

AuthorLink

DefaultMarkdownProcessor

DefaultMarkdownProcessor is the default implementation of the MarkdownProcessor interface.

DownCache

DownCache is the main entry point for the markdown cache system.

FilterOptions

FilterOptions contains the options to filter posts.

KeyValueFilter

LocalMarkdownFS

LocalMarkdownFS implements MarkdownFS for the local file system.

MemoryCacheStore

MemoryCacheStore implements CacheStore interface using in-memory storage.

Paginator

Paginator is a struct that holds information about pagination, such as the total number of pages, the current page, the next and previous pages, the page size, whether there are more pages, whether there are posts, the total number of posts, all posts, featured posts, non-featured posts, and whether the paginator is visible.
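
Several of the values listed in that description follow from three inputs: the current page number, the page size, and the total number of posts. The arithmetic can be sketched as below; the page type and paginate function are hypothetical and cover only a subset of the fields, so they should not be read as the Paginator struct's definition.

```go
package main

import "fmt"

// page is a hypothetical, reduced stand-in for a paginator. A Prev or
// Next value of 0 means there is no such page.
type page struct {
	Current    int
	PageSize   int
	TotalPosts int
	TotalPages int
	Prev       int
	Next       int
	HasMore    bool
}

// paginate derives page counts and neighbors. The fallback page size of
// 10 is an assumption for illustration.
func paginate(pageNum, pageSize, total int) page {
	if pageSize < 1 {
		pageSize = 10
	}
	if pageNum < 1 {
		pageNum = 1
	}
	// Integer ceiling division: 25 posts at 10 per page gives 3 pages.
	totalPages := (total + pageSize - 1) / pageSize

	p := page{Current: pageNum, PageSize: pageSize, TotalPosts: total, TotalPages: totalPages}
	if pageNum > 1 {
		p.Prev = pageNum - 1
	}
	if pageNum < totalPages {
		p.Next = pageNum + 1
		p.HasMore = true
	}
	return p
}

func main() {
	fmt.Printf("%+v\n", paginate(1, 10, 25))
	fmt.Printf("%+v\n", paginate(3, 10, 25))
}
```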

Post

Post represents a Markdown post.

PostMeta

PostMeta represents the frontmatter of a post.

SlugPath