github.com/ebarped/sego
0.0.0-20240626082504-c89d53721976
Repository: https://github.com/ebarped/sego.git
Documentation: pkg.go.dev

# README

sego

Search Engine written in Go.

This engine indexes the Linux API documentation stored in the linux-docs folder inside the linux-kernel-docs.tgz archive, using the TF-IDF method.

Also, it can:

  • Accept queries about the documents through an API.
  • Accept queries about the documents through a web UI.

Documentation

Wikipedia

Notes:

  • For term frequency, we use the raw count weighting scheme.
  • For inverse document frequency, we use the smooth weighting scheme.

Run

  • Index files:
go run main.go -index
  • Serve files:
go run main.go -serve
  • Query the server:
curl 'localhost:4000/search?query=memory%20management'
  • Specify the result count (defaults to 5):
curl 'localhost:4000/search?query=memory%20management&count=10'

Frontend

cd ui
npm install
npm run dev

Inner workings

  • Index: parse the .html docs into a JSON file that records, for each document, the occurrences of every word inside it.
  • Serve: load the JSON file and apply the TF-IDF algorithm to the search terms.

TODO

  • enable debug logs
  • try changing representation format to a more performant one
  • docker/docker-compose

Indexed files

We index the Linux kernel documentation. We obtained these docs from the Linux repo:

git clone --depth 1 https://github.com/torvalds/linux.git
cd linux
make htmldocs

Now, inside Documentation/output, there will be all the docs in .html format.