0.0.0-20240626082504-c89d53721976
Repository: https://github.com/ebarped/sego.git
Documentation: pkg.go.dev
# README
sego
Search Engine written in Go.
This engine indexes the Linux API documentation stored in the linux-docs folder inside the linux-kernel-docs.tgz archive, using the TF-IDF method.
Also, it can:
- Accept queries about the documents through an API.
- Accept queries about the documents through a web frontend.
Documentation
- For Term Frequency, we use the raw count weighting scheme.
- For Inverse Document Frequency, we use the inverse document frequency smooth weighting scheme.
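To make the two weighting schemes concrete, here is a minimal Go sketch. It assumes the commonly tabulated definitions: raw count for TF, and log(N / (1 + n_t)) + 1 for the smooth IDF, where N is the number of documents and n_t is the number of documents containing the term. The function names and the in-memory layout (document name -> word -> count) are hypothetical and not taken from the sego source, whose exact formula may differ.

```go
package main

import (
	"fmt"
	"math"
)

// termFrequency returns the raw count of term in one document,
// given that document's word counts (hypothetical layout).
func termFrequency(term string, counts map[string]int) float64 {
	return float64(counts[term])
}

// inverseDocumentFrequency computes the smooth variant,
// log(N / (1 + n_t)) + 1, over all documents.
// Assumption: an empty corpus scores 0 rather than -Inf.
func inverseDocumentFrequency(term string, docs map[string]map[string]int) float64 {
	if len(docs) == 0 {
		return 0
	}
	containing := 0
	for _, counts := range docs {
		if counts[term] > 0 {
			containing++
		}
	}
	return math.Log(float64(len(docs))/float64(1+containing)) + 1
}

func main() {
	// Toy corpus, invented for illustration only.
	docs := map[string]map[string]int{
		"a.html": {"memory": 3, "page": 1},
		"b.html": {"scheduler": 2},
		"c.html": {"memory": 1},
	}
	idf := inverseDocumentFrequency("memory", docs)
	for name, counts := range docs {
		fmt.Printf("%s: %.3f\n", name, termFrequency("memory", counts)*idf)
	}
}
```

With this variant, a term that appears in every document gets an IDF slightly below 1 instead of 0, so very common terms are down-weighted but never eliminated.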
Run
- Index files:
go run main.go -index
- Serve files:
go run main.go -serve
- Query the server:
curl 'localhost:4000/search?query=memory%20management'
- Specify the result count (defaults to 5):
curl 'localhost:4000/search?query=memory%20management&count=10'
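The same queries can be issued from Go. The sketch below assumes only what the curl commands show: a /search endpoint on port 4000 with query and count parameters. The helper name searchURL is hypothetical, and since the response format is not documented here, the body is printed as-is.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strconv"
)

// searchURL builds the request URL for the /search endpoint.
// A count of 0 or less omits the parameter, leaving the server default.
func searchURL(base, query string, count int) string {
	params := url.Values{}
	params.Set("query", query)
	if count > 0 {
		params.Set("count", strconv.Itoa(count))
	}
	return base + "/search?" + params.Encode()
}

func main() {
	resp, err := http.Get(searchURL("http://localhost:4000", "memory management", 10))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("reading response failed:", err)
		return
	}
	// Print the raw response; its structure is not assumed here.
	fmt.Println(string(body))
}
```

Note that url.Values.Encode writes the space as + rather than %20; both forms decode to the same query string.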
Frontend
cd ui
npm install
npm run dev
Inner workings
- Index: parse the .html docs into a JSON file that records, for each document, the occurrences of every word in it.
- Serve: load the JSON file and apply the TF-IDF algorithm to the search terms.
TODO
- enable debug logs
- try changing representation format to a more performant one
- docker/docker-compose
Indexed files
We index the Linux kernel documentation. These docs were obtained from the Linux repo:
git clone --depth 1 https://github.com/torvalds/linux.git
cd linux
make htmldocs
Now, inside Documentation/output, there will be all the docs in .html format.