# README
I want an eval tool for my geppetto prompts:
Input:
- eval dataset JSON file
- prompt template

Output:
- set of eval metrics

Pipeline: dataset + template -> LLM calls -> compute accuracy -> eval results
## Step 0

- create a glazed command for evals
- generate mock rows for eval results (see the sketch below)
- wrap it as a command-line tool
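A minimal sketch of what step 0's mock-row generation might look like. The field names and the `makeMockRows` helper are assumptions for illustration, and the actual glazed command wiring is omitted here; this only shows the shape of the fake rows the command could emit while the CLI plumbing is being built.

```go
package main

import (
	"fmt"
	"time"

	"github.com/google/uuid"
)

// makeMockRows fabricates eval-result rows so the command-line plumbing can be
// exercised before any real LLM calls exist. Field names are assumptions, not
// a fixed schema.
func makeMockRows(n int) []map[string]interface{} {
	rows := make([]map[string]interface{}, 0, n)
	for i := 0; i < n; i++ {
		rows = append(rows, map[string]interface{}{
			"id":     uuid.New().String(),
			"input":  fmt.Sprintf("mock input %d", i),
			"answer": fmt.Sprintf("mock answer %d", i),
			"score":  1.0,
			"date":   time.Now().Format(time.RFC3339),
		})
	}
	return rows
}

func main() {
	for _, row := range makeMockRows(3) {
		fmt.Println(row)
	}
}
```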
## Step 1

- load an eval dataset from `eval.json` (see the loading sketch after this list)
  - array of objects
  - each object has:
    - input: `map[string]interface{}`
    - golden answer: `interface{}`
- iterate over each entry in `eval.json`
- load a prompt command from `complaint.yaml`
- interpolate the `complaint.yaml` command with the entry's input
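A sketch of one possible `eval.json` shape and how it could be loaded. The key names (`input`, `golden_answer`) and the example complaint data are assumptions, not a fixed schema; in the real tool the file would be read from the `--dataset` flag instead of being embedded as a string.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EvalEntry mirrors the dataset description above; the JSON key names are
// assumptions, not a fixed schema.
type EvalEntry struct {
	Input        map[string]interface{} `json:"input"`
	GoldenAnswer interface{}            `json:"golden_answer"`
}

// exampleDataset shows one possible shape of eval.json: an array of objects,
// each with an input map and a golden answer.
const exampleDataset = `[
  {
    "input": {"complaint": "My package arrived two weeks late."},
    "golden_answer": "shipping_delay"
  },
  {
    "input": {"complaint": "I was charged twice for one order."},
    "golden_answer": "billing_error"
  }
]`

func main() {
	var entries []EvalEntry
	if err := json.Unmarshal([]byte(exampleDataset), &entries); err != nil {
		panic(err)
	}
	// Iterate over each entry, as described in step 1.
	for i, e := range entries {
		fmt.Printf("entry %d: input=%v golden=%v\n", i, e.Input, e.GoldenAnswer)
	}
}
```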
### Running the actual LLM inference

- run it
- load the API key, etc.
- create the chat step (see the sketch after this list)
- get the step result
- store the metadata in the result JSON
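A rough sketch of the per-entry inference loop. The `ChatStep` interface, `echoStep`, and `renderPrompt` are hypothetical stand-ins, not geppetto's actual API; the real implementation would build the step from `complaint.yaml` plus the loaded API key and do proper template interpolation.

```go
package main

import (
	"context"
	"fmt"
)

// ChatStep is a hypothetical stand-in for whatever chat step the real
// implementation builds from complaint.yaml and the API key.
type ChatStep interface {
	Run(ctx context.Context, prompt string) (answer string, metadata map[string]interface{}, err error)
}

// echoStep is a fake step used only so this sketch runs end to end.
type echoStep struct{}

func (echoStep) Run(_ context.Context, prompt string) (string, map[string]interface{}, error) {
	return "fake answer for: " + prompt, map[string]interface{}{"model": "fake"}, nil
}

// renderPrompt is a placeholder for the real complaint.yaml interpolation.
func renderPrompt(template string, input map[string]interface{}) string {
	return fmt.Sprintf("%s\n\ninput: %v", template, input)
}

// runEntry interpolates the prompt for one dataset entry, runs the chat step,
// and returns the answer together with the step's metadata.
func runEntry(ctx context.Context, step ChatStep, template string, input map[string]interface{}) (string, map[string]interface{}, error) {
	prompt := renderPrompt(template, input)
	return step.Run(ctx, prompt)
}

func main() {
	answer, meta, err := runEntry(context.Background(), echoStep{},
		"Classify this complaint.", map[string]interface{}{"complaint": "late delivery"})
	if err != nil {
		panic(err)
	}
	fmt.Println(answer, meta)
}
```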
### Postprocessing the LLM response

- store the answer
- store the LLM metadata
- store the date
- give each result a unique UUID (see the result record sketch below)
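One possible shape for a stored result record covering the points above (answer, LLM metadata, date, UUID). The exact field names and the example metadata values are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"

	"github.com/google/uuid"
)

// EvalResult is one possible shape for a stored result row; field names are
// assumptions based on the postprocessing list above.
type EvalResult struct {
	ID           string                 `json:"id"`
	Date         time.Time              `json:"date"`
	Input        map[string]interface{} `json:"input"`
	GoldenAnswer interface{}            `json:"golden_answer"`
	Answer       string                 `json:"answer"`
	LLMMetadata  map[string]interface{} `json:"llm_metadata"`
}

func main() {
	result := EvalResult{
		ID:           uuid.New().String(), // unique UUID per result
		Date:         time.Now(),
		Input:        map[string]interface{}{"complaint": "late delivery"},
		GoldenAnswer: "shipping_delay",
		Answer:       "shipping_delay",
		LLMMetadata:  map[string]interface{}{"model": "some-model", "total_tokens": 123},
	}
	out, _ := json.MarshalIndent(result, "", "  ")
	fmt.Println(string(out))
}
```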
```bash
go run ./cmd/eval --dataset eval.json --command complaint.yaml
```
## Step 2

- run a grading function against the LLM answer
- take a JavaScript grading script (see the sketch after the command below)
- compute an accuracy score

```bash
go run ./cmd/eval --dataset eval.json --command complaint.yaml --scoring score.js
```
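A sketch of how `score.js` could be called from Go. Using goja as the embedded JavaScript engine is my assumption (the spec only says "a JavaScript grading script"), and the `grade(answer, golden)` contract is hypothetical; a trivial script would be something like `function grade(answer, golden) { return answer === golden ? 1 : 0; }`.

```go
package main

import (
	"fmt"
	"os"

	"github.com/dop251/goja" // assumed choice of embedded JS engine
)

// gradeWithScript loads score.js and calls a grade(answer, golden) function
// expected to return a number between 0 and 1. Both the engine and the
// grade() contract are assumptions, not part of the original spec.
func gradeWithScript(scriptPath string, answer, golden interface{}) (float64, error) {
	src, err := os.ReadFile(scriptPath)
	if err != nil {
		return 0, err
	}
	vm := goja.New()
	if _, err := vm.RunString(string(src)); err != nil {
		return 0, err
	}
	grade, ok := goja.AssertFunction(vm.Get("grade"))
	if !ok {
		return 0, fmt.Errorf("score.js does not define a grade() function")
	}
	res, err := grade(goja.Undefined(), vm.ToValue(answer), vm.ToValue(golden))
	if err != nil {
		return 0, err
	}
	return res.ToFloat(), nil
}

// accuracy averages per-entry scores into a single dataset-level metric.
func accuracy(scores []float64) float64 {
	if len(scores) == 0 {
		return 0
	}
	sum := 0.0
	for _, s := range scores {
		sum += s
	}
	return sum / float64(len(scores))
}

func main() {
	score, err := gradeWithScript("score.js", "shipping_delay", "shipping_delay")
	if err != nil {
		panic(err)
	}
	fmt.Println("score:", score, "accuracy:", accuracy([]float64{score, 1, 0}))
}
```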
## Step 3

- REST API
- web UI (Braintrust-inspired)
- make runs cancellable when pressing Ctrl-C (see the sketch after this list)
- show the full conversation when expanding an entry
- rerun a single conversation and get a streaming completion
- import/export datasets
- import/export/manage prompts
- logging + monitoring of test runs
- streaming display of running datasets
- edit a prompt and save new revisions
- switch between different versions and compare results, metrics, and accuracy
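For the Ctrl-C item, a minimal sketch using `signal.NotifyContext` from the standard library; `runDataset` is a hypothetical placeholder for the eval loop, which just has to check the context between entries so a run can stop cleanly.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"time"
)

// runDataset stands in for the eval loop; it checks ctx between entries so a
// run can be interrupted partway through.
func runDataset(ctx context.Context) error {
	for i := 0; i < 100; i++ {
		select {
		case <-ctx.Done():
			return ctx.Err()
		default:
		}
		time.Sleep(100 * time.Millisecond) // placeholder for one LLM call
	}
	return nil
}

func main() {
	// Cancel the context when the user presses Ctrl-C.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
	defer stop()

	if err := runDataset(ctx); err != nil {
		fmt.Println("run stopped:", err)
	}
}
```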
## Features

- caching of inference (see the cache-key sketch below)
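One way the inference cache could work: key on a hash of everything that influences an LLM call (here just model plus rendered prompt, which is an assumption) so identical calls on reruns skip the API. This is an in-memory sketch; a real cache would likely persist to disk.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// cacheKey derives a stable key from the inputs that determine an LLM call;
// the exact fields included are an assumption.
func cacheKey(model, prompt string) string {
	h := sha256.Sum256([]byte(model + "\x00" + prompt))
	return hex.EncodeToString(h[:])
}

// inferenceCache is a minimal in-memory answer cache keyed by cacheKey.
type inferenceCache struct {
	mu      sync.Mutex
	answers map[string]string
}

func (c *inferenceCache) get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	answer, ok := c.answers[key]
	return answer, ok
}

func (c *inferenceCache) put(key, answer string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.answers[key] = answer
}

func main() {
	cache := &inferenceCache{answers: map[string]string{}}
	key := cacheKey("some-model", "Classify this complaint: late delivery")
	cache.put(key, "shipping_delay")
	if answer, ok := cache.get(key); ok {
		fmt.Println("cache hit:", answer)
	}
}
```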