pinocchio v0.4.19
Repository: https://github.com/go-go-golems/pinocchio.git

# README

I want an eval tool for my geppetto prompts:

Input:

  • eval dataset JSON file
  • prompt template

Output:

  • set of eval metrics

dataset + template -> llm calls -> compute accuracy -> eval results

step 0

  • create a glazed command for evals
  • generate mock rows for eval results
  • wrap as command line tool

step 1

  • load an eval dataset from eval.json

    • an array of objects
    • each object has:
      • input: map[string]interface{}
      • golden answer: interface{}
  • iterate over each entry in eval.json

  • load a prompt from complaint.yaml

  • interpolate the complaint.yaml command template with each entry's input
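
The loading and interpolation steps above could be sketched like this. The `DatasetEntry` field names are an assumption about the eval.json schema, and `text/template` stands in for whatever templating the geppetto command files actually use:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"text/template"
)

// DatasetEntry mirrors one object in eval.json: an input map that is
// interpolated into the prompt, plus the expected golden answer.
type DatasetEntry struct {
	Input  map[string]interface{} `json:"input"`
	Golden interface{}            `json:"golden"`
}

// parseDataset decodes the eval dataset from JSON bytes.
func parseDataset(data []byte) ([]DatasetEntry, error) {
	var entries []DatasetEntry
	if err := json.Unmarshal(data, &entries); err != nil {
		return nil, err
	}
	return entries, nil
}

// interpolate renders a prompt template against one entry's input map.
func interpolate(tmpl string, input map[string]interface{}) (string, error) {
	t, err := template.New("prompt").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, input); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	raw := []byte(`[{"input":{"complaint":"my order never arrived"},"golden":"shipping"}]`)
	entries, err := parseDataset(raw)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		prompt, err := interpolate("Categorize this complaint: {{.complaint}}", e.Input)
		if err != nil {
			panic(err)
		}
		fmt.Println(prompt)
	}
}
```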

Running the actual LLM inference

  • run it
    • load the API key, etc...
    • create the chat step
    • get the step result
    • store the metadata in the result json

Postprocessing the LLM response

  • store the answer
    • store the LLM metadata
    • store the date
    • give it a unique UUID

go run ./cmd/eval --dataset eval.json --command complaint.yaml

step 2

  • run a grading function against the LLM answer
    • take a JavaScript grading script
  • compute an accuracy score

go run ./cmd/eval --dataset eval.json --command complaint.yaml --scoring score.js
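
However the JavaScript grading script produces its per-entry scores, the accuracy metric itself is just their average. A minimal aggregation sketch:

```go
package main

import "fmt"

// accuracy averages per-entry scores (each in [0, 1], e.g. 1.0 for a
// correct answer) into a single metric. Producing each score is the
// job of the score.js grading script; this only aggregates.
func accuracy(scores []float64) float64 {
	if len(scores) == 0 {
		return 0
	}
	sum := 0.0
	for _, s := range scores {
		sum += s
	}
	return sum / float64(len(scores))
}

func main() {
	fmt.Println(accuracy([]float64{1, 0, 1, 1})) // 0.75
}
```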

step 3

  • REST API

  • web UI (Braintrust-inspired)

    • make it cancellable when pressing Ctrl-C

    • show full conversation when expanding

    • rerun a single conversation and get streaming completion

    • import/export datasets

    • import/export/manage prompts

    • log + monitoring of test runs

    • streaming display of running datasets

    • edit prompt and save new revisions

    • switch between different versions and compare results and metrics and accuracy

features

  • caching of inference
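
Inference caching could key responses on a hash of the fully interpolated prompt, so reruns of an unchanged dataset skip paid LLM calls. An in-memory sketch (a real implementation would likely persist to disk):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// InferenceCache memoizes LLM answers keyed by a hash of the prompt.
type InferenceCache struct {
	entries map[string]string
}

func NewInferenceCache() *InferenceCache {
	return &InferenceCache{entries: map[string]string{}}
}

// cacheKey hashes the interpolated prompt so identical prompts,
// however long, share one fixed-size key.
func cacheKey(prompt string) string {
	sum := sha256.Sum256([]byte(prompt))
	return hex.EncodeToString(sum[:])
}

func (c *InferenceCache) Get(prompt string) (string, bool) {
	v, ok := c.entries[cacheKey(prompt)]
	return v, ok
}

func (c *InferenceCache) Put(prompt, answer string) {
	c.entries[cacheKey(prompt)] = answer
}

func main() {
	c := NewInferenceCache()
	c.Put("Categorize this complaint: my order never arrived", "shipping")
	if answer, ok := c.Get("Categorize this complaint: my order never arrived"); ok {
		fmt.Println("cache hit:", answer)
	}
}
```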
