Cache LLM response
Avoids duplicate API calls for identical prompts: the request hash keys the cache, so reruns are free.
Send a prompt, capture the request hash, and store the response in vrk kv.
Pipeline
# Call the LLM once and capture the JSON envelope
RESULT=$(cat doc.txt | vrk prompt --json)
# Extract the deterministic request hash to use as the cache key
HASH=$(echo "$RESULT" | jq -r '.request_hash')
# Store the response text under that key
echo "$RESULT" | jq -r '.response' | vrk kv set "cache:$HASH"
The problem
LLM API calls are slow and expensive. If you’re running the same prompt against the same document twice - during development, in a retry loop, or in a pipeline that restarts - you’re paying for a response you already have.
How the pipeline works
vrk prompt --json returns the LLM response as a JSON envelope that includes a request_hash: a deterministic hash of the model, system prompt, and input. Use this hash as the cache key in vrk kv. On the next run, compute the input hash with vrk tok --json and check the cache before calling vrk prompt:
# Hash the input and look for a cached response under that key
HASH=$(cat doc.txt | vrk tok --json | jq -r '.hash // empty')
CACHED=$(vrk kv get "cache:$HASH" 2>/dev/null)
if [ -n "$CACHED" ]; then
  # Cache hit: reuse the stored response, no API call
  echo "$CACHED"
else
  # Cache miss: call the LLM, store the response under its request hash, and print it
  RESULT=$(cat doc.txt | vrk prompt --json)
  HASH=$(echo "$RESULT" | jq -r '.request_hash')
  RESPONSE=$(echo "$RESULT" | jq -r '.response')
  echo "$RESPONSE" | vrk kv set "cache:$HASH"
  echo "$RESPONSE"
fi
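The same check drops straight into a batch job that restarts: files whose responses are already cached are skipped, so a rerun only pays for the inputs it has not seen. A minimal sketch, assuming a docs/*.txt input glob (the glob is a made-up example):
for f in docs/*.txt; do
  HASH=$(cat "$f" | vrk tok --json | jq -r '.hash // empty')
  CACHED=$(vrk kv get "cache:$HASH" 2>/dev/null)
  # Only call the API for files that are not already cached
  if [ -z "$CACHED" ]; then
    RESULT=$(cat "$f" | vrk prompt --json)
    HASH=$(echo "$RESULT" | jq -r '.request_hash')
    echo "$RESULT" | jq -r '.response' | vrk kv set "cache:$HASH"
  fi
done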
Why this works
vrk kv is a SQLite-backed key-value store that lives on disk. No Redis, no external service. The cache survives process restarts, script reruns, and pipeline failures. It’s local and instant - a kv lookup takes microseconds compared to seconds for an API call.
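A quick way to see that persistence in action, using a throwaway key (demo:greeting is a made-up example):
# Store a value now...
echo "hello" | vrk kv set "demo:greeting"
# ...and read it back later, from a different script run or after a reboot
vrk kv get "demo:greeting"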