Batch LLM with rate limiting

Prevents a rate-limit failure at job 847 of 10,000: throttle paces filenames so each API call stays within the rate limit.

The problem

You have 10,000 documents to process with an LLM. Without rate limiting, you’ll hit the API’s requests-per-minute limit around document 60 and get HTTP 429 (Too Many Requests) errors on the calls that follow. Without token gating, oversized documents waste API calls that fail or return truncated results. Without result storage, a crash at document 5,000 means starting over.

How the pipeline works

ls emits one filename per line. vrk throttle --rate 60/m releases filenames at 60 per minute. The while loop reads each filename as it arrives and processes it:

vrk tok --check 8000 verifies that the document fits in the context window. If it does not, the command exits with status 1 and the loop continues to the next file.
vrk prompt sends the document to the LLM and prints the response.
vrk kv set stores the result keyed by filename.
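
The three steps above can be sketched as one shell function with the token gate made explicit. This is a minimal sketch, not the documented pipeline: process_one is a hypothetical helper name, and it assumes that vrk tok --check exits non-zero for oversized input, that vrk prompt reads the document on stdin, and that vrk kv set KEY reads the value to store from stdin, as the steps above suggest.

```shell
# process_one FILE: gate, prompt, and store for a single document.
# Hypothetical helper; the vrk subcommands are the ones described above.
process_one() {
  f=$1
  key="result:$(basename "$f")"

  # Token gate: assumed to exit non-zero when the document is over the limit.
  if ! vrk tok --check 8000 < "$f" > /dev/null; then
    printf 'skipped (over token limit): %s\n' "$f" >&2
    return 0
  fi

  # Send the document to the LLM and store the response under the key.
  vrk prompt < "$f" | vrk kv set "$key"
}
```

Testing the gate with an explicit if means an oversized document never reaches vrk prompt, and the skip is logged to stderr instead of passing silently.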

Because throttle controls the rate at which filenames enter the loop, each API call is paced. Because results are stored in vrk kv, you can check which documents have already been processed and skip them on rerun.
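
To make the pacing idea concrete, here is a plain-shell stand-in that re-emits lines with a fixed delay between them. It is an illustration only: pace_lines is a hypothetical function, not how vrk throttle is implemented, and it ignores details such as bursts or clock drift that a real throttle may handle.

```shell
# pace_lines SECONDS: copy stdin to stdout, waiting SECONDS between lines.
# Hypothetical stand-in for illustration; not the vrk throttle implementation.
pace_lines() {
  interval=$1
  first=1
  while IFS= read -r line; do
    # Sleep before every line except the first, so line N leaves
    # roughly (N - 1) * interval seconds after the first one.
    if [ "$first" -eq 0 ]; then
      sleep "$interval"
    fi
    first=0
    printf '%s\n' "$line"
  done
}

# Usage sketch (an interval of 1 second is roughly 60 filenames per minute):
#   ls docs/*.md | pace_lines 1 | while read -r f; do echo "processing $f"; done
```

Because the loop downstream can only read a filename once it has been released, the delay upstream bounds the API call rate. At 60 per minute, 10,000 filenames need at least 10,000 / 60 ≈ 167 minutes to pass through.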

Making it resumable

Add a cache check at the start of each iteration:

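ls docs/*.md | vrk throttle --rate 60/m \
  | while read -r f; do
      # One kv key per document, derived from the filename.
      KEY="result:$(basename "$f")"
      # Cache check: skip documents that already have a stored result.
      vrk kv get "$KEY" >/dev/null 2>&1 && continue
      # Token gate, then LLM call, then store the response under KEY.
      cat "$f" | vrk tok --check 8000 \
        | vrk prompt \
        | vrk kv set "$KEY"
    done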

If the script crashes at document 5,000, restart it. The first 4,999 results are already in kv, so those documents are skipped with a quick lookup and no further API call.
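
In the loop above the cache check runs after throttle, so on a rerun the already-processed filenames presumably still pass through the throttle at the paced rate. One possible variation, sketched here under that assumption, filters out finished documents before throttling. The pending function is a hypothetical helper; it relies only on vrk kv get exiting non-zero for a missing key, which the cache check above already depends on.

```shell
# pending FILE...: print the files that have no stored result yet.
# Hypothetical helper built on the same kv key scheme as the loop above.
pending() {
  for f in "$@"; do
    key="result:$(basename "$f")"
    # Assumes vrk kv get exits non-zero when the key is missing.
    vrk kv get "$key" >/dev/null 2>&1 || printf '%s\n' "$f"
  done
}

# Usage sketch: throttle only the documents that still need an API call.
#   pending docs/*.md | vrk throttle --rate 60/m | while read -r f; do ...; done
```

Running pending docs/*.md | wc -l before a rerun also gives a quick count of how many documents remain.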
