Token-checked LLM call

Prevents silent truncation - the model never sees a prompt it can only half-fit.

Count tokens before sending to an LLM - abort if too large.

Pipeline

cat prompt.txt \
  | vrk tok --check 4000 \
  | vrk prompt --system "Summarise this."

The problem

LLMs have context windows. When your input exceeds the limit, the API either truncates silently or returns an error after you’ve already waited for the response and spent tokens on the request. In a pipeline, this is worse - downstream steps process garbage output without knowing the input was incomplete.

This is the most expensive bug in LLM pipelines because it looks like success. The API returns 200, the model responds, but the answer is based on a truncated document.

How the pipeline works

vrk tok --check 4000 reads stdin, counts tokens, and makes a decision. If the count is 4000 or under, it passes the text through unchanged to stdout. If over, it exits 1 with empty stdout. The gate closes - vrk prompt receives nothing and sends no request.

No wasted API call. No truncated output. No silent failure.
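
To see the gate on its own, run it outside the full pipeline and inspect the exit status - a quick check, with small.txt and big.txt as illustrative file names:

cat small.txt | vrk tok --check 4000; echo "exit: $?"
# under budget: the text is printed unchanged, exit: 0

cat big.txt | vrk tok --check 4000; echo "exit: $?"
# over budget: empty stdout, exit: 1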

When it fails

When the token count exceeds the budget, vrk tok exits 1 and the pipeline stops immediately. The exit code propagates - your shell script, CI job, or agent sees a non-zero exit and knows the step failed. You can catch this and handle it (chunk the document, summarize a section, or report the error) instead of processing bad output.
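
To catch the failure in a script, run the gate as its own step and branch on its exit code (in a single bash pipeline the shell reports only the last command's status unless set -o pipefail is enabled). A sketch - report.md, the budget, and the fallback are illustrative:

#!/usr/bin/env bash
# gate first, call the model only if the document fits the budget
if ! checked=$(cat report.md | vrk tok --check 4000); then
  # handle it here: chunk the document, summarise a section, or report the error
  echo "report.md exceeds the 4000-token budget" >&2
  exit 1
fi
printf '%s\n' "$checked" | vrk prompt --system "Summarise this." > summary.txt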

Variations

Gate before a batch job to skip oversized documents:

for f in docs/*.md; do
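  # gate each file: at or under 8000 tokens it passes through, over budget vrk tok exits 1 and nothing reaches vrk prompt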
  cat "$f" | vrk tok --check 8000 | vrk prompt --system "Extract key points" >> output.jsonl 2>/dev/null
done

Use --json for structured error reporting:

cat prompt.txt | vrk tok --check 4000 --json
# Over budget: {"error":"tok: 6201 tokens exceeds budget of 4000","code":1}
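
The JSON form is easier for another program to consume. A sketch that pulls out the message with jq, assuming the JSON object is written to stdout (the snippet above does not say which stream it uses):

if ! result=$(cat prompt.txt | vrk tok --check 4000 --json); then
  # e.g. prints: tok: 6201 tokens exceeds budget of 4000
  echo "$result" | jq -r '.error' >&2
fi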

Tools used