vrk throttle

vrk throttle is a rate limiter for LLM API batch jobs - pace your pipeline to stay within API limits.

The problem

A pipeline sends 10,000 requests to an LLM API that allows 60 per minute. It fires as fast as stdin delivers and gets rate-limited after the first 60. Adding sleep 1 between requests is too conservative and turns a 3-minute job into a 3-hour one. sleep is per-iteration, not per-second, and ignores processing time.

The solution

vrk throttle paces pipeline flow to a specified rate. --rate 10/s or --rate 100/m delays lines to match. --burst lets the first N lines through immediately. --tokens-field enables token-aware rate limiting when records consume different API quotas.

Before and after

Before

cat records.jsonl | while read line; do
  process "$line"
  sleep 1  # Too slow. 60/min API allows faster.
done

After

cat records.jsonl | vrk throttle --rate 60/m | process-each

Example

cat records.jsonl | vrk throttle --rate 10/s --burst 5

Exit codes

Code	Meaning
0	All lines emitted at specified rate
1	Stdin read error, write error, or –tokens-field not found
2	–rate missing or invalid, interactive TTY

Flags

Flag	Short	Type	Description
`--rate`	-r	string	Rate limit in N/s or N/m format (required)
`--burst`		int	Emit first N lines immediately before applying rate limit
`--tokens-field`		string	Dot-path to JSONL field for token-based rate limiting
`--json`	-j	bool	Append metadata record after all output
`--quiet`	-q	bool	Suppress stderr output

How it works

Rate limiting

# Allow 10 lines per second
seq 20 | vrk throttle --rate 10/s

# Allow 100 lines per minute
cat records.jsonl | vrk throttle --rate 100/m

Lines are delayed to maintain the target rate. Output is the same as input, just paced.

Burst (–burst)

Let the first N lines through immediately, then enforce the rate:

cat records.jsonl | vrk throttle --rate 5/s --burst 10

The first 10 lines arrive instantly. After that, 5 per second. Use this for APIs that allow burst traffic but enforce sustained rate limits.

Token-aware rate limiting (–tokens-field)

When different records consume different amounts of API quota:

cat chunks.jsonl | vrk throttle --rate 100000/m --tokens-field tokens

Instead of counting lines, throttle counts the value of the tokens field in each JSONL record. This keeps you under token-per-minute limits even when chunk sizes vary.

JSON metadata (–json)

$ seq 5 | vrk throttle --rate 10/s --json
1
2
3
4
5
{"_vrk":"throttle","rate":"10/s","lines":5,"elapsed_ms":500}

Pipeline integration

Rate-limit LLM calls

# Process JSONL records through an LLM at 10 requests per second
cat data.jsonl | vrk throttle --rate 10/s | \
  while IFS= read -r record; do
    echo "$record" | jq -r '.text' | \
      vrk prompt --system 'Classify this text'
  done

Throttle web fetches

# Fetch URLs at a polite rate
cat urls.txt | vrk throttle --rate 2/s | \
  while IFS= read -r url; do
    vrk grab "$url" | vrk tok --json | jq -r '.tokens'
  done

Sample, throttle, then process

# Take a sample, pace the processing, and log results
cat large-dataset.jsonl | \
  vrk sip --count 100 --seed 42 | \
  vrk throttle --rate 5/s | \
  while IFS= read -r record; do
    RESULT=$(echo "$record" | vrk prompt --system 'Analyze')
    echo "$RESULT" | vrk emit --tag analysis
  done

When it fails

Missing –rate:

$ seq 10 | vrk throttle
usage error: throttle: --rate is required
$ echo $?
2

Invalid rate format:

$ seq 10 | vrk throttle --rate 10/h
usage error: throttle: invalid rate format (use N/s or N/m)
$ echo $?
2