vrk jsonl
vrk jsonl converts JSON arrays to JSONL and back for line-by-line pipeline processing.
The problem
An API returns a JSON array of 50,000 records. jq '.[]' flattens it but loads the entire array into memory first, so on a 2 GB response the process can be OOM-killed. Line-by-line processing requires JSONL, but the API only returns arrays.
The solution
By default, vrk jsonl splits a JSON array into JSONL, writing one array element per line. --collect does the reverse and gathers JSONL lines back into a JSON array. The streaming decoder handles files larger than available memory.
Before and after
Before
cat data.json | jq '.[]'
# Loads entire array into memory. OOM on large files.
After
cat data.json | vrk jsonl
Example
cat api-response.json | vrk jsonl | vrk validate --schema '{"name":"string"}'
Exit codes
| Code | Meaning |
|---|---|
| 0 | Success, including empty input |
| 1 | Invalid JSON, I/O error |
| 2 | Interactive TTY with no input, unknown flag |
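A calling script can branch on these codes. The sketch below is one way to do that in POSIX shell; `explain_status` is a hypothetical helper, not part of vrk, and its messages only paraphrase the table above.

```shell
# explain_status: hypothetical helper, not part of vrk.
# Maps a vrk jsonl exit code to a short message, following the exit code table.
explain_status() {
  case "$1" in
    0) echo "success" ;;
    1) echo "runtime error: invalid JSON or I/O error" ;;
    2) echo "usage error: no input on stdin or unknown flag" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}

# Typical use (needs vrk installed, so it is left as a comment):
#   cat data.json | vrk jsonl > data.jsonl
#   explain_status "$?" >&2
```

Because vrk jsonl is the last command in that pipeline, `$?` holds its exit code rather than the exit code of cat.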
Flags
| Flag | Short | Type | Description |
|---|---|---|---|
| --collect | -c | bool | Collect JSONL lines into a JSON array |
| --json | -j | bool | Append metadata trailer after all records (split mode only) |
How it works
Split a JSON array into JSONL
$ echo '[{"name":"Alice"},{"name":"Bob"},{"name":"Carol"}]' | vrk jsonl
{"name":"Alice"}
{"name":"Bob"}
{"name":"Carol"}
Each array element becomes one line. Pipe to while read, vrk validate, or any line-oriented tool.
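For the `while read` case, a small wrapper keeps the loop details in one place. This is a minimal sketch: `each_record` is a hypothetical helper, not part of vrk, and it assumes each record should reach the per-record command on stdin.

```shell
# each_record: hypothetical helper, not part of vrk.
# Reads JSONL on stdin and runs the given command once per line,
# passing the record to that command on its stdin.
each_record() {
  # IFS= and -r keep leading whitespace and backslashes intact.
  # The extra test handles a final line that lacks a trailing newline.
  while IFS= read -r record || [ -n "$record" ]; do
    # printf is used instead of echo because some shells let echo
    # interpret backslash escapes, which would corrupt JSON strings.
    printf '%s\n' "$record" | "$@"
  done
}

# Typical use (needs vrk installed, so it is left as a comment):
#   cat data.json | vrk jsonl | each_record vrk prompt --system 'Classify this record'
```

Since the command reads from its own pipe, it cannot accidentally consume the remaining records from the loop's stdin.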
Collect JSONL back into an array (--collect)
$ printf '{"name":"Alice"}\n{"name":"Bob"}\n' | vrk jsonl --collect
[{"name":"Alice"},{"name":"Bob"}]
Use this when a downstream tool or API expects a JSON array.
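The relationship between the two formats is simple enough to sketch with awk. This is only an illustration of the output shape, not how vrk implements --collect: `collect_sketch` is a hypothetical name, and unlike vrk it does no JSON validation, so it is only meaningful for input that is already valid JSONL.

```shell
# collect_sketch: hypothetical illustration, not vrk's implementation.
# Joins non-blank input lines with commas and wraps them in brackets.
# It does not validate JSON; malformed lines pass through unchanged.
collect_sketch() {
  awk '
    BEGIN { printf "[" }
    NF    { printf "%s%s", (n++ ? "," : ""), $0 }   # comma before every line but the first
    END   { print "]" }
  '
}

printf '{"name":"Alice"}\n{"name":"Bob"}\n' | collect_sketch
```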
Metadata trailer (--json)
$ echo '[{"a":1},{"b":2}]' | vrk jsonl --json
{"a":1}
{"b":2}
{"_vrk":"jsonl","count":2}
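A consumer can use the trailer as a completeness check. The sketch below assumes, based on the example above, that the trailer is always the last line and that `count` is the number of records before it; `check_trailer` is a hypothetical helper, not part of vrk.

```shell
# check_trailer: hypothetical helper, not part of vrk.
# Reads `vrk jsonl --json` output on stdin. Succeeds only if the last
# line carries a "count" field equal to the number of lines before it.
# Assumption: the trailer is the final line and count means record count.
check_trailer() {
  awk '
    { last = $0 }                                  # remember the most recent line
    END {
      if (NR == 0) exit 1                          # no output, so no trailer
      if (!match(last, /"count":[0-9]+/)) exit 1   # last line has no count field
      count = substr(last, RSTART + 8, RLENGTH - 8)
      if (count + 0 != NR - 1) exit 1              # trailer disagrees with line count
    }
  '
}

# Typical use (needs vrk installed, so it is left as a comment):
#   cat data.json | vrk jsonl --json > out.jsonl
#   check_trailer < out.jsonl || echo "output looks truncated" >&2
```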
Pipeline integration
Split an API response for validation
# API returns a JSON array; split it for per-record validation
curl -s https://api.example.com/users | \
vrk jsonl | \
vrk validate --schema '{"name":"string","email":"string"}' --strict
Process array records through an LLM
# Split array, process each record, collect results back
cat data.json | vrk jsonl | \
  while IFS= read -r record; do
    # printf avoids echo's shell-dependent handling of backslashes in JSON
    printf '%s\n' "$record" | vrk prompt --system 'Classify this record'
  done | vrk jsonl --collect > results.json
Sample from a large array
# Split a large JSON array, sample 100 records
cat large-dataset.json | vrk jsonl | vrk sip --count 100 --seed 42
Throttle array processing
# Split array and rate-limit processing to 5 records per second
cat data.json | vrk jsonl | vrk throttle --rate 5/s | \
  while IFS= read -r record; do
    process "$record"
  done
When it fails
Invalid JSON input:
$ echo 'not json' | vrk jsonl
error: jsonl: invalid JSON
$ echo $?
1
No input:
$ vrk jsonl
usage error: jsonl: no input: pipe JSON to stdin
$ echo $?
2