vrk links

vrk links extracts hyperlinks from markdown, HTML, or plain text and outputs one JSONL record per link.

The problem

grep -oE 'https?://[^ ]+' finds bare URLs but misses [text](url) markdown links. A regex that handles inline links misses reference-style [text][ref] links. Neither approach gives you line numbers or link text.

The solution

vrk links extracts hyperlinks from markdown, HTML, or plain text and outputs one JSONL record per link with the link text, URL, and line number. Handles inline links, reference-style links, HTML anchor tags, and bare URLs. --bare outputs just the URLs, one per line.

Before and after

Before

grep -oE 'https?://[^ ]+' README.md
# misses [text](url) markdown links
# misses [text][ref] reference links
# no line numbers, no link text

After

cat README.md | vrk links

Example

vrk grab https://example.com/docs | vrk links --bare

Exit codes

Code	Meaning
0	Success, including empty input and documents with no links
1	I/O error reading stdin
2	Interactive TTY with no piped input, unknown flag

Flags

Flag	Short	Type	Description
`--bare`	-b	bool	Output URLs only, one per line
`--json`	-j	bool	Append metadata trailer after all records
`--quiet`	-q	bool	Suppress stderr output

How it works

JSONL output (default)

$ printf '# Getting Started\n\nVisit [our docs](https://docs.example.com) for the full guide.\nAlso check [GitHub](https://github.com/example/repo) and https://blog.example.com/updates.\n' | vrk links
{"text":"our docs","url":"https://docs.example.com","line":3}
{"text":"GitHub","url":"https://github.com/example/repo","line":4}
{"text":"https://blog.example.com/updates.","url":"https://blog.example.com/updates.","line":4}

Each record includes the link text, URL, and line number. Bare URLs use the URL itself as the text.

URLs only (–bare)

$ printf '# Getting Started\n\nVisit [our docs](https://docs.example.com) for the full guide.\nAlso check [GitHub](https://github.com/example/repo) and https://blog.example.com/updates.\n' | vrk links --bare
https://docs.example.com
https://github.com/example/repo
https://blog.example.com/updates.

One URL per line, no JSON. Pipe directly to xargs, while read, or vrk grab.

Metadata trailer (–json)

cat README.md | vrk links --json

Appends {"_vrk":"links","count":N} as the last record.

What gets detected

Markdown inline links: [text](url)
Markdown reference links: [text][ref] with [ref]: url
HTML anchor tags: <a href="url">text</a>
Bare URLs: https://example.com in plain text

Pipeline integration

Find broken links in documentation

# Extract all links from a doc and check each one
cat docs/README.md | vrk links --bare | while IFS= read -r url; do
  STATUS=$(curl -sI -o /dev/null -w '%{http_code}' "$url")
  if [ "$STATUS" != "200" ]; then
    echo "Broken: $url ($STATUS)" | vrk emit --level warn --tag linkcheck
  fi
done

Extract links from a web page

# Grab a page, extract all links, get unique domains
vrk grab https://example.com | vrk links --bare | \
  while IFS= read -r url; do
    vrk urlinfo --field host "$url"
  done | sort -u

Index links in kv

# Store all links from a document with their text for later lookup
cat document.md | vrk links | while IFS= read -r record; do
  URL=$(echo "$record" | jq -r '.url')
  TEXT=$(echo "$record" | jq -r '.text')
  vrk kv set --ns links "$(echo "$URL" | vrk slug)" "$TEXT"
done

When it fails

No input:

$ vrk links
usage error: links: no input: pipe text to stdin
$ echo $?
2

Empty input produces no output and exits 0 - there are simply no links to extract.