macvision

Turn any image into agent-friendly JSON — local macOS OCR and image understanding.

What is macvision?

macvision wraps Apple’s Vision framework in a tiny Swift binary. Point it at a screenshot, photo, or scan and get back recognized text, scene labels, and detections for faces, barcodes, and documents, all as compact JSON and all processed on your Mac. There’s no large model to download, and nothing is uploaded.

Use it wherever you’d otherwise pay for an LLM vision call: OCR the image locally for free, then send only the text to your model.

Read a screenshot into text and feed it to an LLM
Find the QR code or the document outline in an image
Get a fingerprint of an image and compare it to another
Identify the scene and objects in a photo

At a glance

macvision ocr ./screenshot.png                       # extract text
macvision ocr ./screenshot.png --lang zh-Hans,en-US   # Chinese + English
macvision classify ./photo.jpg --top 5                # scene/object labels
macvision detect ./photo.jpg --barcodes               # barcodes / QR
macvision feature ./a.jpg --compare ./b.jpg           # image distance

Output schema: {"ok": true, ...} on success, {"ok": false, "error": "..."} on failure.
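
To make the `ok` check concrete, here is a minimal Python sketch of a caller that enforces the envelope above. The names `parse_result`, `run_macvision`, and `MacvisionError` are hypothetical helpers, not part of macvision, and the sketch assumes the binary is on your PATH and writes its JSON to stdout on both success and failure.

```python
import json
import subprocess


class MacvisionError(RuntimeError):
    """Raised when macvision reports {"ok": false, ...} or its output is not usable JSON."""


def parse_result(raw: str) -> dict:
    """Parse macvision output and enforce the documented ok/error envelope."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise MacvisionError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise MacvisionError("unexpected output: expected a JSON object")
    if data.get("ok") is not True:
        # "error" is the documented failure field; fall back to a generic message.
        raise MacvisionError(data.get("error", "unknown error"))
    return data


def run_macvision(*args: str) -> dict:
    """Run macvision with the given arguments and return the parsed result.

    Assumption: the JSON envelope is printed to stdout even on failure.
    """
    proc = subprocess.run(["macvision", *args], capture_output=True, text=True)
    return parse_result(proc.stdout)
```

A call such as `run_macvision("ocr", "./screenshot.png")` would then either return the success object or raise with the reported error message. See the command reference for the fields each subcommand returns.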

For AI agents

Paste this one-line instruction into the system prompt of Claude Code, Cursor, or any other agent:

Use `macvision` to read images on macOS (OCR, classify, detect). Install if missing: `brew install ljh-sh/cli/macvision`. JSON output, check `ok`. Run `macvision --help` for subcommands.

Where to go next

Install macvision — Homebrew, direct binary, or build from source
Command reference — every subcommand, option, and output field
Design & principles — why macvision is shaped the way it is
Why macvision? — why a CLI over the Vision framework
FAQ — permissions, screencapture, coordinate conventions
Alternatives — how macvision compares to Tesseract and cloud OCR