macvision
Turn any image into agent-friendly JSON — local macOS OCR and image understanding.
What is macvision?
macvision wraps Apple’s Vision framework in a tiny Swift binary. Point it at a screenshot, photo, or scan and get back text, scene labels, and detected faces, barcodes, and documents — all as compact JSON, all processed on your Mac. There’s no big model to download and nothing is uploaded.
Use it wherever you’d otherwise pay for an LLM vision call: OCR the image locally for free, then send only the text to your model.
- Read a screenshot into text and feed it to an LLM
- Find the QR code or the document outline in an image
- Get a fingerprint of an image and compare it to another
- Identify the scenes and objects a photo contains
At a glance
```sh
macvision ocr ./screenshot.png                        # extract text
macvision ocr ./screenshot.png --lang zh-Hans,en-US   # Chinese + English
macvision classify ./photo.jpg --top 5                # scene/object labels
macvision detect ./photo.jpg --barcodes               # barcodes / QR
macvision feature ./a.jpg --compare ./b.jpg           # image distance
```
Output schema: `{"ok": true, ...}` on success, `{"ok": false, "error": "..."}` on failure.
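If you call macvision from a script rather than from an agent, a small wrapper can enforce the `ok` envelope described above. The sketch below is in Python and is not part of macvision: `run_macvision`, `parse_result`, and `MacvisionError` are hypothetical names, and falling back to stderr when stdout is empty is an assumption, since this page does not say which stream a failure is written to. It relies only on the `ok` and `error` fields; see the Command reference for the fields each subcommand returns.

```python
import json
import subprocess
from typing import Any, Callable, Dict, Sequence


class MacvisionError(RuntimeError):
    """Raised when macvision reports {"ok": false, "error": "..."}."""


def parse_result(output: str) -> Dict[str, Any]:
    """Parse one macvision JSON result and enforce the ok/error envelope."""
    result = json.loads(output)  # raises a ValueError subclass on non-JSON output
    if not isinstance(result, dict) or "ok" not in result:
        raise ValueError("output is not a macvision result object")
    if result["ok"] is not True:
        # Surface the tool's own error message to the caller.
        raise MacvisionError(result.get("error", "unknown error"))
    return result


def run_macvision(
    args: Sequence[str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Dict[str, Any]:
    """Run `macvision <args>` and return the parsed success payload.

    `run` is injectable so the wrapper can be tested without the binary.
    """
    proc = run(["macvision", *args], capture_output=True, text=True)
    # Assumption: the JSON result is on stdout; fall back to stderr if stdout is empty.
    output = proc.stdout.strip() or proc.stderr.strip()
    return parse_result(output)
```

For example, `run_macvision(["ocr", "./screenshot.png"])` returns the parsed dictionary on success and raises `MacvisionError` with the tool's message on failure, so a script can extract the OCR text and send only that to a model. The wrapper checks `ok` rather than the exit code because the `ok` field is the contract this page documents.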
For AI agents
Paste this one-line prompt into the system prompt of Claude Code, Cursor, or any other agent:
Use `macvision` to read images on macOS (OCR, classify, detect). Install if missing: `brew install ljh-sh/cli/macvision`. JSON output, check `ok`. Run `macvision --help` for subcommands.
Where to go next
- Install macvision — Homebrew, direct binary, or build from source
- Command reference — every subcommand, option, and output field
- Design & principles — why macvision is shaped the way it is
- Why macvision? — why a CLI over the Vision framework
- FAQ — permissions, screencapture, coordinate conventions
- Alternatives — how macvision compares to Tesseract and cloud OCR