Command reference

All commands print compact JSON by default. Success is {"ok": true, ...}; failure is {"ok": false, "error": "..."}.

Every vision command takes the image as the first positional argument. It accepts a file path, - for a base64 image on stdin, or the --clipboard / --screen flags.

Bounding boxes are pixel coordinates [x, y, w, h] with a top-left origin. Each box also includes norm, the same rectangle normalized to [0, 1].


macvision ocr <image>

Extract text with VNRecognizeTextRequest, which is natively multi-language — pass a list and one request reads every script.

macvision ocr ./screenshot.png             # auto: broad default languages
macvision ocr ./screenshot.png --lang cjk  # Chinese/Japanese/Korean preset
macvision ocr ./screenshot.png --lang zh-Hans,en-US
macvision ocr ./screenshot.png --ja --en   # shorthand: Japanese first, then English
macvision ocr -                            # base64 image on stdin
macvision ocr ./scan.png --level fast

Options

Option Default Description
--lang all preset Languages/presets, repeatable or comma-separated. Presets: all(default),cjk,cn,latin,en. Custom set via $MACVISION_LANG_<NAME>=a,b,c. e.g. --lang cjk,en-US
--en --zh --ja --ko off Shorthand: add English / Chinese(zh-Hans,zh-Hant) / Japanese / Korean to --lang at the position they appear. --ja --en equals --lang ja-JP,en-US
--level accurate accurate or fast
--min-confidence 0 Drop results below this confidence
--top all Keep at most N results
--no-language-correction off Disable Vision language correction
--clipboard off Read the image from the clipboard
--screen off Take a fresh screenshot first

Language shortcuts--en, --zh, --ja, --ko each append a script to --lang at their CLI position, so flag order is the language order: --ja --enja-JP,en-US. Combine freely with --lang, e.g. --lang cjk --ja.

Language presetsall = zh-Hans,zh-Hant,en-US,ja-JP,ko-KR,fr,de,es,pt,it,ru (auto-detect, no --lang needed). cjk = zh+ja+ko. cn = zh-Hans,zh-Hant. latin/european = en+fr+de+es+pt+it. en = en-US only. Define your own: export MACVISION_LANG_SEA=th-TH,vi-VN,id-ID then macvision ocr img.png --lang sea.

CJK ordering note: Vision prioritizes the first CJK language in the list. The all/cjk presets are zh-first, so Chinese + Latin read reliably, but pure Japanese/Korean text can be starved — for those, lead with the language: --lang ja-JP / --lang ko-KR.

Output fields

Field Type Meaning
ok bool Success or failure
image string Source label
width, height int Image dimensions
languages [string] Languages used
count int Number of text results
confidence float Mean top-candidate confidence (0 if no text). With count, lets a caller judge whether to retry with a different --lang
texts [object] One per recognized line/word
texts[].text string Top candidate string
texts[].confidence float Candidate confidence
texts[].bbox [int]×4 [x, y, w, h] pixels, top-left origin
texts[].norm [float]×4 Same box normalized
texts[].candidates [string] Other candidates, when present

macvision classify <image>

Classify a scene/objects with VNClassifyImageRequest, or animal species with VNRecognizeAnimalsRequest.

macvision classify ./photo.jpg --top 5
macvision classify ./photo.jpg --animals

Options

Option Default Description
--top 10 Keep top N labels
--min-confidence 0 Drop labels below this confidence
--animals off Recognize animal species instead of scene

Output fields

Field Type Meaning
mode string scene or animals
count int Number of labels
labels [object] {name, confidence} per label

macvision detect <image>

Run one or more detectors in a single pass. With no detector flag it runs the broad set: faces, barcodes, text regions, and horizon. A flag like --faces narrows to only that detector. --rects (document/card rectangles) and --ocr (recognize the actual text) are additive--ocr runs text recognition on top of whatever else runs, so detect img --ocr = broad + read the words.

macvision detect ./photo.jpg                              # broad: faces, barcodes, text regions, horizon
macvision detect ./shot.png --ocr --lang zh-Hans,en-US   # broad + read the text
macvision detect ./shot.png --ocr --ja --en              # broad + read text (shorthand langs)
macvision detect ./card.jpg --rects                       # document/card rectangles (opt-in)
macvision detect ./qr.png --barcodes --symbologies qr,ean13
macvision detect ./photo.jpg --faces --horizon            # only faces + horizon

Options

Option Default Description
--faces off (on by default if none set) Detect faces
--rects off Detect document/card rectangles
--barcodes off (on by default if none set) Detect barcodes / QR
--text-regions off (on by default if none set) Detect text region boxes (no content)
--ocr off Recognize the actual text (additive)
--lang all preset OCR languages/presets with --ocr. Presets: all(default),cjk,cn,latin,en. Repeatable or comma-separated
--en --zh --ja --ko off Shorthand: add English / Chinese / Japanese / Korean to --lang at their CLI position
--horizon off (on by default if none set) Detect horizon angle
--symbologies Vision default qr,ean13,ean8,upce,code128,code39,code93,datamatrix,pdf417,aztec,itf14
--min-size 0.2 Minimum rectangle size for --rects (0–1)
--min-confidence 0 Drop detections below this confidence

Output fields

Field Type Meaning
count int Total detections across detectors
detections.faces [object] {bbox, norm}
detections.rectangles [object] {bbox, norm, corners, confidence}corners is 4 [x, y] pixel points
detections.barcodes [object] {payload, symbology, bbox, norm}
detections.texts [object] Recognized text (with --ocr): {text, confidence, bbox, norm}
detections.text_regions [object] {bbox, norm, character_count?}
detections.horizon object {angle} in radians
detections.*_count int Per-detector counts

macvision document <image>

Find the document outline (a quadrilateral) with VNDetectDocumentSegmentationRequest, so you can crop or deskew it.

macvision document ./scan.jpg

Output fields

Field Type Meaning
count int Number of documents found
documents [object] {bbox, norm, corners, confidence}corners is 4 [x, y] pixel points (top-left, top-right, bottom-right, bottom-left)

macvision salient <image>

Produce a saliency heatmap PNG with VNGenerateAttentionBasedSaliencyImageRequest (or objectness with --mode objectness).

macvision salient ./photo.jpg
macvision salient ./photo.jpg --mode objectness --output ./heat.png

Output fields

Field Type Meaning
mode string attention or objectness
mask_width, mask_height int Heatmap dimensions
output string Path to the written PNG
saved bool Whether the PNG was written

macvision feature <image>

Image fingerprint via VNGenerateImageFeaturePrintRequest. With --compare, compute the distance between two images (0 = identical).

macvision feature ./a.jpg                       # base64 feature vector
macvision feature ./a.jpg --compare ./b.jpg     # distance between two images
macvision feature ./a.jpg --level 2             # higher-precision model (macOS 14+)

Output fields (single)

Field Type Meaning
level int 1 or 2
element_count int Vector dimensionality
element_type string float or double
bytes int Raw vector size in bytes
data string Base64-encoded vector bytes

Output fields (compare)

Field Type Meaning
compare string Second image label
distance float 0 = identical; larger = more dissimilar
same bool distance < 0.5 (loose cutoff; pick your own)

macvision daemon

Long-lived daemon that serves NDJSON requests over named pipes (FIFO). Blocks until cancelled.

macvision daemon --req /tmp/macvision.req --res /tmp/macvision.res &

Options

Option Default Description
--req /tmp/macvision.req Request FIFO path
--res /tmp/macvision.res Response FIFO path

Request schema — one JSON object per line on the request pipe:

{"action": "ocr", "image": "/tmp/s.png", "lang": ["zh-Hans", "en-US"]}

Supported action values: doctor, ocr, classify, detect, feature, salient, document. Fields mirror the CLI options:

Action Fields
ocr image, lang, level ("fast"), min_confidence, top, use_correction
classify image, top, min_confidence, animals
detect image, faces, rects, barcodes, text_regions, horizon, symbologies, min_size, min_confidence
feature image, compare, level (1 or 2)
salient image, mode, output
document image
doctor none

Each request produces one NDJSON response line on the response pipe, identical to the matching CLI command’s JSON.


macvision doctor

Report the macOS version, architecture, and the supported Vision capabilities.

macvision doctor

Output fields

Field Type Meaning
ok bool Environment is usable (macOS >= 13)
macos_version string e.g. 26.5.1
architecture string e.g. arm64
apple_silicon bool Apple Silicon vs Intel
checks [object] {name, value, ok, requirement?}
capabilities [object] {name, ok, api} per Vision request

Common error fields

Field Type Meaning
ok bool false
error string Human-readable error message

Common causes: